#140 Taming BigQuery costs with Alvin.ai (with Martin Sahlen)
Martin Sahlen, CEO of Alvin AI, joins the Measure Pod to discuss automating BigQuery cost optimisation through billing model arbitrage, with no AI in the engine.
Google's running two bets in parallel - Marketing and Cloud. You'll likely be using both for the next few years, so here's how to work with them and make them your own.
Google has held three announcement events recently - Cloud Next in April, I/O on May 19-20 and Google Marketing Live on May 20. The Marketing Live keynote overlapped with the second day of I/O, which struck me as no accident: Gemini is running under all of it, and in doing so it's rebuilding your measurement stack.
Six pieces from Marketing Live are worth knowing about:
And these things aren't wholly new. Tag Gateway has been shipping with Cloudflare for about a year now. Data Manager API went live back in 2024. The Google Tag Manager and Google Tag unification I wrote about a couple of weeks back is rolling out properly now. Enhanced Conversions, Attributed Brand Searches, Lead Intent Scores - there's a long tail of incremental updates covered in Simo Ahava's round-up if you want the surface-area read. What's new is mostly the framing - Google is now telling all of this as a single coherent story, with Gemini behind it all.
Google is now positioning itself as a "measurement command centre" - all built on their stack. Their tagging. Their analytics. Their MMM. Their reporting. Their agents on top.
For some, that's a clear win. If you couldn't justify a separate MMM, Meridian-in-GA is going to bring value (albeit once it's out of alpha, sometime next year). If you didn't have an attribution model, the new stack hands you one. All much quicker than building from scratch.
The trade-off: as you adopt more of the stack, Google quietly becomes the answer - to infrastructure, to everything.
Elsewhere, vendors are making the opposite architecture choice for data and marketing teams. Anthropic are saying: pay us for the AI, but your data infrastructure stays yours, and so does the work the agent does on it. Snowflake had said something similar a fortnight earlier with their multi-tenant Cortex agents.
Google's running both bets in parallel. Three weeks before Marketing Live, at Cloud Next, they (re)launched Gemini Enterprise (formerly Vertex AI). Same architectural bet as Anthropic, opposite to Marketing Live. Your data lives in BigQuery, in your tenancy. You pick the model - 200+ in Model Garden, including Anthropic's Claude alongside Gemini. You build agents on your own data, with your own definitions. Google's Agentic Data Cloud gives those agents governed access. Google supplies the reasoning platform; you keep the meaning.
So there's a clear direction emerging - including from Google itself. Your data stays put. The vendor sells reasoning. You keep the meaning, the audit trail and the right to walk away with your definitions intact.
Google's Marketing stack looks like the inverse bet. Bring your data into Google's tooling and Google will be both the reasoning and the place the answer lives. For teams already running inside Google Ads, that bet wins on convenience. It's a weaker bet for anyone whose marketing has to cross over with the rest of the organisation - comms, finance, ops, the places where data and decisions flow across teams - and for anywhere there are legal or governance questions to answer, or concerns about leaning on a single vendor.
Most teams will land somewhere in the middle, and that middle is itself a Google architecture - Google Cloud underneath, BigQuery for your data, Gemini Enterprise as the agent platform, your own definitions and reasoning on top. Use the marketing conveniences where they fit. Keep your ground truth in the stack you (probably already) run. The architecture decision gets made either way, deliberately or not.
Many marketing teams will still want to use most of Google's marketing stack over the next few years. It'll be quicker and more capable than what you'd build from scratch.
Book a free discovery call to discuss how we can help with your data and analytics challenges.
What I'd want alongside it: a way to reconcile Google's numbers against Meta's, the CRM's and finance's, plus your own definitions of customer, conversion, margin and campaign that survive whichever AI agent happens to be querying the data this quarter.
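To make the reconciliation idea a little more concrete, here's a minimal Python sketch. It assumes each source can report a figure for the same period under one agreed, organisation-owned definition; the source labels, figures and 10% tolerance are hypothetical, purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFigure:
    source: str   # where the number came from (labels here are hypothetical)
    metric: str   # your organisation's own definition, e.g. "conversion"
    value: float  # the figure that source reports for the same period


def reconcile(figures: list[SourceFigure], reference: str) -> dict[str, float]:
    """Return each source's relative gap against a chosen reference source."""
    metrics = {f.metric for f in figures}
    if len(metrics) != 1:
        # Comparing numbers built on different definitions is the mistake to avoid.
        raise ValueError(f"figures use mixed definitions: {sorted(metrics)}")
    by_source = {f.source: f.value for f in figures}
    ref = by_source[reference]
    if ref == 0:
        raise ValueError("reference value is zero; relative gaps are undefined")
    return {s: (v - ref) / ref for s, v in by_source.items() if s != reference}


def out_of_tolerance(gaps: dict[str, float], tolerance: float = 0.10) -> list[str]:
    """List sources whose gap exceeds the (illustrative) tolerance."""
    return sorted(s for s, g in gaps.items() if abs(g) > tolerance)


figures = [
    SourceFigure("crm", "conversion", 1000),
    SourceFigure("google_ads", "conversion", 1180),
    SourceFigure("meta", "conversion", 1050),
    SourceFigure("finance", "conversion", 990),
]
gaps = reconcile(figures, reference="crm")
print(out_of_tolerance(gaps))  # ['google_ads']
```

Treating the CRM as the reference here is a design choice, not a rule; the point is that the comparison runs against your definition rather than any one platform's.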
Here's what we see happen, over and over. Marketing teams want to do more with their data than the traditional Google Marketing stack provides - and a clear way in is via BigQuery. Once the data's there, you can be doing all sorts - but you're also standing next to Gemini Enterprise and the agentic stuff on top of it. As soon as you start with that, you want to be connecting to other systems - your CRM, your ad platforms, your financial reporting, the bits sitting in Slack or Drive - and those need federating in, increasingly via MCPs. And then you're running away with semantics and governance. What does "revenue" actually mean? Which join produces the right answer? Who's allowed to see what? The question of meaning stops being a side concern and becomes the actual job.
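One way to picture "where the meaning lives" is a small registry of governed metric definitions that every agent query has to resolve through, answering both "what does revenue mean?" and "who's allowed to see it?". This is a hypothetical sketch: the metric names, SQL fragments and role names are invented for illustration and don't come from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    sql: str                       # the one agreed way to compute it (illustrative)
    allowed_roles: frozenset[str]  # who may see it


# Hypothetical registry: one definition of each metric, owned by the organisation.
REGISTRY = {
    "revenue": MetricDefinition(
        name="revenue",
        sql="SUM(order_value) - SUM(refund_value)",
        allowed_roles=frozenset({"finance", "marketing"}),
    ),
    "margin": MetricDefinition(
        name="margin",
        sql="SUM(order_value - cost_of_goods) / SUM(order_value)",
        allowed_roles=frozenset({"finance"}),
    ),
}


def resolve(metric: str, role: str) -> str:
    """Return the governed SQL for a metric, or refuse."""
    definition = REGISTRY.get(metric.lower())
    if definition is None:
        raise KeyError(f"no governed definition for {metric!r}")
    if role not in definition.allowed_roles:
        raise PermissionError(f"role {role!r} may not see {metric!r}")
    return definition.sql


print(resolve("Revenue", role="marketing"))
```

Whichever agent asks, "revenue" resolves to the same expression, and the access check sits with the definition rather than with each tool.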
There are two architectures that solve for this, and they often sit in parallel rather than one replacing the other. A centralised warehouse on BigQuery is the right call for long-term historical aggregation, machine learning, and the multi-year trend work that needs scale. A federated layer is the right call for cross-source reporting, attribution, dashboards, and ad-hoc questions in plain English over the sources you already have. Different jobs. Different time-to-value. Most of our Google Cloud Partner work sits on top of BigQuery these days, with the meaning layer running over it - and that meaning layer is the conversation we're having most often with clients right now.
The federated side is what we've been building at Measurelab. We're calling it SEAM: an independent layer that sits between your data and whichever agent is asking, without forcing you to move the data first. The principles behind the layer are written up in full if you want them. Its audit log doubles as a feedback loop: every ungoverned query becomes a candidate for governance, and the organisation builds a living map of where its data gaps actually are.
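The audit-log-as-feedback-loop idea can be sketched in a few lines. This is not how SEAM is implemented; it's a hypothetical illustration assuming each logged query records which agent asked, which source it touched and whether it resolved to a governed definition. Counting the ungoverned queries per source gives a crude version of that map of data gaps.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEntry:
    agent: str      # which agent asked (hypothetical labels)
    source: str     # which system the query touched
    question: str   # the plain-English question
    governed: bool  # did it resolve to a governed definition?


def governance_candidates(log: list[AuditEntry]) -> list[AuditEntry]:
    """Every ungoverned query is a candidate for a new governed definition."""
    return [e for e in log if not e.governed]


def gap_map(log: list[AuditEntry]) -> Counter:
    """Count ungoverned queries per source: where the data gaps cluster."""
    return Counter(e.source for e in governance_candidates(log))


log = [
    AuditEntry("claude", "hubspot", "How many MQLs last month?", governed=False),
    AuditEntry("gemini", "bigquery", "What was revenue last quarter?", governed=True),
    AuditEntry("claude", "hubspot", "Which deals came from paid social?", governed=False),
    AuditEntry("gemini", "jira", "How many tracking bugs are open?", governed=False),
]
print(gap_map(log).most_common())  # [('hubspot', 2), ('jira', 1)]
```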
You don't need SEAM specifically. You do need an answer to where the meaning of your data lives when it's no longer just you reading the dashboard.
If your analytics partner isn't already walking you through this, it's a useful conversation to have. If you don't have one and want to think it through, then please get in touch and book a discovery session.
Is Tag Gateway the same as server-side tagging?
No. Tag Gateway is a CDN-level reverse proxy that rewrites Google tag requests so they originate from your own domain. It only works for Google tags. Server-side Google Tag Manager (sGTM) is a full processing layer running in your infrastructure that can route data to non-Google destinations like Meta, TikTok, and LinkedIn. The two are complementary - you can run Tag Gateway for your Google tags while running sGTM for everything else. We've written more on the distinction if you want it.
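To illustrate the "first-party reverse proxy" idea, here's a minimal sketch of the URL mapping such a proxy performs. The measurement path (`/metrics`), the example domain and the upstream host are assumptions for illustration only; in practice the path and upstream are configured in your CDN and Google tag settings, and this is not Google's or Cloudflare's implementation.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Illustrative assumptions, not real configuration values:
FIRST_PARTY_PATH = "/metrics"                            # path you'd choose on your domain
UPSTREAM_ORIGIN = ("https", "www.googletagmanager.com")  # assumed Google upstream


def upstream_for(first_party_url: str) -> Optional[str]:
    """Map a first-party request back to the Google endpoint it stands in for.

    Returns None for anything outside the measurement path: a Google-tag
    proxy has nothing to say about Meta, TikTok or LinkedIn requests.
    """
    parts = urlsplit(first_party_url)
    in_path = parts.path == FIRST_PARTY_PATH or parts.path.startswith(FIRST_PARTY_PATH + "/")
    if not in_path:
        return None
    upstream_path = parts.path[len(FIRST_PARTY_PATH):] or "/"
    scheme, host = UPSTREAM_ORIGIN
    return urlunsplit((scheme, host, upstream_path, parts.query, ""))


print(upstream_for("https://shop.example.com/metrics/gtag/js?id=G-XXXXXXX"))
# https://www.googletagmanager.com/gtag/js?id=G-XXXXXXX
print(upstream_for("https://shop.example.com/fb/events"))
# None
```

The `None` branch is the distinction in miniature: routing to non-Google destinations is a job for a processing layer such as sGTM, not for the proxy.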
Should we centralise our data in a warehouse before adopting something like SEAM?
It depends on the job. A warehouse on BigQuery is the right call for long-term historical aggregation, machine learning and multi-year trend work. SEAM is the right call for cross-source reporting, attribution, dashboards and ad-hoc questions in plain English. Most clients we work with end up running both - the two sit in parallel, each doing the job it's best at. If you can't realistically build a warehouse in the next 12-18 months, SEAM lets you give AI governed access to the sources you already have without forcing centralisation first.
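As a rough encoding of that rule of thumb, the sketch below assigns each job to a layer. The job categories come from the answer above; the "deferred" label for warehouse jobs when no warehouse is on the horizon is our own illustrative assumption.

```python
from enum import Enum


class Layer(Enum):
    WAREHOUSE = "centralised warehouse (e.g. BigQuery)"
    FEDERATED = "federated layer over existing sources"


# Job categories from the answer above; the mapping is a rule of thumb.
JOB_TO_LAYER = {
    "historical aggregation": Layer.WAREHOUSE,
    "machine learning": Layer.WAREHOUSE,
    "multi-year trends": Layer.WAREHOUSE,
    "cross-source reporting": Layer.FEDERATED,
    "attribution": Layer.FEDERATED,
    "dashboards": Layer.FEDERATED,
    "ad-hoc questions": Layer.FEDERATED,
}


def plan(jobs: list[str], warehouse_feasible: bool) -> dict[str, str]:
    """Assign each job to a layer; park warehouse jobs if no warehouse is coming."""
    result = {}
    for job in jobs:
        layer = JOB_TO_LAYER[job]
        if layer is Layer.WAREHOUSE and not warehouse_feasible:
            # Illustrative assumption: these wait for centralisation, while
            # federated jobs go ahead on the sources you already have.
            result[job] = "deferred"
        else:
            result[job] = layer.name.lower()
    return result


print(plan(["attribution", "machine learning"], warehouse_feasible=False))
# {'attribution': 'federated', 'machine learning': 'deferred'}
```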
Will Ask Advisor be useful for our team?
Probably, for first-pass triage and routine diagnostics. Less so as the single source of truth on what your campaigns are doing. Use it as one input alongside whatever independent attribution and reporting work you already have. Plan to QA the outputs - natural-language interfaces over uncertain data can produce confidently wrong answers.