From internal experiments to client-facing data products
Eighteen months ago, we started building AI agents internally. Not as a product play - as a way to stop doing repetitive work. Documentation, QA checklists, data validation. The kind of tasks that every analytics consultancy does hundreds of times a year and nobody enjoys.
That experiment became our first internal AI assistant. And what we learned building it changed how we think about data products for clients.
Here's what we know now that we didn't know then.
The agent is the easy part
Building an agent that can query BigQuery and return an answer takes a weekend. Building an agent that returns the right answer, consistently, across different users and contexts - that takes months.
The gap isn't technical capability. It's governance. Our first version was impressively fluent and regularly wrong. It would generate SQL that was syntactically correct and logically plausible but based on assumptions about table structures, metric definitions, and business logic that it had no way to verify.
This is the problem every organisation will hit when they move from "let's try ChatGPT on our data" to "let's deploy agents in production." The model is powerful. The data is there. The missing layer is the one that tells the agent what things mean. (We covered this in depth in our post on semantic layers.)
What we built for Sanderson Design Group
Sanderson Design Group had a concrete problem: their commercial team needed answers from complex BigQuery datasets, and every question required an analyst to write SQL. The bottleneck wasn't data - it was access.
We built a GenAI data assistant that lets non-technical staff query their data in natural language. Ask a question, get an answer. No SQL required.
But the interesting part wasn't the natural language interface. It was what we had to build around it:
Schema documentation. The agent needs to know what's in each table, what each field means, and how tables relate to each other. This isn't optional metadata - it's the difference between a useful answer and a plausible hallucination.
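As a rough sketch - assuming a hypothetical `orders` table, with illustrative field and join names rather than Sanderson's actual schema - that documentation can live as structured metadata that gets rendered into the agent's context:

```python
# Hypothetical sketch: schema documentation as structured metadata.
# Table, field, and join names are illustrative, not a real client schema.
SCHEMA_DOCS = {
    "orders": {
        "description": "One row per confirmed customer order.",
        "fields": {
            "order_id": "Unique order identifier (primary key).",
            "customer_id": "Joins to customers.customer_id.",
            "order_value_gbp": "Order total in GBP, excluding VAT.",
            "order_date": "Date the order was confirmed, not shipped.",
        },
        "joins": ["customers ON orders.customer_id = customers.customer_id"],
    },
}

def schema_context(table: str) -> str:
    """Render the documentation for one table into prompt-ready text."""
    doc = SCHEMA_DOCS[table]
    lines = [f"Table `{table}`: {doc['description']}"]
    lines += [f"- {name}: {desc}" for name, desc in doc["fields"].items()]
    lines += [f"- join: {j}" for j in doc["joins"]]
    return "\n".join(lines)
```

The point is less the format than the habit: if a human analyst would need the field description to write correct SQL, the agent needs it too.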
Query validation. Before executing generated SQL, the system checks it against known patterns. Does the WHERE clause make sense for this metric? Is the date range reasonable? Are the joins correct?
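A minimal version of that pre-execution check might look like the following - the table allow-list, patterns, and year bounds are illustrative assumptions, not the production rules:

```python
import re
from datetime import date

KNOWN_TABLES = {"orders", "customers"}  # illustrative allow-list

def validate_sql(sql: str) -> list[str]:
    """Return a list of problems with generated SQL; empty means it may run."""
    problems = []
    # Only read-only queries are ever executed.
    if not re.match(r"\s*SELECT\b", sql, re.IGNORECASE):
        problems.append("only SELECT statements are allowed")
    # Every referenced table must be documented and allow-listed.
    for table in re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", sql, re.IGNORECASE):
        if table.lower() not in KNOWN_TABLES:
            problems.append(f"unknown table: {table}")
    # Sanity-check date literals: years outside the data's range are a red flag.
    for year in re.findall(r"'(\d{4})-\d{2}-\d{2}'", sql):
        if not 2015 <= int(year) <= date.today().year:
            problems.append(f"implausible year in date literal: {year}")
    return problems
```

In practice you would validate against a proper SQL parser rather than regexes, but the shape is the same: generated SQL is guilty until proven innocent.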
Governed definitions. When the agent calculates "conversion rate," it uses the same definition the dashboards use. Not its own interpretation. This is where SEAM comes in - encoding metric definitions as infrastructure so every agent resolves them consistently.
Audit trails. Every query the agent runs is logged: what was asked, what SQL was generated, what was returned. If a number looks wrong, you can trace exactly how it was produced.
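The trail itself can be as simple as one JSON line per query - the field names here are illustrative, not the production record format:

```python
import json
import time

def log_agent_query(path: str, question: str, generated_sql: str,
                    result_rows: int) -> None:
    """Append one audit record per agent query as a JSON line.

    Illustrative sketch: every step from question to result is traceable.
    """
    record = {
        "ts": time.time(),
        "question": question,
        "generated_sql": generated_sql,
        "result_rows": result_rows,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```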
The result: the commercial team gets answers in seconds instead of days, and the analysts are freed up for higher-value work. Call volumes dropped 20%, because the friction points the data surfaced could finally be acted on quickly.

The governance lesson
The pattern we see across every agent deployment is the same:
- Weeks 1-2: excitement. The agent works. People ask questions. Answers appear.
- Weeks 3-4: doubt. Someone gets a number that doesn't match their dashboard. Someone else gets a different answer to the same question phrased differently. Trust erodes.
- Week 5+: either you solve governance or you abandon the agent.
Most organisations hit that third stage and retreat to dashboards. The ones that push through are the ones that invest in the definitions layer - the governed, machine-readable business logic that sits between the agent and the data.
This is why we built SEAM. Not because agents are hard to build, but because agents are hard to trust. SEAM mediates every tool call - resolving metrics against governed definitions, validating outputs against golden questions, and maintaining a full audit trail. The agent doesn't get to make assumptions about what "revenue" means. The definition is in YAML, it's version-controlled, and it's the same everywhere.
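To make that concrete, here is a hedged sketch of what resolving a governed metric could look like. The metric name and SQL expressions are illustrative, shown as the Python dict that parsing the YAML would produce - this is not SEAM's actual schema:

```python
# Illustrative sketch of a governed metric definition, as the dict
# that parsing a version-controlled YAML file would yield.
METRICS = {
    "conversion_rate": {
        "description": "Orders divided by sessions, as used on the dashboards.",
        "numerator": "COUNT(DISTINCT order_id)",
        "denominator": "COUNT(DISTINCT session_id)",
    },
}

def resolve_metric(name: str) -> str:
    """Return the one governed SQL expression for a metric.

    The agent never invents its own definition; an unknown metric is an
    error, not an invitation to guess.
    """
    if name not in METRICS:
        raise KeyError(f"no governed definition for metric: {name}")
    m = METRICS[name]
    return f"SAFE_DIVIDE({m['numerator']}, {m['denominator']})"
```

Because every agent resolves "conversion rate" through the same function against the same file, the dashboard number and the agent's number can only be the same number.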
What we use internally now
Our internal tooling has evolved significantly since those early experiments. We've embedded MCP (Model Context Protocol) tools directly into our delivery process. Today, our agents:
- Generate technical documentation from GTM container exports and BigQuery schemas. What used to take a consultant two hours takes the agent two minutes. The consultant reviews and refines rather than writing from scratch.
- Run QA checks against implementation specifications. The agent compares what was specified with what was implemented and flags discrepancies.
- Monitor data pipelines and alert on anomalies. For EDF, we deployed BigQuery ML ARIMA models that run hourly and push alerts to Slack when data patterns break.
- Prepare client reports by pulling data, generating narrative summaries, and formatting outputs. The consultant adds judgement and context. The agent handles the assembly.
The throughput gain is real. But the bigger win is consistency - agents don't forget steps, don't skip checks, and don't have off days.
Five things we'd tell you before you start
1. Start with a single, well-scoped use case. "Put AI on our data" is not a use case. "Let the sales team ask questions about pipeline without waiting for an analyst" is. The narrower the scope, the faster you'll learn what governance you need.
2. Your data documentation is the bottleneck. If your tables don't have descriptions, your fields aren't documented, and your metric definitions live in someone's head, the agent will hallucinate. Fix the documentation first. The agent deployment will follow naturally.
3. Don't skip the golden questions. Before deploying any agent, define 20-30 questions where you know the correct answer. Run the agent against them. If it can't pass this test consistently, it's not ready for production.
4. Log everything. Every query, every response, every piece of generated SQL. You will need the audit trail. Not just for governance - for debugging, for improvement, and for building trust with stakeholders who are (rightly) sceptical.
5. Governance is not a phase - it's a layer. You don't "do governance" during implementation and then move on. It's infrastructure that runs continuously, evolves with your definitions, and mediates every interaction. Build it as a layer, not a checklist.
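Point 3 can be made concrete with a small harness. Everything here is a placeholder - `ask_agent` stands in for whatever callable your agent exposes, and the golden questions and answers are invented figures, not real data:

```python
# Minimal golden-question harness (illustrative). The questions and
# expected answers below are placeholders, not real figures.
GOLDEN = [
    ("What was total revenue in March 2024?", 182_450.0),
    ("How many active customers do we have?", 9_312),
]

def run_golden_questions(ask_agent, tolerance: float = 0.001) -> float:
    """Run the agent over the golden set and return the pass rate.

    An answer passes if it is within `tolerance` (relative) of the
    known-correct value.
    """
    passed = 0
    for question, expected in GOLDEN:
        answer = ask_agent(question)
        if abs(answer - expected) <= abs(expected) * tolerance:
            passed += 1
    return passed / len(GOLDEN)
```

Run this on every change to prompts, schemas, or definitions, and gate deployment on the pass rate - it is the agent equivalent of a regression test suite.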
Where this is going
The organisations that will get the most from AI agents are the ones that treat governance as a first-class concern - not a constraint on innovation, but the thing that makes innovation trustworthy.
We're seeing this play out already. Clients who invested in governed data platforms (BigQuery + Dataform + semantic layer) are deploying agents in weeks. Clients who jumped straight to agents without that foundation are hitting the trust wall and retreating.
The path is: trusted data → governed definitions → reliable agents → organisational intelligence. Skip a step and you'll end up back at the beginning.
If you're thinking about deploying AI agents on your data, our agent readiness assessment is a good place to start. And if you want to understand how SEAM governs the agent layer specifically, see the product page.
You can also see our AI journey on the about page for the full timeline, or join Swarm - the community where practitioners are figuring out AI governance together.