BigQuery AI.GENERATE tutorial: turn SQL queries into AI-powered insights

BigQuery just got a major upgrade, you can now plug directly into Vertex AI using the new AI.GENERATE function. Translation: your analytics data and generative AI are now best friends, and they’re hanging out right inside SQL.
That opens up a whole world of new analysis options for GA4 data, but it also raises some questions:
- How do you actually set it up?
- What’s it good for (and when should you avoid it)?
- Why would you batch the query?
Let’s walk through it step by step.
Step 1: How to Set Up AI.GENERATE in BigQuery (Vertex AI connection)
Before you can start prompting Gemini from SQL, you’ll need to connect BigQuery ↔ Vertex AI. The process is a little fiddly, but once it’s done you’re good to go.
- Enable APIs: BigQuery Connection API + Vertex AI API.
- Create a Cloud Resource Connection in the GCP console (look for “Vertex AI” → “BigQuery Federation”).
- Go to + add data
- Search for vertex ai
- Click BigQuery federation
- Choose Vertex AI remote models, remote functions and BigLake
- Give it a connection_id (e.g. vertex_gemini_bigquery_link) and make sure the region matches your dataset.
- Copy the service account email from the new external connection.
- Grant IAM roles: add the Vertex AI User role to that service account.
Picking the right endpoint in your code: check which Gemini endpoints are available in your region (e.g. gemini-1.5-pro-002 in europe-west2). https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations#europe
Step 2: Best use cases for AI.GENERATE in BigQuery (and what to avoid)
Think of it like this: SQL is your calculator, AI.GENERATE is your interpreter.
✅ Great for:
- Insights: “What does this referrer + landing page say about intent?”
- Classification: e.g. group queries into high vs low purchase intent.
- Summarisation: turn 1000 rows into 5 takeaways.
- Unstructured parameters: analyse feedback, search queries, chat logs.
❌ Not great for:
- Factual lookups (populations, exchange rates).
- Crunching metrics (conversion, bounce rate).
AI isn’t faster or more accurate than SQL for maths. Use it for context, not for sums.
Step 3: AI.GENERATE example in BigQuery (referrer + landing page analysis)
Say you want to quickly understand why users are landing on certain pages from different sources. We could ask AI to make its best guesstimate!
WITH session_referrers AS (
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') AS ld,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS pt
FROM `myproject.analytics_123456789.events_*`
WHERE _table_suffix BETWEEN '20240101' AND '20250101'
AND event_name = 'session_start'
AND (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') IS NOT NULL
)
SELECT
ld,
pt,
AI.GENERATE(
CONCAT(
'You are a digital marketing analyst. ',
'I will give you two pieces of information about a new session on a website: ',
'1) The referring domain where the visitor came from: ', ld, '. ',
'2) The landing page the visitor viewed: ', pt, '. ',
'Based on these two fields, provide a short marketing insight. ',
'Explain what this might suggest about the visitors intent or acquisition opportunity. ',
'Keep the answer to 2 sentences, focused on actionable insight.'
),
connection_id => 'projects/myproject/locations/europe-west2/connections/__default_cloudresource_connection__',
endpoint => 'gemini-1.5-pro-002'
).result AS insight
FROM session_referrers;
Code language: PHP (php)
Output might look like:
- “Visitors from LinkedIn landing on the Careers page are likely job seekers. Emphasise employer branding.”
- “Users from Google landing on Pricing show strong intent. Ensure CTAs are clear.”
Step 4: Batch queries in AI.GENERATE for faster performance
Calling AI row by row = slow and clunky.
Example: 1,500 rows = 1,500 separate model calls. You’ll be waiting a while.
Better way:
- Use STRING_AGG to group ~10 rows per prompt.
- Ask Gemini to return insights as a bulleted list.
Now 1,500 rows = 150 calls. That’s not cheaper (same total data processed), but it’s much faster and easier to work with.
So the above code could be written into batches like this:
-- 1) Pull referrer + landing page pairs from GA4 session_start events
WITH session_referrers AS (
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') AS ld,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS pt
FROM `myproject.analytics_123456789.events_*`
WHERE _table_suffix BETWEEN '20240101' AND '20250901'
AND event_name = 'session_start'
AND (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') IS NOT NULL
),
-- 2) Give each row a number so we can batch
numbered AS (
SELECT
ld,
pt,
ROW_NUMBER() OVER () AS rn
FROM session_referrers
WHERE ld IS NOT NULL
),
-- 3) Group into batches of 10 rows and build a prompt-friendly text block
batched AS (
SELECT
DIV(rn, 10) AS batch_id,
STRING_AGG(CONCAT('- Domain: ', ld, ', Landing page: ', pt), '\n') AS batch_text
FROM numbered
GROUP BY batch_id
)
-- 4) Call AI.GENERATE once per batch to reduce total model calls (faster; not cheaper)
SELECT
batch_id,
AI.GENERATE(
CONCAT(
'You are a digital marketing analyst. ',
'I will give you several referrer domain + landing page pairs. ',
'For each pair, provide 2 sentences of actionable insight. ',
'Format your output as a bulleted list. ',
'\n\nHere are the pairs:\n',
batch_text
),
connection_id => 'projects/myproject/locations/europe-west2/connections/__default_cloudresource_connection__',
endpoint => 'gemini-1.5-pro-002'
).result AS insights
FROM batched
ORDER BY batch_id;
Code language: PHP (php)
Step 5: Advanced AI.GENERATE examples in BigQuery (sentiment, classification, summaries)
GA4’s event and item parameters are a goldmine for unstructured data. With AI.GENERATE you can:
- Classify search queries as navigational / informational / transactional.
- Run sentiment analysis on feedback or chat logs.
- Summarise product descriptions into categories or value props.
Example: feedback sentiment analysis:
WITH feedback_events AS (
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'feedback') AS feedback_text
FROM `myproject.analytics_123456789.events_*`
WHERE event_name = 'feedback_submitted'
)
SELECT
feedback_text,
AI.GENERATE(
CONCAT(
'Analyse the sentiment of this customer feedback: "', feedback_text, '". ',
'Return only "Positive", "Neutral", or "Negative".'
),
connection_id => 'projects/myproject/locations/europe-west2/connections/__default_cloudresource_connection__',
endpoint => 'gemini-1.5-pro-002'
).result AS sentiment
FROM feedback_events
WHERE feedback_text IS NOT NULL
Code language: PHP (php)
Final thoughts
AI.GENERATE isn’t replacing SQL, it’s layering on top of it.
- Use SQL for facts.
- Use AI.GENERATE for interpretation.
- Combine them (smartly, with batching) and GA4 data turns into actionable insights.
If you’ve been meaning to try generative AI in a real analytics workflow, BigQuery just made it dead simple. You can write whatever prompt you’d like, the only limitation is your imagination.
For full details, check the BigQuery AI.GENERATE documentation.
Check out our other blog on AI.FORECAST
👉 The possibilities are endless, what’s the first BigQuery use case you’d try with AI.GENERATE?
Katie
Subscribe to our newsletter:
Further reading

How to start forecasting in BigQuery with zero training

Easy ways to prepare your BigQuery warehouse for AI
