
BigQuery AI.GENERATE tutorial: turn SQL queries into AI-powered insights

Katie Kaczmarek · 3 September 2025 · 5 min read

BigQuery just got a major upgrade: you can now plug directly into Vertex AI using the new AI.GENERATE function. Translation: your analytics data and generative AI are now best friends, and they’re hanging out right inside SQL.

That opens up a whole world of new analysis options for GA4 data, but it also raises some questions:

  • How do you actually set it up?
  • What’s it good for (and when should you avoid it)?
  • Why would you batch the query?

Let’s walk through it step by step.

Step 1: How to Set Up AI.GENERATE in BigQuery (Vertex AI connection)

Before you can start prompting Gemini from SQL, you’ll need to connect BigQuery ↔ Vertex AI. The process is a little fiddly, but once it’s done you’re good to go.

  • Enable APIs: BigQuery Connection API + Vertex AI API.
  • Create a Cloud Resource Connection in the GCP console (look for “Vertex AI” → “BigQuery Federation”).
    • Go to + Add data
    • Search for “Vertex AI”
    • Click BigQuery federation
    • Choose Vertex AI remote models, remote functions and BigLake
    • Give it a connection_id (e.g. vertex_gemini_bigquery_link) and make sure the region matches your dataset.
  • Copy the service account email from the new external connection.
  • Grant IAM roles: add the Vertex AI User role to that service account.
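
If you prefer the command line, the same setup can be sketched with the bq and gcloud CLIs. Treat this as a sketch: the project, region, and connection name are placeholders, and the cloud commands are commented out so you can review them before running anything against your project.

```shell
# Placeholder values - swap in your own project, region and connection name.
PROJECT_ID="myproject"
LOCATION="europe-west2"
CONNECTION_ID="vertex_gemini_bigquery_link"

# 1) Enable the required APIs (uncomment to run):
# gcloud services enable bigqueryconnection.googleapis.com aiplatform.googleapis.com \
#   --project="$PROJECT_ID"

# 2) Create the Cloud Resource connection (uncomment to run):
# bq mk --connection --location="$LOCATION" --project_id="$PROJECT_ID" \
#   --connection_type=CLOUD_RESOURCE "$CONNECTION_ID"

# 3) Find the connection's service account, then grant it the Vertex AI User role:
# bq show --connection "$PROJECT_ID.$LOCATION.$CONNECTION_ID"
# gcloud projects add-iam-policy-binding "$PROJECT_ID" \
#   --member="serviceAccount:<service-account-email-from-step-3>" \
#   --role="roles/aiplatform.user"

# This is the full connection path you'll pass to AI.GENERATE later:
echo "projects/${PROJECT_ID}/locations/${LOCATION}/connections/${CONNECTION_ID}"
```

The echoed path is the value AI.GENERATE expects in its connection_id argument.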

Picking the right endpoint in your code: check which Gemini endpoints are available in your region (e.g. gemini-1.5-pro-002 in europe-west2). https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations#europe


Step 2: Best use cases for AI.GENERATE in BigQuery (and what to avoid)

Think of it like this: SQL is your calculator, AI.GENERATE is your interpreter.

✅ Great for:

  • Insights: “What does this referrer + landing page say about intent?”
  • Classification: e.g. group queries into high vs low purchase intent.
  • Summarisation: turn 1000 rows into 5 takeaways.
  • Unstructured parameters: analyse feedback, search queries, chat logs.

❌ Not great for:

  • Factual lookups (populations, exchange rates).
  • Crunching metrics (conversion, bounce rate).

AI isn’t faster or more accurate than SQL for maths. Use it for context, not for sums.


Step 3: AI.GENERATE example in BigQuery (referrer + landing page analysis)

Say you want to quickly understand why users are landing on certain pages from different sources. We could ask AI to make its best guesstimate!

WITH session_referrers AS (
  SELECT
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') AS ld,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS pt
  FROM `myproject.analytics_123456789.events_*`
  WHERE _table_suffix BETWEEN '20240101' AND '20250101'
    AND event_name = 'session_start'
    AND (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') IS NOT NULL
)
SELECT
  ld,
  pt,
  AI.GENERATE(
    CONCAT(
      'You are a digital marketing analyst. ',
      'I will give you two pieces of information about a new session on a website: ',
      '1) The referring domain where the visitor came from: ', ld, '. ',
      '2) The landing page the visitor viewed: ', pt, '. ',
      'Based on these two fields, provide a short marketing insight. ',
      'Explain what this might suggest about the visitor''s intent or acquisition opportunity. ',
      'Keep the answer to 2 sentences, focused on actionable insight.'
    ),
    connection_id => 'projects/myproject/locations/europe-west2/connections/__default_cloudresource_connection__',
    endpoint => 'gemini-1.5-pro-002'
  ).result AS insight
FROM session_referrers;

Example output:

  • “Visitors from LinkedIn landing on the Careers page are likely job seekers. Emphasise employer branding.”
  • “Users from Google landing on Pricing show strong intent. Ensure CTAs are clear.”

Step 4: Batch queries in AI.GENERATE for faster performance

Calling AI row by row = slow and clunky.

Example: 1,500 rows = 1,500 separate model calls. You’ll be waiting a while.

Better way:

  • Use STRING_AGG to group ~10 rows per prompt.
  • Ask Gemini to return insights as a bulleted list.

Now 1,500 rows = 150 calls. That’s not cheaper (same total data processed), but it’s much faster and easier to work with.

So the above code could be written into batches like this:

-- 1) Pull referrer + landing page pairs from GA4 session_start events
WITH session_referrers AS (
  SELECT
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') AS ld,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS pt
  FROM `myproject.analytics_123456789.events_*`
  WHERE _table_suffix BETWEEN '20240101' AND '20250901'
    AND event_name = 'session_start'
    AND (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') IS NOT NULL
),

-- 2) Give each row a number so we can batch
numbered AS (
  SELECT
    ld,
    pt,
    ROW_NUMBER() OVER () AS rn
  FROM session_referrers
  WHERE ld IS NOT NULL
),

-- 3) Group into batches of 10 rows and build a prompt-friendly text block
batched AS (
  SELECT
    DIV(rn - 1, 10) AS batch_id,  -- rn starts at 1, so subtract 1 to get exact batches of 10
    STRING_AGG(CONCAT('- Domain: ', ld, ', Landing page: ', pt), '\n') AS batch_text
  FROM numbered
  GROUP BY batch_id
)

-- 4) Call AI.GENERATE once per batch to reduce total model calls (faster; not cheaper)
SELECT
  batch_id,
  AI.GENERATE(
    CONCAT(
      'You are a digital marketing analyst. ',
      'I will give you several referrer domain + landing page pairs. ',
      'For each pair, provide 2 sentences of actionable insight. ',
      'Format your output as a bulleted list. ',
      '\n\nHere are the pairs:\n',
      batch_text
    ),
    connection_id => 'projects/myproject/locations/europe-west2/connections/__default_cloudresource_connection__',
    endpoint => 'gemini-1.5-pro-002'
  ).result AS insights
FROM batched
ORDER BY batch_id;

Step 5: Advanced AI.GENERATE examples in BigQuery (sentiment, classification, summaries)

GA4’s event and item parameters are a goldmine for unstructured data. With AI.GENERATE you can:

  • Classify search queries as navigational / informational / transactional.
  • Run sentiment analysis on feedback or chat logs.
  • Summarise product descriptions into categories or value props.
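
For instance, the first bullet might look like the sketch below. It reuses the same connection and endpoint as the earlier examples, and assumes GA4's view_search_results event with its search_term parameter (adjust to whatever your setup actually logs):

```sql
-- Classify on-site search queries by intent (sketch).
WITH search_events AS (
  SELECT
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'search_term') AS query
  FROM `myproject.analytics_123456789.events_*`
  WHERE event_name = 'view_search_results'
)
SELECT
  query,
  AI.GENERATE(
    CONCAT(
      'Classify this site search query as exactly one of: ',
      '"navigational", "informational", or "transactional". ',
      'Return only the label. Query: "', query, '"'
    ),
    connection_id => 'projects/myproject/locations/europe-west2/connections/__default_cloudresource_connection__',
    endpoint => 'gemini-1.5-pro-002'
  ).result AS intent
FROM search_events
WHERE query IS NOT NULL;
```

Asking for “only the label” keeps the output easy to group and count in a follow-up query.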

Example: feedback sentiment analysis:

WITH feedback_events AS (
  SELECT
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'feedback') AS feedback_text
  FROM `myproject.analytics_123456789.events_*`
  WHERE event_name = 'feedback_submitted'
)
SELECT
  feedback_text,
  AI.GENERATE(
    CONCAT(
      'Analyse the sentiment of this customer feedback: "', feedback_text, '". ',
      'Return only "Positive", "Neutral", or "Negative".'
    ),
    connection_id => 'projects/myproject/locations/europe-west2/connections/__default_cloudresource_connection__',
    endpoint => 'gemini-1.5-pro-002'
  ).result AS sentiment
FROM feedback_events
WHERE feedback_text IS NOT NULL;

Final thoughts

AI.GENERATE isn’t replacing SQL; it’s layering on top of it.

  • Use SQL for facts.
  • Use AI.GENERATE for interpretation.
  • Combine them (smartly, with batching) and GA4 data turns into actionable insights.

If you’ve been meaning to try generative AI in a real analytics workflow, BigQuery just made it dead simple. You can write whatever prompt you’d like; the only limitation is your imagination.

For full details, check the BigQuery AI.GENERATE documentation.

Check out our other blog on AI.FORECAST.


👉 The possibilities are endless; what’s the first BigQuery use case you’d try with AI.GENERATE?

