
BigQuery just got a major upgrade: you can now plug directly into Vertex AI using the new AI.GENERATE function. Translation: your analytics data and generative AI are now best friends, and they’re hanging out right inside SQL.
That opens up a whole world of new analysis options for GA4 data, but it also raises some questions: how do you set it up, what is it actually good for, and how do you keep it fast?
Let’s walk through it step by step.
Before you can start prompting Gemini from SQL, you’ll need to connect BigQuery ↔ Vertex AI. The process is a little fiddly, but once it’s done you’re good to go.
Picking the right endpoint in your code: check which Gemini endpoints are available in your region (e.g. gemini-1.5-pro-002 in europe-west2). https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations#europe
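Once the connection exists, a one-row smoke test is a quick way to confirm the connection and endpoint are wired up correctly before pointing the function at real data. (The project, location, and connection path below are the same placeholders used in the queries later in this post; swap in your own.)

```sql
-- Minimal smoke test: one prompt, no table scan.
-- If this returns text, the BigQuery <-> Vertex AI connection works.
SELECT
  AI.GENERATE(
    'Reply with the single word OK.',
    connection_id => 'projects/myproject/locations/europe-west2/connections/__default_cloudresource_connection__',
    endpoint => 'gemini-1.5-pro-002'
  ).result AS reply;
```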
Think of it like this: SQL is your calculator, AI.GENERATE is your interpreter.
✅ Great for: interpreting text, classifying unstructured fields, and adding context to rows.
❌ Not great for: arithmetic, aggregations, or anything SQL already does deterministically.
AI isn’t faster or more accurate than SQL for maths. Use it for context, not for sums.
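To make that split concrete, here’s a sketch (reusing the placeholder project, dataset, and connection from the queries later in this post) where SQL does the deterministic counting and AI.GENERATE only adds interpretation on top:

```sql
-- SQL does the maths: sessions per referring domain (fast, exact)
WITH counts AS (
  SELECT
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') AS ld,
    COUNT(*) AS sessions
  FROM `myproject.analytics_123456789.events_*`
  WHERE event_name = 'session_start'
  GROUP BY ld
)
-- AI does the interpretation: one sentence of context per domain
SELECT
  ld,
  sessions,
  AI.GENERATE(
    CONCAT('In one sentence, describe what kind of traffic the domain ', ld, ' typically sends to a website.'),
    connection_id => 'projects/myproject/locations/europe-west2/connections/__default_cloudresource_connection__',
    endpoint => 'gemini-1.5-pro-002'
  ).result AS context
FROM counts
WHERE ld IS NOT NULL;
```

Note the model never sees raw event rows, only the already-aggregated output, which keeps the calls few and the sums exact.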
Say you want to quickly understand why users are landing on certain pages from different sources. We could ask AI to make its best guesstimate!
WITH session_referrers AS (
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') AS ld,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS pt
FROM `myproject.analytics_123456789.events_*`
WHERE _table_suffix BETWEEN '20240101' AND '20250101'
AND event_name = 'session_start'
AND (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') IS NOT NULL
)
SELECT
ld,
pt,
AI.GENERATE(
CONCAT(
'You are a digital marketing analyst. ',
'I will give you two pieces of information about a new session on a website: ',
'1) The referring domain where the visitor came from: ', ld, '. ',
'2) The landing page the visitor viewed: ', pt, '. ',
'Based on these two fields, provide a short marketing insight. ',
'Explain what this might suggest about the visitor\'s intent or acquisition opportunity. ',
'Keep the answer to 2 sentences, focused on actionable insight.'
),
connection_id => 'projects/myproject/locations/europe-west2/connections/__default_cloudresource_connection__',
endpoint => 'gemini-1.5-pro-002'
).result AS insight
FROM session_referrers;

Calling AI row by row = slow and clunky.
Example: 1,500 rows = 1,500 separate model calls. You’ll be waiting a while.
Better way: batch your rows so each call handles several at once, e.g. 10 rows per prompt.
Now 1,500 rows = 150 calls. That’s not cheaper (the same total data is processed), but it’s much faster and easier to work with.
So the above query could be rewritten to run in batches like this:
-- 1) Pull referrer + landing page pairs from GA4 session_start events
WITH session_referrers AS (
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') AS ld,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS pt
FROM `myproject.analytics_123456789.events_*`
WHERE _table_suffix BETWEEN '20240101' AND '20250901'
AND event_name = 'session_start'
AND (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') IS NOT NULL
),
-- 2) Give each row a number so we can batch
numbered AS (
SELECT
ld,
pt,
ROW_NUMBER() OVER () AS rn
FROM session_referrers
WHERE ld IS NOT NULL
),
-- 3) Group into batches of 10 rows and build a prompt-friendly text block
batched AS (
SELECT
DIV(rn - 1, 10) AS batch_id,
STRING_AGG(CONCAT('- Domain: ', ld, ', Landing page: ', pt), '\n') AS batch_text
FROM numbered
GROUP BY batch_id
)
-- 4) Call AI.GENERATE once per batch to reduce total model calls (faster; not cheaper)
SELECT
batch_id,
AI.GENERATE(
CONCAT(
'You are a digital marketing analyst. ',
'I will give you several referrer domain + landing page pairs. ',
'For each pair, provide 2 sentences of actionable insight. ',
'Format your output as a bulleted list. ',
'\n\nHere are the pairs:\n',
batch_text
),
connection_id => 'projects/myproject/locations/europe-west2/connections/__default_cloudresource_connection__',
endpoint => 'gemini-1.5-pro-002'
).result AS insights
FROM batched
ORDER BY batch_id;

GA4’s event and item parameters are a goldmine of unstructured data. With AI.GENERATE you can classify it, summarise it, or score its sentiment without leaving SQL.
Example: feedback sentiment analysis:
WITH feedback_events AS (
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'feedback') AS feedback_text
FROM `myproject.analytics_123456789.events_*`
WHERE event_name = 'feedback_submitted'
)
SELECT
feedback_text,
AI.GENERATE(
CONCAT(
'Analyse the sentiment of this customer feedback: "', feedback_text, '". ',
'Return only "Positive", "Neutral", or "Negative".'
),
connection_id => 'projects/myproject/locations/europe-west2/connections/__default_cloudresource_connection__',
endpoint => 'gemini-1.5-pro-002'
).result AS sentiment
FROM feedback_events
WHERE feedback_text IS NOT NULL;

AI.GENERATE isn’t replacing SQL; it’s layering on top of it.
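To make that layering concrete, the sentiment labels the model produces can be fed straight back into ordinary SQL, e.g. to count feedback by sentiment (same placeholder project, dataset, and connection as above):

```sql
-- AI labels each row, then plain SQL aggregates the labels
WITH labelled AS (
  SELECT
    AI.GENERATE(
      CONCAT(
        'Analyse the sentiment of this customer feedback: "',
        (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'feedback'),
        '". Return only "Positive", "Neutral", or "Negative".'
      ),
      connection_id => 'projects/myproject/locations/europe-west2/connections/__default_cloudresource_connection__',
      endpoint => 'gemini-1.5-pro-002'
    ).result AS sentiment
  FROM `myproject.analytics_123456789.events_*`
  WHERE event_name = 'feedback_submitted'
    AND (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'feedback') IS NOT NULL
)
SELECT
  sentiment,
  COUNT(*) AS feedback_count
FROM labelled
GROUP BY sentiment
ORDER BY feedback_count DESC;
```

The model handles the fuzzy part (reading free text); GROUP BY and COUNT do what they have always done.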
If you’ve been meaning to try generative AI in a real analytics workflow, BigQuery just made it dead simple. You can write whatever prompt you’d like; the only limitation is your imagination.
For full details, check the BigQuery AI.GENERATE documentation.
Check out our other blog post on AI.FORECAST.
👉 The possibilities are endless. What’s the first BigQuery use case you’d try with AI.GENERATE?