
Automating BigQuery workflows with conditional logic and data readiness checks

Katie Kaczmarek · 20 February 2025 · 4 min read

BigQuery offers powerful scripting capabilities that can simplify complex workflows. In this post, we’ll explore two essential techniques: using IF...THEN...END IF for procedural control and leveraging the metadata in project.dataset.__TABLES__ to ensure your Google Analytics 4 (GA4) data is ready before running queries. These strategies help you avoid unnecessary processing, reduce costs, and improve efficiency in your data pipelines.


1. Leveraging IF...THEN...END IF for procedural logic

While many SQL developers rely on the CASE WHEN expression for conditional logic within a SELECT statement, BigQuery’s scripting environment allows for more dynamic control with IF...THEN...END IF. This control-flow statement is especially useful in stored procedures and scripts where you need to:

  • Define and use variables
  • Dynamically set values
  • Control the execution flow based on complex conditions

Example: Running code on weekdays

The following example demonstrates how to execute a block of code only on weekdays:

DECLARE current_day STRING;

BEGIN
  -- Get today's day of the week
  SET current_day = FORMAT_TIMESTAMP('%A', CURRENT_TIMESTAMP());

  -- Only run the query if it's a weekday
  IF current_day NOT IN ('Saturday', 'Sunday') THEN
    SELECT 'Running weekday report...' AS status;
  ELSE
    SELECT 'Skipping execution. It is the weekend.' AS status;
  END IF;
END;

Benefits of Using IF...THEN...END IF:

  • Procedural Logic: Unlike CASE WHEN, this syntax allows for multi-step procedural operations.
  • Dynamic Execution: It lets you perform different actions based on variable conditions.
  • Improved Control: Ideal for scenarios such as dynamic table updates, looping constructs, or exception handling (a short sketch follows this list).
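
To illustrate the looping and exception-handling point above, here is a minimal sketch using BigQuery's WHILE loop and BEGIN...EXCEPTION blocks; the table name is a hypothetical placeholder, not something from the example above.

DECLARE retry_count INT64 DEFAULT 0;

-- Retry the query up to three times before giving up
WHILE retry_count < 3 DO
  BEGIN
    -- Hypothetical table name; swap in your own
    SELECT COUNT(*) AS row_count FROM `project.dataset.some_table`;
    BREAK;  -- the query succeeded, so leave the loop
  EXCEPTION WHEN ERROR THEN
    SET retry_count = retry_count + 1;
    SELECT CONCAT('Attempt ', CAST(retry_count AS STRING), ' failed: ', @@error.message) AS status;
  END;
END WHILE;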

2. Ensuring GA4 data readiness with modified-time checks

When working with GA4 exports in BigQuery, timing is critical. GA4 tables (like events_YYYYMMDD) may not arrive at a predictable time each day, and rerunning the same transformation repeatedly is not only inefficient but can also lead to duplicate processing.

The challenge

  • Uncertain Arrival Times: GA4 data can be delayed, so a table might not be complete when your query runs.
  • Avoiding Redundancy: You want to process data only once per day, preventing unnecessary cost and load.

The Solution: Querying table metadata

BigQuery provides a system view (project.dataset.__TABLES__) that contains metadata for each table, including the last_modified_time. By querying this view, you can determine whether a table has been updated recently and decide if your processing should run.
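
As a quick illustration, here is what a raw metadata query against this view might look like (the project and dataset names are placeholders):

SELECT
  table_id,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified,
  row_count,
  size_bytes
FROM `project.dataset.__TABLES__`
ORDER BY last_modified_time DESC;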

Step-by-Step implementation

Step 1: Check if the Destination Table Was Updated Today

Before running your main query, check if the destination table has already been refreshed:

DECLARE alreadyRanToday BOOL;

SET alreadyRanToday = (
  SELECT COUNT(*) > 0
  FROM `project.dataset.__TABLES__`
  WHERE table_id = 'table_you_are_creating'
    AND DATE(TIMESTAMP_MILLIS(last_modified_time)) = CURRENT_DATE()
);

  • Purpose: This condition prevents the query from running multiple times in one day.

Step 2: Verify the GA4 table's readiness

Next, ensure that yesterday’s GA4 table exists and that the daily export has finished writing to it. This step protects against processing incomplete or missing data.

DECLARE ga4TableExist BOOL;

SET ga4TableExist = (
  SELECT COUNT(*) > 0
  FROM `project.dataset.__TABLES__`
  WHERE table_id = CONCAT('events_', FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)))
    AND TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), TIMESTAMP_MILLIS(last_modified_time), MINUTE) > 10
);

  • Explanation: The check passes only if the table hasn’t been modified in the last 10 minutes, which safeguards against querying a table the export is still writing to.

Step 3: Execute the main query conditionally

Finally, combine these checks to run your query only when both conditions are met:

IF NOT alreadyRanToday THEN
  IF ga4TableExist THEN
    SELECT *
    FROM `project.dataset.events_*`
    WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY));
  END IF;
END IF;
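
Putting the three steps together, the whole script might look something like the sketch below. Note that it writes the result with CREATE OR REPLACE TABLE, which is an assumption on our part, but it is what makes the Step 1 check meaningful on subsequent runs; all project, dataset and table names are placeholders.

DECLARE alreadyRanToday BOOL;
DECLARE ga4TableExist BOOL;

-- Step 1: has the destination table already been refreshed today?
SET alreadyRanToday = (
  SELECT COUNT(*) > 0
  FROM `project.dataset.__TABLES__`
  WHERE table_id = 'table_you_are_creating'
    AND DATE(TIMESTAMP_MILLIS(last_modified_time)) = CURRENT_DATE()
);

-- Step 2: does yesterday's GA4 table exist, with no writes in the last 10 minutes?
SET ga4TableExist = (
  SELECT COUNT(*) > 0
  FROM `project.dataset.__TABLES__`
  WHERE table_id = CONCAT('events_', FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)))
    AND TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), TIMESTAMP_MILLIS(last_modified_time), MINUTE) > 10
);

-- Step 3: only (re)build the destination table when both conditions hold
IF NOT alreadyRanToday AND ga4TableExist THEN
  CREATE OR REPLACE TABLE `project.dataset.table_you_are_creating` AS
  SELECT *
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY));
END IF;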

Key Advantages:

  • Prevents Unnecessary Processing: The script only executes when needed.
  • Ensures Data Accuracy: Queries run only when the GA4 data is confirmed to be complete.
  • Cost Efficiency: Scheduled checks (e.g., every 60 minutes) only read table metadata, so they cost next to nothing when the conditions aren’t met.

Understanding project.dataset.__TABLES__ for metadata queries

The project.dataset.__TABLES__ view is an invaluable resource for managing your BigQuery workflows. Here’s why:

  • Table Metadata Access: This view stores important information such as table_id and last_modified_time.
  • Dynamic Decision-Making: By querying this view, you can build logic into your scripts that adapts to the state of your data.
  • Automation Friendly: It allows you to programmatically check the status of your tables and conditionally execute further processing, a critical aspect of automated data pipelines.

Use cases:

  • Verifying Data Freshness: Ensure that your queries run only after the data has been fully updated (see the example after this list).
  • Avoiding Redundancy: Prevent reprocessing of data that has already been handled within the same day.
  • Optimizing Resource Usage: Reduce unnecessary query executions, saving both time and money.
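
To illustrate the freshness check mentioned in the first use case, a query along these lines (the project and dataset names are placeholders) flags any tables that haven’t been touched in the last 24 hours:

SELECT
  table_id,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified
FROM `project.dataset.__TABLES__`
WHERE TIMESTAMP_MILLIS(last_modified_time) < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
ORDER BY last_modified;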

For other metadata views available in BigQuery, such as the INFORMATION_SCHEMA views, take a look at the Google Cloud documentation.


Conclusion

Combining the procedural flexibility of IF...THEN...END IF with the strategic use of metadata from project.dataset.__TABLES__ provides a robust framework for automating BigQuery workflows. This approach not only streamlines your operations but also ensures data accuracy and cost efficiency.


By integrating these advanced techniques, you can optimise your data pipelines and stay ahead in your data processing tasks. Let's discuss!

