Skip to main content

How to flatten tables with user-defined functions in BigQuery

Lace Rogers12 August 20214 min read
How to flatten tables with user-defined functions in BigQuery

As we all know Google is continuously improving and expanding the Google Cloud Platform and BigQuery has truly benefited from this. It's taken a simple scalable data warehouse into what is now a comprehensive and dynamic system. As you may guess, I’m a massive fan of BigQuery and it's user-defined functions.

BigQuery offers a huge amount of functionality including Data Definition Language (DDL) and Data Manipulation Language (DML), however, sometimes there simply isn’t the function to help you complete what you are doing. Often in the past, I would end up accessing BigQuery via Python or SAS and applying the missing function.

User-Defined Functions (UDFs) cover some of the current gaps, and since they are now persistent (I’ll explain this further later), you can share and re-use these with your colleagues.

What is a UDF?

A UDF simply is a function you create that applies a logic not available within the standard SQL function to your data. Generally, you will be using JavaScript, SQL, Simple Logic or base Python (albeit in a rather hacky approach).

How do you generate UDFs?

Using BigQuery’s comprehensive guides, you can see the general structure is very similar to some DDL statements, however, it can take a little getting used to. I go into this in more detail over on Medium with examples.

How to flatten tables with UDFs

Imagine a scenario where you have a Google Analytics 4 (GA4) or Firebase (FB) dataset where you need to flatten multiple fields. Normally you would need to write out the standard UNNEST script of:

(SELECT VALUE.INT_VALUE FROM UNNEST(EVENT_PARAMS) WHERE KEY = 'CCC')

However, that is super painful to repeat multiple times and likely to produce errors. So using a UDF, scripting and the fab SQL execute statement, I have generated the code below which will automatically UNNEST your GA4 data.

Be careful to select the specific dates and events you wish to review. You can even remove the SQL execute and simply copy the query into another window or a tool such as Python.

DECLARE
  SQLRUN STRING DEFAULT '';
CREATE TEMP FUNCTION
  ga4_firebase( key1 STRING,
    params ARRAY <STRUCT <key STRING,
    value STRUCT <string_value STRING,
    int_value INT64,
    float_value FLOAT64,
    double_value FLOAT64 >>>) AS ( (
    SELECT
      param.value
    FROM
      UNNEST(params) param
    WHERE
      param.key=key1) );
  # *****************************************
  # Set the tables names and define row numbers and update times
  # *****************************************
SET
  SQLRUN = 
  (
  SELECT
    CONCAT('SELECT EVENT_NAME , ', STRING_AGG(CONCAT("ga4_firebase('",key,"',    event_params).", event_parameter_value, " AS ", key, ',' ), '\n'), '  FROM  `project.analytics_yyyymmdd.events_*` WHERE _TABLE_SUFFIX = "20210801"  ORDER BY 1' )
  FROM (
    SELECT
      DISTINCT event_name,
      params.key AS key,
      (CASE
          WHEN params.value.string_value IS NOT NULL THEN 'string_value'
          WHEN params.value.int_value IS NOT NULL THEN 'int_value'
          WHEN params.value.double_value IS NOT NULL THEN 'double_value'
          WHEN params.value.float_value IS NOT NULL THEN 'float_value'
      END
        ) AS event_parameter_value,
    FROM
      `project.analytics_yyyymmdd.events_*`,
      UNNEST(event_params) AS params
    WHERE
      _table_suffix BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
      AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) )
     AND  EVENT_NAME = 'page_view'
    );
  # *************************************
  # Execute the query
  # *************************************
EXECUTE IMMEDIATE
  (SQLRUN);

Saving and persisting UDFs

I mentioned earlier that recently Google introduced persistent UDFs, ones that can be saved in your database and this is relatively simple to do. You simply need a destination dataset and the name of your UDF. This will then save in your dataset and include a function flag to make them easy to spot. You can then see the function values when you click on them and apply them to any code using the dataset.function_name.

CREATE FUNCTION  UDFS.ga4_firebase

In fact, a great UDF community has formed an open-sourced UDF project which you can actually apply to your own work, including many statistical techniques such as P-Values and much more simply by using the following syntax:

SELECT bqutil.fn.udf_name(variable)

Be sure to check them out!


Suggested content

Measurelab awarded Google Cloud Marketing Analytics Specialisation

At the start of the year, if you’d asked us whether Measurelab would be standing shoulder to shoulder with Europe’s biggest consultancies by September, we would've been surprised. Not because we don't believe in ourselves, but because these things feel so distant - until suddenly, they’re not. So, here it is: we’ve been awarded the Marketing Analytics Services Partner Specialisation in Google Cloud Partner Advantage. What’s the big deal? In Google’s own words (with the obligatory Zs): “Spec

Will Hayes11 Sept 2025

BigQuery AI.GENERATE tutorial: turn SQL queries into AI-powered insights

BigQuery just got a major upgrade, you can now plug directly into Vertex AI using the new AI.GENERATE function. Translation: your analytics data and generative AI are now best friends, and they’re hanging out right inside SQL. That opens up a whole world of new analysis options for GA4 data, but it also raises some questions: * How do you actually set it up? * What’s it good for (and when should you avoid it)? * Why would you batch the query? Let’s walk through it step by step. Step 1: H

Katie Kaczmarek3 Sept 2025

How to start forecasting in BigQuery with zero training

If you’d told me five years ago that I’d be forecasting product demand using a model trained on 100 billion time points… without writing a single line of ML code… I probably would’ve asked how many coffees you’d had that day ☕️ But its a brand new world. And it’s possible. Let me explain What is TimesFM? TimesFM is a new foundation model from Google, built specifically for time-series forecasting. Think of it like GPT for time, instead of predicting the next word in a sentence, it predicts t

Katie Kaczmarek14 Jul 2025