
How to build an interactive GA4 BigQuery data schema

Scott Hellmund · 23 March 2022 · 4 min read

What and why?

One of the great features of Google Analytics 4 (GA4) is the ability to pass data into BigQuery (BQ). There are many benefits to this, which we've covered in previous blogs (e.g. 10 reasons to export your GA4 data to BigQuery). Exporting to BigQuery is no longer limited to enterprise GA360 customers; it's available to anyone using GA4 for free.

What I wanted to create was a simple, interactive way to explore how GA4 data is stored in BigQuery. I found the GA360 interactive schema from E-Nor (now Merkle – Cardinal Path) by Justin Marciszewski to be a handy way to see where different elements live in the BQ schema and what I'd need to unnest to get to them, and at times I found fields I hadn't noticed were available (most recently totals.sessionQualityDim).

GA4 has a new schema, quite different from the one you'll know if you're used to GA360/BigQuery, so this should hopefully be useful to new and old GA/BigQuery users alike.

Finally, as well as outputting an interactive schema I wanted to learn what R was all about and what else may be possible.

How?

Justin's accompanying blog walks through, in an easy-to-follow way, how the GA360 interactive map was put together.

In the interest of saving time, I skipped the scraping and tidying of data and simply created a data frame manually using the GA4 documentation.

Below is a summary of the steps involved to create the interactive schema using R, along with links to documentation I found helpful and full code on our GitHub for you to be able to recreate this for your own GA4 dataset.

Step 1

Install R and RStudio on my laptop, downloading R from a local CRAN mirror. With R being open source, CRAN is a network of servers from which you can download and install the latest versions of R.

Step 2

Once installed, I then needed to figure out the basics of R. This video tutorial was really helpful in getting a basic understanding of R, from variables to data frames and packages.

Step 3

After I had understood the basics, I needed to install the collapsibleTree package. This only needs to be done once per machine, or whenever you update your version of R.

install.packages("collapsibleTree")

Step 4

Once installed, I needed to load the library associated with this package (this is done once per R session).

library(collapsibleTree)

Step 5

Then I needed to declare the data frame. As mentioned above, this was pulled together manually from the GA4 documentation to save time. A future update will be to look at ways this can be automated.

ga4_ <- data.frame(
  node = c(
    NA,'event_','event_',.....
  ),
  leaf = c(
    'event_','event_date',......
  ),
  variableType = c(
    'RECORD','STRING',.....
  )
)

The full data frame can be found on our GitHub.
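To make the shape of that data frame concrete, below is a trimmed sketch using a handful of real GA4 export fields (the full frame on GitHub covers the whole schema). The convention collapsibleTreeNetwork expects is one row per parent–child edge, with a single NA parent marking the root of the tree:

```r
library(collapsibleTree)

# Trimmed illustration only: each row links a parent (node) to a child
# (leaf), and the one NA parent marks the root. The variableType column
# drives the tooltip/colour attribute when the tree is rendered.
ga4_sketch <- data.frame(
  node = c(NA, 'event_', 'event_', 'event_',
           'event_', 'event_params', 'event_params'),
  leaf = c('event_', 'event_date', 'event_timestamp', 'event_name',
           'event_params', 'key', 'value'),
  variableType = c('RECORD', 'STRING', 'INTEGER', 'STRING',
                   'RECORD', 'STRING', 'RECORD')
)

# Render the sketch as a small collapsible tree
collapsibleTreeNetwork(ga4_sketch, attribute = "variableType", collapsed = TRUE)
```

Note that collapsibleTreeNetwork requires node names to be unique across the tree, which is why the full GA4 frame on GitHub prefixes nested fields where needed.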

Step 6

The next step was to call the package function, referencing the data frame. This builds on earlier iterations to add a custom fill, height and width, and to ensure the tree starts in a collapsed state. Other fields can also be declared as part of this; see the collapsibleTree documentation for more details.

collapsibleTreeNetwork(ga4_, attribute = "variableType", collapsed = TRUE, zoomable = FALSE, fill = "#66E251", height = 700, width = 1200)

Step 7

The final step was to export this to html. This was simply done via the "Export > Save as Web Page" feature in the viewer tab of RStudio.
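If you'd rather script the export than click through the RStudio viewer, the htmlwidgets package (which collapsibleTree depends on) can save the widget directly. A minimal sketch, assuming the ga4_ data frame from Step 5 is already in your environment (the output file name here is just an example):

```r
library(collapsibleTree)
library(htmlwidgets)

# Build the tree exactly as in Step 6 (ga4_ is the data frame from Step 5)
tree <- collapsibleTreeNetwork(ga4_, attribute = "variableType",
                               collapsed = TRUE, zoomable = FALSE,
                               fill = "#66E251", height = 700, width = 1200)

# Save as a standalone HTML file with all JS/CSS inlined,
# so it can be embedded or shared without extra assets
saveWidget(tree, "ga4-schema.html", selfcontained = TRUE)
```

The selfcontained = TRUE option is handy if you want a single file to drop into a CMS, at the cost of a larger HTML file.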

You can see the final interactive tree output that we have done over on the GA4 BigQuery Export Schema page.

Summary

At the moment, GA4 is still being developed so I fully expect the current schema to change a few times yet.

In the meantime, I hope this can help anyone who is either transitioning from GA360 to GA4, or to those who are completely new to GA4 and are keen to understand how the data from BigQuery is stored.

Please do get in touch with us if you have any suggestions for future updates, or to let us know how you are using it!

