
Dataform

All about Dataform (23 posts)


GA4 Dataform integration: how Springer Nature modernised analytics with Measurelab

The Springer Nature Group is an academic publishing company, with brands dating back to 1842, that advances scientific discovery by publishing robust and insightful research, supporting the development of new areas of knowledge, making ideas and information accessible around the world, and leading the way on open access. The challenge: Springer Nature needed to migrate their Universal Analytics dashboards to GA4 data. Their reporting relied on multiple stacked scheduled queries in BigQuery tha

Steven Elliott, 28 Aug 2025

Data pipeline optimisation with Google Cloud and Dataform

In our recent engagement with a client, we went on a journey to transform their data pipelines, tackling inefficiencies in performance and cost within their Google Cloud BigQuery environment. Our efforts culminated in a comprehensive optimisation strategy that used Dataform, improved SQL practices, and implemented tailored solutions for significant performance gains and cost savings. Here’s a deep dive into the highlights of our project. Identifying inefficiencies in BigQuery workflows: We beg

Prasanna Venkatesan, 22 Apr 2025

Dataform for BigQuery: A basic end-to-end guide

Dataform is a powerful tool for managing your data workflows in a structured, version-controlled, and automated way. Whether you're a beginner or an experienced data engineer, Dataform simplifies SQL-based transformations while integrating seamlessly with Google BigQuery. Although this blog offers a basic introduction to Dataform's functionality, users can achieve significantly more with Dataform. From advanced scheduling, parameterised queries, and dependency management to complex data modelli

Prasanna Venkatesan, 18 Mar 2025

Behind the Cloud – Releases and scheduling in Dataform

In this episode of Behind the Cloud, Matthew dives into the details of releases and scheduling in Dataform. He breaks down how to manage different versions of your codebase in GitHub, from taking snapshots to scheduling executions at various intervals: daily, hourly, or monthly. By the end of the episode, you’ll have the know-how to confidently release and schedule your code, making it easier to build robust tables and models with Dataform. Video transcript: Introduction to releases and sche

Matthew Hooson, 7 Mar 2025

Mastering data loading in BigQuery using Dataform

Efficient data loading is crucial for managing and updating tables in Dataform. Various strategies exist to handle different use cases, including truncate and load, appending data, and leveraging incremental tables with unique keys. This blog explores these primary methods and more. Truncate and Load: In this method, all existing records in the target table are deleted and replaced with a fresh table. This approach works well when a full table refresh is necessary or if managing slowly changin

Prasanna Venkatesan, 7 Feb 2025
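The three loading strategies named in the excerpt above (truncate and load, append, and incremental load with a unique key) can be illustrated with a minimal sketch. This is plain Python over in-memory rows, not Dataform or BigQuery code; the function names, the `Row` type, and the sample `id`/`status` columns are all hypothetical and exist only to show how the strategies differ in what they keep.

```python
from typing import Dict, List

# Hypothetical row type: one dict per table row.
Row = Dict[str, object]


def truncate_and_load(target: List[Row], source: List[Row]) -> List[Row]:
    """Discard every existing target row and keep only the fresh source rows."""
    return [dict(row) for row in source]


def append(target: List[Row], source: List[Row]) -> List[Row]:
    """Keep existing rows and add all source rows, duplicates included."""
    return [dict(row) for row in target] + [dict(row) for row in source]


def incremental_merge(target: List[Row], source: List[Row], unique_key: str) -> List[Row]:
    """Upsert by key: matching keys are overwritten, new keys are inserted."""
    merged = {row[unique_key]: dict(row) for row in target}
    for row in source:
        merged[row[unique_key]] = dict(row)
    return list(merged.values())


# Illustrative data (made up for this sketch).
existing = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
incoming = [{"id": 2, "status": "paid"}, {"id": 3, "status": "new"}]

refreshed = truncate_and_load(existing, incoming)
appended = append(existing, incoming)
upserted = incremental_merge(existing, incoming, unique_key="id")
```

With this sample data, the truncate-and-load result holds only the two incoming rows, the appended result holds four rows (with `id` 2 appearing twice), and the keyed merge holds three rows with `id` 2 updated to `paid`. That difference is the usual reason to prefer a unique key when source rows can be re-delivered.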

A step-by-step guide to migrating scheduled queries to Dataform

Managing scheduled queries in BigQuery often feels limiting: there’s no version control, no easy collaboration, and scaling can be difficult. If you’ve ever wondered how to make SQL workflows smoother, Dataform is your answer. In this post, I’ll show you how I migrated a BigQuery scheduled query to Dataform and how it transformed the way I manage my data pipelines. After all, we all want to know who’s been touching our queries, don’t we? Getting started in Dataform: First thing you need to

Katie Kaczmarek, 27 Nov 2024

How to set up a Dataform repository with GitHub & Google Cloud integration

Setting up a Dataform repository can be challenging without the right steps. Whether you’re new to Dataform or want to optimise your workflow, this guide will show you how to seamlessly connect it with GitHub and Google Cloud (GC). What is Dataform and why use it? Dataform is a powerful tool for managing version-controlled SQL workflows in a collaborative way. GC incorporates BigQuery and GitHub integration, providing an efficient way to organise and maintain complex data pipelines. Let’s bre

Katie Kaczmarek, 27 Nov 2024

Integrating siloed data: Springer Nature marketing and sales case study

The Springer Nature Group is an academic publishing company, with brands dating back to 1842, that advances scientific discovery by publishing robust and insightful research, supporting the development of new areas of knowledge, making ideas and information accessible around the world, and leading the way on open access. The challenge: The sales and marketing teams depended on incomplete data, which didn’t capture the entire customer journey due to different systems in use. Transactions and re

Mark Rochefort, 7 Aug 2024

Behind the Cloud: Setting up a Dataform project within BigQuery

In this episode of Behind the Cloud, Matthew demonstrates how to enable and set up a Dataform project within BigQuery, connect it to GitHub, and initialise the workspace for building a Dataform project. Matt walks us through enabling BigQuery, creating a repository, setting up the region, and using service accounts. Video transcript: Introduction to Dataform in BigQuery. [00:00:00] Matt: Hello and welcome to this week’s Behind the Cloud. Sticking with the practical theme today, we’re going to

Matthew Hooson, 4 Apr 2024

Behind the Cloud: Dataform, what is it and why does it matter?

In this episode of Behind the Cloud, Matt discusses Dataform, what it is, and why it matters. Video transcript: Cloud Data Warehousing. [00:00:00] Matt: Welcome to Behind the Cloud. Today we’re exploring Dataform, but first a little bit of scene setting. Over the past number of years, cloud computing, specifically cloud data warehousing, has advanced significantly. Huge amounts of data can be queried in seconds. The scalability of the platforms is near infinite from both a performance and a

Matthew Hooson, 26 Feb 2024

Behind the Cloud: The essentials of Google BigQuery

In this episode of Behind the Cloud, Matt discusses the essentials of Google Cloud’s BigQuery, covering everything from project structure and data handling to understanding the costs involved. Video transcript: [00:00:00] Matt: Hello and welcome to Behind the Cloud. Today we’re going to be diving into the nuts and bolts of Google Cloud’s BigQuery and how it can help to revolutionise your marketing analytics. Whether you’re really familiar with the cloud or this is all new to you, this episode aims to gui

Matthew Hooson, 7 Feb 2024

Behind the Cloud: What Google Cloud tools should you be familiar with?

In this episode of Behind the Cloud, Matthew aims to answer the question: what are the Google Cloud Platform tools of the marketing analytics trade? And more specifically, what are the tools that you should care about in Google Cloud? For a more in-depth write-up on Google Cloud tools, check out Matt’s blog post. Video transcript: [00:00:00] Matt: Hello and welcome to today’s episode of Behind the Cloud. We’re going to try and answer the question, what are the GCP tools of the marketing analyt

Matthew Hooson, 13 Jan 2024