
Mastering data loading in BigQuery using Dataform

Prasanna Venkatesan · 7 February 2025 · 3 min read

Efficient data loading is crucial for managing and updating tables in Dataform. Various strategies exist to handle different use cases, including truncate and load, append-only loads, incremental tables with unique keys, and incremental loads with a rolling delete. This blog explores each of these methods:

Truncate and Load

In this method, the target table is rebuilt on every run: all existing records are deleted and replaced with freshly loaded data. This approach works well when a full table refresh is necessary or when managing slowly changing data.

Implementation in Dataform:

  • Set the table type to table.

Example:

config {
  type: "table"
}

SELECT
  id,
  name,
  timestamp
FROM
  ${ref("source_table")}

By defining the table type as table, Dataform ensures that each run recreates the table, effectively performing a truncate and load operation.
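For context, a table of type table compiles to roughly a CREATE OR REPLACE TABLE statement in BigQuery. A simplified sketch of the generated SQL (project, dataset, and table names below are placeholders, and the exact SQL Dataform produces may differ):

-- Every run rebuilds the table in full, i.e. truncate and load
CREATE OR REPLACE TABLE `my_project.my_dataset.target_table` AS (
  SELECT
    id,
    name,
    timestamp
  FROM
    `my_project.my_dataset.source_table`
);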

Append or Insert-Only Loads

This method appends new data to an existing table while preserving historical records. It is ideal for use cases where past data must remain unchanged, and only new records are added.

Implementation in Dataform:

  • Set the table type to incremental.
  • Define an incremental condition to capture only new records.

Example:

config {
  type: "incremental"
}

SELECT
  id,
  name,
  timestamp
FROM
  ${ref("source_table")}
WHERE
  timestamp = CURRENT_DATE() - 2

This ensures that only records from two days ago, based on CURRENT_DATE() - 2, are appended to the target table.
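If you also want the very first run to backfill the entire table, and only apply the date filter on subsequent incremental runs, Dataform's when() and incremental() helpers can wrap the condition. A minimal sketch, assuming the same source_table and timestamp column:

config {
  type: "incremental"
}

SELECT
  id,
  name,
  timestamp
FROM
  ${ref("source_table")}
-- Apply the filter only on incremental runs; the first run loads all history
${when(incremental(), `WHERE timestamp = CURRENT_DATE() - 2`)}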

Incremental Loads with Unique Keys

This method ensures that only one row per unique key is retained and updates existing records if changes occur. This is useful for deduplicating or updating records efficiently.

Implementation in Dataform:

  • Set the table type to incremental.
  • Define a uniqueKey to update existing records with new data instead of inserting duplicates.

Example:

config {
  type: "incremental",
  uniqueKey: ["id"]
}

SELECT
  id,
  name,
  position,
  timestamp
FROM
  ${ref("source_table")}
WHERE
  timestamp = CURRENT_DATE() - 2

Dataform automatically handles the merge process—if a record with the same id already exists, it gets updated. If a new id appears, it is inserted as a new row.
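Under the hood, BigQuery performs this upsert as a MERGE statement. A simplified sketch of what the compiled merge roughly looks like (table names are placeholders, and the exact SQL Dataform generates may differ):

MERGE `my_project.my_dataset.target_table` T
USING (
  SELECT id, name, position, timestamp
  FROM `my_project.my_dataset.source_table`
  WHERE timestamp = CURRENT_DATE() - 2
) S
ON T.id = S.id
-- Existing ids are updated in place
WHEN MATCHED THEN
  UPDATE SET name = S.name, position = S.position, timestamp = S.timestamp
-- New ids are inserted as new rows
WHEN NOT MATCHED THEN
  INSERT (id, name, position, timestamp)
  VALUES (S.id, S.name, S.position, S.timestamp);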

Incremental Load with Rolling Delete

This method ensures that before inserting new data incrementally, the system deletes records from the last two days to accommodate any late-arriving updates. This is useful for ensuring data freshness while still maintaining an incremental approach.

Implementation in Dataform:

  • Set the table type to incremental.
  • Use a pre-operation delete step to remove data from the last two days before inserting new records.

Example:

config {
  type: "incremental"
}

pre_operations {
  DELETE
  FROM
    ${self()}
  WHERE
    timestamp >= CURRENT_DATE() - 2;
}

SELECT
  id,
  name,
  position,
  timestamp
FROM
  ${ref("source_table")}
WHERE
  timestamp >= CURRENT_DATE() - 2

This ensures that any late-arriving updates from the last two days are reflected correctly while keeping the rest of the data intact.
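One optional refinement, not part of the example above: wrapping the delete in when(incremental()) so it only runs on incremental executions and is skipped on the very first full build, when the target table may not exist yet. A minimal sketch under that assumption:

config {
  type: "incremental"
}

pre_operations {
  -- Only delete on incremental runs; skipped on the initial build
  ${when(incremental(), `DELETE FROM ${self()} WHERE timestamp >= CURRENT_DATE() - 2`)}
}

SELECT
  id,
  name,
  position,
  timestamp
FROM
  ${ref("source_table")}
WHERE
  timestamp >= CURRENT_DATE() - 2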

Choosing the Right Method in Dataform

Method                               | Best Use Case
Truncate and Load                    | When a full table refresh is needed.
Append or Insert-Only Loads          | When historical records must be preserved.
Incremental Load with Unique Keys    | When deduplication and updates are required.
Incremental Load with Rolling Delete | When handling late-arriving updates for the last n days.

Understanding these approaches in Dataform allows you to optimise your ETL/ELT workflows and effectively manage data changes for various use cases in BigQuery and other Data Warehouses.

Have you used any of these methods in Dataform? Reach out to let us know, and contact us if we can help you with anything Dataform/BigQuery related! 🚀

