
Data pipeline optimisation with Google Cloud and Dataform

Prasanna Venkatesan · 22 April 2025 · 3 min read

In a recent engagement, we set out to transform a client’s data pipelines, tackling performance and cost inefficiencies in their Google Cloud BigQuery environment. Our work culminated in a comprehensive optimisation strategy combining Dataform, improved SQL practices, and tailored solutions, delivering significant performance gains and cost savings. Here’s a deep dive into the highlights of the project.

Identifying inefficiencies in BigQuery workflows

We began by analysing the existing data architecture, identifying key areas of inefficiency:

  1. Long view chains: Multiple layers of stacked views increased complexity and degraded performance (sketched after this list).
  2. Lack of incremental logic: Transformations processed entire datasets unnecessarily, increasing costs.
  3. Unoptimised tables: Absence of partitioning and clustering led to full table scans.
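
To make the first point concrete, here is a minimal sketch of the view-chain anti-pattern (the dataset and view names are hypothetical, not the client's). Because BigQuery views are logical rather than materialised, querying the top view re-executes every layer beneath it on each read:

    -- Hypothetical long view chain: each layer selects from the one
    -- below, so a query against v_orders_report recomputes the stack.
    create or replace view reporting.v_orders_base as
    select * from raw.orders;

    create or replace view reporting.v_orders_enriched as
    select o.*, c.segment
    from reporting.v_orders_base as o
    join raw.customers as c using (customer_id);

    create or replace view reporting.v_orders_report as
    select segment, sum(total_amount) as revenue
    from reporting.v_orders_enriched
    group by segment;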

Transitioning to Dataform

To address these challenges, we transitioned from BigQuery Scheduled Queries to Dataform, unlocking the following benefits:

  • Improved maintainability: Modular SQL scripts made workflows easier to debug and maintain.
  • Version control: Seamless integration with Git allowed better collaboration and change tracking.
  • Automated dependency management: Dependencies were automatically managed, minimising manual intervention and errors.
  • Incremental processing capabilities: Focused data processing only on updated data, enhancing efficiency.
  • Assertions for data quality: Automated checks ensured consistency and reliability throughout the pipeline (see the example after this list).
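
As a sketch of the dependency-management and assertion points (model and column names here are hypothetical; incremental processing is illustrated in the next section), a Dataform SQLX file declares its upstream dependencies with ref() and its data-quality checks in an assertions block:

    -- definitions/stg_orders.sqlx (hypothetical model)
    config {
      type: "table",
      assertions: {
        uniqueKey: ["order_id"],
        nonNull: ["order_id", "customer_id", "order_date"]
      }
    }

    select
      order_id,
      customer_id,
      order_date,
      total_amount
    -- ref() records the dependency on the upstream model, letting
    -- Dataform build the DAG and run models in the correct order.
    from ${ref("raw_orders")}

Each model lives in its own version-controlled file, and the assertions run alongside every execution, surfacing data-quality problems instead of letting them propagate downstream.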

Key optimisation strategies

  • Query structure improvements: Refactored nested subqueries into Common Table Expressions (CTEs), improving readability and debugging efficiency (see the before/after sketch after this list).
  • Consolidation of intermediate tables: Combined redundant views into fewer tables, reducing complexity and redundant operations.
  • Implementation of incremental tables: Processed only new or updated data, significantly reducing data scanned and associated costs.
  • Partitioning of tables: Partitioned tables based on relevant date fields, improving query performance.
  • Eliminating full table scans: Restructured incremental tables to prevent full table scans, ensuring more targeted data processing (the second sketch after this list combines the last three points).
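
First, the query-structure point. The example below is a hypothetical illustration rather than client code; both versions return the same result, but the refactor names each step so it can be read, debugged, and tested in isolation:

    -- Before: logic buried in nested subqueries.
    select *
    from (
      select customer_id, sum(total_amount) as revenue
      from (
        select customer_id, total_amount
        from analytics.orders
        where order_status = 'COMPLETE'
      )
      group by customer_id
    )
    where revenue > 1000;

    -- After: the same logic expressed as named CTEs.
    with completed_orders as (
      select customer_id, total_amount
      from analytics.orders
      where order_status = 'COMPLETE'
    ),
    customer_revenue as (
      select customer_id, sum(total_amount) as revenue
      from completed_orders
      group by customer_id
    )
    select *
    from customer_revenue
    where revenue > 1000;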
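
Second, incremental processing, partitioning, and the elimination of full scans combined in one model. This is a minimal sketch under assumed names and an assumed three-day lookback; partitionBy, clusterBy, uniqueKey, updatePartitionFilter, and when(incremental(), ...) are standard Dataform constructs:

    -- definitions/orders_daily.sqlx (hypothetical model)
    config {
      type: "incremental",
      -- uniqueKey turns incremental runs into a MERGE, so re-processed
      -- rows update in place instead of duplicating.
      uniqueKey: ["order_date", "customer_id"],
      bigquery: {
        partitionBy: "order_date",
        clusterBy: ["customer_id"],
        -- Restrict the MERGE to recent target partitions so the
        -- destination table is never fully scanned.
        updatePartitionFilter:
          "order_date >= date_sub(current_date(), interval 3 day)"
      }
    }

    select
      order_date,
      customer_id,
      count(*) as orders,
      sum(total_amount) as revenue
    from ${ref("stg_orders")}
    -- On incremental runs, the constant date predicate lets BigQuery
    -- prune source partitions; the lookback absorbs late-arriving data.
    ${when(incremental(),
      `where order_date >= date_sub(current_date(), interval 3 day)`)}
    group by order_date, customer_id

With this shape, both sides of the run are bounded: the source read is pruned to a few partitions, and the merge into the target touches only the partitions named in updatePartitionFilter.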

Cost savings analysis

Our optimisation efforts translated into substantial cost savings:

  1. Data processed for key tables:
    • BigQuery costs were reduced by over 40% annually due to a 92% reduction in data processed for some large output tables.
  2. Dashboard optimisation:
    • Dashboard redesigns also contributed to query optimisation. By separating lightweight and heavy charts onto different pages and consolidating scorecards, we eliminated redundant queries and unnecessary filters, cutting the data read from the final output tables by hundreds of gigabytes per dashboard load.
  3. Operational efficiencies:
    • Reduced manual dependency management, debugging, and new-model setup saved approximately 32 engineer hours and close to 20% in cost each month.

Conclusion

This project demonstrates how targeted optimisations can transform data pipelines, improving performance while dramatically reducing costs. Leveraging tools like Dataform and best practices in SQL and BigQuery, we delivered a smarter, more efficient solution tailored to the client’s needs.
