The Google Cloud Platform (GCP) is vast, composed of over 100 services serving purposes across countless industries and areas of business. From the Internet of Things (IoT) to healthcare, there are tools within the GCP that are extremely powerful and invaluable.
That is all lovely, but it can be overwhelming. With the recent free connection between Google Analytics 4 (GA4) and BigQuery, this sense of overwhelm has reached a whole new set of marketing analytics folks trying to filter out the noise and understand what is relevant to them. So I thought I would be that filter and produce a curated list of the tools and services you need to know about. Now you may not use all of them; there are levels of complexity to data transformation and pipelining that call for some of the more advanced stuff here, but knowing what they are will at least help put this lump of computing power into neater boxes.
Here's the list of GCP services we'll be covering: BigQuery, Dataform, Cloud Functions, Cloud Scheduler, Cloud Composer, Cloud Storage, Pub/Sub, Cloud Workflows, Dataflow, Dataprep, Dataplex and Vertex AI.
BigQuery is the GCP’s flagship serverless data warehouse. It allows for the analysis of massive datasets in real time. Instead of managing database servers, users can focus on using SQL queries to extract insights. BigQuery automatically manages the underlying resources, scales with your needs, and ensures high availability of your data.
Google has done a lot to make BigQuery many people's gateway to the GCP, but don't expect something as user-friendly as the GA4 UI. You'll need to know some basic SQL to get started, although the introduction of generative Artificial Intelligence (AI) in the form of Duet AI is going to help there.
A simple example of how BigQuery might be used for marketing analytics is to extract your GA4 data and Google Ads data using the free connectors and build out some reporting tables for lightweight dashboards in Looker Studio.
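If you're curious what that looks like in practice, here's a minimal sketch using the BigQuery Python client. The project, dataset and table names are placeholders; your GA4 export dataset will be named analytics_<property_id>.

```python
# A minimal sketch: build a daily sessions reporting table from the
# GA4 BigQuery export. All project/dataset/table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # hypothetical project ID

query = """
CREATE OR REPLACE TABLE `my-gcp-project.reporting.daily_sessions` AS
SELECT
  event_date,
  COUNT(DISTINCT CONCAT(user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params)
     WHERE key = 'ga_session_id'))) AS sessions
FROM `my-gcp-project.analytics_123456789.events_*`
WHERE event_name = 'session_start'
GROUP BY event_date
"""

client.query(query).result()  # blocks until the query job finishes
print("Reporting table refreshed")
```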
Dataform helps data teams build scalable, reliable data transformations by bringing software engineering practices into the data workflow. By integrating practices like version control, testing and documentation directly into that workflow, it enables structured, maintainable and error-free data operations.
To really get the most out of Dataform, you will need to be familiar with GitHub or similar; this is what enables all the versioning, backup and documentation features. JavaScript knowledge is also nice to have to get the most out of ‘SQLx’, the extended version of SQL used in the service.
An example use case of Dataform in marketing analytics might be to build out a Dataform repository that creates reporting tables autonomously. Each stage of the process (staging table creation, transformation and joining) is completed in order, and you can use ‘assertions’ to spot, and alert you to, any issues with data quality.
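Dataform pipelines themselves are written in SQLx, but to illustrate the idea behind an assertion, here's a hedged Python sketch of the same kind of data-quality check run directly against BigQuery. The table and the rule are hypothetical.

```python
# A hedged illustration of what a Dataform 'assertion' does under the
# hood: run a query that returns any rows violating a data-quality rule,
# and treat a non-empty result as a failure. In Dataform itself this
# would be a few lines of SQLx; the table name is a placeholder.
from google.cloud import bigquery

client = bigquery.Client()

assertion_sql = """
SELECT event_date, sessions
FROM `my-gcp-project.reporting.daily_sessions`
WHERE sessions IS NULL OR sessions < 0
"""

bad_rows = list(client.query(assertion_sql).result())
if bad_rows:
    # Dataform would fail the run and flag the assertion in its UI.
    raise ValueError(f"Data quality assertion failed: {len(bad_rows)} bad rows")
```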
Cloud Functions lets developers run backend code without managing servers. It's event-driven, meaning the code is executed in response to specific events (like changes in data, user actions or scheduled triggers). This serverless compute solution is especially useful for tasks like processing data, integrating with third-party systems, or automation.
It's designed to be straightforward, especially for those familiar with writing code. The real beauty is you don't need to worry about the infrastructure. However, understanding event-driven architecture can be a plus.
A common example use case is to set up a Cloud Function to automatically extract data from an API and store it in BigQuery, where it can then be transformed and joined to other marketing analytics data using Dataform or basic SQL.
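That function might look something like this minimal sketch, using the Functions Framework for Python. The API URL and destination table are placeholders.

```python
# A minimal sketch of a Cloud Function that pulls JSON from a
# (hypothetical) reporting API and loads it into BigQuery.
import functions_framework
import requests
from google.cloud import bigquery

bq = bigquery.Client()
TABLE_ID = "my-gcp-project.raw.ad_spend"  # hypothetical destination table

@functions_framework.http
def load_ad_spend(request):
    # Fetch yesterday's figures from a placeholder API endpoint.
    rows = requests.get("https://api.example.com/ad-spend/yesterday",
                        timeout=30).json()
    job = bq.load_table_from_json(rows, TABLE_ID)
    job.result()  # wait for the load job to complete
    return f"Loaded {len(rows)} rows", 200
```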
Think of Cloud Scheduler as a modern task scheduler in the cloud. It lets you run jobs on a defined schedule, akin to the Unix cron utility, but with the benefits of a cloud-native environment. From triggering HTTP endpoints to invoking Cloud Functions or sending Cloud Pub/Sub messages, Cloud Scheduler allows automation with reliability and fault-tolerance built-in.
It's pretty straightforward, especially if you have experience with cron jobs. The UI is user-friendly, but understanding the cron syntax is crucial, although there are plenty of resources to help.
Set up a Cloud Scheduler job to trigger the Cloud Function mentioned in the previous section every night at midnight, so the previous day's data is waiting for you in BigQuery and on your dashboard the next morning.
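For illustration, here's a hedged sketch of creating that job with the Cloud Scheduler Python client (the console works just as well). The project, region and function URL are placeholders.

```python
# A sketch of the nightly trigger for the Cloud Function above.
from google.cloud import scheduler_v1

client = scheduler_v1.CloudSchedulerClient()
parent = client.common_location_path("my-gcp-project", "europe-west2")

job = scheduler_v1.Job(
    name=f"{parent}/jobs/nightly-ad-spend-load",
    schedule="0 0 * * *",          # cron syntax: midnight every night
    time_zone="Europe/London",
    http_target=scheduler_v1.HttpTarget(
        # Placeholder URL for the function from the previous section.
        uri="https://europe-west2-my-gcp-project.cloudfunctions.net/load_ad_spend",
        http_method=scheduler_v1.HttpMethod.POST,
    ),
)
client.create_job(parent=parent, job=job)
```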
Cloud Composer is GCP’s managed orchestration service built on Apache Airflow. It's designed to create, schedule, and monitor workflows, integrating seamlessly with other GCP services. This means that instead of manually triggering tasks, you can set up a series of dependent tasks and let Cloud Composer manage their execution.
For those new to Apache Airflow, there is a steep learning curve. However, the managed nature of Cloud Composer alleviates many of the operational challenges.
When dealing with larger data workloads, you might set up a workflow in Cloud Composer to extract marketing analytics data from multiple sources, transform it, and then load it into BigQuery daily.
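Under the hood, that workflow is an Airflow DAG. Here's a minimal, hedged sketch of what one might look like; the SQL, schedule and names are all placeholders.

```python
# A hedged sketch of a daily ELT DAG as Cloud Composer would run it.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    dag_id="daily_marketing_elt",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",  # 2am daily, after source exports land
    catchup=False,
) as dag:
    # Stage raw source data (placeholder SQL).
    stage = BigQueryInsertJobOperator(
        task_id="stage_sources",
        configuration={"query": {
            "query": """CREATE OR REPLACE TABLE `my-gcp-project.staging.ads` AS
                        SELECT * FROM `my-gcp-project.raw.ads`""",
            "useLegacySql": False,
        }},
    )

    # Build the reporting table once staging has finished.
    transform = BigQueryInsertJobOperator(
        task_id="build_reporting_table",
        configuration={"query": {
            "query": """CREATE OR REPLACE TABLE
                        `my-gcp-project.reporting.daily_summary` AS
                        SELECT date, SUM(cost) AS cost
                        FROM `my-gcp-project.staging.ads`
                        GROUP BY date""",
            "useLegacySql": False,
        }},
    )

    stage >> transform  # Composer handles the ordering and retries
```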
Cloud Storage provides a durable and highly available storage solution for a variety of data types. It's more than just a place to store files; it's a powerful tool to integrate with big data and Machine Learning (ML) tools, serve websites, store backups, or even archive data for long-term storage.
It’s quite intuitive to use. If you've used other cloud storage services before, you'll find this familiar. The GCP console makes managing buckets and files easy.
You could use Cloud Storage as a data lake, storing raw, unprocessed marketing analytics data before it is transformed and made ready for BigQuery.
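As a quick sketch of that pattern, using the Cloud Storage and BigQuery Python clients: land a raw export in a bucket, then load it into BigQuery when you're ready. Bucket, file and table names are placeholders.

```python
# A minimal sketch of the data lake pattern.
from google.cloud import bigquery, storage

# Upload a raw export into the (placeholder) data lake bucket.
storage.Client().bucket("my-raw-data-lake").blob(
    "crm/2024-05-01/contacts.json"
).upload_from_filename("contacts.json")

# Later, load it into BigQuery for transformation.
bq = bigquery.Client()
job = bq.load_table_from_uri(
    "gs://my-raw-data-lake/crm/2024-05-01/contacts.json",
    "my-gcp-project.raw.crm_contacts",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
    ),
)
job.result()
```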
Pub/Sub stands for Publish/Subscribe. It's a messaging service designed to connect applications, ensuring the seamless flow of information between them in real-time. With its asynchronous messaging backbone, it can be used to decouple services, making them more scalable and maintainable.
For those unfamiliar with messaging systems, there may be a learning curve. However, its design is to provide a simple, scalable foundation for building event-driven systems.
A simple marketing analytics use case might be using Pub/Sub to trigger particular Cloud Functions or workflows in a pipeline. More complex uses might involve handling real-time data and passing it through services like Dataflow for transformation.
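The publishing side is only a few lines. Here's a minimal sketch with the Pub/Sub Python client; the project and topic names are placeholders.

```python
# A sketch of the publishing side: push an event onto a topic so any
# subscribed service (a Cloud Function, a Dataflow job) can react.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-gcp-project", "new-conversion-events")

event = {"user_id": "abc123", "value": 49.99}  # hypothetical payload
future = publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
print(f"Published message {future.result()}")  # blocks until delivered
```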
Cloud Workflows enable the automation and orchestration of GCP tasks, bridging various services together into a unified workflow. With its visual interface, users can design and implement complex processes, combining tools like Cloud Functions, Pub/Sub, and even third-party APIs into seamless operations.
It might require some understanding of workflow orchestration and YAML, but the visual representation and Google's documentation make it manageable.
A really simple yet powerful automated Extract, Load, Transform (ELT) pipeline: Cloud Scheduler triggers a workflow that runs a Cloud Function to extract and store marketing analytics data in BigQuery, then kicks off a Dataform process that transforms the raw table data ready for reporting.
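The workflow itself is defined in YAML, but for illustration, here's a hedged sketch of kicking off an execution with the google-cloud-workflows client library. The names are placeholders, and Cloud Scheduler can call the same API over HTTP.

```python
# A hedged sketch: start an execution of a (placeholder) workflow
# that chains the Cloud Function and Dataform steps described above.
from google.cloud.workflows import executions_v1

client = executions_v1.ExecutionsClient()
parent = ("projects/my-gcp-project/locations/europe-west2"
          "/workflows/marketing-elt")

execution = client.create_execution(parent=parent)
print(f"Started execution: {execution.name}")
```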
Dataflow offers both stream (real-time) and batch data processing capabilities built on Apache Beam. This unified model makes it simpler to manage, build, and transform data at any scale, ensuring reliable and consistent data processing for analytic operations.
There is a steeper learning curve to contend with, especially if you're new to stream processing. Familiarity with Apache Beam will help a lot.
If you need to process a stream of real-time data, for example from IoT devices, you can use Dataflow to aggregate metrics and store the results in BigQuery, giving you real-time analytics.
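Here's a hedged Apache Beam sketch of that streaming pattern. The subscription, schema and destination table are placeholders and assume the relevant resources already exist.

```python
# A hedged Beam sketch: read events from Pub/Sub, count them per
# device over one-minute windows, and append the results to BigQuery.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        # Read raw device events from a (placeholder) subscription.
        | beam.io.ReadFromPubSub(
            subscription="projects/my-gcp-project/subscriptions/device-events")
        | beam.Map(json.loads)
        # Aggregate per device over fixed one-minute windows.
        | beam.WindowInto(FixedWindows(60))
        | beam.Map(lambda event: (event["device_id"], 1))
        | beam.CombinePerKey(sum)
        | beam.Map(lambda kv: {"device_id": kv[0], "events": kv[1]})
        # Append the aggregates to BigQuery for analysis.
        | beam.io.WriteToBigQuery(
            "my-gcp-project:iot.event_counts",
            schema="device_id:STRING,events:INTEGER",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```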
Dataprep is all about simplifying the data preparation phase. Instead of writing code to clean and transform data, Dataprep provides a visual interface, suggesting intelligent transformations and helping users get their data analytics-ready in a fraction of the usual time.
It's designed for analysts and business users, making it more user-friendly. The UI is intuitive, but understanding your data is crucial.
An example use case might be cleaning up, enriching and preparing imported marketing analytics data that has inconsistencies and missing values, perhaps for use in AutoML or BigQuery ML (BQML).
Dataplex is a holistic solution for managing and governing data across data lakes and warehouses like BigQuery. It provides a centralised view, ensuring consistent policy enforcement, data security, and discoverability across various datasets.
I would describe its learning curve as intermediate. If you have experience managing data lakes and understand data governance principles, it'll be smoother.
An obvious example is helping to manage a large data lake/data warehouse with multiple datasets. With Dataplex, you centralise the metadata, enforce security policies, and provide a single point of discovery for marketing analytics analysts.
Vertex AI is an end-to-end platform for ML, designed to simplify the process of building, training, and deploying ML models. With pre-built algorithms, a unified interface, and integration with other GCP services, it offers a streamlined approach for both novice and experienced ML practitioners.
There's a steeper curve, especially if you're new to ML. However, it's designed to be more user-friendly than many other ML platforms, providing tools for both novices and experts.
A classic example is training an ML model to predict customer churn. Using Vertex AI, you can build, refine, test and deploy the model, then use it to make real-time predictions based on new customer data.
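As a hedged sketch of that churn example with the Vertex AI Python SDK (google-cloud-aiplatform): the dataset source, display names and target column are all placeholders.

```python
# A hedged sketch of training and deploying an AutoML tabular churn
# model. Assumes a (placeholder) BigQuery table of customer features
# with a boolean-ish 'churned' label column.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="europe-west2")

dataset = aiplatform.TabularDataset.create(
    display_name="customer-churn",
    bq_source="bq://my-gcp-project.reporting.customer_features",
)

job = aiplatform.AutoMLTabularTrainingJob(
    display_name="churn-model",
    optimization_prediction_type="classification",
)
model = job.run(dataset=dataset, target_column="churned")

# Deploy to an endpoint and score a (hypothetical) new customer.
endpoint = model.deploy(machine_type="n1-standard-4")
prediction = endpoint.predict(instances=[{"recency_days": "12", "orders": "3"}])
print(prediction.predictions)
```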
When looking at this list, there are a few things to remember. First, it is not exhaustive; there are other services not listed above (Dataproc, Cloud Run, BigLake, Firestore etc.) that you may come across on your journey, but discussing them here would only confuse things further. There are also services I would term ‘utility’ services, like IAM, Cloud Monitoring and Cloud Logging, that deserve their own dedicated blog. Second, there are levels to GCP use in marketing or product analytics. For many of you just starting out, a Cloud Function, Cloud Scheduler and BigQuery will be all you touch. Others, as they grow in confidence, will take on the other services mentioned, like Dataform, BQML and Cloud Composer.
I would encourage you to think backwards from your goal when deciding on the tools you need. Have a data product in mind, be it a dashboard, a chart or a regular report, and figure out the most efficient, scalable and secure way to build it using the tools at your disposal. Here at Measurelab, we are a Google Cloud service partner, so if you have any questions or would like us to help you flesh out your approach, please get in touch.