The Google Cloud tools of the marketing analytics trade

The Google Cloud Platform (GCP) is vast. It is composed of over 100 services that serve purposes across countless industries and corners of business. From the Internet of Things (IoT) to healthcare, there are tools within the GCP that are extremely powerful and, in the right hands, invaluable.

That is all lovely, but it can be overwhelming. With the recent free connection between Google Analytics 4 (GA4) and BigQuery, this sense of overwhelm has been introduced to a whole new set of marketing analytics folks trying to filter out the noise and understand what is relevant to them. So I thought I would be that filter and produce a curated list of the tools and services you need to know about. Now you may not use all of them; there are levels of complexity to data transformation and pipelining that require some of the more advanced stuff here, but knowing what they are will at least help put this lump of computing power into neater boxes.

Here’s the list of GCP services we’ll be covering:

  • BigQuery
  • Dataform
  • Cloud Functions
  • Cloud Scheduler
  • Cloud Composer
  • Cloud Storage
  • Pub/Sub
  • Cloud Workflows
  • Dataflow
  • Dataprep
  • Dataplex
  • Vertex AI

BigQuery

BigQuery is the GCP’s flagship serverless data warehouse. It allows for the analysis of massive datasets in real time. Instead of managing database servers, users can focus on using SQL queries to extract insights. BigQuery automatically manages the underlying resources, scales with your needs, and ensures high availability of your data.

Google has done a lot to make BigQuery many people’s gateway to the GCP, but don’t expect something as user-friendly as the GA4 UI. You’ll need to know some basic SQL to get started, although the introduction of generative Artificial Intelligence (AI) in the form of Duet AI is going to help there.

A simple example of how BigQuery might be used for marketing analytics is to extract your GA4 data and Google Ads data using the free connectors and build out some reporting tables for lightweight dashboards in Looker Studio.
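
As a sketch of what that first reporting query might look like, here is a hedged example that builds a sessions-by-channel query over the GA4 BigQuery export. The project and dataset names are placeholders; the `events_*` wildcard, `user_pseudo_id` and the `ga_session_id` event parameter come from the standard GA4 export schema.

```python
# Placeholder table; the GA4 export lands in <project>.analytics_<id>.events_YYYYMMDD.
GA4_TABLE = "my-project.analytics_123456789.events_*"

def build_sessions_query(start: str, end: str, table: str = GA4_TABLE) -> str:
    """Build a SQL string counting distinct sessions per source/medium.

    start/end are _TABLE_SUFFIX dates in YYYYMMDD form.
    """
    return f"""
    SELECT
      event_date,
      traffic_source.source,
      traffic_source.medium,
      COUNT(DISTINCT CONCAT(
        user_pseudo_id,
        (SELECT value.int_value FROM UNNEST(event_params)
         WHERE key = 'ga_session_id')
      )) AS sessions
    FROM `{table}`
    WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'
    GROUP BY event_date, source, medium
    """

query = build_sessions_query("20240101", "20240131")
# Running it needs credentials and the google-cloud-bigquery client, e.g.:
# from google.cloud import bigquery
# rows = bigquery.Client().query(query).result()
```

A query like this could feed a small reporting table that Looker Studio reads directly, rather than pointing the dashboard at the raw events.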

Keep the following in mind:

  • Cost: While BigQuery charges for the amount of data processed, without careful query management and partitioning, costs can escalate quickly.
  • Streaming Data: Ingesting streaming data can lead to additional costs.
  • Complexity: For very large datasets, optimising for performance might require a deeper understanding of how BigQuery works internally.

Dataform

Dataform brings software engineering practices to data teams, keeping data transformations scalable and reliable. By integrating practices like version control, testing, and documentation directly into the data workflow, it makes for structured, maintainable, and less error-prone data operations.

To really get the most out of Dataform you will need to be familiar with GitHub or similar, which enables all the versioning, backup and documentation features. JavaScript knowledge is also nice to have to get the most out of ‘SQLX’, the extended version of SQL used in the service.

An example use case of Dataform in marketing analytics might be a repository that builds reporting tables autonomously, with each transformation, staging-table creation and join completed in order. You could also use ‘assertions’ to spot data-quality issues and alert you to them.
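
Conceptually, a Dataform assertion is just a query that should return zero rows; any rows that come back are failures. A minimal Python sketch of the same idea, using hypothetical column names, looks like this:

```python
# Sketch of what a Dataform nonNull assertion checks: "rows where this
# column is NULL" should be empty. Expressed here over in-memory rows;
# the column names are hypothetical.

def non_null_violations(rows, column):
    """Return the rows that would make a nonNull assertion fail."""
    return [row for row in rows if row.get(column) is None]

sample = [
    {"user_id": "a1", "sessions": 3},
    {"user_id": None, "sessions": 1},  # this row would trip the assertion
]
bad = non_null_violations(sample, "user_id")
```

In Dataform itself you would declare this in a SQLX file's config block rather than writing the check by hand, and a failed assertion can halt downstream tables from building.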

Keep the following in mind:

  • Complexity: Requires familiarity with version control systems like Git. This might be intimidating for those without a software development background.
  • Dependency: Being too dependent on a tool like Dataform can become an issue if you need to transition to another system.

Cloud Functions

Cloud Functions lets developers run backend code without managing servers. It’s event-driven, meaning the code is executed in response to specific events (like changes in data, user actions or scheduled triggers). This serverless compute solution is especially useful for tasks like processing data, integrating with third-party systems, or automation.

It’s designed to be straightforward, especially for those familiar with writing code. The real beauty is you don’t need to worry about the infrastructure. However, understanding event-driven architecture can be a plus.

A common example use case is to set up a Cloud Function to automatically extract data from an API and store it in BigQuery, where it can then be transformed and joined to other marketing analytics data using Dataform or basic SQL.
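
A minimal sketch of such a function, assuming a hypothetical JSON API; the transformation is kept in a pure helper so it can be tested locally, and the BigQuery load is shown only as a comment:

```python
import json
from urllib.request import urlopen  # stdlib; a real function might use `requests`

API_URL = "https://api.example.com/spend"  # hypothetical endpoint

def to_rows(payload: dict) -> list:
    """Flatten the (hypothetical) API payload into BigQuery-ready rows."""
    return [
        {"date": item["date"], "channel": item["channel"], "spend": float(item["spend"])}
        for item in payload.get("results", [])
    ]

def extract_and_load(request=None):
    """Entry point, shaped like a Cloud Functions HTTP-triggered function."""
    payload = json.load(urlopen(API_URL))
    rows = to_rows(payload)
    # Loading would use the google-cloud-bigquery client, e.g.:
    # bigquery.Client().insert_rows_json("my-project.dataset.ad_spend", rows)
    return f"loaded {len(rows)} rows"
```

Keeping the extract logic separate from the entry point makes the function easy to test without deploying it.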

Keep the following in mind:

  • Cold Starts: Initial invocation after a period of inactivity might take longer to execute.
  • State Management: Being serverless, managing state across invocations can be challenging.
  • Limitations: There are limits on execution time, memory, and deployment package size (although these are higher with 2nd-generation functions).

Cloud Scheduler

Think of Cloud Scheduler as a modern task scheduler in the cloud. It lets you run jobs on a defined schedule, akin to the Unix cron utility, but with the benefits of a cloud-native environment. From triggering HTTP endpoints to invoking Cloud Functions or sending Cloud Pub/Sub messages, Cloud Scheduler allows automation with reliability and fault-tolerance built-in.

It’s pretty straightforward, especially if you have experience with cron jobs. The UI is user-friendly, but understanding the cron syntax is crucial, although there are plenty of resources to help.

Set up a Cloud Scheduler job to trigger the Cloud Function mentioned in the previous section every night at midnight, so the previous day’s data is waiting for you in BigQuery and on your dashboard the next morning.
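
The schedule itself is a five-field cron expression; "every night at midnight" is `0 0 * * *`. A tiny sketch to unpack the fields:

```python
# The five cron fields, in order. "0 0 * * *" reads as: minute 0, hour 0,
# every day of the month, every month, every day of the week.
FIELDS = ["minute", "hour", "day_of_month", "month", "day_of_week"]

def explain(schedule: str) -> dict:
    """Split a cron schedule into its named fields."""
    return dict(zip(FIELDS, schedule.split()))

nightly = explain("0 0 * * *")
weekday_quarter_hourly = explain("*/15 9-17 * * 1-5")  # every 15 min, 9am-5pm, Mon-Fri
```

Note that Cloud Scheduler jobs also carry a time zone, so "midnight" is midnight where you tell it to be.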

Keep the following in mind:

  • Overlapping Jobs: If jobs take longer than expected, there’s a risk of overlapping executions.
  • Complexity: While the UI is intuitive, understanding cron syntax and setting up secure invocations might be challenging for some.

Cloud Composer

Cloud Composer is GCP’s managed orchestration service built on Apache Airflow. It’s designed to create, schedule, and monitor workflows, integrating seamlessly with other GCP services. This means that instead of manually triggering tasks, you can set up a series of dependent tasks and let Cloud Composer manage their execution.

For those new to Apache Airflow, there is a steep learning curve. However, the managed nature of Cloud Composer alleviates many of the operational challenges.

When dealing with larger data workloads, you might set up a workflow in Cloud Composer to extract marketing analytics data from multiple sources, transform it, and then load it into BigQuery daily.

Keep the following in mind:

  • Cost: Given it’s a managed service, costs can be higher than self-managed solutions, especially if not monitored.
  • Customisation: Some advanced Airflow configurations might be limited given the managed nature of the service.

Cloud Storage

Cloud Storage provides a durable and highly available storage solution for a variety of data types. It’s more than just a place to store files; it’s a powerful tool to integrate with big data and Machine Learning (ML) tools, serve websites, store backups, or even archive data for long-term storage.

It’s quite intuitive to use. If you’ve used other cloud storage services before, you’ll find this familiar. The GCP console makes managing buckets and files easy.

You could use Cloud Storage as a data lake, storing raw, unprocessed marketing analytics data before it is transformed and made ready for BigQuery.
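
A common data-lake convention is to partition raw objects by source and date in the object path. A small sketch, with a hypothetical bucket name and the upload call left as a comment:

```python
from datetime import date

BUCKET = "my-marketing-data-lake"  # hypothetical bucket name

def object_path(source: str, day: date, filename: str) -> str:
    """Build a date-partitioned object path, a common data-lake layout."""
    return f"raw/{source}/{day:%Y/%m/%d}/{filename}"

path = object_path("ga4", date(2024, 1, 15), "events.json")
# Uploading would use the google-cloud-storage client, e.g.:
# storage.Client().bucket(BUCKET).blob(path).upload_from_filename("events.json")
```

Layouts like this make it easy to backfill or reprocess a single day later, and BigQuery external tables can read straight from such prefixes.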

Keep the following in mind:

  • Access Management: Without proper bucket policies and Identity and Access Management (IAM) roles, there’s a risk of unauthorised data access.
  • Data Retrieval: Retrieving large amounts of data, especially from cold storage classes, can be time-consuming and costly.

Pub/Sub

Pub/Sub stands for Publish/Subscribe. It’s a messaging service designed to connect applications, ensuring the seamless flow of information between them in real-time. With its asynchronous messaging backbone, it can be used to decouple services, making them more scalable and maintainable.

For those unfamiliar with messaging systems, there may be a learning curve. However, its design is to provide a simple, scalable foundation for building event-driven systems.

A simple marketing analytics use case might be using Pub/Sub to trigger particular Cloud Functions or workflows in a pipeline. More complex uses might involve handling real-time data and passing it through services like Dataflow for transformation.
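
Pub/Sub message payloads are bytes, so publisher and subscriber need to agree on an encoding; JSON is a common choice. A small sketch, with the topic name as a placeholder and the publish call left as a comment:

```python
import json

TOPIC = "projects/my-project/topics/pipeline-events"  # placeholder topic

def encode_message(payload: dict) -> bytes:
    """Pub/Sub message data must be bytes; JSON is a common encoding."""
    return json.dumps(payload).encode("utf-8")

def decode_message(data: bytes) -> dict:
    """What a subscriber (e.g. a Cloud Function) would do on receipt."""
    return json.loads(data.decode("utf-8"))

msg = encode_message({"table": "ad_spend", "status": "loaded"})
# Publishing would use the google-cloud-pubsub client, e.g.:
# pubsub_v1.PublisherClient().publish(TOPIC, msg)
```

A subscriber decoding the message can then decide what to kick off next, which is what makes Pub/Sub useful glue between pipeline stages.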

Keep the following in mind:

  • Message Duplication: Ensuring exactly-once processing can be complex.
  • Subscription Management: Unattended subscriptions can accumulate a backlog of unprocessed messages, leading to increased costs.

Cloud Workflows

Cloud Workflows enables the automation and orchestration of GCP tasks, bridging various services together into a unified workflow. With its visual representation, users can design and implement complex processes, combining tools like Cloud Functions, Pub/Sub, and even third-party APIs into seamless operations.

It might require some understanding of workflow orchestration and YAML, but the visual representation and Google’s documentation make it manageable.

A really simple yet powerful automated Extract, Load, Transform (ELT) pipeline: use Cloud Scheduler to trigger a workflow that runs a Cloud Function to extract marketing analytics data and store it in BigQuery, then kicks off a Dataform process that transforms the raw table data ready for reporting.
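
Workflows definitions are written in YAML. A minimal sketch of the pipeline just described might look like the following (held in a Python string purely for illustration; the step names are hypothetical and the URLs are placeholders you would swap for your own function and trigger endpoints):

```python
# A two-step workflow: call the extract function, then trigger the
# Dataform run. Step names are hypothetical; URLs are placeholders.
WORKFLOW_YAML = """
main:
  steps:
    - extract_to_bigquery:
        call: http.get
        args:
          url: https://europe-west2-my-project.cloudfunctions.net/extract
        result: extract_result
    - run_dataform_transform:
        call: http.post
        args:
          url: https://example.com/trigger-dataform
        result: transform_result
"""
```

Because each step records a `result`, later steps (or error handling) can branch on what happened earlier, which is the main advantage over chaining functions with bare Pub/Sub messages.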

Keep the following in mind:

  • Complexity: Building and maintaining complex workflows might require deeper understanding and careful error handling.
  • Limitations: There are restrictions on execution time and payload size.

Dataflow

Dataflow offers both stream (real-time) and batch data processing capabilities built on Apache Beam. This unified model makes it simpler to manage, build, and transform data at any scale, ensuring reliable and consistent data processing for analytic operations.

There is a steeper learning curve to contend with, especially if you’re new to stream processing. Familiarity with Apache Beam will help a lot.

If you need to process a stream of real-time data, for example from IoT devices, you can use Dataflow to aggregate metrics and store the results in BigQuery, giving you real-time analytics.
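
The core idea is windowed aggregation: grouping an unbounded stream into fixed time windows and summarising each one. The sketch below mimics that in plain Python so it runs locally; in a real pipeline the equivalent would be expressed with Apache Beam's `beam.WindowInto(beam.window.FixedWindows(60))` before writing to BigQuery.

```python
from collections import defaultdict

def aggregate_by_window(events, window_seconds=60):
    """Sum (timestamp, value) events into fixed windows.

    This mirrors the fixed-window aggregation a Beam pipeline would
    perform, stripped down to stdlib Python for illustration.
    """
    totals = defaultdict(float)
    for ts, value in events:
        window_start = int(ts // window_seconds) * window_seconds
        totals[window_start] += value
    return dict(totals)

# Three events: two land in the 0-60s window, one in the 60-120s window.
metrics = aggregate_by_window([(5, 1.0), (30, 2.0), (70, 4.0)], window_seconds=60)
```

What Beam adds on top of this toy version is handling late-arriving data, parallelism and fault tolerance, which is exactly where the learning curve lives.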

Keep the following in mind:

  • Cost: Real-time stream processing can become costly if not optimised properly.
  • Complexity: Building and optimising pipelines, especially for real-time data, requires a good understanding of both Dataflow and Apache Beam.

Dataprep

Dataprep is all about simplifying the data preparation phase. Instead of writing code to clean and transform data, Dataprep provides a visual interface, suggesting intelligent transformations and helping users get their data analytics-ready in a fraction of the usual time.

It’s designed for analysts and business users, making it more user-friendly. The UI is intuitive, but understanding your data is crucial.

An example use case might be cleaning up, enriching and preparing imported marketing analytics data that has inconsistencies and missing values, perhaps ahead of use in AutoML or BigQuery ML (BQML).

Keep the following in mind:

  • Performance: Transformations on very large datasets can be time-consuming.
  • Advanced Customisation: For very specific transformations, the visual interface might be limiting.

Dataplex

Dataplex is a holistic solution for managing and governing data across data lakes and warehouses like BigQuery. It provides a centralised view, ensuring consistent policy enforcement, data security, and discoverability across various datasets.

I would describe its learning curve as intermediate. If you have experience managing data lakes and understand data governance principles, it’ll be smoother.

An obvious example is managing a large data lake or data warehouse with multiple datasets. With Dataplex, you can centralise the metadata, enforce security policies, and provide a single point of discovery for marketing analysts.

Keep the following in mind:

  • Complexity: Proper setup and configuration require an understanding of data lakes and data governance.

Vertex AI

Vertex AI is an end-to-end platform for ML, designed to simplify the process of building, training, and deploying ML models. With pre-built algorithms, a unified interface, and integration with other GCP services, it offers a streamlined approach for both novice and experienced ML practitioners.

There’s a steeper curve, especially if you’re new to ML. However, it’s designed to be more user-friendly than many other ML platforms, providing tools for both novices and experts.

Use it to train an ML model to predict customer churn. Using Vertex AI, you can refine, test, and deploy this model, then use it to make real-time predictions based on new customer data.
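
Before any training happens, raw customer records have to be shaped into feature rows with a label. A hedged sketch of that step, where all field names and the label rule are hypothetical; in practice these rows might land in a BigQuery table used as a Vertex AI tabular training dataset:

```python
# Shape a raw customer record into a training row for a churn model.
# Field names and the ">90 days means churned" label rule are examples,
# not a recommendation.

def to_training_row(customer: dict) -> dict:
    return {
        "days_since_last_order": customer["days_since_last_order"],
        "orders_last_90d": customer["orders_last_90d"],
        "avg_order_value": customer["total_spend"] / max(customer["order_count"], 1),
        "churned": int(customer["days_since_last_order"] > 90),  # example label
    }

row = to_training_row({
    "days_since_last_order": 120,
    "orders_last_90d": 0,
    "total_spend": 250.0,
    "order_count": 5,
})
```

Getting this feature-engineering step right (and keeping it identical between training and prediction) usually matters more than the choice of model.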

Keep the following in mind:

  • Model Management: Proper versioning and management of models is crucial to ensure consistent performance.
  • Cost: Training complex models or using advanced services within Vertex AI can quickly escalate costs.


When looking at this list, there are a few things to remember. First, it is not exhaustive; there are other services not listed above (Dataproc, Cloud Run, BigLake, Firestore etc.) that you may come across on your journey, but discussing them here might just confuse things further. There are also services I would term ‘utility’ services, like IAM, Cloud Monitoring and Cloud Logging, that deserve their own dedicated blog. Second, there are levels to GCP use in marketing or product analytics. For many of you first starting out, a Cloud Function, Cloud Scheduler and BigQuery will be all you touch. Others, as they grow in confidence, will take in the other services mentioned, like Dataform, BQML and Cloud Composer.

I would encourage you to think backwards from your goal when deciding on the tools you need. Have a data product in mind, be it a dashboard, a chart or a regular report, and figure out the most efficient, scalable and secure way to design it using the tools at your disposal. Here at Measurelab, we are a Google Cloud Service partner, so if you have any questions or would like us to help you flesh out your approaches, please get in touch.

Written by

Matthew is the Engineering Lead at Measurelab and loves solving complex problems with code, cloud technology and data. Outside of analytics, he enjoys playing computer games, woodworking and spending time with his young family.
