Behind the Cloud: The essentials of Google BigQuery

In this episode of Behind the Cloud, Matt discusses the essentials of Google Cloud’s BigQuery. Everything from project structure, data handling, to understanding the costs involved.

Watch below 👇 or head over to our YouTube channel

Transcript

[00:00:00] Matt: Hello and welcome to Behind the Cloud. Today we’re going to be diving into the nuts and bolts of Google Cloud’s BigQuery and how it can help to revolutionise your marketing analytics. Whether you’re really familiar with the cloud or this is all new to you, this episode aims to guide you through the essentials of the Google BigQuery project from structure to advanced data handling.

Structure

[00:00:23] Matt: In BigQuery, everything starts with an organisation and a GCP or Google Cloud Platform project. Within a project there are services and BigQuery is one of those services. Inside of BigQuery you’ll find datasets which are going to be your main containers and they’ll hold tables and views.

[00:00:42] Matt: Tables store your raw data while a view is a virtual table defined by a SQL query. This hierarchy is crucial for efficient data management and access management levels. All tables and datasets can be labelled and tagged to help with further organisation and security. Adding data to BigQuery ranges from very simple to the more involved.

Importing data

[00:01:03] Matt: You can upload data in various formats like CSV and JSON, or you can use other methods of autonomously pipelining data into the platform. This can be an off the shelf solution like Funnel or Skyviewer. Or it can be more bespoke built solutions like Cloud Functions and Dataflow, which are all services within GCP.

Advanced features

[00:01:24] Matt: BigQuery also supports advanced features like partition tables, which are divided into segments typically around dates called partitions, and they make it easier for you to manage and query data over large data sets. There’s also clustering for faster, more efficient queries. 

[00:01:41] Matt: You also need to set data regionality for compliance and to help reduce latency. And remember, structuring your data wisely can help you save a lot of time and resources and security headaches. 

Querying data

[00:01:51] Matt: To query data in BigQuery, you can use the UI to write standard SQL queries. There are lots of nuances and differences between the different flavours of SQL, and BigQuery is no different in that regard. Check out the docs for more information on syntax and capabilities of all different functions that BigQuery SQL can perform. 

[00:02:10] Matt: Understanding your query size helps manage costs and efficiency, and this can be seen in the top right of a query window. For complex analytics, it’s wise to build out reporting tables to speed up recurring queries, but also to keep larger queries over larger data sets at a minimum to reduce costs.

Exporting query results

[00:02:27] Matt: Also consider scheduling queries for regular reporting and look into tools like Dataform for workflow automations and transformations. Once you’ve got your query results, exporting them for further analysis is really straightforward. You can export to Google sheets, you can download it as a CSV. You can connect to visualisation tools like Looker and Looker Studio, and these will give you really impactful insights. This is the sort of step that really takes your raw data into actionable marketing insights.

Access management

[00:02:48] Matt: BigQuery’s IAM roles allow for granular control over who can access what. It’s essential for maintaining data and security governance. In all things GCP, you should always be following the principles of least privilege.

[00:03:05] Matt: That is, only give users or applications the access they require to carry out their function. You can assign specific roles to team members, ensuring that they only have the right level of access. You can even apply role level access restrictions to filter out results based on user conditions. 

Costs

[00:03:21] Matt: Finally, let’s talk about costs. So BigQuery charges for data storage, streaming inserts, and queries. So keeping an eye on query size and frequency helps control costs. If you want to keep a really close eye on costs, you should consider adding in access and billing tables into BigQuery, which can be set up in ways we’ll talk about in future videos. Remember, efficient data structure and smart querying are going to really lead to significant savings. 

[00:03:49] Matt: And that’s it for today’s Behind the Cloud. We sped through a lot here and it was only really meant to be a whistle stop tour of the platform. We’re going to go into more detail on most of the individual talking points within this video in future episodes.[00:04:02] Matt: As always, thank you for watching. Notes are going to be below the video. If you haven’t done so yet, please like and subscribe to be notified when the next episode is released.

Share:
Written by

Matthew is the Engineering Lead at Measurelab and loves solving complex problems with code, cloud technology and data. Outside of analytics, he enjoys playing computer games, woodworking and spending time with his young family.

Subscribe to our newsletter: