What is a ‘User’ in GA4?
First things first: users are not people!
We often get asked something like “how many people did X on the site yesterday?”, which is not possible to answer in Google Analytics 4 (GA4). GA4 doesn’t track people; it tracks users. In reality, users are the closest proxy we have to people, so we end up using them anyway, and even rename the metric to ‘people’ in our reports and dashboards to avoid having this conversation every time…
Active Users vs. Total Users
GA4 defines an ‘active user’ as a user who has an engaged session, or for whom Analytics collects the first_visit event from a website or the first_open event from an app. So every new user and every non-bouncing returning user is counted as ‘active’.
On the other hand, ‘total users’ is a total count of all users to the website and apps, regardless of their engagement or whether or not it’s their first time.
One of the major differences from Universal Analytics (UA) is that active users now take centre stage in Google Analytics 4. While most reporting in UA concentrates on total users, GA4 concentrates on active users.
Wherever you see the metric ‘Users’ in GA4, it is showing ‘Active Users’.
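To make the difference between the two metrics concrete, here is a minimal sketch of the counting rules described above. The `Event` record and its fields are hypothetical simplifications for illustration, not GA4's actual export schema, and the sketch assumes each event already carries a flag saying whether it belongs to an engaged session.

```python
from dataclasses import dataclass

# Hypothetical, simplified event record (an assumption for illustration);
# real GA4 data has many more fields.
@dataclass(frozen=True)
class Event:
    user: str      # whichever identifier is being used to deduplicate users
    name: str      # event name, e.g. "page_view" or "first_visit"
    engaged: bool  # True if the event belongs to an engaged session

FIRST_TOUCH_EVENTS = {"first_visit", "first_open"}

def total_users(events):
    """Count every distinct user that sent at least one event."""
    return len({e.user for e in events})

def active_users(events):
    """Count distinct users with an engaged session or a first_visit/first_open event."""
    return len({e.user for e in events if e.engaged or e.name in FIRST_TOUCH_EVENTS})

events = [
    Event("a", "first_visit", False),  # new user who bounced: still active
    Event("b", "page_view", True),     # returning user, engaged: active
    Event("c", "page_view", False),    # returning user who bounced: not active
]
print(total_users(events), active_users(events))  # 3 2
```

In this toy data set the bouncing returning user "c" is the only one who appears in total users but not in active users.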
GA4 has four different ways to identify users:
User ID is the process of identifying the user yourself, and then sending that data into GA4 to use. This is only applicable if you have some way for a user to identify themselves, for example if they log into an account.
When a user registers for an account or logs in to an existing account, you can pass that user ID into GA4 to link all of their subsequent website or app events. As a result, you get a fuller picture of how customers behave and interact with your company over time across different platforms (app and web), as well as across different browsers and devices.
You must manually implement User ID in your website and/or app tracking in order to use it with GA4. A user’s user ID is passed into GA4 via the ‘user_id’ parameter. When this is sent with every event, GA4 connects all activity associated with that user ID to a single user profile, providing a more complete view of user behaviour.
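As one illustration of what passing a user ID looks like, here is a rough sketch that builds a request body in the shape used by the GA4 Measurement Protocol, where `user_id` sits alongside `client_id` at the top level of the JSON. This assumes you are sending events server-side; on a website the same value would more typically be set through your tag configuration. The helper name `build_payload` and the example values are hypothetical.

```python
import json

def build_payload(client_id, event_name, user_id=None, params=None):
    """Build a Measurement Protocol style body; user_id is only added when known."""
    body = {
        "client_id": client_id,  # device-level identifier (from the _ga cookie on web)
        "events": [{"name": event_name, "params": params or {}}],
    }
    if user_id is not None:
        # Only set this for identified (e.g. logged-in) users.
        body["user_id"] = user_id
    return body

# Hypothetical values for illustration only.
logged_in = build_payload("1234567890.1700000000", "login", user_id="crm-42")
anonymous = build_payload("1234567890.1700000000", "page_view")
print(json.dumps(logged_in, indent=2))
```

The point of the design is that the user ID is attached to every event sent while the user is identified, and simply omitted otherwise, so GA4 can fall back to its other identification methods.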
Google Signals is an optional feature in GA4 that is used to connect your website and app data to Google’s own data. Part of this feature is used for identifying users by utilising data from logged-in Google users who have given permission to share their information.
Using Google Signals data for user identification can trigger data thresholding. If Google Signals is enabled, you can avoid this thresholding by using the ‘device-based’ reporting identity.
Every browser or device that interacts with your website or app is given a unique identifier called a ‘Device ID’ in GA4. This makes it possible for GA4 to distinguish between different users and sessions and to follow user behaviour over time on that browser or device.
For websites, the Device ID is stored in a first-party cookie called _ga, which is placed on the user’s browser and contains the Client ID. The Client ID is then sent to GA4 with each subsequent event, enabling it to follow the user’s behaviour over time for that browser.
In apps, it works almost exactly the same, but the Device ID is the App Instance ID.
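For the website case, the relationship between the _ga cookie and the Client ID can be sketched as below. This assumes the commonly observed cookie layout of a version prefix followed by the Client ID as the last two dot-separated parts (for example `GA1.1.1234567890.1700000000`); treat that layout as an assumption rather than a guaranteed contract, and the function name as hypothetical.

```python
def client_id_from_ga_cookie(cookie_value):
    """Extract the Client ID from a _ga cookie value, or return None if it looks malformed.

    Assumes the layout '<version>.<depth>.<random>.<timestamp>', where the
    Client ID is the last two dot-separated parts joined by a dot.
    """
    parts = cookie_value.split(".")
    if len(parts) < 4:
        return None
    return ".".join(parts[-2:])

print(client_id_from_ga_cookie("GA1.1.1234567890.1700000000"))  # 1234567890.1700000000
```

Because the cookie lives in a single browser, the same person on a second browser or device gets a different Client ID, which is why the Device ID alone cannot join up cross-device behaviour.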
The fourth method, modelling, is GA4’s way to account for non-consented user data by using machine learning to ‘fill in the gaps’. We have written up a deep-dive all about how this works if you want to learn more.
Reporting identities in GA4
GA4 allows you to define what a user actually is. That is, which of the different identification methods should be used to deduplicate users. These are called Reporting Identities, and there are three available to use: blended, observed and device-based.
Note that each of the three options will result in different counts for user-based metrics within the interface and in the Data API (e.g. in your Looker Studio dashboards). Changing the reporting identity does not permanently alter the underlying data in GA4, so you’re free to change it as many times as you like, and the data will be updated instantly across all the reports.
The blended reporting identity option works through all four user identification methods in order:
To distinguish between users and group events for reporting and exploration, it first uses the user ID. In the absence of a user ID, GA4 takes data from Google Signals if it is available. If neither the user ID nor the Google Signals information is available, Analytics uses the device ID. In the absence of a device ID (where consent is not granted), GA4 uses its machine learning modelling where possible.
The observed reporting identity utilises all trackable IDs, but excludes modelled data:
If a user ID is collected, it is used first. In the absence of a user ID, GA4 takes data from Google Signals if it is available. If neither is available, GA4 then uses the device ID.
The device-based reporting identity ignores any other IDs that are gathered, and only uses the device ID:
This approach aligns most closely with user calculation in Universal Analytics.
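The fallback order of the three reporting identities described above can be summarised in a small sketch. This is only an illustration of the precedence rules as stated in this article, not GA4's actual implementation; the names `FALLBACK_ORDER` and `resolve_method` are hypothetical, and modelling appears purely as a label since it does not produce an identifier in this sketch.

```python
# Precedence of identification methods for each reporting identity,
# as described in the text (illustrative only).
FALLBACK_ORDER = {
    "blended": ("user_id", "google_signals", "device_id", "modelling"),
    "observed": ("user_id", "google_signals", "device_id"),
    "device_based": ("device_id",),
}

def resolve_method(reporting_identity, available):
    """Return the first method in the fallback order that is available, else None."""
    for method in FALLBACK_ORDER[reporting_identity]:
        if method in available:
            return method
    return None

# A logged-out visitor with Google Signals data and a device ID:
print(resolve_method("blended", {"google_signals", "device_id"}))   # google_signals
# The same visitor under the device-based identity:
print(resolve_method("device_based", {"google_signals", "device_id"}))  # device_id
# A non-consented visitor, where only modelling could apply:
print(resolve_method("observed", {"modelling"}))  # None
```

The last call shows why the choice matters for your numbers: the observed identity ignores modelled data that the blended identity would include.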