- What is Snowplow
- Enable Snowplow tracking
- Snowplow request flow
- Structured event taxonomy
- Related topics
This page provides an overview of how Snowplow works and how to enable it.
Snowplow is an enterprise-grade marketing and Product Intelligence platform that tracks how users engage with our website and application.
Snowplow consists of several loosely-coupled sub-systems:
- Trackers fire Snowplow events. Snowplow has twelve trackers that cover web, mobile, desktop, server, and IoT.
- Collectors receive Snowplow events from trackers. We use different event collectors that synchronize events to Amazon S3, Apache Kafka, or Amazon Kinesis.
- Enrich cleans raw Snowplow events, enriches them, and puts them into storage. There is a Hadoop-based enrichment process, and a Kinesis-based or Kafka-based process.
- Storage stores Snowplow events. We store the Snowplow events in a flat file structure on S3, and in the Redshift and PostgreSQL databases.
- Data modeling joins event-level data with other data sets, aggregates them into smaller data sets, and applies business logic. This produces a clean set of tables for data analysis. We use data models for Redshift and Looker.
- Analytics are performed on Snowplow events or on aggregate tables.
- Snowplow data structure
- Our Iglu schema registry
- List of events used in our codebase (Event Dictionary)
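To make the sub-systems described above more concrete, here is a minimal in-memory sketch of how a tracker, a collector, enrichment, and data modeling hand events to one another. It is illustrative only: the `Collector` class, the `track`, `enrich`, and `model` functions, and the derived `event_name` field are hypothetical names invented for this example, not part of Snowplow or GitLab.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Collector:
    """Hypothetical collector: buffers raw payloads (stand-in for S3, Kafka, or Kinesis)."""
    raw_events: list = field(default_factory=list)

    def receive(self, payload: str) -> None:
        self.raw_events.append(payload)


def track(collector: Collector, category: str, action: str) -> None:
    """Hypothetical tracker: serialize an event and fire it at the collector."""
    collector.receive(json.dumps({"category": category, "action": action}))


def enrich(raw_events: list) -> list:
    """Clean raw payloads (drop unparseable ones) and attach a derived field."""
    enriched = []
    for payload in raw_events:
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # cleaning step: discard malformed events
        # Assumed enrichment for illustration: a combined event name.
        event["event_name"] = f'{event["category"]}:{event["action"]}'
        enriched.append(event)
    return enriched


def model(events: list) -> dict:
    """Data modeling: aggregate event-level rows into a small summary table."""
    counts = {}
    for event in events:
        counts[event["action"]] = counts.get(event["action"], 0) + 1
    return counts


collector = Collector()
track(collector, "projects:issues:show", "click_button")
track(collector, "projects:issues:show", "click_button")
collector.receive("not json")  # a malformed payload that enrichment drops

storage = enrich(collector.raw_events)  # stand-in for the storage layer
summary = model(storage)
print(summary)
```

The stages only share data through plain lists, which mirrors the loose coupling between sub-systems: any stage could be swapped out without changing the others.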
Tracking can be enabled at:
- The instance level, which enables tracking on both the frontend and backend layers.
- The user level. User tracking can be disabled on a per-user basis. GitLab respects the Do Not Track standard, so any user who has enabled the Do Not Track option in their browser is not tracked at the user level.
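The two levels above can be sketched as a single decision function. This is a sketch under stated assumptions, not GitLab's actual implementation: the `should_track` function and its parameters are hypothetical, and it assumes the browser's preference arrives as the `DNT: 1` HTTP request header defined by the Do Not Track standard.

```python
def should_track(instance_tracking_enabled: bool, request_headers: dict) -> bool:
    """Return True only if tracking is on for the instance and the user has not opted out.

    Hypothetical helper for illustration; the source does not describe how
    GitLab performs this check.
    """
    if not instance_tracking_enabled:
        return False  # instance-level switch wins: nothing is tracked
    # HTTP header names are case-insensitive, so compare in lowercase.
    dnt = next(
        (value for name, value in request_headers.items() if name.lower() == "dnt"),
        None,
    )
    return dnt != "1"  # "1" means the user asked not to be tracked


print(should_track(True, {"DNT": "1"}))  # user opted out
print(should_track(True, {}))            # no preference sent
```

Checking the instance-level setting first means a disabled instance never needs to inspect per-user preferences.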
Snowplow tracking is enabled on GitLab.com, and we use it for most of our tracking strategy.
To enable Snowplow tracking on a self-managed instance:
1. On the top bar, select Menu > Admin, then select Settings > General. Alternatively, go to `admin/application_settings/general` in your browser.
1. Select Enable Snowplow tracking and enter your Snowplow configuration information. For example:

   | Name | Value |
   |------|-------|
   | Collector hostname | |

1. Select Save changes.
The following example shows a basic request/response flow between these components:
- Snowplow JS / Ruby Trackers on GitLab.com
- GitLab.com Snowplow Collector
- The GitLab S3 Bucket
- The GitLab Snowflake Data Warehouse
Click events must be consistent. If each feature captures events differently, it can be difficult to perform analysis.
Each click event provides attributes that describe the event.
| Attribute | Type | Required | Description |
|-----------|------|----------|-------------|
| category | text | true | The page or backend section of the application. Unless infeasible, use the Rails page attribute by default in the frontend, and namespace + class name on the backend. |
| action | text | true | The action the user takes, or aspect that’s being instrumented. The first word must describe the action or aspect. For example, clicks must be |
| label | text | false | The specific element or object to act on. This can be one of the following: the label of the element, for example, a tab labeled ‘Create from template’ for |
| property | text | false | Any additional property of the element, or object being acted on. |
| value | decimal | false | Describes a numeric value or something directly related to the event. This could be the value of an input. For example, |
* If you choose to omit the category, you can use the default.
** Use property for variable strings.
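As a rough illustration of the attribute table above, the following sketch models a structured event as a small data class that enforces the required fields and the decimal type of `value`. The `StructuredEvent` class is a hypothetical name introduced for this example and is not part of the Snowplow trackers or the GitLab codebase. The sample attribute values are the ones used in the example query on this page.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class StructuredEvent:
    """Hypothetical container mirroring the five structured event attributes."""
    category: str                     # required
    action: str                       # required
    label: Optional[str] = None       # optional
    property: Optional[str] = None    # optional, for variable strings
    value: Optional[Decimal] = None   # optional, numeric

    def __post_init__(self):
        # Enforce the "required" column of the table.
        for name in ("category", "action"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")
        # Enforce the decimal type for the numeric attribute.
        if self.value is not None and not isinstance(self.value, Decimal):
            raise TypeError("value must be a Decimal")


event = StructuredEvent(
    category="projects:issues:show",
    action="click_button",
    label="reply_comment_button",
    value=Decimal("1"),
)
print(event)
```

Validating events at construction time is one way to keep click events consistent across features, because malformed events fail before they are ever sent.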
```sql
SELECT
  session_id,
  event_id,
  event_label,
  event_action,
  event_property,
  event_value,
  event_category,
  contexts
FROM legacy.snowplow_structured_events_all
WHERE
  event_label = 'reply_comment_button'
  AND event_action = 'click_button'
  -- AND event_category = 'projects:issues:show'
  -- AND event_value = 1
ORDER BY collector_tstamp DESC
LIMIT 20
```

```sql
SELECT
  -- page_url,
  -- page_title,
  -- referer_url,
  -- marketing_medium,
  -- marketing_source,
  -- marketing_campaign,
  -- browser_window_width,
  -- device_is_mobile
  *
FROM legacy.snowplow_page_views_30
ORDER BY page_view_start DESC
LIMIT 100
```