BigQuery: Snowplow Schema

Prerequisites

You must grant 'bigquery.dataViewer' access to Indicative's service account for your BigQuery project. 

In order to perform the following steps you must have administrative access to the BigQuery console as well as your BigQuery database.

If there are additional enrichments required, such as joining with user property tables or deriving custom user_ids, please contact us.

Instructions

Adding a Data Source In Indicative

  1. In Indicative, click on Settings and select Data Sources

    mceclip0.png

  2. Click on New Data Source

    mceclip1.png

  3. Select Connect via Data Warehouse or Lake
    mceclip0.png
  4. Select BigQuery as your data connection and Snowplow as the connection schema and click  Connect
    mceclip0.png
  5. You should see this Google + Snowplow Overview screen. Click Next
    mceclip2.png


Connection Informationmceclip3.png

  1.  Open the BigQuery console on Google Cloud Platform and Select a project.
  2. Enter the GCP Project ID containing your Snowplow data.
    mceclip2.png

  3. Enter the Dataset Name
    mceclip4.png
  4. Enter the Table Name and click Next in Indicative.mceclip5.png

Grant Permissions
mceclip4.png

This integration works by sharing the dataset with Indicative's service account and only requires read-only access to that dataset. Indicative takes on the cost of the query and caches this data in Indicative's proprietary analytics engine.

  1. Within the BigQuery Console, select your Project and your dataset from the previous section.
  2. Click on Share Dataset

  3. In the Dataset Permissions panel, in the Add Members field, place the user below.
    integrations@indicative-988.iam.gserviceaccount.com
  4. In the Select a Role dropdown, select BigQuery Data Viewer and click Add.
    mceclip5.png

Event Modeling
mceclip6.png

 

  1. In the Structured Event Name section, select the field that should be used to derive Indicative event names. Typically, most customers use the se_action field, but it completely depends on your implementation. 
    We will first look at this field's value to use as the event name in Indicative. If this value is null, then we will use the event_name field. If this field's value is also null, we will then use the event field.

  2. For Timestamp, select the field that represents the time that the event was performed. Indicative will use this field to run its queries. If unsure, leave as derived_tstamp.
    1. collector_tstamp - Timestamp for the event recorded by the collector.
    2. dvce_created_tstamp - Timestamp for the event recorded on the client device.
    3. dvce_sent_tstampWhen the event was actually sent by the client device.
    4. etl_tstamp - Timestamp for when the event was validated and enriched. Note: the name is historical and does not mean that the event is loaded at this point (this is further downstream).
    5. derived_tstamp - Timestamp making allowance for inaccurate device clock.
    6. true_tstamp - User-set "true timestamp" for the event.

  3. For Vendor Name, input the Snowplow vendor names used so we can simplify your event property names. 

User Identification (Aliasing)

mceclip0.png

For more information on User Identification (Aliasing), please refer to this article.

*NoteIf aliasing is not preferred, please set the Authenticated ID Type to None and press Next

  1. Select the Type for the Unauthenticated ID
    1. Atomic - This will allow you to choose between the domain_userid and network_userid fields that are part of the standard Snowplow event structure. 
      We typically recommend domain_userid since this uses a 1st party cookie. Click here for more information.
    2. Context - If the unauthenticated ID is part of a Snowplow context, choose this option. Enter the values for Vendor, Name, Version, and Field.
    3. Other - If the unauthenticated field is not either of the options, please specify where we can find the unauthenticated ID in the data.

  2. Select the Type for Authenticated ID
    1. Atomic - Enter the field name that should be used for known users. Typically, it is the user_id field in the raw enriched event archive data.
    2. Context - If the authenticated ID is part of a Snowplow context, choose this option. Enter the values for Vendor, Name, Version, and Field.
    3. Other - If the authenticated field is not either of the options, please specify where we can find the authenticated ID in the data and a Solutions Engineer will take a look.
    4. None - choose this option to skip aliasing.

Scheduling

mceclip1.png

  1. Select the Schedule Interval to adjust the frequency at which new data is available in Indicative.
  2. Set the Schedule Time for when the data should be extracted from your BigQuery environment. It is critical that 100% of the data is available by this time to avoid loading partial data.
  3. Select Next

Waiting for Data

mceclip3.png

Advanced Settings

For additional advanced settings such as excluding certain events and properties, please refer to this page

 

 

0 users found this helpful