Amazon S3: Snowplow Schema

Prerequisites:

The Snowplow Unified Log is stored in an S3 bucket, and you are required to write an IAM policy that grants Indicative programmatic access to that bucket.

If there are additional enrichments required, such as joining with user property tables or deriving custom user_ids, please contact us.

Instructions:

Adding a Data Source In Indicative

  1. In Indicative, click on Settings and select Data Sources

    mceclip0.png

  2. Click on New Data Source

    mceclip1.png

  3. Select Connect via Data Warehouse or Lake
    mceclip0.png
  4. Select S3 as your data connection and Snowplow as the connection schema, then click Connect
    mceclip0.png
  5. You should see this S3 + Snowplow Overview screen. Click Next
    mceclip2.png


Connection Information
mceclip3.png

  1. Sign in to the AWS Management Console and open your IAM console.
  2. From the Services dropdown, select S3 under Storage.
    s1.png

  3. Click on the bucket that contains your Snowplow data.
  4. Enter the Bucket Name in the Indicative UI.
    mceclip3.png
  5. Click on your bucket and refer to the bucket structure. Enter the path to your Snowplow data into the File Path field in the Indicative UI.
    mceclip4.png
    In this example, the File Path to put into the Indicative field is /main/enriched/good
  6. Click Next
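If you have the data location as a single S3 URI rather than as a folder view in the console, the two values above can be separated as in the following minimal sketch. It assumes the URI has the form s3://bucket/prefix; the helper name split_s3_location and the bucket name example-snowplow-bucket are hypothetical and used only for illustration.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_location(s3_uri: str) -> Tuple[str, str]:
    """Split an s3:// URI into (bucket name, file path)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")
    # Drop any trailing slash and keep a single leading one,
    # matching the /main/enriched/good form used in this article.
    file_path = "/" + parsed.path.strip("/")
    return parsed.netloc, file_path


# Hypothetical bucket name; substitute your own.
bucket, file_path = split_s3_location(
    "s3://example-snowplow-bucket/main/enriched/good/"
)
print(bucket)     # example-snowplow-bucket
print(file_path)  # /main/enriched/good
```

The first value goes into the Bucket Name field and the second into the File Path field.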


Grant Permissions

mceclip4.png

  1. In this section, click the box that contains the policy to copy it to your clipboard. You will need it in step 4 of this section.
    mceclip5.png
  2. Go back to the AWS Console. Select the bucket and click on the Permissions tab.
    s2.png
  3. Click on Bucket Policy
    s3.png
  4. Paste the policy copied in step 1 into the editor and click Save
  5. Click Next in Indicative. 
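Before pasting, you may want to confirm that the copied policy is complete JSON and refers to the intended bucket. The following is a minimal sketch of such a check, not part of the Indicative product. The helper names, the bucket name, and the SAMPLE_POLICY text (including its principal ARN) are hypothetical stand-ins; always paste the policy exactly as it was copied from Indicative.

```python
import json


def policy_resources(policy_text: str) -> list:
    """Return every Resource ARN named in a bucket policy document."""
    # json.loads raises ValueError if the pasted text is truncated or invalid.
    policy = json.loads(policy_text)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    resources = []
    for statement in statements:
        value = statement.get("Resource", [])
        resources.extend([value] if isinstance(value, str) else value)
    return resources


def covers_bucket(policy_text: str, bucket: str) -> bool:
    """True if any Resource is the bucket itself or an object path inside it."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return any(
        arn == bucket_arn or arn.startswith(bucket_arn + "/")
        for arn in policy_resources(policy_text)
    )


# Made-up stand-in for illustration only; NOT the policy Indicative generates.
SAMPLE_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::111122223333:role/example-reader"},
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-snowplow-bucket",
        "arn:aws:s3:::example-snowplow-bucket/*"
      ]
    }
  ]
}
"""

print(covers_bucket(SAMPLE_POLICY, "example-snowplow-bucket"))  # True
print(covers_bucket(SAMPLE_POLICY, "some-other-bucket"))        # False
```

If the check returns False for your bucket, return to the Connection Information step and confirm the Bucket Name before saving the policy.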

Event Modeling
mceclip0.png

  1. In the Structured Event Name section, select the field that should be used to derive Indicative event names. Our logic first looks at this field; if its value is null, it tries the event_name field. If that value is also null, it falls back to the event field.
    1. se_action
    2. se_category
    3. se_label 
    4. None - Select this option if you're not using Snowplow's structured events 
  2. For Timestamp, select the field that represents the time at which the event was performed. If unsure, leave it as derived_tstamp
  3. For Vendor Name, enter the Snowplow vendor names you use so that we can simplify your event property names
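The fallback order described in step 1 can be sketched as follows. This is an illustration of the stated rule only, assuming each event is available as a dictionary keyed by Snowplow field name; the function name derive_event_name and the sample events are hypothetical.

```python
from typing import Optional


def derive_event_name(event: dict, structured_field: Optional[str]) -> Optional[str]:
    """Pick the event name: selected field, then event_name, then event."""
    candidates = []
    if structured_field is not None:  # None models the "None" option in the UI
        candidates.append(structured_field)
    candidates += ["event_name", "event"]
    for field in candidates:
        value = event.get(field)
        if value is not None:  # only null values trigger the fallback
            return value
    return None


# Structured event: the selected se_action field wins.
print(derive_event_name(
    {"se_action": "add_to_cart", "event_name": "event", "event": "struct"},
    "se_action",
))  # add_to_cart

# se_action is null, so event_name is used.
print(derive_event_name(
    {"se_action": None, "event_name": "page_view", "event": "page_view"},
    "se_action",
))  # page_view

# "None" selected and no event_name present: falls back to event.
print(derive_event_name({"event": "page_ping"}, None))  # page_ping
```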

User Identification (Aliasing)

mceclip1.png

For more information on User Identification (Aliasing), please refer to this article.

*Note: If aliasing is not preferred, set the Authenticated ID Type to None and press Next

  1. Select the Type for the Unauthenticated ID
    1. Atomic - This will allow you to choose between the domain_userid and network_userid fields that are part of the standard Snowplow event structure.
      We typically recommend domain_userid since this uses a first-party cookie. Click here for more information.
    2. Context - If the unauthenticated ID is part of a Snowplow context, choose this option. Enter the values for Vendor, Name, Version, and Field.
    3. Other - If the unauthenticated ID is in neither of the locations above, please specify where we can find it in the data.

  2. Select the Type for Authenticated ID
    1. Atomic - Enter the field name that should be used for known users. Typically, it is the user_id field in the raw enriched event archive data.
    2. Context - If the authenticated ID is part of a Snowplow context, choose this option. Enter the values for Vendor, Name, Version, and Field.
    3. Other - If the authenticated ID is in neither of the locations above, please specify where we can find it in the data.
    4. None - Choose this option to skip aliasing.
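To see how the Vendor, Name, Version, and Field values of the Context option relate to your data, the sketch below looks up one field in a list of Snowplow context entities. It assumes each entity is a self-describing JSON object with a schema key of the form iglu:vendor/name/jsonschema/version and a data object. How Indicative resolves these values internally is not described here; the function name context_field and the com.example vendor, schema, and field names are hypothetical.

```python
from typing import Any, List, Optional


def context_field(
    contexts: List[dict], vendor: str, name: str, version: str, field: str
) -> Optional[Any]:
    """Return the named field from the first context matching the schema."""
    target = f"iglu:{vendor}/{name}/jsonschema/{version}"
    for entity in contexts:
        if entity.get("schema") == target:
            return entity.get("data", {}).get(field)
    return None  # no matching context attached to this event


# Hypothetical contexts attached to one event.
contexts = [
    {
        "schema": "iglu:com.example/session/jsonschema/1-0-0",
        "data": {"session_id": "s-789"},
    },
    {
        "schema": "iglu:com.example/user/jsonschema/1-0-0",
        "data": {"user_id": "u-123", "plan": "pro"},
    },
]

print(context_field(contexts, "com.example", "user", "1-0-0", "user_id"))  # u-123
print(context_field(contexts, "com.example", "user", "2-0-0", "user_id"))  # None
```

In this example you would enter com.example as Vendor, user as Name, 1-0-0 as Version, and user_id as Field.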

Scheduling

mceclip2.png

  1. Select the Schedule Interval to adjust the frequency at which new data is available in Indicative.
  2. Set the Schedule Time for when the data should be extracted from your S3 bucket. It is critical that 100% of the data is available by this time to avoid loading partial data.
  3. Select Next

Waiting for Data
mceclip3.png

Advanced Settings

For additional advanced settings, such as excluding certain events and properties, please refer to this page.
