Snowplow – Integration Overview

Snowplow Data Model

Snowplow Overview

Snowplow is an open-source platform that allows businesses to capture granular, event-level data on user behavior across multiple touchpoints and store it in a single location. The platform is designed to operate at enterprise scale, and Snowplow events can be loaded into most analytics tools.

Because Snowplow collects event-level data that is already structured as events and properties, the data needs virtually no adjustment before being connected to Indicative for analysis.

After the Snowplow Enrich step, Snowplow events are either stored in an AWS S3 bucket or streamed to AWS Kinesis. Indicative can read Snowplow data from either source (both are referred to here as the Unified Log). Users can extend their data model with other data sources at the Data Modeling step (see Advanced Data Modeling) before conducting analysis in Indicative.


Snowplow Canonical Event Model

A Snowplow implementation can track a range of predefined events, structured events, and unstructured events that users model themselves.

By default, a wide range of common properties are logged with any implementation of Snowplow. In addition, customers can define both structured contexts and more flexible custom contexts.

Deriving Indicative Events and Properties

Overview

Because Snowplow inherently uses an event-based model, no transformation is needed to plug Snowplow data into Indicative for analysis. The sections below describe how Indicative events and properties are derived from each type of Snowplow event.

By default, Indicative uses the ‘domain_userid’ field in the Snowplow Unified Log as the unique user identifier for every generated event, and the ‘collector_tstamp’ field in the Unified Log as the event timestamp.
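
As an illustration of this rule, the sketch below assumes each Unified Log event has already been parsed into a Python dict keyed by Snowplow field name, with ‘collector_tstamp’ stored as a string such as ‘2024-03-01 12:34:56.789’. The helper name is hypothetical; it only shows which two fields are picked.

```python
from datetime import datetime
from typing import Any, Dict, Tuple


def identity_and_time(row: Dict[str, Any]) -> Tuple[str, datetime]:
    """Return the (user identifier, timestamp) pair used for an Indicative event.

    Hypothetical helper: assumes `row` is one Unified Log event parsed into a
    dict, with 'collector_tstamp' formatted as 'YYYY-MM-DD HH:MM:SS.fff'.
    """
    user_id = row["domain_userid"]
    # fromisoformat accepts a space separator and a 3-digit fractional second.
    timestamp = datetime.fromisoformat(row["collector_tstamp"])
    return user_id, timestamp


event = {"domain_userid": "d1f2a3", "collector_tstamp": "2024-03-01 12:34:56.789"}
user, ts = identity_and_time(event)
```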

Event and Properties Summary Table

Snowplow Entity | Unique Identifier Used | Timestamp | Properties
--- | --- | --- | ---
Predefined events (page views, page pings, ecommerce transactions, errors) | ‘domain_userid’ | ‘collector_tstamp’ | Common and platform-specific fields, and applicable custom contexts
Structured events | ‘domain_userid’ | ‘collector_tstamp’ | Common and platform-specific fields; ‘se_category’, ‘se_label’, ‘se_property’, ‘se_value’; and applicable custom contexts
Unstructured events | ‘domain_userid’ | ‘collector_tstamp’ | Common and platform-specific fields, and applicable custom contexts

Predefined Events

Snowplow has a set of predefined event types that can be instrumented:

  • Page views
  • Page pings
  • Ecommerce transactions
  • Errors

If these events are instrumented, the integration generates them in Indicative, using the ‘event’ field in the Unified Log as the event name.

Common and Platform-Specific Properties

For all Snowplow events, a range of datetime, user and device fields is recorded along with the event. If instrumented, all of these fields are generated as Indicative properties. Any platform-specific fields are recorded as well, such as page referrer and URL information in a web implementation.
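
A minimal sketch of how populated fields could become properties follows. The field list is a hypothetical, illustrative subset of Snowplow's common and web-specific columns, and treating empty or missing values as "not instrumented" is an assumption made for this example.

```python
from typing import Any, Dict, Iterable

# Hypothetical, illustrative subset of common and web-specific Unified Log
# fields; a real implementation records many more.
PROPERTY_FIELDS = ("os_name", "br_name", "geo_country", "page_url", "page_referrer")


def base_properties(row: Dict[str, Any],
                    fields: Iterable[str] = PROPERTY_FIELDS) -> Dict[str, Any]:
    """Copy each populated field of a Unified Log row into a property map."""
    # Fields that were not instrumented are assumed to arrive empty or missing
    # and are skipped.
    return {f: row[f] for f in fields if row.get(f) not in (None, "")}


row = {"os_name": "Mac OS X", "br_name": "", "page_url": "https://example.com/pricing"}
props = base_properties(row)
```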

Structured Events

For Snowplow custom structured events, Indicative uses the ‘se_action’ field as the event name (or ‘event_name’ if ‘se_action’ is not populated). The ‘se_category’, ‘se_label’, ‘se_property’ and ‘se_value’ fields are added as Indicative properties, in addition to all common and platform-specific properties.
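
The property side of this rule can be sketched as follows, assuming the same dict-per-event representation and a `common` map holding the already-derived common and platform-specific properties. The function name is hypothetical.

```python
from typing import Any, Dict, Mapping

SE_FIELDS = ("se_category", "se_label", "se_property", "se_value")


def structured_properties(row: Mapping[str, Any],
                          common: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge a structured event's se_* fields into its common properties."""
    props = dict(common)  # start from common and platform-specific properties
    for field in SE_FIELDS:
        value = row.get(field)
        if value not in (None, ""):  # skip se_* fields that were not populated
            props[field] = value
    return props


row = {"event": "struct", "se_action": "play", "se_category": "video",
       "se_label": "", "se_value": 3.5}
props = structured_properties(row, {"os_name": "Android"})
```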

Unstructured Events

Snowplow allows customers to model flexible custom unstructured events as needed. Indicative uses the ‘event_name’ field as the event name, defaulting to ‘unstruct’ if ‘event_name’ is not populated.
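
Taken together, the naming rules for predefined, structured and unstructured events can be sketched as a single function. It assumes the Snowplow ‘event’ field holds ‘struct’ for structured events and ‘unstruct’ for unstructured events, and it treats any other value as a predefined event name. The function name is hypothetical.

```python
from typing import Any, Mapping, Optional


def indicative_event_name(row: Mapping[str, Any]) -> Optional[str]:
    """Pick the Indicative event name for one Unified Log row.

    Assumes 'event' is 'struct' for structured events and 'unstruct' for
    unstructured events; any other value is used directly as the name.
    """
    kind = row.get("event")
    if kind == "struct":
        # Structured: 'se_action', falling back to 'event_name'.
        return row.get("se_action") or row.get("event_name")
    if kind == "unstruct":
        # Unstructured: 'event_name', defaulting to 'unstruct'.
        return row.get("event_name") or "unstruct"
    # Predefined events (page views, page pings, ...): the 'event' field itself.
    return kind
```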

Custom Contexts

Snowplow allows customers to define their own context around events, such as extra user properties for a customer (membership information, age, etc.) or extra properties about a product for a purchase event (SKU, tags, product name, etc.). Applicable custom contexts are added as Indicative properties on the events they accompany.
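
One way to turn custom contexts into properties is to flatten each attached entity's fields under a prefix. The sketch assumes the contexts arrive as Snowplow's self-describing JSON envelope (a ‘data’ array of entities, each with an Iglu ‘schema’ URI and a ‘data’ object). The ‘entity.field’ naming convention and the function name are hypothetical, not Indicative's documented behavior.

```python
import json
from typing import Any, Dict


def context_properties(contexts_json: str) -> Dict[str, Any]:
    """Flatten Snowplow custom contexts into prefixed event properties.

    The 'entity.field' naming scheme is a hypothetical convention.
    """
    props: Dict[str, Any] = {}
    if not contexts_json:
        return props
    envelope = json.loads(contexts_json)
    for entity in envelope.get("data", []):
        # Iglu URIs look like 'iglu:vendor/name/format/version'; keep 'name'.
        entity_name = entity["schema"].split("/")[1]
        for key, value in entity.get("data", {}).items():
            props[f"{entity_name}.{key}"] = value
    return props


contexts = json.dumps({
    "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0",
    "data": [{"schema": "iglu:com.acme/membership/jsonschema/1-0-0",
              "data": {"tier": "gold", "age": 34}}],
})
props = context_properties(contexts)
```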

Aliasing (User Identifier Unification)

If the customer platform has both authenticated and unauthenticated sessions, Indicative automatically aliases these sessions by tying the ‘user_id’ field to the associated ‘domain_userid’, so that each user has a shared history for analysis. For further reading on Indicative's aliasing process, see the Aliasing Documentation.
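
Indicative's exact aliasing algorithm is described in its Aliasing Documentation; the sketch below is only a simplified illustration of the idea. It assumes each row carries ‘domain_userid’ and a possibly empty ‘user_id’, and maps every anonymous ID to the first known ‘user_id’ seen with it. Function names are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional


def build_alias_map(rows: Iterable[Mapping[str, Optional[str]]]) -> Dict[str, str]:
    """Map each 'domain_userid' to the first non-empty 'user_id' seen with it."""
    alias: Dict[str, str] = {}
    for row in rows:
        anonymous, known = row.get("domain_userid"), row.get("user_id")
        if anonymous and known:
            alias.setdefault(anonymous, known)
    return alias


def resolve_user(row: Mapping[str, Optional[str]],
                 alias: Mapping[str, str]) -> Optional[str]:
    """Prefer the row's own user_id, then an alias, then the domain_userid."""
    anonymous = row.get("domain_userid")
    return row.get("user_id") or alias.get(anonymous or "", anonymous)


rows = [
    {"domain_userid": "a1", "user_id": ""},       # before login
    {"domain_userid": "a1", "user_id": "alice"},  # after login
    {"domain_userid": "b2", "user_id": ""},       # never logged in
]
alias = build_alias_map(rows)
```

With this map, the pre-login event from ‘a1’ resolves to ‘alice’, while ‘b2’ stays anonymous.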

Advanced Data Modeling

Overview

In addition to generating events from predefined events and from custom structured and unstructured Snowplow events, the Snowplow integration with Indicative allows customers to apply another layer of data modeling using all fields available in the Snowplow Unified Log and custom data tables provided by the customer.

For example, customers can generate a ‘Splash Page View’ event whenever a Snowplow page view event occurs and its ‘page_url’ contains ‘landing’. Customers can combine any subset of Snowplow events and properties with logical operators to generate new Indicative events and properties.
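
The ‘Splash Page View’ example can be sketched as a predicate over a single event. It assumes page views carry ‘page_view’ in the ‘event’ field and that derived events are represented as plain dicts; the output shape and the function name are hypothetical, not Indicative's rule syntax.

```python
from typing import Any, Dict, Mapping, Optional


def splash_page_view(row: Mapping[str, Any]) -> Optional[Dict[str, Any]]:
    """Derive a 'Splash Page View' from a page view whose URL contains 'landing'."""
    is_page_view = row.get("event") == "page_view"
    is_landing = "landing" in (row.get("page_url") or "")
    if is_page_view and is_landing:
        # Keep the same user identifier and timestamp as the source event.
        return {
            "event_name": "Splash Page View",
            "user": row["domain_userid"],
            "timestamp": row["collector_tstamp"],
        }
    return None


row = {"event": "page_view", "page_url": "https://example.com/landing/spring",
       "domain_userid": "d1", "collector_tstamp": "2024-03-01 12:00:00.000"}
derived = splash_page_view(row)
```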

If customers have additional data tables to join against Snowplow data, new events and properties can also be modeled and generated at this step.
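
Joining a customer table can be pictured as a left join on a shared key. The sketch assumes a hypothetical CRM table held in memory as a dict keyed by ‘user_id’ and prefixes joined columns with ‘crm_’ to avoid collisions with Snowplow field names; none of these names come from Indicative.

```python
from typing import Any, Dict, Iterable, List, Mapping


def join_customer_table(rows: Iterable[Mapping[str, Any]],
                        table: Mapping[str, Mapping[str, Any]],
                        key: str = "user_id",
                        prefix: str = "crm_") -> List[Dict[str, Any]]:
    """Left-join customer table columns onto events by a shared key."""
    joined = []
    for row in rows:
        merged = dict(row)
        # Events with no matching table row are kept unchanged (left join).
        extra = table.get(row.get(key) or "", {})
        merged.update({prefix + column: value for column, value in extra.items()})
        joined.append(merged)
    return joined


events = [{"user_id": "alice", "event": "page_view"},
          {"user_id": "", "event": "page_view"}]
crm = {"alice": {"plan": "pro", "region": "EMEA"}}
enriched = join_customer_table(events, crm)
```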

Generation Logic for New Events and Properties

If a customer needs events and properties that the default Snowplow integration does not capture, the customer can provide flexible, logic-based rules to generate them from Snowplow data or custom data tables. For details, see the IndicativeIO documentation.
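
A logic-based rule could be expressed declaratively as a list of conditions combined with AND. The rule format, operator names and the ‘Paid Landing Visit’ event below are hypothetical and only illustrate the idea; ‘mkt_medium’ and ‘page_url’ are assumed to be Unified Log fields. The actual rule syntax is defined in the IndicativeIO documentation.

```python
from typing import Any, Callable, Dict, List, Mapping, Sequence

# Hypothetical operators usable in a condition (field, operator, value).
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "equals": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: isinstance(actual, str) and expected in actual,
}

# Hypothetical rule: every condition in "all" must hold (logical AND).
RULES: List[Dict[str, Any]] = [
    {"new_event": "Paid Landing Visit",
     "all": [("event", "equals", "page_view"),
             ("mkt_medium", "equals", "cpc"),
             ("page_url", "contains", "landing")]},
]


def derived_events(row: Mapping[str, Any],
                   rules: Sequence[Mapping[str, Any]] = RULES) -> List[str]:
    """Return the names of all rules whose conditions match this event."""
    return [
        rule["new_event"]
        for rule in rules
        if all(OPERATORS[op](row.get(field), value) for field, op, value in rule["all"])
    ]


row = {"event": "page_view", "mkt_medium": "cpc",
       "page_url": "https://example.com/landing/spring"}
```

Keeping rules as data rather than code means new events can be added without changing the evaluation function.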
