Snowplow – Real-Time Kinesis Setup

To connect your real-time Snowplow data to Indicative, follow the instructions below:

Create Your Indicative Account

1. If you do not have an Indicative account, go to http://app.indicative.com/login/#/register to create an account.

sp1.png

Create an IAM Role for the Lambda

Your AWS Lambda needs to have an Execution Role that allows it to use the Kinesis stream and CloudWatch. (For more information on setting up IAM Roles, please see the official AWS tutorial.) If you prefer the AWS CLI, a sketch of the equivalent commands follows step 4 below.

1. Go to IAM Management in the Console, choose Roles from the sidebar, then click Create role.

2. As shown in the screenshot below, for the type of trusted entity select AWS Service and for the service that will use this role choose Lambda.

sp2.png


3. Now you need to choose a permission policy for the role. The Lambda needs read access to Kinesis and write access to CloudWatch Logs; for that, choose the AWSLambdaKinesisExecutionRole managed policy.

sp3.png


4. On the next screen provide a name for the newly created role, then click Create role to finish the process.
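
If you would rather use the AWS CLI, the same role can be created with a few commands. This is a minimal sketch: the role name snowplow-indicative-relay-role is only an example, and it assumes the AWS CLI is installed and configured with sufficient IAM permissions.

# Save a trust policy that allows Lambda to assume the role
cat > lambda-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "lambda.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

# Create the role (the name is an example; pick your own)
aws iam create-role \
  --role-name snowplow-indicative-relay-role \
  --assume-role-policy-document file://lambda-trust-policy.json

# Attach the managed policy granting Kinesis read and CloudWatch Logs write access
aws iam attach-role-policy \
  --role-name snowplow-indicative-relay-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole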

Create the Lambda Function

The Lambda function can be created either directly through the AWS Console or through other tools like the AWS CLI. For this integration, the recommended memory setting is 256 MB, and because the JVM has to cold start when the function is called for the first time on a new instance, you should set a generous timeout; 90 seconds should be safe.

As with the IAM Role, we will be using the AWS Console to get our Lambda function up and running.

1. On the Console, navigate to the Lambda section and click Create a function (the runtime should be Java 8). In the Role dropdown, pick Choose an existing role; then in the dropdown below, choose the name of the role you created in the previous step. Click Create function.

sp4.png

2. The Lambda has been created, although it does not do anything yet. We need to provide the code and configure the function:

a. Take a look at the Function code box. In the Handler textbox paste: com.snowplowanalytics.indicative.LambdaHandler::recordHandler


b. From the Code entry type dropdown, pick Upload a file from Amazon S3. A textbox labeled S3 Link URL will appear. We host the relay code through Snowplow hosted assets. You will need to choose the S3 bucket in the same region as your AWS Lambda function: for example, if your Lambda is in the us-east-1 region, paste the following URL into the textbox: s3://snowplow-hosted-assets-us-east-1/relays/indicative/indicative-relay-0.1.0.jar. See the Upload the Code section below to pick the right bucket name for your region.

sp5.png

3. Below Function code settings you will find a section called Environment variables. In the first row, first column (the key), type INDICATIVE_API_KEY. In the second column (the value), paste your Indicative API Key (see Configure Your Indicative API Key below for where to find it).

sp6.png

4. Scroll down a bit and take a look at the Basic settings box. There you can set memory and timeout limits for the Lambda.

As mentioned earlier, we recommend setting 256 MB of memory or higher (on AWS Lambda, CPU performance scales linearly with the amount of memory) and a generous timeout of 90 seconds. (A CLI equivalent is sketched after the screenshot below.)

sp7.png
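
If you would rather set these limits from the command line, a minimal sketch (assuming the example function name snowplow-indicative-relay) would be:

# Raise memory and timeout on an existing function
aws lambda update-function-configuration \
  --function-name snowplow-indicative-relay \
  --memory-size 256 \
  --timeout 90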


5. Now let's add our enriched Kinesis stream as an event source for the function. From the list of triggers in the Designer configuration up top, choose Kinesis.

sp8.png


Take a look at the Configure triggers section which just appeared below. Choose the Kinesis stream that contains your Snowplow enriched events. Set the batch size to your liking; 100 is a reasonable setting. Note that this is a maximum batch size: the function can be triggered with fewer records. For the starting position, we recommend Trim horizon, which starts processing from the oldest available record in the stream. Make sure Enable trigger is selected, then click the Add button to finish the trigger configuration.

sp9.png

6. Save the changes by clicking the Save button in the top-right part of the page.

Upload the Code

We host a jar file on S3 through Snowplow hosted assets. For example:

s3://snowplow-hosted-assets/relays/indicative/indicative-relay-0.1.0.jar

You will need to use an S3 bucket in the same region as your Lambda. The above URL is for the eu-west-1 region. Buckets in other regions have the region name appended to the bucket name, like this:

s3://snowplow-hosted-assets-us-east-1/relays/indicative/indicative-relay-0.1.0.jar
s3://snowplow-hosted-assets-eu-central-1/relays/indicative/indicative-relay-0.1.0.jar

In the Handler textbox paste com.snowplowanalytics.indicative.LambdaHandler::recordHandler
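
For CLI users, the whole function can be created in one call. The sketch below is an example only: the function name is hypothetical, the role ARN and account ID are placeholders, it uses the us-east-1 hosted-assets bucket, and it assumes the hosted-assets jar is readable from your account.

# Create the relay function in one step (placeholders in angle brackets)
aws lambda create-function \
  --function-name snowplow-indicative-relay \
  --runtime java8 \
  --role arn:aws:iam::<YOUR_ACCOUNT_ID>:role/snowplow-indicative-relay-role \
  --handler com.snowplowanalytics.indicative.LambdaHandler::recordHandler \
  --code S3Bucket=snowplow-hosted-assets-us-east-1,S3Key=relays/indicative/indicative-relay-0.1.0.jar \
  --memory-size 256 \
  --timeout 90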

Configure Your Indicative API Key

1. Copy your Indicative API Key

a. If you are a new Indicative user, go to https://app.indicative.com/#/onboarding/snowplow

sp10.png

b. If you want to send data to an existing project, go to https://app.indicative.com/#/account/projects

sp11.png

2. Paste the Indicative API key

You will need to provide the Indicative API Key as the INDICATIVE_API_KEY environment variable, as set in step 3 of Create the Lambda Function above.
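
From the CLI, the key can be set (or updated) on an existing function like this; the function name is the example used above and the key value is a placeholder:

# Set the Indicative API Key as an environment variable
aws lambda update-function-configuration \
  --function-name snowplow-indicative-relay \
  --environment "Variables={INDICATIVE_API_KEY=<YOUR_API_KEY>}"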

Add Kinesis Stream as an Event Source

As a final step, add your Snowplow enriched Kinesis stream as an event source for the Lambda function. You can follow the official AWS tutorial if you are using the AWS CLI, or do it directly from the AWS Console by choosing Kinesis from the list of triggers, as described in step 5 above.
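
For the CLI route, a minimal sketch of the event source mapping (with placeholder region, account ID, and stream name) looks like this:

# Wire the enriched Kinesis stream to the function
aws lambda create-event-source-mapping \
  --function-name snowplow-indicative-relay \
  --event-source-arn arn:aws:kinesis:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:stream/<YOUR_ENRICHED_STREAM> \
  --batch-size 100 \
  --starting-position TRIM_HORIZON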

Validate Your Data

Go to your Indicative project to verify that you are receiving data. You can also use the debug console to troubleshoot the relay in real time.

sp12.png
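
If events are not showing up, the relay's CloudWatch logs are the first place to look on the AWS side. A quick way to follow them, assuming AWS CLI v2 and the example function name used above:

# Stream the function's CloudWatch logs (log group follows the function name)
aws logs tail /aws/lambda/snowplow-indicative-relay --follow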

 
