Prerequisites
- You have connected Segment to MadKudu.
- You are not on a Business plan with Segment
- You have access to an AWS account to create / manage access to an S3 bucket
To export data from a non-Segment Redshift warehouse, please follow these instructions.
How to transfer the data
There are two options to give us access to your historical Segment data:
- Option 1: You are using the Segment S3 destination
- Option 2: You are using the Segment Redshift warehouse
(If you use another type of Segment warehouse, such as Postgres or BigQuery, you can still integrate with us. Please follow the instructions to export data to S3.)
If you did not enable any of those destinations before integrating with MadKudu, we might not be able to get your historical data immediately. Make sure to connect Segment to MadKudu so we can start accumulating behavioral data.
Option 1: Segment S3 destination
If you've been sending Segment data to an S3 destination in your AWS account, the easiest solution is to grant us access to that bucket. Please follow the instructions here on how to share access to an S3 bucket.
Option 2: Segment Redshift warehouse
If you're using a Redshift warehouse but not an S3 destination, you will first need to "unload" the data from Redshift into an S3 bucket, then grant MadKudu access to that bucket with these instructions.
The idea is to use an UNLOAD statement in Redshift to transfer the data to S3, from where we can read it.
If you're using your own bucket, use the following UNLOAD, where the credentials are the ones you maintain internally.
UNLOAD ('SELECT ... FROM [your_table]')
TO 's3://your_bucket/your_folder/project/track.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
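If you script the export, the statement above can be assembled programmatically before being run through your usual Redshift client. A minimal Python sketch (the query, bucket path, and credential values below are placeholders you'd substitute with your own):

```python
def build_unload(query, s3_path, access_key, secret_key):
    """Assemble a Redshift UNLOAD statement for the given query and S3 target.

    Redshift expects the SELECT enclosed in single quotes, so any single
    quotes inside the query itself must be doubled.
    """
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_path}'\n"
        f"CREDENTIALS 'aws_access_key_id={access_key};"
        f"aws_secret_access_key={secret_key}'"
    )

# Placeholder values -- use your own table, bucket, and credentials.
sql = build_unload(
    "SELECT * FROM my_source.tracks",
    "s3://your_bucket/your_folder/project/track.csv",
    "AKIA...",
    "secret",
)
print(sql)
```

You would then execute the returned string against Redshift with whichever client you already use internally.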
Object: Track
For unloading Track data, there are two options:
- If the event properties are not relevant to share with MadKudu, just UNLOAD the tracks table from Redshift to S3:
UNLOAD ('SELECT id, anonymous_id, user_id, event, event_text, sent_at FROM [source].tracks')
TO 's3://madkudu-data-in-[your_org_id]/[source]/track.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;
- If the event properties are relevant (path, url, ...), you'll need to run one UNLOAD per event table:
UNLOAD ('SELECT id, anonymous_id, user_id, event, event_text, sent_at, [properties] FROM [source].[event]')
TO 's3://madkudu-data-in-[your_org_id]/[source]/[event].csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;
where [properties] is the list of properties you want to send us, separated by commas.
For example: SELECT id, ..., product_name, referrer FROM [source].added_to_cart
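When several event tables carry relevant properties, writing each UNLOAD by hand gets tedious. A sketch that generates one statement per event table (the event names, properties, org id, and credentials below are hypothetical examples, not values from your account):

```python
# Event tables to export, with the properties relevant for each.
# These names are illustrative -- use your own schema's event tables.
EVENTS = {
    "added_to_cart": ["product_name", "referrer"],
    "signed_up": ["plan"],
}

BASE_COLUMNS = "id, anonymous_id, user_id, event, event_text, sent_at"
OPTIONS = (
    "HEADER DELIMITER '~' ADDQUOTES ESCAPE MANIFEST GZIP "
    "ALLOWOVERWRITE PARALLEL ON"
)

def unload_statements(source, org_id, access_key, secret_key):
    """Yield one UNLOAD statement per event table in EVENTS."""
    for event, props in EVENTS.items():
        columns = ", ".join([BASE_COLUMNS] + props)
        yield (
            f"UNLOAD ('SELECT {columns} FROM {source}.{event}')\n"
            f"TO 's3://madkudu-data-in-{org_id}/{source}/{event}.csv'\n"
            f"CREDENTIALS 'aws_access_key_id={access_key};"
            f"aws_secret_access_key={secret_key}'\n"
            f"{OPTIONS};"
        )

statements = list(unload_statements("my_source", "1234", "AKIA...", "secret"))
```

Each generated statement mirrors the template above, so you can review them before running them against Redshift.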
Object: Identify
Use the following UNLOAD statement to send Identify data from Redshift to S3, where [traits] is the list of traits you want to send us, separated by commas.
UNLOAD ('SELECT id, anonymous_id, user_id, sent_at, [traits] FROM [source].identifies')
TO 's3://madkudu-data-in-[your_org_id]/[source]/identify.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;
Object: Group
Use the following UNLOAD statement to send Group data from Redshift to S3, where [traits] is the list of traits you want to send us, separated by commas.
UNLOAD ('SELECT id, anonymous_id, user_id, group_id, sent_at, [traits] FROM [source].groups')
TO 's3://madkudu-data-in-[your_org_id]/[source]/group.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;
Object: Page [optional]
Pages can be quite voluminous. Only send them to us if your page tracking contains important information that is not included in track calls. For [properties], we usually recommend at least title and url.
UNLOAD ('SELECT id, anonymous_id, user_id, sent_at, [properties] FROM [source].pages')
TO 's3://madkudu-data-in-[your_org_id]/[source]/page.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;