Objective of this article
This article describes how you can send us a one-off drop of historical data from Segment, if you're not on the Segment's Business Plan.
If you are on the Business plan or have not connected Segment yet, please refer to the documentation of our Segment integration.
To export data from a non-Segment Redshift warehouse, please follow these instructions.
How to transfer the data
There are two options to give us access to your historical Segment data:
- Option 1: Your are using the Segment S3 destination
- Option 2: You are using the Segment Redshift warehouse
(If you use another type of Segment Warehouse (Postgres, BigQuery), you can still integrate with us. Please follow the instructions to export data to S3)
If you did not enable any of those destinations before integrating with MadKudu, we might not be able to get your historical data immediately. Make sure to connect Segment to MadKudu so we can start accumulating behavioral data.
Option 1: Segment S3 destination
If you've been sending Segment data to a S3 destination in your AWS account, the easiest solution is to grant us access to that bucket.
Option 2: Segment Redshift warehouse
If you're using a warehouse but not an S3 destination, you will first need to "unload" the data from Redshift into an S3 bucket.
For the bucket, there are two options:
- You host the bucket yourself, in which case, follow these instructions to grant us access.
- We create a bucket for you, in which case the credentials will be available at app.madkudu.com > Integrations > S3
Regardless of the option, you choose, the idea is then to use an UNLOAD
statement in Redshift to transfer the data to S3, from where we can reload it.
If you're using your bucket:
UNLOAD ("SELECT ... FROM [your_table]")
TO 's3://your_bucket/your_folder/project/track.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
(where the credentials are the ones you maintain internally)
If you're using our bucket:
UNLOAD ("SELECT ... FROM [your_table]")
TO 's3://madkudu-data-in-[your_org_id]/project/track.csv'
CREDENTIALS 'aws_access_key_id=[our_access_key];aws_secret_access_key=[our_secret_key]'
Tracks
For unloading track data, there are also two options:
- The event properties, are not important, in which case you can just UNLOAD the track table
UNLOAD ("SELECT id, anonymous_id, user_id, event, event_text, sent_at FROM [source].tracks")
TO 's3://madkudu-data-in-[your_org_id]/[source]/track.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;
2. The event properties are important, in which case, you need to do an UNLOAD per event table:
UNLOAD ("SELECT id, anonymous_id, user_id, event, event_text, sent_at, [properties] FROM [source].[event]")
TO 's3://madkudu-data-in-[your_org_id]/[source]/[event].csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;
where [properties] is the list of properties you want to send us, separated by commas.
For example: SELECT id, ..., product_name, referrer FROM <source>.added_to_cart
Identifies
UNLOAD ("SELECT id, anonymous_id, user_id, sent_at, [traits] FROM <source>.identifies")
TO 's3://madkudu-data-in-[your_org_id]/[source]/identify.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;
where [traits] is the list of properties you want to send us, separated by commas.
Groups
UNLOAD ("SELECT id, anonymous_id, user_id, group_id, sent_at, [traits] FROM <source>.groups")
TO 's3://madkudu-data-in-[your_org_id]/[source]/page.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;
where [traits] is the list of properties you want to send us, separated by commas.
Pages [optional]
Pages can be quite voluminous. Only send them to us if your page tracking contains important information that is not included in track calls.
UNLOAD ("SELECT id, anonymous_id, user_id, group_id, sent_at, [properties] FROM <source>.<event>")
TO 's3://madkudu-data-in-[your_org_id]/[source]/identify.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;
For [properties], we usually recommend at least title
and url
.
Comments
0 comments
Please sign in to leave a comment.