Would you like to use product usage or CRM data from a source MadKudu does not currently integrate with? No worries: we can set up a transfer through Amazon S3, either from Redshift or from flat files (JSON or CSV). MadKudu's preferred approach is to pull data from your S3 bucket, where the data is formatted as described below and to which MadKudu has access through an IAM role.
- Please refer to this documentation to give MadKudu access to your bucket.
- For transfer from Redshift, please refer to this documentation.
Pre-requisites
- You have access to an AWS account to create/manage an S3 bucket
How to format your data
MadKudu works with 4 types of objects:
- Event: what are users doing?
- Contact: who is the user?
- Account: which accounts do my users belong to?
- Opportunity: what deals do I open? (when you are not using Salesforce Opps or HubSpot Deals)
If you send events, please note that MadKudu needs to receive individual events, not aggregations.
Meaning MadKudu needs to receive this:
Event key | Event text | Event timestamps | Email |
100 | Email click | 1/6/2023 0:00:00 | john@madkudu.com |
101 | Email click | 1/6/2023 0:00:00 | john@madkudu.com |
103 | Email click | 1/8/2023 0:00:00 | john@madkudu.com |
104 | Email click | 1/8/2023 0:00:00 | john@madkudu.com |
105 | Email click | 1/8/2023 0:00:00 | john@madkudu.com |
Instead of this:
Event key | Event text | Event timestamps | Email | Number of email clicks |
100 | Number of email clicks | 1/6/2023 0:00:00 | john@madkudu.com | 2 |
101 | Number of email clicks | 1/8/2023 0:00:00 | john@madkudu.com | 3 |
Depending on what type of segmentation you will want to build (based on behavioral activity, based on lead/account attributes, or based on both), you may send us data from 1 or more objects.
- If you track behavioral activity in a system that does not integrate with MadKudu and would like to build a behavioral model, you would need to send at least Events. If you plan on having account scoring, please send us your Accounts as well.
- If you have a homemade CRM or a CRM that does not integrate with MadKudu, you would need to send at least Contacts and Opportunities for MadKudu to understand who your contacts are and who converts. If you plan on having account scoring, please send us your Accounts as well.
You want to send...
Event
To send behavioral data (product usage, web activity, marketing activity...), create a file named event with the following attributes (with headers included):
Attribute | Required | Format | Example | Description |
event_key | required | String | "abc123" | A unique key identifying the event. If you do not have one, we suggest creating a combination of event_text + contact_key + event_timestamp |
event_text | required | String | “signup”, “login”, “invited a friend” | The action taken by the user. |
event_timestamp | required | Unix time | 1436172703 | The time at which the event happened |
contact_key | required | String | "abc123" or "paul@madkudu.com" | The unique identifier of the user who performed the action. This needs to be the same as the contact_key field in the contact file. |
event_* | optional | String or Numeric | | Properties describing the event (e.g. event_url for the URL of the visited page, event_form_title for the title of the submitted form...) |
Example in JSON format
{"event_key": "abcd1234", "event_text":"signed up", "event_timestamp":1234567890, "contact_key":"abc1234"}
{"event_key": "abcd2345", "event_text":"visit web page", "event_timestamp":1234567890, "contact_key":"paul@madkudu.com", "event_url":"http://www.domain.com/pricing"}
If you plan on sending event data from 2 or more sources, all event streams should be in the same file.
If you plan to have MadKudu pull your S3 data on a recurring basis, all custom properties columns (event_*) must be communicated prior to setting up the recurring pull.
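The table above suggests combining event_text + contact_key + event_timestamp when no natural event key exists. The following is a minimal sketch of that idea, not MadKudu's own tooling: the helper names make_event_key and to_event_line are hypothetical, and hashing the combined string is an assumption made here only to keep keys short and free of special characters.

```python
import hashlib
import json


def make_event_key(event_text: str, contact_key: str, event_timestamp: int) -> str:
    """Derive a deterministic key from the three fields the table suggests combining."""
    raw = f"{event_text}|{contact_key}|{event_timestamp}"
    # Hashing is an assumption of this sketch; any stable, unique string would do.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def to_event_line(event_text: str, contact_key: str, event_timestamp: int, **properties) -> str:
    """Build one JSON record; extra keyword arguments become event_* properties."""
    record = {
        "event_key": make_event_key(event_text, contact_key, event_timestamp),
        "event_text": event_text,
        "event_timestamp": event_timestamp,
        "contact_key": contact_key,
    }
    for name, value in properties.items():
        record[f"event_{name}"] = value
    return json.dumps(record)


line = to_event_line(
    "visit web page",
    "paul@madkudu.com",
    1234567890,
    url="http://www.domain.com/pricing",
)
print(line)
```

Because the key is derived only from the event's own fields, regenerating the same event always yields the same event_key.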
Contact
To send enrichment or CRM data on the contact level (CRM, demographic, firmographic traits ...), create a file named contact with the following attributes (with headers included):
Attribute | Required | Format | Example | Description |
contact_key | required | String | “abc123”, “paul@madkudu.com” | Unique identifier for the user in your database. It can be the email. It must match the contact_key used in the Event file, if you also send one. |
email | required | String | "paul@madkudu.com" | Email of the user. Pass it even if the contact_key already contains the email |
created_date | required | Unix time | 1436172703 | Creation date of the contact (required if MadKudu does not have an integration with your CRM) |
contact_* | optional | String or Numeric | | Enrichment traits you know about the user (examples: contact_title, contact_country, contact_subscription_plan...) |
Example in JSON format
{"contact_key":"abc1234", "email":"paul@madkudu.com"}
{"contact_key":"432535", "email":"paul@madkudu.com", "contact_title":"cto"}
If you plan to have MadKudu pull your S3 data on a recurring basis, all custom enrichment columns (contact_*) must be communicated prior to setting up the recurring pull.
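Date fields such as created_date are expected as Unix time. If your source system stores ISO 8601 strings instead, a conversion along these lines may help. This is a sketch only: the helper name to_unix_time is hypothetical, and treating timestamps without a timezone as UTC is an assumption you should check against your own data.

```python
import json
from datetime import datetime, timezone


def to_unix_time(iso_string: str) -> int:
    """Convert an ISO 8601 date-time string to Unix time in whole seconds."""
    parsed = datetime.fromisoformat(iso_string)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


contact = {
    "contact_key": "abc1234",
    "email": "paul@madkudu.com",
    "created_date": to_unix_time("2015-07-06T08:51:43"),
    "contact_title": "cto",
}
print(json.dumps(contact))
```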
Account
To send enrichment or CRM data on the account level (CRM, demographic, firmographic traits ...), create a file named account with the following attributes (with headers included):
Attribute | Required | Format | Example | Description |
account_key | required | String | "def456", “madkudu.com” | A unique identifier for the account the user belongs to. It can be the domain of the account |
domain | required | String | "madkudu.com" | Web domain of the account |
created_date | required | Unix time | 1436172703 | Creation date of the account (required if MadKudu does not have an integration with your CRM) |
conversion_date | optional | Unix time | 1436172703 | Date the account converted into a paying customer. |
ARR | optional (highly recommended) | Numeric | $20,000 | Annual Recurring Revenue of the account |
account_* | optional | String or Numeric | | Enrichment attributes you know about the account (examples: account_industry, account_ARR, account_subscription_plan...) |
Our system supports one account per contact. If there are several, we’ll use the latest one. If you have a use case for having a user belonging to several accounts, we’d love to hear about it.
Please submit a support ticket to our support team.
Example in JSON format
{"contact_key":"abc1234", "account_key":"madkudu.com", "domain": "madkudu.com", "name": "madkudu"}
{"contact_key":"abc4983", "account_key":"ibm.com", "domain": "ibm.com", "account_ARR":"3000"}
If you plan to have MadKudu pull your S3 data on a recurring basis, all custom enrichment columns (account_*) must be communicated prior to setting up the recurring pull.
Customer Fit training data
If you are not able to send your Opportunities, MadKudu still needs to understand who converts from your historical data to configure a customer fit model. You can send us a single flat file extracted from your CRM that tells us which of your leads have converted, with the following fields (with headers included):
Attribute | Required | Format | Example | Description |
email | required | String | “paul@madkudu.com” | The unique identifier of the user who performed the action |
target | required | Boolean | 1 | Indicates whether the lead converted according to your conversion definition (Opp created, Opp stage 2…) |
amount | required | Numeric | 2300 | If target = 1, amount of the opportunity converted (as defined by the conversion definition). 0 otherwise. |
target_closed_won | required | Boolean | 1 | Indicates whether the lead converted into a Closed Won opp (paying customer) |
amount_closed_won | required | Numeric | 2300 | Amount generated from the first Closed Won opp. |
created_date | optional | Unix time | | Created date of the email |
properties | optional | String or Numeric | | Numeric self-input information at time of lead creation or any other field that you’ve augmented your leads with and want MadKudu to evaluate (example: team_size, industry...) |
Example in JSON format
{"email":"elon@tesla.com", "target":"1", "amount": "2499", "target_closed_won":"0", "amount_closed_won":"0", "created_date": "1234567890" }
{"email":"paul@madkudu.com", "target":"0", "amount": "0", "target_closed_won":"0", "amount_closed_won":"0", "created_date": "1234567810", "team_size":"5"}
If you plan to have MadKudu pull your S3 data on a recurring basis, all custom properties columns (properties) must be communicated prior to setting up the recurring pull.
Points of attention
All files should have a header. The curly bracket { } and single quote ' characters are not supported inside field values. Make sure to remove them from your data before creating your files.
How to format the files
MadKudu currently supports two file formats:
- Newline-delimited JSON (preferred)
- CSV
Newline-delimited JSON
Our preferred format for upload is newline-delimited JSON, which is more standardized and less error-prone than CSV.
In this format, the different records are separated by the newline \n character. Each line is a valid JSON object:
{"event_text":"signed up", "event_timestamp":1234567890, "contact_key":"abc1234"}
{"event_text":"added a friend", "event_timestamp":1234567890, "contact_key":"paul@madkudu.com", "some_other_event_field":"some_value"}
{"contact_key":"abc1234", "email":"paul@madkudu.com"}
{"contact_key":"432535", "email":"paul@madkudu.com", "some_other_contact_field":"some value"}
Escape any double quote " in your data with a \ (e.g. replace " with \").
Incorrect
{"event_text":"signed up", "event_timestamp":1234567890, "contact_key":"abc1234", "key": "val"ue"}
Correct
{"event_text":"signed up", "event_timestamp":1234567890, "contact_key":"abc1234", "key": "val\"ue"}
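If you generate the file programmatically, a JSON library will normally handle this escaping for you. As a minimal sketch using Python's standard json module (the helper name to_ndjson and the sample records are illustrative only):

```python
import json


def to_ndjson(records) -> str:
    """Serialize an iterable of dicts as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


records = [
    # The double quote inside the value is escaped as \" by json.dumps.
    {"event_text": "signed up", "event_timestamp": 1234567890,
     "contact_key": "abc1234", "key": 'val"ue'},
    # A line break inside a value is escaped too, so the record stays on one line.
    {"contact_key": "432535", "email": "paul@madkudu.com",
     "note": "line one\nline two"},
]

payload = to_ndjson(records)
print(payload, end="")
```

Serializing each record separately, rather than dumping the whole list at once, is what keeps the output as one JSON object per line instead of a single JSON array.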
CSV
We also support the .csv format, with the recommended format:
- column names (header) in the first line
- separator: ~ → separate the values with ~ (ex: abc~def~). Please do not use , or - as it easily creates parsing issues
- delimiter: " → this adds quotes around the values (abc -> "abc")
- line separator: line break \n
Points of attention
- Delimit your values with " "
- Remove all line break characters (for example \n) from your fields.
- Make sure the number of fields is the same for each line.
- Escape your " characters by adding a second " character in front of it (see here for details)
Incorrect
Values are not delimited by "
abc,cde,efg
Correct
"abc","cde","efg"
Incorrect
The " before the e is not escaped. A second " should be added in front of it.
"abc","cd"e","efg"
Correct
"abc","cd""e","efg"
Data validation
Newline-delimited JSON and CSV files are relatively easy to corrupt (for example with " or , characters in the data).
We will validate the data on our side and warn you of any corruption issues, but it helps a lot if you follow the format requested above.
Compression
Please note that the maximum size for a single JSON object is 4 MB.
To speed up the data upload, we highly recommend that you compress your files with GZIP before uploading them to S3.
You can name your files whatever you want (we recommend event, contact and account). However, please make sure to add the correct extension depending on your file format:
- .json.gz for compressed JSON (recommended)
- .json for uncompressed JSON
- .csv.gz for compressed CSV
- .csv for uncompressed CSV
Whichever format you choose, if you plan on having MadKudu pull your S3 data on a recurring basis, the file format has to remain the same.
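As an illustration, a compressed newline-delimited JSON file can be produced with Python's standard gzip module. This is a sketch under assumptions: the helper name write_json_gz is hypothetical, the 4 MB limit is interpreted here as 4 * 1024 * 1024 bytes per serialized object, and the file is written to a temporary directory only for the sake of the example.

```python
import gzip
import json
import os
import tempfile

# Assumption: "4 MB" is read as 4 * 1024 * 1024 bytes per JSON object.
MAX_OBJECT_BYTES = 4 * 1024 * 1024


def write_json_gz(records, path: str) -> int:
    """Write records as gzip-compressed newline-delimited JSON; return the record count."""
    count = 0
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            line = json.dumps(record)
            if len(line.encode("utf-8")) > MAX_OBJECT_BYTES:
                raise ValueError("a single JSON object exceeds the 4 MB limit")
            handle.write(line + "\n")
            count += 1
    return count


path = os.path.join(tempfile.mkdtemp(), "event.json.gz")
written = write_json_gz(
    [
        {"event_text": "signed up", "event_timestamp": 1234567890, "contact_key": "abc1234"},
        {"event_text": "added a friend", "event_timestamp": 1234567890, "contact_key": "paul@madkudu.com"},
    ],
    path,
)
print(written, "records written to", path)
```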
How to store your file
We recommend that the files you want to share with MadKudu are in a dedicated folder and that you create an IAM policy and role for MadKudu to access these files.
You will also need to set up a recurring push of your data to this folder for MadKudu to score fresh data. This is done by creating distinct files, as described below.
File naming
In the S3 bucket, please upload data into separate folders by object and by date, using the pattern
{object}/{year}/{month}/{day}, where the objects are:
- event
- contact
- account
- opportunity
MadKudu pulls files on the date given in the folder name. Files in a folder containing /2020/11/20/ will be pulled on November 20th, 2020.
If you use the S3 API, simply “prefix” your destination file name. For example, uploading to "contact/2020/11/20/11:00:00/name_of_file.csv" will add a file named name_of_file.csv to the contact folder.
Please use this recommended file naming and storing system in the bucket for MadKudu to be able to automatically pull any new file.
s3://bucket_name/object/year/month/day/name_of_file.csv
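The object key for a given file can be assembled from the object name and the date. The sketch below is illustrative only: the helper name build_s3_key is hypothetical, and zero-padding single-digit months and days is an assumption, since the examples here only show two-digit values.

```python
from datetime import date

# Object folders listed in this section.
OBJECTS = {"event", "contact", "account", "opportunity"}


def build_s3_key(obj: str, day: date, file_name: str) -> str:
    """Return the key {object}/{year}/{month}/{day}/{file_name} for one upload."""
    if obj not in OBJECTS:
        raise ValueError(f"unknown object folder: {obj}")
    # Assumption: month and day are zero-padded to two digits.
    return f"{obj}/{day.year:04d}/{day.month:02d}/{day.day:02d}/{file_name}"


key = build_s3_key("contact", date(2020, 11, 20), "name_of_file.csv")
print(key)
```

The resulting string is what you would pass as the destination key to your S3 client or upload tool.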
Compression
To speed up file transfer, you can compress files locally before transferring them to Amazon S3. If you want to compress your files, please use the GZIP compression method and use .gz or .gzip as your file extension (we currently don’t support other methods or other extensions).
Frequency: setting up a recurring push of data to MadKudu
Depending on how your data will be used in the MadKudu platform, we recommend providing fresh data through that bucket:
- every 4 hours, or at least twice a day, if used in a behavioral model
- in one shot if used for one-off model training or analysis
For setting up a recurring push of data, please upload a new file for each batch of new records, naming the files as described in File naming.
Please open a ticket here if you have any doubts; we'll be happy to assist you.
If you plan on having MadKudu pull your S3 data on a recurring basis, the file folder and the file naming have to remain the same.
Frequently Asked Questions
I'm having an issue with S3 / I don't know how to use S3
Please open a ticket here and we will be happy to assist you.
Your file format doesn’t work for me. What do I do?
If you’re having any issues with the file format, please open a ticket here and we’ll be happy to help.
How often is the data refreshed?
As soon as you drop data into the S3 bucket, expect results to be updated in the MadKudu platform within 6 hours.
What would happen if I send the same event more than once - will it appear twice in MadKudu?
Our system will deduplicate the events based on contact_key / event_text / timestamp. If you send the same event twice, only one will be kept:
- If sent in two separate batches, only the most recent will be kept.
- If sent in the same data batch, only the first one in the file will be kept.
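If you prefer to remove duplicates yourself before uploading, the within-batch rule can be approximated as follows. This sketch is not MadKudu's implementation: it only illustrates keeping the first record per contact_key / event_text / timestamp inside one batch, the helper name dedupe_batch is hypothetical, and it assumes the timestamp is stored in the event_timestamp field.

```python
def dedupe_batch(events):
    """Keep the first event for each (contact_key, event_text, event_timestamp) triple."""
    seen = set()
    kept = []
    for event in events:
        identity = (event["contact_key"], event["event_text"], event["event_timestamp"])
        if identity in seen:
            continue  # a matching event appeared earlier in the batch
        seen.add(identity)
        kept.append(event)
    return kept


batch = [
    {"event_key": "100", "event_text": "Email click", "event_timestamp": 1672963200, "contact_key": "john@madkudu.com"},
    {"event_key": "101", "event_text": "Email click", "event_timestamp": 1672963200, "contact_key": "john@madkudu.com"},
    {"event_key": "103", "event_text": "Email click", "event_timestamp": 1673136000, "contact_key": "john@madkudu.com"},
]
print([event["event_key"] for event in dedupe_batch(batch)])
```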
Can we add other attributes to the Contact records?
Yes, please send any attributes you have stored in your user table, except sensitive ones such as passwords or credit card numbers.
In particular, it is always helpful to get the following:
- created_date
- lead source
- current plan/value of the plan