You and your team may have questions about how to set up an Amazon S3 integration with MadKudu: the data format, the security controls, and so on. If you don't find answers to your questions here, please reach out to firstname.lastname@example.org and we will be happy to assist.
You can find the main documentation on the S3 integration here:
Q: How does the S3 integration work with MadKudu?
Amazon S3 is a storage service that MadKudu uses to transfer data from data warehouses such as Redshift, BigQuery, or Snowflake, or from systems with which MadKudu doesn't have a direct integration.
You transfer data to an S3 bucket, and MadKudu pulls that data from the bucket for scoring purposes.
Q: Should I host the S3 bucket or does MadKudu have one?
We recommend that you host the S3 bucket on your side for control and security reasons and provide MadKudu access to your S3 bucket with an IAM role (see How to create an S3 bucket and give MadKudu access ).
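As a rough illustration of the kind of read-only access involved, here is a minimal S3 bucket policy that lets an external role list a bucket and download its objects. The bucket name and role ARN below are hypothetical placeholders; use the actual values from the linked setup article.

```python
import json

# Hypothetical bucket name and role ARN -- replace with the values from
# your own setup (the linked article gives the real MadKudu role ARN).
BUCKET = "my-company-madkudu-exports"
MADKUDU_ROLE_ARN = "arn:aws:iam::123456789012:role/madkudu-reader"

# Minimal read-only bucket policy: the role may list the bucket
# and download the objects it contains, nothing more.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": MADKUDU_ROLE_ARN},
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            "Effect": "Allow",
            "Principal": {"AWS": MADKUDU_ROLE_ARN},
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Note that `s3:ListBucket` applies to the bucket ARN itself, while `s3:GetObject` applies to the objects (`/*`); both statements are needed for a reader to discover and fetch files.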
Q: What are the requirements to set up an S3 integration with MadKudu?
You will need an AWS account and a quick favor from someone who can stream data from your source to the S3 bucket (very often a data engineer).
Q: What type of data should I send MadKudu?
MadKudu works with four types of objects:
- Event: what are users doing?
- Contact: who is the user?
- Account: what accounts do my users belong to?
- Opportunity: what deals do I open? (when you are not using Salesforce Opps or HubSpot Deals)
Depending on the type of segmentation you want to build with MadKudu (based on behavioral activity, on lead/account attributes, or on both), you may send us data for one or more of these objects.
Behavioral segmentation (Likelihood to Buy)
If you track behavioral activity in a system that does not integrate with MadKudu and would like to build a behavioral segmentation, you would need to send at least Events. If you plan on having account scoring, please send us your Accounts as well.
What Events specifically? You can refer to this article to know What type of events can be used in a behavioral segmentation
Firmographic segmentation (Customer Fit)
If you have a homegrown CRM, or a CRM that does not integrate with MadKudu, you need to send at least Contacts and Opportunities so MadKudu can understand who your contacts are and which of them convert. If you plan on having account scoring, please send us your Accounts as well.
Q: Should the data be transformed before sending it to S3?
Yes. Follow the instructions in How to format data and file in the S3 bucket to format the data from your system or data warehouse before uploading it to the S3 bucket.
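To make the two accepted file formats concrete, here is a sketch that serializes the same event records as newline-delimited JSON and as CSV with a header row. The field names (`email`, `event`, `timestamp`) are illustrative; the linked formatting article defines the exact columns MadKudu expects.

```python
import csv
import io
import json

# Illustrative event records -- the exact field names MadKudu expects
# are defined in "How to format data and file in the S3 bucket".
events = [
    {"email": "jane@acme.com", "event": "signed_up", "timestamp": "2023-01-15T09:30:00Z"},
    {"email": "jane@acme.com", "event": "created_project", "timestamp": "2023-01-16T14:02:00Z"},
]

# Newline-delimited JSON: one record per line.
ndjson = "\n".join(json.dumps(e) for e in events)

# Equivalent CSV with a header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["email", "event", "timestamp"])
writer.writeheader()
writer.writerows(events)
csv_text = buf.getvalue()

print(ndjson)
print(csv_text)
```

Either format works; newline-delimited JSON is often easier to append to incrementally, while CSV is more compact for flat records.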
Q: How should the data be transferred?
We would need 9 months of historical data to train the predictive models plus fresh data every ~4 to 12 hours.
The data should be uploaded as JSON or CSV files in the S3 bucket:
- either each file contains only the most recent data, in which case you load each file separately,
- or a single file contains all the data including the most recent records, in which case you replace the existing file at each upload.
The idea is to have both fresh and historical data in the bucket at all times, not just the most recent records.
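The two upload strategies above can be sketched as a key-naming convention. The prefixes and file names here are illustrative, not required by MadKudu; pick whatever convention suits your pipeline.

```python
from datetime import datetime, timezone

# Option 1: incremental files -- each upload gets its own timestamped key,
# and older files are left in place so history accumulates in the bucket.
def incremental_key(now: datetime) -> str:
    return f"events/events_{now.strftime('%Y%m%d_%H%M%S')}.json"

# Option 2: full snapshot -- every upload overwrites a single fixed key
# that always contains the complete history plus the latest records.
SNAPSHOT_KEY = "events/events_full.json"

now = datetime(2023, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
print(incremental_key(now))  # events/events_20230601_120000.json
print(SNAPSHOT_KEY)
```

With option 1, never delete the older files (they are the historical data); with option 2, never truncate the file to only recent records before uploading.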
Q: How fast should the data be loaded?
Q: What is the volume of data to be loaded?
Q: Is a historical upload of data required?
Q: For which time period should the data be loaded (specific month, week, year, etc.)?
If sending events:
- 1 record = 1 event with timestamp and user id/email
- The oldest events should have timestamps going back at least 9 months, so the models can train on history.
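A single event record might look like the sketch below, together with a quick sanity check that a dataset reaches back the required ~9 months (approximated here as 270 days). Field names are illustrative.

```python
import json
from datetime import datetime, timedelta, timezone

# One record = one event, tied to a user by id/email and stamped with a time.
# Field names are illustrative, not a required schema.
event = {
    "user_id": "u_123",
    "email": "jane@acme.com",
    "event": "logged_in",
    "timestamp": "2023-03-10T08:15:00Z",
}

# Rough check that the oldest timestamp in a dataset is at least
# ~9 months (270 days) in the past.
def covers_nine_months(oldest_ts: str, now: datetime) -> bool:
    oldest = datetime.fromisoformat(oldest_ts.replace("Z", "+00:00"))
    return now - oldest >= timedelta(days=270)

now = datetime(2023, 12, 15, tzinfo=timezone.utc)
print(json.dumps(event))
print(covers_nine_months(event["timestamp"], now))  # True: ~280 days old
```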
If sending contacts or accounts with enrichment attributes:
- 1 record = 1 contact or account with its creation date and any attributes (job title, industry, revenue, etc.)
- To train a model, we need contacts or accounts created in the last 9 months
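For reference, a contact record and an account record might look like the sketch below. The attribute names are illustrative; send whatever enrichment attributes you have.

```python
import json

# One record = one contact, with its creation date and enrichment attributes.
contact = {
    "email": "jane@acme.com",
    "created_at": "2023-05-02",
    "job_title": "VP Marketing",
    "industry": "Software",
    "annual_revenue": 25_000_000,
}

# One record = one account, keyed here by domain (illustrative choice).
account = {
    "domain": "acme.com",
    "created_at": "2022-11-20",
    "industry": "Software",
    "employee_count": 350,
}

print(json.dumps(contact))
print(json.dumps(account))
```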
Q: Should the data be transferred on a periodic schedule?
Yes, at least once a day is the recommended frequency to provide MadKudu with fresh data. Historical data only needs to be loaded once, covering the last 9 months.
Q: Can you pull data directly from our Snowflake?
Q: Can you pull data directly from our BigQuery / GCS?
Q: Can you push the scores to an S3 bucket for us to load into Snowflake, BigQuery, etc.?
Not today, but this is something we could develop depending on the demand. Depending on where you would need to consume those scores, we may not need to go through an S3 bucket but rather one of the existing solutions:
- We can push scores to Salesforce, HubSpot, Marketo, Segment, Kissmetrics, and some other CRMs.
- You can use our API or a webhook to score leads and sync the scores between your data warehouse and your CRM (note that the API currently includes only Customer Fit scores, not yet Likelihood to Buy scores).