MadKudu's Data Science Studio, called "Springbok", allows users to easily build and edit predictive models and segmentations. This platform was originally built for our internal data scientists to build models for our customers and we are excited to now open this "black box" to external users. The platform is accessible:
- in read-only mode for users with the role "User" -> they can see how the model is configured, but cannot modify it.
- in edit mode for users with the roles "Architect" and "Admin" (learn more about user roles and permissions)
Please note that this access is in beta and that we are working on improving the Data Science Studio based on customer feedback. Don't hesitate to share yours at email@example.com!
MadKudu’s likelihood to buy (or PQL) models learn from historical patterns to uncover specific behaviors that separate leads who were on the path to conversion from others. MadKudu continuously scores all of your active leads based on their behavior (in-app behaviors, marketing & sales interactions, etc.) to determine which ones are showing a lot of interest or engagement with your product/website/company.
At a high level, the platform allows you to:
- create a training dataset from your CRM
- understand which events are correlated with conversion
- adjust the weights and decay of each event in the scoring
- adjust the thresholds between the Likelihood to Buy segments (very high, high, medium, low)
- remove some activities from the signals of the model
- validate the performance of the model on a validation dataset
- preview a sample of scored leads
Now let's get into the details.
Before we start
If you are not familiar with the concept of the Likelihood to Buy model or the different models offered by MadKudu we recommend reading this article.
How to access your Likelihood to Buy model?
- Either from your account in app.madkudu.com > Predictions, click on View model in the Likelihood to Buy section.
- Or from the home page of the Data Studio springbok.madkudu.com, where you will find the list of models live in your CRM as well as draft models.
Likelihood to Buy models are identified with the model type "pql".
To view a model, click on Open.
Within the model page,
- the Customer Profile section allows you to access the list of aggregations available to include in the Likelihood to Buy model.
- the Data Science Studio allows you to see the relevance of each event, as well as how the model is built.
Data Science Studio
Below is a description of what each section is used for.
In the Data page, you can see the size and conversion rate of the training set and validation set uploaded to train and validate the LTB model. A training dataset is essentially a table with 2 columns:
- email: the email of the lead
- target: 1 if the lead has converted, 0 otherwise.
A training set is usually engineered to reach at least a 10-20% conversion rate to avoid a class imbalance problem.
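As an illustration, building such a training set can be sketched in a few lines. This is a hypothetical helper, not MadKudu's actual pipeline: it keeps all converters and samples just enough non-converters for converters to represent ~20% of the final table.

```python
import random

# Hypothetical sketch: a training dataset as a list of (email, target)
# rows; non-converters are downsampled so that converters make up
# roughly `target_rate` of the final set.
def downsample(rows, target_rate=0.2, seed=0):
    converters = [r for r in rows if r[1] == 1]
    non_converters = [r for r in rows if r[1] == 0]
    # number of non-converters so that converters represent ~target_rate
    n_neg = min(int(len(converters) * (1 - target_rate) / target_rate),
                len(non_converters))
    rng = random.Random(seed)
    sampled = rng.sample(non_converters, n_neg)
    training = converters + sampled
    rng.shuffle(training)
    return training

# 1000 leads with a raw conversion rate of 5%
raw = [(f"lead{i}@example.com", 1 if i < 50 else 0) for i in range(1000)]
training = downsample(raw)
rate = sum(t for _, t in training) / len(training)
print(len(training), round(rate, 2))  # 250 0.2
```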
The Feature evaluation tab is used to configure the importance (weight) and lifespan of each event used in the model. On this page, you can also see which events perform best in terms of lift to conversion. See how to read the Feature evaluation graph and tables.
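For intuition, "lift to conversion" can be sketched as the conversion rate among leads who performed an event, divided by the overall conversion rate. This is illustrative code, not the platform's implementation; the lead records below are made up:

```python
# Lift > 1 means the event is positively correlated with conversion.
def event_lift(leads, event):
    overall = sum(l["target"] for l in leads) / len(leads)
    doers = [l for l in leads if event in l["events"]]
    if not doers or overall == 0:
        return 0.0
    rate = sum(l["target"] for l in doers) / len(doers)
    return rate / overall

leads = [
    {"target": 1, "events": {"requested_demo", "visited_pricing"}},
    {"target": 1, "events": {"requested_demo"}},
    {"target": 0, "events": {"requested_demo", "visited_pricing"}},
    {"target": 0, "events": {"visited_pricing"}},
    {"target": 0, "events": {"visited_pricing"}},
] + [{"target": 0, "events": set()} for _ in range(5)]

print(round(event_lift(leads, "requested_demo"), 2))   # (2/3) / 0.2 = 3.33
print(round(event_lift(leads, "visited_pricing"), 2))  # (1/4) / 0.2 = 1.25
```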
MadKudu does not work with default scoring rules but instead automatically suggests custom Likelihood to Buy scoring rules based on the analysis of the behavior of your past conversions.
However, you may want to customize the weight of some events to improve the performance of the model and to suit your business needs. See how to edit your Likelihood to Buy scoring model.
Setting thresholds for the different segments is done in the Ensembling tab. On this page, you can also see the performance of the model on the training dataset.
- Left graph: the total population in the training dataset, scored and displayed by segment. We want to get close to a distribution of:
- ~10% very high
- ~20% high
- ~30% medium
- ~40% low
- Right graph: the converters in the training dataset, scored and distributed by segment. We want:
- very high + high as large as possible (an "ok" result would be ~55%, a really excellent result would be >80%). This is called the "Recall".
→ we want to achieve the ~20/80 rule: "20% of highly active people account for 80% of people who converted"
- the thresholds allow us to adjust this distribution, but mostly the weights of each event are fine-tuned to improve the performance of the model and get closer to that 20/80 rule
The second metric to look at is the Precision: is the model correctly identifying the leads who convert at a higher rate than others? Ideally, we want at least a 10x difference in conversion rate between the very high and the low segments. This means the very highs will actually have a much higher probability of converting than the lows.
Note: the conversion rates here should not be taken in absolute as the training dataset has been engineered (downsampled) to reach a 20% conversion rate.
- Recall refers to the percentage of total relevant results correctly classified by the model. It is calculated as True Positives / (True Positives + False Negatives).
- The False Negatives here are the leads scored low or medium but who converted anyway.
- Precision refers to the percentage of positive results that are relevant: True Positives / (True Positives + False Positives), where the False Positives are the leads scored very high or high who did not convert.
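The two definitions translate directly into code. Here, "positive" means scored very high or high, and the counts are made-up examples:

```python
def recall(true_pos: int, false_neg: int) -> float:
    # share of converters the model caught in very high / high
    return true_pos / (true_pos + false_neg)

def precision(true_pos: int, false_pos: int) -> float:
    # share of very high / high leads who actually converted
    return true_pos / (true_pos + false_pos)

# Example: 80 converters scored very high/high, 20 converters scored
# medium/low, 120 non-converters scored very high/high.
print(recall(80, 20))      # 0.8 -> an excellent recall per the targets above
print(precision(80, 120))  # 0.4
```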
The Validation tab reflects the performance of the model (similar to the charts in Ensembling tab) but on the validation dataset.
A model needs to be validated on a validation dataset that does not overlap with the training dataset. For that, we usually take more recent leads than the training dataset and check the performance of the model on this dataset in the Validation tab. The same metrics of Recall and Precision can be extracted from the graph, and a model is "valid" if we are still close to 65%+ Recall and 10x Precision.
Every mapped event will appear in the Signals field pushed to your CRM. However, certain events may not be helpful to show to your Sales team as they evaluate new MQLs, for example non-user activities.
The Signals tab in the Data Science Studio allows you to remove some activity types and events from the Signals displayed to your Sales team in your CRM.
Check out a sample of scored leads from the validation dataset on the Spot check page. For each email, you will find its
- score (normalized between 0 and 100)
- signals (last activities)
on the date of the validation dataset. What does this mean? To validate the model, we "simulate" that we are on a certain date, say Oct 1st, 2021, and look at who the people are who were active and then converted after that date, versus people who were active but didn't convert.
Therefore the score, segment and signals in the Spot check correspond to the ones for this person on Oct 1st, 2021, and not today.
To know what date this corresponds to, check the date parameter of the validation dataset by clicking on "View validation dataset".
The Lead Grade is the combination of the Fit score and the Likelihood to Buy model.
- The Grade (A, B, C, D or E) is defined by the matrix configured in this tab
- Within the Grade, the score is weighted between the Fit and the Likelihood to Buy scores
The configuration of this matrix is informed by the historical analysis of conversions.
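As a rough illustration of how the two pieces combine: the letter grade comes from a matrix lookup on the two segments, then the numeric score within the grade blends the Fit and Likelihood to Buy scores. The matrix values and the 50/50 weights below are hypothetical; your actual configuration lives in the tab.

```python
# Hypothetical grade matrix: (fit segment, LTB segment) -> letter grade.
GRADE_MATRIX = {
    ("very high", "very high"): "A",
    ("very high", "high"): "A",
    ("high", "very high"): "B",
    ("high", "high"): "B",
    ("medium", "high"): "C",
    ("medium", "medium"): "D",
    ("low", "low"): "E",
}

def lead_grade(fit_seg, ltb_seg, fit_score, ltb_score,
               fit_weight=0.5, ltb_weight=0.5):
    # 1) letter grade from the matrix of segments
    grade = GRADE_MATRIX.get((fit_seg, ltb_seg), "E")
    # 2) score within the grade: weighted blend of the two scores
    score = fit_weight * fit_score + ltb_weight * ltb_score
    return grade, round(score, 1)

print(lead_grade("very high", "high", 90, 70))  # ('A', 80.0)
```

Raising `ltb_weight` above `fit_weight` is what "weighing the behavior side more heavily" in the FAQ below amounts to: more active leads score higher within their grade.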
- What is the difference between the Likelihood to Buy model and the PQL model?
- None, these are just two names that we use for the same model.
- How is the graph in the Feature evaluation tab sorted?
- Events with enough statistical significance are displayed first (events done by at least 100 people in the whole population of the training dataset)
- Then events with little significance (done by 10 to 100 people)
- Then events with no significance (done by fewer than 10 people)
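These significance buckets boil down to simple thresholds on the "Did X" count. The function and event names below are illustrative:

```python
def significance_bucket(did_x: int) -> int:
    # Thresholds from the article: >=100 people -> significant,
    # 10-99 -> little significance, <10 -> no significance.
    if did_x >= 100:
        return 0
    if did_x >= 10:
        return 1
    return 2

# "Did X" counts per event (made-up numbers)
events = {"signed_up": 450, "opened_email": 75, "clicked_banner": 4}
ordered = sorted(events, key=lambda e: significance_bucket(events[e]))
print(ordered)  # ['signed_up', 'opened_email', 'clicked_banner']
```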
- How are the “Factor loading” values determined? They don’t seem to perfectly correlate with the Lift factor.
- Short answer: Combination of statistical analysis and business sense
- Long answer:
- The platform applies a formula based on the lift and the average number of occurrences per person of that event to define what the score should be
- But the weights and decays are adjusted to
- improve the performance of the model
- match business context and expectations: if the historical data shows that "requested a demo" does not have the highest lift, we would still want to put a heavy weight on it to make sure leads who requested a demo are flagged as high/very high
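For intuition on how weight and decay interact, here is a sketch using an assumed exponential-decay form (this is not MadKudu's exact formula): each event contributes its weight, discounted by how long ago it happened relative to its configured lifespan, so recent activity counts more.

```python
import math

def event_points(weight: float, days_ago: float, lifespan_days: float) -> float:
    # Assumed decay shape for illustration: full weight at days_ago=0,
    # shrinking exponentially as the event ages past its lifespan.
    return weight * math.exp(-days_ago / lifespan_days)

# "requested a demo": heavy weight, 30-day lifespan (made-up numbers).
recent = event_points(weight=30, days_ago=1, lifespan_days=30)
old = event_points(weight=30, days_ago=21, lifespan_days=30)
print(round(recent, 1), round(old, 1))  # 29.0 14.9
```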
- Why do some factors with negative lift have positive factor loading?
- we would assign negative points to negative user actions
- we assign low weights to positive user events with a low lift to differentiate people who have done nothing versus people who have performed some activities.
- If we were to change either the “factor loading” or “decay” input for any of the fields, is there a way to see what the effect would potentially be?
- You would see the effect on the Ensembling page overall but not at the event level
- Is there a way to see the confidence interval for any of these factors?
- Our proxy for statistical significance is to look at the "Did X" count. If Did X > 100, we estimate that we are looking at a population large enough of people who did this event to derive a conversion rate and a lift.
- Low sample sizes could cause some inaccuracies (ex. Several factors have -1 lift which is an oddly round number)
- Correct, this is why we also manually adjust some weights.
- -1 usually means that this event was not performed by any person who converted, and only by non-converters
- Why are we getting compute errors? Why is the validation set blank?
- When you get errors, please don't hesitate to share with us a screenshot or Loom video when that happens to help us fix bugs you may be facing.
- Depending on the volume of data to be processed (aka large volume of events), our platform can show delays in displaying results. We are working on improving that.
- Would validation allow us to see what the effect of changes in the model would be prior to implementing a change?
- No, the validation dataset allows you to validate the model on a different population
- What does “with parameters” mean within the Validation tab?
- The parameters are a reminder of the thresholds set in the Ensembling tab to define the point boundaries between what makes a lead very high / high / medium / low
- What if I weigh the behavior side more heavily than fit in the Lead Grade?
- Note that the Lead Grade is primarily built from the Matrix of segment, and then the score within the segment is adjusted with the formula weighting the fit and the behavior.
- By putting more weight on the behavioral model than on the customer fit, you would score more active people higher within the segment than more qualified people (from a demo/firmo/technographics perspective)
- How do we test the impact of making a change?
- On the home page with the list of models, click on "Duplicate" on the model.
- On the duplicated model, you can load a new validation dataset (without touching the training dataset) and see how the model performs against it.
Seeing the effect of a change at a more granular level (which leads will change score) before implementing it is today a manual process done outside of the Data Science Studio by our Solution Engineering team, but one that we want to productize very soon.
Any more questions? Feedback? Please send us an email at firstname.lastname@example.org