MadKudu's Data Science Studio, called "Springbok", is the proprietary platform that lets users easily build predictive models and segmentations. It was originally built for our internal data scientists to build models for our customers, and we are excited to now open this "black box" to external users. The platform is accessible:
- in read-only mode for users with the role "user" or "admin": they can see how the model is configured but cannot modify it.
- in edit mode for users with the role "architect"
(Admins can manage users in app.madkudu.com > Settings > Users)
Please note that this access is in beta, and we are improving the Data Science Studio based on customer feedback. Don't hesitate to share your feedback at firstname.lastname@example.org!
MadKudu’s Likelihood to Buy (or PQL) models learn from historical patterns to uncover the specific behaviors that separate leads who were on the path to conversion from the others. MadKudu continuously scores all your active leads based on their behavior (in-app behaviors, marketing and sales interactions…) to determine which ones are showing a lot of interest in or engagement with your product, website, or company.
At a high level, the platform allows you to:
- create a training dataset from your CRM
- understand which events are correlated with conversion
- adjust the weight and decay of each event in the scoring
- adjust the thresholds between the Likelihood to Buy segments (very high, high, medium, low)
- remove some activities from the signals of the model
- validate the performance of the model on a validation dataset
- preview a sample of scored leads
Now let's get into the details.
The Feature evaluation tab is used to set the weight and decay of each event used in the model. On this page, you can also see which events perform best in terms of lift to conversion.
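To make the roles of weight and decay concrete, here is a minimal sketch of how a behavioral score could be assembled from them. It is not MadKudu's actual formula: the article does not specify the decay function, so this example assumes an event's points fade linearly to zero over its decay period. The names `event_points`, `behavioral_score`, and the sample configuration are hypothetical.

```python
from datetime import date


def event_points(factor_loading, decay_days, event_date, today):
    """Points contributed by one event occurrence.

    ASSUMPTION: linear decay, chosen only for illustration. The points start
    at `factor_loading` on the day of the event and reach 0 after `decay_days`.
    """
    age_days = (today - event_date).days
    if age_days < 0 or age_days >= decay_days:
        return 0.0
    return factor_loading * (1 - age_days / decay_days)


def behavioral_score(events, config, today):
    """Sum the decayed points of every event that has a configured weight.

    `events` is a list of (meta_event_name, event_date) pairs.
    `config` maps a meta event name to (factor_loading, decay_days).
    """
    total = 0.0
    for name, event_date in events:
        if name in config:
            factor_loading, decay_days = config[name]
            total += event_points(factor_loading, decay_days, event_date, today)
    return total


# Hypothetical configuration: weights and decays are made up.
config = {
    "Requested demo": (30.0, 30),
    "Unsubscribed from newsletter": (-10.0, 90),
}
events = [
    ("Requested demo", date(2024, 1, 16)),                # 15 days old
    ("Unsubscribed from newsletter", date(2024, 1, 31)),  # same day
]
score = behavioral_score(events, config, today=date(2024, 1, 31))
print(score)
```

Under these assumptions the demo request is worth half of its weight after 15 of its 30 days, while the negative event still counts in full.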
To read the "Feature" table:
- Activity type: when building the event mapping we categorize events ("meta events") in different segments (Web Activity, Marketing Activity, Product Usage, Sales Activity, Email Activity...)
- Negative User Activity: when building the event mapping we also define if the event is more of a "negative action" (deleted account, unsubscribe from newsletter, declined invitation ...) than a positive event (showing that the user is engaging with the product / company).
- We would typically assign negative weights to negative user activities
- Meta Event: event performed by a user, mapped from your original event. We use the term "meta" to reflect the layer of mapping that we have done on top of the events you send us, which can just be a renaming but also a grouping of different events.
To understand what is behind those meta events, please go to app.madkudu.com > Mapping > Event mapping
- Factor Loading: weight of the event
- Decay: the decay of the event in days
The remaining columns show a historical analysis, based on the people in the training dataset and their activity over the preceding 3 months:
- Average for converted: how many times this event was performed on average by a person who converted
- Average for non-converted: how many times this event was performed on average by a person who did not convert
- Did X: how many people performed this event
- If Did X >= 100 we estimate that the sample of people is large enough to derive conclusions (statistically significant)
- If Did X < 100 we estimate that the sample of people is too small to derive conclusions
- Did not do X: how many people did not perform this event
- Did X conversion rate: conversion rate of people who did this event
- Did not do X conversion rate: conversion rate of people who did not do this event
- Lift: how much the conversion rate of people who did the event differs from the overall average conversion rate of the training dataset, relative to that average (the ratio of the two rates, minus 1)
- when lift > 0 it means that someone performing this event is more likely to convert than average
- when lift < 0 it means that someone performing this event is less likely to convert than average
- Recall conversions: proportion of converters who did this event
- Recall non-conversions: proportion of non-converters who did this event
- Average for converted * factor loading: multiplication of the event weight and the average number of occurrences of this event per conversion. This gives us an idea of how many points would be assigned to someone who usually does this event with that many occurrences
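The columns above can all be derived from four head counts per event. The sketch below shows one way to do it; it is not the platform's code. It assumes lift is the ratio of the two conversion rates minus 1, which is consistent with the sign convention above and with a lift of -1 meaning that no converter performed the event. The function name `event_stats` and the sample counts are hypothetical.

```python
def event_stats(did_conv, did_nonconv, not_conv, not_nonconv, min_sample=100):
    """Derive the Feature table columns for one event.

    did_conv / did_nonconv: converters / non-converters who did the event.
    not_conv / not_nonconv: converters / non-converters who did not.
    """
    did_x = did_conv + did_nonconv
    did_not_do_x = not_conv + not_nonconv
    converters = did_conv + not_conv
    non_converters = did_nonconv + not_nonconv
    if 0 in (did_x, did_not_do_x, converters, non_converters):
        raise ValueError("each group needs at least one person")

    overall_rate = converters / (did_x + did_not_do_x)
    did_x_rate = did_conv / did_x
    return {
        "did_x": did_x,
        "did_not_do_x": did_not_do_x,
        "did_x_conversion_rate": did_x_rate,
        "did_not_do_x_conversion_rate": not_conv / did_not_do_x,
        # ASSUMPTION: lift = (rate of doers / overall rate) - 1
        "lift": did_x_rate / overall_rate - 1,
        "recall_conversions": did_conv / converters,
        "recall_non_conversions": did_nonconv / non_converters,
        # Did X >= 100 is the article's proxy for a large enough sample
        "large_enough_sample": did_x >= min_sample,
    }


# Made-up training dataset of 1,000 people with a 20% conversion rate.
stats = event_stats(did_conv=120, did_nonconv=80, not_conv=80, not_nonconv=720)
# An event performed only by non-converters, on a small sample.
rare = event_stats(did_conv=0, did_nonconv=50, not_conv=200, not_nonconv=750)
print(stats["lift"], rare["lift"], rare["large_enough_sample"])
```

In the first example, people who did the event convert at 60% against a 20% average, so the lift is about 2. In the second, the lift is exactly -1 and the sample is flagged as too small to trust.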
Thresholds for the different segments are set in the Ensembling tab. On this page, you can also see the performance of the model on the training dataset.
- Left graph: the total population in the training dataset, scored and distributed by segment. We aim for a distribution of roughly:
- 10% very high
- 20% high
- 30% medium
- 40% low
- Right graph: converters in the training dataset, scored and distributed by segment. We want:
- very high + high to be as large as possible (an "ok" result would be ~55%, an excellent result would be > 80%); this share is called the "Recall"
→ we want to approach the ~20/80 rule: "20% of highly active people account for 80% of people who converted"
- the thresholds allow us to adjust this distribution, but it is mostly the weight of each event that is fine-tuned to improve the performance of the model and get closer to that 20/80 rule
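As an illustration of what the thresholds do, the sketch below maps scores to segments and computes the resulting population distribution, which can then be compared with the 10/20/30/40 target. The threshold values, the scores, and the function names are hypothetical; the real point thresholds are whatever is configured in the Ensembling tab.

```python
from collections import Counter

SEGMENTS = ("very high", "high", "medium", "low")


def segment_for(score, thresholds):
    """Return the first segment whose minimum score is reached, else 'low'."""
    for name in ("very high", "high", "medium"):
        if score >= thresholds[name]:
            return name
    return "low"


def segment_distribution(scores, thresholds):
    """Share of the scored population that falls in each segment."""
    if not scores:
        raise ValueError("no scores to distribute")
    counts = Counter(segment_for(score, thresholds) for score in scores)
    return {name: counts[name] / len(scores) for name in SEGMENTS}


# Hypothetical minimum scores per segment and a made-up scored population.
thresholds = {"very high": 90, "high": 70, "medium": 50}
scores = [95, 80, 75, 60, 55, 50, 30, 20, 10, 5]
dist = segment_distribution(scores, thresholds)
print(dist)
```

Raising a threshold moves people down a segment, so the distribution can be nudged toward the target without touching the event weights.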
The second metric to look at is the Precision: is the model correctly identifying the leads who convert at a higher rate than others? Ideally, we want at least a 10x difference in conversion rate between the very high and the low segments. This means leads scored very high actually have a much higher probability to convert than leads scored low.
Note: the conversion rates here should not be read as absolute values, as the training dataset has been engineered (downsampled) to reach a 20% conversion rate.
- Recall refers to the percentage of total relevant results correctly classified by the model. It is calculated as the number of True Positives divided by (True Positives + False Negatives).
- The False Negatives here are the leads scored low or medium who converted anyway.
- Precision refers to the percentage of results that are relevant. It is calculated as the number of True Positives divided by (True Positives + False Positives).
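The two metrics, as they are read from the Ensembling graphs, can be sketched as follows. This is an illustration with made-up counts rather than the platform's code: Recall is taken as the share of converters scored very high or high, and the Precision check as the ratio between the conversion rates of the very high and low segments. The function names are hypothetical.

```python
def recall(converters_by_segment):
    """Share of all converters that the model scored very high or high."""
    total = sum(converters_by_segment.values())
    if total == 0:
        raise ValueError("no converters in the dataset")
    hits = converters_by_segment["very high"] + converters_by_segment["high"]
    return hits / total


def conversion_rate_ratio(population_by_segment, converters_by_segment):
    """Conversion rate of 'very high' divided by the conversion rate of 'low'."""
    rate_very_high = (
        converters_by_segment["very high"] / population_by_segment["very high"]
    )
    rate_low = converters_by_segment["low"] / population_by_segment["low"]
    if rate_low == 0:
        raise ValueError("no converters in the low segment, ratio is undefined")
    return rate_very_high / rate_low


# Made-up training dataset: 1,000 people, 200 converters (20%).
population = {"very high": 100, "high": 200, "medium": 300, "low": 400}
converters = {"very high": 80, "high": 70, "medium": 30, "low": 20}

model_recall = recall(converters)
ratio = conversion_rate_ratio(population, converters)
print(model_recall, ratio)
```

With these counts, 75% of converters land in very high or high, and the very high segment converts about 16 times more often than the low segment, so both targets described above would be met.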
The Validation tab reflects the performance of the model (similar to the charts in Ensembling tab) but on the validation dataset.
A model needs to be validated on a validation dataset that does not overlap with the training dataset. For that, we usually take leads that are more recent than those in the training dataset and check the performance of the model on them in the Validation tab. The same Recall and Precision metrics can be read from the graphs, and a model is "valid" if we are still close to 65%+ Recall and 10x Precision.
Every mapped event will appear in the Signals field pushed to your CRM. However, certain events may not be helpful to show to your Sales team as they evaluate new MQLs, for example non-user activities.
The Signals tab in the Data Science Studio allows you to remove some activity types and events from the Signals displayed to your Sales team in your CRM.
Check out a sample of scored leads from the validation dataset in the spot check page. You will find for each email its score, segment, and signals as you would see it in your CRM.
The Lead Grade is the combination of the Fit score and the Likelihood to Buy model.
- The Grade A, B, C, D or E is defined by the matrix configured in the tab
- Within the Grade, the score is weighted between the Fit and the Likelihood to Buy scores
The configuration of this matrix is informed by the analysis above.
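The two-step logic of the Lead Grade (a matrix lookup, then a weighted score within the grade) can be sketched as below. Everything specific here is hypothetical: the Fit segment names, the matrix cells, the default weight, and the simple weighted average are illustrative assumptions, not the configuration or formula used by the platform.

```python
LTB_SEGMENTS = ("very high", "high", "medium", "low")

# HYPOTHETICAL matrix: one row per Fit segment, one column per
# Likelihood to Buy segment, in the order of LTB_SEGMENTS.
GRADE_MATRIX = {
    "very good": ("A", "A", "B", "C"),
    "good": ("A", "B", "C", "D"),
    "medium": ("B", "C", "D", "E"),
    "low": ("C", "D", "E", "E"),
}


def lead_grade(fit_segment, ltb_segment):
    """Look up the grade for a pair of segments in the matrix."""
    return GRADE_MATRIX[fit_segment][LTB_SEGMENTS.index(ltb_segment)]


def blended_score(fit_score, ltb_score, ltb_weight=0.5):
    """ASSUMPTION: a plain weighted average of the two scores."""
    if not 0 <= ltb_weight <= 1:
        raise ValueError("ltb_weight must be between 0 and 1")
    return ltb_weight * ltb_score + (1 - ltb_weight) * fit_score


grade = lead_grade("good", "high")
score = blended_score(fit_score=80, ltb_score=40, ltb_weight=0.75)
print(grade, score)
```

In this sketch the grade depends only on the two segments, while `ltb_weight` only changes how leads rank against each other inside the same grade.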
- What is the difference between the Likelihood to Buy model and PQL model?
- None, this is just a different naming that we use.
- How is the graph in the Feature evaluation tab sorted?
- First, events with enough statistical significance are displayed (events done by at least 100 people in the whole population of the training dataset)
- Then events with little significance (done by 10 to 100 people)
- Then events with no significance (done by fewer than 10 people)
- How are the “Factor loading” values determined? They don’t seem to perfectly correlate with the Lift factor.
- Short answer: Combination of statistical analysis and business sense
- Long answer:
- The platform applies a formula based on the lift and the average number of occurrences of that event per person to define what the score should be
- But the weights and decays are adjusted to
- improve the performance of the model
- match business context and expectations: even if the historical data shows that "requested a demo" does not have the highest lift, we would still want to give it a heavy weight to make sure the leads who requested a demo are flagged as high / very high
- Why do some factors with negative lift have positive factor loading?
- we would assign negative points to negative user actions
- we assign low weights to positive user events with a low lift to differentiate people who have done nothing versus people who have performed some activities.
- If we were to change either the “factor loading” or “decay” input for any of the fields, is there a way to see what the effect would potentially be?
- You would see the effect on the Ensembling page overall but not at the event level
- Is there a way to see the confidence interval for any of these factors?
- Our proxy for statistical significance is the “Did X” column. If Did X >= 100, we estimate that we are looking at a large enough population of people who did this event to derive a conversion rate and a lift.
- Low sample sizes could cause some inaccuracies (e.g. several factors have a lift of -1, which is an oddly round number)
- Correct, this is why we also manually change some weights.
- A lift of -1 usually means that this event was not performed by any person who converted, only by non-converters
- Why are we getting compute errors? Why is the validation set blank?
- When you get errors, please don't hesitate to share with us a screenshot or Loom video when that happens to help us fix bugs you may be facing.
- Depending on the volume of data to be processed (i.e. a large volume of events), our platform can be slow to display results. We are working on improving that.
- Would validation allow us to see what the effect of changes in the model would be prior to implementing a change?
- No, the validation dataset lets you validate the model on a different population
- What does “with parameters” mean within the Validation tab?
- The parameters are a reminder of the thresholds set in the Ensembling tab, which define the number of points that makes a lead very high / high / medium / low
- What if I weigh the behavior side more heavily than fit in the Lead Grade?
- Note that the Lead Grade is primarily built from the Matrix of segment, and then the score within the segment is adjusted with the formula weighting the fit and the behavior.
- By putting more weight on the behavioral model than on the customer fit, you would, within the segment, score people who are more active higher than people who are more qualified (from a demographic / firmographic / technographic perspective)
- How do we test the impact of making a change?
- On the home page with the list of models, click on "duplicate" for the model.
- On the duplicated model, you can load a new validation dataset (without touching the training dataset) and see how the model performs against it
Seeing the effect of a model change at a more granular level (which leads would change score) before implementing it is not possible in the Data Science Studio today. It is a manual process done by our Solution Engineering team, but we want to productize it very soon.
Any more questions? Feedback? Please send us an email to email@example.com