The Univariate Analysis section in the Data Science Studio shows how relevant each trait is to conversion, to help you understand:
- your core Ideal Customer Profile (ICP), niche ICP and anti ICP
- which computations and traits to use in the scoring model or in overrides
Looking at this section allows you to answer questions like: In which industry do we perform best? What is the DNA of our customers? Are Australian leads really never converting, as my sales reps keep telling me?
You can analyze each trait on the whole training dataset, which is a representation of the leads in your CRM, or on a subset of the population by going through the trees.
Use the Search bar to find a computation among the 100+ available, either out of the box or created by you (e.g. Company Size, Funding, Role, Company Country, etc.), or just explore by scrolling down. To see the definition of each computation, go to Customer Profile at the top of the page, then open the Computations tab.
How to read the univariate analysis graph?
The left-hand bar represents the overall population of leads (in the training dataset), broken down by the different values of the computation. In the image below, we are looking at the computation Company Size, showing that Enterprise companies (1000+ employees) (in green) represent 18% of the leads.
The right-hand bar represents the population of leads who converted (in the training dataset). In the example below, Enterprise companies represent 23% of the conversions.
Because Enterprise companies represent a larger portion of conversions than they represent of the population of leads, it means the segment of Enterprise companies is converting at a higher rate than the average (of the training dataset).
=> Size = Enterprise is a trait with a positive impact on conversion.
Conversely, vSMB companies (1-10 employees) represent 11% of your leads but only 2% of your conversions, meaning they convert at a lower rate than the average.
=> Size = vSMB is a trait with a negative impact on conversion.
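The comparison between the two bars can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not how the Studio computes anything; the function names `share_ratio` and `impact` are hypothetical. It uses the Enterprise (18% of leads, 23% of conversions) and vSMB (11% of leads, 2% of conversions) figures from the example.

```python
def share_ratio(pct_of_conversions: float, pct_of_leads: float) -> float:
    """Ratio of a segment's share of conversions to its share of leads.

    A ratio above 1 means the segment converts better than the average
    of the dataset; a ratio below 1 means it converts worse.
    """
    if pct_of_leads <= 0:
        raise ValueError("pct_of_leads must be positive")
    return pct_of_conversions / pct_of_leads


def impact(pct_of_conversions: float, pct_of_leads: float) -> str:
    """Label a trait value as positive, negative or neutral (hypothetical helper)."""
    ratio = share_ratio(pct_of_conversions, pct_of_leads)
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "negative"
    return "neutral"


# Figures taken from the Company Size example in this article.
print("Enterprise:", impact(23, 18))
print("vSMB:", impact(2, 11))
```

The ratio of shares is the same thing as the segment's conversion rate divided by the average conversion rate, which is why comparing the two bars tells you whether a segment converts above or below average.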
How to understand your ICP?
- Core - your bread-and-butter traits. They represent the majority of your leads and typically convert like the average or better.
- Niche - leads with these traits convert well, you just don't have that many of them.
- Anti - traits that indicate someone who doesn't convert.
A core trait is probably not an eye-opener to you, as it accounts for most of your conversions. It is what you expect; it is predictable. This is what you want the model to predict: those who convert predictably, based on the traits of those who have historically converted.
A niche trait is one where only a small population of leads has the trait but, comparatively, they convert pretty well. When you see a niche ICP, ask yourself: can you generate more leads of that type? You may have just overlooked them so far.
An anti trait is probably not an eye-opener to you either, as this is probably who you already do not target. Whilst there may be some conversions, these leads are an even bigger time consumer.
Who to avoid can be predictable too. Still, not everyone has the ability (or time, energy or tools) to drill down into the numbers this way, so this type of analysis may reveal some new insights. Likewise, you want the model to be able to predict which leads will not be the best use of your time (and energy and ability).
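To make the three categories more concrete, here is a rough Python sketch that labels a trait value from its share of the population and its lift. The thresholds (`core_share`, `core_min_lift`, `niche_lift`, `anti_lift`) are purely hypothetical values picked for illustration; the article does not define any cut-offs, and you would need to choose ones that make sense for your own data.

```python
def classify_trait(
    pct_population: float,
    lift: float,
    core_share: float = 20.0,     # hypothetical: "large" share of leads, in %
    core_min_lift: float = -0.1,  # hypothetical: "converts like the average or better"
    niche_lift: float = 0.5,      # hypothetical: "converts well"
    anti_lift: float = -0.5,      # hypothetical: "does not convert"
) -> str:
    """Label a trait value as core, niche, anti or unclassified."""
    if lift <= anti_lift:
        return "anti"
    if pct_population >= core_share and lift >= core_min_lift:
        return "core"
    if pct_population < core_share and lift >= niche_lift:
        return "niche"
    return "unclassified"


# Made-up inputs, only to show the three outcomes.
print(classify_trait(pct_population=35, lift=0.1))   # large share, average-or-better
print(classify_trait(pct_population=3, lift=1.2))    # small share, converts well
print(classify_trait(pct_population=11, lift=-0.8))  # converts far below average
```

Anything that falls between the thresholds is left as "unclassified" rather than forced into a category, since a small segment with average conversion is neither core, niche nor anti.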
How to read the Lift graph?
What is the Lift?
Let's start with a quick definition.
Lift = the relative difference between one category's conversion rate and the average conversion rate of the entire population, i.e. (category conversion rate - average conversion rate) / average conversion rate.
It's a way to measure what we described in the first part of this article: "if a population represents a larger share of conversions than of leads, then it has a positive impact". The beginning of that sentence is the same as "if lift is positive, then ...".
How to read the lift graph?
- when lift > 0 it means that a lead with this trait is more likely to convert
- when lift < 0 it means that a lead with this trait is less likely to convert
- when lift = 0 it means that the lead is as likely to convert as any other lead with or without this trait
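As a minimal sketch, lift can be computed from two conversion rates as below. The formula used here (relative difference to the average) is inferred from the Australia and US figures in this article rather than taken from an official definition, and the function name `lift` is an assumption.

```python
def lift(segment_rate: float, average_rate: float) -> float:
    """Relative difference between a segment's conversion rate and the average.

    Positive: the segment converts better than average.
    Negative: the segment converts worse than average.
    Zero: the segment converts exactly like the average.
    """
    if average_rate <= 0:
        raise ValueError("average_rate must be positive")
    return (segment_rate - average_rate) / average_rate


# Made-up rates against a 20% average, one for each case in the list above.
for rate in (0.25, 0.20, 0.10):
    value = lift(rate, 0.20)
    reading = "more likely" if value > 0 else "less likely" if value < 0 else "as likely"
    print(f"rate={rate:.0%} lift={value:+.2f} -> {reading} to convert")
```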
The lift graph should not be read as an absolute measure: it needs to be compared with the volume of the population we are looking at. Yes, we are talking about statistical significance. Here is why: the lift is based on conversion rates, and conversion rates depend on volumes. The smaller the volume, the more the conversion rate, and therefore the lift, may fluctuate.
Say you have:
- 3 conversions from Australia out of 5 leads from Australia -> 60% conversion rate, while the conversion rate of the training dataset is 20%. This means lift of (country = australia) = 2
- 300 conversions from the US out of 1,000 leads from the US -> 30% conversion rate -> lift of (country = united states) = 0.5
You'd say lift = 2 is much higher than lift = 0.5, so let's bet our whole $10M marketing budget on Australia instead of the US... Hmm, but wait a minute: we are talking about 5 Australian leads. This is not statistically significant enough to conclude that Australian leads are more likely to convert; these could just be outliers. However, based on the observation of 1,000 US leads, we can say with much more confidence that a lift of 0.5 means US leads convert better than the average.
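One way to see the effect of volume is to put a confidence interval around each conversion rate. The sketch below uses the Wilson score interval, which is a standard statistical technique chosen here for illustration; it is not something this article says the Studio uses. The function name and the 95% level (`z = 1.96`) are assumptions.

```python
import math


def wilson_interval(conversions: int, leads: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a conversion rate (z = 1.96 is roughly 95%)."""
    if leads <= 0:
        raise ValueError("leads must be positive")
    p = conversions / leads
    denom = 1 + z * z / leads
    center = (p + z * z / (2 * leads)) / denom
    half = z * math.sqrt(p * (1 - p) / leads + z * z / (4 * leads * leads)) / denom
    return center - half, center + half


# Figures from the Australia / US example in this article.
for country, conversions, leads in (("Australia", 3, 5), ("United States", 300, 1000)):
    low, high = wilson_interval(conversions, leads)
    print(f"{country}: {conversions}/{leads} -> plausible rate between {low:.0%} and {high:.0%}")
```

With only 5 leads the plausible range for the Australian conversion rate spans tens of percentage points, so the lift computed from it could be far from 2. With 1,000 leads the US range is only a few points wide, so its lift is much more trustworthy.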
This is why we display the next table to look into this "statistical significance".
How to read the univariate analysis table?
This table essentially shows the data behind the two graphs, with the volume of leads and conversions in absolute numbers (behind the percentages and lift).
- Population: number of leads (in the training set) where company size = [value]
- Conversion: number of conversions (in the training set) where company size = [value]
- Conversion rate: conversion rate of leads where company size = [value]
- Lift factor: see this section
- Lift 7 factor: it should be read "lift factor of non converters". Same as the lift described above but looking at the people without the trait.
- % population: share of leads with the trait out of the total population. The column sums to 100%
- % conversion: share of conversions with the trait out of the total conversions. The column sums to 100%
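To show how these columns relate to one another, here is a small Python sketch that rebuilds such a table from raw lead records. It is an illustration under assumptions: the record fields (`company_size`, `converted`), the function name `univariate_table` and the sample data are all made up, lift is computed as the relative difference to the average conversion rate, and the "lift factor of non converters" column is left out.

```python
from collections import Counter


def univariate_table(leads, trait):
    """Build one row per trait value from a list of lead dicts."""
    population = Counter(lead[trait] for lead in leads)
    conversions = Counter(lead[trait] for lead in leads if lead["converted"])
    total_leads = sum(population.values())
    total_conversions = sum(conversions.values())
    if total_conversions == 0:
        raise ValueError("need at least one conversion to compute lift")
    average_rate = total_conversions / total_leads

    rows = []
    for value, pop in population.items():
        conv = conversions[value]  # Counter returns 0 for values that never converted
        rate = conv / pop
        rows.append({
            "value": value,
            "population": pop,
            "conversion": conv,
            "conversion_rate": rate,
            "lift": (rate - average_rate) / average_rate,
            "pct_population": 100 * pop / total_leads,
            "pct_conversion": 100 * conv / total_conversions,
        })
    return rows


# Made-up sample: 4 Enterprise leads (2 converted), 6 vSMB leads (1 converted).
sample_leads = (
    [{"company_size": "Enterprise", "converted": True}] * 2
    + [{"company_size": "Enterprise", "converted": False}] * 2
    + [{"company_size": "vSMB", "converted": True}] * 1
    + [{"company_size": "vSMB", "converted": False}] * 5
)

for row in univariate_table(sample_leads, "company_size"):
    print(row)
```

Note that % population is divided by the total number of leads while % conversion is divided by the total number of conversions, which is why each column adds up to 100.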
How to see these graphs but filtered on a specific population?
- Prerequisite: you understand how the Decision trees work
The Univariate Analysis tab does not allow you to add filters. However, if you would like to look at the univariate analysis for a specific population, you can isolate this population with the Trees and look at the univariate analysis of each node.
For example: you'd like to know which company sizes perform best among leads from the US only, rather than the whole world.
- Go to the Tree tab
- Create the following tree (on your test model, but not your live model)
- Split condition node 1: is_personal = 0
- Split condition node 2: company_country = United States
- Go to node 3
- Click on See univariate analysis for this node
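Conceptually, the steps above amount to filtering the leads first and then running the same analysis on what is left. Here is a rough Python sketch of that idea, assuming the node you land on holds the leads matching both split conditions. The field names `is_personal` and `company_country` come from the split conditions above; `company_size`, `converted`, both function names and the sample data are hypothetical.

```python
def filter_population(leads, **conditions):
    """Keep only the leads whose fields match every condition."""
    return [
        lead for lead in leads
        if all(lead.get(field) == expected for field, expected in conditions.items())
    ]


def conversion_rate_by(leads, trait):
    """Conversion rate for each value of a trait within the given leads."""
    totals, converted = {}, {}
    for lead in leads:
        value = lead[trait]
        totals[value] = totals.get(value, 0) + 1
        converted[value] = converted.get(value, 0) + (1 if lead["converted"] else 0)
    return {value: converted[value] / totals[value] for value in totals}


# Made-up leads, for illustration only.
sample_leads = [
    {"is_personal": 0, "company_country": "United States", "company_size": "Enterprise", "converted": True},
    {"is_personal": 0, "company_country": "United States", "company_size": "Enterprise", "converted": False},
    {"is_personal": 0, "company_country": "United States", "company_size": "vSMB", "converted": False},
    {"is_personal": 0, "company_country": "Australia", "company_size": "Enterprise", "converted": True},
    {"is_personal": 1, "company_country": "United States", "company_size": "vSMB", "converted": True},
]

# Same two conditions as the tree: business emails only, then US only.
us_business = filter_population(sample_leads, is_personal=0, company_country="United States")
print(conversion_rate_by(us_business, "company_size"))
```

Keep in mind that filtering shrinks the population, so the volume caveat on lift applies even more strongly to a node than to the whole training dataset.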
How can I see this analysis on a more recent training set?
Just follow this guide about uploading a new training dataset.