Pre-requisites
- You have access to the data studio
- You have the permissions of the Architect or Admin role
- You know what a computation is
- You know how to read the insights page
- You know how to read a tree
How to create a tree from scratch
How to start: the first split
In the Studio, access the tree section of the model you want to edit.
Node number 1 is the starting node. It contains the whole population of your leads (that made it into your training dataset).
The goal is to divide this node into two subpopulations with very different average conversion rates. To do this, you'll create a splitting condition using a computation.
An easy way to find a relevant, discriminating computation is to look at the insights for this node. Click on Node 1 and then click 'View Insights for this node'.
Toggle 'Include unknown values' and 'Include personal email' on to see the Insights for all of your leads. Then sort computations by relevance. This saves time when searching for a discriminating computation to use in your splitting condition.
Now it's time to look for a computation to use in your splitting condition!
Look for a computation that divides your population into two groups of at least 100 leads each (a subpopulation smaller than 100 leads will lead to an overfitted model). To get a large enough population, you may group several categories together:
For example: Industry = 'Communications Equipment' isolates 60 leads, 'Electronic Equipment, Instruments & Components' isolates 50 leads and 'Technology Hardware, Storage & Peripherals' isolates 20 leads. Creating the splitting condition Industry is any of 'Communications Equipment', 'Electronic Equipment, Instruments & Components', 'Technology Hardware, Storage & Peripherals' isolates 130 leads!
We recommend doing this when the categories are related, for example different countries in the same geographical region, different countries with similar GDP per capita, or different industries grouped under the same sector. It amounts to looking at your leads' categorization at a higher level.
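The grouping arithmetic from the example above can be checked with a short sketch. The dictionary and the helper names (`group_size`, `is_large_enough`) are hypothetical and only illustrate the 100-lead rule; they are not part of the Studio.

```python
# Lead counts per Industry category, taken from the example above.
industry_counts = {
    "Communications Equipment": 60,
    "Electronic Equipment, Instruments & Components": 50,
    "Technology Hardware, Storage & Peripherals": 20,
}

MIN_LEADS = 100  # minimum subpopulation size recommended in this article


def group_size(counts, categories):
    """Return the total number of leads isolated by a group of categories."""
    return sum(counts[c] for c in categories)


def is_large_enough(counts, categories, min_leads=MIN_LEADS):
    """Check whether the grouped categories isolate at least min_leads leads."""
    return group_size(counts, categories) >= min_leads


# No single category reaches 100 leads on its own...
singles = {c: is_large_enough(industry_counts, [c]) for c in industry_counts}
# ...but the three related categories together isolate 130 leads.
grouped_total = group_size(industry_counts, list(industry_counts))
print(singles)
print(grouped_total)
```

Remember that the other side of the split (all leads not in the group) must also contain at least 100 leads.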
A discriminating computation would also:
- Have most converters in one category or group of categories
- Or have very different conversion rates between categories
- Or have a lift > 1
Visually in the bar charts, it looks like the population is distributed very differently between the population side and the conversion side.
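To make these criteria concrete, here is a minimal sketch that computes the conversion rate and lift of each category. It assumes the common definition of lift (category conversion rate divided by the overall conversion rate of the node), which this article does not spell out. The numbers and the `category_stats` helper are made up for illustration.

```python
def category_stats(rows):
    """rows: list of (category, leads, conversions) tuples.

    Returns a dict mapping each category to its conversion rate and lift,
    where lift = category conversion rate / overall conversion rate
    (a common definition, assumed here).
    """
    total_leads = sum(leads for _, leads, _ in rows)
    total_conversions = sum(conv for _, _, conv in rows)
    if total_conversions == 0:
        # Lift is undefined when the node has no conversions at all.
        raise ValueError("node has no conversions")
    overall_rate = total_conversions / total_leads
    stats = {}
    for category, leads, conv in rows:
        rate = conv / leads
        stats[category] = {
            "conversion_rate": rate,
            "lift": rate / overall_rate,
        }
    return stats


# Made-up numbers for a hypothetical two-category computation.
rows = [
    ("business email", 400, 40),  # 10% conversion rate
    ("personal email", 600, 10),  # about 1.7% conversion rate
]
stats = category_stats(rows)
for category, values in stats.items():
    print(category, round(values["conversion_rate"], 3), round(values["lift"], 2))
```

Here the overall conversion rate is 5%, so the first category has a lift of 2 and the second a lift below 1: the computation separates converters from non-converters well.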
Have a look at which computations we recommend using!
Once you have chosen which computation to use for your splitting condition, go back to your tree, click the node you want to split (Node 1), and click 'Split this node'.
A window appears where you can write the logic of your condition:
- In the search bar, type the API name of the computation you chose. Example: to split Node 1 on 'Is personal', type 'is_personal' and click on it when it appears in the list.
- Then select the logic operator (1) and fill in the value (2). Example: to select business emails, write is_personal is 0
- Add as many conditions as you need and write the condition logic (using AND/OR) in the bar at the top of the window, using parentheses if needed (5)
- When you're finished, click on 'Save conditions' (6)
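The way several conditions combine with AND/OR and parentheses can be sketched in a few lines of Python. This is only an illustration of the logic, not how the Studio evaluates conditions. The operator names and the `country` and `industry` computation names are assumptions; only `is_personal` comes from the example above.

```python
import operator

# Hypothetical operator names; the Studio's actual operator list may differ.
OPERATORS = {
    "is": operator.eq,
    "is not": operator.ne,
    "is any of": lambda value, options: value in options,
}


def check(lead, computation, op, value):
    """Evaluate one condition, e.g. ('is_personal', 'is', 0), on a lead."""
    return OPERATORS[op](lead.get(computation), value)


def matches(lead):
    """Condition logic '1 AND (2 OR 3)' written with Python's and/or."""
    c1 = check(lead, "is_personal", "is", 0)
    c2 = check(lead, "country", "is any of", {"France", "Germany"})
    c3 = check(lead, "industry", "is", "Software")
    return c1 and (c2 or c3)


# Made-up leads, represented as dicts of computation values.
leads = [
    {"is_personal": 0, "country": "France", "industry": "Retail"},
    {"is_personal": 1, "country": "France", "industry": "Software"},
    {"is_personal": 0, "country": "Spain", "industry": "Software"},
    {"is_personal": 0, "country": "Spain", "industry": "Retail"},
]
results = [matches(lead) for lead in leads]
print(results)  # [True, False, True, False]
```

Without the parentheses, '1 AND 2 OR 3' would also match the second lead, because AND binds more tightly than OR in Python. This is why the grouping matters.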
Congratulations! You made the first split in your tree!
You can see the 2 resulting sub-nodes.
Click on both sub-nodes and check their statistics to validate your assumptions and see the impact of the traits used in the splitting condition on the conversion rate.
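The statistics to compare on the two sub-nodes are simple to reproduce. The sketch below uses a tiny made-up dataset and hypothetical helpers (`node_stats`, `split_node`) to show what a split does to lead counts and conversion rates; a real node would hold far more leads.

```python
def node_stats(leads):
    """Return (number of leads, number of conversions, conversion rate)."""
    n = len(leads)
    conversions = sum(lead["converted"] for lead in leads)
    rate = conversions / n if n else 0.0
    return n, conversions, rate


def split_node(leads, condition):
    """Split leads into (matching, non-matching) according to a condition."""
    yes = [lead for lead in leads if condition(lead)]
    no = [lead for lead in leads if not condition(lead)]
    return yes, no


# Made-up data: 10 business emails (3 converted), 10 personal emails (1 converted).
leads = (
    [{"is_personal": 0, "converted": 1}] * 3
    + [{"is_personal": 0, "converted": 0}] * 7
    + [{"is_personal": 1, "converted": 1}] * 1
    + [{"is_personal": 1, "converted": 0}] * 9
)
business, personal = split_node(leads, lambda lead: lead["is_personal"] == 0)
print(node_stats(leads))     # parent node
print(node_stats(business))  # leads matching is_personal is 0
print(node_stats(personal))  # remaining leads
```

A useful split produces two sub-nodes whose conversion rates sit clearly on either side of the parent's rate, as is the case here (30% and 10% against 20% for the parent).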
How to iterate
Now the goal is to follow the same method to split the 2 sub-nodes (Nodes 2 and 3) you just created.
- Click on node 2
- Click 'View insights for this node'
- Toggle 'Include unknown values' and 'Include personal email' on
- Look for a discriminating computation
- Go back to your tree
- Click 'Split this node'
- Create your splitting condition
- Save it and check the statistics of both resulting sub-nodes
You can keep splitting nodes this way on all branches of the tree, as long as the nodes have more than 200 leads.
When to stop?
- Don't split nodes with fewer than 200 leads: that would result in at least one end node with fewer than 100 leads.
- Don't split nodes with 0 conversions. It would be pointless, because both resulting end nodes would have a conversion rate of 0% and would get the lowest score anyway.
- If you don't find any discriminating computation for a subpopulation of an end node, it means all these leads have a similar conversion rate and it is fine to leave them together.
- Stop when you're happy with the performance. Learn more about tree performance
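The two numeric stopping rules above can be written as a small check. This is a sketch with a hypothetical `should_split` helper. It assumes a node with exactly 200 leads may still be split (into two nodes of 100), which the article leaves open. The other two rules call for human judgment and are not coded.

```python
MIN_LEADS_TO_SPLIT = 200  # allows two sub-nodes of at least 100 leads each


def should_split(n_leads, n_conversions):
    """Apply the two numeric stopping rules from the list above.

    Returns (ok, reason). Whether a discriminating computation exists and
    whether performance is satisfactory must still be judged by hand.
    """
    if n_leads < MIN_LEADS_TO_SPLIT:
        return False, "fewer than 200 leads"
    if n_conversions == 0:
        return False, "no conversions in this node"
    return True, "candidate for a split"


# Made-up nodes: (number of leads, number of conversions).
for node in [(150, 5), (500, 0), (500, 20)]:
    print(node, should_split(*node))
```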
How to edit an existing tree
You might not need to build a brand new tree, but just to edit a pre-configured tree or update an existing tree for a new model.
Go deeper into a tree
You can add splitting conditions on end nodes that contain more than 200 leads. Learn more about overfitting.
To do so, click on a node that is suitable for splitting in two and follow the process described above:
- Click on the node to split
- Click 'View insights for this node'
- Toggle 'Include unknown values' and 'Include personal email' on
- Look for a discriminating computation
- Go back to your tree
- Click 'Split this node'
- Create your splitting condition
- Save it and check the statistics of both resulting sub-nodes
- To know when to stop splitting, see 'When to stop?' above
If you want to use more computations that you know or suspect have a significant impact on conversion rate, but you can't go deeper in the decision tree without overfitting the model, you can create a new tree from scratch in any empty slot among the tabs 'Tree 1', 'Tree 2', 'Tree 3' above the tree visualization.
Learn more about multiple trees models.
Change the splitting conditions
When editing an existing tree, if you decide to change the condition used to split a node, simply click on this node and then click on 'Replace' on the right.
The window to create a splitting condition will appear. Write the logic of your conditions and click 'Save conditions'.
Best practices
How to start your tree
Computations can be about a company (firmographic trait) or a person (demographic trait). Learn more about which computations are available on the platform.
That is why we recommend first dividing your tree between personal emails and business emails.
This way, you can explore the personal emails on one branch of the tree with demographic computations, and explore the business emails on the other branch of the tree using firmographic and technographic computations.
Computations to use in your tree
We recommend using traits that make business sense to you. The leads grouped in the same end node share a number of similar traits, which makes them a persona defined by their end node definition.
Example: Node 7 contains leads with business emails from companies of 100 to 1000 employees in the Internet Software & Services industry.
The goal is that these personas also make sense for your Sales team.
In order to achieve this, what are the first computations to look at?
- Firmographic: Size, Industry, Country, Alexa Global Bucket, Predicted Revenue Segment, Has Raised Capital, Company Type, Maturity Range, Tags, Tag Is B2b, Tag Is B2c, Tech Cnt Bucket, Predictleads Is Hiring
- Demographic: Is Personal, Is Spam, Is Student, Is Personal Biz Email, Pers Has Full Name, Pers Country, Pers Has title, Pers Has Linkedin, Domain Tld (which can be a proxy for detecting a lead's country)
Some other computations might only be relevant depending on your business:
- Computations detecting certain technologies: how do these technologies relate to your business? Would it make sense to single out leads that convert statistically better than average because they are detected to use a certain email service provider, when this has little to do with your business?
- Computations detecting if a company is hiring for certain roles: how do these roles relate to your business? Would it make sense to single out leads that convert statistically better than average because their company is detected to be hiring for HR roles, when this has little to do with your business?
If one of these less straightforward computations comes up first in the Insights when you sort computations by relevance, take the time to think about what these statistics mean in the context of your business!
Performance of your tree
In order to understand the impact of your edits on the performance of your tree (i.e. its ability to distinguish converters from non-converters), check the AUC curve at the bottom of the page (hover over it to display more information).
Learn how to read the AUC curve here
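For intuition about what the AUC measures, here is a minimal sketch that computes it from scratch. It assumes the usual definition (area under the ROC curve, equal to the probability that a random converter is scored higher than a random non-converter). It also assumes, purely for illustration, that each lead is scored with the conversion rate of its end node; the platform's actual scoring may differ. The end-node numbers are made up.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen converter gets a higher
    score than a randomly chosen non-converter (ties count as one half).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one converter and one non-converter")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical end nodes: (conversion rate used as score, leads, conversions).
end_nodes = [(0.30, 10, 3), (0.10, 10, 1)]
scores, labels = [], []
for rate, n_leads, n_conv in end_nodes:
    scores += [rate] * n_leads
    labels += [1] * n_conv + [0] * (n_leads - n_conv)
tree_auc = auc(scores, labels)
print(tree_auc)
```

An AUC of 0.5 means the tree does no better than chance at ranking converters above non-converters, and 1.0 means a perfect ranking. Splits that pull end-node conversion rates further apart push the value up.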
Assembling multiple trees
If your model is made of several trees, the next step is to adjust the thresholds!