CART Algorithm for Decision Tree

Classification and Regression Tree (CART) is one of commonly used Decision Tree algorithms. In this post, we will explained the steps of CART algorithm using an example data.

Decision Tree is a recursive partitioning approach and CART split each of the input node into two child nodes, so CART decision tree is Binary Decision Tree.   At each level of decision tree, the algorithm identify a condition - which variable and level to be used for splitting input node (data sample) into two child nodes.

CART Algorithm Steps

Decision Tree building algorithm involves a few simple steps and these are:

1. Take Labelled Input data - with a Target Variable and a list of Independent Variables
2. Best Split: Find Best Split for each of the independent variables
3. Best Variable: Select the Best Variable for the split
4. Split the input data into Left and Right Nodes
5. Continue step 2-4 on each of the nodes until meet stopping criteria
6. Decision Tree Pruning : Steps to prune Decision Tree built

In the previous blog, we have explained the Gini Index calculation.

Data

We have considered a sample data for explaining the Decision Tree building process using CART algorithm. The data has a few variables and below is description of these variables.

Data Sample

Parent Node has below split of Target Variable.

Best Split for a Variable

We are considering Last Month Spend, Gender, Education and Last 3 month average spend for the decision tree building.  Let's first consider variable "Last Month Spend" for finding out the "Best Split Cut Off" for this variables.

List of potential cut values have to be identified first. One of the common approach is to find splits /cut off point is to take middle values. For this variable, the distinct values and then finding middle points.

Below is an example of find unique values and then taking average value of two adjacent distinct values  to find cut off point.

For each of these "Cut off Point" , we need to calculate Gini Index and Gini Index for a Split. The Gini Index Calculations are explained with worked out example - here.

Now we need to find cut off which has the highest Gini Gain(highest drop in Gini Index for a split or highest GiniGain) and pick up the split point as "Best Split Point" for the variable "Last Month Spend".

Similarly, we need to find the best split for each of the input independent variables.

On the next page, we will explain steps to find best split for a character /categorical variable.

2 thoughts on “CART Algorithm for Decision Tree”

1. opensam

"Now we need to find cut off which has the highest Gini Index and pick up the split point as "Best Split Point" for the variable "Last Month Spend"."

Shouldn't you chose the split with lowest GINI index?

• Thank you Shubham. I have updated the language. It is highest Gini Gain to select the best split and variable.