CART (Classification and Regression Trees) is a decision tree algorithm that uses the Gini index as its impurity measure. For a detailed explanation of the CART algorithm, see CART Decision Tree: Gini Index Explained.

### Binary Target Variable: Worked out Example

Consider an example where the target variable is binary. The summary table for such an example looks similar to the table below; here the two classes make up 74% and 26% of the observations.

$$
\begin{aligned}
\text{Gini Index} &= 1 - 0.74^{2} - 0.26^{2} \\
&= 1 - 0.5476 - 0.0676 \\
&= 0.3848
\end{aligned}
$$
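The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `gini_index` is our own choice for illustration and is not taken from any library.

```python
def gini_index(proportions):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("class proportions must sum to 1")
    return 1.0 - sum(p * p for p in proportions)


# Binary target with class proportions 0.74 and 0.26
binary_gini = gini_index([0.74, 0.26])
print(round(binary_gini, 4))  # 0.3848
```

A node with only one class has a Gini index of 0, and a 50/50 binary node has the maximum binary value of 0.5.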

### Nominal Target Variable: Worked out Example

When the target variable is nominal with more than two levels, the Gini index is calculated in the same way. The summary table for a nominal target looks similar to the table below. In India, many different cuisines are available and people have different food preferences. We have considered a few options, and the table below shows the proportion of people preferring each one. This is an illustrative example.

$$
\begin{aligned}
\text{Gini Index} &= 1 - 0.08^{2} - 0.13^{2} - 0.29^{2} - 0.50^{2} \\
&= 1 - 0.0064 - 0.0169 - 0.0841 - 0.2500 \\
&= 0.6426 \approx 0.643
\end{aligned}
$$
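In practice the Gini index is usually computed from raw class labels rather than from a table of proportions. The sketch below assumes a hypothetical sample of 100 people whose preferences match the proportions 0.08, 0.13, 0.29 and 0.50; the labels `"A"` to `"D"` are placeholders for the cuisine names, and `gini_from_labels` is an illustrative helper of our own.

```python
from collections import Counter


def gini_from_labels(labels):
    """Gini impurity computed from a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical sample: 100 people, placeholder cuisine labels A-D
labels = ["A"] * 8 + ["B"] * 13 + ["C"] * 29 + ["D"] * 50
cuisine_gini = gini_from_labels(labels)
print(round(cuisine_gini, 4))  # 0.6426
```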

**GINI of a split**

$$
\text{GINI}(s,t) = \text{GINI}(t) - P_{L}\,\text{GINI}(t_{L}) - P_{R}\,\text{GINI}(t_{R})
$$

Where

$s$ : split

$t$ : node

$\text{GINI}(t)$ : Gini index of the input node $t$

$P_{L}$ : proportion of observations in the left node after split $s$

$\text{GINI}(t_{L})$ : Gini index of the left node after split $s$

$P_{R}$ : proportion of observations in the right node after split $s$

$\text{GINI}(t_{R})$ : Gini index of the right node after split $s$
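The formula translates directly into code. The following is a small sketch under the assumption that the split is binary, so that the right-node proportion is one minus the left-node proportion; the function name `gini_of_split` is ours. The two calls show the extremes: a split that separates the classes perfectly, and a split that changes nothing.

```python
def gini_of_split(gini_parent, gini_left, gini_right, p_left):
    """GINI(s,t) = GINI(t) - P_L * GINI(t_L) - P_R * GINI(t_R), with P_R = 1 - P_L."""
    if not 0.0 <= p_left <= 1.0:
        raise ValueError("p_left must be between 0 and 1")
    p_right = 1.0 - p_left
    return gini_parent - p_left * gini_left - p_right * gini_right


# A 50/50 parent (Gini 0.5) split into two pure children (Gini 0 each)
perfect = gini_of_split(0.5, 0.0, 0.0, p_left=0.5)
# A split whose children have the same class mix as the parent
useless = gini_of_split(0.5, 0.5, 0.5, p_left=0.5)
print(perfect, useless)  # 0.5 0.0
```

A larger value means the child nodes are purer than the parent, so larger is better.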

**Example**

For example, banks and financial institutions grant a credit facility only after evaluating the credit risk involved. The credit risk in a credit decision is evaluated using a credit scorecard [Credit Score: What is it and how is it developed?]. There are also a few additional decisions involved in credit underwriting [Credit Underwriting: Minimize credit risk losses using Data Science and Analytics].

See also: Decision Tree: Non Technical Explanation

Customer performance on meeting credit obligations over the last 2 years is available to us. We want to understand which variable(s) explain the high risk of customers defaulting on a credit facility given to them.

The sample has 24 customers. To keep it simple, only customer age and gender are considered. Age is a continuous variable and gender is a nominal variable.

The input sample has 12 customers who have defaulted on the credit facility, so the default rate is 50%.

In other words, the input (parent) node has an equal number of "Yes" and "No" target values, and the total number of observations is 24.

The Gender variable is considered for splitting the node, and the Gini value of the split is calculated below.

The Gini index of the input node is

$$
\begin{aligned}
\text{GINI}(t) &= 1 - (1/2)^{2} - (1/2)^{2} \\
&= 1 - 0.25 - 0.25 \\
&= 0.5
\end{aligned}
$$

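Now we want to split the node based on the Gender variable. After the split we will have the following summary: the left node holds 8 customers (class counts 6 and 2) and the right node holds 16 customers (class counts 6 and 10).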

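Now, let's calculate the Gini value of the split on the Gender variable.

$$
\text{GINI}(s,t) = \text{GINI}(t) - P_{L}\,\text{GINI}(t_{L}) - P_{R}\,\text{GINI}(t_{R})
$$

$$
\begin{aligned}
\text{GINI}(t_{L}) &= 1 - (6/8)^{2} - (2/8)^{2} \\
&= 1 - 0.5625 - 0.0625 \\
&= 0.375
\end{aligned}
$$

$$
\begin{aligned}
\text{GINI}(t_{R}) &= 1 - (6/16)^{2} - (10/16)^{2} \\
&= 1 - 0.140625 - 0.390625 \\
&= 0.46875 \approx 0.469
\end{aligned}
$$

$$
\begin{aligned}
\text{GINI}(s,t) &= 0.5 - (8/24) \times 0.375 - (16/24) \times 0.46875 \\
&= 0.5 - 0.125 - 0.3125 \\
&= 0.0625
\end{aligned}
$$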
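The same result can be reproduced from the class counts in each child node. This is a minimal sketch; the helper names are our own, and the counts `(6, 2)` and `(6, 10)` are the ones used in the calculation above.

```python
def gini(counts):
    """Gini impurity from a sequence of class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def gini_split_from_counts(left, right):
    """GINI(s,t) for a binary split, given class counts in each child node."""
    parent = [l + r for l, r in zip(left, right)]  # parent counts are the sums
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    return gini(parent) - (n_left / n) * gini(left) - (n_right / n) * gini(right)


left_counts = (6, 2)    # 8 customers in the left node
right_counts = (6, 10)  # 16 customers in the right node
gender_gain = gini_split_from_counts(left_counts, right_counts)
print(round(gender_gain, 4))  # 0.0625
```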

Similarly, we compute the Gini value of the split for every candidate split point of a variable and keep the best one, that is, the split with the largest GINI(s,t). The best split point is found in this way for each variable, and the variable and split point with the best value overall are used to split the input node.
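For a continuous variable such as age, the search over split points could look like the sketch below. It rests on several assumptions of our own: the ages and default flags are made-up illustration data (not the 24-customer sample), candidate thresholds are taken as midpoints between consecutive distinct values, and the rule tested is `age <= threshold`. The function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_yes = sum(labels) / n
    return 1.0 - p_yes ** 2 - (1.0 - p_yes) ** 2


def best_threshold(values, labels):
    """Return (threshold, gain) with the largest GINI(s,t) for value <= threshold."""
    parent = gini(labels)
    n = len(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    # Candidate thresholds: midpoints between consecutive distinct values
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        if gain > best[1]:
            best = (thr, gain)
    return best


# Hypothetical data: ages and default flags (1 = defaulted), not from the article
ages = [22, 25, 28, 31, 35, 40, 46, 52]
defaulted = [1, 1, 1, 1, 0, 1, 0, 0]
threshold, gain = best_threshold(ages, defaulted)
print(threshold, round(gain, 3))  # 33.0 0.281
```

Running the same search for every variable and keeping the variable and threshold with the largest gain gives the split for the node.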

Could you please send me a sample example using the CART algorithm for a classification tree and another for a regression tree?

There is a long list of examples.

Classification Tree:

1: Whether to approve a credit card application or not

2: Whether a customer will take up a product or not

3: Whether a transaction is fraudulent or not

4: Whether a customer will close a telephone connection or not

Regression Tree:

1: Estimating spend value

2: Estimating house or car price