A decision tree is a hierarchical, tree-like representation of decisions. It is built by recursively breaking the input data (a node) into two or more subsets (child nodes). This recursive partitioning continues until a specified stopping condition is met.

**How is a decision tree built?**

Several impurity measures are used in building a decision tree. The top three are:

- Logworth: CHAID based Decision Tree
- Information Gain: ID3 or C5.0 Decision Tree
- Gini Index: CART Decision Tree

In the previous blog, we explained the Gini Index impurity measure. In this blog, the focus is on the Entropy-based impurity measure and Information Gain.

## Entropy

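**Entropy** is calculated from the proportions of the target values in a node. The formula is as follows:

$$\text{Entropy} = -\sum_{i=1}^{k} p_i \log_2(p_i)$$

Here $p_i$ is the proportion of observations in the node that belong to target level $i$, and $k$ is the number of target levels. Note that the logarithm is base 2.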

Let us work through an Entropy calculation with an example. The input data has a binary target variable (the target has two levels). In this target variable, Yes accounts for 74% of the values and No for 26%.

Entropy term for Target = Yes: T1 = -0.74 × log2(0.74)

T1 = -0.74 × (-0.434402824)

T1 = 0.32145809

Entropy term for Target = No: T2 = -0.26 × log2(0.26)

T2 = -0.26 × (-1.943416472)

T2 = 0.505288283

The entropy of the node is the sum of the two terms:

Entropy = T1 + T2

Entropy = 0.826746372
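The arithmetic above can be checked in a few lines of R. This is only a minimal sketch; the names `p`, `terms` and `node_entropy` are chosen here for illustration.

```r
# Class proportions from the example: 74% Yes, 26% No
p <- c(Yes = 0.74, No = 0.26)

# Contribution of each class to the entropy: -p * log2(p)
terms <- -p * log2(p)
print(round(terms, 4))         # Yes 0.3215, No 0.5053

# The node entropy is the sum of the per-class terms
node_entropy <- sum(terms)
print(round(node_entropy, 4))  # 0.8267
```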

For a binary target variable, the maximum value of Entropy is 1 and the minimum value is 0. Entropy is at its maximum when a node is most impure (for a binary target, 50% of the observations in each target level), and it is 0 when every observation in the node has the same target value.
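To see these bounds numerically, here is a small sketch. The helper `binary_entropy` is a hypothetical name introduced for this illustration; it takes the proportion of one target level and treats 0 × log2(0) as 0, an assumption needed so that a pure node returns 0 instead of `NaN`.

```r
# Entropy (base 2) of a binary node, given the proportion p of one class
binary_entropy <- function(p) {
  probs <- c(p, 1 - p)
  probs <- probs[probs > 0]   # drop empty classes: 0 * log2(0) is taken as 0
  -sum(probs * log2(probs))
}

binary_entropy(0.5)   # most impure node: 1
binary_entropy(1)     # pure node: 0
binary_entropy(0.74)  # the 74% / 26% example: about 0.8267
```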

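```r
# Entropy (base 2) of a vector of target values
entropy <- function(target){
  # Proportion of each target level
  freq <- data.frame(prop.table(table(target)))
  # Per-level entropy term: -p * log2(p)
  freq$entropy <- -log2(freq$Freq)*freq$Freq
  sum(freq$entropy)
}

# 16 ones and 14 zeros: close to a 50/50 mix
bino <- c(rep(1,16),rep(0,14))
entropy(bino)
## [1] 0.9967916
```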

## Information Gain

Entropy measures the impurity of a node. In building a decision tree, two important decisions must be made: which variable is best for splitting a node, and what the best split(s) on that variable are.

The Information Gain criterion helps in making these decisions. Child nodes are created using the value(s) of an independent variable. To calculate the information gain of a split, we need the Entropy of the parent node and of each child node. The variable with the highest information gain is selected for the split.

Example: As discussed in the Gini Index calculation, we have an independent variable, Gender, which takes two values: “Male” and “Female”. Splitting on Gender therefore creates two child nodes, one for each gender.

To calculate the information gain for Gender, we need the Entropy of the parent node and of the child nodes.

**Entropy (Parent)**

The parent node holds all 24 observations: 8 with Target = Yes (2 Male + 6 Female) and 16 with Target = No (6 Male + 10 Female).

P0 = p(target=No) = Proportion of Target values No = 16/24 = 0.6667

P1 = p(target=Yes) = Proportion of Target values Yes = 8/24 = 0.3333

Entropy (Parent) = -P0 log2(P0) - P1 log2(P1) = 0.3899750 + 0.5283208 = 0.9182958

**Entropy (Gender=Male)**

Count of Target value Yes: 2. Count of Target value No: 6.

P0 = p(target=Yes) = 2/8 = 0.25, P1 = p(target=No) = 6/8 = 0.75

Entropy (Gender=Male) = -0.25 log2(0.25) - 0.75 log2(0.75) = 0.8112781

**Entropy (Gender=Female)**

Count of Target value Yes: 6. Count of Target value No: 10.

P0 = p(target=Yes) = 6/16 = 0.375, P1 = p(target=No) = 10/16 = 0.625

Entropy (Gender=Female) = -0.375 log2(0.375) - 0.625 log2(0.625) = 0.9544340

**Information Gain** = Entropy of Parent - Weighted Entropy of Children Nodes = Entropy of Parent - % Males in Parent Node × Entropy of Male Node - % Females in Parent Node × Entropy of Female Node

The parent node has 33.33% Males (8 of 24) and 66.67% Females (16 of 24).

Information Gain = 0.9182958 - (8/24) × 0.8112781 - (16/24) × 0.9544340 = 0.0115805

So the gain for the Gender variable is 0.0115805. Similarly, we need to calculate the gains for the other independent variables in order to select the best variable for a split in the decision tree building process.
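The same calculation can be scripted directly from the class counts. The sketch below rebuilds the data from the counts quoted for the Male node (Yes: 2, No: 6) and the Female node (Yes: 6, No: 10), uses base-2 logarithms, and weights each child by its share of the 24 observations. The function name `info_gain` and the vectors `gender` and `target` are assumptions made for this illustration.

```r
# Entropy (base 2) of a vector of class labels
entropy <- function(target) {
  p <- as.numeric(prop.table(table(target)))
  p <- p[p > 0]               # 0 * log2(0) is taken as 0
  -sum(p * log2(p))
}

# Information gain from splitting `target` by the levels of `feature`
info_gain <- function(target, feature) {
  weights <- prop.table(table(feature))              # share of rows per child
  child_entropy <- tapply(target, feature, entropy)  # entropy of each child
  weighted <- sum(as.numeric(weights) *
                  as.numeric(child_entropy[names(weights)]))
  entropy(target) - weighted
}

# Data rebuilt from the counts in the text
gender <- c(rep("Male", 8), rep("Female", 16))
target <- c(rep("Yes", 2), rep("No", 6),    # Male node
            rep("Yes", 6), rep("No", 10))   # Female node

info_gain(target, gender)   # about 0.0116 for these counts
```

Calling `info_gain()` once per candidate variable and picking the largest value mirrors the selection step described above.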

One limitation of Information Gain is that it tends to select a categorical variable with a larger number of levels. To correct for this, a revised measure called Gain Ratio was developed.

An advantage of the Entropy (Information Gain) based impurity measure is that it can be used with a multi-class dependent variable.
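As a rough illustration of this bias, the sketch below compares Gender with a made-up identifier column that has one level per row. It assumes the usual definition of the gain ratio (information gain divided by the entropy of the split variable itself); `gain_ratio`, `info_gain` and `row_id` are hypothetical names, and the data are rebuilt from the Gender counts in the example.

```r
# Entropy (base 2) of a vector of labels
entropy <- function(x) {
  p <- as.numeric(prop.table(table(x)))
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Information gain from splitting `target` by `feature`
info_gain <- function(target, feature) {
  weights <- prop.table(table(feature))
  child_entropy <- tapply(target, feature, entropy)
  entropy(target) - sum(as.numeric(weights) *
                        as.numeric(child_entropy[names(weights)]))
}

# Gain ratio: information gain divided by the entropy of the split itself
# (undefined when `feature` has a single level, since that entropy is 0)
gain_ratio <- function(target, feature) {
  info_gain(target, feature) / entropy(feature)
}

target <- c(rep("Yes", 2), rep("No", 6), rep("Yes", 6), rep("No", 10))
gender <- c(rep("Male", 8), rep("Female", 16))
row_id <- as.character(seq_along(target))  # hypothetical: 24 unique levels

info_gain(target, row_id)   # equals entropy(target): every child is pure
info_gain(target, gender)   # much smaller
gain_ratio(target, row_id)  # divided by log2(24), so heavily reduced
gain_ratio(target, gender)
```

On this tiny data set the penalty shrinks the advantage of the many-level variable but does not remove it, which is one reason such identifier-like columns are usually excluded before building a tree.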
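For a target with more than two levels the same formula applies without change, and the maximum entropy becomes log2(k) for k equally frequent levels rather than 1. A short sketch with a made-up three-level target (the vector names are hypothetical):

```r
# Entropy (base 2) of a vector of class labels
entropy <- function(target) {
  p <- as.numeric(prop.table(table(target)))
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Three equally frequent classes: the most impure three-class node
three_class <- c(rep("A", 10), rep("B", 10), rep("C", 10))
entropy(three_class)   # log2(3), about 1.585

# A skewed three-class node is less impure
skewed <- c(rep("A", 24), rep("B", 3), rep("C", 3))
entropy(skewed)        # lower than log2(3)
```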

**References**

http://homes.cs.washington.edu/~shapiro/EE596/notes/InfoGain.pdf

Nicely explained. Entropy and Information Gain are very fundamental concepts, and this is quite helpful for beginners.

How do you calculate entropy for continuous variables?

Hello,

Good post!

Haven't you swapped the Male and Female terms in the Information Gain explanation? Look at the picture and the titles of the explanations.

Regards