How do decision trees work?

Madhuresh Gupta · Published in Analytics Vidhya · 4 min read · May 17, 2020

Today I will be explaining one of the most commonly used data classification algorithms: decision trees. In this type of classification we split the data on one feature, and then keep splitting on further features until we reach an end outcome. I am going to explain using ID3 (Iterative Dichotomiser 3), which uses the Entropy and Information Gain metrics to decide the nodes (features) and leaves (end outputs).

The main idea is to divide the whole dataset into the form of an inverted tree and reach a single outcome at the end of every branch.

So how do we build one?

To start the tree, we need to pick the one attribute (feature) that gives the best split of the data (the most information gain), so that we can reach the end of each branch in the fewest possible iterations.

Let me explain by taking a classic “Play tennis” data example:

Here we need to decide whether we can play tennis or not based on the weather conditions given for any particular day.

As we can see, we have 4 attributes (Outlook, Temperature, Humidity and Windy) based on which we decide whether to go and play tennis or not. So let’s build a tree using the ID3 algorithm!

To start with the root node, we need to pick the best attribute/feature to classify on.

How to select the best classifying attribute?

Here, we use the Information Gain formula and calculate it for every attribute. The attribute that gives the highest gain is selected as our root node.

There are primarily 3 formulae used in the ID3 algorithm:

Information Gain of the table (its entropy):
E(S) = Σ −pᵢ · log₂(pᵢ), summed over the outcome classes, where pᵢ is the proportion of rows with outcome i.

Entropy of an attribute (the weighted entropy of the outcome after splitting on that attribute):
E(S, A) = Σ (|Sᵥ| / |S|) · E(Sᵥ), summed over the values v of attribute A, where Sᵥ is the subset of rows with A = v.

Gain:
Gain(S, A) = E(S) − E(S, A)
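Written out in code, the entropy formula is only a few lines. Here is a minimal Python sketch; the 9 “yes” / 5 “no” split used in the example is the classic play-tennis table, not something shown explicitly in this post:

```python
import math
from collections import Counter

def entropy(labels):
    """E(S) = sum over outcome classes of -p_i * log2(p_i)."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

# Assuming the classic play-tennis table with 9 "yes" and 5 "no" days:
print(entropy(["yes"] * 9 + ["no"] * 5))  # ~0.940
```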

And these are the steps we follow:

  1. First calculate the information gain of the table.
  2. Calculate the Entropy for that attribute.
  3. Now calculate the Gain for the corresponding attribute.
  4. Take the one which has the highest gain to be the node.
  5. Repeat the whole process, computing a new set of information gain, entropy and gain on each subset of rows left after filtering on the chosen attribute’s values (a code sketch of these steps follows).
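To make steps 2 to 4 concrete, here is a rough Python sketch building on the entropy() helper above. The rows are assumed to be dictionaries keyed by attribute name, with the outcome stored under a hypothetical "play" key; this is only an illustration, not the implementation from my repository:

```python
def attribute_entropy(rows, attribute, target="play"):
    """Step 2: weighted entropy of the outcome after splitting on one attribute."""
    total = len(rows)
    result = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        result += (len(subset) / total) * entropy(subset)
    return result

def gain(rows, attribute, target="play"):
    """Step 3: Gain(S, A) = E(S) - E(S, A)."""
    return entropy([row[target] for row in rows]) - attribute_entropy(rows, attribute, target)

def best_attribute(rows, attributes, target="play"):
    """Step 4: the attribute with the highest gain becomes the node."""
    return max(attributes, key=lambda a: gain(rows, a, target))
```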

Following these steps, I have calculated the gains for all four attributes:

Gain(Outlook) = 0.248
Gain(Temperature) = 0.029
Gain(Humidity) = 0.151
Gain(Wind) = 0.048

As we can see, the highest gain is for the “Outlook” attribute, so it becomes the root node of our decision tree, with one branch for each of its values (Sunny, Overcast, Rain).

Now we repeat the same steps on a subset of the table for each value of Outlook, to see which attribute should be used to carry the decision tree further.

First, consider the subset of rows we are left with when we take “Sunny” as the outlook.

On this subset we calculate the Entropy and Gain for the remaining attributes. These are the Gains for the attributes in the Sunny table:

Gain(Temperature) = 0.57
Gain(Humidity) = 0.970
Gain(Wind) = 0.020

As we can see, the highest gain is obtained by the Humidity attribute, and hence it becomes our next node for classification. Also, if we look at the Overcast outlook, we observe that all the outcomes are “yes”, so we can end that branch right there (this is a leaf).
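Step 5, repeating the process on each filtered subset, is naturally written as recursion. A rough sketch using the helpers above (again just an illustration under the same assumptions):

```python
def id3(rows, attributes, target="play"):
    """Build the tree as nested dicts; a plain string at a leaf is the final answer."""
    labels = [row[target] for row in rows]
    # Leaf: every remaining row agrees (e.g. the Overcast branch is all "yes"),
    # or there is nothing left to split on.
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return max(set(labels), key=labels.count)
    best = best_attribute(rows, attributes, target)
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best], target)
    return tree
```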

Thus the tree now looks like this: Outlook at the root, with Humidity as the next node under the Sunny branch and a “Yes” leaf for Overcast.

By now you probably have an idea of how we proceed with the Outlook-Rain table. After performing the above steps again on that subset, Wind has the highest gain, and with that the complete decision tree is ready.
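Written as the kind of nested dictionary the id3() sketch above returns, the finished tree and a small lookup function would look roughly like this (the value names, e.g. “Strong”/“Weak” for Wind, are assumptions based on the classic version of the dataset):

```python
tennis_tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(tree, row):
    """Follow the branches matching the row until a leaf label is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][row[attribute]]
    return tree

print(classify(tennis_tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
```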

I have also implemented the above algorithm in Python; if you want to see how it was done, check out my GitHub repository here.

I hope by now you have a pretty clear understanding of how decision trees work and how they are built.

Thanks!

Originally published at http://madhureshgupta.home.blog on May 17, 2020.
