Impurity and information criteria

Information criteria are statistical measures used to evaluate and compare models, including decision trees, based on how well they fit the data while penalizing model complexity. In decision trees, such criteria are applied during node splitting to determine the best feature and split point. Two common criteria used in this context are Gini impurity and entropy.

  1. Gini Impurity:

  • Definition: Gini impurity is a measure of impurity or disorder in a set of data points. For a given node in a decision tree, the Gini impurity is the probability that a randomly chosen data point from that node would be misclassified if it were labeled at random according to the node's class distribution.

  • Mathematical Formulation: For a binary classification problem with two classes (0 and 1), the Gini impurity for a node is calculated as follows: \(Gini(p) = 1 - (p_0^2 + p_1^2)\) where \(p_0\) is the proportion of data points belonging to class 0, and \(p_1\) is the proportion belonging to class 1.

  • Splitting Criterion: When splitting a node in a decision tree, the Gini impurity is used to evaluate the impurity of the potential child nodes created by different choices of feature and split point. The split that yields the lowest weighted Gini impurity of the child nodes (equivalently, the largest impurity decrease) is chosen; see the sketch just below.
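
As a concrete illustration, the short sketch below computes this quantity directly from a node's binary labels. The helper name gini_impurity and the use of NumPy are illustrative choices, not tied to any particular library.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a node, given its array of binary class labels (0/1)."""
    labels = np.asarray(labels)
    p1 = labels.mean()            # proportion of class 1
    p0 = 1.0 - p1                 # proportion of class 0
    return 1.0 - (p0 ** 2 + p1 ** 2)

# A pure node has impurity 0; a perfectly mixed binary node has impurity 0.5.
print(gini_impurity([0, 0, 0, 0]))   # 0.0
print(gini_impurity([0, 0, 1, 1]))   # 0.5
```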

  2. Entropy:

  • Definition: Entropy measures the level of disorder or uncertainty in a set of data points. In the context of decision trees, entropy quantifies the uncertainty in class labels at a node.

  • Mathematical Formulation: For a binary classification problem, the entropy for a node is calculated as follows: \(Entropy(p) = -p_0 \log_2(p_0) - p_1 \log_2(p_1) \) where \(p_0\) and \(p_1\) are the proportions of data points in class 0 and class 1, respectively.

  • Splitting Criterion: Similar to Gini impurity, entropy is used as a criterion for selecting the best split during node splitting in a decision tree. The split that yields the lowest weighted entropy of the child nodes (equivalently, the largest information gain) is chosen; see the sketch just below.
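
Analogously, a minimal sketch of the entropy computation, using the convention that \(0 \log_2 0 = 0\); the helper name entropy is again only illustrative.

```python
import numpy as np

def entropy(labels):
    """Entropy (in bits) of a node, given its array of binary class labels (0/1)."""
    labels = np.asarray(labels)
    p1 = labels.mean()
    result = 0.0
    for p in (1.0 - p1, p1):
        if p > 0:                 # convention: 0 * log2(0) = 0
            result -= p * np.log2(p)
    return result

# A pure node has entropy 0; a perfectly mixed binary node has entropy 1 bit.
print(entropy([1, 1, 1, 1]))     # 0.0
print(entropy([0, 0, 1, 1]))     # 1.0
```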

  3. Decision Tree Splitting:

  • When building a decision tree, the goal is to find the feature and split point (threshold) that maximizes the reduction in impurity or entropy.

  • The information gain, or reduction in impurity, is computed as the parent node's impurity minus the weighted average impurity of its child nodes, and is used to determine the best split (see the sketch after this list).

  • The feature and split point that result in the greatest information gain are chosen as the splitting criteria for the current node.

  • This process is applied recursively for each internal node, leading to the construction of a binary tree.
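
The sketch below puts these pieces together for a single numeric feature: it evaluates the information gain of every candidate threshold and returns the best one. The helper names information_gain and best_split are illustrative rather than library functions, and Gini impurity is used as the impurity measure; entropy could be substituted directly.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a node (binary labels 0/1)."""
    p1 = np.asarray(labels).mean()
    return 1.0 - ((1.0 - p1) ** 2 + p1 ** 2)

def information_gain(parent, left, right, impurity=gini_impurity):
    """Parent impurity minus the weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

def best_split(feature, labels, impurity=gini_impurity):
    """Scan candidate thresholds on one numeric feature and return the best one."""
    feature, labels = np.asarray(feature), np.asarray(labels)
    best_threshold, best_gain = None, 0.0
    for threshold in np.unique(feature)[:-1]:   # each unique value is a candidate
        left, right = labels[feature <= threshold], labels[feature > threshold]
        gain = information_gain(labels, left, right, impurity)
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain

x = np.array([2.0, 3.0, 10.0, 19.0])   # one numeric feature
y = np.array([0, 0, 1, 1])             # binary labels
print(best_split(x, y))                # (3.0, 0.5): splitting at 3.0 separates the classes
```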

In summary, information criteria such as Gini impurity and entropy play a crucial role in the splitting process of decision trees. They help evaluate the quality of splits and guide the tree construction by selecting the feature and split point that result in the most significant reduction in impurity or entropy, ultimately leading to a more interpretable and generalizable model.
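
In practice these criteria are rarely coded by hand. For example, scikit-learn's DecisionTreeClassifier exposes them through its criterion parameter ("gini" or "entropy"), so the two measures can be compared directly on the same data; the dataset and hyperparameters below are arbitrary choices for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the same data once with each impurity criterion and compare test accuracy.
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=3, random_state=0)
    tree.fit(X_train, y_train)
    print(criterion, tree.score(X_test, y_test))
```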