Extremely Randomized Trees (Extra-Trees) is a machine learning ensemble method that builds upon the Random Forest construction process. Unlike Random Forests, which search for the optimal split point at each node, Extra-Trees randomly selects both the candidate features and the split thresholds, creating more diversified trees. This approach trades a small increase in bias for lower variance, which can reduce overfitting and speed up training. The final prediction is made by averaging regression outputs or performing a majority vote for classification, similar to a Random Forest.
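As a rough illustration of how this looks in practice, here is a minimal sketch comparing scikit-learn's ExtraTreesClassifier with a RandomForestClassifier on a synthetic dataset; the dataset and parameter choices are illustrative only, not a tuned benchmark.

```python
# Minimal sketch: Extra-Trees vs. Random Forest on synthetic data
# (assumes scikit-learn is installed; numbers are illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification problem.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, random_state=0)

models = {
    "Extra-Trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```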
How Extra-Trees Work
- Ensemble of Trees: Extra-Trees creates a forest of multiple decision trees, similar to a Random Forest.
- Random Feature Selection: At each node, a random subset of features is selected for splitting, a technique also used in Random Forests.
- Random Threshold Selection: Unlike Random Forests, which search for the optimal split point, Extra-Trees draws a random threshold for each candidate feature and then picks the best split from this set of random thresholds (see the sketch after this list).
- Whole Dataset Use: Instead of using bootstrap samples, each tree in the Extra-Trees forest is typically trained on the entire dataset.
- Aggregation of Predictions: For regression, the final prediction is the average of the individual tree predictions. For classification, it is the majority vote among the trees.
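The random-threshold step described above can be sketched in a few lines of plain NumPy. This is a simplified, illustrative sketch only: the function name random_split and the scoring by variance reduction (a regression criterion) are assumptions for the example, not code from any library.

```python
import numpy as np

def random_split(X, y, n_candidate_features, rng):
    """Simplified Extra-Trees node split: draw one random threshold per
    candidate feature, then keep the best of those random splits
    (scored here by variance reduction, i.e. a regression criterion)."""
    n_samples, n_features = X.shape
    candidates = rng.choice(n_features, size=n_candidate_features, replace=False)
    best = None
    for f in candidates:
        lo, hi = X[:, f].min(), X[:, f].max()
        if lo == hi:                      # constant feature at this node: cannot split
            continue
        threshold = rng.uniform(lo, hi)   # random cut-point, not an optimized one
        left = X[:, f] <= threshold
        right = ~left
        if not left.any() or not right.any():
            continue
        # Variance reduction of the split, weighted by child sizes.
        child_var = (left.sum() * y[left].var() + right.sum() * y[right].var()) / n_samples
        score = y.var() - child_var
        if best is None or score > best[0]:
            best = (score, f, threshold)
    return best  # (score, feature index, threshold), or None if no valid split

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
print(random_split(X, y, n_candidate_features=3, rng=rng))
```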
Key Differences from Random Forests
- Randomness in Split Points:
- Extra Trees: Split thresholds are drawn at random and the best of these random candidates is kept, making the algorithm faster and simplifying the construction of each tree.
- Random Forest: Searches for the optimal split point for each node, which requires more computational effort but can potentially lead to better predictive performance.
- Bootstrapping:
- Extra Trees: No Bootstrapping. Each decision tree is trained on the entire original training dataset (unlike Random Forest, which uses bootstrap samples). This reduces the model’s overall bias.
- Random Forest: Uses bootstrap replicas, meaning each tree is trained on a random subsample of the original data with replacement.
- Bias and Variance:
- Extra Trees: The added randomness of random splits can increase the model’s bias but also helps to reduce variance, leading to better generalization.
- Random Forest: Searching for optimal splits keeps each tree's bias low, while bagging and averaging reduce variance relative to a single decision tree.
- Computational Cost:
- Extra Trees: Faster to train because no optimal split point calculation is performed (see the sketch after this list).
- Random Forest: More computationally expensive because it involves finding the best split point for each node.
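To make the bootstrapping and training-cost differences concrete, the sketch below fits both ensembles with scikit-learn, whose ExtraTreesRegressor defaults to bootstrap=False (whole dataset per tree) while RandomForestRegressor defaults to bootstrap=True; the dataset size and the resulting timings are illustrative only and will vary by machine.

```python
# Sketch of the practical differences between the two ensembles
# (assumes scikit-learn is installed; timings are illustrative only).
import time
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Synthetic regression data.
X, y = make_regression(n_samples=5000, n_features=50, noise=0.5, random_state=0)

for name, Model in [("Extra-Trees", ExtraTreesRegressor),
                    ("Random Forest", RandomForestRegressor)]:
    model = Model(n_estimators=200, random_state=0)
    start = time.perf_counter()
    model.fit(X, y)   # Extra-Trees usually fits faster: no optimal-split search
    elapsed = time.perf_counter() - start
    print(f"{name}: bootstrap={model.bootstrap}, fit time = {elapsed:.2f}s")
```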
When to Choose Which
- Choose Extra Trees if:
- You need a faster training time.
- You want a model that is more robust to noisy or irrelevant features.
- You want to reduce the need for extensive hyperparameter tuning.

