3.4.3. Random Forest
Introduced by [58], Random Forest is a supervised statistical machine learning technique used for both regression and classification. It is an ensemble learning method that averages the predictions of many decision trees to produce the final prediction. Averaging a statistical machine learning model in this way is called bagging, and it improves stability and reduces overfitting [59]. On their own, decision trees are not competitive with the best supervised learning approaches in terms of prediction accuracy, since they tend to have high variance and low bias: fitting two decision trees on different samples of the same data can yield two very different trees. Bagging is therefore well suited to decision trees, since it reduces this variance. The idea behind Random Forest is to draw B bootstrap samples from the training data set and build a separate decision tree on each of the B samples. The method is called Random Forest because, at every split while growing each tree, only a random subset of the input variables is considered. This decorrelates the individual trees, which lowers the overall variance of the ensemble even further [59].

A random forest ensemble was built with 100 combined trees. The batch size was set to 10 and the tree depth was left unlimited. The remaining settings for the random forest algorithm are given in Table 4.
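To make the ensemble construction concrete, the sketch below fits a random forest with the same tree count and unlimited depth described above. The source does not name the toolkit used; scikit-learn is assumed here, the synthetic data set and random seeds are placeholders rather than the study's data, and scikit-learn has no direct equivalent of the batch-size setting, so that parameter is omitted.

```python
# Minimal sketch of a comparable random forest setup (scikit-learn assumed;
# X, y, and the train/test split below are placeholder data, not the study's).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Placeholder data standing in for the training set.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 100 bagged trees, each grown to unlimited depth; at every split only a
# random subset of the input variables (sqrt of the feature count by default)
# is considered, which decorrelates the trees and reduces ensemble variance.
forest = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=42)
forest.fit(X_train, y_train)

print(f"Held-out accuracy: {forest.score(X_test, y_test):.3f}")
```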