Biased random forest for dealing with the class imbalance problem
Research output: Contribution to journal › Article
The class imbalance issue has been a persistent problem in machine learning that hinders the accurate predictive analysis of data in many real-world applications. The class imbalance problem exists when the number of instances present in a class (or classes) is significantly smaller than the number of instances belonging to another class (or classes). Sufficiently recognising the minority class during classification is difficult because most algorithms employed to learn from data are biased towards the majority class. The underlying issue is made more complex by the presence of data-difficulty factors embedded in such data. This paper presents a novel and effective ensemble-based method for dealing with the class imbalance problem. The study is motivated by the idea of moving oversampling from the data level to the algorithm level: instead of increasing the number of minority instances in the dataset, the proposed approach "oversamples the classification ensemble" by increasing the number of classifiers that represent the minority class in the ensemble, i.e. the random forest. The proposed Biased Random Forest (BRAF) algorithm employs the nearest-neighbour algorithm to identify the critical areas in a given dataset. The standard random forest is then fed with more random trees generated from those critical areas. The results show that the proposed algorithm is very effective in dealing with the class imbalance problem.
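A minimal sketch of the idea described above, using scikit-learn building blocks: a "critical area" is formed from the minority instances plus the k nearest majority neighbours of each minority instance, a second forest is trained on that subset, and the two forests are combined. The parameter names (`s` for total ensemble size, `p` for the fraction of trees devoted to the critical area, `k` for the neighbourhood size) and the exact combination rule are assumptions for illustration, not the paper's definitive implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

def braf_fit(X, y, s=100, p=0.5, k=10, minority_label=1, random_state=0):
    """Sketch of a biased random forest: a standard forest on the full
    data plus an extra forest trained only on the 'critical area'."""
    # Split the ensemble size between the two forests (assumed scheme).
    n_std = int(s * (1 - p))
    n_crit = s - n_std

    X_min = X[y == minority_label]
    X_maj = X[y != minority_label]
    y_maj = y[y != minority_label]

    # Critical area: all minority instances plus the k nearest
    # majority neighbours of each minority instance.
    nn = NearestNeighbors(n_neighbors=min(k, len(X_maj))).fit(X_maj)
    idx = np.unique(nn.kneighbors(X_min, return_distance=False).ravel())
    X_crit = np.vstack([X_min, X_maj[idx]])
    y_crit = np.concatenate([np.full(len(X_min), minority_label), y_maj[idx]])

    rf_std = RandomForestClassifier(
        n_estimators=n_std, random_state=random_state).fit(X, y)
    rf_crit = RandomForestClassifier(
        n_estimators=n_crit, random_state=random_state).fit(X_crit, y_crit)
    return rf_std, rf_crit

def braf_predict_proba(models, X):
    """Combine the two forests so each tree contributes one equal vote."""
    rf_std, rf_crit = models
    n_std, n_crit = rf_std.n_estimators, rf_crit.n_estimators
    return (n_std * rf_std.predict_proba(X)
            + n_crit * rf_crit.predict_proba(X)) / (n_std + n_crit)
```

Because roughly half the trees are trained only on the neighbourhood of the minority class, the combined vote is biased towards recognising minority instances without duplicating any data points.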
Journal: IEEE Transactions on Neural Networks and Learning Systems
State: Accepted for publication - 16 Oct 2018