Biased random forest for dealing with the class imbalance problem
The class imbalance problem has been a persistent issue in machine learning, hindering accurate predictive analysis of data in many real-world applications. Class imbalance exists when one class (or several classes) contains significantly fewer instances than another class (or classes). Adequately recognizing the minority class during classification is difficult because most algorithms used to learn from input data are biased toward the majority class. The problem becomes more complex when data difficulty factors are embedded in the input data. This paper presents a novel and effective ensemble-based method for dealing with the class imbalance problem. The work is motivated by the idea of moving oversampling from the data level to the algorithm level. Instead of increasing the number of minority instances in the data set, the proposed algorithm aims to "oversample the classification ensemble" by increasing the number of classifiers in the ensemble (i.e., the random forest) that represent the minority class. The proposed biased random forest algorithm uses the nearest neighbor algorithm to identify the critical areas in a given data set. The standard random forest is then supplemented with additional random trees generated from these critical areas. The results show that the proposed algorithm is very effective in dealing with the class imbalance problem.
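The idea described in the abstract can be illustrated with a minimal pure-Python sketch. The sketch rests on several assumptions that go beyond the abstract. The "critical area" is taken to be every minority instance plus its `k` nearest majority-class neighbours under Euclidean distance. A fixed ensemble `size` is split into `size - p*size` trees grown on the full data and `p*size` trees grown on the critical set. The trees are simplified Gini-split CART trees that consider random feature subsets. All names (`biased_random_forest`, `critical_set`, `predict`) and parameters (`size`, `p`, `k`, `max_depth`) are hypothetical illustrations, not the paper's code or defaults.

```python
import math
import random
from collections import Counter

# Illustrative sketch of the biased random forest idea (pure Python, no dependencies).
# All names and defaults (size, p, k, max_depth) are hypothetical, not the paper's.


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a small CART-style tree; each node tries a random subset of features."""
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)  # leaf: stores a class label
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None:  # no valid split on the sampled features
        return majority(y)
    _, f, t = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_feats, rng)

    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(left_idx), grow(right_idx))  # internal node


def predict_tree(node, x):
    """Follow (feature, threshold, left, right) nodes down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(X, y, n_trees, rng, max_depth=4):
    """Bagged trees with sqrt(n_features) candidate features per split."""
    n_feats = max(1, int(math.sqrt(len(X[0]))))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_feats, rng))
    return forest


def critical_set(X, y, minority, k):
    """Assumed critical area: each minority instance plus its k nearest majority neighbours."""
    majority_idx = [j for j, lab in enumerate(y) if lab != minority]
    keep = set()
    for i, lab in enumerate(y):
        if lab == minority:
            keep.add(i)
            nearest = sorted(majority_idx, key=lambda j: math.dist(X[i], X[j]))[:k]
            keep.update(nearest)
    keep = sorted(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


def biased_random_forest(X, y, minority, size=100, p=0.5, k=10, seed=0):
    """Standard trees on all data plus int(p * size) extra trees grown on the critical set."""
    rng = random.Random(seed)
    Xc, yc = critical_set(X, y, minority, k)
    n_critical = int(size * p)
    return (random_forest(X, y, size - n_critical, rng)
            + random_forest(Xc, yc, n_critical, rng))


def predict(forest, x):
    """Plain majority vote over every tree in the combined ensemble."""
    return majority([predict_tree(tree, x) for tree in forest])
```

For example, `forest = biased_random_forest(X, y, minority=1, size=50, p=0.5, k=5)` builds the combined ensemble, and `predict(forest, x)` classifies a point `x`. Because the extra trees are trained only on the region where minority and majority instances meet, they devote more of the ensemble's votes to the decision boundary around the minority class. This is the sense in which the ensemble, rather than the data set, is "oversampled".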
Journal: IEEE Transactions on Neural Networks and Learning Systems
Early online date: 30 Nov 2018
Publication status: Early online, 30 Nov 2018
Rights statement: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.