Abstract
Class imbalance is a persistent problem in machine learning that hinders accurate predictive analysis in many real-world applications. It arises when the number of instances in one class (or classes) is significantly smaller than the number of instances in another class (or classes). Recognizing the minority class adequately during classification is difficult because most learning algorithms are biased toward the majority class. The issue is further complicated by data difficulty factors embedded in such data. This paper presents a novel and effective ensemble-based method for dealing with the class imbalance problem. It is motivated by the idea of moving oversampling from the data level to the algorithm level: instead of increasing the number of minority instances in the data sets, the algorithm in this paper aims to "oversample the classification ensemble" by increasing the number of classifiers that represent the minority class in the ensemble, i.e., the random forest. The proposed biased random forest algorithm employs the nearest neighbor algorithm to identify the critical areas in a given data set. The standard random forest is then fed with more random trees generated based on the critical areas. The results show that the proposed algorithm is very effective in dealing with the class imbalance problem.
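The critical-area step described in the abstract could look roughly like the following minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the critical area precisely, so the sketch assumes it consists of every minority instance plus the `k` nearest majority instances of each one (Euclidean distance). The names `critical_area` and `split_ensemble`, and the parameters `k` and `p`, are hypothetical.

```python
import math


def critical_area(X, y, minority_label, k=3):
    """Return sorted indices of an assumed 'critical' subset of the data.

    Assumption (not stated in the abstract): the critical area is every
    minority instance plus its k nearest majority-class neighbours.
    """
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    critical = set(minority)
    for i in minority:
        # Rank majority instances by Euclidean distance to minority point i.
        nearest = sorted(majority, key=lambda j: math.dist(X[i], X[j]))[:k]
        critical.update(nearest)
    return sorted(critical)


def split_ensemble(n_trees, p):
    """Split a tree budget into (trees on full data, trees on critical area).

    p is a hypothetical ratio controlling how many trees are grown on the
    critical subset only.
    """
    n_critical = round(n_trees * p)
    return n_trees - n_critical, n_critical


# Toy data: five majority points (label 0) and one minority point (label 1).
X = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (6.0, 5.0), (0.15, 0.05)]
y = [0, 0, 0, 0, 0, 1]

idx = critical_area(X, y, minority_label=1, k=2)
n_full, n_crit = split_ensemble(100, 0.5)
print(idx, n_full, n_crit)
```

Under these assumptions, `n_full` trees would be grown on the whole data set and `n_crit` additional trees on the rows selected by `idx`, with all trees voting together as a single forest. Any decision-tree learner could supply the trees; that part is omitted here.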
Original language | English |
---|---|
Pages (from-to) | 2163-2172 |
Journal | IEEE Transactions on Neural Networks and Learning Systems |
Volume | 30 |
Issue number | 7 |
Early online date | 20 Nov 2018 |
Publication status | Published - 1 Jul 2019 |
Keywords
- Class Imbalance
- Classification
- Random Forest
- Nearest Neighbour
Datasets
- Data availability statement for 'Biased random forest for dealing with the class imbalance problem'.
Bader-El-Den, M. (Creator), Teitei, E. (Creator) & Perry, T. (Creator), IEEE Computational Intelligence Society, 16 Oct 2018
Dataset