Biased random forest for dealing with the class imbalance problem

Mohamed Bader-El-Den, Eleman Teitei, Todd Perry

Research output: Contribution to journal › Article › peer-review

Abstract

The class imbalance issue has been a persistent problem in machine learning that hinders the accurate predictive analysis of data in many real-world applications. The class imbalance problem exists when the number of instances in one class (or classes) is significantly smaller than the number of instances belonging to another class (or classes). Adequately recognizing the minority class during classification is difficult because most learning algorithms are biased toward the majority class. The underlying issue is made more complex by data difficulty factors embedded in such data. This paper presents a novel and effective ensemble-based method for dealing with the class imbalance problem. It is motivated by the idea of moving oversampling from the data level to the algorithm level: instead of increasing the number of minority instances in the data sets, the algorithm in this paper aims to "oversample the classification ensemble" by increasing the number of classifiers that represent the minority class in the ensemble, i.e., the random forest. The proposed biased random forest algorithm employs the nearest neighbor algorithm to identify the critical areas in a given data set. The standard random forest is then fed with more random trees generated based on the critical areas. The results show that the proposed algorithm is very effective in dealing with the class imbalance problem.
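
The nearest-neighbor step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the critical areas are built or how the ensemble is divided, so everything here is an assumption made for illustration: the critical area is taken to be every minority instance plus its k nearest majority-class neighbors under Euclidean distance, and the tree budget is split by a fixed share. The names critical_area, split_tree_budget, k, and critical_share are hypothetical and do not come from the paper.

```python
import math


def critical_area(X, y, minority_label, k=3):
    """Return sorted indices of an assumed 'critical area'.

    Assumption (not taken from the abstract): the critical area is every
    minority instance plus, for each one, its k nearest majority-class
    neighbours under Euclidean distance.
    """
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    critical = set(minority)
    for i in minority:
        # Rank all majority instances by distance to this minority instance.
        nearest = sorted(majority, key=lambda j: math.dist(X[i], X[j]))[:k]
        critical.update(nearest)
    return sorted(critical)


def split_tree_budget(total_trees, critical_share):
    """Split a tree budget into (full-data trees, critical-area trees).

    critical_share is a hypothetical parameter: the fraction of trees
    that would be trained on the critical area only.
    """
    n_critical = round(total_trees * critical_share)
    return total_trees - n_critical, n_critical


# Toy data: one minority point (label 1) next to two majority points.
X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.2, 0.1), (9.0, 9.0)]
y = [1, 0, 0, 0, 0, 0]

indices = critical_area(X, y, minority_label=1, k=2)
n_full, n_critical = split_tree_budget(100, 0.3)
print(indices, n_full, n_critical)
```

Under these assumptions, a forest would then be assembled from n_full trees trained on the whole data set and n_critical trees trained only on the rows selected by critical_area, which is one way to read "fed with more random trees generated based on the critical areas".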

Original language: English
Pages (from-to): 2163-2172
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 30
Issue number: 7
Early online date: 20 Nov 2018
Publication status: Published - 1 Jul 2019

Keywords

  • Class Imbalance
  • Classification
  • Random Forest
  • Nearest Neighbour
