Abstract
Class imbalance is a problem that commonly affects 'real world' classification datasets, and has been shown to hinder the performance of classifiers. A dataset suffers from class imbalance when the instances belonging to one class outnumber the instances belonging to another class. Two ways of dealing with class imbalance are modifying the dataset to reduce the number of instances belonging to the majority class(es) (known as resampling), and allowing the classifier to penalize misclassifying the minority class(es) more than the majority class(es), which can be done by implementing a cost matrix. This paper attempts to improve the classification performance of the Random Forest classifier on imbalanced datasets by exploiting these two techniques; to do this, a genetic algorithm is employed to find optimal parameters. Results are compared to commonly used classification algorithms.
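The abstract does not state the chromosome encoding, genetic operators, or fitness measure used in the paper, so the following is only a minimal sketch of the general idea under stated assumptions. It shows random undersampling of the majority class, a decision rule driven by a cost matrix, and a small real-valued genetic algorithm that could search over the two parameters (the fraction of majority instances kept and the minority misclassification cost). The function names, the truncation selection, uniform crossover, and Gaussian mutation are all illustrative choices, and `fitness` is a hypothetical callable that, in practice, would train a Random Forest on the resampled data and return a validation score.

```python
import random


def undersample(X, y, majority_label, keep_fraction, rng):
    """Randomly keep only `keep_fraction` of the majority-class rows."""
    majority = [i for i, label in enumerate(y) if label == majority_label]
    n_keep = round(len(majority) * keep_fraction)
    kept = set(rng.sample(majority, n_keep))
    rows = [i for i, label in enumerate(y)
            if label != majority_label or i in kept]
    return [X[i] for i in rows], [y[i] for i in rows]


def min_cost_class(probs, cost):
    """Pick the class with the lowest expected cost.

    cost[i][j] is the cost of predicting class j when the true class is i.
    """
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)


def genetic_search(fitness, bounds, rng, pop_size=20, generations=30,
                   mutation_scale=0.1):
    """Maximise `fitness` over real-valued genomes inside `bounds`.

    Assumed operators (not taken from the paper): truncation selection,
    uniform crossover, Gaussian mutation, and one-genome elitism.
    """
    def clip(value, lo, hi):
        return max(lo, min(hi, value))

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # keep the better half
        children = [ranked[0]]                 # elitism: carry over the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [clip(g + rng.gauss(0, mutation_scale * (hi - lo)), lo, hi)
                     for g, (lo, hi) in zip(child, bounds)]   # Gaussian mutation
            children.append(child)
        population = children
    return max(population, key=fitness)
```

A genome here would be a pair such as `[keep_fraction, minority_cost]`, with bounds like `[(0.05, 1.0), (1.0, 10.0)]`. Because each fitness evaluation would involve training a classifier, caching scores per genome is worth considering in a real run.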
Original language | English |
---|---|
Title of host publication | GECCO Companion '15 |
Subtitle of host publication | proceedings of the companion publication of the 2015 on genetic and evolutionary computation conference |
Place of Publication | New York |
Publisher | ACM |
Pages | 1453-1454 |
ISBN (Print) | 978-1450334884 |
Publication status | Published - 2015 |
Event | Genetic and Evolutionary Computation Conference - Madrid, Spain; Duration: 11 Jul 2015 → 15 Jul 2015 |
Conference
Conference | Genetic and Evolutionary Computation Conference |
---|---|
Country/Territory | Spain |
City | Madrid |
Period | 11/07/15 → 15/07/15 |
Keywords
- random forest
- genetic algorithms
- classification
- cost-sensitive classification
- cost matrix