TY - GEN
T1 - Imbalanced classification using genetically optimized cost sensitive classifiers
AU - Perry, Todd
AU - Bader-El-Den, Mohamed
AU - Cooper, Steven
PY - 2015/11
Y1 - 2015/11
N2 - Classification is one of the most researched problems
in machine learning; since the 1960s, a myriad of different
techniques have been proposed. The purpose of a classification
algorithm, also known as a ‘classifier’, is to identify which class, or
category, an observation belongs to. In many real-world scenarios,
datasets tend to suffer from class imbalance, where the number
of observations belonging to one class greatly outnumbers that of
the observations belonging to other classes. Class imbalance has
been shown to hinder the performance of classifiers, and several
techniques have been developed to improve the performance of
classifiers on imbalanced data. Using a cost matrix is one such technique
for dealing with class imbalance; however, it requires the matrix
to be either pre-defined or manually optimized. This paper
proposes an approach for automatically generating optimized cost
matrices using a genetic algorithm. The genetic algorithm can
generate matrices for classification problems with any number
of classes, and is easy to tailor towards specific use-cases. The
proposed approach is compared against unoptimized classifiers
and alternative cost matrix optimization techniques using a
variety of datasets. In addition, the potential of storage system
failure prediction datasets provided by Seagate UK is
investigated.
KW - drives
KW - genomics
KW - bioinformatics
KW - sociology
KW - statistics
KW - genetic algorithms
U2 - 10.1109/CEC.2015.7256956
DO - 10.1109/CEC.2015.7256956
M3 - Conference contribution
SN - 978-1479974924
SP - 680
EP - 687
BT - 2015 IEEE Congress on Evolutionary Computation (CEC)
T2 - IEEE Congress on Evolutionary Computation
Y2 - 25 August 2015 through 28 August 2015
ER -