Imbalanced classification using genetically optimized cost sensitive classifiers

Todd Perry, Mohamed Bader-El-Den, Steven Cooper

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Classification is one of the most researched problems in machine learning; since the 1960s, a myriad of different techniques have been proposed. The purpose of a classification algorithm, also known as a ‘classifier’, is to identify which class, or category, an observation belongs to. In many real-world scenarios, datasets suffer from class imbalance, where the observations belonging to one class greatly outnumber the observations belonging to the other classes. Class imbalance has been shown to hinder the performance of classifiers, and several techniques have been developed to improve the performance of classifiers on imbalanced data. Using a cost matrix is one such technique for dealing with class imbalance; however, it requires a matrix to be either pre-defined or manually optimized. This paper proposes an approach for automatically generating optimized cost matrices using a genetic algorithm. The genetic algorithm can generate matrices for classification problems with any number of classes, and is easy to tailor towards specific use-cases. The proposed approach is compared against unoptimized classifiers and alternative cost matrix optimization techniques using a variety of datasets. In addition, storage system failure prediction datasets are provided by Seagate UK, and the potential of these datasets is investigated.
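
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fitness measure (mean per-class recall), the mutation and crossover operators, the population settings, and all function names below are assumptions chosen for illustration. The sketch takes class-probability estimates from any classifier, predicts the class with the lowest expected cost under a cost matrix, and uses a small genetic algorithm to search for a matrix that improves recall across classes.

```python
import random

def predict(probs, cost):
    """Pick the class with minimum expected cost.
    probs[i] is the estimated P(true class = i);
    cost[i][j] is the cost of predicting j when the truth is i."""
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)

def fitness(cost, samples):
    """Mean per-class recall (an assumed fitness, not the paper's)."""
    n = len(cost)
    hits, totals = [0] * n, [0] * n
    for probs, label in samples:
        totals[label] += 1
        hits[label] += predict(probs, cost) == label
    recalls = [h / t for h, t in zip(hits, totals) if t]
    return sum(recalls) / len(recalls)

def random_matrix(n, rng):
    # Zero cost on the diagonal (correct predictions), random cost elsewhere.
    return [[0.0 if i == j else rng.uniform(1.0, 10.0) for j in range(n)]
            for i in range(n)]

def crossover(a, b, rng):
    # Uniform crossover: each entry is copied from one of the two parents.
    return [[rng.choice((x, y)) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mutate(cost, rng, scale=1.0):
    # Perturb one off-diagonal entry, keeping it strictly positive.
    child = [row[:] for row in cost]
    i, j = rng.sample(range(len(cost)), 2)  # two distinct indices, so i != j
    child[i][j] = max(0.01, child[i][j] + rng.gauss(0.0, scale))
    return child

def evolve(samples, n_classes, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_matrix(n_classes, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, samples), reverse=True)
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, samples))

# Toy imbalanced data: 90 majority (class 0) and 10 minority (class 1) samples,
# with probability estimates biased towards the majority class.
samples = [([0.9, 0.1], 0)] * 90 + [([0.7, 0.3], 1)] * 10
unit_cost = [[0.0, 1.0], [1.0, 0.0]]
best_cost = evolve(samples, n_classes=2)
```

Because the matrix is represented as an n-by-n list, the same sketch applies to problems with more than two classes; tailoring it to a use-case would mostly mean replacing `fitness`.
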
Original language: English
Title of host publication: 2015 IEEE Congress on Evolutionary Computation (CEC)
Pages: 680-687
Number of pages: 8
ISBN (Electronic): 978-1-4799-7492-4
Publication status: Published - Nov 2015
Event: IEEE Congress on Evolutionary Computation - Sendai, Japan
Duration: 25 Aug 2015 to 28 Aug 2015

Publication series

ISSN (Print): 1089-778X
ISSN (Electronic): 1941-0026

Conference

Conference: IEEE Congress on Evolutionary Computation
Country/Territory: Japan
City: Sendai
Period: 25/08/15 to 28/08/15

Keywords

  • drives
  • genomics
  • bioinformatics
  • sociology
  • statistics
  • genetic algorithms
