Classifiers accuracy improvement based on missing data imputation

Ivan Jordanov, Nedyalko Petrov, Alessio Petrozziello

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper we further investigate and extend our previous work on radar signal classification and source identification using three classification methods: Neural Networks (NN), Support Vector Machines (SVM), and Random Forests (RF). The dataset used in this task consists of pulse train characteristics such as signal frequencies, type of modulation, pulse repetition intervals, scanning type and scan period, represented as a mixture of continuous, discrete and categorical data. In addition, a considerable part of the data samples contains missing values. In our previous work, we used only part of the radar dataset, obtained by listwise deletion of the samples with missing values, and therefore processed a relatively small subset of complete data. To deal with the missing data, we investigate three imputation techniques: multiple imputation (MI), K-Nearest Neighbour Imputation (KNNI) and Bagged Tree Imputation (BTI). We apply these methods to all data samples with up to 60% missingness, thereby more than doubling the size of the initially used data subsets. The performance of the imputation models is assessed with Wilcoxon’s test for statistical significance and Cohen’s effect size metric. We employ the three classifiers (NN, SVM and RF) and critically analyse which imputation method contributes most to improving the classifiers’ performance, using a multiclass classification accuracy metric based on the area under the ROC curves. We consider two superclasses (‘military’ and ‘civil’), each containing several subclasses, and propose two new metrics, inner class accuracy (IA) and outer class accuracy (OA), in addition to the overall classification accuracy (OCA) metric. We conclude that IA and OA can be used as complements to the OCA when choosing the best classifier for the problem at hand.
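To illustrate the general idea behind K-Nearest Neighbour Imputation combined with a missingness threshold, here is a minimal, generic sketch. It is not the authors' implementation. It assumes purely numeric features, with missing values encoded as `None`, and it reads "up to 60% missingness" as a per-sample fraction of missing features. Distances use only the features observed in both samples, rescaled to the full feature count (the "nan-Euclidean" convention, also used by scikit-learn's `KNNImputer`). A missing entry is filled with the mean of that feature over the `k` nearest donors. The function names `missing_fraction`, `distance` and `knn_impute` are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # one sample; None marks a missing value


def missing_fraction(row: Row) -> float:
    """Fraction of features that are missing in a sample."""
    return sum(v is None for v in row) / len(row)


def distance(a: Row, b: Row) -> Optional[float]:
    """Euclidean distance over co-observed features, rescaled to full dimensionality."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None  # no common observed features: samples are not comparable
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(data: List[Row], k: int = 3, max_missing: float = 0.6) -> List[Row]:
    """Drop samples above the missingness threshold, then fill gaps from k nearest donors."""
    kept = [row for row in data if missing_fraction(row) <= max_missing]
    imputed: List[Row] = []
    for i, row in enumerate(kept):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other kept samples (original values, not imputed ones) that observe feature j.
            candidates = []
            for m, other in enumerate(kept):
                if m == i or other[j] is None:
                    continue
                d = distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            donors = candidates[:k]
            if donors:  # if no donor exists, the value stays missing
                new_row[j] = sum(v for _, v in donors) / len(donors)
        imputed.append(new_row)
    return imputed
```

For example, `knn_impute([[1.0, 2.0, 3.0], [1.1, 2.1, None], [0.9, 1.9, 2.9], [10.0, 20.0, 30.0], [None, None, 5.0]], k=2)` drops the last sample (two of three features missing, above 60%). It fills the gap in the second sample with the mean of its two closest donors, 3.0 and 2.9. Categorical features such as modulation type would need a different distance and a mode rather than a mean, which this sketch does not handle.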
Original language: English
Pages (from-to): 31-48
Number of pages: 15
Journal: Journal of Artificial Intelligence and Soft Computing Research
Volume: 8
Issue number: 1
Early online date: 1 Nov 2017
Publication status: Published, 1 Jan 2018

Keywords

  • Radar signal classification
  • Machine learning
  • Missing data
  • Model-based imputation
  • Neural networks
  • Random forests
  • Support vector machines
  • Simulation evaluation
