Column-wise guided data imputation

Alessio Petrozziello, Ivan Jordanov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper investigates data imputation techniques for pre-processing datasets with missing values. The current literature focuses mainly on overall accuracy, evaluated by estimating the missing values in the dataset at hand; however, the predictions can be suboptimal when model performance is considered feature by feature. To address this problem, a Column-wise Guided Data Imputation method (cGDI) is proposed. Its main novelty lies in selecting, for each individual feature, the most suitable model from a multitude of imputation techniques through a learning process on the known data. To assess the performance of the proposed technique, empirical experiments were conducted on 13 publicly available datasets. The results show that cGDI outperforms two baselines and always achieves estimation accuracy comparable to or greater than that of four state-of-the-art methods widely applied to this problem. Furthermore, cGDI is straightforward to implement, and any other known imputation technique can easily be added.
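
The per-feature selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three candidate imputers (mean, median, k-nearest neighbours), the 30% hold-out of known values, the RMSE criterion, and every function name below are assumptions chosen for illustration. The sketch also assumes numeric data with at least two observed values per column.

```python
import math
import random
import statistics

# Hypothetical candidate imputers. Each takes a column index and training
# rows (with that column observed) and returns a predict(row) function.
def mean_imputer(col, rows):
    value = statistics.fmean(r[col] for r in rows)
    return lambda row: value

def median_imputer(col, rows):
    value = statistics.median(r[col] for r in rows)
    return lambda row: value

def knn_imputer(col, rows, k=3):
    def distance(a, b):
        # Compare only features other than `col` that both rows have observed.
        shared = [(x, y) for i, (x, y) in enumerate(zip(a, b))
                  if i != col and x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    def predict(row):
        nearest = sorted(rows, key=lambda r: distance(row, r))[:k]
        return statistics.fmean(r[col] for r in nearest)
    return predict

CANDIDATES = {"mean": mean_imputer, "median": median_imputer, "knn": knn_imputer}

def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def select_and_impute(data, holdout=0.3, seed=0):
    """For each column, pick the candidate with the lowest hold-out RMSE
    on the known values, then use it to fill that column's missing entries."""
    rng = random.Random(seed)
    filled = [list(r) for r in data]
    chosen = {}
    for col in range(len(data[0])):
        known = [r for r in data if r[col] is not None]
        shuffled = known[:]
        rng.shuffle(shuffled)
        n_val = max(1, int(len(shuffled) * holdout))
        val, train = shuffled[:n_val], shuffled[n_val:]
        # Score every candidate on known values hidden from it.
        scores = {}
        for name, fit in CANDIDATES.items():
            predict = fit(col, train)
            scores[name] = rmse([predict(r) for r in val], [r[col] for r in val])
        best = min(scores, key=scores.get)
        chosen[col] = best
        # Refit the winner on all known values before imputing.
        predict = CANDIDATES[best](col, known)
        for original, out in zip(data, filled):
            if original[col] is None:
                out[col] = predict(original)
    return filled, chosen

# Toy data: None marks a missing value.
data = [
    [1.0, 2.0, 10.0],
    [2.0, 4.1, 11.0],
    [3.0, None, 9.5],
    [4.0, 8.2, None],
    [5.0, 9.9, 10.5],
    [None, 12.1, 10.0],
    [7.0, 14.0, 50.0],
    [8.0, 16.2, 10.2],
]
filled, chosen = select_and_impute(data)
print(chosen)  # maps each column index to the name of the selected imputer
```

Because the candidates live in a plain dictionary, adding another imputation technique amounts to registering one more function with the same signature, which mirrors the extensibility claim in the abstract.
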
Original language: English
Title of host publication: Procedia Computer Science
Place of Publication: Zurich
Publisher: Elsevier
Pages: 2282–2286
Number of pages: 5
Volume: 108
Edition: C
Publication status: Published - 9 Jun 2017

Publication series

Name: Procedia Computer Science
Publisher: Elsevier
ISSN (Electronic): 1877-0509

Keywords

  • missing data
  • data imputation
  • multitude of imputation models
