Feature based multivariate data imputation

Alessio Petrozziello*, Ivan Jordanov

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We investigate a new multivariate data imputation approach for dealing with a variety of types of missingness. The proposed approach aggregates the most suitable methods from a multitude of imputation techniques, tailored to each feature of the dataset. We report results from a comparison with two single imputation techniques (Random Guessing and Median Imputation) and four state-of-the-art multivariate methods (K-Nearest Neighbour Imputation, Bagged Tree Imputation, Missing Imputation Chained Equations, and Bayesian Principal Component Analysis Imputation) on several datasets from the public domain, demonstrating favorable performance for our model. The proposed method, Feature Guided Data Imputation, is compared with the other tested methods in three experimental settings: Missing Completely at Random, Missing at Random, and Missing Not at Random, with 25% missing data in the test set over five-fold cross validation. Furthermore, the proposed model has a straightforward implementation and can easily incorporate other imputation techniques.
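The per-feature selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' Feature Guided Data Imputation implementation: the three candidate imputers (mean, median, nearest row) are simple stand-ins for the methods named above, the selection rule (hide a share of the observed values in each feature and keep the candidate with the lowest RMSE on them) is an assumption, and all function names are hypothetical.

```python
import math
import random
import statistics

# Hypothetical stand-in imputers; each returns the full column `col` with gaps filled.
def impute_mean(rows, col):
    m = statistics.fmean(r[col] for r in rows if r[col] is not None)
    return [m if r[col] is None else r[col] for r in rows]

def impute_median(rows, col):
    m = statistics.median(r[col] for r in rows if r[col] is not None)
    return [m if r[col] is None else r[col] for r in rows]

def impute_nearest(rows, col):
    # Copy the value from the closest row (over the other, jointly observed features).
    donors = [r for r in rows if r[col] is not None]
    def dist(a, b):
        sq = [(x - y) ** 2 for j, (x, y) in enumerate(zip(a, b))
              if j != col and x is not None and y is not None]
        return sum(sq) / len(sq) if sq else math.inf
    return [r[col] if r[col] is not None
            else min(donors, key=lambda d: dist(r, d))[col] for r in rows]

CANDIDATES = {"mean": impute_mean, "median": impute_median, "nearest": impute_nearest}

def select_per_feature(rows, holdout=0.25, seed=0):
    """Pick, for each feature, the candidate with the lowest RMSE on hidden values.

    Assumes every feature has at least two observed values.
    """
    rng = random.Random(seed)
    choice = {}
    for col in range(len(rows[0])):
        observed = [i for i, r in enumerate(rows) if r[col] is not None]
        hidden = rng.sample(observed, max(1, int(holdout * len(observed))))
        masked = [list(r) for r in rows]
        for i in hidden:
            masked[i][col] = None  # artificially hide known values
        def rmse(name):
            filled = CANDIDATES[name](masked, col)
            return math.sqrt(sum((filled[i] - rows[i][col]) ** 2 for i in hidden)
                             / len(hidden))
        choice[col] = min(CANDIDATES, key=rmse)
    return choice

def impute(rows, choice):
    """Fill each feature with the imputer selected for it."""
    columns = [CANDIDATES[choice[c]](rows, c) for c in range(len(rows[0]))]
    return [list(vals) for vals in zip(*columns)]

# Toy data: None marks a missing entry.
data = [
    [1.0, 2.0, 10.0],
    [2.0, 4.0, None],
    [3.0, None, 10.0],
    [4.0, 8.0, 11.0],
    [None, 10.0, 10.0],
    [6.0, 12.0, 50.0],
    [7.0, 14.0, 10.0],
    [8.0, 16.0, 9.0],
]
choice = select_per_feature(data)
completed = impute(data, choice)
```

Because each candidate is just a function from the dataset and a feature index to a filled column, adding another imputation technique amounts to adding one entry to `CANDIDATES`.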

Original language: English
Title of host publication: Machine Learning, Optimization, and Data Science - 4th International Conference, LOD 2018, Revised Selected Papers
Editors: Giuseppe Nicosia, Giovanni Giuffrida, Panos Pardalos, Vincenzo Sciacca, Renato Umeton
Publisher: Springer Verlag
Pages: 26-37
Number of pages: 12
ISBN (Print): 9783030137083
DOIs
Publication status: Published - Mar 2019
Event: 4th International Conference on Machine Learning, Optimization, and Data Science - Volterra, Italy
Duration: 13 Sept 2018 to 16 Sept 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11331 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th International Conference on Machine Learning, Optimization, and Data Science
Abbreviated title: LOD 2018
Country/Territory: Italy
City: Volterra
Period: 13/09/18 to 16/09/18

Keywords

  • Data mining
  • Missing data
  • Multitude of imputation models
  • Multivariate data imputation
