Feature based multivariate data imputation

Alessio Petrozziello*, Ivan Jordanov

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution



We investigate a new multivariate data imputation approach for dealing with a variety of types of missingness. The proposed approach aggregates the most suitable methods from a multitude of imputation techniques, with the choice adjusted to each feature of the dataset. We compare it with two single imputation techniques (Random Guessing and Median Imputation) and four state-of-the-art multivariate methods (K-Nearest Neighbour Imputation, Bagged Tree Imputation, Multivariate Imputation by Chained Equations, and Bayesian Principal Component Analysis Imputation) on several public-domain datasets, and the results demonstrate favorable performance for our model. The proposed method, Feature Guided Data Imputation, is compared with the other methods in three experimental settings: Missing Completely at Random, Missing at Random, and Missing Not at Random, each with 25% missing data in the test set over five-fold cross-validation. Furthermore, the proposed model is straightforward to implement and can easily incorporate other imputation techniques.
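The per-feature idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation of Feature Guided Data Imputation: the three candidate imputers, the hold-out selection rule (hide a share of the observed values completely at random and keep the candidate with the lowest RMSE on them), and all function names are assumptions made for illustration only. It uses only the Python standard library and assumes every feature has at least two observed values.

```python
import random
import statistics

# Hypothetical candidate imputers: each takes the rows (None marks a missing
# value) and a feature index j, and returns column j with the gaps filled.
def mean_imputer(rows, j):
    fill = statistics.mean(r[j] for r in rows if r[j] is not None)
    return [fill if r[j] is None else r[j] for r in rows]

def median_imputer(rows, j):
    fill = statistics.median(r[j] for r in rows if r[j] is not None)
    return [fill if r[j] is None else r[j] for r in rows]

def nn_imputer(rows, j):
    """1-nearest-neighbour: copy feature j from the closest donor row."""
    donors = [r for r in rows if r[j] is not None]
    out = []
    for r in rows:
        if r[j] is not None:
            out.append(r[j])
            continue

        def dist(d):
            # Mean squared difference over features observed in both rows.
            shared = [(a - b) ** 2 for k, (a, b) in enumerate(zip(r, d))
                      if k != j and a is not None and b is not None]
            return statistics.mean(shared) if shared else float("inf")

        out.append(min(donors, key=dist)[j])
    return out

CANDIDATES = {"mean": mean_imputer, "median": median_imputer, "1-nn": nn_imputer}

def select_per_feature(rows, candidates, holdout=0.25, seed=0):
    """For each feature, pick the candidate with the lowest hold-out RMSE."""
    rng = random.Random(seed)
    chosen = {}
    for j in range(len(rows[0])):
        observed = [i for i, r in enumerate(rows) if r[j] is not None]
        # Assumption: hide a share of the observed values completely at random.
        hidden = rng.sample(observed, max(1, int(holdout * len(observed))))
        masked = [list(r) for r in rows]
        for i in hidden:
            masked[i][j] = None

        def rmse(name):
            filled = candidates[name](masked, j)
            errors = [(filled[i] - rows[i][j]) ** 2 for i in hidden]
            return (sum(errors) / len(errors)) ** 0.5

        chosen[j] = min(candidates, key=rmse)
    return chosen

def impute(rows, candidates, chosen):
    """Fill each feature with the imputer selected for it."""
    columns = [candidates[chosen[j]](rows, j) for j in range(len(rows[0]))]
    return [list(r) for r in zip(*columns)]

# Toy data: the second feature is twice the first; None marks a missing value.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, None],
        [4.0, 8.0], [None, 10.0], [6.0, 12.0]]
chosen = select_per_feature(rows, CANDIDATES)
filled = impute(rows, CANDIDATES, chosen)
```

Because each candidate is just a function with the same signature, adding another technique in this sketch amounts to adding an entry to `CANDIDATES`, which mirrors the abstract's remark that other imputation techniques can easily be incorporated.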

Original language: English
Title of host publication: Machine Learning, Optimization, and Data Science - 4th International Conference, LOD 2018, Revised Selected Papers
Editors: Giuseppe Nicosia, Giovanni Giuffrida, Panos Pardalos, Vincenzo Sciacca, Renato Umeton
Publisher: Springer Verlag
Number of pages: 12
ISBN (Print): 9783030137083
Publication status: Published - Mar 2019
Event: 4th International Conference on Machine Learning, Optimization, and Data Science - Volterra, Italy
Duration: 13 Sept 2018 → 16 Sept 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11331 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 4th International Conference on Machine Learning, Optimization, and Data Science
Abbreviated title: LOD 2018


  • Data mining
  • Missing data
  • Multitude of imputation models
  • Multivariate data imputation

