TY - GEN
T1 - Feature based multivariate data imputation
AU - Petrozziello, Alessio
AU - Jordanov, Ivan
PY - 2019/3
Y1 - 2019/3
N2 - We investigate a new multivariate data imputation approach for dealing with a variety of types of missingness. The proposed approach relies on the aggregation of the most suitable methods from a multitude of imputation techniques, adjusted to each feature of the dataset. We report results from a comparison with two single imputation techniques (Random Guessing and Median Imputation) and four state-of-the-art multivariate methods (K-Nearest Neighbour Imputation, Bagged Tree Imputation, Missing Imputation Chained Equations, and Bayesian Principal Component Analysis Imputation) on several datasets from the public domain, demonstrating favorable performance for our model. The proposed method, namely Feature Guided Data Imputation, is compared with the other tested methods in three different experimental settings: Missing Completely at Random, Missing at Random, and Missing Not at Random, with 25% missing data in the test set over five-fold cross validation. Furthermore, the proposed model has a straightforward implementation and can easily incorporate other imputation techniques.
AB - We investigate a new multivariate data imputation approach for dealing with a variety of types of missingness. The proposed approach relies on the aggregation of the most suitable methods from a multitude of imputation techniques, adjusted to each feature of the dataset. We report results from a comparison with two single imputation techniques (Random Guessing and Median Imputation) and four state-of-the-art multivariate methods (K-Nearest Neighbour Imputation, Bagged Tree Imputation, Missing Imputation Chained Equations, and Bayesian Principal Component Analysis Imputation) on several datasets from the public domain, demonstrating favorable performance for our model. The proposed method, namely Feature Guided Data Imputation, is compared with the other tested methods in three different experimental settings: Missing Completely at Random, Missing at Random, and Missing Not at Random, with 25% missing data in the test set over five-fold cross validation. Furthermore, the proposed model has a straightforward implementation and can easily incorporate other imputation techniques.
KW - Data mining
KW - Missing data
KW - Multitude of imputation models
KW - Multivariate data imputation
UR - http://www.scopus.com/inward/record.url?scp=85063591805&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-13709-0_3
DO - 10.1007/978-3-030-13709-0_3
M3 - Conference contribution
AN - SCOPUS:85063591805
SN - 9783030137083
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 26
EP - 37
BT - Machine Learning, Optimization, and Data Science - 4th International Conference, LOD 2018, Revised Selected Papers
A2 - Nicosia, Giuseppe
A2 - Giuffrida, Giovanni
A2 - Pardalos, Panos
A2 - Sciacca, Vincenzo
A2 - Umeton, Renato
PB - Springer Verlag
T2 - 4th International Conference on Machine Learning, Optimization, and Data Science
Y2 - 13 September 2018 through 16 September 2018
ER -