Feature based multivariate data imputation
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
We investigate a new multivariate data imputation approach for dealing with a variety of types of missingness. The proposed approach relies on the aggregation of the most suitable methods from a multitude of imputation techniques, adjusted to each feature of the dataset. We report results from a comparison with two single imputation techniques (Random Guessing and Median Imputation) and four state-of-the-art multivariate methods (K-Nearest Neighbour Imputation, Bagged Tree Imputation, Multivariate Imputation by Chained Equations, and Bayesian Principal Component Analysis Imputation) on several datasets from the public domain, demonstrating favorable performance for our model. The proposed method, Feature Guided Data Imputation, is compared with the other tested methods in three experimental settings: Missing Completely at Random, Missing at Random, and Missing Not at Random, with 25% missing data in the test set over five-fold cross-validation. Furthermore, the proposed model has a straightforward implementation and can easily incorporate other imputation techniques.
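The per-feature selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate pool is reduced to three simple imputers, all function names (`mean_imputer`, `median_imputer`, `knn_imputer`, `select_per_feature`, `impute`) are hypothetical, and the selection criterion (RMSE on artificially hidden observed cells, with a 25% hold-out) is an assumption, since the abstract does not state how the "most suitable" method is determined.

```python
import math
import random
import statistics

# Hypothetical candidate imputers. Each takes the table (rows of floats,
# None marks a missing cell) and a column index, and returns a value for
# every row of that column. Assumes each column keeps >= 2 observed values.

def mean_imputer(rows, col):
    fill = statistics.fmean(r[col] for r in rows if r[col] is not None)
    return [fill if r[col] is None else r[col] for r in rows]

def median_imputer(rows, col):
    fill = statistics.median(r[col] for r in rows if r[col] is not None)
    return [fill if r[col] is None else r[col] for r in rows]

def knn_imputer(rows, col, k=3):
    """Average the target column over the k nearest rows that observe it."""
    donors = [r for r in rows if r[col] is not None]
    out = []
    for r in rows:
        if r[col] is not None:
            out.append(r[col])
            continue

        def dist(d):
            # Distance over the other features that both rows observe.
            shared = [(a - b) ** 2 for i, (a, b) in enumerate(zip(r, d))
                      if i != col and a is not None and b is not None]
            return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

        nearest = sorted(donors, key=dist)[:k]
        out.append(statistics.fmean(d[col] for d in nearest))
    return out

def select_per_feature(rows, imputers, holdout=0.25, seed=0):
    """For each column, hide some observed cells and keep the imputer
    with the lowest RMSE on them (assumed selection criterion)."""
    rng = random.Random(seed)
    chosen = {}
    for col in range(len(rows[0])):
        observed = [i for i, r in enumerate(rows) if r[col] is not None]
        hidden = rng.sample(observed, max(1, int(len(observed) * holdout)))
        masked = [list(r) for r in rows]
        for i in hidden:
            masked[i][col] = None
        best_name, best_err = None, math.inf
        for name, imputer in imputers.items():
            filled = imputer(masked, col)
            err = math.sqrt(statistics.fmean(
                (filled[i] - rows[i][col]) ** 2 for i in hidden))
            if err < best_err:
                best_name, best_err = name, err
        chosen[col] = best_name
    return chosen

def impute(rows, imputers, chosen):
    """Fill each column of the original table with its selected imputer."""
    completed = [list(r) for r in rows]
    for col, name in chosen.items():
        for i, value in enumerate(imputers[name](rows, col)):
            completed[i][col] = value
    return completed

# Toy data, invented for illustration only.
rows = [
    [1.0, 2.1, 5.0],
    [2.0, 3.9, None],
    [3.0, None, 5.5],
    [4.0, 8.2, 4.8],
    [5.0, 9.9, 90.0],
    [6.0, 12.1, 5.2],
    [None, 14.0, 5.1],
    [8.0, 16.2, None],
]
imputers = {"mean": mean_imputer, "median": median_imputer, "knn": knn_imputer}
chosen = select_per_feature(rows, imputers)
completed = impute(rows, imputers, chosen)
print(chosen)
```

Because the imputers sit in a plain dictionary, adding another technique to the pool only requires registering one more function with the same signature, which mirrors the extensibility claimed in the abstract.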
|Title of host publication||Machine Learning, Optimization, and Data Science - 4th International Conference, LOD 2018, Revised Selected Papers|
|Editors||Giuseppe Nicosia, Giovanni Giuffrida, Panos Pardalos, Vincenzo Sciacca, Renato Umeton|
|Number of pages||12|
|Publication status||Published - Mar 2019|
|Event||4th International Conference on Machine Learning, Optimization, and Data Science - Volterra, Italy; Duration: 13 Sep 2018 → 16 Sep 2018|
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Conference||4th International Conference on Machine Learning, Optimization, and Data Science|
|Abbreviated title||LOD 2018|
|Period||13/09/18 → 16/09/18|
Rights statement: The final authenticated version is available online at: http://dx.doi.org/10.1007%2F978-3-030-13709-0_3.