TY - GEN
T1 - Column-wise guided data imputation
AU - Petrozziello, Alessio
AU - Jordanov, Ivan
PY - 2017/6/9
Y1 - 2017/6/9
N2 - This paper investigates data imputation techniques for pre-processing datasets with missing values. The current literature focuses mainly on overall accuracy, evaluated by estimating the missing values on the dataset at hand; however, the predictions can be suboptimal when model performance is considered for each feature. To address this problem, a Column-wise Guided Data Imputation method (cGDI) is proposed. Its main novelty lies in selecting, for each individual feature, the most suitable model from a multitude of imputation techniques through a learning process on the known data. To assess the performance of the proposed technique, empirical experiments were conducted on 13 publicly available datasets. The results show that cGDI outperforms two baselines and always achieves comparable or greater estimation accuracy than four state-of-the-art methods widely applied to this problem. Furthermore, cGDI is straightforward to implement, and any other known imputation technique can be easily added.
AB - This paper investigates data imputation techniques for pre-processing datasets with missing values. The current literature focuses mainly on overall accuracy, evaluated by estimating the missing values on the dataset at hand; however, the predictions can be suboptimal when model performance is considered for each feature. To address this problem, a Column-wise Guided Data Imputation method (cGDI) is proposed. Its main novelty lies in selecting, for each individual feature, the most suitable model from a multitude of imputation techniques through a learning process on the known data. To assess the performance of the proposed technique, empirical experiments were conducted on 13 publicly available datasets. The results show that cGDI outperforms two baselines and always achieves comparable or greater estimation accuracy than four state-of-the-art methods widely applied to this problem. Furthermore, cGDI is straightforward to implement, and any other known imputation technique can be easily added.
KW - missing data
KW - data imputation
KW - multitude of imputation models
U2 - 10.1016/j.procs.2017.05.008
DO - 10.1016/j.procs.2017.05.008
M3 - Conference contribution
VL - 108
T3 - Procedia Computer Science
SP - 2282
EP - 2286
BT - Procedia Computer Science
PB - Elsevier
CY - Zurich
ER -