Distributed neural networks for missing big data imputation

Alessio Petrozziello, Ivan Jordanov, Christian Sommeregger

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we investigate the use of Distributed Neural Networks for imputing missing values in a Big Data context. The presented data imputation framework is implemented in Spark, so imputation can easily be added as a step in the data pre-processing pipeline. The Distributed Neural Networks model uses Mini-batch Stochastic Gradient Descent, which scales well with cluster size and minimizes communication among the workers. The model is tested on a real-world Recommender Systems dataset, where missing data is generally a problem for new items because the system's ranking is usually biased towards popular items. The model is compared with univariate (Mean and Median Imputation) and multivariate (K-Nearest Neighbours and Linear Regression) imputation techniques, and its performance is validated in terms of prediction accuracy and speed. Furthermore, we evaluate the speedup relative to the sequential implementation of Neural Networks with Stochastic Gradient Descent.
Original language: English
Title of host publication: 2018 International Joint Conference on Neural Networks (IJCNN)
Publisher: IEEE
Pages: 131-138
Number of pages: 9
ISBN (Electronic): 978-1-5090-6014-6
ISBN (Print): 978-1-5090-6015-3
Publication status: Published - 15 Oct 2018
Event: IEEE WCCI 2018, World Congress on Computational Intelligence - Rio de Janeiro, Brazil
Duration: 8 Jul 2018 to 13 Jul 2018
http://www.ecomp.poli.br/~wcci2018/

Publication series

Name: IEEE IJCNN Proceedings Series
Publisher: IEEE
ISSN (Electronic): 2161-4407

Conference

Conference: IEEE WCCI 2018, World Congress on Computational Intelligence
Abbreviated title: IJCNN
Country/Territory: Brazil
Period: 8/07/18 to 13/07/18

Keywords

  • Distributed Computation
  • Big Data
  • Missing Data Imputation
  • Neural Networks
