Performance comparison of a parallel recommender algorithm across three Hadoop-based frameworks

Bryan Carpenter, Christina Pierrette Abilali Diedhiou, Aamir Shafi, Soumabha Sarkar, Ramazan Esmeli, Ryan Gadsdon

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

One of the challenges our society faces is the ever-increasing amount of data, which requires systems to analyse large data sets without compromising their performance, and humans to navigate through a deluge of irrelevant material. Among existing platforms that address the system requirements, Hadoop is a framework widely used to store and analyse “big data”. On the human side, recommendation systems are one of the aids to finding the things people really want. This paper evaluates approaches to highly scalable parallel algorithms for recommendation systems, with application to very large data sets. A particular goal is to evaluate MPJ Express, an open source Java message passing library for parallel computing that has been integrated with Hadoop. As a demonstration, we use MPJ Express to implement collaborative filtering on various data sets using the ALSWR algorithm (Alternating-Least-Squares with Weighted-λ-Regularization). We benchmark the performance and demonstrate parallel speedup on the Movielens and Yahoo Music data sets. We then compare our results with two other frameworks: Mahout and Spark. Our results indicate that the MPJ Express implementation of ALSWR has very competitive performance and scalability in comparison with the two other frameworks.
Original language: English
Title of host publication: 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 8
ISBN (Electronic): 978-1-5386-7769-8
ISBN (Print): 978-1-5386-7770-4
Publication status: Published - 21 Feb 2019
Event: High Performance Machine Learning Workshop - Lyon, France
Duration: 24 Nov 2018 → …

Publication series

Name: IEEE SBAC-PAD Proceedings Series
Publisher: IEEE
ISSN (Print): 1550-6533

Conference

Conference: High Performance Machine Learning Workshop
Abbreviated title: HMPL 2018
Country/Territory: France
City: Lyon
Period: 24/11/18 → …

Keywords

  • HPC
  • Mahout
  • Spark
  • YARN
  • MapReduce
  • Hadoop
  • MPJExpress
