MPJ Express meets YARN: towards Java HPC on Hadoop systems

Hamza Zafar, Farrukh Aftab Kahn, David Bryan Carpenter, Aamir Shafi, Asad Waqar Malik

Research output: Contribution to journal › Article › peer-review


Many organizations, including academic, research, and commercial institutions, have invested heavily in setting up High Performance Computing (HPC) facilities for running computational science applications. Meanwhile, the Apache Hadoop software, which emerged in 2005, has become a popular, reliable, and scalable open-source framework for processing large-scale data (Big Data). Recognizing the importance of Big Data, an increasing number of organizations are investing in relatively cheap Hadoop clusters for executing their mission-critical data processing applications. As a result, system administrators at these sites may have to maintain two parallel facilities, one for HPC and one for Hadoop computations. This is not ideal because of the redundant maintenance work and poor economics. This paper attempts to bridge this gap by allowing HPC and Hadoop jobs to co-exist on a single hardware facility. We achieve this goal by exploiting YARN (Hadoop v2.0), which decouples the computational and resource scheduling part of the Hadoop framework from the Hadoop Distributed File System (HDFS). In this context, we have developed a YARN-based reference runtime system for the MPJ Express software that allows parallel MPI-like Java applications to be executed on Hadoop clusters. The main contribution of this paper is to provide the Big Data community with access to MPI-like programming using MPJ Express. As an aside, this work allows parallel Java applications to perform computations on data stored in HDFS.
Original language: English
Pages (from-to): 2678-2682
Journal: Procedia Computer Science
Publication status: Published - 2015
Event: International Conference on Computational Science - Reykjavik, Iceland
Duration: 1 Jun 2015 - 3 Jun 2015

