Bi-objective optimization of data-parallel applications on heterogeneous HPC platforms for performance and energy through workload distribution

Hamidreza Khaleghzadeh, Muhammad Fahad, Arsalan Shahid, Ravi Reddy Manumachu, Alexey Lastovetsky

Research output: Contribution to journal › Article › peer-review


Performance and energy are the two most important objectives for optimization on modern parallel platforms. In this article, we show that moving from single-objective optimization for performance or energy to their bi-objective optimization on heterogeneous processors results in a tremendous increase in the number of optimal solutions (workload distributions), even for the simple case of linear performance and energy profiles. We then study the full performance and energy profiles of two real-life data-parallel applications and find that their shapes are non-linear and complex enough to prevent good approximation as analytical functions for input to exact algorithms or optimization software for determining the Pareto front. We therefore propose a solution method for the bi-objective optimization problem on heterogeneous processors. The method's novel component is an efficient and exact global optimization algorithm that takes as input performance and energy profiles given as arbitrary discrete functions of workload size, which accurately and realistically take into account the resource contention and NUMA inherent in modern parallel platforms, and returns the Pareto-optimal solutions (generally speaking, load-imbalanced). To construct the input discrete energy functions, the method employs a methodology that accurately models the energy consumption of a hybrid data-parallel application executing on a heterogeneous HPC platform containing different computing devices, using system-level power measurements provided by power meters. We experimentally analyse the proposed solution method using three data-parallel applications (matrix multiplication, 2D fast Fourier transform (2D-FFT), and gene sequencing) on two connected heterogeneous servers consisting of multicore CPUs, GPUs, and Intel Xeon Phi. We show that it determines a superior Pareto front containing the best load-balanced solutions and all the load-imbalanced solutions that are ignored by load balancing methods.
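The claim that even linear profiles yield many Pareto-optimal workload distributions can be illustrated with a small brute-force sketch. This is not the article's algorithm, which is described as efficient and exact; it is an exhaustive enumeration for toy sizes only. The function name `pareto_front`, the profile values, and the cost model (execution time taken as the maximum over processors, energy as the sum over processors) are assumptions made for illustration.

```python
from itertools import product


def pareto_front(time_profiles, energy_profiles, n):
    """Enumerate every distribution of n workload units over the processors
    and keep the Pareto-optimal (time, energy) points.

    time_profiles[i][x] and energy_profiles[i][x] are discrete functions:
    the time and energy of processor i when it is given x workload units.
    Assumed cost model (not taken from the abstract): processors run in
    parallel, so time is the maximum; energy is additive, so it is the sum.
    """
    p = len(time_profiles)
    candidates = []
    for dist in product(range(n + 1), repeat=p):
        if sum(dist) != n:
            continue  # only distributions that assign exactly n units
        t = max(time_profiles[i][x] for i, x in enumerate(dist))
        e = sum(energy_profiles[i][x] for i, x in enumerate(dist))
        candidates.append((t, e, dist))

    # Sort by time, then energy. A point is Pareto-optimal only if its
    # energy is strictly lower than that of every point seen before it.
    candidates.sort(key=lambda c: (c[0], c[1]))
    front = []
    best_energy = float("inf")
    for t, e, dist in candidates:
        if e < best_energy:
            front.append((t, e, dist))
            best_energy = e
    return front


# Hypothetical linear profiles for two processors and n = 4 units:
# processor 0 is fast but power-hungry, processor 1 is slow but frugal.
time_profiles = [[0, 1, 2, 3, 4], [0, 2, 4, 6, 8]]
energy_profiles = [[0, 3, 6, 9, 12], [0, 1, 2, 3, 4]]

front = pareto_front(time_profiles, energy_profiles, 4)
for t, e, dist in front:
    print(f"time={t} energy={e} distribution={dist}")
```

With these made-up profiles, four of the five possible distributions are Pareto-optimal, and only one of them (3 units to processor 0, 1 unit to processor 1) minimizes execution time. The enumeration grows exponentially with the number of processors, so it serves only to make the problem statement concrete.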
Original language: English
Pages (from-to): 543-560
Journal: IEEE Transactions on Parallel and Distributed Systems
Issue number: 3
Publication status: Published - 28 Sept 2020


  • Heterogeneous platforms
  • data-parallel applications
  • workload partitioning
  • performance optimization
  • energy optimization
  • bi-objective optimization
  • workload distribution
  • multicore CPU
  • GPU
  • Intel Xeon Phi

