A ‘Performance Portability’ Task in WP2 of ChEESE-2P
In the second phase of ChEESE, one of the major scientific goals is the development of 11 community flagship codes to tackle 12 domain-specific computational challenges at the Exascale level. In this technical news article, Piero Lanucara, a researcher at CINECA and the leader of Work Package 2: Exascale Technical Challenges in Flagship Codes, explains how WP2 addresses the crucial issue of performance portability for flagship codes in the Exascale era. Meeting this challenge requires an optimization methodology that ensures the available HPC resources are used as effectively as possible.
One of the primary challenges faced by WP2 of ChEESE-2P is studying the ‘performance portability’ of the 11 flagship codes. This issue is not new, but the pressure has increased significantly because several of the most powerful computing platforms (as the current Top500 list shows) rely on accelerators, supplied not only by NVIDIA but also by AMD and Intel.
This creates a significant gap. On the one hand, our flagship applications may have been originally written and optimized for ‘conventional’ processors or accelerators (such as x86 CPUs and NVIDIA GPUs). On the other hand, the new exascale-oriented machines offer exceptional computing power but are highly heterogeneous and increasingly built around a variety of accelerators.
WP2 works to adapt the flagship codes to heterogeneous EuroHPC hardware such as the LUMI and Leonardo supercomputers. Photos: P. Agarth (left) and E. Saluzzi (right).
This challenge, which began in the first phase of the project and is still ongoing, gives rise to task T2.2 of WP2: ‘Increasing the performance portability of flagship codes for different accelerators’. The aim of this task is to ensure that these applications follow a development path that extends beyond their current branch and remains open to ‘performance portable’ software technologies in the near future.
CINECA, the largest Italian computing centre and one of the most important worldwide, is leading this task, working closely with the code development teams in ChEESE to address application and language issues concerning the 11 flagship codes.
The task started at the beginning of the project and is still under way. Part of the initial work was to identify the best software solutions for each flagship code and to steer its development along a specific line. A vital part of the strategy is the use of ‘mini-apps’: applications developed specifically within the project that are simplified with respect to the original code but retain all, or almost all, of its application and scientific features.
CINECA has supported this initiative with a series of educational activities, including webinars open to developers in WP2 of ChEESE-2P. These webinars aimed to deepen understanding of popular tools in this field and were delivered either by CINECA staff in WP2 or by external experts with proven expertise in their respective fields.
The topics covered tools such as Kokkos (a C++ ecosystem for performance portability), OpenCL (a lower-level environment, but one able to target different accelerators with excellent performance and portability) and the intrinsic parallelism of Fortran, an innovative and powerful way to adapt Fortran codes to GPUs with outstanding performance.
More webinars will follow shortly.
Piero Lanucara / CINECA