A case study of accelerator performance
In recent years, the designs of High Performance Computing (HPC) clusters have become more complex. This is due to the emergence of new processing elements, in particular Graphics Processing Units (GPUs) and other many-core processors, that can be combined with multi-core processors to enhance application performance. The design of a cluster includes processing elements that meet the needs of the applications that will run on the system. Unfortunately, differences in the architectures of novel many-core processing elements have made it increasingly difficult to compare their performance. This work describes an attempt to develop a methodology for comparing the architectures of three many-core processing elements, called accelerators, that are used in several existing HPC systems: the Fermi, Kepler, and MIC architectures. Using the LULESH 1.0 proxy application, which has been ported to processing elements with different architectures and programming models, we compared the number of instructions executed per cycle (IPC), memory behavior, vectorization capacity, and power/energy consumption of these three architectures, as well as of the multi-core Sandy Bridge processor, whose performance was used as a baseline for comparison. This study showed that (1) the Kepler architecture achieved the best execution-time performance while consuming the least power/energy; and (2) Kepler's superior execution-time performance is due to LULESH's use of vectorization and its high IPC.
Gallardo, Esthela, "A case study of accelerator performance" (2015). ETD Collection for University of Texas, El Paso. AAI10000797.