Efficient, scalable, parallel, matrix-matrix multiplication
For the past decade, power and energy consumption have become limiting factors for large-scale and embedded High Performance Computing (HPC) systems. This is especially true for systems that include accelerators, i.e., high-end computing devices such as Graphics Processing Units (GPUs), which offer terascale computing capabilities but draw far more power than multi-core CPUs. Accordingly, improving the node-level power/energy efficiency of an application can have a direct and positive impact on both classes of HPC systems.

The research reported in this thesis explores the use of software techniques to improve the execution time and power consumption of applications executed on a CPU/GPGPU compute node. We conducted this exploration in the context of parallel matrix-matrix multiplication, with the goal of designing and developing a Single-precision General Matrix-matrix Multiplication (SGEMM) routine for CPU/GPGPU node execution that runs faster and/or consumes less power than competing routines. This thesis reports on this work, which resulted in an efficient, scalable, parallel single-precision matrix-matrix multiplication routine for square matrices. The routine performs comparably to existing routines but can multiply larger matrices, since the largest problem it can handle is limited by the size of the host CPU memory instead of the size of the GPGPU device memory.
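The abstract does not describe how the routine escapes the device-memory limit. One common approach is a tiled (blocked) multiplication. In this scheme only one tile of A, one of B and one of C need to be resident on the accelerator at a time, while the full matrices stay in host memory. The following is a minimal CPU-only sketch under that assumption. It is not the thesis's implementation. The function names `tiled_sgemm` and `resident_bytes` and the `tile` parameter are hypothetical. Host-to-device transfers are not modeled, and storage in `array('f')` only approximates single precision, since each product is computed in double precision before being rounded to float32 on store.

```python
from array import array


def tiled_sgemm(a, b, n, tile):
    """Hypothetical sketch: C = A @ B for n x n row-major float32 matrices.

    Each innermost tile product reads only one tile of A and one of B and
    updates one tile of C. On a CPU/GPU node, these three tiles are what
    would (by assumption) be staged in device memory, so n is bounded by
    host memory rather than device memory.
    """
    c = array('f', [0.0]) * (n * n)  # result kept in "host" memory
    for i0 in range(0, n, tile):
        i1 = min(i0 + tile, n)
        for j0 in range(0, n, tile):
            j1 = min(j0 + tile, n)
            for k0 in range(0, n, tile):
                k1 = min(k0 + tile, n)
                # Accumulate A[i0:i1, k0:k1] @ B[k0:k1, j0:j1] into C[i0:i1, j0:j1]
                for i in range(i0, i1):
                    for k in range(k0, k1):
                        aik = a[i * n + k]
                        for j in range(j0, j1):
                            c[i * n + j] += aik * b[k * n + j]
    return c


def resident_bytes(n, tile=None):
    """Bytes needed at once for A, B and C tiles (4 bytes per float32).

    With tile=None the whole matrices must be resident (untiled case).
    """
    t = n if tile is None else min(tile, n)
    return 3 * 4 * t * t
```

For example, an untiled 1000 x 1000 single-precision product needs `resident_bytes(1000)`, which is 12,000,000 bytes, on the device. With 256 x 256 tiles, only `resident_bytes(1000, 256)` bytes must be resident at once, independent of `n`. Production routines would additionally overlap tile transfers with computation, but that is beyond this sketch.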
Engineering, Computer|Engineering, Electronics and Electrical|Computer Science
Portillo, Enrique, "Efficient, scalable, parallel, matrix-matrix multiplication" (2013). ETD Collection for University of Texas, El Paso. AAI1552278.