Development of load balancing algorithm based on analysis of multi-core architecture on a Beowulf cluster

Damian Valles, University of Texas at El Paso


In this work, hardware analysis and modeling were used to improve the Linux scheduler for high-performance computing (HPC) use. The study analyzed the performance throughput of a single compute node of Virgo 2.0, a 23-node Beowulf cluster in which each compute node has two quad-core processors and eight gigabytes of memory. The goal was to find the bottlenecks and limitations affecting the performance of the processing hardware. The analysis used the High Performance Linpack (HPL) benchmark.

In addition, the processing hardware of the compute node was modeled with an Instructions per Cycle (IPC) metric estimated by linear regression. Modeling data were generated by running the Tuning CacheEdge program, which is part of the ATLAS libraries, and were collected with the PerfMonitor program. The model showed peak IPC throughput and a higher Level 2 (L2) cache hit rate with a concurrency of five threads on the eight processing cores.

The hardware analysis and model indicated potential bottlenecks at the processor Front-Side Buses (FSBs), the Memory Controller Hub (MCH), and the L2 caches. Based on these results, the Linux scheduler was modified to improve performance throughput. The modifications included changing the scheduling policy of tasks, grouping runnable tasks, load balancing with affinity assignment of the task groups, and controlling process termination and feedback.

The results showed that this approach improved performance throughput. The load-balancing approach made scheduling more L2-cache aware, which increased the hit rate and reduced the number of times processes accessed the FSB and MCH during execution. Performance throughput peaked with block sizes of 64 and 128 across different matrix and problem sizes. As problem and block sizes increased further, performance throughput decreased because of hardware contention in the FSBs and MCH. The peak was attributed to the match between these block sizes and the data width of the FSBs and MCH.
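The abstract does not state the form of the IPC regression model. As a minimal sketch, assuming IPC is regressed on per-run hardware-counter measurements, an ordinary least-squares fit can be written in pure Python by solving the normal equations. The predictors (`l2_hit_rate`, `fsb_per_kilo_instr`) and the helper names `fit_ipc_model` and `predict_ipc` are hypothetical. The sample values are synthetic and are not measurements from Virgo 2.0.

```python
from typing import List, Sequence


def fit_ipc_model(features: Sequence[Sequence[float]],
                  ipc: Sequence[float]) -> List[float]:
    """Ordinary least-squares fit of ipc ~ b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan
    elimination with partial pivoting and returns [b0, b1, ..., bk].
    """
    if not features or len(features) != len(ipc):
        raise ValueError("need one IPC sample per feature row")
    # Design matrix with a leading column of ones for the intercept b0.
    a = [[1.0, *map(float, row)] for row in features]
    n = len(a[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(r[i] * r[j] for r in a) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(a, ipc))] for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; model is not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    # The left block is now diagonal, so each coefficient is a single division.
    return [m[i][n] / m[i][i] for i in range(n)]


def predict_ipc(coeffs: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the fitted linear model for one feature vector."""
    return coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))


if __name__ == "__main__":
    # Synthetic, illustrative samples: (l2_hit_rate, fsb_per_kilo_instr).
    samples = [(0.90, 20.0), (0.85, 35.0), (0.95, 15.0), (0.80, 50.0), (0.92, 30.0)]
    ipc = [0.2 + 1.5 * h - 0.01 * f for h, f in samples]
    coeffs = fit_ipc_model(samples, ipc)
    print([round(c, 6) for c in coeffs])
    print(round(predict_ipc(coeffs, (0.88, 25.0)), 6))
```

The synthetic IPC values come exactly from a linear relation, so the fit should recover coefficients close to 0.2, 1.5, and -0.01. Solving the normal equations directly is adequate for a handful of predictors. For larger or ill-conditioned data sets, a QR-based solver is the safer choice.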
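The abstract describes grouping runnable tasks and load balancing with affinity assignment of the task groups, but it does not give the algorithm. The following minimal sketch shows one way to make that placement L2-cache aware. It keeps each group of tasks on cores that share one L2 cache and places groups largest-first onto the least-loaded L2 domain. The topology is an assumption: two quad-core sockets, each split into two pairs of cores that share an L2. The names `L2_DOMAINS` and `assign_groups` and the task ids are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical topology (an assumption, not taken from the abstract): two
# quad-core sockets, each split into two L2 domains of two cores sharing one
# L2 cache. Each entry is the set of logical CPU ids that share an L2.
L2_DOMAINS: List[FrozenSet[int]] = [
    frozenset({0, 1}), frozenset({2, 3}),  # socket 0
    frozenset({4, 5}), frozenset({6, 7}),  # socket 1
]


def assign_groups(groups: Dict[str, Sequence[int]],
                  domains: Sequence[FrozenSet[int]] = L2_DOMAINS
                  ) -> Dict[int, FrozenSet[int]]:
    """Map every task id to the CPU set of one L2 domain.

    All tasks in a group go to the same L2 domain so they can reuse shared
    cache lines instead of fetching them again over the FSB. Groups are
    placed largest-first onto the currently least-loaded domain, which is a
    simple greedy load balance measured in number of tasks.
    """
    load = [0] * len(domains)
    placement: Dict[int, FrozenSet[int]] = {}
    for name in sorted(groups, key=lambda g: len(groups[g]), reverse=True):
        # Ties go to the lowest-numbered domain, which keeps results deterministic.
        target = min(range(len(domains)), key=lambda d: load[d])
        load[target] += len(groups[name])
        for task in groups[name]:
            placement[task] = domains[target]
    return placement


if __name__ == "__main__":
    # Hypothetical task groups keyed by name, with illustrative task ids.
    plan = assign_groups({"hpl_a": [101, 102], "hpl_b": [103, 104, 105], "io": [106]})
    for task, cpus in sorted(plan.items()):
        print(task, sorted(cpus))
```

On Linux, each mapping could then be applied with Python's `os.sched_setaffinity(pid, cpus)`. A kernel-level scheduler change such as the one described in the abstract would instead act on the kernel's own run queues and scheduling domains. Counting load in tasks is a simplification. Weighting each group by measured CPU or memory-bus demand would be a natural refinement.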

Subject Area

Engineering, Computer; Engineering, Electronics and Electrical; Computer Science

Recommended Citation

Valles, Damian, "Development of load balancing algorithm based on analysis of multi-core architecture on a Beowulf cluster" (2011). ETD Collection for University of Texas, El Paso. AAI3490305.