Memory-Based, Locality-Driven Supercomputer Affinity
Edgar Leon
Abstract
Computer systems are becoming increasingly complex and may include
many-core technologies, hybrid machines with throughput-optimized
cores and latency-optimized cores, and multiple levels of memory. The
complexity involved in extracting the performance benefits these
systems offer greatly challenges the productivity of computational
scientists. A significant part of this challenge involves mapping
parallel applications efficiently to the underlying hardware. A poor
mapping may result in dramatic performance loss. Furthermore, an
application mapping is often machine-dependent, sacrificing portability
in favor of performance.
In this talk, I will present mpibind, a memory-driven algorithm to map
parallel hybrid applications to the underlying architecture
transparently from the point of view of applications. This tool
employs a simple interface for computational scientists and results in
a full mapping of MPI tasks, threads, and GPU kernels to hardware
processing units and memory domains. Furthermore, scientists do not
have to deal with intricate details of the hardware topology, which
increases their productivity. At Lawrence Livermore National
Laboratory, we use mpibind to bridge the gap between performance and
portability on commodity technology systems as well as advanced
technology systems.
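
As a rough illustration of what a memory-driven mapping can look like,
the sketch below first spreads tasks across memory (NUMA) domains and
only then divides each domain's cores and GPUs among the tasks placed
there. This is not the mpibind implementation or its interface: the
names NumaDomain, split_evenly, and map_tasks, the even-split policy,
and the rule that tasks share local GPUs when there are fewer GPUs than
tasks are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class NumaDomain:
    """Hypothetical description of one memory domain and its local devices."""
    cores: list
    gpus: list = field(default_factory=list)


def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def map_tasks(domains, num_tasks):
    """Return one dict per task naming its NUMA domain, cores, and GPUs."""
    # Step 1 (memory first): decide how many tasks each memory domain hosts.
    per_domain = [len(c) for c in split_evenly(list(range(num_tasks)), len(domains))]

    mapping = []
    for dom_id, (dom, n) in enumerate(zip(domains, per_domain)):
        if n == 0:
            continue
        # Step 2: divide the cores local to this domain among its tasks.
        core_chunks = split_evenly(dom.cores, n)
        # Step 3: divide local GPUs too; if there are fewer GPUs than tasks,
        # every task in the domain shares all local GPUs (an assumption).
        if len(dom.gpus) >= n:
            gpu_chunks = split_evenly(dom.gpus, n)
        else:
            gpu_chunks = [list(dom.gpus) for _ in range(n)]
        for cores, gpus in zip(core_chunks, gpu_chunks):
            mapping.append({"numa": dom_id, "cores": cores, "gpus": gpus})
    return mapping


if __name__ == "__main__":
    # A made-up node: two memory domains, four cores and one GPU each.
    node = [NumaDomain(cores=[0, 1, 2, 3], gpus=[0]),
            NumaDomain(cores=[4, 5, 6, 7], gpus=[1])]
    for rank, binding in enumerate(map_tasks(node, 4)):
        print(rank, binding)
```

Starting from memory domains rather than from cores keeps each task's
threads and GPU work close to the memory the task allocates, which is
the locality property the abstract emphasizes.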