Linpack benchmark supercomputers
The High Performance Conjugate Gradient Benchmark (HPCG) is a new benchmark intended to complement the High-Performance Linpack (HPL) benchmark currently used to rank supercomputers in the TOP500 list. This new benchmark solves a large sparse linear system using a multigrid preconditioned conjugate gradient (PCG) algorithm. The PCG algorithm better represents the computational and communication patterns prevalent in modern application workloads, which rely more heavily on memory system and network performance than HPL does.
GPU-accelerated supercomputers have proven to be very effective for accelerating compute-intensive applications like HPL, especially in terms of power efficiency. Obtaining good acceleration on the GPU for the HPCG benchmark is more challenging due to the limited parallelism and memory access patterns of the computational kernels involved. In this post we present the steps taken to obtain high performance of the HPCG benchmark on GPU-accelerated clusters, and demonstrate that our GPU-accelerated HPCG results are the fastest per-processor results reported to date.
The PCG algorithm solves a sparse linear system given an initial guess. The particular sparse linear system used in HPCG is a simple elliptic partial differential equation discretized with a 27-point stencil on a regular 3D grid. Rows in the sparse matrix represent points in the grid. Each processor is responsible for a subset of rows corresponding to a local 3D domain whose size is chosen by the user in the setup file. The number of processors is automatically detected at runtime and decomposed into a 3D processor grid whose dimensions multiply to the total number of processors.
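To make the structure of the problem concrete, here is a minimal single-process sketch in plain Python of a 27-point stencil matrix on a small 3D grid, solved with preconditioned conjugate gradient. It is an illustration under stated assumptions, not the benchmark code: the function names (`build_27pt_matrix`, `symgs`, `pcg`) are hypothetical, the matrix values (26 on the diagonal, -1 for each neighbor) are assumed for the example, and a single symmetric Gauss-Seidel sweep stands in for the multigrid preconditioner that HPCG actually uses.

```python
import math


def build_27pt_matrix(nx, ny, nz):
    """Sparse matrix for a 27-point stencil; rows[i] is a list of (col, value).

    Assumed values for this sketch: 26.0 on the diagonal, -1.0 per neighbor.
    """
    rows = []
    for iz in range(nz):
        for iy in range(ny):
            for ix in range(nx):
                i = ix + nx * (iy + ny * iz)  # one matrix row per grid point
                row = []
                for dz in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            jx, jy, jz = ix + dx, iy + dy, iz + dz
                            if 0 <= jx < nx and 0 <= jy < ny and 0 <= jz < nz:
                                j = jx + nx * (jy + ny * jz)
                                row.append((j, 26.0 if j == i else -1.0))
                rows.append(row)
    return rows


def spmv(rows, x):
    """Sparse matrix-vector product y = A x."""
    return [sum(v * x[j] for j, v in row) for row in rows]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def symgs(rows, r):
    """One symmetric Gauss-Seidel sweep from a zero guess: z approximates A^-1 r.

    Stand-in preconditioner for this sketch (HPCG itself uses multigrid).
    """
    n = len(r)
    z = [0.0] * n
    for order in (range(n), range(n - 1, -1, -1)):  # forward, then backward
        for i in order:
            s, diag = r[i], 0.0
            for j, v in rows[i]:
                if j == i:
                    diag = v
                else:
                    s -= v * z[j]
            z[i] = s / diag
    return z


def pcg(rows, b, x0, tol=1e-10, max_iter=50):
    """Preconditioned conjugate gradient; returns (solution, iterations used)."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, spmv(rows, x))]
    norm0 = math.sqrt(dot(r, r))
    if norm0 == 0.0:
        return x, 0
    z = symgs(rows, r)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = spmv(rows, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) / norm0 < tol:
            return x, it
        z = symgs(rows, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    A = build_27pt_matrix(4, 4, 4)       # 64 grid points, 64 matrix rows
    b = spmv(A, [1.0] * len(A))          # right-hand side with known solution
    x, iters = pcg(A, b, [0.0] * len(A)) # start from a zero initial guess
    print("iterations:", iters, "max error:", max(abs(v - 1.0) for v in x))
```

Every kernel in this sketch (the sparse matrix-vector product, the dot products, the vector updates, and the Gauss-Seidel sweep) does very little arithmetic per value read from memory, and the Gauss-Seidel sweep updates rows in sequence. This is one way to see why the benchmark stresses the memory system and exposes less parallelism than HPL.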