## Advanced Practical Course (IFP): Energy-efficient High Performance Computing

### Prof. Dr. Vincent Heuveline, Martin Wlotzka, Martin Baumann, Sabine Richling

Numerical simulation has become established as a third pillar of scientific research, alongside theoretical analysis and experiment. Driven by the growing demand for computing power in many fields of science, hardware performance in high performance computing (HPC) has risen well above the petaflop range. Today, the HPC community is preparing for the exascale era. This brings energy consumption into focus alongside runtime performance. The energy consumption of many HPC systems has reached a critical level, at which the energy costs exceed the hardware acquisition costs after only a few years.

This course builds a bridge from education to research. The topics are closely related to current research activities in the Exa2Green project:

**Multi-GPU algorithms for asynchronous iteration methods and extension to distributed memory systems**

The goal is to implement asynchronous iteration methods for solving linear systems of equations on graphics processing units (GPUs). Compared to common CPUs, GPUs can deliver higher computational performance at lower energy consumption. To exploit this advantage, the implementation of the methods must account for the particular hardware properties.

**Parallel mixed-precision methods**

The goal is to develop a parallel implementation of mixed-precision methods. These iterative methods for the solution of linear systems of equations yield a sequence of approximations to the solution by means of a defect correction. The algorithms use both high-precision and low-precision data formats. The linear system is solved only in the low-precision format, and its solution is used as a correction to the high-precision approximation.

**Parallel interpolation operators for energy-efficient geometric multigrid methods**

The goal is to implement an energy-aware geometric multigrid method. The sequence of decreasing problem sizes makes it possible to reduce the number of active nodes until the lowest grid level is reached. Once the system is solved on the lowest grid level, the number of active nodes is increased again, in step with the prolongation of the solution approximation.

**Benchmarking of numerical algorithms and simulations by energy and runtime performance measurements**

The goal is to define benchmarks for the energy consumption and runtime performance of numerical algorithms and simulations, and to conduct numerical experiments. External and internal power meters are used to measure the energy consumption of individual hardware components. The measurement data is fused with profiling data from energy and parallel performance profiling tools to analyse the behaviour of the algorithms with respect to power consumption and runtime performance.
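
As a rough illustration of how power meter readings can be turned into benchmark figures, the following minimal sketch integrates sampled power over time to obtain the energy of a run, together with its runtime and average power. The function names `energy_joules` and `summarize_run`, the trapezoidal integration rule, and the sample values are assumptions made for this example only; they are not taken from the course material or from any particular measurement tool.

```python
from typing import Dict, Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (watts) over time (seconds) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    if len(timestamps_s) < 2:
        raise ValueError("need at least two samples to integrate")
    energy = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


def summarize_run(timestamps_s: Sequence[float], power_w: Sequence[float]) -> Dict[str, float]:
    """Return runtime, energy and average power for one measured run."""
    runtime = timestamps_s[-1] - timestamps_s[0]
    energy = energy_joules(timestamps_s, power_w)
    return {"runtime_s": runtime, "energy_j": energy, "avg_power_w": energy / runtime}


# Hypothetical power samples for a single run (not real measurements).
t = [0.0, 1.0, 2.0, 3.0, 4.0]            # seconds
p = [100.0, 200.0, 200.0, 200.0, 100.0]  # watts
summary = summarize_run(t, p)
print(summary)
```

Reporting energy and runtime side by side matters because the two can diverge: a configuration that finishes faster may still draw enough additional power to consume more energy overall.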