GPU Mekong Project - Simplified Multi-GPU Programming

Aims & Objectives

The main objective of (GPU) Mekong is to provide a simplified path to scale out the execution of GPU programs from one GPU to almost any number, independent of whether the GPUs are located within one host or distributed at the cloud or cluster level. Unlike existing solutions, this work proposes to maintain the GPU’s native programming model, which relies on bulk-synchronous, thread-collective execution; that is, no hybrid solutions like OpenCL/CUDA programs combined with message passing are required. As a result, we can maintain the simplicity and efficiency of GPU computing (in terms of both compute performance and energy consumption) in the scale-out case, together with high productivity. Mekong (formerly GCUDA) received funding from Google in the form of a research award. Mekong has just been granted additional BMBF funding.

About the name

With Mekong we are referring to the Mekong Delta, a huge river delta in southwestern Vietnam where one of the longest rivers in the world splits into a large number of distributaries before this huge water stream finally empties into the South China Sea. It forms a large triangle that embraces a variety of physical landscapes and is famous among backpackers and tourists as a travel destination.

What motivated us to choose Mekong as a name is the fact that a single huge stream is transformed into a large number of distributaries, an effect that we also see in our GPU project: Mekong aims to transform a single data stream into a large number of smaller streams that embrace smaller islands (computational units, memory), which mostly operate independently except for interactions like data distribution, communication, and synchronization.
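
As a rough illustration of the "one stream into many distributaries" idea, the following minimal Python sketch splits the thread-block indices of a single kernel launch into contiguous ranges, one per GPU. It is not taken from Mekong itself: the function name partition_grid, its parameters, and the even-split policy are assumptions made purely for illustration, and the data distribution, communication, and synchronization between partitions are left out.

```python
def partition_grid(num_blocks, num_gpus):
    """Split block indices [0, num_blocks) into contiguous per-GPU ranges.

    Hypothetical helper for illustration only; the even-split policy is an
    assumption, not the scheme used by the Mekong project.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")

    # Every GPU gets `base` blocks; the first `extra` GPUs get one more.
    base, extra = divmod(num_blocks, num_gpus)

    ranges = []
    start = 0
    for gpu in range(num_gpus):
        size = base + (1 if gpu < extra else 0)
        ranges.append((start, start + size))  # half-open range [start, end)
        start += size
    return ranges


if __name__ == "__main__":
    # Ten thread blocks spread over four GPUs.
    print(partition_grid(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Each GPU would then execute only the blocks in its own range, so the partitions operate independently apart from the interactions mentioned above.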

The Mekong project was previously called GCUDA, and you might find a few references to this old name. As we moved from CUDA to OpenCL as the primary input language, the name was changed to reflect the new focus.

Research Topics

  • Identification and implementation of mini-apps (small showcases)
  • Implementation of a platform for the evaluation of both compute performance and energy consumption
  • Modeling in order to predict compute performance and energy consumption
  • Hardware-aware optimizations
  • Domain-specific relaxations


The project is funded by the Federal Ministry of Education and Research of Germany (BMBF).




Project Link


In preparation