Scalable Tools Workshop

Efficient Instrumentation and Tracepoint Insertion for GPU Compute Kernels

Sébastien Darche

Abstract

Reducing instrumentation noise and overhead in software development tools is a major factor in providing insightful performance reports for programmers. This is particularly difficult when tracing highly parallel GPU compute kernels, which raise challenges for instrumentation, data movement, and trace analysis. Furthermore, while complex kernels usually benefit the most from tracing tools, instrumentation overhead often correlates strongly with kernel complexity. Thus, GPU compute kernel tracing is a prime candidate for improvement.

We propose a method for efficient tracepoint placement in GPU compute kernels that leverages properties derived from static analysis of the control flow graph (CFG) at compile time. This is possible because GPUs rely on stack-based SIMT control flow, which allows vector control flow data to be computed postmortem. Compared to current tracing methods, our approach can reduce the number of instrumentation points while guaranteeing the same level of detail when processing the trace.

We evaluate the reference implementation of our method on a comprehensive scientific computing benchmark, obtaining an average reduction of 59% in the total number of inserted tracepoints, which in turn reduces runtime overhead and total trace size when tracing a program. The reference implementation is freely available and integrated into a complete GPU tracing tool, ready for use by programmers.
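
The general idea of dropping tracepoints whose information can be recovered from the CFG can be illustrated with a small sketch. This is not the method presented in the talk: it is a generic, single-thread illustration that ignores SIMT divergence entirely, and the dictionary-based CFG, the block names, and the functions `tracepoint_blocks`, `record_trace`, and `reconstruct_path` are all hypothetical. The assumed rule is that a block needs a tracepoint only when it is the target of a branch with more than one possible destination; every other control transfer is inferred after the fact.

```python
def tracepoint_blocks(cfg):
    """Return the blocks that need a tracepoint (illustrative rule only).

    A block is instrumented only if it is a successor of a block that has
    more than one successor. Every other transfer of control has a single
    possible destination, so it can be inferred from the CFG afterwards.
    """
    marked = set()
    for successors in cfg.values():
        if len(successors) > 1:
            marked.update(successors)
    return marked


def record_trace(path, marked):
    """Simulate execution: each instrumented block on the path logs its name."""
    return [block for block in path if block in marked]


def reconstruct_path(cfg, entry, trace):
    """Rebuild the full block sequence from the CFG and the sparse trace."""
    marked = tracepoint_blocks(cfg)
    events = iter(trace)
    if entry in marked:
        next(events)  # the entry block logged its own first execution
    path = [entry]
    block = entry
    while cfg[block]:  # a block without successors ends the walk
        successors = cfg[block]
        if len(successors) == 1:
            block = successors[0]
            if block in marked:
                next(events)  # event is redundant here; consume and move on
        else:
            block = next(events)  # the trace says which branch was taken
        path.append(block)
    return path


# Hypothetical kernel: a loop whose body contains an if/else.
EXAMPLE_CFG = {
    "entry": ["header"],
    "header": ["body", "exit"],
    "body": ["then", "else"],
    "then": ["latch"],
    "else": ["latch"],
    "latch": ["header"],
    "exit": [],
}

# Two loop iterations: the first takes "then", the second takes "else".
EXAMPLE_PATH = [
    "entry", "header", "body", "then", "latch",
    "header", "body", "else", "latch", "header", "exit",
]
```

In this hypothetical CFG, four of the seven blocks carry a tracepoint, and `reconstruct_path(EXAMPLE_CFG, "entry", record_trace(EXAMPLE_PATH, tracepoint_blocks(EXAMPLE_CFG)))` rebuilds the complete path from the sparse trace. A GPU tracer would additionally have to account for per-lane execution masks, which this sketch does not model.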

Granlibakken Resort
Lake Tahoe, California
Sunday, July 6 through Thursday, July 10, 2025

Tuesday, July 8
Talks and Working Groups

Mountain Room

Binary Analysis and Instrumentation in Dyninst on SIMD/SIMT Code for AMD GPUs

Hsuan-Heng Wu and Ronak Chauhan

Luthier, a Dynamic Binary Instrumentation Framework Targeting AMD GPUs

Matin Raayai-Ardakani, Norman Rubin, and David Kaeli

Efficient Instrumentation and Tracepoint Insertion for GPU Compute Kernels

Sébastien Darche
