CUDA Debugger
The TotalView CUDA debugger is an integrated debugging tool that can simultaneously debug CUDA code running on the Linux-x86_64 host and on the NVIDIA® GPU. CUDA support is an extension to the standard Linux-x86_64 version of TotalView and supports debugging 64-bit CUDA programs on Linux-x86_64. Debugging 32-bit CUDA programs is not currently supported.
Supported major features:
Debug a CUDA application running directly on GPU hardware
Set breakpoints, pause execution, and single step in GPU code
View GPU variables in PTX registers, and in local, parameter, global, or shared memory
Access runtime variables such as threadIdx, blockIdx, and blockDim
Debug multiple GPU devices per process
Support for the CUDA MemoryChecker
Debug remote, distributed and clustered systems
All Linux-x86_64 host debugging features are supported, except ReplayEngine
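To make the feature list concrete, the following is a minimal sketch of a CUDA program that touches each kind of state the list mentions: kernel parameters, global memory, shared memory, a local variable, and the built-in runtime variables threadIdx, blockIdx, and blockDim. It is not taken from the TotalView documentation. The names scale_kernel, run_scale_demo, and kBlockSize, along with the block size and problem size, are hypothetical and chosen only for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlockSize = 128;  // threads per block (illustrative choice)

// Hypothetical demo kernel: scales each element of `in` by `factor`.
// `in`, `out`, `n`, and `factor` are kernel parameters; `in` and `out`
// point into global memory.
__global__ void scale_kernel(const float* in, float* out, int n, float factor)
{
    __shared__ float tile[kBlockSize];  // shared memory, one copy per block

    // Built-in runtime variables locate this thread within the grid.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // local variable

    if (idx < n) {
        tile[threadIdx.x] = in[idx];            // stage through shared memory
        out[idx] = tile[threadIdx.x] * factor;  // a natural breakpoint line
    }
}

// Runs the kernel on n elements. Returns the number of mismatching
// results, or -1 if a CUDA runtime call fails.
int run_scale_demo(int n, float factor)
{
    std::vector<float> host_in(n), host_out(n, 0.0f);
    for (int i = 0; i < n; ++i) host_in[i] = static_cast<float>(i);

    float* dev_in = nullptr;
    float* dev_out = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMalloc(&dev_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&dev_out, bytes) != cudaSuccess) { cudaFree(dev_in); return -1; }

    cudaError_t err = cudaMemcpy(dev_in, host_in.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int blocks = (n + kBlockSize - 1) / kBlockSize;  // round up
        scale_kernel<<<blocks, kBlockSize>>>(dev_in, dev_out, n, factor);
        err = cudaDeviceSynchronize();  // wait for the kernel and collect errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host_out.data(), dev_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev_in);
    cudaFree(dev_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (host_out[i] != host_in[i] * factor) ++mismatches;
    return mismatches;
}

int main()
{
    int mismatches = run_scale_demo(1000, 2.0f);
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

A program like this is normally built with host and device debug information, for example with the nvcc options -g and -G, before being loaded into a debugger. A breakpoint on the line that writes out[idx] would then be a reasonable place to inspect tile, idx, and the built-in variables for a chosen GPU thread.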
 
RELATED TOPICS 
 
The CUDA debugger
The CLI dcuda command
dcuda in the TotalView for HPC Reference Guide