Support for Cray Abnormal Termination Processing (ATP)
Cray's ATP module stops a running job at the moment it crashes, which lets you attach TotalView to the held job and begin debugging it. To hold a job as it is crashing, you must set the ATP_HOLD_TIME environment variable before launching your job with aprun.
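As a minimal sketch of this setup, the commands below set the hold time and then launch a job. The value 5 (assumed here to be a hold time in minutes), the process count, and the application name ./my_app are illustrative assumptions, not values taken from this guide. Check the intro_atp man page for the units and for any other steps your site requires, such as loading the ATP module.

```shell
# Ask ATP to hold a crashing job so that a debugger can attach.
# Assumption: the value is a hold time in minutes; confirm in intro_atp.
export ATP_HOLD_TIME=5

# Hypothetical launch line: 4 processes of a placeholder application.
launch_cmd="aprun -n 4 ./my_app"

# Launch only where aprun exists, so the sketch is harmless elsewhere.
if command -v aprun >/dev/null 2>&1; then
  $launch_cmd
else
  echo "aprun not found; would run: $launch_cmd"
fi
```

The variable must be exported in the same shell (or batch script) that runs aprun, so that the launcher inherits it.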
When your job crashes, aprun outputs a message stating that your job has crashed and that ATP is holding it. You can then attach TotalView to aprun using the normal attach procedure (see "Attaching to a Running Program").
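One possible way to attach from the command line, rather than through the TotalView user interface, is sketched below. It assumes that you run it on the node where aprun is running, that pgrep is available, and that your TotalView version accepts the -pid option followed by the executable name. Treat each of these as an assumption to verify locally.

```shell
# Find the PID of your own aprun process (empty if none is running).
aprun_pid=$(pgrep -u "$(id -un)" -x aprun | head -n 1 || true)

# Attach only if a held aprun exists and totalview is on the PATH.
# Assumption: "totalview -pid <pid> <executable>" attaches to a running process.
if [ -n "$aprun_pid" ] && command -v totalview >/dev/null 2>&1; then
  totalview -pid "$aprun_pid" "$(command -v aprun)"
else
  echo "No aprun process or no totalview found on this node."
fi
```

If you have more than one aprun running, this picks only the first PID that pgrep reports, so select the intended process by hand in that case.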
For more information on ATP, see the Cray intro_atp man page.