Filtering dlopen Events
Two state variables, TV::dlopen_always_recalculate and TV::dlopen_recalculate_on_match, and their related command-line options, dlopen_always_recalculate and dlopen_recalculate_on_match, let you filter dlopen events. Depending on how they are set, breakpoints in dlopened libraries are planted either when a library is loaded or only when the process stops for some other reason.
These variables support three dlopen filtering modes: Slow, Medium, and Fast.
In Slow mode, every (non-null) dlopen event is reported to the TotalView client. In Medium mode, you specify which libraries always have their breakpoints reevaluated, rather than all or none. In Fast mode, the process never stops for a dlopen event, not even for "null" dlopen events. Fast mode can yield significant performance gains, but may be impractical for some applications.
Slow Mode: Reloads libraries on every dlopen event
Option:
dlopen_always_recalculate==true
Reloads libraries on every dlopen event, retaining TotalView’s traditional breakpoint reevaluation semantics. This mode is compatible with CUDA and is a good choice when your session has pending breakpoints. However, this mode does not perform or scale as well as the other modes, because it requires the TotalView client to handle every (non-null) dlopen event for every process.
If performance is not the primary concern, or the application or runtime environment does not perform many dlopen events, then this may be a good choice.
In this mode, when the target stops at a dlopen event, the server reports the event to the client, which reloads the library list and checks whether any additional breakpoint locations need to be planted in the newly loaded libraries.
Medium Mode: Reports only libraries that match defined patterns on a dlopen event
Options:
dlopen_always_recalculate==false
dlopen_recalculate_on_match=="glob-list"
A glob-list is a colon-separated list of simple glob patterns that are compared against the dlopened library to find a match. A simple glob pattern is a string that optionally ends with an asterisk ('*'). For example:
dlopen_recalculate_on_match=="libcuda.so*:libmylib1*:libmylib2.so"
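The matching rules are not described in more detail here, so the following Python sketch shows only one plausible reading of a glob-list. It assumes, as hypotheses rather than documented TotalView behavior, that each pattern is compared against the library's file name without its directory, that a trailing '*' matches any suffix, and that a pattern without '*' must match the whole name. The functions and library paths are hypothetical.

```python
"""Hypothetical sketch of matching a dlopened library against a glob-list.

Assumptions (not taken from TotalView): patterns are compared with the
library's basename, a trailing '*' means "prefix match", and a pattern
without '*' must equal the whole name.
"""
import os
from typing import List


def parse_glob_list(glob_list: str) -> List[str]:
    """Split a colon-separated glob-list, dropping empty entries."""
    return [p for p in glob_list.split(":") if p]


def matches_pattern(name: str, pattern: str) -> bool:
    """Match one simple glob pattern: a string optionally ending in '*'."""
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def library_matches(library_path: str, glob_list: str) -> bool:
    """Return True if the library's basename matches any pattern in the list."""
    name = os.path.basename(library_path)
    return any(matches_pattern(name, p) for p in parse_glob_list(glob_list))


if __name__ == "__main__":
    patterns = "libcuda.so*:libmylib1*:libmylib2.so"
    for lib in ("/usr/lib64/libcuda.so.1",          # hypothetical paths
                "/opt/app/lib/libmylib1_debug.so",
                "/opt/app/lib/libmylib2.so",
                "/opt/app/lib/libmylib2.so.3"):
        print(lib, library_matches(lib, patterns))
```

Under these assumptions, the versioned file libmylib2.so.3 does not match the exact pattern libmylib2.so, while libcuda.so.1 matches libcuda.so* because of its trailing asterisk.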
This mode strikes a balance between performance and the ability to plant breakpoints in dlopened libraries when they are loaded.
In Medium mode, the target process stops on every dlopen event (just as in Slow mode), but the event is not reported to the client unless one of the newly loaded libraries matches a pattern in the glob-list.
This setting requires:
Adding the names of any dlopened libraries to the TV::dlopen_recalculate_on_match list if you want breakpoints planted in the library when the library is loaded.
Adding "libcuda.so*" to the match list if you are debugging CUDA; otherwise TotalView will miss CUDA kernel launch events.
Fast Mode: Does not stop for dlopen events
Options:
dlopen_always_recalculate==false
dlopen_recalculate_on_match==""
This mode provides the best performance, but breakpoints cannot be planted in a dlopened library at the time the library is loaded. Instead, they are planted only when the process stops for some other reason. Be aware that by then the application may have executed past the point at which you wanted to start debugging inside the dlopened library.
Because the debugger does not plant the dlopen breakpoint in the process, the process never stops for a dlopen event, not even "null" dlopen events. While this mode may be impractical for some applications, the performance gains are significant.
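The following Python sketch restates the three modes as a small decision model: which mode a pair of settings selects, and whether a given dlopen event stops the process and is reported to the client. It models the behavior described in this section and is not TotalView code. The function and parameter names are hypothetical, and two points are assumptions: that a "null" dlopen event is one that loads no new library, and that the match list is ignored when TV::dlopen_always_recalculate is true.

```python
"""Hypothetical decision model for the Slow, Medium, and Fast dlopen modes.

Illustrates the behavior described in this section; it is not TotalView code.
"""
from typing import NamedTuple


class DlopenBehavior(NamedTuple):
    stops: bool     # Does the target process stop at the dlopen breakpoint?
    reported: bool  # Does the server forward the event to the client?


def classify_mode(always_recalculate: bool, recalculate_on_match: str) -> str:
    """Map TV::dlopen_always_recalculate and TV::dlopen_recalculate_on_match to a mode."""
    if always_recalculate:
        # Assumption: the match list is ignored once always_recalculate is true.
        return "Slow"
    if recalculate_on_match:
        return "Medium"
    return "Fast"  # always_recalculate is false and the glob-list is empty


def dlopen_behavior(mode: str, new_library_matches: bool,
                    is_null_event: bool = False) -> DlopenBehavior:
    """Describe one dlopen event under the given mode.

    new_library_matches: a newly loaded library matched the glob-list.
    is_null_event: assumed to mean the dlopen loaded no new library.
    """
    if mode == "Slow":
        # The client handles every non-null event and reloads the library list.
        return DlopenBehavior(stops=True, reported=not is_null_event)
    if mode == "Medium":
        # The process always stops; the client hears only about matches.
        return DlopenBehavior(stops=True,
                              reported=new_library_matches and not is_null_event)
    if mode == "Fast":
        # No dlopen breakpoint is planted, so the process never stops for dlopen.
        return DlopenBehavior(stops=False, reported=False)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for always, globs in ((True, ""), (False, "libcuda.so*"), (False, "")):
        mode = classify_mode(always, globs)
        print(mode, dlopen_behavior(mode, new_library_matches=False))
```

When no newly loaded library matches, only Slow mode forwards the event to the client; Medium mode still stops the process, which is the overhead listed among its cons in Table 4.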
Table 4 summarizes the pros and cons of each mode.
Table 4: dlopen Event Filtering Modes

Slow
Option:
dlopen_always_recalculate==true
Pros:
Retains TotalView’s traditional breakpoint reevaluation semantics.
Works best with pending breakpoints.
Compatible with CUDA.
Cons:
Does not perform or scale as well as the other modes because the TotalView client handles every (non-null) dlopen event for every process.

Medium
Options:
dlopen_always_recalculate==false
dlopen_recalculate_on_match=="glob-list"
Pros:
Performs better by filtering out dlopen events.
Allows the TotalView client to process multiple dlopen events at a time.
Compatible with CUDA.
Cons:
Process stops at the dlopen breakpoint, even for "null" dlopen events.
An application may execute past the point at which you want to start debugging inside the dlopened library.
Requires adding to the match list any libraries that should have breakpoints planted when the library is loaded.
Requires adding libcuda.so* to the match list for CUDA support.

Fast
Options:
dlopen_always_recalculate==false
dlopen_recalculate_on_match==""
Pros:
Performs best by never stopping the process at dlopen events.
Allows the TotalView client to process multiple dlopen events at a time.
Cons:
Breakpoints cannot be calculated when a particular library is loaded.
Breaks CUDA support.