
alibset
Adjusts performance parameters at runtime. For some background, see RWalib Introduction. The PV-WAVE API for this routine is SET_OMP.
Prototype
void alibset( wvlong nt, wvlong th, wvlong dy, wvlong tc, char *fi, wvlong pr, wvlong *fg, SYS_OMP *po, SYS_OMPTH *pt, SYS_OMPNT *pn, SYS_CACHE *pc )
Parameters
nt — (Input) Maximum number of threads for parallel operations. This is set internally with omp_set_num_threads(), so non-RWalib code can be affected. alibinit() initializes nt to the number of physical plus HT cores on the host. To leave nt unchanged from this or any other value, input nt as -1. ATC honors nt if nt and *fi are set in the same call to alibset().
th — (Input) Minimum number of iterations for OpenMP loops. Only RWalib is affected by th. alibinit() initializes th to 1000. To leave th unchanged from this or any other value, input th as -1. ATC supplements the universal threshold th with operation-specific thresholds and sets th to their minimum.
dy — (Input) Toggle (0/1) that disables/enables OpenMP dynamic threading. This is set internally with omp_set_dynamic(), so non-RWalib code can be affected. alibinit() initializes dy to 0. To leave dy unchanged from this or any other value, input dy as -1. As noted in the description of *fi, initiating ATC can have an (overridable) effect on dynamic threading.
tc — (Input) Toggle (0/1) that disables/enables ATC. This does not affect non-RWalib code; only ATC initiation can have an (overridable) effect on it (see *fi). Until ATC has been initiated, tc is always 0 and is not a true toggle. To leave tc unchanged from its current value, input tc as -1.
*fi — (Input) Name of a tuning-file to be loaded, thus initiating ATC. Loading the file disables OpenMP dynamic threading, but omp_set_dynamic(1) can be used to re-enable it while leaving ATC enabled. Non-RWalib code is otherwise unaffected by ATC. ATC is disabled by default. To leave ATC disabled, or to leave it enabled with the tuning-file already in use, input *fi as NULL.
pr — (Input) If nonzero, the values of all performance parameters are printed to stdout. On a 12-core hyper-threaded host for example, the printout for the default parameter settings is:
OpenMP:      Enabled
Processors:  24
Max Threads: 24
Threshold:   1000
Tuning File: 
OMP_auto:    OFF
Dynamic Threads: OFF
cache line:  64
cache l1:    32768
cache l2:    262144
cache l3:    8388608
cache l4:    67108864
where the OpenMP label shows 'Enabled' if the current value of nt is greater than 1, the Processors label shows the number of physical plus HT cores, the Max Threads label shows the current value of nt, the Threshold label shows the current value of th, the Tuning File label shows the current value of *fi, the OMP_auto label shows the current value of tc, the Dynamic Threads label shows the current value of dy, and where the remaining labels show current settings for cache sizes (see *pc).
*fg — (Output) If not NULL, receives a copy of the current value of tc.
*po — (Output) If not NULL, receives a copy of the structure containing the current values of nt, th, and dy:
    typedef struct {
      wvlong nthreads;
      wvlong threshold;
      wvlong dynamic;
    } SYS_OMP;
This structure is defined in alib.h.
*pt — Used only by PV-WAVE, so the RWalib user should input it as NULL.
*pn — Used only by PV-WAVE, so the RWalib user should input it as NULL.
*pc — (Input/Output) Pointer to a SYS_CACHE structure which can be used to inform RWalib about data-cache sizes on the host. This parameter has no effect outside of RWalib. The structure is defined in alib.h as:
    typedef struct {
      wvlong line;  /* cache line size */
      wvlong l1;    /* l1 (smallest) cache size */
      wvlong l2;    /* l2 (second smallest) cache size */
      wvlong l3;    /* l3 (third smallest) cache size */
      wvlong l4;    /* not currently used */
    } SYS_CACHE;
alibinit() initializes these fields to 64, 32768, 262144, 8388608, and 67108864, respectively. To leave the fields unchanged from these or any other values, input *pc as NULL. If a tuning-file *fi is loaded, the cache sizes are updated from the file, and if a non-NULL *pc is present when the file is loaded, it receives a copy of the updated structure.
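The "-1 leaves the value unchanged" convention and the *po output described above can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: the wvlong typedef, keep_or_set(), and model_alibset() are hypothetical names invented for illustration, the real definitions live in alib.h, and the real alibset() also calls omp_set_num_threads() and omp_set_dynamic(), which this model omits.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for the wvlong integer type that alib.h defines. */
typedef long long wvlong;

/* Same fields as the SYS_OMP structure documented above. */
typedef struct {
  wvlong nthreads;
  wvlong threshold;
  wvlong dynamic;
} SYS_OMP;

/* Hypothetical helper (not an RWalib function): -1 keeps the current value. */
static wvlong keep_or_set(wvlong current, wvlong requested) {
  return (requested == -1) ? current : requested;
}

/* Hypothetical model of how nt, th, and dy update the stored settings and
   how a non-NULL po receives a copy of them. */
static void model_alibset(SYS_OMP *state, wvlong nt, wvlong th, wvlong dy,
                          SYS_OMP *po) {
  assert(state != NULL);
  state->nthreads  = keep_or_set(state->nthreads, nt);
  state->threshold = keep_or_set(state->threshold, th);
  state->dynamic   = keep_or_set(state->dynamic, dy);
  if (po != NULL) {
    *po = *state;  /* copy out the current values */
  }
}
```

Starting from the defaults {24, 1000, 0} on the 24-processor host used in this section, model_alibset(&s, 12, -1, -1, &out) leaves out holding 12, 1000, and 0. With the real library, a query that changes nothing would presumably be alibset(-1, -1, -1, -1, NULL, 0, &fg, &po, NULL, NULL, NULL), following the prototype above.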
Example 1
Consider for example a machine with 12 hyper-threaded cores and with cache sizes which match all default SYS_CACHE fields except L3. In the required call to alibinit(), we retrieve the total number np of physical plus HT cores and retrieve the default cache sizes pc. We correct the cache structure and include it in our call to alibset(). Also included in the call are changes to nt and th as well as a nonzero value for the print flag pr. The changes to nt and th are what one might want for an application dominated by inexpensive operations where too many threads and a low parallelization threshold can be liabilities.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "alib.h"
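int main(void) {
   wvlong np;      /* receives the number of physical plus HT cores */
   SYS_CACHE pc;   /* receives the default cache sizes */
   alibinit( &np, NULL, NULL, &pc );
   pc.l3 = 12000000;   /* correct the L3 size for this host */
   /* halve the thread count, raise the threshold, and print the settings */
   alibset( np/2, 10000, -1, -1, NULL, 1, NULL, NULL, NULL, NULL, &pc );
   return 0;
}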
 
Output:
 
OpenMP:      Enabled
Processors:  24
Max Threads: 12
Threshold:   10000
Tuning File: 
OMP_auto:    OFF
Dynamic Threads: OFF
cache line:  64
cache l1:    32768
cache l2:    262144
cache l3:    12000000
cache l4:    67108864
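Example 1 hard-codes the corrected L3 size. On Linux with glibc, the data-cache sizes can often be probed with sysconf() instead. The sketch below assumes that platform and is not part of RWalib: the wvlong typedef, pick(), and probe_cache() are hypothetical names, and the _SC_LEVEL* constants are glibc extensions that may report 0 or -1 where a size is unknown, in which case the incoming field is kept.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Assumption: stand-in for the wvlong integer type that alib.h defines. */
typedef long long wvlong;

/* Same fields as the SYS_CACHE structure documented in alib.h. */
typedef struct {
  wvlong line;
  wvlong l1;
  wvlong l2;
  wvlong l3;
  wvlong l4;
} SYS_CACHE;

/* Hypothetical helper: use the probed size only when it is positive. */
static wvlong pick(wvlong fallback, long probed) {
  return (probed > 0) ? (wvlong)probed : fallback;
}

/* Hypothetical helper: overwrite the fields of *pc that sysconf() can
   report. Fields the platform cannot report keep their incoming values. */
static void probe_cache(SYS_CACHE *pc) {
  assert(pc != NULL);
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
  pc->line = pick(pc->line, sysconf(_SC_LEVEL1_DCACHE_LINESIZE));
#endif
#ifdef _SC_LEVEL1_DCACHE_SIZE
  pc->l1 = pick(pc->l1, sysconf(_SC_LEVEL1_DCACHE_SIZE));
#endif
#ifdef _SC_LEVEL2_CACHE_SIZE
  pc->l2 = pick(pc->l2, sysconf(_SC_LEVEL2_CACHE_SIZE));
#endif
#ifdef _SC_LEVEL3_CACHE_SIZE
  pc->l3 = pick(pc->l3, sysconf(_SC_LEVEL3_CACHE_SIZE));
#endif
}
```

Following the pattern of Example 1, one would call probe_cache(&pc) after alibinit() has filled pc with the defaults, in place of the manual pc.l3 assignment, and then pass &pc to alibset(). The l4 field is left alone because it is documented as not currently used.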
Example 2
Alternatively, for the same host we can load a tuning-file, which in turn loads all performance parameters and initiates ATC, effectively replacing the universal constants th and nt with a set of operation-specific values. The Max Threads label shows the default value of 24 for nt, but with ATC enabled RWalib ignores this constant in favor of operation-specific values. Similarly, Threshold shows a value of 64 for th, which now represents the minimum of all operation-specific parallelization thresholds. Note also that the cache sizes are correctly loaded.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "alib.h"
int main(void) {
   alibinit( NULL, NULL, NULL, NULL );
   /* load the tuning-file, which initiates ATC, and print the settings */
   alibset( -1, -1, -1, -1, "loisln64", 1, NULL, NULL, NULL, NULL, NULL );
   return 0;
}
 
Output:
 
OpenMP:      Enabled
Processors:  24
Max Threads: 24
Threshold:   64
Tuning File: loisln64
OMP_auto:    ON
Dynamic Threads: OFF
cache line:  64
cache l1:    32768
cache l2:    262144
cache l3:    12582912
cache l4:    67108864
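The Threshold value of 64 reported above is described as the minimum of the operation-specific parallelization thresholds. A minimal sketch of that reduction follows. The wvlong typedef, min_threshold(), and any sample values are hypothetical, since the contents of a tuning-file are not documented here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for the wvlong integer type that alib.h defines. */
typedef long long wvlong;

/* Hypothetical helper: smallest of n per-operation thresholds (n >= 1). */
static wvlong min_threshold(const wvlong *thresholds, size_t n) {
  assert(thresholds != NULL && n > 0);
  wvlong lowest = thresholds[0];
  for (size_t i = 1; i < n; i++) {
    if (thresholds[i] < lowest) {
      lowest = thresholds[i];
    }
  }
  return lowest;
}
```

For made-up per-operation thresholds of 4096, 64, and 512, min_threshold() returns 64, which is the kind of figure the Threshold label would then show for th.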

Version 2017.1
Copyright © 2019, Rogue Wave Software, Inc. All Rights Reserved.