Progress Model
Active Progress
In active progress mode, SHMEM progress is achieved when the application
calls into the SHMEM library. This approach is well matched to applications that
call into SHMEM frequently, for example, applications with a fine-grained mix of
SHMEM operations and computation. This mix is typical of many SHMEM
applications. Applications that spend long contiguous periods in computation
without calling SHMEM routines will cause SHMEM progress to be delayed for
that period of time. Additionally, applications must not poll on memory locations
waiting for puts to arrive without calling SHMEM, since progress will not occur
and the program will hang. Instead, SHMEM applications should use one of the
wait synchronization primitives provided by SHMEM, as shown in the sketch
below. In active progress mode, QLogic SHMEM achieves full performance.
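The following is a minimal sketch of the recommended idiom under active
progress, assuming a job launched with at least two PEs. It uses the classic
SHMEM C interface (start_pes, _my_pe, shmem_long_p, shmem_long_wait); the
flag variable and program structure are illustrative only and are not taken from
this guide.

/* Sketch: signaling with a SHMEM wait primitive instead of spin-polling. */
#include <shmem.h>
#include <stdio.h>

long flag = 0;   /* symmetric variable, visible to remote puts */

int main(void)
{
    start_pes(0);
    int me = _my_pe();

    if (me == 0) {
        /* Remote write of the flag to PE 1. */
        shmem_long_p(&flag, 1, 1);
    } else if (me == 1) {
        /* Correct under active progress: shmem_long_wait() calls into the
         * SHMEM library, so progress is made while waiting. A bare
         * "while (flag == 0);" loop could hang in active progress mode
         * because no SHMEM routine is ever called. */
        shmem_long_wait(&flag, 0);
        printf("PE 1 received the flag\n");
    }

    shmem_barrier_all();
    return 0;
}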
Passive Progress
In passive progress mode, SHMEM progress continues to occur when the
application calls into SHMEM, but can additionally occur in the background when
the application is not calling into SHMEM. This is achieved using an additional
progress thread per PE. The progress thread is provided by PSM and is
scheduled at a relatively low frequency, typically 10 to 100 times a second. This
thread causes independent SHMEM progress where required, both on the
initiator side and the target side of SHMEM operations. In this mode, applications
can poll on memory locations waiting for puts to arrive without calling SHMEM;
see the sketch below. Progress is achieved in this case by the progress thread,
though it incurs the scheduling latency of the progress thread, which may have a
significant impact on overall performance if this idiom is used frequently. The
scheduling frequency of the PSM progress thread can be tuned as described
elsewhere in this guide.
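As an illustration, the following sketch polls a symmetric flag without calling
SHMEM. This idiom only completes under passive progress, where the
background progress thread delivers the put; under active progress it would
hang. The function and variable names here are assumptions made for the
example, not names defined by the SHMEM library.

/* Sketch: polling a symmetric flag without calling into SHMEM. */
#include <shmem.h>

volatile long flag = 0;   /* symmetric; volatile so the compiler reloads it */

void wait_for_flag_by_polling(void)
{
    while (flag == 0) {
        /* No SHMEM call here: progress relies entirely on the background
         * progress thread (passive progress mode). The wait incurs that
         * thread's scheduling latency, so shmem_long_wait() is usually
         * preferable when it can be used. */
    }
}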
Other performance effects of using passive progress include the following:
• The progress thread consumes some CPU cycles, though this cost is low
because the progress thread runs infrequently.
• The SHMEM library uses additional locks in its implementation to protect its
data structures against concurrent updates from the PE thread and the
progress thread. There is a slight additional cost in the performance-critical
path because of this locking. This cost is minimal because contention on the
lock is very low (the progress thread runs infrequently) and because each
progress thread runs on the same CPU core as the corresponding PE
thread (giving good cache locality for the lock).