4 – Running MPI on QLogic Adapters
Open MPI
IB0054606-02 A
Job Blocking in Case of Temporary IB Link Failures
By default, as controlled by mpirun’s quiescence parameter -q, an MPI job is
killed for quiescence in the event of an IB link failure (or unplugged cable). This
quiescence timeout occurs under one of the following conditions:
A remote rank’s process cannot reply to out-of-band process checks.
MPI is inactive on the IB link for more than 15 minutes.
To keep remote process checks but disable triggering quiescence for temporary
IB link failures, use the -disable-mpi-progress-check option with a nonzero -q
option. To disable quiescence triggering altogether, use -q 0. No matter how
these options are used, link failures (temporary or other) are always logged to
syslog.
If the link is down when the job starts and you want the job to continue blocking
until the link comes up, use the -t -1 option.
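The option combinations described above can be summarized in a small helper that builds the mpirun argument list. This is only a sketch: the function name, the policy names, the program path ./my_app, and the value 30 passed to -q are hypothetical placeholders. Only the flags -q, -disable-mpi-progress-check, and -t -1 come from the text, and the units of the -q value are not stated in this section.

```python
def mpirun_command(program, policy="default", q_value=None, wait_for_link=False):
    """Build an mpirun argument list for one of the quiescence policies.

    policy (hypothetical names used only in this sketch):
      "default"             - no extra flags; mpirun's default -q behavior applies
      "keep_process_checks" - nonzero -q plus -disable-mpi-progress-check
      "no_quiescence"       - -q 0, which disables quiescence triggering
    wait_for_link: add -t -1 so the job keeps blocking until the link comes up.
    """
    args = ["mpirun"]
    if policy == "keep_process_checks":
        # The text requires a nonzero -q value alongside the progress-check flag.
        if not q_value:
            raise ValueError("a nonzero -q value is required for this policy")
        args += ["-q", str(q_value), "-disable-mpi-progress-check"]
    elif policy == "no_quiescence":
        args += ["-q", "0"]
    elif policy != "default":
        raise ValueError("unknown policy: " + policy)
    if wait_for_link:
        args += ["-t", "-1"]
    args.append(program)
    return args


if __name__ == "__main__":
    # ./my_app and the -q value 30 are placeholders, not recommended settings.
    print(" ".join(mpirun_command("./my_app", "keep_process_checks", q_value=30)))
    print(" ".join(mpirun_command("./my_app", "no_quiescence", wait_for_link=True)))
```

Whichever combination is chosen, link failures are still logged to syslog, so the log remains the place to check for temporary outages.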
Table 4-6. Environment Variables Relevant for any PSM (Continued)

Name                          Description
LD_LIBRARY_PATH               This variable specifies the path to the run-time
                              library.
                              Default: Unset

Table 4-7. Environment Variables Relevant for Open MPI

Name                          Description
OMPI_COMM_WORLD_SIZE          This environment variable holds the number of
                              processes in this process's MPI Comm_World.
OMPI_COMM_WORLD_RANK          This environment variable holds the MPI rank of
                              this process.
OMPI_COMM_WORLD_LOCAL_RANK    This environment variable holds the relative rank
                              of this process on this node within its job. For
                              example, if four processes in a job share a node,
                              they will each be given a local rank ranging from
                              0 to 3.
OMPI_UNIVERSE_SIZE            This environment variable holds the number of
                              process slots allocated to this job. Note that
                              this may be different from the number of
                              processes in the job.
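As an illustration of how a process might use the Open MPI variables listed in Table 4-7, the sketch below reads them from the environment. The function name ompi_job_info and the dictionary keys are hypothetical; only the variable names come from the table. The sketch assumes the variables are absent when the program is not started by Open MPI's launcher, and returns None for any that are missing.

```python
import os


def ompi_job_info(env=None):
    """Return the Open MPI job variables from Table 4-7 as integers.

    env defaults to os.environ; any variable that is not set maps to None.
    """
    env = os.environ if env is None else env

    def read(name):
        value = env.get(name)
        return int(value) if value is not None else None

    return {
        "world_size": read("OMPI_COMM_WORLD_SIZE"),        # processes in Comm_World
        "rank": read("OMPI_COMM_WORLD_RANK"),              # MPI rank of this process
        "local_rank": read("OMPI_COMM_WORLD_LOCAL_RANK"),  # rank among processes on this node
        "universe_size": read("OMPI_UNIVERSE_SIZE"),       # process slots allocated to the job
    }


if __name__ == "__main__":
    info = ompi_job_info()
    if info["rank"] is None:
        print("Open MPI variables are not set in this environment")
    else:
        print("rank {rank} of {world_size}, local rank {local_rank}".format(**info))
```

A common use of the local rank is to pick a per-node resource, such as a CPU set or device index, before MPI is initialized.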