D – Troubleshooting
IB0054606-02 A
MPI Job Failures Due to Initialization Problems
If one or more nodes do not have the interconnect in a usable state, messages
similar to the following appear when the MPI program is started:
userinit: userinit ioctl failed: Network is down [1]: device init failed
userinit: userinit ioctl failed: Fatal Error in keypriv.c(520): device init failed
These messages may indicate that a cable is not connected, the switch is down,
the Subnet Manager (SM) is not running, or a hardware error has occurred.
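One way to narrow down which node has an unusable interconnect is to look at the port state the kernel reports. The following is a minimal sketch, not a procedure from this guide: it assumes the standard Linux sysfs layout /sys/class/infiniband/<device>/ports/<port>/state, where each state file holds text such as "4: ACTIVE" or "1: DOWN". The function name list_inactive_ports and its optional directory argument are hypothetical and exist only for illustration and testing.

```shell
# List InfiniBand ports that are not in the ACTIVE state.
# Assumption: sysfs layout <root>/<device>/ports/<port>/state, with file
# contents such as "4: ACTIVE" or "1: DOWN".
# $1 (optional): alternative root directory, used only for testing.
list_inactive_ports() {
    root="${1:-/sys/class/infiniband}"
    for state_file in "$root"/*/ports/*/state; do
        # Skip when the glob matched nothing or the file is unreadable.
        [ -r "$state_file" ] || continue
        state=$(cat "$state_file")
        case "$state" in
            *ACTIVE*)
                ;;  # port is usable, nothing to report
            *)
                port_dir=${state_file%/state}   # <root>/<device>/ports/<port>
                port=${port_dir##*/}            # <port>
                dev_dir=${port_dir%/ports/*}    # <root>/<device>
                dev=${dev_dir##*/}              # <device>
                echo "$dev port $port: $state"
                ;;
        esac
    done
}
```

Running list_inactive_ports with no arguments on a node prints one line for each port that is not ACTIVE. A port that is down or still initializing is consistent with the causes listed above, such as a disconnected cable or an SM that is not running, but the output alone does not identify which cause applies.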
OpenFabrics and InfiniPath Issues
The following sections cover issues related to OpenFabrics (including Subnet
Managers) and InfiniPath.
Stop InfiniPath Services Before Stopping/Restarting InfiniPath
The following InfiniPath services must be stopped before
stopping, starting, or restarting InfiniPath:
QLogic Fabric Manager
OpenSM
SRP
If any of these services is still running, the stop command fails. Here is a
sample command and the corresponding error messages:
# /etc/init.d/openibd stop
Unloading infiniband modules: sdp cm umad uverbs ipoib sa ipath mad coreFATAL:Module ib_umad is in use.
Unloading infinipath modules FATAL: Module ib_qib is in use.
[FAILED]
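Before running the stop command, it can help to check whether the modules named in these errors are still held by a running service. The sketch below is an illustration under stated assumptions, not part of the documented procedure: it relies on the standard /proc/modules format, in which the first field is the module name and the third field is its reference count. The function name modules_in_use is hypothetical, and the file argument is a parameter so that the check can be tried against a saved copy.

```shell
# Report which of the named kernel modules have a non-zero reference count.
# Assumption: the input file uses the /proc/modules layout
#   name size refcount dependents state address
# $1: path to the modules file (normally /proc/modules)
# $2...: module names to check
modules_in_use() {
    modules_file="$1"
    shift
    for wanted in "$@"; do
        # Print a line only when the module is loaded and still referenced.
        awk -v m="$wanted" \
            '$1 == m && $3 > 0 { print $1 " is in use (refcount " $3 ")" }' \
            "$modules_file"
    done
}
```

For example, modules_in_use /proc/modules ib_umad ib_qib prints a line for each of the two modules that is still referenced. If either is reported, stop the services listed above first and then retry /etc/init.d/openibd stop.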