/lib/modules/2.6.18-194.el5/updates/kernel/net/sunrpc/auth_gss/auth_rpcgss.ko
/lib/modules/2.6.18-194.el5/updates/kernel/fs/exportfs/exportfs.ko
3. Rename each of the above files by appending the .ofed suffix (/path/name.ofed). For example:
mv /lib/modules/2.6.18-194.el5/updates/kernel/fs/nfs/nfs.ko \
   /lib/modules/2.6.18-194.el5/updates/kernel/fs/nfs/nfs.ko.ofed
4. Rebuild the module dependency files with the depmod -a command, and then reboot the nodes.
A reboot is necessary for the changes to take effect.
depmod -a
reboot
5. Execute the following commands on each node to ensure that the modules are loaded on
startup:
chkconfig openibd on
service openibd start
6. This step is needed only if you have unmanaged InfiniBand switches in your network. If the
subnet manager runs on managed switches, skip this step.
The Subnet Manager opensmd must be running on at least one file serving node. Run the
command /usr/sbin/sminfo as root to determine whether opensmd is running on the IB network.
If opensmd is not running, issue the following commands:
chkconfig opensmd on
service opensmd start
7. Verify the status of the HCA.
NOTE: If you are using Host Based SM, by default it is tied to Port 1 of the HCA.
Run the following checks:
* ofed_info
* ibstat
* ibclearcounters
* ibdiagnet -lw 4x -ls 10 -r
8. Verify that the link is up and the state is active. If the state is initializing, there is no
subnet manager running on the fabric. See step 6.
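The port-state check and the subnet manager check from the steps above can be tied together in a small script. The following is a minimal sketch, not a supported tool. The function names port_states and ensure_subnet_manager are hypothetical. The sketch assumes that ibstat prints a "State:" line for each port and that sminfo exits with a nonzero status when no subnet manager answers.

```shell
#!/bin/sh
# Hypothetical helpers; not part of OFED.

# Print the logical state of every port found in ibstat output read from stdin.
# Assumption: ibstat prints lines of the form "State: Active".
# "Physical state:" lines are ignored because they do not start with "State:".
port_states() {
    awk -F': *' '/^[[:space:]]*State:/ { print $2 }'
}

# Enable and start opensmd if sminfo cannot find a subnet manager.
# Assumption: sminfo exits nonzero when no subnet manager responds.
# SMINFO may be overridden; it defaults to the path used in step 6.
SMINFO=${SMINFO:-/usr/sbin/sminfo}
ensure_subnet_manager() {
    if "$SMINFO" >/dev/null 2>&1; then
        echo "subnet manager found"
    else
        echo "no subnet manager found; starting opensmd"
        chkconfig opensmd on
        service opensmd start
    fi
}

# Example use on a file serving node (run as root):
# for state in $(ibstat | port_states); do
#     if [ "$state" = "Initializing" ]; then
#         ensure_subnet_manager
#     fi
# done
```

Parsing the ibstat output keeps the check read-only. The script changes the node only when a port reports the initializing state and sminfo finds no subnet manager.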
Troubleshooting the InfiniBand network
Force connected mode for a file serving node. The mode setting is under /sys/class/net/ib0:
echo connected > /sys/class/net/ib0/mode
ifconfig ib0 mtu 65520
NOTE: For Windows WinOF (OFED) IB client connectivity, check Windows Sockets Direct
(WSD). This must be enabled for Windows.
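When forcing connected mode on a file serving node as described above, it can help to read the mode back after writing it, because the write has no effect if the interface is missing or the command is not run as root. The following is a minimal sketch. The function name force_connected_mode is hypothetical, and its second argument exists only so that the function can be tried against a scratch directory instead of /sys/class/net.

```shell
#!/bin/sh
# Hypothetical helper; not part of OFED.
# Usage: force_connected_mode [interface] [sysfs net directory]
# On a real node, leave the second argument at its default, /sys/class/net.
force_connected_mode() {
    iface=${1:-ib0}
    net_dir=${2:-/sys/class/net}
    mode_file="$net_dir/$iface/mode"

    if [ ! -w "$mode_file" ]; then
        echo "cannot write $mode_file (does $iface exist, and are you root?)" >&2
        return 1
    fi

    echo connected > "$mode_file"

    # Read the mode back to confirm that the change was accepted.
    if [ "$(cat "$mode_file")" = "connected" ]; then
        echo "$iface is in connected mode"
    else
        echo "$iface is still in $(cat "$mode_file") mode" >&2
        return 1
    fi
}

# On a file serving node (run as root), followed by the MTU change:
# force_connected_mode ib0 && ifconfig ib0 mtu 65520
```

Chaining the MTU change with && means the larger MTU is applied only after the interface has been confirmed to be in connected mode.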
Troubleshoot physical errors (logical errors, symbol errors, and so on).
Note the following:
• Use ibstat to check errors on InfiniBand nodes.
• Use ibclearcounters to reset the error counters, and then watch for counter increments.
• Check /sys/class/infiniband/mthca0/ports/1/counters.
• symbol_error and port_rcv_errors indicate physical hardware failures.
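The counter check can be scripted so that it is easy to repeat after clearing the counters. The following is a minimal sketch. The function name check_physical_counters is hypothetical. It assumes that the counters directory contains one file per counter, each holding a single number, and that the two counters of interest are named symbol_error and port_rcv_errors.

```shell
#!/bin/sh
# Hypothetical helper; not part of OFED.
# Print the two physical-layer counters and return 1 if either is nonzero.
# Usage: check_physical_counters [counters directory]
check_physical_counters() {
    counters_dir=${1:-/sys/class/infiniband/mthca0/ports/1/counters}
    status=0
    for name in symbol_error port_rcv_errors; do
        file="$counters_dir/$name"
        if [ ! -r "$file" ]; then
            echo "$name: not readable" >&2
            continue
        fi
        value=$(cat "$file")
        echo "$name=$value"
        # A nonzero value points to a possible physical hardware problem.
        if [ "$value" -ne 0 ]; then
            status=1
        fi
    done
    return $status
}

# Example on a file serving node:
# ibclearcounters
# sleep 60
# check_physical_counters || echo "physical errors detected; check cables and HCA"
```

Running the check after a fixed interval shows whether the counters are still incrementing, rather than reporting errors left over from an earlier event.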