Maintaining and Servicing the NVIDIA DGX-1
www.nvidia.com
NVIDIA DGX-1
DU-08033-001 _v13.1 | 97
12.17.1010
12.17.1010
12.17.1010
12.17.1010
The latest InfiniBand firmware version supported on DGX-1 OS release 1.0 is
12.16.1020, and the latest supported on release 2.0 is 12.17.1010.
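Whether a card needs the update described in the next step can be decided by comparing its reported version with the latest supported version for the installed release. The following is a minimal sketch, not part of the DGX-1 software: the function names latest_supported_fw and fw_needs_update are hypothetical, and the comparison assumes a sort command that supports the -V (version sort) option, as GNU coreutils does.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not shipped with the DGX-1 software.

# Print the latest supported InfiniBand firmware version for a DGX-1 OS
# release, using only the two values stated in this guide.
latest_supported_fw() {
    case "$1" in
        1.0) echo "12.16.1020" ;;
        2.0) echo "12.17.1010" ;;
        *)   return 1 ;;   # release not covered by this sketch
    esac
}

# Succeed (exit status 0) only if version $1 is older than version $2.
# Assumes sort supports -V (version sort).
fw_needs_update() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

For example, fw_needs_update 12.16.1020 "$(latest_supported_fw 2.0)" succeeds, while the same call with 12.17.1010 as the first argument fails because the card is already at the target version.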
6. If you need to update the firmware, follow these steps:
a) Initiate the firmware update.
$ sudo /opt/mellanox/mlnx-fw-updater/mlnx_fw_updater.pl
The script checks the firmware version of each card and updates it where
needed. If the firmware of any card is updated, you must reboot the
system for the changes to take effect.
b) Reboot the system if instructed.
c) After rebooting the system, verify that all the Mellanox InfiniBand cards are
using the current firmware.
$ cat /sys/class/infiniband/mlx5*/fw_ver
12.17.1010
12.17.1010
12.17.1010
12.17.1010
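The verification in step c) can also be scripted, which is convenient when several systems have to be checked. The following is a minimal sketch under stated assumptions: verify_fw is a hypothetical function name, and the sysfs root is taken as an optional second argument (defaulting to /sys/class/infiniband) so that the logic can be tried against a copy of the files.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not shipped with the DGX-1 software.
# Usage: verify_fw EXPECTED_VERSION [SYSFS_ROOT]
# Prints one line per mlx5 device. Fails if any device reports a different
# version or if no device is found.
# The function body is a subshell, so its variables stay local.
verify_fw() (
    expected="$1"
    root="${2:-/sys/class/infiniband}"
    status=0
    found=0

    for f in "$root"/mlx5*/fw_ver; do
        [ -r "$f" ] || continue      # glob matched nothing, or file unreadable
        found=1
        dev=$(basename "$(dirname "$f")")
        ver=$(cat "$f")
        if [ "$ver" = "$expected" ]; then
            echo "$dev: $ver OK"
        else
            echo "$dev: $ver (expected $expected)"
            status=1
        fi
    done

    if [ "$found" -eq 0 ]; then
        echo "no mlx5 devices found under $root"
        status=1
    fi

    [ "$status" -eq 0 ]              # exit status of the function
)
```

On a system running release 2.0, verify_fw 12.17.1010 checks every mlx5 device and returns a non-zero status if any card still reports an older version.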
7. Verify the physical port state for the InfiniBand cards.
$ ibstat
In the output, verify that the Physical state for each card with a cable
connection is LinkUp and that the port for the card is configured with a GUID.
The following example output shows one card in a non-connected state and three
cards in a connected state. The relevant fields are Physical state and Port GUID.
CA 'mlx5_0'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.17.1010
    Hardware version: 0
    Node GUID: 0x248a0703000de288
    System image GUID: 0x248a0703000de288
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 65535
        LMC: 0
        SM lid: 0
        Capability mask: 0x2651e848
        Port GUID: 0x248a0703000de288
        Link layer: InfiniBand
CA 'mlx5_1'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.17.1010
    Hardware version: 0
    Node GUID: 0x248a0703000de26c
    System image GUID: 0x248a0703000de26c
    Port 1:
        State: Initializing