
Indicators on the Rear Panel

Check the following indicators on the server rear panel:

Power indicator

UID indicator

Network port status indicator

Fan module indicators

E9000 switch module indicators

E9000 management module indicators

7.2.3 Using SmartKit to Perform Health Inspection

Use SmartKit to inspect server health status. SmartKit provides the following functions:

Supports inspection for rack servers, high-density servers, blade servers, KunLun servers, and heterogeneous servers, and allows users to export inspection reports.

Supports inspection for mainstream OSs including SLES, RHEL, CentOS, VMware, Ubuntu, and Windows, and allows users to export inspection reports.

Supports batch log collection for BMCs and blade server management modules, and supports mainstream versions of SLES, RHEL, and CentOS.

Supports batch upgrade for BMC, BIOS, CPLD, and Smart Provisioning firmware of rack servers, high-density servers, blade servers, KunLun servers, and heterogeneous servers.

Supports firmware bundle upgrade by using the E9000 active management module.

Supports batch configuration for PSUs, BIOSs, BMCs, and RAID controller cards of rack servers, high-density servers, blade servers, KunLun servers, and heterogeneous servers.

Supports batch configuration for E9000 management modules.

NOTE

Inspection and log collection do not modify data, collect service data, or affect services. The collection scripts and files are deleted when the operation finishes.

For details about the supported server models and inspection operations, see the FusionServer Tools 2.0 SmartKit User Guide.

7.2.4 Checking the System Status Through iBMC

Prerequisites

You can log in to the iBMC WebUI.

Procedure 1 (For iBMC V549 and Earlier)

Step 1 Log in to the iBMC WebUI. For details, see 8.9 Logging In to the iBMC WebUI.

Huawei Servers

Troubleshooting

7 Preventive Maintenance

Issue 20 (2020-09-25)

Copyright © Huawei Technologies Co., Ltd.

118

Summary of Contents for 3010

Page 1: ...Huawei Servers Troubleshooting Issue 20 Date 2020-09-25 HUAWEI TECHNOLOGIES CO., LTD. ...

Page 2: ...cribed in this document may not be within the purchase scope or the usage scope Unless otherwise specified in the contract all statements information and recommendations in this document are provided AS IS without warranties guarantees or representations of any kind either express or implied The information in this document is subject to change without notice Every effort has been made in the prep...

Page 3: ...ence This document is intended for Technical support engineers Maintenance engineers Symbol Conventions The symbols that may be found in this document are defined as follows Symbol Description Indicates a hazard with a high level of risk which if not avoided will result in death or serious injury Indicates a hazard with a medium level of risk which if not avoided could result in death or serious i...

Page 4: ...020-07-16 This issue is the nineteenth official release. Added information about the Atlas 800 AI training server (model 9010). Issue 18 (2020-05-12): This issue is the eighteenth official release. Changed FusionServer G5500 to FusionServer Pro G5500. Issue 17 (2020-04-29): This issue is the seventeenth official release. Deleted contents related to the ServiceCD. Issue 16 (2019-09-30): This issue is the sixteenth official release. A...

Page 5: ...th official release. Added description that faulty E9000 compute nodes cannot be reseated in 5.5 Checking Indicators to Locate Faults. Issue 05 (2016-10-27): This issue is the fifth official release. Added the quick recovery method for E9000 switch modules in 5.6 Handling Faults Based on Symptoms. Issue 04 (2016-07-11): This issue is the fourth official release. Modified 4.4.2.7 Using the Switch Module CLI to Collect FC...

Page 6: ...ng Indicators to Locate Faults. Added description about how to collect FreeBSD and Solaris host information in 4.2 Collecting OS Logs. Issue 01 (2015-10-09): This issue is the first official release. ...

Page 7: ...Information 20 4 4 2 4 Using the V8 Switch Module CLI to Collect Ethernet Switching Plane Information 23 4 4 2 5 Using the Web Tools Page of a Switch Module to Collect FC Switching Plane Information MX510 28 4 4 2 6 Using the Switch Module CLI to Collect FC Switching Plane Information MX510 30 4 4 2 7 Using the Switch Module CLI to Collect FC Switching Plane Information MX210 MX220 32 4 5 Collecti...

Page 8: ... the MM910 WebUI to Collect Information in Batches for Versions Earlier Than U54 2 20 138 8 5 Using the MM910 WebUI to Collect Information in Batches for U54 2 20 or Later 138 8 6 Using the FusionDirector WebUI to Collection Information in Batches 139 8 7 Using the MM510 CLI to Collect Information FusionServer Pro G5500 139 8 8 Logging In to the iMana 200 WebUI 140 8 9 Logging In to the iBMC WebUI...

Page 9: ...FTP to Transfer Files; 9 Other Resources; 9.1 Obtaining Technical Support; 9.2 Product Information Resources; 9.3 Product Configuration Resources; 9.4 Maintenance Tools ...

Page 10: ...erating the device in residential areas Personal Safety Only personnel certified or authorized by Huawei are allowed to install equipment or its components Discontinue any dangerous operations and take protective measures Report anything that could cause personal injury or equipment damage to a project supervisor Do not move devices or install cabinets and power cables in hazardous weather conditi...

Page 11: ... Conductive objects to be removed. Figure 1-3 shows how to wear an ESD wrist strap. 1. Secure the wrist strap around your wrist. 2. Fasten the strap buckle and ensure that the ESD wrist strap is snug against the skin. 3. Insert the attached ground terminal into the jack on the grounded rack or chassis. ...

Page 12: ... PDUs for active standby operation Transportation Precautions The logistics company engaged to transport the equipment must be reliable and comply with international standards for transporting electronics Ensure that the equipment being transported is always kept upright Take necessary precautions to prevent collisions corrosion package damage damp conditions and pollution Transport the equipment ...

Page 13: ...rganization / Weight (kg/lb)
European Committee for Standardization (CEN): 25/55.13
International Organization for Standardization (ISO): 25/55.13
National Institute for Occupational Safety and Health (NIOSH): 23/50.72
Health and Safety Executive (HSE): 25/55.13
General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China (AQSIQ): Male 15/33.08; Female 10/22.05 ...

Page 14: ...roubleshooting complexity identify the root cause and rectify the fault Figure 2 1 shows the recommended troubleshooting process Figure 2 1 Troubleshooting flowchart Table 2 1 Troubleshooting steps Step Description 3 Preparing for Troubleshooting Prepare the manuals and tools required for fault diagnosis and rectification Huawei Servers Troubleshooting 2 Troubleshooting Process Issue 20 2020 09 25...

Page 15: ...ifying Faults: Locate the fault and take troubleshooting measures. 9.1 Obtaining Technical Support: If a fault is difficult to locate or rectify after you refer to documents, contact Huawei technical support. ...

Page 16: ... architecture Indicators on the front and rear panels Systems that run on servers Device operating conditions Common hardware operations such as power on and power off Common software operations such as upgrade Device maintenance process Essential Materials Table 3 1 lists the materials that you must read before routine maintenance for Huawei servers Huawei Servers Troubleshooting 3 Preparing for ...

Page 17: ...og in to Support Management Software Server Management Software iBMC Troubleshooting Alarm Handling 2 View the corresponding alarm handling manual Equipment Room Management Regulations Describes the regulations for equipment room management and routine maintenance Comply with the customer s equipment room management regulations during onsite maintenance Software Tools Table 3 2 lists the software ...

Page 18: ...versions Third party tool used for file transfer for the Ethernet switching plane of a switch module You can obtain the tool from the Internet CoreFTPServer mini sftp server All Huawei servers of all versions Third party tools used for file transfer for the FC switching plane of a switch module You can obtain the tool from the Internet Hardware Tools Table 3 3 lists the hardware tools required for...

Page 19: ...Used to access the management network port or a service network port over the network to capture data You need to prepare a network cable Serial cable Used to connect the serial port on the server The serial port is usually a DB9 or RJ45 port Thermometer and hygrometer Used to measure the equipment room temperature and relative humidity Oscilloscope Used to measure the voltage and time sequence Hu...

Page 20: ... HBA Logs; 4.7 Collecting Other Logs. 4.1 Collecting Basic Information: The customer needs to collect the basic information listed in Table 4-1 before submitting a service request. Table 4-1 Server fault records: Trouble Ticket No. (Example: 123456); Fault Report Time (Example: 2015-10-18 20:30:00); Customer Name (Full name of your organization); Address (Example: 20 Baker Street, New York); Customer C...

Page 21: ...s responding upon power on Action Before Fault Occurrence Example BIOS settings configuration memory capacity expansion network settings modification Action and Result After Fault Occurrence Optional Example After the power cable is disconnected and then reconnected the fault persists After the DVD ROM is replaced the fault persists 4 2 Collecting OS Logs Collect OS logs after an OS fault occurs N...

Page 22: ... occurs 3 Hot restart the system and run the vm support command to collect all VMware logs 4 After logs are collected check that a log file in the esxsupport YYYY MM DD HH MM SS tgz format is generated in the var tmp directory If the PSOD occurs and the customer hot restarts the system run vm support to collect all of the VMware logs and check that a log file in the esxsupport YYYY MM DD HH MM SS ...

Page 23: ...r Guide 4 4 Collecting Switch Module Logs for E9000 MM910 4 4 1 Preparing for Log Collection 4 4 1 1 Connecting a PC to the Ethernet Switching Plane Connect a PC to the Ethernet switching plane before logging in to the switching plane Procedure Step 1 Connect the Ethernet port of the PC to the management network ports of the active and standby MM910 modules over the LAN Figure 4 1 shows the networ...

Page 24: ...itch module to the same network at the same time otherwise a network storm occurs and the network is interrupted For MM910 U54 2 26 or later the MGMT port is used as the management network port by default For details about how to query the version see the MM910 Management Module V100R001 User Guide You can run the outportmode command to change the mode in which the MM910 management port is provide...

Page 25: ...e ID of the switching plane The value for the Ethernet switching plane is 2 ipaddress indicates the IP address of the management network port maskaddress indicates the subnet mask of the management network port Step 5 Optional Configure the gateway for the switching plane by running the following command so that the switching plane can communicate with the PC NOTE For stacked switching planes conf...

Page 26: ...port 192.168.9.61; Subnet mask 255.255.255.0. Floating IP address, subnet mask, and gateway of the MM910: IP address 10.85.4.77; Subnet mask 255.255.255.0; Gateway 10.85.4.1. Procedure Step 1 Connect the PC to the Ethernet switching plane. For details, see 4.4.1.1 Connecting a PC to the Ethernet Switching Plane. Step 2 Log in to the CLI of the Ethernet switching plane by using the SOL function of the MM910 F...
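
As a quick sanity check of the example data on this page, the following Python sketch uses the standard `ipaddress` module to confirm that the gateway lies in the same subnet as the MM910 floating IP address. The values are the example values above, read as dotted-decimal IPv4 addresses; the helper name `same_subnet` is illustrative and is not part of any Huawei tool.

```python
import ipaddress


def same_subnet(ip: str, mask: str, gateway: str) -> bool:
    """Return True if the gateway falls inside the subnet defined by ip and mask."""
    # strict=False lets us pass a host address rather than the network address.
    network = ipaddress.ip_network(f"{ip}/{mask}", strict=False)
    return ipaddress.ip_address(gateway) in network


# Example values from the data list: floating IP, subnet mask, and gateway.
print(same_subnet("10.85.4.77", "255.255.255.0", "10.85.4.1"))
```

A gateway outside the subnet (for example, a mistyped third octet) makes the function return False, which is worth catching before the address is configured on the switching plane.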

Page 27: ...CB Version CX910_10GE VER C 2 MAB Version 1 3 Board Type CX910_10GE4 CPLD1 Version 013 5 BIOS Version 038 6 Software Version 1 2 1 0 39 If the command output contains Software Version the software version is V8 End 4 4 2 Collecting Switch Module Logs 4 4 2 1 Collection Method Table 4 4 lists the methods for collecting switch module logs Table 4 4 Methods for collecting switch module logs Switc hin...

Page 28: ...e V5 Switch Module CLI to Collect Ethernet Switching Plane Information WebUI 8 5 Using the MM910 WebUI to Collect Information in Batches for U54 2 20 or Later V8 switch modules Using the CLI 4 4 2 4 Using the V8 Switch Module CLI to Collect Ethernet Switching Plane Information WebUI 8 5 Using the MM910 WebUI to Collect Information in Batches for U54 2 20 or Later FC switchi ng plane CX311 CX911 an...

Page 29: ... User Guide 4 4 2 3 Using the V5 Switch Module CLI to Collect Ethernet Switching Plane Information Operation Scenario Use the E9000 server switch module CLI of the V5 platform to collect Ethernet switching plane information including Logs Debugging information Trap information For details about how to query the Ethernet switching plane version see 4 4 1 2 Querying the Software Version of the Ether...

Page 30: ...tch module This tool is third party software You need to prepare it by yourself Procedure Step 1 Configure the FTP server For detailed about the configuration operations see 8 20 Configuring an FTP Server Step 2 Configure the IP address of the management network port 1 After logging in to the switch module by using a serial port or the SOL function run the following commands on the switching plane...

Page 31: ...he following command to collect logs Fabric display diagnostic information diag info txt Now saving the diagnostic information to the device Info The diagnostic information was saved to the device successfully Fabric save logfile Save log file successfully 2 View the log file system Fabric dir Directory of flash Idx Attr Size Byte Date Time LMT FileName 0 rw 1 075 Apr 01 2000 23 55 17 private data...

Page 32: ...put Error The file name is invalid Error The file name is invalid 200 PORT command okay 150 F log dblg file ready to receive in IMAGE Binary mode 226 Transfer finished successfully FTP 1513938 byte s sent in 1 160 second s 1305 11Kbyte s sec 200 PORT command okay 150 F log log file ready to receive in IMAGE Binary mode 226 Transfer finished successfully FTP 2689148 byte s sent in 1 940 second s 13...

Page 33: ...tails see 8 11 Logging In to the MM910 WebUI For the MM910 versions earlier than U54 2 20 choose System Management Network Management xx IP addresses For the MM910 U54 2 20 or later choose Chassis Settings Network Settings xx Software Tools wftpd32 exe used to transfer files between different platforms for example from a PC to a switch module This tool is third party software You need to prepare i...

Page 34: ... interface MEth 0/0/0 HUAWEI MEth0/0/0 ip address 192.168.100.123 24 HUAWEI MEth0/0/0 commit HUAWEI MEth0/0/0 display this interface MEth0/0/0 ip address 192.168.100.123 255.255.255.0 return HUAWEI MEth0/0/0 quit HUAWEI quit Step 4 Obtain the log information. 1. View the log file system. HUAWEI system-view Enter system view, return user view with return command. HUAWEI diagnose Warning Enter diagnose v...
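
The session above enters the address with a prefix length (24), while the `display this` output shows a dotted-decimal mask. The short Python sketch below, which uses only the standard `ipaddress` module, illustrates that the two notations describe the same network. The address is the example value from this page, read as dotted-decimal IPv4.

```python
import ipaddress

# Address as entered on the management interface: IP plus prefix length.
iface = ipaddress.ip_interface("192.168.100.123/24")

print(iface.netmask)  # dotted-decimal mask equivalent to the /24 prefix
print(iface.network)  # network that the management port belongs to
```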

Page 35: ...8:23:11 diaglog_3_20140221182310.log.zip
2 -rw- 1,756,870 Apr 08 2014 16:18:55 diagnostic_information.zip
3 -rw- 4,269,737 Apr 08 2014 16:45:08 log.log
4 -rw- 354,428 Dec 22 2013 11:32:34 log_3_20131222113233.log.zip
5 -rw- 353,715 Jan 16 2014 08:50:19 log_3_20140116085018.log.zip
1,048,576 KB total (367,972 KB free)
2. Query stack information. Record the queried slot numbers and roles of the stacked switch m...

Page 36: ...40221182310.log.zip
100%
226 File received ok
FTP: 444920 byte(s) send in 0.113 second(s) 3845.061Kbyte(s)/sec
200 Port command successful
150 Opening data connection for diagnostic_information.zip
100%
226 File received ok
FTP: 1756870 byte(s) send in 0.308 second(s) 5570.431Kbyte(s)/sec
200 Port command successful
150 Opening data connection for log.log
100%
226 File received ok
FTP: 4272491 byte(s) send in ...
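
The transfer rates in this output can be cross-checked from the byte counts and elapsed times. The Python sketch below is an illustration only: it takes the elapsed times as 0.113 s and 0.308 s and assumes that 1 Kbyte means 1024 bytes, which is an inference from the numbers and is not stated in the source. The function name `kbytes_per_second` is hypothetical.

```python
def kbytes_per_second(size_bytes: int, seconds: float, kbyte: int = 1024) -> float:
    """Transfer rate in Kbyte/s; kbyte=1024 is an assumption, not a documented unit."""
    return size_bytes / seconds / kbyte


# (bytes sent, elapsed seconds) for the two complete transfers in the output.
transfers = [(444920, 0.113), (1756870, 0.308)]

for size, secs in transfers:
    rate = kbytes_per_second(size, secs)
    print(f"{size} bytes in {secs} s -> {rate:.3f} Kbyte/s")
```

The computed rates agree with the logged 3845.061 and 5570.431 Kbyte/s to within a few thousandths. The small remainder is expected, because the log rounds the elapsed time to three decimal places.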

Page 37: ...itch module in the logfile directory 4 View the log file in the FTP directory on the PC End 4 4 2 5 Using the Web Tools Page of a Switch Module to Collect FC Switching Plane Information MX510 Operation Scenario Use Web Tools page of a switch module MX510 to collect information about the FC switching plane This section applies to the CX311 CX911 and CX915 Prerequisites Conditions The connection bet...

Page 38: ...e 4-2 Web Tools home page. Step 2 Select the directory for storing the log file and click Start. The log file download starts. If Support file saved is displayed in the Status area, the log file has been successfully exported. See Figure 4-3. Figure 4-3 Download Support File dialog box ...

Page 39: ...4 8 Data list Parameter Example Value IP address 192 168 1 100 Subnet mask 255 255 255 0 Default gateway 0 0 0 0 The default username of the switching plane is admin and the default password is Huawei12 Software Tools mini sftp server exe used to transfer files between different platforms for example from a switch module to a PC This tool is third party software You need to prepare it by yourself ...

Page 40: ...ic IPv4 address as prompted EthIPv4NetworkDiscovery 1 Static 2 Bootp 3 Dhcp 4 Rarp 1 EthIPv4NetworkAddress dot notated IP Address 192 168 101 123 EthIPv4NetworkMask dot notated IP Address 255 255 255 0 EthIPv4GatewayAddress dot notated IPv4 Address 192 168 101 254 Do you want to save and activate this system setup y n n y 3 Query the static IPv4 address of the FCoE gateway FCoE_GW admin admin show...

Page 41: ...ng plane information This section applies to the CX210 CX220 CX912 and CX916 The FC switching planes of the CX210 and CX912 are the MX210 and those of the CX220 and CX916 are the MX220 Prerequisites Conditions The PC has been connected to the management network port of the server by using a network cable The mini sftp server exe software has been obtained Data Table 4 9 Data list Parameter Example...

Page 42: ...Pv4 FC_SW admin ipaddrset Ethernet IP Address 10 77 77 77 10 32 53 47 Ethernet Subnetmask 255 255 255 0 255 255 240 0 Fibre Channel IP Addresss none Fibre Channel Subnetmask none Gateway IP Address 0 0 0 0 10 32 48 1 DHCP Off IP address is being changed Done FC_SW admin ipaddrshow FC_SW admin ipaddrshow Ethernet IP Address 10 32 53 47 Ethernet Subnetmask 255 255 240 0 Fibre Channel IP Addresss non...

Page 43: ...g collection Host IP or Host Name specifies the address for storing logs on the target device the SFTP server IP address User Name specifies the username for logging in to the target device the SFTP server username Password specifies the password for logging in to the target device the SFTP server password Protocol specifies the transfer protocol Set this parameter to sftp Remote Directory specifi...

Page 44: ... to the FusionDirector WebUI For details see 8 12 Logging In to the FusionDirector WebUI Step 2 Choose Compute Hardware Add Device Add Online to add the chassis of the MM920 MM921 For details see the FusionDirector User Guide Step 3 Choose Compute Hardware Chassis The list of chassis managed by FusionDirector is displayed Step 4 Click the chassis name The Overview tab page is displayed as shown in...

Page 45: ...d Windows Collect system information to help technical support personnel diagnose and rectify faults Linux Collect information to help diagnose Fibre Channel FC and iSCSI HBA faults Solaris Collect data VMware Collect VMware system logs including QLogic HBA logs 4 7 Collecting Other Logs Use the following methods to collect other host logs Collect Emulex HBA logs when an NIC is faulty Use the offi...

Page 46: ...xternal devices for faults such as a power failure and peer device failure first Check the network and then network elements NEs According to the network topology check whether the network environment is normal and then check whether the NEs are normal Determine which NE is faulty if possible Check the high speed signal alarms and then the low speed signal alarms Alarm signal streams show that hig...

Page 47: ...nServer Tools Toolkit Hardware information collection Quick diagnosis Tests for CPUs drives and DIMMs Reference tools and scripts for common configuration and deployment Creation of a bootable USB flash drive for easy O M Automatic configuration diagnosis for channel partners For details about the supported server models and the methods of using the tool see the FusionServer Tools 2 0 Toolkit User...

Page 48: ...heck Configuration check Server log collection Server burn in test Device repair Firmware upgrade For details about the supported server models and the methods of using the tool see the FusionServer Tools 2 0 SmartKit User Guide 5 3 Handling Alarms This section describes how to use the server management system to handle alarms Search for alarm codes in the alarm handling manual to find the handlin...

Page 49: ...Rack Server iBMC Alarm Handling X6000 See the FusionServer Pro X6000 Server iBMC Earlier than V250 Alarm Handling or X6000 Server Alarm Handling iMana 200 X8000 See the X8000 Server V100R001 Alarm Reference X6800 See the FusionServer Pro X6800 Server iBMC Earlier than V250 Alarm Handling G2500 See the FusionServer Pro G2500 Server 1 0 0 iBMC Alarm Handling FusionServer Pro G5500 See the FusionServ...

Page 50: ...he server is operating properly N A Error code A component fault has occurred For details about error codes see Error Code Handling in the alarm handling guide of corresponding server For details about alarm handling guides see Table 5 2 Figure 5 1 Position of the fault diagnosis LED 5 5 Checking Indicators to Locate Faults For details about the positions of indicators see the sections about the f...

Page 51: ... 1 Observe the status indicators of the servers Table 5 4 Status indicators Indicator Status Meaning Diagnosis Health status indicator Steady green The server is operating properly N A Blinking red A fault alarm is generated 1 Log in to the iMana 200 or iBMC WebUI to view the alarm information For details see Basic Operations in Huawei Servers Troubleshooting 5 Diagnosing and Rectifying Faults Iss...

Page 52: ...ator Steady green: The server is powered on. (Diagnosis: N/A) Blinking yellow: The iMana 200 or iBMC management system is being started. In this case, you cannot power on or off the server by pressing the power button. Steady yellow: The server is ready to power on. Press the power button to power on the server. If the server cannot be powered on, log in to the iMana 200 or iBMC WebUI and view the alarm to rectify the f...
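
For scripted inspection notes, the status-to-meaning rows above can be held in a small lookup table. The Python sketch below is a hypothetical illustration: it assumes the three rows describe the server power indicator, and the names `POWER_INDICATOR` and `describe_power_indicator` are not part of any Huawei tool.

```python
# Status-to-meaning rows taken from the indicator table snippet on this page.
POWER_INDICATOR = {
    "steady green": "The server is powered on.",
    "blinking yellow": "The iMana 200 or iBMC management system is being started.",
    "steady yellow": "The server is ready to power on.",
}


def describe_power_indicator(status: str) -> str:
    """Return the documented meaning for an observed indicator status."""
    key = " ".join(status.lower().split())  # normalize case and whitespace
    return POWER_INDICATOR.get(key, "Unknown status; see the indicator table for the server model.")


print(describe_power_indicator("Steady  Yellow"))
```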

Page 53: ...icator and management module indicator on the rear of the chassis are steady green. If yes, the chassis power supply is normal. If no, check the external power supply. 3. For the E9000 server, if the power supply is normal and the PSUs are normal, contact Huawei technical support. Do not reseat the compute nodes or power on or off the chassis. ...

Page 54: ...r PSU status indicator network port indicator and FlexIO card status indicator and the corresponding handling procedures Table 5 11 describes the meanings of the indicators for each module of the RH5885 V2 RH5885 V3 and RH5885H V3 and the corresponding handling procedures Table 5 12 describes the meanings of the indicators for each module of the RH8100 and X6800 and the corresponding handling proc...

Page 55: ...kit or Smart Provisioning to check the drive faults Off Off The drive is faulty or not detected Check whether the drive is connected or log in to the iMana 200 or iBMC and use FusionServer Tools Toolkit or Smart Provisioning to check the drive faults Table 5 6 NVMe drive indicators Drive Active Indicator Drive Fault Indicator Meaning Diagnosis Steady green Off The NVMe drive is detected and workin...

Page 56: ...seat the NVMe drive Table 5 7 M 2 FRU indicators Indicator Status Meaning Diagnosis Procedure M 2 FRU fault indicator Off The M 2 FRU is running properly N A Blinking yellow The M 2 FRU is being located or the RAID is being reconstructed Steady yellow The M 2 FRU cannot be detected or is faulty 1 Check whether the M 2 FRU is in good contact 2 If the contact is normal but the fault persists replace...

Page 57: ...o the PSU and whether the PSU is normal PSU operating status indicator 2000 W 2500 W 3000 W Steady green The PSU is operating properly N A Blinking green once every 2 seconds The PSU is in hibernation or is not connected properly If the fault occurs in an E9000 server check whether hibernation settings are enabled If hibernation settings are disabled or the fault occurs in another type of server c...

Page 58: ...econd The power supply is normal The input is overvoltage or undervoltage NOTE Do not reseat a PSU Check whether the external power supply to the PSU is functioning correctly Blinking green four times every second The PSU is being upgraded online N A Steady orange The input is normal but no power output is supplied due to overheat protection overcurrent protection short circuit protection output o...

Page 59: ...twork port is connected properly N A Blinking green Data is being transmitted Off The network port is not connected 1 Connect the port to another switch optical fiber and optical module to check whether the original switch and optical fiber are normal and whether the type and speed of the original optical module are correct 2 Check whether the ports on the switch and NIC are up 3 Check whether the...

Page 60: ...the NIC is operating properly Steady green The data transmission rate is 10 Gbit s N A Off The network port is not connected 1 Connect the port to another switch optical fiber and optical module to check whether the original switch and optical fiber are normal and whether the type and speed of the original optical module are correct 2 Check whether the ports on the switch and NIC are up 3 Check wh...

Page 61: ...IC are up 3 Check whether the NIC is operating properly Data transmission rate indicator for a 10GE electrical port Steady yellow The link rate is 1 Gbit s 1 Connect the port to another switch optical fiber and optical module to check whether the original switch and optical fiber are normal and whether the type and speed of the original optical module are correct 2 Check whether the ports on the s...

Page 62: ...ansmission status indicator for the management network port Blinking yellow Data is being transmitted N A Off No data is being transmitted 1 Connect the port to another switch and network cable to check whether the original switch and network cable are normal 2 Check whether the ports on the switch and NIC are up 3 Check whether the NIC is operating properly Connection status indicator for the man...

Page 63: ...er the original switch and network cable are normal 2 Check whether the ports on the switch and NIC are up 3 Check whether the NIC is operating properly Connection status indicator for a GE electrical port Steady green The network port is connected properly N A Off The network port is not connected 1 Connect the port to another switch and network cable to check whether the original switch and netw...

Page 64: ...Table 5 10 FlexIO card indicators NIC Type Chip Mode l Port Status Network Status Operation SM210 FlexIO card 4 x GE electrical ports 5719 Active Blinking yellow Data is being transmitted on the network N A Off No data is being transmitted on the network 1 Connect the port to another switch and network cable to check whether the original switch and network cable are normal 2 Check whether the NIC ...

Page 65: ... 1 Connect the port to another switch and network cable to check whether the original switch and network cable are normal 2 Check whether the NIC is operating properly Link Steady green The network connection is normal N A Off The network connection is unavailable 1 Connect the port to another switch and network cable to check whether the original switch and network cable are normal 2 Check whethe...

Page 66: ...x 10GE electrical ports X540 Link Speed Steady green High speed 10 Gbit s N A Steady yellow Low speed 1 Gbit s 1 Connect the port to another switch and network cable to check whether the original switch and network cable are normal 2 Check whether the NIC is operating properly Off The network connection is unavailable 1 Connect the port to another switch and network cable to check whether the orig...

Page 67: ... optical port SM252 FlexIO card 2 x 56G IB optical ports CX3 Active Steady green The network connection is normal N A Blinking green The network connection is abnormal 1 Connect the port to another switch and network cable to check whether the original switch and network cable are normal 2 Check whether the NIC is operating properly Off The network connection is unavailable Link Steady yellow No d...

Page 68: ...t to another switch and network cable to check whether the original switch and network cable are normal 2 Check whether the NIC is operating properly Off The network connection is unavailable 1 Connect the port to another switch and network cable to check whether the original switch and network cable are normal 2 Check whether the NIC is operating properly Link Active Steady green No data is being...

Page 69: ...eaning Diagnosis Power indicator on a memory riser Steady green The memory riser is on N A Off The memory riser is off Memory riser fault indicator Steady red A DIMM on the memory riser is faulty Locate the faulty DIMM according to the DIMM fault locating indicator and replace the faulty DIMM with a spare one Off All DIMMs on the memory riser are normal N A DIMM fault locating indicator Steady red...

Page 70: ... the POST phase check and replace the PCIe card Off The PCIe card is operating properly N A Power indicator on a hot swappable PCIe card Steady green The power supply to the PCIe card is normal N A Blinking green The PCIe card is powering on or off Off The PCIe card is off Diagnostic panel on the RH5885 V2 server Steady green A fault alarm is generated for the server component For details see Comp...

Page 71: ...icator Status Meaning Diagnosis RH8100 V3 fan module indicator Steady green The fan module hardware or backplane is faulty or the fan module software is performing an online upgrade An online upgrade takes about 3 minutes Check whether the fan module hardware or backplane is faulty and whether the fan module software is performing an online upgrade Blinking green once every 2 seconds The fan modul...

Page 72: ...lace the fan module Steady red The fan module hardware or backplane is faulty Reseat the fan module If the alarm persists replace the fan module Blinking red The fan module has an alarm or the fan module hardware or backplane is damaged Reseat the fan module If the alarm persists replace the fan module Off The fan module is not powered on N A Fan module operating status indicator on the X6800 Stea...

Page 73: ...second The memory riser is not installed properly Check whether the memory riser is installed properly Off The memory riser is off Memory riser ATTN indicator Steady yellow The hot insertion or removal operation has failed Check whether services can be migrated or stopped After services are stopped power off and then power on the server If the indicator is off attempt to hot swap the memory riser ...

Page 74: ...figured on the memory riser N A Off Memory mirroring has not been configured on the memory riser Compute module status indicator Steady green The compute module is operating properly N A Blinking red once every second The compute module is faulty View the iBMC event alarm logs to check whether the compute module is faulty Blinking red five times every second The compute module is not installed pro...

Page 75: ...ormal 2 Check whether the ports on the switch and NIC are up 3 Check whether the NIC is operating properly Aggregated network port active status indicator Blinking orange Data is being transmitted over the network port N A Off The network port is idle 1 Connect the port to another switch and network cable to check whether the original switch and network cable are normal 2 Check whether the ports o...

Page 76: ...ent in the chassis The indicators on both the active and standby MM910s are red Check whether the MM910 is installed properly and log in to the HMM WebUI to view alarms Blinking red four times every second A critical alarm is generated for a component in the chassis and the indicators on both the active and standby MM910s are red Blinking red five times every second The MM910 is not installed prop...

Page 77: ...working slot Check whether the slot is faulty by inserting a working fan module into that slot Blinking red once every 2 seconds The fan module has reported an alarm 1 Log in to the HMM WebUI and check fan alarms 2 Check whether the power connector of the fan module is connected properly If it is connected properly replace the fan module Off The fan module has no power supply Check whether the fan...

Page 78: ...cking mode and is operating properly A switch module that cannot be stacked is being powered on Off The switch module is not powered on Health indicator HLY Steady green The switch module is operating properly N A Blinking red The switch module has a fault alarm or is not installed properly Log in to the HMM WebUI to view event alarms and check whether the switch module is installed and operating ...

Page 79: ... to check whether the original switch and optical fiber are normal and whether the type and speed of the original optical module are correct 2 Check the NIC status in the OS 3 Check whether the ports on the switch and NIC are up Connection status indicator of a 10GE optical port 25GE optical port connectivity status indicator Steady green The port is connected properly N A ...

Page 80: ...heck whether the ports on the switch and NIC are up Data transmission status indicator of a 10GE optical port Data transmission status indicator of a 25GE optical port Blinking orange Data is being transmitted or received over the port N A Off No data is being transmitted over the port 40GE optical port indicator Steady green The network is connected properly N A Blinking green Data is being trans...

Page 81: ...ansmission status indicator for the 16G FC optical port Steady orange Signals are not synchronized between the port on the switch module and the port on the peer device Check whether the network cable is connected properly and whether the optical module and NIC are normal Blinking orange once every 2 seconds The port is disabled Blinking orange twice every second The port is not functioning correc...

Page 82: ...nloop diagnosis mode occurs N A Blinking green four times every second The link is connected and data is being transmitted Off If the diagnosis status indicator is off no optical module is installed or the optical module is not receiving optical signals properly Check whether the optical module is installed and operating properly and whether the optical cable is faulty Data transmission status ind...

Page 83: ...d properly or the port is not functioning correctly An overtemperature alarm is generated if the data transmission status indicator is blinking orange twice every second Check the port optical module and optical cable Off No optical module is installed or the optical module is receiving optical signals abnormally Check the optical module and optical cable InfiniBand IB optical port status indicato...

Page 84: ... green The fan module is operating properly N A Blinking red The fan module has reported an alarm 1 Log in to the iBMC WebUI and check fan alarms 2 Check whether the power connector of the fan module is properly connected or replace the fan module Table 5 18 Network port indicators Indicator Status Meaning Diagnosis Procedure 2 x 100GE optical port connection status indicators Steady green The net...

Page 85: ...ng properly 2 x 100GE optical port rate indicators Steady green The data transmission rate is 100 Gbit s N A Off The network port is not connected 1 Connect the port to another switch optical fiber and optical module to check whether the original switch and optical fiber are normal and whether the type and speed of the original optical module are correct 2 Check whether the ports on the switch and...

Page 86: ...ard in the HFC2 slot RH8100 V3 dual system primary 4P One CPU in the CPU1 slot Dual system primary 4P one PSU in any slot One memory board in slot 1 One DIMM in the DIMM000 slot One HFC board in the HFC2 slot RH8100 V3 dual system secondary 4P One CPU in CPU5 slot Dual system secondary 4P one PSU in any slot One memory board in slot 9 One DIMM in the DIMM000 slot One HFC board in the HFC1 slot RH5...

Page 87: ...e DIMM000 A slot XH321 V5 XH321L V5 One CPU in the CPU1 socket None One DIMM in the DIMM000 A slot XH628 V5 One CPU in the CPU1 socket The RAID controller card is mounted to CPU 2 If the OS drive is connected to the RAID controller card the OS cannot be accessed One DIMM in the DIMM000 A slot CH121 V5 CH242 V5 CH121L V5 and CH221 V5 One CPU in the CPU1 socket None One DIMM in the DIMM000 slot CH22...

Page 88: ...nd the power indicator is steady green POST The server is in the power on self test process Diagnose and rectify power failures depending on the symptoms NOTE If a fault can be located using logs or tools see Handling Procedure If a fault needs to be rectified quickly onsite see Quick Recovery Method For more fault symptoms and solutions see the Computing Case Library The Computing Case Library is...

Page 89: ...o 3 3 Replace the PSU with a spare PSU and check whether the fault is rectified If yes no further action is required If no go to 4 4 Replace the PSU backplane or replace the mainboard if no PSU backplane is configured Check whether the fault is rectified If yes no further action is required If no contact Huawei technical support 1 Check whether the current configuration has sufficient power suppli...

Page 90: ...esolve this issue 2 Replace the PSU with a normal one and check whether the fault is rectified If yes no further action is required If no go to 3 3 Replace the mainboard and PSU backplane and check whether the fault is rectified If yes no further action is required If no contact Huawei technical support Follow the handling procedure to replace any faulty modules ...

Page 91: ...the chassis cannot be connected to the power source no matter which PSU is installed replace the chassis 4 If the chassis cannot be connected to the power source after a PSU is installed replace the PSU 5 After verifying that the chassis and PSUs can be connected to the power source install only one PSU Then install the switch modules compute nodes fan modules and management modules one at a time ...

Page 92: ...tall the node into a server again If yes services are not affected If no contact Huawei technical support 2 Follow the handling procedure to replace any faulty modules 5 6 2 KVM Login Faults 1 Diagnose the fault based on the fault symptoms listed in the following table NOTE If a fault can be located using logs or tools see Handling Procedure If a fault needs to be rectified quickly onsite see Quic...

Page 93: ...ils about the operating environment requirements see the iMana 200 or iBMC help document 1 Follow the handling procedure to replace any faulty modules 2 Restart iMana 200 iBMC and replace the local PC 3 Connect the management network port to the local PC directly instead of through a switching network The KVM displays an error message If the number of login users exceeds the maximum value use the ...

Page 94: ... the port is normal attempt to mount the ISO file by using FusionServer Tools Toolkit or Smart Provisioning to check whether the ISO file is correct and upgrade the iBMC HMM iMana 200 and BIOS versions 5 6 3 POST Faults Diagnose and rectify power on self test POST faults depending on the symptoms NOTE If a fault can be located using logs or tools see Handling Procedure If a fault needs to be recti...

Page 95: ... is used by default After the startup is complete the serial port is switched for the system serial port During the iBMC startup process the serial port on a server is used by default After the startup is complete the serial port is switched for the system serial port For a rack server Atlas 800 AI inference server model 3010 Atlas 800 AI training server model 9010 perform the following operations...

Page 96: ...erver cannot be powered on by pressing the power button check whether the hardware of the component where the power button is located is faulty 3 Check whether the mainboard and DIMMs are installed properly 1 Remove the external PCIe devices such as NICs and FC HBAs Then check whether the fault is rectified If yes no further action is required If no go to 2 2 Retain only the minimum server configu...

Page 97: ...er supply link to the mainboard has failed. NOTE: For an E9000 server, you are advised to use the MM910 for one-click log collection. 2. Set the printing level for debugging the BIOS with the iMana 200 or iBMC CLI, restart the server, and save system serial port logs. When the fault recurs, collect iMana 200 or iBMC logs and download the bin file of the BIOS. NOTE: You can run the ipmcset -t maintenance ...

Page 98: ...erial port logs. When the fault recurs, collect iMana 200 or iBMC logs and download the bin file of the BIOS. NOTE: You can run the ipmcset -t maintenance -d biosprint -v 1 command to print all BIOS logs. For details, see "Querying and Setting BIOS Print Enablement Status (biosprint)" in the iBMC User Guide of the required version. 3. Restore the default BIOS settings and check whether the server operates p...

Page 99: ... biosprint -v 1 command to print all BIOS logs. For details, see "Querying and Setting BIOS Print Enablement Status (biosprint)" in the iBMC User Guide of the required version. 4. Enable the video recording function on the iMana 200 or iBMC WebUI, restart the server, and save system serial port logs. When the fault recurs, collect iMana 200 or iBMC logs and download the bin file of the BIOS. 5. Check the ex...

Page 100: ...tified. If yes, no further action is required. If no, go to 4. 4. If the BBU or supercapacitor runs out of power, follow the instructions shown in the displayed messages to keep the server running. After the server runs for 30 minutes, check the BBU or supercapacitor status. If the BBU or supercapacitor is abnormal, replace it. NIC Preboot Execution Environment (PXE) has failed: 1. Check whether the NIC supports...

Page 101: ...dure If a fault needs to be rectified quickly onsite see Quick Recovery Method For more fault symptoms and solutions see the Computing Case Library The Computing Case Library is available only to Huawei engineers and partners ...

Page 102: ...4 Check whether a DIMMxxx configuration error alarm is generated by iBMC If yes replace the faulty DIMM For details see 5 3 Handling Alarms If no go to 5 5 Check whether any DIMM slots are abnormal If a DIMM slot is abnormal replace the mainboard 1 If the iBMC generates the DIMMxxx Configuration Error alarm replace the related DIMM 2 If the DIMM status displayed in iBMC or the OS is abnormal unide...

Page 103: ... the fault is caused by the DIMM you suspect to be faulty replace the DIMM If the fault is caused by the DIMM slot change the CPU with a normal one If the problem is caused by the CPU replace the CPU Otherwise replace the mainboard or memory board 2 If the preceding steps do not reproduce the fault use FusionServer Tools Toolkit or Smart Provisioning to perform memory pressure tests If the fault i...

Page 104: ...undant RAID arrays to avoid data loss 3 Follow the handling procedure to replace any faulty modules A RAID controller card fails to identify one or more drives 1 Power off the server swap the drive that cannot be identified with a normal drive and power on the server to check whether the drive is faulty If the fault is caused by the drive replace the drive If the fault is caused by the drive slot ...

Page 105: ...ging the drive installation positions Note If a fault occurs on the RH2288A V2 server check whether the cable connecting the mainboard to the power adapter board is connected properly Figure 5 3 shows the cable connection Figure 5 3 Cable connection 5 6 6 Ethernet Controller Faults Diagnose and rectify Ethernet controller faults depending on the symptoms NOTE If a fault can be located using logs o...

Page 106: ...perform the following steps: a. Check the logical topology of the NIC. If no CPU is installed for the PCI bus that the NIC is attached to, PCI cards connected to that bus are invisible. b. Power the iMana 200 or iBMC off and then on. Check whether the fault persists. c. Insert the NIC you suspect to be faulty into another slot and a normal NIC into the slot you suspect to be faulty. Then check which of these causes the fault. 4 ...

Page 107: ...s on the OS CLI If the command output displays the corresponding version information the GCC and C C software is installed properly Otherwise install the GCC and C C software first c Check the optical module type If an Intel NIC and a non Intel optical module are configured the driver cannot be loaded and the network port is invisible d Reinstall the driver Check that no errors are reported during...

Page 108: ...sical network ports, and check whether the network port status indicators are on and whether the network ports on the switch are up. NOTE: The ethtool -p ethN command applies only to plug-in PCIe cards. 5. Check the network port configuration of the switch module by referring to E9000 Blade Server Mezzanine Module Switch Module. 1. Use the ping command to check whether the server or other servers on the n...

Page 109: ...h sides must be up 6 Check the settings of IP addresses gateway addresses VLANs bondings and uplink switch network ports 7 Collect OS logs For details see 4 2 Collecting OS Logs ...

Page 110: ...spect to be faulty to a different network port. Then check whether the fault is caused by the network port. 6. To check packet error and loss counters, run the ethtool -S ethN command in Linux or the equivalent in other operating systems. 7. Collect OS logs. For details, see 4.2 Collecting OS Logs. 1. Check whether the packet loss occurs only on a single server. Run the ethtool -S ethN command to check...
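The packet error and loss check above comes down to reading counters from the ethtool statistics output. The following is a minimal sketch, assuming the usual "name: value" line layout of that output. The function name and the keyword list are illustrative and are not taken from the guide; counter names vary between NIC drivers.

```python
import re

# Substrings that usually mark an error or loss counter. This list is an
# illustrative assumption, since counter names differ between NIC drivers.
SUSPECT_KEYWORDS = ("err", "drop", "discard", "miss", "crc")


def suspect_counters(ethtool_stats_text):
    """Return {counter: value} for nonzero error or drop counters.

    ethtool_stats_text is the text printed by `ethtool -S ethN`.
    """
    found = {}
    for line in ethtool_stats_text.splitlines():
        match = re.match(r"^\s*([^:]+):\s*(\d+)\s*$", line)
        if not match:
            continue  # skips the "NIC statistics:" header and blank lines
        name, value = match.group(1).strip(), int(match.group(2))
        if value and any(key in name.lower() for key in SUSPECT_KEYWORDS):
            found[name] = value
    return found
```

Running the function on two captures taken a few minutes apart and comparing the results shows whether a counter is still increasing, which is more telling than a single nonzero value.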

Page 111: ... check whether the TSO and GSO settings of the network port have been modified, run the ethtool -k ethN command in Linux or the equivalent in other operating systems. 5. To check whether the network port buffer information has been modified, run the ethtool -g ethN command in Linux or the equivalent in other operating systems. 6. Collect OS logs. For details, see 4.2 Collecting OS Logs. 5.6.7 FC Controller Faults Co...

Page 112: ...tion for faults g Collect log information of the switches 3 If the HBA is successfully registered with the switch the switch obtains the host WWPN but the storage cannot identify host WWPNs rectify the fault as follows a Check the FC links optical cables and modules between the switch and the storage device b Check whether the HBA and the storage ports are in the same zone c Check whether the zone...

Page 113: ...and firmware match the E9000. 2. Check for error codes on FC links between the HBA and the storage device. 3. Run the iostat command on the host to query the I/O delay and concurrent I/O operations. 4. Collect the OS message log and check the lpfc driver information and the I/O queue depth configured for the HBA driver. 5. Perform drive performance tests (read and write 100 GB and 100 MB files). 6. Contact st...

Page 114: ...ce the faulty ones 3 Before contacting Huawei technical support it is recommended that you migrate services and collect switch module logs OS logs LLD networking information and device time differences Storage services are affected but HBA links are normal 1 Migrate all services and safely power off the server Next remove and reinstall the compute node and power on the server Then check whether th...

Page 115: ...d switch module is faulty move the compute node to a working slot to check whether the fault is caused by the HBA switch module or backplane Replace any faulty modules as required 2 Clear the error code count history observe the error codes for 10 minutes test the performance and contact the storage vendor for quick fault recovery 5 6 8 Switch Module Faults Switch Module Quick Recovery Method Rect...

Page 116: ... are inserted on the same ports on the panel after the board replacement During system startup do not power off or remove the board To continue the startup press Y 1 If services are running connect the network cable or the optical cable to the switch module and press Y to continue 2 If no services are running press Y to continue After logging in to a switch module over SOL the SOL screen shows Cri...

Page 117: ...ed properly to the faulty switch module and the device it is directly connected to Check whether any optical cables are damaged Check whether the optical modules of the faulty switch module and the device it is directly connected to are working properly If there is a transmission device between the switch module and its connected device check the transmission device gateway for alarms 2 If the val...

Page 118: ...arget drive is identified by the RAID controller and use Computing Product Compatibility Checker to check whether the target drive is compatible with the server Then check the BIOS to see whether the target storage devices including SATADOMs microSD cards and built in USB flash drives are identified 2 Check the RAID controller card model and determine whether to configure RAID LSI SAS1078 LSI SAS2...

Page 119: ... it is a PCIe card compatibility issue There is a power supply issue A cat err alarm is generated on iMana 200 or iBMC The PCIe protocol is not supported There is a driver issue The PCIe card is incompatible Check whether the breakdown screenshot contains CPUidle NOTE The G2500 server does not currently support this method The OS kernel is incompatible with the hardware platform NOTE The G2500 ser...

Page 120: ... is faulty The software or hardware interface setting is incorrect Collect the following information For new servers confirm the proportion of abnormal servers and check whether normal and abnormal servers have the same configurations For existing servers confirm the number of servers that are not functioning correctly and check whether the issues occur under specific circumstances Check iMana 200...

Page 121: ...or security purposes causing issues Check whether the Kdump information of the breakdown screenshot periodically displays update_cpu_power divide_error or timer_xx NOTE The G2500 server does not currently support this method The OS has bugs or kernel defects Check whether the Kdump information of the breakdown screenshot non periodically displays gethostbyname NOTE The G2500 server does not curren...

Page 122: ... guide perform the following steps 1 Log in to Support Intelligent Servers or Support Ascend Computing 2 Choose a server model to access the product page 3 On the Documentation tab page choose Installation Upgrade Upgrade Guide 4 View the required upgrade guide To obtain the upgrade package perform the following steps 1 Log in to Support Intelligent Servers or Support Ascend Computing Rack server ...

Page 123: ...odel 9010 2 Choose a server model to access the product page 3 Click the Software Download tab 4 Select the latest patch version 5 Download the required upgrade package iBMC BIOS LCD CPLD and card firmware ...

Page 124: ...1 Inspecting the Equipment Room Environment and Cable Layout 7 2 Inspecting Servers 7 3 Huawei Server Inspection Report 7 1 Inspecting the Equipment Room Environment and Cable Layout 7 1 1 Precautions Familiarize yourself with the security icons listed in Table 7 1 before preventive maintenance to reduce the chance of injury to yourself or damage to the equipment These security icons will be on so...

Page 125: ...it is not externally grounded Each end of a ground cable should be connected to a different device and the devices must be connected to ground points Indicates that this device can cause personal injury or can fail to operate properly if it is not internally grounded Each end of a ground cable should be connected to different device components and the device must be connected to a ground point Ind...

Page 126: ...s Before inspecting servers obtain the iMana 200 or iBMC IP address MM910 IP address and password of the root or Administrator user for each server to be inspected After inspecting servers advise the customer to change the password of the root or Administrator user as soon as possible 7 2 2 Inspecting Indicators The front and rear panels of Huawei servers provide indicators and buttons including t...

Page 127: ...D and Smart Provisioning firmware of rack servers high density servers blade servers KunLun servers and heterogeneous servers Supports firmware bundle upgrade by using the E9000 active management module Supports batch configuration for PSUs BIOSs BMCs and RAID controller cards of rack servers high density servers blade servers KunLun servers and heterogeneous servers Supports batch configuration f...

Page 128: ...er second of all CPU cores calculated by the CPU internal module If iBMA 2 0 is not installed on the server OS obtain the latest iBMA user guide and software package and install iBMA 2 0 by referring to the user guide 4 In the navigation tree choose Sensor Info to view the status of sensors End Procedure 2 For iBMC V561 and Later or iBMC V3 01 00 00 and Later Step 1 Log in to the iBMC WebUI For de...

Page 129: ...tor Phone Number Inspecting Party Information Time of Inspection Inspected By Phone Number Huawei Contact Phone Number Service Hotline Enterprise (China Region): 4008229999 Enterprise global technical assistance center (TAC): Global Service Hotline ...

Page 130: ...l Abnormal Brief description
2 | Storage temperature | -40°C to 65°C (-40°F to 149°F) | Normal / Abnormal, Brief description
3 | Temperature change rate | 20°C/h (68°F/h) | Normal / Abnormal, Brief description
4 | Operating humidity | 8% to 90% RH (non-condensing) | Normal / Abnormal, Brief description
5 | Storage humidity | 5% to 95% RH (non-condensing) | Normal / Abnormal, Brief description
6 | Operating altitude | 3050 m | Normal / Abnormal, Brief...
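The environment criteria listed above lend themselves to a simple range check when readings are recorded electronically. This is a minimal sketch, not part of the Huawei inspection report. The dictionary keys and the function name are hypothetical, and because the extracted text lost signs and comparison symbols, the negative lower bound for storage temperature and the reading of the rate and altitude rows as maximums are assumptions.

```python
# Limits transcribed from the inspection table above. The extraction dropped
# signs and comparison symbols, so the negative lower bound for storage
# temperature and the "maximum" reading of the rate and altitude rows are
# assumptions.
CRITERIA = {
    "storage_temperature_c": (-40.0, 65.0),
    "temperature_change_rate_c_per_h": (None, 20.0),
    "operating_humidity_pct_rh": (8.0, 90.0),
    "storage_humidity_pct_rh": (5.0, 95.0),
    "operating_altitude_m": (None, 3050.0),
}


def check_environment(readings):
    """Map each supplied reading to "Normal" or "Abnormal"."""
    results = {}
    for item, value in readings.items():
        low, high = CRITERIA[item]
        ok = (low is None or value >= low) and (high is None or value <= high)
        results[item] = "Normal" if ok else "Abnormal"
    return results
```

For example, `check_environment({"operating_humidity_pct_rh": 45})` reports that item as "Normal"; any item marked "Abnormal" still needs the brief description that the report form asks for.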

Page 131: ... and power cables along the two sides of the cabinet respectively Normal Abnormal Brief description 2 Power cable layout Power cables are not tangled and are arranged in an orderly fashion Power cables are arranged in the same way as those in any existing cabinets No power cables are coiled Normal Abnormal Brief description 3 Service cable layout Service cables are not tangled and are arranged in ...

Page 132: ... cable connector Signal cables and data cables are properly connected to devices such as servers and switches. Normal / Abnormal, Brief description. Inspecting Servers: View the inspection report generated by SmartKit to check server health status. An item has passed the inspection if the value of Result for the item is OK in the report. Server Inspection Results: No. | Item | Criteria | Result. 1 iMana 200 iBMC...

Page 133: ...lp improve your service availability. If you receive inspection results, please provide your comments and suggestions in the following Customer's Inspection Comments and Suggestions table. Inspection Conclusions and Suggestions: Inspected By, Phone Number, Date. Customer's Inspection Comments and Suggestions: Inspected By, Phone Number, Date ...

Page 134: ...Logging In to the Web Tools of the MX510 8 11 Logging In to the MM910 WebUI 8 12 Logging In to the FusionDirector WebUI 8 13 Logging In to the MM510 CLI 8 14 Logging In to the RMC CLI 8 15 Logging In to a Server Over a Network Port by Using PuTTY 8 16 Logging In to a Server Over a Serial Port by Using PuTTY 8 17 Logging In to a Compute Node Passthrough Module or Switch Module by Using the SOL Func...

Page 135: ...ers. The value 10 indicates Huawei, and other values indicate outsourcing vendors. 4. Year and month (two characters). The first character indicates the year: the digits 1 to 9 indicate 2001 to 2009, the letters A to H indicate 2010 to 2017, the letters J to N indicate 2018 to 2022, and the letters P to Y indicate 2023 to 2032. NOTE: The years from 2010 are represented by uppercase letters, excluding I, O, and Z...

Page 136: ...character indicates the year: the digits 1 to 9 indicate 2001 to 2009, the letters A to H indicate 2010 to 2017, the letters J to N indicate 2018 to 2022, and the letters P to Y indicate 2023 to 2032. NOTE: The years from 2010 are represented by uppercase letters, excluding I, O, and Z, because these three letters resemble the digits 1, 0, and 2. The second character indicates the month. Digits 1 to 9 indi...
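The year encoding described above can be expressed as a small lookup. The following sketch decodes only the year character, because the month rule is cut off in this excerpt. The function name is hypothetical.

```python
import string

# Letters used for years from 2010 onward: A to Y without I and O.
# Z is excluded as well, so the sequence ends at Y (2032).
_YEAR_LETTERS = [c for c in string.ascii_uppercase if c not in "IOZ"]


def decode_year(code):
    """Decode the year character of the serial number's year-and-month field."""
    if len(code) != 1:
        raise ValueError("expected a single character")
    if code in "123456789":
        return 2000 + int(code)  # digits 1 to 9 map to 2001 to 2009
    if code in _YEAR_LETTERS:
        return 2010 + _YEAR_LETTERS.index(code)  # A is 2010, Y is 2032
    raise ValueError(f"invalid year code: {code!r}")
```

Skipping I and O is what makes J land on 2018 and P on 2023, matching the ranges given in the text.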

Page 137: ...label to obtain its ESN The product label position varies with the Huawei server model For details see the user guide of a specific server Figure 8 3 shows the product SN of a rack server Figure 8 3 Product SN of a rack server Figure 8 4 shows the product SN of an Atlas 800 AI inference server model 3010 Figure 8 4 Product SN of an Atlas 800 AI inference server model 3010 Figure 8 5 shows the prod...

Page 138: ...bel of the server and 2 is the product label of a server node Figure 8 6 Product SN of an X6800 Figure 8 7 shows the product SN of an E9000 In Figure 8 7 1 is the product label of the server and 2 is the product label of a compute node ...

Page 139: ... and RH5885H V3 X6000 server node XH310 V2 XH311 V2 XH320 V2 XH321 V2 and XH621 V2 X8000 server node DH310 V2 DH320 V2 DH321 V2 DH620 V2 DH621 V2 DH626 V2 and DH628 V2 E9000 compute node CH121 CH140 CH220 CH221 CH222 CH240 CH242 and CH242 V3 a Log in to the iMana 200 WebUI For details see 8 8 Logging In to the iMana 200 WebUI b On the Overview page view the product SN of the server as shown in Fig...

Page 140: ...L V5 X6800 server node XH620 V3 XH622 V3 XH628 V3 XH628 V5 E9000 compute node CH121 V3 CH121L V3 CH140 V3 CH140L V3 CH220 V3 CH222 V3 CH225 V3 CH226 V3 CH121 V5 CH121L V5 CH221 V5 CH225 V5 CH242 V5 Kunlun server 9008 V5 For iBMC V549 and earlier i Log in to the iBMC WebUI For details see 8 9 Logging In to the iBMC WebUI ii Choose Information Information Summary Overview Summary The menu varies dep...

Page 141: ...ly to E9000 servers whose MM910 version is U54 2 20 or later a Log in to the MM910 WebUI For details see 8 11 Logging In to the MM910 WebUI b Choose Chassis Information Manufacturing Information and view the product SN of the server as shown in Figure 8 11 Figure 8 11 Product SN c Choose Chassis Information Compute Node Slot Number Manufacturing Information and view the SN of the compute node as s...

Page 142: ...e Menu Compute Hardware Chassis c On displayed chassis list click a chassis name to access the chassis details page d Click the Overview tab to view the chassis SN as shown in Figure 8 13 Figure 8 13 Product SN e Click the Device tab and click Server Management Module and Switch Module respectively to view the SNs of the compute node management module and switch module as shown in Figure 8 14 ...

Page 143: ...versions Information similar to the following is displayed root BMC ipmcset t maintenance d imtool tar removing leading from member names Tar result information success iMana If the following information is displayed log collection is successful tar removing leading from member names Tar result information success Step 3 Use a cross platform file transfer tool to connect to the iMana 200 IP addres...

Page 144: ...tlas 800 AI inference server model 3010 Atlas 800 AI training server model 9010 Server information X6000 Compute node information X8000 X6800 FusionServer Pro G5500 Server information MM510 management module information and heterogeneous node information Procedure 1 For iBMC V549 and Earlier Step 1 Log in to the iBMC WebUI For details see 8 9 Logging In to the iBMC WebUI Step 2 Choose Information ...

Page 145: ...nd download the file to the local PC as prompted End Procedure 2 For iBMC V561 and Later or iBMC V3 01 00 00 and Later Step 1 Log in to the iBMC WebUI For details see 8 9 Logging In to the iBMC WebUI Step 2 Choose Home The Home page is displayed as shown in Figure 8 17 or Figure 8 18 ...

Page 146: ... home page iBMC V3 01 00 00 and later Step 3 Click One Click Info Collection in the Shortcuts area to download the collected maintenance information End ...

Page 147: ...le named one_touch_info_all tar gz is displayed in the File Name area Step 4 Click the log file name and download it to the local PC as prompted NOTE For MM910 earlier than U54 2 20 you need to collect logs of both the active and standby HMMs End 8 5 Using the MM910 WebUI to Collect Information in Batches for U54 2 20 or Later Operation Scenario For U54 2 20 or later use the MM910 WebUI to collect...

Page 148: ...displayed Step 3 Click Collect Log In the displayed dialog box click OK The Task area is displayed on the right of the page showing the progress and status of the log collecting task When the task is complete a message indicating success is displayed Step 4 Click Export Log to export the log information to a local directory End 8 7 Using the MM510 CLI to Collect Information FusionServer Pro G5500 ...

Page 149: ...s The local PC is properly connected to the iMana 200 management network port on the server by using a network cable The IP addresses of the local PC and the iMana 200 management network port are on the same network segment Table 8 2 Local PC configuration requirements OS Software Version Windows 7 32 bit Windows 8 32 bit Windows Server 2008 32 bit Browser Internet Explorer Internet Explorer 8 0 1...
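The requirement above that the local PC and the management network port share a network segment can be verified before opening the browser. This is a minimal sketch using Python's standard `ipaddress` module. The function name is hypothetical, and the prefix length must come from your own network plan; the guide does not state one.

```python
import ipaddress


def same_segment(pc_ip, mgmt_ip, prefix_len):
    """Return True if both addresses fall in the same subnet.

    prefix_len is the subnet prefix length (for example, 24 for 255.255.255.0).
    """
    # strict=False lets a host address stand in for its network.
    network = ipaddress.ip_network(f"{pc_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(mgmt_ip) in network
```

If the check returns False, change the IP address of the local PC rather than that of the management port, so that other management stations are not affected.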

Page 150: ...Table 8-3 Required data
Type | Parameter | Description | Example
User login information | User name | Username for logging in to the iMana 200 WebUI | root
User login information | Password | User password for logging in to the iBMC WebUI. NOTE: The default iMana 200 user is root. The root user belongs to the administrator group. The default password is Huawei12 | Huawei12
Procedure Step 1 Connect the local PC to the iMana 200 management n...

Page 151: ...Press Enter The iMana 200 login page is displayed as shown in Figure 8 20 NOTE If the message There is a problem with this website s security certificate is displayed click Continue to this website not recommended If the Security Alert dialog box indicating a certificate error is displayed click Yes Figure 8 20 Logging in to the iMana 200 WebUI ...

Page 152: ...erequisites Conditions The local PC that uses the remote control function must have the Java runtime environment JRE and the browser of the required version For details see the corresponding iBMC User Guide Ensure that the local PC meets the following networking conditions The local PC is connected to the iBMC management network port on the server by using a network cable The IP addresses of the l...

Page 153: ...t the local PC to the iBMC management network port on the server by using a crossover cable or twisted pair cable. Figure 8-21 shows the network diagram. Figure 8-21 Network diagram Step 2 Open Internet Explorer on the local PC. Step 3 In the address box, enter the IP address of the server iBMC management network port (for example, https://192.168.2.100) and press Enter. The iBMC login page is displayed as ...

Page 154: ...in the upper right corner. End Procedure 2 (For iBMC V561 and Later or iBMC V3.01.00.00 and Later) This section uses a PC running Windows 7 and Internet Explorer 11 as an example. Step 1 Open Internet Explorer, enter the iBMC management network port address (https://ipaddress) in the address box, and press Enter. NOTE Enter an IPv6 address in brackets or an IPv4 address directly. For example: IPv4 address: 192 ...
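The bracket rule in the note above (IPv6 in brackets, IPv4 as-is) can be sketched as a small helper. This is an illustration only: the function name and the sample IPv6 address are assumptions, and the IPv4 address is the example used earlier in the guide.

```python
import ipaddress


def ibmc_url(address: str) -> str:
    """Build the https URL for a management port address.

    IPv6 addresses are wrapped in brackets; IPv4 addresses are used
    directly. Raises ValueError if the input is not a valid IP address.
    """
    ip = ipaddress.ip_address(address)
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    return f"https://{host}"


if __name__ == "__main__":
    print(ibmc_url("192.168.2.100"))  # IPv4 example from the guide
    print(ibmc_url("fc00::64"))       # hypothetical IPv6 address
```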

Page 155: ...ding iBMC User Guide. If no trust certificate is available and network security can be ensured, add the iBMC to the Exception Site List on Java Control Panel or reduce the Java security level. This operation, however, poses security risks. Exercise caution when performing this operation. Step 2 Click "Continue to this website (not recommended)". The login page is displayed, as shown in Figure 8-24. ...

Page 156: ...name and password for logging in to the iBMC WebUI. Step 4 Select Local iBMC from the Domain drop-down list. Step 5 Click Log In. The Home page is displayed. End ...

Page 157: ...2 Tool Java plug-in This tool is third-party software. You need to prepare it by yourself. JRE 1.8 or later is required. Procedure Step 1 Connect a client (for example, a local PC) to the management network port of the management module by using a network cable. Step 2 In the displayed security alert dialog box, click Allow to allow web access. Step 3 In the displayed security alert dialog box, select Do n...

Page 158: ...igure 8-26 Web Tools home page End 8.11 Logging In to the MM910 WebUI Scenarios Log in to the MM910 WebUI by using a browser on the local PC to configure and manage the chassis, MM910s, compute nodes, storage nodes, switch modules, passthrough modules, power supply units (PSUs), and fan modules. ...

Page 159: ...have obtained the following data: Username for logging in to the server to be connected. The default username is root. User password for logging in to the server to be connected. The default user password is Huawei12 Procedure Step 1 Connect the Ethernet port on the local PC to the MGMT ports on the active and standby MM910s over the local area network (LAN). NOTICE If the active MM910 MGMT port has been...

Page 160: ...the MGMT port on the MM910 and the port on the switch module to the same network at the same time; otherwise, a network storm occurs and the network is interrupted. For MM910 U54 2.26 or later, the MGMT port is used as the management network port by default. For details about how to query the version, see the MM910 Management Module V100R001 User Guide. You can run the outportmode command to change the m...

Page 161: ...yed. Step 9 Click "Continue to this website (not recommended)". The page for logging in to the HMM WebUI is displayed. Step 10 Set the parameters. See Figure 8-28 and Figure 8-29. Language: Select English. User name: Enter the username for login. The default username is root. Password: Enter the user password for login. The default password is Huawei12 Login To: Select This Machine (computer) in most cases. Select LD...

Page 162: ...as shown in Figure 8-30 or Figure 8-31. Figure 8-30 HMM WebUI (MM910 U54 2.20 or later) Figure 8-31 HMM WebUI (MM910 earlier than U54 2.20) End ...

Page 163: ...rector supports a maximum of 100 concurrent users. The default timeout interval of FusionDirector is 30 minutes. If you do not perform any operation on the WebUI within 30 minutes, the account is automatically logged out. You need to enter the username and password to log in again. If the number of login failures caused by incorrect user names and passwords reaches the value specified in the system sec...

Page 164: ...an be in either of the following formats: IPv4 address in dotted decimal format (XXX.XXX.XXX.XXX), or fully qualified domain name (FQDN) of FusionDirector. The browser may display a message indicating that the website has a security certificate error. Ignore this error and continue the login if the IP address is correct. Step 4 Enter the login information. Table 8-5 describes the information required on the lo...
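The two accepted address formats (dotted-decimal IPv4 or FQDN) can be told apart with a short check. The sketch below is an assumption-laden illustration: the function name, the return labels, and the FQDN rules (at least two labels, letters/digits/hyphens, non-numeric last label) are choices made here, not rules stated by FusionDirector.

```python
import ipaddress
import re

# One DNS label: 1-63 letters, digits, or hyphens, not starting or
# ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def classify_address(value: str) -> str:
    """Return 'ipv4', 'fqdn', or 'invalid' for a login address."""
    try:
        ipaddress.IPv4Address(value)
        return "ipv4"
    except ValueError:
        pass
    labels = value.rstrip(".").split(".")
    looks_like_fqdn = (
        len(value) <= 253
        and len(labels) >= 2
        and all(_LABEL.match(label) for label in labels)
        and not labels[-1].isdigit()  # rejects malformed IPs such as 999.1.1.1
    )
    return "fqdn" if looks_like_fqdn else "invalid"


if __name__ == "__main__":
    # Hypothetical inputs.
    for candidate in ("192.168.2.10", "fd.example.com", "999.1.1.1"):
        print(candidate, classify_address(candidate))
```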

Page 165: ...ocal. If you log in as an LDAP user, select LDAP. Step 5 Click Log In. The FusionDirector Dashboard is displayed, as shown in Figure 8-34. NOTE If the username or password is incorrect, you need to enter a verification code in the second login attempt. If the verification code is not clear, click the icon to refresh the verification code. If you enter incorrect passwords three consecutive times, the account will ...

Page 166: ...nt module of the FusionServer Pro G5500 Prerequisites When logging in to the HMM CLI, ensure that: If you log in to the CLI over SSH, a maximum of five concurrent users are supported. ...

Page 167: ...l SSH protocol provides secure remote login and other secure network services over an insecure network. The methods for logging in to the CMC CLI over SSH vary according to the client operating system. If the client uses Linux: a. Connect the client to the management network port on the server. b. Run the ssh ipaddress command on the terminal tool (for example, shell) to log in to the CLI. In the command ...

Page 168: ...he rack management controller (RMC) CLI. Two login methods are available: SSH: SSH provides secure remote login and other secure network services over an insecure network. To log in to the RMC CLI over SSH, connect a PC to the RMC management network port by using a network cable. Login over the local serial port. Prerequisites The RMC is operating properly. ...

Page 169: ... This tool is third-party software. You need to prepare it by yourself. PuTTY 0.60 or later is required for login over a serial port. Document For details about the RMC, see the X8000 Server RMC Command Reference. Log in to the RMC CLI over a serial port. Step 1 Connect the PC to the RMC serial port by using a serial cable. Step 2 On the PC, double-click PuTTY.exe. The PuTTY Configuration window is display...

Page 170: ...00 Data bits: 8 Stop bits: 1 Parity: None Flow control: None Step 5 Click Open. The PuTTY window is displayed, and the "login as" prompt asks you to enter a user name. Step 6 Enter a user name and password. After login, the RMC command prompt root RMC is displayed. End ...

Page 171: ... in Figure 8-37. Figure 8-37 PuTTY Configuration (SSH) Step 4 In the Host Name (or IP address) text box, enter the IP address of the RMC management network port. Step 5 Click Open. The PuTTY window is displayed, and the "login as" prompt asks you to enter a user name. Step 6 Enter a user name and password. After login, the RMC command prompt root RMC is displayed. End ...

Page 172: ...ss of the server to be connected. You have obtained the user name and password for logging in to the server to be connected. Software Tools PuTTY.exe: This tool is third-party software. You need to prepare it by yourself. Procedure Step 1 Set an IP address and a subnet mask or add route information for the PC so that the PC can properly communicate with the server. You can run the Ping Server IP address...

Page 173: ...nly on clean exit. NOTE Configure Host Name and Saved Sessions, and click Save. You can double-click the saved record under Saved Sessions to log in to the server the next time. Step 4 (Optional) After logging in to the Ethernet plane by using PuTTY, if you fail to delete characters on the CLI by using the Backspace key, choose Terminal > Keyboard and select Control-H under The Backspace key, as shown in Fig...

Page 174: ... Security Alert dialog box is displayed. Click Yes to proceed. If an incorrect user name or password is entered, you must set up a new PuTTY session. Step 6 Enter a user name and password. If the login is successful, the server host name is displayed on the left of the prompt. End ...

Page 175: ...plane Prerequisites Conditions A PC is connected to the server by using a serial cable. PuTTY 0.60 or later has been installed. Data You have obtained the user name and password for logging in to the server to be connected. Software Tools PuTTY.exe: This tool is third-party software. You need to prepare it by yourself. PuTTY 0.60 or later is required for login over a serial port. Procedure Step 1 Double ...

Page 176: ...t Connection type in Serial, as shown in Figure 8-40. Figure 8-40 PuTTY Configuration Step 6 Click Open. The PuTTY window is displayed. Step 7 Enter a user name and password. If the login is successful, the server host name is displayed on the left of the prompt. End ...

Page 177: ... management module. The default user name of the MM910 is root, and the default password is Huawei12 User name and password for logging in to the compute node to be connected. The default user name is root, and the password is Huawei12 Password for logging in to the passthrough module or switch module to be connected. The default password is Huawei12 Procedure Step 1 Use an SSH tool and the floating IP...

Page 178: ...ibed as follows: 1 to 32 indicate the compute nodes in slots 1 to 32, respectively. 33 to 36 indicate the switch modules in slots 1E, 2X, 3X, and 4E, respectively. Step 4 Enter the slot number of the compute node, passthrough module, or switch module and press Enter. If you enter a compute node slot number, the following serial port information is displayed: 1 systemcom 2 RAIDcom 3 BMCcom 4 Exboardcom Or 1 SYS...
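The slot numbering described above (1 to 32 for compute nodes, 33 to 36 for switch modules 1E, 2X, 3X, and 4E) can be expressed as a lookup. This is a minimal sketch of that mapping only; the function and constant names are hypothetical and not part of the management module CLI.

```python
# Slot numbers 33-36 map to the switch module slots, in the order
# given in the guide.
SWITCH_SLOTS = {33: "1E", 34: "2X", 35: "3X", 36: "4E"}


def describe_slot(number: int) -> str:
    """Translate a slot number entered at the prompt into a module name."""
    if 1 <= number <= 32:
        return f"compute node in slot {number}"
    if number in SWITCH_SLOTS:
        return f"switch module in slot {SWITCH_SLOTS[number]}"
    raise ValueError(f"slot number {number} is outside the range 1 to 36")


if __name__ == "__main__":
    for slot in (1, 32, 33, 36):
        print(slot, "->", describe_slot(slot))
```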

Page 179: ...the E9000 Prerequisites Conditions You have logged in to the MM920/MM921 CLI by using the floating IP address of the MM920/MM921. There is no jumper cap over the pins on the mainboard of the compute node, passthrough module, or switch module. Data You have obtained the following data: Username and password for logging in to the management module. The default username and password of the MM920/MM921 are ...

Page 180: ...Scenarios Use WinSCP to transfer files from a PC to a server. Prerequisites Conditions The Secure File Transfer Protocol (SFTP) service has been enabled on the destination device. Data You have obtained the following data: the IP address of the server to be connected, and the user name and password for logging in to the server to be connected. Software Tools WinSCP.exe: Thi...

Page 181: ...op-down list and select Allow SCP fallback. Step 3 Click Login. The WinSCP file transfer window is displayed. NOTE If a private key file is not selected at the first login, the warning message "Continue connecting and add host key to cache" is displayed. Click Yes. The WinSCP file transfer window is displayed. On Windows 7, C:\Users\Administrator\Documents on the local PC is opened in the left pane and root ...

Page 182: ...en different platforms, for example, from a PC to a switching plane of a switch module. This tool is third-party software. You need to prepare it by yourself. Procedure Step 1 Double-click wftpd32.exe. The No log file open - WFTPD window is displayed. Step 2 Choose Logging > Log Options. The Logging Options dialog box is displayed. Step 3 Select all check boxes except Winsock Calls, and click OK. ...

Page 183: ...E The directory can contain only English characters. Step 8 Select vxworks from the User Name combo box and enter the upgrade file directory (for example, D:\FTP) in the Home Directory text box. See Figure 8-43. Figure 8-43 Users Rights Security Dialog dialog box Step 9 Click Done. The FTP server is configured. End 8.21 Using SFTP to Transfer Files Scenarios Transfer files on the local PC using SFTP. Prereq...

Page 184: ...ername for logging in to the SFTP server. Password specifies the password for logging in to the SFTP server. Port specifies the port number, which is 22. Root path specifies the home directory of the SFTP server. Step 3 Click Options and enter the IP address of the SFTP server, for example, 192.168.2.10. Step 4 Click Start. The file transfer page is displayed. End ...

Page 185: ...oduct Information Service Platform for server product documentation. Visit Huawei Enterprise iKnow for Q&A about products. Visit Huawei Enterprise Support Community (Servers) for learning and discussion. News For notices about product life cycles, warnings, and updates, visit Support Bulletins Product Bulletins. Cases For details about existing cases, see the Computing Case Library. NOTE The Computing Case L...

Page 186: ... outside China can obtain the customer service information from Global TAC Information. Contact the technical support personnel of the local Huawei office. 9.2 Product Information Resources Table 9-1 describes the product information resources. Table 9-1 Product information resources Information Resource Description How to Obtain Server product documentation Describes the server structure specificati...

Page 187: ...Display 9.3 Product Configuration Resources Table 9-2 describes the product configuration resources. Table 9-2 Product configuration resources Tool Name Description How to Obtain Removal and installation videos Describe how to remove and install hardware. Intelligent Computing Product Hardware Installation Multimedia Computing Product Memory Configuration Assistant Online application that shows the ...

Page 188: ...rver Tools 2.0 SmartKit User Guide. Used for new site deployment and delivery, troubleshooting, and firmware upgrade. Download link: FusionServer Tools Smart Provisioning See the Smart Provisioning User Guide. Only Huawei FusionServer V5 servers are supported. Smart Provisioning is used to install OSs without a physical DVD-ROM drive, configure RAID, upgrade firmware, and perform troubleshooting. Download li...
