Troubleshooting
Use the CLI
As an alternative to using the SMU, you can run the show system command in the CLI to view the health
of the system and its components. If any component has a problem, the system health will be Degraded,
Fault, or Unknown, and those components will be listed as Unhealthy Components. Follow the
recommended actions in the component Health Recommendations field to resolve the problem.
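The health check described above can be sketched as a small triage helper. This is only an illustration, assuming the health data has already been collected into simple records: the Component class, its field names, the sample component labels, and the healthy value "OK" are hypothetical and are not the CLI output format.

```python
from dataclasses import dataclass

# Health values that the text above treats as a problem.
UNHEALTHY = {"Degraded", "Fault", "Unknown"}


@dataclass
class Component:
    """Hypothetical record for one component; not the CLI output format."""
    name: str
    health: str                # "OK" as the healthy value is an assumption
    recommendation: str = ""   # stands in for the Health Recommendations field


def unhealthy_components(components):
    """Return the components whose health is Degraded, Fault, or Unknown."""
    return [c for c in components if c.health in UNHEALTHY]


if __name__ == "__main__":
    sample = [
        Component("Controller A", "OK"),
        Component("Power Supply 1", "Degraded", "Check the power cable."),
    ]
    for c in unhealthy_components(sample):
        print(f"{c.name}: {c.health}. {c.recommendation}")
```

Keeping the recommendation next to the health value mirrors the workflow in the text: find the unhealthy component, then follow its recommended action.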
Monitor event notification
With event notification configured and enabled, you can view event logs to monitor the health of the
system and its components. If a message tells you to check whether an event has been logged, or to view
information about an event in the log, you can do so using either the SMU or the CLI. Using the SMU, you
would view the event log and then click on the event message to see detail about that event. Using the CLI,
you would run the show events detail command (with additional parameters to filter the output) to
see the detail for an event.
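If event details have been exported for offline review, the same kind of filtering can be approximated after the fact. The sketch below assumes each event is a plain dictionary with hypothetical keys "code", "severity", and "message", and the event codes are made up. It does not reproduce the filter parameters of the show events command.

```python
def filter_events(events, severity=None, code=None):
    """Keep events that match the given severity and/or numeric code.

    A filter left as None is not applied.
    """
    return [
        e for e in events
        if (severity is None or e["severity"] == severity)
        and (code is None or e["code"] == code)
    ]


if __name__ == "__main__":
    # Made-up sample records; the codes and messages are placeholders.
    log = [
        {"code": 9001, "severity": "Warning", "message": "placeholder A"},
        {"code": 9002, "severity": "Critical", "message": "placeholder B"},
    ]
    for event in filter_events(log, severity="Critical"):
        print(event["code"], event["message"])
```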
View the enclosure LEDs
You can view the LEDs on the hardware (while referring to the LED descriptions for your enclosure model) to
identify component status. If a problem prevents access to either the SMU or the CLI, this is the only option
available. However, monitoring/management is often done at a management console using storage
management interfaces, rather than relying on line-of-sight to LEDs of racked hardware components.
Performing basic steps
You can use any of the available options to perform the basic steps that make up the fault isolation
methodology.
Gather fault information
When a fault occurs, it is important to gather as much information as possible. Doing so will help you
determine the correct action needed to remedy the fault.
Begin by reviewing the reported fault:
• Is the fault related to an internal data path or an external data path?
• Is the fault related to a hardware component such as a disk drive module, controller module, or power
  supply?
By isolating the fault to one of the components within the storage system, you will be able to determine the
necessary action more quickly.
Determine where the fault is occurring
Once you have an understanding of the reported fault, review the enclosure LEDs. The enclosure LEDs are
designed to alert users to any system faults, and might be what alerted the user to a fault in the first place.
When a fault occurs, the Fault ID status LED on the enclosure right ear illuminates. Check the LEDs on the
back of the enclosure to narrow the fault to a FRU, connection, or both. The LEDs also help you identify the
location of a FRU reporting a fault.
Use the SMU to verify any faults found while viewing the LEDs. The SMU is also a good tool to use in
determining where the fault is occurring if the LEDs cannot be viewed due to the location of the system. The
SMU provides you with a visual representation of the system and where the fault is occurring. It can also
provide more detailed information about FRUs, data, and faults.
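The cross-check between LED observations and the SMU can be pictured as a comparison of two lists. The sketch below is illustrative only: the FRU labels are hypothetical, and both lists are assumed to have been written down by hand while inspecting the enclosure and the SMU.

```python
def cross_check(led_faults, smu_faults):
    """Compare FRUs flagged by the LEDs with FRUs flagged in the SMU.

    Returns three sorted lists: faults seen in both places, faults seen
    only on the LEDs, and faults reported only by the SMU.
    """
    led, smu = set(led_faults), set(smu_faults)
    return {
        "confirmed": sorted(led & smu),
        "led_only": sorted(led - smu),
        "smu_only": sorted(smu - led),
    }


if __name__ == "__main__":
    result = cross_check(
        ["Power Supply 1", "Controller B"],   # noted from the LEDs
        ["Controller B", "Disk 1.3"],         # noted from the SMU
    )
    for group, frus in result.items():
        print(group, frus)
```

Entries that appear in only one list are the ones worth a second look, for example a fault the SMU reports on hardware whose LEDs cannot be seen.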
Review the event logs
The event logs record all system events. Each event has a numeric code that identifies the type of event that
occurred, and has one of the following severities:
• Critical. A failure occurred that may cause a controller to shut down. Correct the problem
  immediately.
• Error. A failure occurred that may affect data integrity or system stability. Correct the problem as soon
  as possible.
• Warning. A problem occurred that may affect system stability, but not data integrity. Evaluate the
  problem and correct it if necessary.
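The severities listed above imply an order of urgency, which can be sketched as a simple ranking. This example covers only the three severities described here. The numeric ranks and the (severity, message) tuple layout are assumptions made for illustration, and the response strings are taken from the list above.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Higher value means more urgent; the numbers are arbitrary ranks."""
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


# Recommended responses, taken from the severity descriptions above.
RESPONSE = {
    Severity.CRITICAL: "Correct the problem immediately.",
    Severity.ERROR: "Correct the problem as soon as possible.",
    Severity.WARNING: "Evaluate the problem and correct it if necessary.",
}


def triage(events):
    """Sort (severity_name, message) pairs most urgent first.

    Returns (severity_name, message, recommended_response) tuples.
    """
    ranked = sorted(events, key=lambda e: Severity[e[0].upper()], reverse=True)
    return [(sev, msg, RESPONSE[Severity[sev.upper()]]) for sev, msg in ranked]


if __name__ == "__main__":
    for sev, msg, action in triage([("Warning", "placeholder A"),
                                    ("Critical", "placeholder B")]):
        print(f"{sev}: {msg} -> {action}")
```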