Chapter 9: MediaStore 5100 and 5000 series hardware reference
cycled. When this condition is detected, the disk will be failed, regardless of any of the requisite
conditions mentioned previously. In extreme circumstances, this may cause the MediaDirector to shut
down its file system, stopping all playback and recording on that Spectrum server (but not affecting
other Spectrum servers in an EFS system). The drive will be automatically bypassed, and an alarm
message will be generated instructing the operator to remove and reinsert the drive. If this is done
within five minutes of the failure, an automatic “surgical” rebuild will be immediately started. Otherwise,
one of the Spectrum servers in the system will automatically start a rebuild (provided a hot spare is
available).
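The recovery rule above can be summarized as a small decision function. The following Python sketch is illustrative only and is not part of any Spectrum software; the names `rebuild_action` and `RebuildAction` are hypothetical. It assumes that a reinsertion at exactly five minutes still counts as "within five minutes", and it reports the case with no reinsertion and no hot spare as not described, because the text does not cover it.

```python
from enum import Enum
from typing import Optional

# "within five minutes of the failure", expressed in seconds
REINSERT_WINDOW_SECONDS = 5 * 60


class RebuildAction(Enum):
    SURGICAL_REBUILD = "automatic surgical rebuild of the reinserted drive"
    HOT_SPARE_REBUILD = "automatic rebuild started by a Spectrum server"
    NOT_DESCRIBED = "outcome not described in the text"


def rebuild_action(seconds_until_reinsert: Optional[float],
                   hot_spare_available: bool) -> RebuildAction:
    """Pick the rebuild path after a drive has been failed and bypassed.

    seconds_until_reinsert is None when the drive was never reinserted.
    """
    # Assumption: the boundary value (exactly 300 s) is treated as in time.
    if (seconds_until_reinsert is not None
            and seconds_until_reinsert <= REINSERT_WINDOW_SECONDS):
        return RebuildAction.SURGICAL_REBUILD
    # Otherwise a rebuild starts only if a hot spare is available.
    if hot_spare_available:
        return RebuildAction.HOT_SPARE_REBUILD
    return RebuildAction.NOT_DESCRIBED


if __name__ == "__main__":
    print(rebuild_action(120, hot_spare_available=False).value)
    print(rebuild_action(None, hot_spare_available=True).value)
```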
About bad-block auto-repair
Review the details on Spectrum bad-block auto-repair.
When an unreadable or unwritable (“Read Error” or “Write Error”) block occurs, the block is internally
marked as bad. Bad block errors can occur occasionally on any system and do not, by themselves, imply
catastrophic drive failures. After a short delay that allows clusters of bad blocks to be collected, and if it is safe to
do so, a bad-block auto-repair will be performed as follows:
• Any drive with any unrepaired bad blocks (“Read/Write” and hard/soft errors) will be temporarily auto-
failed from the RAID set.
• The failed blocks will be recovered or reallocated on the disk.
• Blocks that could not be recovered will be marked for later rebuild.
• The drive will be added back into the RAID set.
• A surgical rebuild will be performed. A surgical rebuild uses RAID functionality to regenerate only the
blocks that could not be recovered and any blocks that were written to the RAID set while the drive
was removed.
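The effect of the steps above on individual blocks can be illustrated with a short Python sketch. It is a simplified model built on set arithmetic, not the actual repair implementation; the function name `blocks_for_surgical_rebuild` and its parameters are hypothetical, and block numbers are plain integers chosen for the example.

```python
def blocks_for_surgical_rebuild(bad_blocks, recoverable, written_while_failed):
    """Return the set of blocks that the surgical rebuild must regenerate.

    bad_blocks:           blocks reported bad on the drive
    recoverable:          blocks the drive can recover or reallocate itself
    written_while_failed: blocks written to the RAID set while the drive
                          was temporarily auto-failed
    """
    bad_blocks = set(bad_blocks)
    recoverable = set(recoverable)
    written_while_failed = set(written_while_failed)

    # The drive is auto-failed, then bad blocks are recovered or reallocated
    # on the disk. These blocks need no further work.
    repaired_on_disk = bad_blocks & recoverable

    # Blocks that could not be recovered are marked for later rebuild.
    marked_for_rebuild = bad_blocks - repaired_on_disk

    # The drive rejoins the RAID set, and the surgical rebuild regenerates
    # the marked blocks plus anything written during the drive's absence.
    return marked_for_rebuild | written_while_failed


if __name__ == "__main__":
    todo = blocks_for_surgical_rebuild(
        bad_blocks={10, 11, 12},
        recoverable={10, 12},
        written_while_failed={40},
    )
    print(sorted(todo))  # block 11 was unrecoverable, block 40 was missed
```

Because only these blocks are regenerated, a surgical rebuild touches far less of the drive than a full rebuild would.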
For bad-block auto-repair (BBAR) to be performed on a drive, the RAID set that contains the drive must be
on-line with redundancy, and the file system that contains that RAID set must be on-line and writable.
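As a quick illustration, the preconditions reduce to a conjunction of four checks. The Python sketch below is hypothetical: the `RaidSetState` fields and the `bbar_allowed` function are invented for this example and do not correspond to any SystemManager or Spectrum interface.

```python
from dataclasses import dataclass


@dataclass
class RaidSetState:
    # Hypothetical status flags for the RAID set that contains the drive
    raid_online: bool
    raid_redundant: bool
    # Hypothetical status flags for the file system that contains the RAID set
    filesystem_online: bool
    filesystem_writable: bool


def bbar_allowed(state: RaidSetState) -> bool:
    """Return True only when every stated BBAR precondition holds."""
    return (state.raid_online
            and state.raid_redundant
            and state.filesystem_online
            and state.filesystem_writable)


if __name__ == "__main__":
    healthy = RaidSetState(True, True, True, True)
    degraded = RaidSetState(True, False, True, True)  # redundancy lost
    print(bbar_allowed(healthy), bbar_allowed(degraded))
```

The redundancy check matters most: temporarily failing a drive from a RAID set that has no redundancy would leave nothing to regenerate its blocks from.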
If BBAR is unable to repair a block, the drive will be failed and will need to be replaced.
If an auto-fail is unsuccessful, be cautious when manually failing or removing any drives in the RAID set.
Failing another drive in a rebuilding or compromised RAID set could cause all Spectrum
servers to stop the file system. Contact Technical Support if you are unsure of what action to take.
Bad-block auto-repair in SystemManager
Warning alarms (yellow) are generated in SystemManager for any drive that reports a bad block.
In some cases, the bad-block auto-repair process may generate a red “CRIT” (critical) alarm in
SystemManager when the drive is temporarily auto-failed. This alarm is normal and can be ignored
provided the alarms that follow include the “RAID Set rebuild completed” message. The following is an example
of the SystemManager alarm sequence:
1. Disk diagnostics detect bad blocks (after a “Read Error” or “Write Error” on that block).
2. Disk diagnostics deactivate the drive to recover bad blocks. During this time, RAID set protection is
momentarily lost while bad blocks are recovered or reallocated and bad-block tables are updated.
3. The drive is added back to the RAID set and, if needed, a rebuild is scheduled. At this point, the
drive is active, and RAID set protection is restored. After the rebuild is complete, disk diagnostics will
confirm and report zero bad blocks.
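An operator reviewing an exported alarm log could apply the rule above with a check such as the following Python sketch. It is an assumption-laden illustration: the `(severity, message)` tuple format, the severity labels, and the sample messages are hypothetical, and only the phrase “RAID Set rebuild completed” comes from the text. The check also cannot tell whether a given critical alarm was actually caused by bad-block auto-repair, so it is not a substitute for reading the alarms.

```python
REBUILD_COMPLETED = "raid set rebuild completed"


def crit_alarms_resolved(alarms):
    """Return True if every CRIT alarm is followed by a rebuild-completed alarm.

    alarms: iterable of (severity, message) tuples in chronological order.
    """
    pending = False
    for severity, message in alarms:
        if severity == "CRIT":
            # A critical alarm stays open until a later completion message.
            pending = True
        elif REBUILD_COMPLETED in message.lower():
            pending = False
    return not pending


if __name__ == "__main__":
    # Hypothetical log entries that mirror the three steps listed above.
    log = [
        ("WARN", "Disk diagnostics detected bad blocks"),
        ("CRIT", "Drive deactivated to recover bad blocks"),
        ("INFO", "RAID Set rebuild completed"),
    ]
    print(crit_alarms_resolved(log))      # completion follows the CRIT alarm
    print(crit_alarms_resolved(log[:2]))  # no completion message yet
```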