Chapter 3 Server Diagnostics

3.2.2 Power Supply LEDs

The power supply LEDs (TABLE 3-3) are located on the back of the power supply.

3.3 Using ALOM CMT for Diagnosis and Repair Verification

The Advanced Lights Out Management (ALOM) CMT is a system controller in the
server that enables you to remotely manage and administer your server.

ALOM CMT enables you to remotely run diagnostics, such as power-on self-test
(POST), that would otherwise require physical proximity to the server’s serial port.
You can also configure ALOM CMT to send email alerts of hardware failures,
hardware warnings, and other events related to the server or to ALOM CMT.

The ALOM CMT circuitry runs independently of the server, using the server’s
standby power. Therefore, ALOM CMT firmware and software continue to function
when the server operating system goes offline or when the server is powered off.

Note – Refer to the Advanced Lights Out Management (ALOM) CMT Guide for comprehensive ALOM CMT information.

TABLE 3-3 Power Supply LEDs

Name     Color    Description

Fault    Amber    • On – Power supply has detected a failure.
                  • Off – Normal operation.

DC OK    Green    • On – Normal operation. DC output voltage is within normal limits.
                  • Off – Power is off.

AC OK    Green    • On – Normal operation. Input power is within normal limits.
                  • Off – No input voltage, or input voltage is below limits.
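
As a rough illustration of how TABLE 3-3 can be read, the following Python sketch maps an observed combination of the three power supply LEDs to the descriptions given in the table. The names PowerSupplyLeds and describe are hypothetical and are not part of ALOM CMT or any software shipped with the server. The sketch assumes the LED states are observed visually at the back of the power supply and entered by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerSupplyLeds:
    """Observed LED states (hypothetical helper, not a server API)."""
    fault_on: bool  # amber Fault LED
    dc_ok_on: bool  # green DC OK LED
    ac_ok_on: bool  # green AC OK LED


def describe(leds: PowerSupplyLeds) -> list[str]:
    """Return one line per LED, using the wording of TABLE 3-3."""
    return [
        "Fault: power supply has detected a failure"
        if leds.fault_on
        else "Fault: normal operation",
        "DC OK: normal operation, DC output voltage is within normal limits"
        if leds.dc_ok_on
        else "DC OK: power is off",
        "AC OK: normal operation, input power is within normal limits"
        if leds.ac_ok_on
        else "AC OK: no input voltage, or input voltage is below limits",
    ]


if __name__ == "__main__":
    # Example: Fault LED lit while both green LEDs are still on.
    for line in describe(PowerSupplyLeds(fault_on=True, dc_ok_on=True, ac_ok_on=True)):
        print(line)
```

The sketch only restates the table. It does not decide whether a power supply should be replaced; that decision follows the diagnostic procedures in this chapter.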

Summary of Contents for SPARC ENTERPRISE T1000

Page 1: ......

Page 2: ......

Page 3: ...SPARC Enterprise T1000 Server Service Manual Manual Code C120 E384 01EN Part No 875 4022 10 April 2007 ...

Page 4: ... and the Fujitsu logo are registered trademarks of Fujitsu Limited All SPARC trademarks are used under license and are registered trademarks of SPARC International Inc in the U S and other countries Products bearing SPARC trademarks are based upon architecture developed by Sun Microsystems Inc SPARC64 is a trademark of SPARC International Inc used under license by Fujitsu Microelectronics Inc and ...

Page 5: ...Toutes les marques SPARC sont utilisées sous licence et sont des marques de fabrique ou des marques déposées de SPARC International Inc aux Etats Unis et dans d autres pays Les produits portant les marques SPARC sont basés sur une architecture développée par Sun Microsystems Inc SPARC64 est une marques déposée de SPARC International Inc utilisée sous le permis par Fujitsu Microelectronics Inc et F...

Page 6: ......

Page 7: ...t 1 2 2 Server Overview 2 1 2 1 Server Overview 2 1 2 2 Obtaining the Chassis Serial Number 2 3 3 Server Diagnostics 3 1 3 1 Overview of Server Diagnostics 3 1 3 1 1 Memory Configuration and Fault Handling 3 6 3 1 1 1 Memory Configuration 3 7 3 1 1 2 Memory Fault Handling 3 7 3 1 1 3 Troubleshooting Memory Faults 3 8 3 2 Using LEDs to Identify the State of Devices 3 8 3 2 1 Front and Rear Panel LE...

Page 8: ... Changing POST Parameters 3 26 3 4 3 Reasons to Run POST 3 27 3 4 3 1 Verifying Hardware Functionality 3 27 3 4 3 2 Diagnosing the System Hardware 3 28 3 4 4 Running POST in Maximum Mode 3 28 3 4 5 Correctable Errors Detected by POST 3 35 3 4 5 1 Correctable Errors for Single DIMMs 3 36 3 4 5 2 Determining When to Replace Detected Devices 3 37 3 4 6 Clearing POST Detected Faults 3 38 3 5 Using the...

Page 9: ...1 1 Required Tools 4 2 4 1 2 Shutting the System Down 4 2 4 1 3 Removing the Server From a Rack 4 3 4 1 4 Performing Electrostatic Discharge ESD Prevention Measures 4 5 4 1 5 Removing the Top Cover 4 5 5 Replacing Field Replaceable Units 5 1 5 1 Replacing the Optional PCI Express Card 5 2 5 1 1 Removing the Optional PCI Express Card 5 2 5 1 2 Installing the Optional PCI Express Card 5 3 5 2 Replac...

Page 10: ...ive Assembly 5 15 5 5 2 2 Installing the Hard Drive in a Dual Drive Assembly 5 17 5 6 Replacing DIMMs 5 19 5 6 1 Removing DIMMs 5 19 5 6 2 Installing DIMMs 5 21 5 7 Replacing the Motherboard and Chassis 5 25 5 7 1 Removing the Motherboard and Chassis 5 25 5 7 2 Installing the Motherboard and Chassis 5 25 5 8 Replacing the Clock Battery 5 27 5 8 1 Removing the Clock Battery on the Motherboard 5 27 ...

Page 11: ...Contents ix Index Index 1 ...

Page 12: ...x SPARC Enterprise T1000 Server Service Manual April 2007 ...

Page 13: ...OST Configuration 3 25 FIGURE 3 6 SunVTS GUI 3 51 FIGURE 3 7 SunVTS Test Selection Panel 3 52 FIGURE 4 1 Unlocking a Mounting Bracket 4 4 FIGURE 4 2 Location of the Mounting Bracket Release Buttons 4 4 FIGURE 4 3 Location of Top Cover Release Button 4 6 FIGURE 5 1 Releasing the PCI Express Card Release Lever 5 2 FIGURE 5 2 Removing and Installing the PCI Express Card 5 3 FIGURE 5 3 Removing the Fa...

Page 14: ...FIGURE 5 10 Installing the Single Drive Assembly 5 14 FIGURE 5 11 Location of Drive Power and Data Connectors on the Motherboard 5 15 FIGURE 5 12 Removing the Dual Drive Assembly 5 16 FIGURE 5 13 Installing the Dual Drive Assembly 5 18 FIGURE 5 14 DIMM Locations 5 20 FIGURE 5 15 Removing the Clock Battery From the Motherboard 5 27 FIGURE 5 16 Installing the Clock Battery on the Motherboard 5 28 FI...

Page 15: ...3 11 TABLE 3 4 Service Related ALOM CMT Commands 3 14 TABLE 3 5 ALOM CMT Parameters Used for POST Configuration 3 23 TABLE 3 6 ALOM CMT Parameters and POST Modes 3 26 TABLE 3 7 ASR Commands 3 46 TABLE 3 8 Useful SunVTS Tests to Run on This Server 3 52 TABLE 5 1 DIMM Names and Socket Numbers 5 20 TABLE A 1 Server FRU List A 3 ...

Page 16: ...xiv SPARC Enterprise T1000 Server Service Manual April 2007 ...

Page 17: ...ating System and the command line interface Has superuser privileges for the system being serviced Understands typical hardware troubleshooting tasks FOR SAFE OPERATION This manual contains important information regarding the use and handling of this product Read this manual thoroughly Pay special attention to the section Notes on Safety on page xx Use the product according to the instructions and...

Page 18: ...rvicing the server Chapter 5 Replacing Field Replaceable Units Describes how to remove and replace the FRUS within the server Chapter 6 Finishing Up Servicing Describes how to finish up the servicing of the server Appendix A Field Replaceable Units Lists the field replaceable components in the server Index Provides keywords and corresponding reference page numbers so that the reader can easily sea...

Page 19: ...umentation to get your system installed and running quickly C120 E379 SPARC Enterprise T1000 Server Overview Guide Provides an overview of the features of this server C120 E380 SPARC Enterprise T1000 Server Installation Guide Detailed rackmounting cabling power on and configuring information C120 E383 SPARC Enterprise T1000 Server Administration Guide How to perform administrative tasks that are s...

Page 20: ... uses the following fonts and symbols to express specific types of information The settings on your browser might differ from these settings Typeface Meaning Example AaBbCc123 The names of commands files and directories on screen computer output Edit your login file Use ls a to list all files You have mail AaBbCc123 What you type when contrasted with on screen computer output su Password AaBbCc123...

Page 21: ...This indicates a hazardous situation that could result in minor or moderate personal injury if the user does not perform the procedure correctly This signal also indicates that damage to the product or other property may occur if the user does not perform the procedure correctly Alert Messages in the Text An alert message in the text consists of a signal indicating an alert level followed by an al...

Page 22: ... are shown in Important Alert Messages on page xx Notes on Safety Important Alert Messages This manual provides the following important alert signals Caution This indicates a hazardous situation could result in minor or moderate personal injury if the user does not perform the procedure correctly This signal also indicates that damage to the product or other property may occur if the user does not...

Page 23: ...d inspections repairing and regular diagnosis and maintenance Caution The following tasks regarding this product and the optional products provided from Fujitsu should only be performed by a certified service engineer Users must not perform these tasks Incorrect operation of these tasks may cause malfunction Unpacking optional adapters and such packages delivered to the users Plugging or unpluggin...

Page 24: ...to this product Never peel off the labels The following labels provide information to the users of this product Fujitsu Welcomes Your Comments We would appreciate your comments and suggestions to improve this document You can submit your comments by using Reader s Comment Form Sample of SPARC Enterprise T1000 ...

Page 25: ...Preface xxiii Reader s Comment Form ...

Page 26: ...D AND TAPE BUSINESS REPLY MAIL FIRST CLASS MAIL PERMIT NO 741 SUNNYVALE CA NO POSTAGE NECESSARY IF MAILED IN THE UNITED STATES POSTAGE WILL BE PAID BY ADDRESSEE FUJITSU COMPUTER SYSTEMS ATTENTION ENGINEERING OPS M S 249 1250 EAST ARQUES AVENUE P O BOX 3470 SUNNYVALE CA 94088 3470 ...

Page 27: ...r For your protection observe the following safety precautions when setting up your equipment Follow all standard cautions warnings and instructions marked on the equipment and described in Important Safety Information for Hardware Systems C120 E391 Ensure that the voltage and frequency of your power source match the voltage and frequency inscribed on the equipment s electrical rating label Follow...

Page 28: ...rd drives contain electronic components that are extremely sensitive to static electricity Ordinary amounts of static electricity from clothing or the work environment can destroy components Do not touch the components along their connector edges 1 3 1 Using an Antistatic Wrist Strap Wear an antistatic wrist strap and use an antistatic mat when handling components such as drive assemblies boards o...

Page 29: ...the server Topics include Section 2 1 Server Overview on page 2 1 Section 2 2 Obtaining the Chassis Serial Number on page 2 3 2 1 Server Overview The server is a high performance entry level server that is highly scalable and very reliable FIGURE 2 1 FIGURE 2 1 Server ...

Page 30: ...RE 2 4 show the front and rear panels of the server FIGURE 2 2 Server Components FIGURE 2 3 Server Front Panel DIMMs Fan tray assembly Hard drive PCI E riser board Motherboard Chassis assembly UltraSPARC T1 mullticore processor Power supply PCI E slot opening Locator LED button Service Required LED Power OK LED and Power On Off button ...

Page 31: ... the server and another sticker at the rear of the server below the AC power connector You can also run the ALOM CMT showplatform command to obtain the chassis serial number Example sc showplatform SUNW SPARC Enterprise T1000 Chassis Serial Number 0529AP000882 Domain Status S0 OS Standby sc Power supply LEDs Locator LED button Ethernet ports DB9 serial port SC network management port PCI E slot SC...

Page 32: ...2 4 SPARC Enterprise T1000 Server Service Manual April 2007 ...

Page 33: ...2 Using LEDs to Identify the State of Devices on page 3 8 Section 3 3 Using ALOM CMT for Diagnosis and Repair Verification on page 3 11 Section 3 4 Running POST on page 3 22 Section 3 5 Using the Solaris Predictive Self Healing Feature on page 3 39 Section 3 6 Collecting Information From Solaris OS Files and Commands on page 3 44 Section 3 7 Managing Components With Automatic System Recovery Comma...

Page 34: ...th recommendations for repair The LEDs ALOM CMT Solaris OS PSH and many of the log files and console messages are integrated For example a fault detected by the Solaris PSH software displays the fault logs it passes information to ALOM CMT where it is logged and depending on the fault might illuminate of one or more LEDs The flow chart in FIGURE 3 1 and TABLE 3 1 describes an approach for using th...

Page 35: ...Chapter 3 Server Diagnostics 3 3 FIGURE 3 1 Diagnostic Flow Chart flow chart ...

Page 36: ... Healing PSH detected faults POST detected faults Faulty FRUs are identified in fault messages using the FRU name For a list of FRU names see Appendix A Section 3 3 2 Running the showfaults Command on page 3 16 3 Check the Solaris log files for fault information The Solaris message buffer and log files record system events and provide information about faults If system messages indicate a faulty d...

Page 37: ...etermine if the fault is an environmental fault If the fault listed by the showfaults command displays a temperature or voltage fault then the fault is an environmental fault Environmental faults can be caused by faulty FRUs power supply or fan tray or by environmental conditions such as when computer room ambient temperature is too high or the server airflow is blocked When the environmental cond...

Page 38: ...Replaceable Units on page 5 1 Section 3 5 2 Clearing PSH Detected Faults on page 3 43 8 Determine if the fault was detected by POST POST performs basic tests of the server components and reports faulty FRUs When POST detects a faulty FRU it logs the fault and if possible takes the FRU offline POST detected FRUs display the following text in the fault message FRU_name deemed faulty and disabled In ...

Page 39: ...provides a check to ensure the server will boot Normal operation applies to any boot of the server not intended to test power on errors hardware upgrades or repairs Once the Solaris OS is running PSH provides run time diagnosis of faults When a memory fault is detected POST displays the fault with the device name of the faulty DIMMS logs the fault and disables the faulty DIMMs by placing them in t...

Page 40: ...CMT showfaults command The showfaults command lists memory faults and lists the specific DIMMS that are associated with the fault Once you identify which DIMMs to replace see Chapter 5 for DIMM removal and replacement instructions It is important that you perform the instructions in that chapter to clear the faults and enable the replaced DIMMs 3 2 Using LEDs to Identify the State of Devices The s...

Page 41: ...Chapter 3 Server Diagnostics 3 9 FIGURE 3 3 LEDs on the Server Rear Panel Fault LED Service Required LED Power OK LED Activity LED AC OK LED DC OK LED Link LED Locator LED button Activity LED Link LED ...

Page 42: ...ht Power OK LED Front and rear panels Green The LED provides the following indications Off Indicates that the system is unavailable Either it has no power or ALOM CMT is not running Steady on Indicates that the system is powered on and is running in its normal operating state No service actions are required Standby blink Indicates the system is running at a minimum level in standby and is ready to...

Page 43: ...are failures hardware warnings and other events related to the server or to ALOM CMT The ALOM CMT circuitry runs independently of the server using the server s standby power Therefore ALOM CMT firmware and software continue to function when the server operating system goes offline or when the server is powered off Note Refer to the Advanced Lights Out Management ALOM CMT Guide for comprehensive AL...

Page 44: ...The system automatically detects that the fault condition is no longer present ALOM CMT extinguishes the Service Required LED and updates the FRU s PROM indicating that the fault is no longer present Fault repair The fault has been repaired by human intervention In most cases ALOM CMT detects the repair and extinguishes the Service Required LED If ALOM CMT does not perform these actions you must p...

Page 45: ...does not monitor the hard drive for faults As a result ALOM CMT does not recognize hard drive faults and will not light the fault LEDs on either the chassis or the hard drive itself Use the Solaris message files to view hard drive faults See Section 3 6 Collecting Information From Solaris OS Files and Commands on page 3 44 3 3 1 Running ALOM CMT Service Related Commands This section describes the ...

Page 46: ...p A depending on the mode Solaris software was booted y skips the confirmation question c executes a console command after the break command completes D forces a core dump of the Solaris OS clearfault UUID Manually clears host detected faults The UUID is the unique fault ID of the fault to be cleared console f Connects you to the host system The f option forces the console to have read and write c...

Page 47: ...is information includes system temperatures power supply front panel LED hard drive fan voltage and current sensor status See Section 3 3 3 Running the showenvironment Command on page 3 17 showfaults v Displays current system faults See Section 3 3 2 Running the showfaults Command on page 3 16 showfru g lines s d FRU Displays information about the FRUs in the server g lines specifies the number of...

Page 48: ...llowing reasons To see if any faults have been passed to or detected by ALOM To obtain the fault message ID SUNW MSG ID for PSH detected faults To verify that the replacement of a FRU has cleared the fault and not generated any additional faults At the sc prompt type the showfaults command The following showfaults command examples show the different kinds of output from the showfaults command Exam...

Page 49: ...ensors The output uses a format similar to the Solaris OS command prtdiag 1m At the sc prompt type the showenvironment command The output differs according to your system s model and configuration Example sc showfaults v ID Time FRU Fault 1 OCT 13 12 47 27 MB CMP0 CH0 R1 D0 MB CMP0 CH0 R1 D0 deemed faulty and disabled sc showfaults v ID Time FRU Fault 0 SEP 09 11 09 26 MB CMP0 CH0 R1 D0 Host detec...

Page 50: ..._VMEM OK 1 79 1 69 1 72 1 87 1 90 MB V_VTT OK 0 89 0 84 0 86 0 93 0 95 MB V_ 1V2 OK 1 18 1 09 1 11 1 28 1 30 MB V_ 1V5 OK 1 49 1 36 1 39 1 60 1 63 MB V_ 2V5 OK 2 51 2 27 2 32 2 67 2 72 MB V_ 3V3 OK 3 29 3 06 3 10 3 49 3 53 MB V_ 5V OK 5 02 4 55 4 65 5 35 5 45 MB V_ 12V OK 12 25 10 92 11 16 12 84 13 08 MB V_ 3V3STBY OK 3 33 3 13 3 16 3 53 3 59 System Load in amps Sensor Status Load Warn Shutdown MB...

Page 51: ...M at MB SEEPROM SEGMENT SD ManR ManR UNIX_Timestamp32 TUE OCT 18 21 17 55 2005 ManR Description ASSY SPARC Enterprise T1000 Motherboard ManR Manufacture Location Sriracha Chonburi Thailand ManR Sun Part No 5017302 ManR Sun Serial No 002989 ManR Vendor Celestica ManR Initial HW Dash Level 03 ManR Initial HW Rev Level 01 ManR Shortname T1000_MB SpecPartNo 885 0505 04 FRU_PROM at PS0 SEEPROM SEGMENT ...

Page 52: ... 7A SPD Vendor Serial No d03f623 FRU_PROM at MB CMP0 CH0 R1 D0 SEEPROM SPD Timestamp MON OCT 03 12 00 00 2005 SPD Description DDR2 SDRAM 2048 MB SPD Manufacture Location SPD Vendor Infineon formerly Siemens SPD Vendor Part No 72T256220HR3 7A SPD Vendor Serial No d03fc26 FRU_PROM at MB CMP0 CH0 R1 D1 SEEPROM SPD Timestamp MON OCT 03 12 00 00 2005 SPD Description DDR2 SDRAM 2048 MB SPD Manufacture L...

Page 53: ...U_PROM at MB CMP0 CH3 R1 D0 SEEPROM SPD Timestamp MON OCT 03 12 00 00 2005 SPD Description DDR2 SDRAM 2048 MB SPD Manufacture Location SPD Vendor Infineon formerly Siemens SPD Vendor Part No 72T256220HR3 7A SPD Vendor Serial No d03ec27 FRU_PROM at MB CMP0 CH3 R1 D1 SEEPROM SPD Timestamp MON OCT 03 12 00 00 2005 SPD Description DDR2 SDRAM 2048 MB SPD Manufacture Location SPD Vendor Infineon formerl...

Page 54: ...ntended to test power on errors hardware upgrades or repairs Once the Solaris OS is running PSH provides run time diagnosis of faults Note Earlier versions of firmware have max as the default setting for the POST diag_level variable To set the default to min use the ALOM CMT command setsc diag_level min For validating hardware upgrades or repairs configure POST to run in maximum mode diag_level ma...

Page 55: ...edetermined settings stby The system cannot power on locked The system can power on and run POST but no flash updates can be made diag_mode off POST does not run normal Runs POST according to diag_level value service Runs POST with preset values for diag_level and diag_verbosity diag_level min If diag_mode normal runs minimum set of tests max If diag_mode normal runs all the minimum tests plus ext...

Page 56: ...splays functional tests with a banner and pinwheel normal POST output displays all test and informational messages max POST displays all test informational and some debugging messages TABLE 3 5 ALOM CMT Parameters Used for POST Configuration Continued Parameter Values Description ...

Page 57: ...Chapter 3 Server Diagnostics 3 25 FIGURE 3 5 Flow Chart of ALOM CMT Variables for POST Configuration ...

Page 58: ...nostic Preset Values diag_mode normal off service normal setkeyswitch The setkeyswitch parameter when set to diag overrides all the other ALOM CMT POST variables normal normal normal diag diag_level Earlier versions of firmware have max as the default setting for the POST diag_level variable To set the default to min use the ALOM CMT command setsc diag_level min min n a max max diag_trigger power ...

Page 59: ...eventing faulty hardware from potentially harming software In normal operation diag_level min POST runs in mimimum mode by default to test devices required to power on the server Replace any devices POST detects as faulty in minimum mode Run POST in maximum mode diag_level max for all power on or error generated resets and to validate hardware upgrades or repairs With maximum testing enabled POST ...

Page 60: ... 1 Switch from the system console prompt to the sc prompt by issuing the escape sequence 2 Set the virtual keyswitch to diag so that POST will run in Service mode 3 Reset the system so that POST runs There are several ways to initiate a reset The following example uses the powercycle command For other methods refer to the SPARC Enterprise T1000 Server Administration Guide ok sc sc setkeyswitch dia...

Page 61: ...yright 2005 Sun Microsystems Inc All rights reserved SUN PROPRIETARY CONFIDENTIAL Use is subject to license terms 0 0 VBSC selecting POST IO Testing 0 0 VBSC enabling threads 1 0 0 VBSC setting verbosity level 3 0 0 Start Selftest 0 0 Init CPU 0 0 Master CPU Tests Basic 0 0 CPU 0 0 0 DMMU Registers Access 0 0 IMMU Registers Access 0 0 Init mmu regs 0 0 D Cache RAM 0 0 DMMU TLB DATA RAM Access 0 0 ...

Page 62: ... 0 0 L2 Scrub Tags 0 0 Test Memory Basic 0 0 Probe and Setup Memory 0 0 INFO 4096MB at Memory Channel 0 3 Rank 0 Stack 0 0 0 INFO 4096MB at Memory Channel 0 3 Rank 0 Stack 1 0 0 INFO No memory detected at Memory Channel 0 3 Rank 1 Stack 0 0 0 INFO No memory detected at Memory Channel 0 3 Rank 1 Stack 1 0 0 0 0 Data Bitwalk 0 0 L2 Scrub Data 0 0 L2 Enable 0 0 Testing Memory Channel 0 Rank 0 Stack 0...

Page 63: ... 0 0 Run POST from Memory 0 0 Verifying checksum on copied image 0 0 The Memory s CHECKSUM value is cc1e 0 0 The Memory s Content Size value is 7b192 0 0 Success Checksum on Memory Validated 0 0 L2 Cache Ram Test 0 0 Enable L2 Cache 0 0 L2 Scrub Data 0 0 L2 Enable 0 0 CPU 0 0 0 CPU 0 0 0 Test slave strand registers 0 0 Extended CPU Tests 0 0 Scrub Icache 0 0 Scrub Dcache 0 0 D Cache Tags 0 0 I Cac...

Page 64: ...rint Mem Config 0 0 Caches Icache is ON Dcache is ON 0 0 Bank 0 4096MB 00000000 00000000 00000001 00000000 0 0 Bank 1 4096MB 00000001 00000000 00000002 00000000 0 0 Block Mem Test 0 0 Test 6291456 bytes at 00000000 00600000 Memory Channel 0 3 Rank 0 Stack 0 0 0 0 0 Test 6291456 bytes at 00000001 00000000 Memory Channel 0 3 Rank 0 Stack 1 0 0 0 0 IO Bridge Tests 0 0 IO Bridge Quick Read 0 0 0 0 0 0...

Page 65: ... unit 1 lpu init test 0 0 IO Bridge unit 1 link train port B 0 0 IO Bridge unit 1 interrupt test 0 0 IO Bridge unit 1 Config MB bridges 0 0 Config port B bus 2 dev 0 func 0 tag 5714 BRIDGE 0 0 Config port B bus 3 dev 8 func 0 tag PCIX BRIDGE 0 0 IO Bridge unit 1 PCI id test 0 0 INFO 10 count read passed for MB IOB_PCIEb BRIDGE Last read VID 1166 DID 103 0 0 INFO 10 count read passed for MB IOB_PCI...

Page 66: ...ng syntax INFO or WARNING message The following example shows a POST error message In this example POST is reporting a memory error at DIMM location MB CMP0 CH0 R1 D0 J0701 0 0 Data Bitwalk 0 0 L2 Scrub Data 0 0 L2 Enable 0 0 Testing Memory Channel 0 Rank 0 Stack 0 0 0 Testing Memory Channel 3 Rank 0 Stack 0 0 0 Testing Memory Channel 0 Rank 1 Stack 0 0 0 ERROR TEST Data Bitwalk 0 0 H W under test...

Page 67: ...rs Detected by POST In maximum mode POST detects and offlines memory devices with errors that could be correctable by PSH Use the examples in this section to verify if the detected memory devices are correctable Note For servers powered on in maximum mode without the intention of validating a hardware upgrade or repair examine all faults detected by POST to verify if the errors can be corrected by...

Page 68: ...r other methods refer to the SPARC Enterprise T1000 Server Administration Guide 4 Replace the DIMM if POST continues to fault the device in minimum mode CODE EXAMPLE 3 1 POST Fault for a Single DIMM sc showfaults v ID Time FRU Fault 1 OCT 13 12 47 27 MB CMP0 CH0 R0 D0 MB CMP0 CH0 R0 D0 deemed faulty and disabled sc enablecomponent name of DIMM sc setkeyswitch normal sc setsc diag_mode normal sc se...

Page 69: ...etected replace the detected devices 2 If a detected device is a single DIMM and the same DIMM is also detected by PSH replace the DIMM CODE EXAMPLE 3 3 Note The detected DIMM in the previous example must also be replaced because it exceeds the PSH page retire threshold CODE EXAMPLE 3 2 POST Fault for Multiple DIMMs sc showfaults v ID Time FRU Fault 1 OCT 13 12 47 27 MB CMP0 CH0 R0 D0 MB CMP0 CH0 ...

Page 70: ...RU is replaced ALOM CMT detects the repair and extinguishes the Service Required LED If ALOM CMT does not perform these actions use the enablecomponent command to manually clear the fault and remove the component from the ASR blacklist This procedure describes how to do this 1 After replacing a faulty FRU at the ALOM CMT prompt use the showfaults command to identify POST detected faults POST detec...

Page 71: ...ver to diagnose problems while the Solaris OS is running and mitigate many problems before they negatively affect operations The Solaris OS uses the fault manager daemon fmd 1M which starts at boot time and runs in the background to monitor the system If a component generates an error the daemon handles the error by correlating the error with data from previous errors and other related information...

Page 72: ...e Additional Predictive Self Healing information is available at http www sun com msg 3 5 1 Identifying PSH Detected Faults When a PSH fault is detected a Solaris console message similar to the following is displayed SUNW MSG ID SUN4V 8000 DX TYPE Fault VER 1 SEVERITY Minor EVENT TIME Wed Sep 14 10 09 46 EDT 2005 PLATFORM SPARC Enterprise T1000 CSN HOSTNAME wgs48 37 SOURCE cpumem diagnosis REV 1 5...

Page 73: ...mand the ALOM CMT showfaults command provides information about faults and displays fault UUIDs See Section 3 3 2 Running the showfaults Command on page 3 16 1 Check the event log using the fmdump command with v for verbose output In this example a fault is displayed indicating the following details Date and time of the fault Sep 14 10 09 Universal Unique Identifier UUID that is unique for every f...

Page 74: ...memory DIMM are being removed from service as errors are reported Impact Total system memory capacity will be reduced as pages are retired Suggested Action for System Administrator Schedule a repair procedure to replace the affected memory DIMM the identity of which can be determined using the command fmdump v u EVENT_ID Details The Message ID SUN4V 8000 DX indicates diagnosis has determined that ...

Page 75: ...e text Host detected fault Example If no fault is reported you do not need to do anything else Do not perform the subsequent steps If a fault is reported perform Step 2 through Step 4 rsrc mem component MB CMP0 CH0 R0 D0 J0601 In this example the DIMM location is MB CMP0 CH0 R0 D0 J0601 Refer to the Service Manual or the Service Label attached to the server chassis to find the physical location of...

Page 76: ...OS running on the server you have the full compliment of Solaris OS files and commands available for collecting information and for troubleshooting If POST ALOM or the Solaris PSH features do not indicate the source of a fault check the message buffer and log files for notifications for faults Hard drive faults are usually captured by the Solaris message files Use the dmesg command to view the mos...

Page 77: ... new messages file is automatically created The original contents of the messages file are rotated to a file named messages 1 Over a period of time the messages are further rotated to messages 2 and messages 3 and then deleted 1 Log in as superuser 2 Issue the following command 3 If you want to view all logged messages issue the following command 3 7 Managing Components With Automatic System Recov...

Page 78: ...o see the asrkeys on a given system Note A reset or power cycle is required after disabling or enabling a component If the status of a component is changed with power on there is no effect to the system until the next reset or power cycle 3 7 1 Displaying System Components The showcomponent command displays the system components asrkeys and reports their status At the sc prompt enter the showcompo...

Page 79: ...ASR blacklist 1 At the sc prompt enter the disablecomponent command 2 After receiving confirmation that the disablecomponent command is complete reset the server so that the ASR command takes effect sc showcomponent Keys ASR state clean sc showcomponent Keys ASR state Disabled Devices MB CMP0 CH3 R1 D1 dimm8 deemed faulty sc disablecomponent MB CMP0 CH3 R1 D1 SC Alert MB CMP0 CH3 R1 D1 disabled sc...

Page 80: ...ses the system by continuously running a comprehensive battery of tests Sun provides the SunVTS software for this purpose This section describes the tasks necessary to use SunVTS software to exercise your server Section 3 8 1 Checking Whether SunVTS Software Is Installed on page 3 48 Section 3 8 2 Exercising the System Using SunVTS Software on page 3 49 3 8 1 Checking Whether SunVTS Software Is In...

Page 81: ...equires that you specify one of two security schemes to use when running SunVTS The security scheme you choose must be properly configured in the Solaris OS for you to run SunVTS For details refer to the SunVTS User s Guide SunVTS software features both character based and graphics based interfaces This procedure assumes that you are using the graphical user interface GUI on a system running the C...

Page 82: ...Supplement SPARC 3 8 3 Using SunVTS Software 1 Log in as superuser to a system with a graphics display The display system should be one with a frame buffer and monitor capable of displaying bitmap graphics such as those produced by the SunVTS GUI 2 Enable the remote display On the display system type where test system is the name of the server you plan to test 3 Remotely log in to the server as su...

Page 83: ...SunVTS GUI 5 Expand the test lists to see the individual tests The test selection area lists tests in categories such as Network as shown in FIGURE 3 7 To expand a category left click the icon expand category icon to the left of the category name ...

Page 84: ... right clicking on the name of the test For example in FIGURE 3 7 right clicking on the text string ce0 nettest brings up a menu that enables you to configure this Ethernet test TABLE 3 8 Useful SunVTS Tests to Run on This Server SunVTS Tests FRUs Exercised by Tests cmttest cputest fputest iutest l1dcachetest dtlbtest and l2sramtest indirectly mptest and systest DIMMS motherboard disktest Disks ca...

Page 85: ...the Reports menu This action opens a log window from which you can choose to view the following logs Information Detailed versions of all the status and error messages that appear in the test messages area Test Error Detailed error messages from individual tests VTS Kernel Error Error messages pertaining to SunVTS software itself You should look here if SunVTS software appears to be acting strange...

Page 87: ...itch immediately shuts the system down when the cover is removed. 4.1 Common Procedures for Parts Replacement. Before you can remove and replace parts that are inside the server, you must perform the following procedures: Section 4.1.2, Shutting the System Down, on page 4-2; Section 4.1.3, Removing the Server From a Rack, on page 4-3; Section 4.1.4, Performing Electrostatic Discharge (ESD) Prevention Measures ...

Page 88: ...e SPARC Enterprise T1000 Server Administration Guide for log file information. 2. Notify affected users. Refer to your Solaris system administration documentation for additional information. 3. Save any open files and quit all running programs. Refer to your application documentation for specific information on these processes. 4. Shut down the OS. At the Solaris OS prompt, issue the uadmin command to halt ...

Page 89: ...ing command from the ALOM sc> prompt to locate the system that requires maintenance. Once you have located the server, press the Locator button to turn it off. 2. Check to see that no cables will be damaged or interfere when the server chassis is removed from the rack. 3. Disconnect the power cord from the power supply. Note – After you have disconnected the power cord from the power supply, you must wait ab...

Page 90: ...ease tab on both mounting brackets to release the right and left mounting brackets, then pull the server chassis out of the rails (FIGURE 4-2). The mounting brackets slide approximately 4 in. (10 cm) farther before disengaging. FIGURE 4-2 Location of the Mounting Bracket Release Buttons 7. Set the chassis on a sturdy work surface. ...

Page 91: ...e an antistatic wrist strap. 4.1.5 Removing the Top Cover. Access to all field-replaceable units (FRUs) requires the removal of the top cover. Note – Never run the system with the top cover removed. The top cover must be in place for proper air flow. The cover interlock switch immediately shuts the system down when the cover is removed. Caution – The system supplies 3.3 Vdc standby power to the circuit boards...

Page 92: ...FIGURE 4-3 Location of Top Cover Release Button (callouts: cover release button, top cover) ...

Page 93: ...eplacing the Power Supply, on page 5-5; Section 5.4, Replacing the Hard Drive Assembly, on page 5-7; Section 5.5, Replacing a Hard Drive, on page 5-12; Section 5.6, Replacing DIMMs, on page 5-19; Section 5.7, Replacing the Motherboard and Chassis, on page 5-25; Section 5.8, Replacing the Clock Battery, on page 5-27. For a list of FRUs, see Appendix A. Note – Never attempt to run the system with the cover removed. The c...

Page 94: ...re to remove the optional low-profile PCI Express (PCI-E) card from the server. 1. Perform the procedures described in Chapter 4. 2. Remove any cables that are attached to the card. 3. On the rear of the chassis, pull the release lever that secures the PCI Express card to the chassis (FIGURE 5-1). FIGURE 5-1 Releasing the PCI Express Card Release Lever (callouts: PCI-E card, release lever) ...

Page 95: ...is procedure to replace the PCI Express cards. 1. Unpack the replacement PCI Express card and place it on an antistatic mat. Note – Only low-profile PCI Express cards with low brackets fit into the chassis. There are a variety of PCI Express cards on the market. Read the product documentation for your device for additional installation requirements and instructions that are not covered here. 2. Insert the ...

Page 96: ...cribed in Chapter 6. 5.2 Replacing the Fan Tray Assembly 5.2.1 Removing the Fan Tray Assembly 1. Perform the procedures described in Chapter 4. 2. Disconnect the fan power cable from the motherboard. 3. Push in on the clasps on both sides of the fan assembly (FIGURE 5-3). FIGURE 5-3 Removing the Fan Tray Assembly (callout: fan tray assembly) 4. Remove the fan assembly from the sheet metal mounting brackets. ...

Page 97: ...until the clasps on each side lock it into place. 3. Reconnect the fan power cable to the motherboard. 4. Perform the procedures described in Chapter 6. 5.3 Replacing the Power Supply 5.3.1 Removing the Power Supply 1. Perform the procedures described in Chapter 4. 2. Disconnect the power cable from the motherboard and pull the cable through the midwall. 3. Pull the fastener up on the front of the power sup...

Page 98: ...he Power Supply 1. Unpack the replacement power supply. 2. Slide the power supply into the chassis and engage the two alignment pins in the rear of the chassis that mate with the power supply. 3. Push the fastener down on the front of the power supply to lock it into place in the chassis (FIGURE 5-5; callouts: fastener, power supply). ...

Page 99: ...pter 6. 6. At the sc> prompt, issue the showenvironment command to verify the status of the power supply. 5.4 Replacing the Hard Drive Assembly 5.4.1 Removing the Single Drive Assembly 1. Disconnect the drive cable from the data/power connector at the rear of the hard drive (FIGURE 5-6). 2. Pull the fasteners up on the rear of the single drive assembly and remove the single drive assembly from the chassis F...

Page 100: ...1. Unpack the drive assembly and the dual drive cable. The drive assembly should be shipped to you with one or two drives already installed in the assembly, depending on the type of drive assembly that you ordered. 2. Disconnect the drive cable from the data and power connectors on the motherboard, and remove the drive cable from your server (FIGURE 5-7). ...

Page 101: ...Chapter 5 Replacing Field-Replaceable Units 5-9 FIGURE 5-7 Location of Drive Power and Data Connectors on the Motherboard (callouts: power connector, data connector J5002, data connector J5003) ...

Page 102: ...ive to get a clear view of the data/power connector on the lower drive. If you have a single drive assembly, plug the DRIVE 0 connector into the data/power connector at the rear of the drive. If you have a dual drive assembly, make the following connections to the two drives: Plug the DRIVE 0 connector into the data/power connector on the lower drive. Plug the DRIVE 1 connector into the data/power conne...

Page 103: ... so that the cover hangs over the rear of the server by about an inch (2.5 cm). 13. Slide the cover forward until it latches into place. 14. Reinstall the server in the rack and apply power to the server. Refer to the SPARC Enterprise T1000 Server Service Manual for those instructions. 15. Label the hard drives, if necessary. If you installed a single drive (3.5-inch SATA drive) assembly, then your hard drive s...

Page 104: ...hard drive. The procedures that you perform at this point depend on how your data is configured. You might need to partition the drive, create file systems, load data from backups, or have the data updated from a RAID configuration. 5.5 Replacing a Hard Drive. To remove a hard drive from a single drive assembly, go to Section 5.5.1, Replacing a Hard Drive in a Single Drive Assembly, on page 5-12. To remove a...

Page 105: ...teners up on the rear of the single drive assembly and remove the assembly from the chassis (FIGURE 5-9). FIGURE 5-9 Removing the Single Drive Assembly 5.5.1.2 Installing the Hard Drive in a Single Drive Assembly 1. Unpack the replacement single drive assembly. 2. Slide the single drive assembly into the chassis until it mates with the front of the chassis (FIGURE 5-10). ...

Page 106: ...e cable installed in your system, connect the DRIVE 0 connector on the cable to the data/power connector at the rear of the drive. Do not connect the DRIVE 1 connector on the cable to the data/power connector at the rear of the drive in a single drive assembly. 6. Perform the procedures described in Chapter 6. 7. Perform the necessary administrative tasks to reconfigure the hard drive. The procedures tha...

Page 107: ...ing a Hard Drive in a Dual Drive Assembly 1. Perform the procedures described in Chapter 4. 2. Disconnect the drive cable from the data and power connectors on the motherboard (FIGURE 5-11). FIGURE 5-11 Location of Drive Power and Data Connectors on the Motherboard (callouts: power connector, data connector J5002, data connector J5003) ...

Page 108: ...lly the boot drive. 5. Remove the drive from the drive bracket. If you are removing the lower drive, you must first remove the upper drive. To remove the upper drive (drive 1): a. Disconnect the drive cable from the data/power connector on the upper drive. b. Push the drive toward the back of the drive bracket and lift the drive away from the bracket. To remove the lower ...

Page 109: ...data/power connector on the lower drive. Make sure the connector is correctly oriented before plugging it into the data/power connector on the drive. To replace the upper drive (drive 1): a. Install the replacement drive in the upper drive slot in the drive bracket. b. Push the drive firmly toward the front of the drive bracket until the hard drive is completely seated. c. Plug the DRIVE 1 connector on the ...

Page 110: ...efer to FIGURE 5-11 for the location of the J5003 data connector. 9. Plug the data connector marked J5002 on the cable into the J5002 data connector on the motherboard (the connector closest to the power supply). Refer to FIGURE 5-11 for the location of the J5002 data connector. 10. Perform the procedures described in Chapter 6. 11. Use the Solaris format utility to label the 2.5-inch SAS hard drives. Refer t...

Page 111: ... Solaris PSH. See Section 3.4.5, Correctable Errors Detected by POST, on page 3-35. Caution – This procedure requires that you handle components that are sensitive to static discharges that can cause the component to fail. To avoid this problem, follow the antistatic practices as described in Chapter 4. 1. Perform the procedures described in Chapter 4. 2. Locate the DIMM that you want to remove. Use FIGURE 5-1...

Page 112: ...MM names in messages are displayed with the full name, such as MB/CMP0/CH0/R1/D1, but this table lists the DIMM name with the preceding MB/CMP0 omitted for clarity. 3. Note the DIMM location so that you can install the replacement DIMM in the same socket. 4. Push down on the ejector levers on each side of the DIMM until the DIMM is released. TABLE 5-1 DIMM Names and Socket Numbers (columns: Socket Number | DIMM Name)...
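The note on this page says that messages show the full DIMM name while the table omits the leading MB/CMP0 part. A minimal Python sketch of that mapping follows. The function name `short_dimm_name` is hypothetical, and the "/" separator is an assumption about how the names appear in console messages; the sketch also accepts whitespace-separated names, since copied text may have lost the separator.

```python
import re

# Assumed prefix, taken from the note: the table drops the leading "MB/CMP0".
PREFIX = ("MB", "CMP0")

def short_dimm_name(full_name: str) -> str:
    """Return the DIMM name as the table lists it, with the prefix omitted."""
    # Accept "/" or whitespace as the separator between name components.
    parts = [p for p in re.split(r"[/\s]+", full_name.strip()) if p]
    if tuple(parts[:2]) != PREFIX or len(parts) <= 2:
        raise ValueError(f"unexpected DIMM name: {full_name!r}")
    return "/".join(parts[2:])

print(short_dimm_name("MB/CMP0/CH0/R1/D1"))  # CH0/R1/D1
```

A helper like this is only a convenience for matching a name from a fault message against the table; the socket number itself still has to be read from the table.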

Page 113: ...acement DIMMs and place them on an antistatic mat. 2. Ensure that the socket ejector tabs are in the open position. 3. Line up the replacement DIMM with the connector. 4. Push the DIMM into the socket until the ejector tabs lock the DIMM in place. 5. Perform the procedures described in Chapter 6. Note – You must replace the top cover as instructed in Chapter 6 before proceeding with these instruc...

Page 114: ...ble the FRU. 8. Perform the following steps to verify the repair: a. Set the virtual keyswitch to diag so that POST will run in Service mode. sc> showfaults -v ID Time FRU Fault 0 SEP 09 11:09:26 MB/CMP0/CH0/R0/D0 Host detected fault, MSGID: SUN4V-8000-DX UUID: f92e9fbe-735e-c218-cf87-9e1720a28004 sc> showfaults -v ID Time FRU Fault 1 OCT 13 12:47:27 MB/CMP0/CH0/R0/D0 MB/CMP0/CH0/R0/D0 deemed faulty and disab...

Page 115: ...main at the ok prompt. If the system is at the ok prompt, type boot. d. Return the virtual keyswitch to normal mode. e. Issue the Solaris OS fmadm faulty command. No memory or DIMM faults should be displayed. If faults are reported, refer to the diagnostics flow chart in FIGURE 3-1 for an approach to diagnose the fault. sc> poweron sc> console 0 0 POST Passed all devices 0 0 0 0 DEMON Diagnostics Engineering ...

Page 116: ... you do not need to proceed with the following steps because the fault is cleared. 11. Run the clearfault command. 12. Switch to the system console. 13. Issue the fmadm repair command with the UUID. Use the same UUID that you used with the clearfault command. sc> showfaults -v ID Time FRU Fault 0 SEP 09 11:09:26 MB/CMP0/CH0/R0/D0 Host detected fault, MSGID: SUN4V-8000-DX UUID: f92e9fbe-735e-c218-cf87-9e1720a28...
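Steps 11 through 13 reuse one UUID for both the clearfault and fmadm repair commands, so copying it accurately matters. The Python sketch below shows one way to pull the UUID out of captured showfaults text and print the two command lines. It is an illustration under assumptions, not part of the manual: the function name `fault_uuids` is hypothetical, the sample line is modeled on the showfaults output shown on page 114, and the hyphenated 8-4-4-4-12 layout of the UUID is assumed.

```python
import re
from typing import List

# Assumed layout: "UUID:" followed by 8-4-4-4-12 hexadecimal digits.
UUID_RE = re.compile(
    r"UUID:?\s*([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)

def fault_uuids(showfaults_text: str) -> List[str]:
    """Return every UUID found in captured showfaults output, in order."""
    return UUID_RE.findall(showfaults_text)

# Sample modeled on the showfaults output on page 114 (punctuation assumed).
sample = (
    "0 SEP 09 11:09:26 MB/CMP0/CH0/R0/D0 Host detected fault, "
    "MSGID: SUN4V-8000-DX UUID: f92e9fbe-735e-c218-cf87-9e1720a28004"
)

for uuid in fault_uuids(sample):
    # The same UUID is used at the sc> prompt and then at the Solaris prompt.
    print(f"clearfault {uuid}")
    print(f"fmadm repair {uuid}")
```

The script only prints the command text for review; it does not connect to ALOM or run anything on the host.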

Page 117: ... Hard Drive, on page 5-12. 6. Remove all DIMMs from the motherboard assembly. See Section 5.6, Replacing DIMMs, on page 5-19. 7. Remove the socketed system configuration SEEPROM from the motherboard and place it on an antistatic mat. The system configuration SEEPROM contains the persistent storage for the host ID and Ethernet MAC addresses of the system, as well as the ALOM configuration, including the IP ad...

Page 118: ... on page 5-5. 4. Replace the hard drive and cable. See Section 5.5, Replacing a Hard Drive, on page 5-12. 5. Replace the memory DIMMs. See Section 5.6, Replacing DIMMs, on page 5-19. 6. Replace the socketed system configuration SEEPROM. The location of this SEEPROM is shown in Appendix A. 7. Perform the procedures described in Chapter 6. 8. Boot the system and run POST to verify that the system is fully operationa...

Page 119: ...edures described in Chapter 4. 2. Using a small flathead screwdriver, carefully pry the battery from the motherboard (FIGURE 5-15). FIGURE 5-15 Removing the Clock Battery From the Motherboard 5.8.2 Installing the Clock Battery on the Motherboard 1. Unpack the replacement battery. 2. Press the new battery into the motherboard with the + side facing upward (FIGURE 5-16). ...

Page 120: ...e Clock Battery on the Motherboard 3. Perform the procedures described in Chapter 6. 4. Use the ALOM setdate command to set the day and time. Use the setdate command before you power on the host system. For details about this command, refer to the Advanced Lights Out Management (ALOM) CMT Guide. ...

Page 121: ...des the finishing tasks in servicing your server. 6.1.1 Replacing the Top Cover 1. Place the top cover on the chassis. Set the cover down so that the cover hangs over the rear of the server by about an inch (2.5 cm). 2. Slide the cover forward until it latches into place. 6.1.2 Reinstalling the Server Chassis in the Rack 1. Refer to the SPARC Enterprise T1000 Server Installation Guide for installation inst...

Page 122: ...isconnected the power cord from the power supply, you must wait about five seconds before reconnecting the power cord to the power supply. Reconnect the power cord to the power supply. Note – As soon as the power cord is connected, standby power is applied. Depending on the configuration of the firmware, the system might boot. ...

Page 123: ...eable units (FRUs) in the server. TABLE A-1 lists the FRUs. Note that item number 4 in FIGURE A-1 is a 3.5-inch SATA drive used in the single drive configuration. The 2.5-inch SAS drives used in the dual drive configuration look different, but would be installed in the same location in the server. ...

Page 124: ...FIGURE A-1 Field-Replaceable Units (callouts 1 through 8) ...

Page 125: ...ray Assembly, on page 5-4 | A single assembly containing 4 fans | FT0. 4 | Hard drives | Section 5.5, Replacing a Hard Drive, on page 5-12 | One of the following configurations: one SATA disk drive, 3.5-inch form factor; two SAS disk drives, 2.5-inch form factor | HDD0, HDD1. 5 | Power supply unit (PS) | Section 5.3, Replacing the Power Supply, on page 5-5 | The power supply provides 3.3 Vdc standby power at 3.3 Amps and 12 Vdc ...

Page 127: ...tected faults, 3-38; clearing PSH detected faults, 3-43; clock battery, installing, 5-27; clock battery, removing, 5-27; components, disabled, 3-46, 3-47; components, displaying the state of, 3-46; connecting to ALOM CMT, 3-13; console, 3-14; console command, 3-14, 3-29; consolehistory command, 3-14. D: DDR-2 memory DIMMs, 3-7; diag_level parameter, 3-23, 3-26; diag_mode parameter, 3-23, 3-26; diag_trigger parameter, 3-23, 3-26; diag_verbosity para...

Page 128: ...17; motherboard and chassis, 5-25; PCI Express card, 5-3; power supply, 5-6; top cover, 5-11, 6-1; installing the server in the rack, 6-1. L: LEDs, AC OK, 3-4; LEDs, Power OK, 3-4; log files, viewing, 3-45. M: memory, configuration, 3-7; memory, fault handling, 3-6; message ID, 3-40; messages file, 3-44; motherboard and chassis, installing, 5-25; motherboard and chassis, removing, 5-25. P: PCI Express card, installing, 5-3; PCI Express card, removing, 5-2; POST detected faults, 3-4, 3-16; POST see...

Page 129: ...17; showfaults command, 3-4; showfaults command, description and examples, 3-16; showfaults command, syntax, 3-15; showfaults command, troubleshooting with, 3-5; showfru command, 3-15, 3-19; showkeyswitch command, 3-15; showlocator command, 3-15; showlogs command, 3-15; showplatform command, 3-15; Solaris log files, 3-4; Solaris OS, collecting diagnostic information from, 3-44; Solaris Predictive Self-Healing (PSH) detected faults, 3-4; SunVTS, 3-2, 3-4; SunVTS, exercising the system with, 3-49; r...
