Replacing a physical disk in the Exadata compute node

April 26, 2023

A quick blog post showing how to replace a failed physical disk in an Exadata compute node.

Check the disk enclosure:

[root@cmx1db02 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -encinfo -a0
Number of enclosures on adapter 0 -- 1

Enclosure 0:
Device ID : 252
Number of Slots : 8
Number of Power Supplies : 0
Number of Fans : 0
Number of Temperature Sensors : 0
Number of Alarms : 0
Number of SIM Modules : 1
Number of Physical Drives : 4
Status : Normal
Position : 0
Connector Name : Unavailable
Enclosure type : SGPIO
FRU Part Number : N/A
Enclosure Serial Number : N/A
ESM Serial Number : N/A
Enclosure Zoning Mode : N/A
Partner Device Id : Unavailable

Inquiry data :
Vendor Identification : LSI
Product Identification : SGPIO
Product Revision Level : N/A
Vendor Specific :

Exit Code: 0x00

Get the enclosure ID:

[root@cmx1db02 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -encinfo -a0 | grep ID
Device ID : 252
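
If you want to reuse the enclosure ID in the later commands, it can be captured in a shell variable. This is only a minimal sketch: the function name enclosure_id is my own and not part of MegaCli, and it assumes the adapter has a single enclosure, as in the output above.

```shell
# Hypothetical helper (not part of MegaCli): reads `-encinfo` output on
# stdin and prints the Device ID of the first enclosure it finds.
enclosure_id() {
    awk -F':' '/Device ID/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Example, to be run on the compute node:
#   ENC=$(/opt/MegaRAID/MegaCli/MegaCli64 -encinfo -a0 | enclosure_id)
#   echo "$ENC"
```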

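Look for the failed disk. In the output below, slot 3 reports a Predictive Failure Count of 5 and a firmware state of Unconfigured(bad):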

[root@cmx1db02 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|predictive|firmware"
Slot Number: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: A2A8
Slot Number: 1
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: A690
Slot Number: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: A2A8
Slot Number: 3
Predictive Failure Count: 5
Last Predictive Failure Event Seq Number: 82935
Firmware state: Unconfigured(bad)
Device Firmware Level: 0E71
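
With more disks it is easy to miss the bad one by eye, so the same output can be filtered down to the suspect slots. The sketch below is an assumption of mine, not an Oracle or LSI tool: the function name suspect_slots is hypothetical, and it treats any slot with a non-zero Predictive Failure Count or a firmware state that does not start with "Online" as worth a look. That also flags hot spares and disks in Rebuild or Copyback, so read the result as a list to review, not a verdict.

```shell
# Hypothetical helper: reads `-pdlist` output on stdin and prints the slot
# numbers that have predictive failures or are not in an Online state.
suspect_slots() {
    awk -F': *' '
        /^Slot Number/              { slot = $2 }
        /^Predictive Failure Count/ { if ($2 + 0 > 0) bad[slot] = 1 }
        /^Firmware state/           { if ($2 !~ /^Online/) bad[slot] = 1 }
        END { for (s in bad) print s }
    ' | sort -n
}

# Example, to be run on the compute node:
#   /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | suspect_slots
```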

If the LED on the failed disk is not lit, turn on the locate LED so the disk can be identified:

[root@cmx1db02 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[252:3] -a0
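
The locate LED keeps blinking until it is told to stop, and MegaCli accepts -stop in place of -start for that. Below is a small dry-run sketch that only prints the command for a given enclosure and slot, so you can check it before running it. The function name locate_cmd is hypothetical, and the values 252 and 3 are the enclosure and failed slot from this post.

```shell
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64

# Hypothetical helper: prints (does not run) the PdLocate command.
# Arguments: start|stop ENCLOSURE SLOT
locate_cmd() {
    case "$1" in
        start|stop) ;;
        *) echo "usage: locate_cmd start|stop ENCLOSURE SLOT" >&2; return 1 ;;
    esac
    printf '%s -PdLocate -%s -physdrv[%s:%s] -a0\n' "$MEGACLI" "$1" "$2" "$3"
}

# Example: print the command that turns the LED off again for slot 3
#   locate_cmd stop 252 3
```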

Check the disk status after the disk has been physically replaced:

[root@cmx1db02 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|firmware|target|state|predictive"
Slot Number: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: A2A8
Foreign State: None
Slot Number: 1
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: A690
Foreign State: None
Slot Number: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: A2A8
Foreign State: None
Slot Number: 3
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Copyback
Device Firmware Level: 0E71
Foreign State: None

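If a disk rebuild is in progress, you can monitor it with the command below. Here the replaced disk is in Copyback state rather than Rebuild, so the command reports that no rebuild is running: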

[root@cmx1db02 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv [252:3] -a0
Device(Encl-252 Slot-3) is not in rebuild process

Exit Code: 0x00
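
Which progress command to use depends on the firmware state of the disk: -pdrbld for a disk in Rebuild and -pdcpybk for a disk in Copyback. A minimal sketch of that choice follows. The function name progress_flag is hypothetical, and it only knows the two states that appear in this post.

```shell
# Hypothetical helper: maps a "Firmware state" value to the MegaCli
# option used to show progress for that state.
progress_flag() {
    case "$1" in
        Rebuild*)  echo "-pdrbld" ;;
        Copyback*) echo "-pdcpybk" ;;
        *)         return 1 ;;   # nothing to monitor for other states
    esac
}

# Example, to be run on the compute node for slot 3:
#   flag=$(progress_flag "Copyback") &&
#       /opt/MegaRAID/MegaCli/MegaCli64 "$flag" -showprog -physdrv [252:3] -a0
```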

Check the full details of the replaced disk:

[root@cmx1db02 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -physdrv[252:3] -a0
Enclosure Device ID: 252
Slot Number: 3
Enclosure position: 0
Device Id: 14
WWN: 5000C50031DD8F4C
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Logical Sector Size: 512
Physical Sector Size: 512
Firmware state: Copyback
Device Firmware Level: 0E71
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c50031dd8f4d
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST930003SSUN300G0E711050725AMD
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature :24C (75.20 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No

Exit Code: 0x00

The copyback is now in progress, and you can monitor it until it completes:

[root@cmx1db02 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -pdcpybk -showprog -physdrv [252:3] -a0
Copyback Progress on Device at Enclosure 252, Slot 3 Completed 48% in 34 Minutes.

Exit Code: 0x00
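
Instead of rerunning the command by hand, the percentage can be pulled out of that line and polled. This is a rough sketch under a few assumptions of mine: the function names are hypothetical, the progress line keeps the "Completed NN% in MM Minutes" format shown above, and the time estimate assumes the copyback runs at a constant rate, which it may not.

```shell
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64

# Hypothetical helper: extracts the percentage from -showprog output on stdin.
copyback_percent() {
    sed -n 's/.*Completed \([0-9][0-9]*\)%.*/\1/p' | head -n 1
}

# Rough estimate of minutes left, assuming a constant rate.
# Arguments: PERCENT_DONE ELAPSED_MINUTES
remaining_minutes() {
    [ "$1" -gt 0 ] || return 1
    echo $(( $2 * (100 - $1) / $1 ))
}

# Polls until no progress percentage is reported any more.
# Arguments: ENCLOSURE SLOT [INTERVAL_SECONDS]
wait_for_copyback() {
    while pct=$("$MEGACLI" -pdcpybk -showprog -physdrv "[$1:$2]" -a0 | copyback_percent)
          [ -n "$pct" ]; do
        echo "copyback at ${pct}%"
        sleep "${3:-60}"
    done
}

# Example, to be run on the compute node:
#   wait_for_copyback 252 3 300
```

For the output above, 48% done after 34 minutes gives roughly 36 minutes remaining (34 * 52 / 48, rounded down).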

That is all for today, folks.

See you on my next post.

Franky