Nov 27
Sooner or later hard drives fail…
That’s why we use RAID arrays. The best solution is hardware RAID, which relies on a dedicated processor on the controller board. I do not count the cheap RAID solutions integrated on motherboards (so-called Fake RAID) in that category.
Unfortunately, real RAID controllers are sometimes too pricey, and this is where software RAID comes to the rescue.
The good news is that software RAID is included in most recent operating systems. Linux is no exception: its md driver is really well optimized and is even recommended over Fake RAID for better performance.
…
Now we are protected in case of failure, but RAID 5 (and a two-disk RAID 1) protects the data against only one drive failure. A second failure before the first drive is replaced can mean lost data, so it is better to replace the failed drive as soon as possible. On the other hand, you may not want to stop the machine right now.
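To make the "one drive failure" limit concrete, here is a minimal shell sketch that works out how many more failures an array can survive, given its RAID level and the total/working member counts that /proc/mdstat shows as [total/working]. The function name failures_left is hypothetical (it is not part of mdadm), and the sketch only models the RAID 1 and RAID 5 cases discussed here.

```shell
#!/bin/sh
# Hypothetical helper: how many more drive failures an md array can survive.
# Models only RAID 1 (every member is a full mirror) and RAID 5 (one member's
# worth of parity).
# Arguments: level (1 or 5), total members, working members.
failures_left() {
    level=$1 total=$2 working=$3
    case $level in
        1) tolerance=$((total - 1)) ;;   # data survives while one mirror works
        5) tolerance=1 ;;                # parity can rebuild one missing member
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
    missing=$((total - working))
    left=$((tolerance - missing))
    # A negative value would mean the array has already lost data; report 0.
    if [ "$left" -lt 0 ]; then
        left=0
    fi
    echo "$left"
}
```

For the RAID 5 array in the sample output further down ([4/3]), failures_left 5 4 3 prints 0: there is no redundancy left until the failed drive is replaced.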
NOTE: If you have IDE hard drives, do not use the following procedure. IDE drives are NOT HOT-SWAPPABLE, and removing one may cause MORE DAMAGE.
This also applies to ordinary SATA and SCSI drives.
If you have hot-swappable drives (SCA or similar), you can replace the drive while the machine is running.
If you are not sure, check the documentation that came with your hardware.
And now, after all these precautions, let’s start:
Determine the failed drive
To check which array and which drive have a problem, simply type:
cat /proc/mdstat
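Here is a sample output:
md2 : active raid5 sdd2[4](F) sda2[0] sdc1[2] sdb2[1]
106221312 blocks level 5, 256k chunk, algorithm 2 [4/3] [UUU_]
In this case the problem is sdd: the (F) flag marks sdd2 as failed, [4/3] shows that only 3 of the array's 4 devices are working, and the underscore in [UUU_] marks the missing member.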
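On a machine with several arrays it can be handy to pick the failed members out of /proc/mdstat automatically. A minimal sketch, assuming the mdstat layout shown above; failed_members is a hypothetical helper name, and printing one "array member" pair per line is just this sketch's choice of output.

```shell
#!/bin/sh
# Hypothetical helper: list members marked (F) (faulty) in /proc/mdstat-style
# text read from standard input, one "<array> <member>" pair per line.
failed_members() {
    awk '
        # Array lines look like: "md2 : active raid5 sdd2[4](F) sda2[0] ..."
        $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    dev = $i
                    sub(/\[.*$/, "", dev)   # strip the "[4](F)" suffix
                    print $1, dev
                }
            }
        }
    '
}
```

On a live system you would run failed_members < /proc/mdstat; for the sample above it prints md2 sdd2.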
Check drive size and type
For the size, type:
fdisk -l
and look for sdd in the output.
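If your kernel exposes sysfs (2.6 and later, mounted on /sys), the size can also be read directly, which is convenient when comparing the old drive with its replacement. A minimal sketch: disk_size and sectors_to_human are hypothetical helper names, the file /sys/block/<dev>/size counts 512-byte sectors, and the GB figure uses decimal gigabytes as drive vendors do.

```shell
#!/bin/sh
# Hypothetical helper: convert a count of 512-byte sectors to bytes and
# decimal GB (integer division, so the GB value is rounded down).
sectors_to_human() {
    bytes=$(( $1 * 512 ))
    echo "$bytes bytes ($(( bytes / 1000000000 )) GB)"
}

# Hypothetical helper: report the size of a disk such as "sdd" via sysfs,
# where /sys/block/<dev>/size is always expressed in 512-byte sectors.
disk_size() {
    sectors_to_human "$(cat "/sys/block/$1/size")"
}
```

Running disk_size sdd prints the size of sdd in the same form, for example a drive with 156301488 sectors is reported as 80026361856 bytes (80 GB).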
To check the exact drive model, type:
dmesg | less
and again look at the lines just before "SCSI device sdd:".
The next step is to obtain a replacement drive (ideally the same model).
Dump the partition table from the drive, if it is still readable:
sfdisk -d /dev/sdd > partitions.sdd
Remove the drive to be replaced from the array:
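mdadm /dev/md2 -r /dev/sdd2
(mdadm only removes members that are marked as failed or spare. Here sdd2 is already marked (F); if it were not, you would first mark it as failed with mdadm /dev/md2 -f /dev/sdd2.)
Look up the Host, Channel, ID and Lun of the drive to replace (1, 0, 3 and 0 in this example) by looking in:
cat /proc/scsi/scsi
Remove the drive from the bus: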
echo "scsi remove-single-device 1 0 3 0" > /proc/scsi/scsi
Verify that the drive has been correctly removed by looking in:
cat /proc/scsi/scsi
Physically replace the drive
Unplug the drive from your SCA bay and insert the new drive.
Add the new drive to the bus:
echo "scsi add-single-device 1 0 3 0" > /proc/scsi/scsi
(this should spin up the drive as well)
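The four numbers passed to the remove-single-device and add-single-device commands above are the Host, Channel, Id and Lun fields of the drive's entry in /proc/scsi/scsi, whose header lines look like "Host: scsi1 Channel: 00 Id: 03 Lun: 00". A minimal sketch that extracts them; scsi_hcil is a hypothetical helper name, and it only prints the numbers, it never writes to /proc. Note that /proc/scsi/scsi does not name the sdX device, so you still have to match the Vendor/Model lines against what dmesg reported for sdd.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/scsi/scsi-style text on stdin and print
# "H C I L" for every "Host: scsiN Channel: CC Id: II Lun: LL" header line.
scsi_hcil() {
    awk '$1 == "Host:" {
        host = $2
        sub(/^scsi/, "", host)   # "scsi1" -> "1"
        # Adding 0 turns "00"/"03" into plain decimal numbers without
        # leading zeros.
        print host + 0, $4 + 0, $6 + 0, $8 + 0
    }'
}
```

For example, scsi_hcil < /proc/scsi/scsi lists one line per device; the line for the drive at Host 1, Channel 0, Id 3, Lun 0 reads 1 0 3 0, exactly the arguments used in the echo commands. Stripping the leading zeros is a precaution so the values look like the ones in this walkthrough.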
Recreate the layout
Re-partition the drive using the previously dumped partition table:
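sfdisk /dev/sdd < partitions.sdd
If the failed drive was unreadable, you need to create the new partitions by hand at this point; each one must be at least as large as the partition it replaces in the array.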
Add the drive to your array
mdadm /dev/md2 -a /dev/sdd2
You can check whether the operation was successful, and follow the rebuild, by issuing
cat /proc/mdstat
Inspired, with modifications, from