Nov 27
Sooner or later hard drives fail…
That’s why we use RAID arrays. The best solution is hardware RAID, where a specialized processor on the controller board does the work. I do not count the cheap so-called "Fake RAID" solutions integrated on motherboards in that category.
Unfortunately, real RAID controllers are sometimes too pricey, and this is where software RAID helps.
The good news is that software RAID is included in most recent operating systems. Linux is no exception: its software RAID (the md driver, managed with mdadm) is well optimized and is even recommended over Fake RAID for better performance.
…
Now we are protected in case of failure, but RAID 1 and RAID 5 protect the data against the failure of only one drive, and a second failure before the array is rebuilt can lose data. So it is better to replace the failed drive as soon as possible; on the other hand, you usually do not want to stop the machine right now.
NOTE: If you have IDE hard drives, do not use the following procedure. IDE drives are NOT HOT-SWAPPABLE, and removing one may cause MORE DAMAGE.
The same applies to ordinary SATA and SCSI drives.
If you have hot-swappable drives (SCA or similar), you can replace the drive while the machine is running.
If you are not sure, check the documentation that came with your hardware.
And now, after all these precautions, let’s start:
Determine the failed drive
To check which array and which drive have a problem, simply type:
cat /proc/mdstat
Here is a sample output:
md2 : active raid5 sdd2[4](F) sda2[0] sdc1[2] sdb2[1]
106221312 blocks level 5, 256k chunk, algorithm 2 [4/3] [UUU_]
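In this case the problem is sdd: its partition sdd2 is marked (F) (failed), and [4/3] [UUU_] shows that only three of the four array members are up.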
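With several arrays, picking the (F) entries out of /proc/mdstat by eye gets tedious. A minimal sketch, assuming the mdstat layout shown above (array lines of the form "mdN : ..."), is a small awk filter; md_failed_members is a hypothetical helper name, not part of mdadm:

```shell
#!/bin/sh
# List array members that /proc/mdstat marks as failed with "(F)".
# Reads mdstat-formatted text on stdin and prints "<array> <member>" per line.
md_failed_members() {
    awk '
        # Array lines look like: "md2 : active raid5 sdd2[4](F) sda2[0] ..."
        $1 ~ /^md[0-9]+$/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    member = $i
                    sub(/\[.*$/, "", member)   # "sdd2[4](F)" -> "sdd2"
                    print $1, member
                }
            }
        }
    '
}
```

For the sample output above, `cat /proc/mdstat | md_failed_members` prints `md2 sdd2`.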
Check drive size and type
To check the size, type:
fdisk -l
and look for /dev/sdd in the output.
To check the exact drive model, type:
dmesg | less
and look at the lines just before the "SCSI device sdd:" message, where the kernel logs the drive's vendor and model.
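On 2.6 and later kernels the same information can also be read from sysfs, which avoids scrolling through fdisk and dmesg output. The sketch below assumes /sys/block/<dev>/size (always counted in 512-byte sectors) and, for disks handled by the SCSI layer, /sys/block/<dev>/device/model; drive_info and the SYSFS_ROOT override are hypothetical names used for illustration and testing:

```shell
#!/bin/sh
# Print "<dev> <bytes> bytes <model>" for a disk, read from sysfs.
# SYSFS_ROOT is a hypothetical override (default /sys) so the function can be
# pointed at a fake tree for testing.
drive_info() {
    sysfs=${SYSFS_ROOT:-/sys}
    # /sys/block/<dev>/size is always counted in 512-byte sectors.
    sectors=$(cat "$sysfs/block/$1/size") || return 1
    # The model string is space-padded; strip trailing whitespace.
    model=$(sed 's/[[:space:]]*$//' "$sysfs/block/$1/device/model" 2>/dev/null)
    echo "$1 $((sectors * 512)) bytes ${model:-unknown}"
}
```

Running `drive_info sdd` on the old drive and later on the new one makes it easy to confirm that the replacement is at least as large.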
The next step is to obtain a replacement drive (ideally the same model, and in any case at least as large as the failed one, so the old partitions fit on it).
Dump the partition table from the drive, if it is still readable:
sfdisk -d /dev/sdd > partitions.sdd
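Remove the drive to replace from the array. If other partitions of sdd are members of other arrays, mark them as failed first (mdadm -f) and remove them the same way, because mdadm only removes failed or spare members: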
mdadm /dev/md2 -r /dev/sdd2
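Before pulling the drive, every partition of it has to leave its arrays. A hedged sketch that only prints the commands, so they can be reviewed before running them: it scans /proc/mdstat text for members named like the disk followed by a partition number and emits one mdadm manage-mode line (--fail, then --remove) per member. md_detach_commands is a hypothetical helper, and whole-disk members (plain sdd) are deliberately not matched:

```shell
#!/bin/sh
# Print (but do not run) mdadm commands that fail and remove every partition
# of the given disk (e.g. "sdd") from every array in mdstat text on stdin.
md_detach_commands() {
    awk -v disk="$1" '
        $1 ~ /^md[0-9]+$/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                member = $i
                sub(/\[.*$/, "", member)   # "sdd2[4](F)" -> "sdd2"
                # Only partitions of this disk: the disk name followed by digits.
                if (member ~ ("^" disk "[0-9]+$"))
                    printf "mdadm /dev/%s --fail /dev/%s --remove /dev/%s\n", $1, member, member
            }
        }
    '
}
```

For the sample output above, `cat /proc/mdstat | md_detach_commands sdd` prints `mdadm /dev/md2 --fail /dev/sdd2 --remove /dev/sdd2`.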
Look up the Host, Channel, ID and Lun of the drive to replace by looking in /proc/scsi/scsi:
cat /proc/scsi/scsi
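/proc/scsi/scsi lists Host, Channel, Id and Lun but not the sdX name, so matching an entry to sdd can be guesswork. On 2.6+ kernels with sysfs, /sys/block/sdd/device is a symlink whose last path component is that address (for example 1:0:3:0). A minimal sketch, where scsi_hctl and SYSFS_ROOT are hypothetical names:

```shell
#!/bin/sh
# Print "Host Channel Id Lun" for a disk name such as sdd, in the order used by
# "scsi remove-single-device" and "scsi add-single-device".
# SYSFS_ROOT is a hypothetical override (default /sys) used for testing.
scsi_hctl() {
    link="${SYSFS_ROOT:-/sys}/block/$1/device"
    # The device must still be attached, otherwise the symlink is gone.
    [ -e "$link" ] || return 1
    # The symlink resolves to a directory named H:C:T:L, e.g. ".../1:0:3:0".
    basename "$(readlink -f "$link")" | tr ':' ' '
}
```

Look the address up before removing the drive, since the sysfs entry disappears afterwards, e.g. `echo "scsi remove-single-device $(scsi_hctl sdd)" > /proc/scsi/scsi`.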
Remove the drive from the bus (the four numbers are Host, Channel, Id and Lun; 1 0 3 0 is only an example):
echo "scsi remove-single-device 1 0 3 0" > /proc/scsi/scsi
Verify that the drive has been correctly removed by looking in /proc/scsi/scsi again:
cat /proc/scsi/scsi
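When scripting the procedure, this check can be automated instead of reading the list by eye. A sketch that parses entries of the form "Host: scsi1 Channel: 00 Id: 03 Lun: 00" from stdin and succeeds only if the given address is still listed; scsi_listed is a hypothetical helper:

```shell
#!/bin/sh
# Succeed if /proc/scsi/scsi-formatted text on stdin still lists the device
# with the given Host, Channel, Id and Lun numbers (e.g. 1 0 3 0).
# Entries look like: "Host: scsi1 Channel: 00 Id: 03 Lun: 00".
scsi_listed() {
    awk -v h="$1" -v c="$2" -v t="$3" -v l="$4" '
        $1 == "Host:" {
            host = $2
            sub(/^scsi/, "", host)
            # Compare numerically so that "03" matches 3.
            if (host + 0 == h + 0 && $4 + 0 == c + 0 && $6 + 0 == t + 0 && $8 + 0 == l + 0)
                found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For example, `scsi_listed 1 0 3 0 < /proc/scsi/scsi && echo "drive still attached"`.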
Physically replace the drive
Unplug the drive from its SCA bay and insert the new drive.
Add the new drive to the bus:
echo "scsi add-single-device 1 0 3 0" > /proc/scsi/scsi
(this should spin up the drive as well)
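On a udev-managed /dev the kernel and udev may need a moment before the device node appears, and the new drive is not guaranteed to get the old name back, so it is worth checking dmesg as well. A small polling sketch; wait_for_path is a hypothetical helper and the 30-second default timeout is an arbitrary assumption:

```shell
#!/bin/sh
# Poll once per second until a path exists or the timeout (seconds) expires.
# Returns 0 if the path appeared, 1 on timeout.
wait_for_path() {
    path=$1
    remaining=${2:-30}
    while [ ! -e "$path" ]; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

For example, `wait_for_path /dev/sdd 30 || echo "sdd did not appear, check dmesg"`.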
Recreate the layout
Re-partition the drive using the previously dumped partition table:
sfdisk /dev/sdd < partitions.sdd
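If the failed drive was unreadable, you need to create the new partitions by hand at this point, each at least as large as the corresponding partitions on the remaining array members.
Add the drive to your array: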
mdadm /dev/md2 -a /dev/sdd2
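After the drive is added, md rebuilds onto it in the background; while that runs, /proc/mdstat shows a recovery line with the progress and the status field still contains "_". A minimal sketch that summarizes this per array, assuming the mdstat layout shown earlier; md_rebuild_status is a hypothetical helper, and the exact spacing of the recovery line may vary between kernel versions:

```shell
#!/bin/sh
# Summarize mdstat-formatted text on stdin, one line per array:
#   "<array> ok", "<array> degraded", or "<array> degraded <pct>%" while rebuilding.
md_rebuild_status() {
    awk '
        function flush() {
            if (arr != "") {
                out = arr " " (degraded ? "degraded" : "ok")
                if (pct != "") out = out " " pct
                print out
            }
        }
        # A new array starts with a line like "md2 : active raid5 ..."
        $1 ~ /^md[0-9]+$/ && $2 == ":" { flush(); arr = $1; degraded = 0; pct = ""; next }
        # Status field such as "[UUU_]": an underscore marks a missing member.
        arr != "" && $NF ~ /^\[[U_]+]$/ { if ($NF ~ /_/) degraded = 1 }
        # Progress line such as "[>....]  recovery =  0.4% (...)"
        arr != "" && /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++) if ($i ~ /%$/) { pct = $i; break }
        }
        END { flush() }
    '
}
```

Running `cat /proc/mdstat | md_rebuild_status` reports the array as degraded with a percentage during the rebuild and as `md2 ok` once all members show U again.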
You can check whether the operation was successful, and follow the rebuild, by issuing:
cat /proc/mdstat
Inspired with modifications from
