Fixing degraded RAID array problems

I have a 4-disk enclosure in my server that I’ve set up as a pair of RAID-1 devices.  While I also do nightly backups, this set-up has meant that I haven’t needed the backups for the past several years, despite three disk failures in that time.  The problem I ran into recently is that I could not work out how to force mdadm to sync a new, good disk into an array whose remaining active disk has errors.

My biggest problem with the enclosure is how Linux numbers the drives. For some reason, counting by physical position from top to bottom, the disks come up as sda, sdc, sdb, sdd.

Before going further, here was my configuration:
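
In outline, /proc/mdstat showed the two mirrors paired like this (the block counts here are illustrative, not the real figures):

    # cat /proc/mdstat
    Personalities : [raid1]
    md0 : active raid1 sdc1[1] sda1[0]
          488383936 blocks [2/2] [UU]
    md1 : active raid1 sdd1[1] sdb1[0]
          1953382336 blocks [2/2] [UU]
    unused devices: <none>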

As you can see, they are combined in physical order, not logical Linux order. As rarely as I need to swap drives, I always forget this, so while swapping a drive out recently, I accidentally turned off the wrong disk. sdc had failed, but I counted down from the top (as I always do) and turned off sdb. I turned it back on, but alas, mdadm doesn’t appear to re-enable spares automatically (at least not on my system), so I was left with two degraded arrays: md0[sda1], md1[sdd1].
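
Getting the briefly-unplugged disk back into its mirror takes a manual step; something along these lines (whether --re-add is accepted depends on the superblock and bitmap state, so the remove-and-add fallback may be needed):

    mdadm --manage /dev/md1 --re-add /dev/sdb1
    # if --re-add is refused, clear the faulty slot and add it back as a new member:
    mdadm --manage /dev/md1 --remove /dev/sdb1
    mdadm --manage /dev/md1 --add /dev/sdb1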

I completed upgrading md0, replacing both drives with larger capacity ones; that was no problem.  For the record, the sequence for this was:

  1. Remove the bad disk:

    mdadm --manage /dev/md0 --remove /dev/sdc1

  2. Swap out the physical sdc disk with the new disk

  3. Partition sdc (I did not copy the partition table for this; I made a single partition from the whole disk; see the partitioning sketch after this list)

  4. Add the new disk:

    mdadm --manage /dev/md0 --add /dev/sdc1

  5. Wait for sync to complete (the monitoring sketch after this list shows how to watch it)

  6. Swap out sda:

    mdadm --manage /dev/md0 --fail /dev/sda1
    mdadm --manage /dev/md0 --remove /dev/sda1

  7. Swap out physical sda disk

  8. Partition the new sda with the partition table from sdc:

    sfdisk -d /dev/sdc | sfdisk /dev/sda

  9. Add the new disk back in:

    mdadm --manage /dev/md0 --add /dev/sda1

  10. Wait for sync to complete

  11. Let mdadm know that the disks are larger:

    mdadm --grow /dev/md0 --size=max

  12. Let LVM know the array is larger:

    pvresize /dev/md0
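
Two details from the list deserve a concrete sketch. For step 3, any partitioning tool works; a minimal sfdisk one-liner that makes a single whole-disk partition of type fd (Linux RAID autodetect; the type code is my assumption here) would be:

    echo ',,fd' | sfdisk /dev/sdc

And for the “wait for sync” steps, progress is visible in /proc/mdstat, and mdadm can simply block until the rebuild finishes:

    watch cat /proc/mdstat
    # or block until the resync is done:
    mdadm --wait /dev/md0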

Pretty easy and straightforward; the most painful part was waiting eight hours for each of the sync operations, but this took md0 from 500GB to 2TB with the new disks.
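
One follow-up that isn’t shown above: pvresize only grows the physical volume, so the extra space still has to be handed to a logical volume and its filesystem. A sketch with hypothetical names (vg0/data; substitute your own):

    lvextend -l +100%FREE /dev/vg0/data
    resize2fs /dev/vg0/data    # ext3/ext4; other filesystems have their own grow tools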

Now to the problem: when I re-added sdb1 to md1, it would sync for 6 hours and then, at around 90% complete, hit disk errors on sdd1.
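
When that happens, the usual places to look for the specifics are the array state and the kernel and SMART logs (smartctl is from the smartmontools package):

    cat /proc/mdstat            # rebuild progress and member state
    mdadm --detail /dev/md1     # per-device state: active, faulty, or spare
    dmesg | grep sdd            # kernel I/O errors, including the failing sector
    smartctl -a /dev/sdd        # check Current_Pending_Sector and Reallocated_Sector_Ct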

A small digression: somebody else had exactly the same problem as I did, and posted on serverfault.  If you read that page carefully, you will notice that not a single respondent actually read the details of the post.  Every response was “your drive is teh bad! No can has add bad drive back to array!”  No, our problem is that the spare is good; it’s the active sync that’s bad.

The question is: how do I replace a bad active disk with a good disk in a RAID-1 configuration?  Google was no help, and I stopped short of trying --force with mdadm.  The problem is that mdadm fails to complete a sync and kicks out the good disk when it encounters problems on the bad active disk.

My first attempts were to fix the bad sectors by forcing the drive to re-allocate them.
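
The idea, in sketch form: a drive remaps a pending (unreadable) sector from its spare pool the next time that sector is written, so writing directly to the failing LBA forces the reallocation. The LBA below is a hypothetical stand-in for whatever the kernel’s error messages report, and the write destroys the 512 bytes at that address (the next resync rewrites them anyway):

    hdparm --read-sector 123456789 /dev/sdd     # confirm the sector really is unreadable
    hdparm --write-sector 123456789 --yes-i-know-what-i-am-doing /dev/sdd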