[ale] RAID 5 - do I just trash the array & start over?

Keith Hopkins hne at hopnet.net
Sat Apr 9 20:40:53 EDT 2005


Hi Greg,

   After recently recovering from a multiple-drive failure on a RAID 5 setup, I think I can say 'it's (probably) not toast'.

   First, see which drive you are getting the resets on.  If you are getting resets on multiple drives, do some isolation troubleshooting and consider replacing the cables or even the controller.  Load up an unused drive, and use ddrescue to copy as much data as possible from the resetting drive to the unused drive.  Now, swap your 'fresh' drive with the one which was resetting, being sure to remove the 'resetting' drive completely from the system, and try to `assemble` the array again.  There is no point in trying to recover your system on flaky hardware.
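
   The copy-and-reassemble steps might look roughly like the sketch below.  It assumes GNU ddrescue and mdadm, which may not match the tools actually in use here.  Every device name except /dev/md0 (/dev/sdX, /dev/sdY, the member partitions) and the map file name are placeholders I made up for illustration.  By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: device names below are placeholders, not taken from this thread.
# Assumes GNU ddrescue and mdadm are installed.

BAD=${BAD:-/dev/sdX}          # hypothetical: the drive that keeps resetting
NEW=${NEW:-/dev/sdY}          # hypothetical: the unused replacement drive
MAP=${MAP:-rescue.map}        # ddrescue map file, lets an interrupted copy resume
ARRAY=${ARRAY:-/dev/md0}      # the RAID 5 array
MEMBERS=${MEMBERS:-/dev/sdA1 /dev/sdB1 /dev/sdY1}   # hypothetical member list
DRY_RUN=${DRY_RUN:-1}         # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

rescue_copy() {
    # First pass: grab everything that reads cleanly, skipping the slow
    # work on bad areas (-n); -f allows writing to a block device.
    run ddrescue -f -n "$BAD" "$NEW" "$MAP"
    # Second pass: return to the bad areas, retrying them up to 3 times.
    run ddrescue -f -r3 "$BAD" "$NEW" "$MAP"
}

reassemble() {
    # Run this only after the resetting drive has been physically removed.
    # $MEMBERS is deliberately unquoted so it splits into separate arguments.
    run mdadm --assemble "$ARRAY" $MEMBERS
}
```

   Source the file, call `rescue_copy`, power down and swap the drives, then call `reassemble`.  Check the printed commands carefully before setting DRY_RUN=0, since ddrescue will overwrite whatever is on the target drive.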

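   MD status is in /proc/mdstat.  Just `cat` it; it shows each array's state and, while a resync or rebuild is running, a progress line with the percentage done and an estimated finish time.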
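
   If the raw output is hard to read, a small filter along these lines can pull out the interesting bits.  This is a rough sketch that assumes the usual 2.6-era /proc/mdstat layout; the function name md_summary is my own invention.

```shell
#!/bin/sh
# Summarise an mdstat-format file: array state, degraded member map,
# and any resync/recovery progress line.
# Reads the live /proc/mdstat unless a saved copy is given as $1.
md_summary() {
    f=${1:-/proc/mdstat}
    if [ ! -r "$f" ]; then
        echo "cannot read $f" >&2
        return 1
    fi
    awk '
        # Array header line, e.g. "md0 : active raid5 ..."
        /^md[0-9]+ *:/ { dev = $1; print dev ": state=" $3 }
        # Member map such as [UUU_]; an underscore marks a missing member
        /\[[U_]*_[U_]*\]/ { print dev ": degraded, member map " $NF }
        # Progress line, present only while a resync or rebuild is running
        /(resync|recovery) *=/ { sub(/^[ \t]+/, ""); print dev ": " $0 }
    ' "$f"
}
```

   Call `md_summary` with no argument to read the live file.  If no progress line shows up for an array, no resync is running on it, however long you wait.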

--Keith


Gregory C. Johnsom wrote:
> Hello world,
> 
> Cutting to the chase, I have a RAID 5 array created with a "missing" 
> drive and a RAID0 assemblage as placeholders for where the data source 
> drives will go.
> 
> I was not able to set up the RAID atomically, and since starting the 
> process suffered a fried PS.  On the new box, I set up the RAID 
> using the script I originally developed (creatively named "bootRaid") 
> and tried to finish the process.  It did not go well, and dmesg started 
> showing a lot of channel resets. (If memory serves).  I dropped the box, 
> and upon reboot md0 (the big RAID5) refused to start.  I've waited 2-3 
> days for the sync to complete, and it has not.  Just before the last 
> operation the array showed several hundred gig of data on it, so it's 
> worth salvaging if I can.
> 
> During this time, I've tried to find anything resembling current 
> guidance on the md drivers and recovery thereof.  I have yet to find 
> anything that will indicate whether a re-sync is actually in progress 
> and if so where it stands.
> 
> The system has no OS yet, so I'm running Knoppix 3.7 with the 2.6 kernel for the 
> EVMS (LVM2?) support.
> 
> Is a dirty, degraded raid5 array toast?
> If yes, I assume this is why most controllers emphasize RAID 01/10.  I 
> favor 01 in this scenario.  What think you?
> 
> If no, how can I get this thing back up and avoid wasting days on "high 
> availability" again?
> 
> Thanks,
> -Greg
> 
