Everything, Everything

Failed Redundancy/Dead RAID5
Friday 11th July, 2008 13:18 Comments: 5
I recently rebuilt one of my test servers and imported four disks that were set up as software RAID5. It didn't go too well, which I thought was my fault for trying to import them while a Windows Defender update was being installed via Windows Update (Disk Management kept hanging until I eventually managed to cancel the update's installation). I was left with two imported drives, two foreign drives and two missing drives. Obviously, the two foreign drives were the missing drives, and I already had a copy of dmpss.exe to fix things. I updated the group ID of the two foreign drives to match the new group ID of the successfully imported drives and rebooted the system. It came back up and started resyncing (the content should be fine). All was well until I suddenly lost a disk. Thinking it might have been due to using dmpss.exe, I rebooted the box and managed to get the missing drive detected, and the array started resyncing again. And then the disk screwed up again.

To rule out the 8-port controller card (which, to be honest, I trusted completely), I hooked the four drives up to the motherboard and moved the main hard disk onto the controller card (the OS already has drivers for the card, so it could still boot into Windows okay). Then I saw this:

[Image: Computer Says BAD]

This isn't what you want to see. I don't know yet whether the drive that's playing up is the BAD or the DISABLED one listed above, but I haven't touched any of the SMART settings and I wasn't aware you could even disable SMART on these WD drives. Either way, I've ordered two new disks so I can add one to the array and get some redundancy back, which should allow me to pull the second dodgy-looking disk out of the array and resync so all is well (or at least until SMART says all the drives are OK). The drives are Western Digital RE2-GP (WD1000FYPS) drives with 5-year warranties that I bought earlier this year, so I'll RMA them once I'm done. Then I'll keep one as a cold spare just in case any of the drives go again.

Wish me luck. Once all that's done I'll move the second array across and hope that all those disks are fine.

EDIT: To go with the comment below, here's a pic of the 3rd Western Digital disk. RAID5 doesn't like it when 2 disks are missing and a third is dying. I've only lost about a terabyte of non-essential data, but it's not exactly pleasant.

[Image: SMART says BAD]
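As a rough illustration of why RAID5 copes with one missing disk but not two, here is a minimal Python sketch of XOR parity over a single stripe. It is a simplified model, not how Windows dynamic disks lay out data on disk; the helper name `xor_blocks` and the block contents are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across a four-disk RAID5 set: three data blocks plus one parity block.
# The contents are illustrative only; they are not taken from the real array.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One disk missing: XOR of the surviving blocks and the parity rebuilds the lost block.
rebuilt = xor_blocks([data[0], data[2], parity])
recovered_one = rebuilt == data[1]

# Two disks missing: the survivors only give the XOR of the two lost blocks,
# which is not enough to work out what each block contained on its own.
combined = xor_blocks([data[2], parity])
two_lost_ambiguous = combined == xor_blocks([data[0], data[1]])
```

With a single block gone, `rebuilt` matches the lost data. With two gone, all that can be computed is their combined XOR, which is why the array cannot be resynced from what is left.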
Robert - Wednesday 16th July, 2008 19:22
It's gone from bad to worse. The first disk is definitely dead. The second disk is effectively dead (it will occasionally get detected, but then stops working). I noticed that a third drive was acting a bit slow after a cold boot, something I was able to confirm when I hooked it up to my main machine. The SMART info, according to the wonderful free (for personal use) bit of software Active@ Hard Disk Monitor, suggests that the reallocated sector count (in amber, see above) and reallocation counts are looking bad on this disk, the power-off retract count is almost 14x higher than on the "OK" drive, and the load/unload cycle count is almost 3x higher (the other WD drive was plugged into the same 8-port controller at the same time). This means that only one of the four Western Digital disks I purchased at the start of this year is still showing as "OK" according to SMART. Thankfully the disks come with a 5-year warranty, but this is terrible and makes me question how reliable the two new disks I purchased, and any disks I get back from WD (if I perform an RMA), will be.
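The "14x" and "3x" figures above come from comparing raw SMART counters between a suspect drive and a known-good one. A minimal Python sketch of that kind of comparison follows. The raw values, the attribute key names, the helper `ratio_report` and the 2x flagging threshold are all hypothetical, chosen only to give ratios of a similar size; they are not the real readings from these disks.

```python
def ratio_report(suspect, baseline):
    """Return {attribute: suspect / baseline} for attributes present in both drives."""
    return {
        name: suspect[name] / baseline[name]
        for name in suspect
        if name in baseline and baseline[name] > 0
    }


# Hypothetical raw SMART values (NOT the actual readings from the post).
ok_drive = {"power_off_retract_count": 10, "load_cycle_count": 20000}
suspect_drive = {"power_off_retract_count": 138, "load_cycle_count": 58000}

ratios = ratio_report(suspect_drive, ok_drive)

# Flag anything at least twice the baseline; the 2x cut-off is an arbitrary assumption.
flagged = sorted(name for name, ratio in ratios.items() if ratio >= 2.0)
```

Ratios like these only mean something when both drives are the same model and have seen similar use, as was the case here with drives bought together and run on the same controller.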
Fab - Thursday 17th July, 2008 11:00
Are you asking yourself the right question? Three faulty disks in one array sounds a bit more than just coincidence. Either WD quality control is really bad or something happened in your machine to 'break' those hard disks. I would explore both possibilities if I were you...

I do have a fair bit of stuff (some copied from you) if you want to copy?
Robert - Thursday 17th July, 2008 11:22
The second array appears to be fine, and those disks are plugged into an identical 8-port controller. One of the four disks doesn't appear to exhibit any problems, which makes me think that it probably is Western Digital's fault, as a fault with the controller/driver should have affected all four drives in the same way. There is probably a problem with this particular batch (or perhaps a larger problem that affects this model?). I don't believe there is anything wrong with the new test server (the motherboard, RAM, CPU and HSF came from my main machine after the recent upgrade to a quad-core CPU, and the main hard disk is one of the spare 250GB disks that came from the array that the 1TB drives replaced).

I'll probably let you know what I'm still after shortly before the next LAN party :)
Fab - Thursday 17th July, 2008 11:35
Not sure if I am going to that yet. There are still the logistical issues, and I have a date clash as well. With a bit of cunning planning I might be able to overcome some of these, working on it...
Robert - Friday 15th May, 2009 13:43
A very popular thread appears to have started at the end of last year: http://www.silentpcreview.com/forums/viewtopic.php?t=51401

From there I've discovered WD made a tool available in January 2009 for three specific models, including the WD1000FYPS-01ZKB0 that I had: http://support.wdc.com/product/download.asp?groupid=609&sid=113&lang=en

It's a shame it's taken them about a year to admit there's something wrong and make a tool available. It seems to have mostly affected Linux users, but I've been running these disks on a Windows platform.
© Robert Nicholls 2002-2024
The views and opinions expressed on this site do not represent the views of my employer.