Sometimes you are just trying too hard. In motor racing this is called overdriving the car: trying so hard to go faster that you overdo it and go slower, because you brake so late that you miss the corners, hit the gas so early that you get wheelspin out of the corners, et cetera.
In IT there is the same phenomenon, although it is not as obvious as in motor racing. There is a lot of great technology, and every piece of it has its fans, people who love it and say it is the best thing that ever happened. However, you have to be careful, because technology has its limits. Especially with PCs, if you push the limit too far something will break and you get into trouble. When you get into trouble with PCs, you do not just get into trouble, you get into mayhem.
You probably guessed it by now: I am not writing this as a theory, I am writing it from experience, recent experience. Due to hard drive space requirements I am running three drives, so when I added two of them about a month ago I figured it was a nice opportunity to run them in a RAID configuration. The Intel chipset on my motherboard allows me to run multiple levels of RAID on the same set of hard drives, enabling me to run a small portion as RAID 1 for data security, and the rest as RAID 0 for high speed and a lot of space. Of course RAID 0 means a higher chance of lost data, as one broken drive means everything on that RAID volume is lost. However, I figured that the chance of a hard drive breaking down is so small that twice that is still acceptable, to me that is.
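The "twice that" estimate can be checked with a little arithmetic. Here is a minimal sketch, assuming drives fail independently and using a made-up 3% yearly failure chance per drive (an illustrative figure, not a measurement from my drives). The function name is hypothetical as well.

```python
def raid0_loss_probability(p_drive: float, n_drives: int) -> float:
    """Chance that at least one of n independent drives fails,
    which for RAID 0 means the whole volume is lost."""
    return 1.0 - (1.0 - p_drive) ** n_drives


# Assumed yearly failure probability per drive (illustrative only).
P_DRIVE = 0.03

for n in (1, 2, 3):
    print(f"{n} drive(s): {raid0_loss_probability(P_DRIVE, n):.2%}")
```

For small per-drive probabilities the result for two drives is very close to double the single-drive figure, so the rule of thumb holds. It says nothing about how painful the recovery is, though.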
Well, I landed in a heap of trouble. First of all, I moved my Windows install over from a regular drive to the RAID volume. Of course Windows cannot handle this change (why would it, reinstalling it is fun!). It got slow and crappy, so I had to go through a reinstall, even though the previous one was only a couple of months old. Luckily I had skipped the reinstall when buying my new video card, otherwise it would have been the second one in a few weeks' time.
That is not so bad, right? Indeed, it is not. However, I have a poster of Murphy's Law on the wall behind my PC for a reason. You guessed it, one of the drives failed on me after only a few weeks. However, my luck is my luck, so I could recover most of the data. Most. Out of the 800 gigabytes of data on the volume I was able to save over 799 gigabytes. Only one file did not check out completely, just one file. Not bad, right? Well, of course it had to be a file which I cannot easily replace, because its source is not easy to access. It could not be one of the 500 MB game demos on that disk, or a 50 KB script downloaded years ago. No, my luck dictates that it had to be one of the files I will have a lot of trouble replacing.
RAID 0 was a bad idea, though I would have had the same problem if it had been a single disk. The real problem was recovering the data. To be able to check the disk's SMART data I needed to take it out of the RAID array. To be able to do that I had to back up all data on the RAID array, as the array is destroyed when a drive is taken out of it. As I did not have that much hard drive space available (it is why I bought those two drives) I had to buy another hard drive and store everything on that.
When I had done that I checked the drive's SMART data, which signalled a high read error rate. A thorough test using a tool from the drive manufacturer (which lasted two hours) revealed two bad sectors and a dozen more LBA errors. Conclusion: the drive is most likely broken. The tool suggests a low-level format, which I will perform tonight. A new test will have to reveal whether that fixes the problem. Both the format and the new test will last for hours, rendering my PC useless in the meantime.
The problem was not really worsened by the fact that it was a RAID array, as the bad sectors would have destroyed some data in a single drive configuration as well. However, the array did make resolving the problem much, much more difficult. If I had been running three separate drives I would have had enough space on two to temporarily store all data from the third, which would have saved me from purchasing another drive. Meanwhile the Windows install from a few weeks ago had to be moved back to a single drive configuration, making it extremely slow and unsuitable for anything but browsing the internet, chatting on IRC and listening to MP3 music. I will have to spend a lot of time reinstalling Windows, (most likely) RMA-ing the broken drive, redistributing my data across three drives and getting everything up and running again. I know better ways of spending free time, to be honest.
All in all I can only conclude that the more advanced technology I was using (RAID) not only created the need for reinstalls (a lot of inconvenience), it also made resolving a problem (bad sectors on a hard drive) a lot more difficult. In other words: by using RAID in the configuration I was running, I was trying to outsmart technology. I failed. I learned my lesson though. In my new configuration I will be running software RAID to protect a small set of important data from drive failure, and RAID 0 for game installs only, where data loss really does not matter.
I thought about protecting all my data using RAID 5, but the high price tag combined with the write speed penalty made me decide against it. RAID 10 has good performance but is even more expensive. Maybe in a year or so I will reconsider this. Even though software RAID has a big disadvantage (you cannot run your operating system off it) I chose it for one important reason: if your RAID controller breaks down (the motherboard in my case) you will still be able to access the array on another PC. You cannot do that using firmware RAID. It would not be the first broken motherboard I had…
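To put a number on "expensive", here is a rough sketch of how much usable space each RAID level leaves from a set of identical drives. The 500 GB drive size and the function name are assumptions for illustration only, and the formulas are the textbook ones for equal-sized drives, ignoring controller overhead.

```python
def usable_capacity_gb(level: str, n_drives: int, drive_gb: float) -> float:
    """Usable space for common RAID levels built from identical drives."""
    if level == "0":
        if n_drives < 2:
            raise ValueError("RAID 0 needs at least 2 drives")
        return n_drives * drive_gb          # striping, no redundancy
    if level == "1":
        if n_drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_gb                     # every drive holds the same copy
    if level == "5":
        if n_drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (n_drives - 1) * drive_gb    # one drive's worth goes to parity
    if level == "10":
        if n_drives < 4 or n_drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        return (n_drives // 2) * drive_gb   # striped set of mirrored pairs
    raise ValueError(f"unsupported RAID level: {level}")


DRIVE_GB = 500.0  # assumed drive size, for illustration only

for level, n in (("0", 2), ("1", 2), ("5", 3), ("10", 4)):
    print(f"RAID {level} with {n} drives: {usable_capacity_gb(level, n, DRIVE_GB):.0f} GB usable")
```

With these assumed drives, getting the same usable space as a two-drive RAID 0 takes three drives in RAID 5 and four in RAID 10, which is where the price difference comes from.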
Bottom line: do not try to outsmart technology. It will send Murphy after you, and he will hunt you down and punish you as easily as taking candy from a baby.
2 comments so far...
raid… only if you got a lot of cash
Nice article. Thanks.
Eugene