Redundancy? We don’t need no stinkin’ redundancy!

We recently experienced a hard drive failure on one of our critical Linux servers. The server stopped responding on a weekend, of course. (Why do failures inevitably occur outside the hours that I'm normally in the office?) Since the server has two drives and was staged by the application vendor, I just assumed it was set up as a RAID1 mirror, and that at worst I would have to remove the failed drive and reboot to get it back up and running in a degraded state. It turned out the drives were set up as a RAID0 volume, which stripes data across both drives with no redundancy. When the single drive failed, it took the whole volume down.

I was eventually able to get the drive back online by reseating it and resetting the RAID adapter. I called vendor support to ask why the volume was set up with no redundancy and the answer was, "Our staging group doesn't configure servers that way. We always set them up with redundant RAID volumes."

"Well, thanks for the info, but I have a server here with a RAID0 volume that was provided by your staging group," I said.

"Sorry, there must be some mistake. See, we don't set servers up that way."

"(Sigh) OK, thanks for your time." Weekend support was obviously not going to be any help.

I went through all the other servers that were part of that system and had been staged at the same time. All but one were configured as RAID0. We had also received three additional servers that were staged later. Those were configured with RAID1 redundancy rather than RAID0.

We called our vendor rep Monday morning and explained that the industry standard is to set up RAID volumes with redundancy, since hard drives tend to fail on occasion. He initiated an investigation into the problem and eventually admitted that there was a period during which all the Linux servers they shipped were configured as RAID0 rather than RAID1, but said the issue had since been resolved. (The guy in their staging group who was setting them up that way was probably promoted to a manager or something, and the new guy knew more about industry standards.)

We asked them to provide us with a plan for transitioning all the affected servers to RAID1, but have not received a response yet. I suspect we will have to do it ourselves. Sigh. These vendors don't seem to have any contact with reality at times.

