In an earlier blog entry on how we evaluated various virtualization technologies, I also wrote about using Linux Software RAID (md) instead of FAKERAID/HOSTRAID.
RAID, as most know it, is a technology that groups multiple physical hard drives into one logical drive for redundancy, performance, or both. But the technology is not without caveats.
I found a good blog entry that explains in quite a bit of detail why one should stay away from FAKERAID/HOSTRAID and use Software RAID instead. If you need to build a high performance system, stick to a good hardware RAID vendor. Here are two takeaway pointers from the blog entry above:
- “The FRAID driver is both an interface and, more importantly, the RAID logic.” What this means is that without a proper vendor-supplied driver (the Linux HOSTRAID driver is not one), you will likely be running a RAID array without even knowing whether it works. In our tests, we unplugged a disk and then attempted to rebuild the array. With the Linux HOSTRAID driver, we were unable to find out whether the rebuild was actually happening.
- “15MBps! Welcome back to i486-era Programmed I/O (PIO)! … You have disks today of 50-80+MBps, and you can’t even break old Programmed I/O (PIO) Mode 4 or Mode 5 …” In summary, don’t run software RAID if performance matters. If you do run software RAID, stick to mirroring (RAID-1), striping (RAID-0), or a combination of both (RAID-10). Don’t run software RAID 5: because of the way RAID 5 works, a degraded array has to reconstruct missing data from parity, which greatly impacts performance.
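With Linux Software RAID (md), by contrast, the state of a rebuild is visible in /proc/mdstat. Here is a minimal sketch of a helper that summarises that file. The function name `md_status` is my own, and the parsing assumes the usual mdstat layout (an `mdN :` line, a `blocks ... [UU]` line, and an optional `recovery =` or `resync =` progress line), so treat it as an illustration rather than a robust monitoring tool.

```shell
#!/bin/sh
# Summarise md array health from an mdstat-format file.
# $1 = file to read (defaults to /proc/mdstat). md_status is a hypothetical name.
md_status() {
    mdstat="${1:-/proc/mdstat}"
    awk '
        # Remember which array the following lines belong to.
        /^md[0-9]+ :/ { array = $1 }

        # Status line, e.g. "104320 blocks [2/2] [UU]".
        # An underscore in the last field marks a missing member.
        /blocks/ && /\[[U_]+\]/ {
            state = ($NF ~ /_/) ? "DEGRADED" : "ok"
            print array ": " state " " $NF
        }

        # Progress line, e.g. "[==>...]  recovery = 12.6% (...) finish=25.3min ..."
        /recovery =|resync =/ {
            print array ": " $2 " " $4 " " $6
        }
    ' "$mdstat"
}
```

On a live system, calling `md_status` with no argument reads /proc/mdstat directly. Running `mdadm --detail` against the array device (for example /dev/md0, a placeholder name here) should report the same state in more detail.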
LVM is another great technology but, just like RAID, it is not without caveats.
Ever attempted recovery of a filesystem on a disk with LVM using another system that’s also running LVM? What about cloning a disk with LVM from one system to another?
In both cases, I’ve bumped into numerous problems trying to get LVM to work. You’ll have volume group name conflicts (VolGroup00 is the default in RHEL/CentOS and probably some other distros) and booting problems, with your Linux kernel complaining about not being able to mount /dev/root. The latter can be fixed by running mkinitrd, but this is not common knowledge among most level 1/2 sysadmins.
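As a rough sketch of how both problems can be approached from a rescue environment: a foreign volume group can be renamed by its UUID so that it no longer clashes with the host's own VolGroup00, and the initrd can be rebuilt for a given kernel version. The helper names (`run`, `rename_foreign_vg`, `rebuild_initrd`) and the `DRY_RUN` switch are my own inventions, and the initrd path assumes the RHEL/CentOS naming scheme. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 to really run them (as root, ideally from a rescue system).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Rename a foreign volume group so it stops clashing with the host VG.
# $1 = UUID of the foreign VG (look it up with: vgs -o vg_name,vg_uuid)
# $2 = new, unique VG name
rename_foreign_vg() {
    run vgrename "$1" "$2"
    run vgchange -ay "$2"    # activate the renamed VG so its LVs can be mounted
}

# Rebuild the initrd for one kernel version (RHEL/CentOS naming assumed).
# $1 = kernel version string, as printed by uname -r
rebuild_initrd() {
    run mkinitrd -f "/boot/initrd-$1.img" "$1"
}
```

For example, `rename_foreign_vg <vg-uuid> VolGroup00_rescue` prints the `vgrename` and `vgchange` commands it would run. If the renamed volume group holds the root filesystem of the cloned system, references to the old name in /etc/fstab and grub.conf would presumably need updating as well before rebuilding the initrd.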
Evaluate carefully if you need LVM. Don’t use it if you are sure that your server will almost never need more disk space. If you run a growing storage server, consider running Solaris 10 with ZFS instead.
Lastly, I found out that RHEL/CentOS sets up GRUB incorrectly on systems running software RAID. This was discovered during a DR test. Try this: remove the first mirrored disk from the server and attempt to boot. The secondary disk will fail to boot the O/S and you will see a GRUB message saying “GRUB Hard Disk Error”. The disk isn’t corrupted; it’s just that GRUB can’t find it.
Most people keep a rescue CD handy and reinstall GRUB during such an event, but here’s an immediate fix to reduce your potential downtime:
- Edit /boot/grub/device.map and make both disks appear as (hd0), i.e.
  (hd0) /dev/sda
  (hd0) /dev/sdb
- (Re)install GRUB on your second disk, i.e.
  # grub
  grub> device (hd0) /dev/sdb
  grub> root (hd0,0)
  grub> setup (hd0)
Source: http://grub.enbug.org/MirroringRAID
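If you would rather script the GRUB step than type it into the interactive shell, the same commands can be fed to GRUB (legacy) in batch mode. This is only a sketch: the function names `grub_commands` and `install_grub_on` are hypothetical, and it assumes, like the steps above, that /boot lives on the first partition of the disk.

```shell
#!/bin/sh
# Print the GRUB (legacy) shell commands that install the boot loader on
# the given disk while presenting that disk to GRUB as (hd0).
# $1 = disk device, e.g. /dev/sdb
grub_commands() {
    printf 'device (hd0) %s\n' "$1"
    printf 'root (hd0,0)\n'      # assumes /boot is the first partition
    printf 'setup (hd0)\n'
    printf 'quit\n'
}

# Feed those commands to grub non-interactively. Must be run as root.
install_grub_on() {
    grub_commands "$1" | grub --batch --no-floppy
}
```

Running `grub_commands /dev/sdb` on its own is a harmless way to review what will be sent before calling `install_grub_on /dev/sdb`.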
Justin Lee is a freelance Web 2.0 and Systems Consultant for Securlogic Singapore and currently works closely with core ISP engineering teams in Singapore during his day job.