So here I am once again, sitting in a meeting room at an ISP with some 4 or 5 executives, sales guys, product guys and a bunch of other engineers, discussing how we should set up high availability (HA). The entire project’s network was drawn up on a whiteboard and three engineers stood by the sides. They were pointing at every link and node in the diagram and asking, “Is this HA?”
You’d be surprised how long these discussions can take. This one went on for about two hours before one dude finally broke the tension and stood up. We thought he was headed for the washroom, but instead he said, “What’s the point of all this? With all these costs, will the service even sell?”
He hit the nail on the head. The room was quiet for a moment.
I’ve been providing freelance systems and network consulting for about 10 years now. Most of my consulting in the early part of my career was for SMEs. After the meeting ended, I sat down and tried to recall whether HA had ever been mentioned as a requirement when I was consulting for SMEs. I concluded that either I was getting old and forgetful (not true!) or it was never mentioned.
At the other end of the spectrum, ISPs are getting hammered by government regulations to maintain service uptime, and so every project I’ve worked on with ISPs has had HA as a default requirement.
You won’t believe how often people put so much effort into designing HA that they forget the basic requirement was to keep the service running, not the device. The final aim? To reduce risks that translate into business losses. While having a spare of every device is nice to have, HA is really all about (truckloads of) money and balancing the returns.
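To make the money-versus-returns trade-off concrete, here is a back-of-the-envelope sketch in Python. Every figure in it (outage hours, revenue lost per hour, the yearly cost of the HA design) is a made-up assumption for illustration, not data from any real project, and the function names are my own.

```python
def expected_annual_loss(outage_hours_per_year: float, loss_per_hour: float) -> float:
    """Expected yearly business loss caused by downtime."""
    return outage_hours_per_year * loss_per_hour


def ha_net_benefit(loss_without_ha: float, loss_with_ha: float,
                   ha_cost_per_year: float) -> float:
    """Positive: HA pays for itself. Negative: it costs more than it saves."""
    return (loss_without_ha - loss_with_ha) - ha_cost_per_year


# Hypothetical numbers for a small business service.
without_ha = expected_annual_loss(outage_hours_per_year=20, loss_per_hour=500)
with_ha = expected_annual_loss(outage_hours_per_year=2, loss_per_hour=500)

# Two hypothetical HA designs that cut downtime equally but cost very differently.
cheap = ha_net_benefit(without_ha, with_ha, ha_cost_per_year=3_000)
gold_plated = ha_net_benefit(without_ha, with_ha, ha_cost_per_year=50_000)

print("cheap HA net benefit per year:      ", cheap)
print("gold-plated HA net benefit per year:", gold_plated)
```

With these invented numbers the cheap design comes out ahead and the gold-plated one loses money, which is exactly the question the guy in the meeting was asking.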
But SMEs shouldn’t do without HA. Here are some cheap (and probably free) HA solutions that SMEs can leverage:
- Redundant Array of Inexpensive Disks (RAID). Disks are mechanical and probably account for the largest share of component failures in computers. Since disks are so unbelievably cheap and massive these days, there’s no reason why you shouldn’t put your disks in a redundant RAID level such as RAID 1. I would expect future desktops and laptops to run RAID as well. If a RAID controller adds significant cost, use software RAID.
- Linux Ethernet Bonding. Most servers come with two network interfaces these days. To reduce the impact of a port failure (either on your switch or your server), use Ethernet bonding in active-backup (active-standby) mode.
- Redundant Power Supplies. Power supplies are the second most common component to fail, as they are exposed to environmental factors such as power surges. If a server with redundant power supplies is over budget, consider buying a spare power supply off eBay and keeping it on standby.
- Virtual Router Redundancy Protocol (VRRP). This is a great technology for keeping your network running. If your network setup permits, run the Vyatta open source router instead of the typical junk SOHO router. In fact, run two instances of it on VMware and enable VRRP. When one router goes down, VRRP automatically swings you over to the one that is still up.
- Virtualize, virtualize, virtualize. There are a ton of virtualization solutions out there. Virtualization can help you cut your recovery time tenfold. You don’t need to keep identical hardware around just to get a service back up. In fact, you can even temporarily restore an important service onto a desktop PC!
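
Most of the items above boil down to the same trick: duplicate a part so that one failure does not take the service down. Here is a rough sketch of why that works. It assumes the two parts fail independently, which is optimistic (a power surge can take out both supplies at once), and the 99% availability figure is invented purely for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def redundant_availability(single: float, copies: int = 2) -> float:
    """Availability of `copies` identical parts when any one of them is enough.

    Assumes failures are independent, which is an optimistic simplification.
    """
    return 1 - (1 - single) ** copies


def downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


single = 0.99  # hypothetical availability of one disk, port or power supply
pair = redundant_availability(single)

print(f"one part : {downtime_hours(single):.1f} hours down per year")
print(f"two parts: {downtime_hours(pair):.2f} hours down per year")
```

Under those assumptions a single part is down for roughly 88 hours a year, while a redundant pair is down for less than one hour, and that second part is often the cheapest thing on the bill.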
Justin Lee is a freelance Web 2.0 and Systems Consultant for Securlogic Singapore and currently works closely with core ISP engineering teams in Singapore during his day job.