Why LuxSci Enterprise Class Servers Stay Up when Hardware Fails

Published: January 13th, 2014

LuxSci offers two classes of dedicated and shared servers: Business Class and Enterprise Class.  The most notable difference between them is reliability.  Enterprise Class services (both dedicated and shared) will keep running even if the underlying server hardware fails.  How does that work?

Your Typical Server

Your typical server these days is a virtual private server (i.e. Cloud Server).  With virtual servers, many servers share the same powerful physical machine … each using a “slice” of its CPU, RAM, and other resources.  This is very efficient from a cost and performance perspective.

Your typical server from a Public Cloud provider, a Virtual Private Server provider, etc., is simply that … a slice of the RAM, CPU, and disk space on a single physical machine.

So, what happens when a CPU fails, or a RAM chip goes bad, or the motherboard has a short, etc.?  All of the virtual servers that share that physical machine immediately crash and remain offline until the server provider can diagnose and repair the hardware issue and then reboot them all.

So, with any of these typical servers, including LuxSci “Business Class” dedicated and shared services, a hardware failure can mean downtime for emergency maintenance.  This is familiar to most people, as the same thing would happen if your server were not “virtual” and you had the entire physical server to yourself: if it develops certain hardware issues, it too will be down until they are resolved.

In short, a typical physical or virtual server is susceptible to unexpected downtime due to hardware failure.  While this kind of downtime may be infrequent, it does happen, and any services you rely on that run on that server will be unavailable until the hardware issue is resolved.

LuxSci Enterprise Class Server

LuxSci understands that some customers place a high value on service reliability and cannot tolerate service downtime.  Other customers do not like the chance of downtime, but are willing to accept it instead of paying a premium to protect against it.

To make LuxSci’s Enterprise Class shared services and Enterprise Class dedicated servers as reliable as possible, we use a different architecture that protects them from downtime due to hardware failure:

  1. Multiple powerful underlying physical servers run the virtual servers.
  2. All of the disk space used is stored in a separate, directly attached, fast disk array.
  3. The disk array is attached to each physical server with redundant fiber optic connections.
  4. Special software running on this “cluster” of servers detects a hardware failure on one physical server and automatically and immediately restarts the affected virtual servers on the other ones.
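
The failover behavior in step 4 can be sketched in a few lines of Python.  This is only an illustration of the idea, not LuxSci's actual clustering software: the `Host` class, the `fail_over` function, the host and virtual server names, the RAM figures, and the "most free RAM first" placement rule are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A hypothetical physical server in the cluster."""
    name: str
    ram_gb: int                               # total RAM on the physical server
    healthy: bool = True
    vms: dict = field(default_factory=dict)   # virtual server name -> RAM in GB

    def free_ram(self) -> int:
        return self.ram_gb - sum(self.vms.values())


def fail_over(cluster: list, failed: Host) -> dict:
    """Restart every virtual server from `failed` on a surviving host.

    Returns a mapping of virtual server name -> name of its new host.
    Disk images are assumed to live on the shared disk array, so only
    the running virtual server has to move, not its data.
    """
    failed.healthy = False
    placements = {}
    # Place the largest virtual servers first so small ones do not crowd them out.
    for vm, ram in sorted(failed.vms.items(), key=lambda kv: -kv[1]):
        survivors = [h for h in cluster if h.healthy and h.free_ram() >= ram]
        if not survivors:
            raise RuntimeError(f"no spare capacity to restart {vm}")
        # Assumed placement rule: pick the surviving host with the most free RAM.
        target = max(survivors, key=Host.free_ram)
        target.vms[vm] = ram
        placements[vm] = target.name
    failed.vms.clear()
    return placements


# Made-up cluster: three hosts, each with spare capacity.
cluster = [
    Host("host-a", 64, vms={"mail-1": 16, "web-1": 8}),
    Host("host-b", 64, vms={"mail-2": 16}),
    Host("host-c", 64, vms={"web-2": 8}),
]
print(fail_over(cluster, cluster[0]))
```

The sketch only works because every host can reach the same storage and because the surviving hosts have unused capacity; if they did not, `fail_over` would raise an error instead of restarting the virtual server.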

What does this mean?

In the case where we need to perform proactive software or hardware maintenance on a physical machine, we can move the virtual servers running on it to other physical machines with no downtime and no impact on the customers.  It takes just the “press of a button”.

In the case where one of the physical servers has failed, all of the affected virtual servers are immediately restarted on a different physical server.  This results in only a minute or so of downtime as the server is rebooted.  The problem physical server itself can then be repaired without the time involved impacting any actual customers.

With this premium behavior, the worst case scenario that can be caused by underlying hardware failure is a minute or so of downtime as your services are automatically restarted on a different machine.  Most people will not even notice.

A Classic Case of “You Get What You Pay For”

In order to protect against hardware failure, the Enterprise Class dedicated and shared servers are provisioned on a more complex and expensive hardware infrastructure that includes:

  1. Clustered underlying physical servers with extra unused capacity so that they can run the virtual servers in the event that one of the physical servers is down or off
  2. Special external disk arrays that support very fast access from all of the physical servers in the cluster (as opposed to using inexpensive local storage)
  3. Very fast and reliable enterprise grade hard drives
  4. Several extra “hot spare” hard drives
  5. Use of small RAID arrays to maximize the speed of RAID rebuilds and to minimize the chance of multiple hard drive failures, though at the cost of needing more drives
  6. Redundant network and storage interfaces to prevent issues due to hardware failure at that level
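
To see why smaller RAID arrays reduce the chance of multiple drive failures (item 5 above), here is a rough back-of-the-envelope calculation in Python.  It assumes that each remaining drive in an array fails independently with the same probability while a rebuild is in progress; the function name `p_second_failure`, the array sizes, and the value of `P_DRIVE` are hypothetical and are not LuxSci figures.

```python
def p_second_failure(drives_in_array: int, p_drive: float) -> float:
    """Chance that at least one surviving drive fails during a rebuild.

    Assumes each of the remaining drives fails independently with
    probability `p_drive` during the rebuild window.
    """
    survivors = drives_in_array - 1
    return 1.0 - (1.0 - p_drive) ** survivors


# Hypothetical per-drive failure probability during one rebuild window.
P_DRIVE = 0.001

for n in (4, 8, 16):
    print(f"{n:2d}-drive array: {p_second_failure(n, P_DRIVE):.4%}")
```

Under these assumptions the risk grows with the number of drives in the array, which is the trade-off described above: more small arrays need more drives in total, but each array is less exposed while it rebuilds.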

As a result, Enterprise Class servers and disk space cost more than our Business Class servers.  Customers can decide for themselves how important ultra-reliability is and whether they are willing to pay extra for it, or whether cost matters more and they are happy to run all of their LuxSci services on a less expensive, less reliable server.

 
