Are you Prepared for Disaster? Business Continuity Planning for Email Outages

February 9th, 2018

Unexpected email outages happen to every email user. It is not a big deal if it is just for a few minutes or some scheduled time at night. However, if it is in the middle of a workday and employees rely on email, it may be a big problem.

What do you do if your email stays offline for five minutes, ten minutes, or an hour, and you don’t know when it is coming back?

Planning for Email Outages

There is no guarantee that such an outage won’t happen. It certainly can, and it happens all the time to companies large and small, with in-house and outsourced services of all kinds.

A lot can be done to prevent predictable problems like hardware failure, for example through redundancy and load balancing. However, there is little that can be done to prevent human error. Yes, organizations can put policies in place and train staff. But it only takes one accidental misstep at the wrong time to cause downtime immediately or to create a time bomb that surfaces unexpectedly in the future.

Additionally, external factors could affect email services:

  • Network or ISP failure causing temporary connectivity problems
  • DNS or registry failure preventing inbound email from reaching the email provider’s servers
  • Denial of Service attacks on email providers, DNS, the ISP, networks, etc.
  • Malicious staff with administrative permissions shutting down accounts or deleting things
  • Issues with external email filtering services causing email delivery problems

So, the possibility of downtime exists no matter what. The questions are:

  1. How will the different kinds of downtime impact your business?
  2. How likely is each kind of downtime to occur?
  3. What can you do now so that should any of these issues arise, you can still run your business until the issue is resolved?
  4. Can this be done cost-effectively?
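
One rough way to work through the first two questions is to estimate, for each kind of downtime, how likely it is and what it would cost, then rank the results. The sketch below is only an illustration: the scenario names, likelihoods, durations, and hourly cost are hypothetical figures, not measurements, and the `OutageScenario` class is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OutageScenario:
    name: str
    yearly_likelihood: float  # estimated chance of happening in a year (0 to 1)
    hours_down: float         # estimated duration if it does happen
    cost_per_hour: float      # estimated business cost per hour without email

    def expected_yearly_cost(self) -> float:
        # Likelihood times impact: a simple expected-cost estimate.
        return self.yearly_likelihood * self.hours_down * self.cost_per_hour


def rank_scenarios(scenarios: List[OutageScenario]) -> List[OutageScenario]:
    """Sort scenarios from highest to lowest expected yearly cost."""
    return sorted(scenarios, key=lambda s: s.expected_yearly_cost(), reverse=True)


# Hypothetical figures for illustration only.
scenarios = [
    OutageScenario("ISP failure", 0.5, 2, 500),
    OutageScenario("DNS provider attack", 0.1, 8, 500),
    OutageScenario("Server hardware failure", 0.2, 6, 500),
]
ranked = rank_scenarios(scenarios)

for scenario in ranked:
    print(f"{scenario.name}: about {scenario.expected_yearly_cost():.0f} per year")
```

Comparing each estimate with the yearly price of a mitigation gives a first answer to the question of whether it can be done cost-effectively.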

Below we present a series of options for addressing many of these possible issues. There may be other solutions and other issues that could arise. We will try to address some of the most essential and cost-effective solutions.

1. External Email Archival

Email archival solutions save copies of all inbound and outbound emails to an externally hosted service (not in an office or on regular email servers) where:

  • The email cannot be deleted, edited, or lost
  • The email is kept for an extended period (e.g., ten years or “forever”)
  • Users can log in, search for, and download copies of email messages sent and received any time the need arises.

There are a few key points to be aware of:

  1. The archival system and access to it should not be at the office. It should also be hosted in a data center separate from the business email servers so that if either the office or the email servers are down, the email archives can still be accessed.
  2. Email should not be deletable or editable so that reliable copies are available, e.g., for legal reasons.
  3. Users should be able to log in to view their email. If everyone can log in and view their archived email, they are self-sufficient, and work can get done.

LuxSci’s Premium Email Archival service meets these criteria and is provided through our partner Sonian.

In the case of a disaster, archival provides emergency access to all old email messages. HIPAA and other regulations require this kind of system.

2. Email Message Continuity Service

When using an advanced inbound email filtering service (as most businesses do these days), inbound emails pass through special email filtering servers before being forwarded to the servers where emails are viewed and stored.

With a Message Continuity service:

  1. These inbound email servers are located in a different data center from the email service provider
  2. They can auto-detect when the email provider is offline or down (i.e., they can’t deliver new email to it)
  3. Message Continuity is automatically enabled if the email provider is down (it can also be enabled manually on demand).
  4. While Message Continuity is enabled:
    1. All inbound messages are queued/saved on the filtering servers instead of delivered.
    2. Users can log in to a Message Continuity web portal to read these new messages and reply to them/send new email messages while their email provider is down.
  5. Once the issue is over, Message Continuity is disabled and:
    1. All the queued email messages are delivered to the email provider’s servers.
    2. Copies of sent email messages are also delivered there for records.

Message Continuity services provide emergency access to new email messages and enable sending of messages and replies in the case of a disaster.
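
The queue-and-flush behavior described above can be sketched as a small state machine. This is a toy model under stated assumptions, not how any particular filtering service is implemented: `ContinuityRelay`, its `deliver` callback, and the in-memory lists are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ContinuityRelay:
    """Toy model of a filtering server that offers Message Continuity."""
    deliver: Callable[[str], bool]  # returns False when the provider is unreachable
    continuity_enabled: bool = False
    queue: List[str] = field(default_factory=list)
    sent_copies: List[str] = field(default_factory=list)

    def receive(self, message: str) -> None:
        """Handle one inbound message: deliver it, or queue it during an outage."""
        if not self.continuity_enabled and self.deliver(message):
            return
        # Delivery failed (or continuity is already on): keep the message here.
        self.continuity_enabled = True
        self.queue.append(message)

    def send_from_portal(self, message: str) -> None:
        """Keep a copy of a message that a user sent from the web portal."""
        self.sent_copies.append(message)

    def provider_recovered(self) -> int:
        """Flush queued mail and sent copies to the provider; return the count."""
        delivered = 0
        for pending in (self.queue, self.sent_copies):
            while pending and self.deliver(pending[0]):
                pending.pop(0)
                delivered += 1
        # Stay in continuity mode if anything could not be delivered yet.
        self.continuity_enabled = bool(self.queue or self.sent_copies)
        return delivered


# Simulated provider: a flag for up/down and a list standing in for the mailbox.
provider_up = {"value": True}
mailbox: List[str] = []


def fake_deliver(message: str) -> bool:
    if provider_up["value"]:
        mailbox.append(message)
        return True
    return False


relay = ContinuityRelay(fake_deliver)
relay.receive("msg-1")              # provider is up: delivered directly
provider_up["value"] = False
relay.receive("msg-2")              # provider is down: queued, continuity turns on
relay.receive("msg-3")
relay.send_from_portal("reply-to-msg-2")
queued_during_outage = list(relay.queue)
provider_up["value"] = True
flushed = relay.provider_recovered()
```

In this sketch continuity turns on after the first failed delivery. A real service would more likely wait for several failures before switching, to avoid reacting to a momentary glitch.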

LuxSci’s Premium Email Filtering service includes Message Continuity as an upgrade option and includes all these features.

3. Backup Email Account

A backup email account is an account either at a different email provider or on a different server from the current email provider. Depending on the situation, the options include:

  1. Duplicate Email Account: Have copies of all inbound email messages go to both accounts so users can switch to the backup account at any time. This works as long as the servers receiving the email and forwarding copies to the backup account are still online, and at least one of the two accounts is online. See Split Domain Routing for a detailed description. Many of our customers, including LuxSci support itself, use this as a simple, inexpensive backup mechanism to protect against single server failure.
  2. DNS MX Record Change: Another option is to have a “hot standby” account with another provider. In the case of an emergency, the DNS MX records can be changed to that provider, and new emails can be received there. The downsides of this are:
    1. Copies of old emails will not be there, and it requires manual action to make the switch.
    2. Messages ending up at the new provider may be hard to move back to the old provider after the situation is resolved. However, having this option available in conjunction with the “Duplicate Email Account” option covers various severity scenarios.

4. Reliable DNS

Every week, we see emails not working due to DNS issues (read more about DNS). These are caused by:

  • Slow DNS servers
  • Attacks on DNS servers taking them offline
  • Shoddy DNS service

Choosing a good DNS provider helps ensure that emails can be delivered even after denial of service attacks on DNS and that the service will work as advertised. We highly recommend our DNS service, which we provide through our partner EasyDNS. They use Anycast for denial of service protection. We also recommend setting up redundant DNS services to protect against DNS outages due to denial of service attacks or DNS service provider issues.
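
A simple way to notice slow or failing DNS before users do is to time a lookup of the mail hostname on a schedule. The following is a minimal sketch using only Python's standard library. It goes through whatever resolver the operating system is configured to use, so it cannot test one specific DNS provider. The hostname, the one-second threshold, and the fake resolvers used for the demonstration are assumptions.

```python
import socket
import time
from typing import Callable, Optional, Tuple


def timed_lookup(hostname: str,
                 resolve: Callable[[str], str] = socket.gethostbyname,
                 ) -> Tuple[Optional[str], float]:
    """Resolve hostname; return (address, or None on failure, and elapsed seconds)."""
    start = time.monotonic()
    try:
        address: Optional[str] = resolve(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        address = None
    return address, time.monotonic() - start


def dns_status(hostname: str,
               slow_after: float = 1.0,
               resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """Classify a lookup as 'ok', 'slow', or 'failed'."""
    address, elapsed = timed_lookup(hostname, resolve)
    if address is None:
        return "failed"
    return "slow" if elapsed > slow_after else "ok"


# Fake resolvers so the example runs without touching the network.
def fake_ok(hostname: str) -> str:
    return "192.0.2.10"  # address from the range reserved for documentation


def fake_slow(hostname: str) -> str:
    time.sleep(0.05)
    return "192.0.2.10"


def fake_down(hostname: str) -> str:
    raise socket.gaierror("simulated DNS outage")


status_ok = dns_status("mail.example.com", resolve=fake_ok)
status_slow = dns_status("mail.example.com", slow_after=0.01, resolve=fake_slow)
status_down = dns_status("mail.example.com", resolve=fake_down)
```

Calling `dns_status("mail.example.com")` without the `resolve` argument performs a real lookup through the system resolver.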

5. Data Backups

Ensure that the email provider makes backups of email data so that mail can be restored on accidental deletion and so servers can be entirely restored from backup in the case of a catastrophic server failure.

Additionally, backups should exist in two separate locations: on-site and off-site. On-site backups provide fast recovery for recent data. Off-site backups provide slower recovery for older data and protect against a catastrophic issue with the main live infrastructure that affects both your mail servers and their on-site backups.
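
Backups only help if they are actually being made, so it is worth checking the age of the newest backup in each location. Below is a minimal sketch of such a check. The one-day and seven-day limits, the location names, and the timestamps are assumptions for illustration. In practice the timestamps would come from the backup system's own reports.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_backups(last_backup: Dict[str, datetime],
                  max_age: Dict[str, timedelta],
                  now: datetime) -> List[str]:
    """Return locations whose newest backup is missing or older than allowed."""
    return sorted(
        location
        for location, limit in max_age.items()
        if location not in last_backup or now - last_backup[location] > limit
    )


# Assumed policy: on-site backups at least daily, off-site at least weekly.
max_age = {"on-site": timedelta(days=1), "off-site": timedelta(days=7)}

# Hypothetical timestamps of the most recent backup in each location.
now = datetime(2018, 2, 9, 12, 0)
last_backup = {
    "on-site": datetime(2018, 2, 9, 2, 0),    # 10 hours old: within the limit
    "off-site": datetime(2018, 1, 30, 2, 0),  # more than 10 days old: too old
}

stale = stale_backups(last_backup, max_age, now)
print("Stale backup locations:", stale)
```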

LuxSci provides daily on-site and weekly off-site backups for all accounts. On-demand restores of deleted email folders from backups are also free for all accounts. Using a dedicated server from LuxSci? Ask us about custom backup schedules, custom retention periods, and server imaging as further backup options.

6. Dedicated Servers with High Availability Configurations

Typically, inexpensive cloud servers at any provider are single (virtual) machines running on a single computer. If that hardware fails (e.g., there is a “short,” the network cable breaks, or the power supply dies), then the server and email are immediately down until the problem can be diagnosed and repaired. This can take anywhere from 30 minutes to many hours, and most service level agreements do not count this kind of downtime as an actual “problem.”

What can you do? Choose an email service with servers that have resiliency against hardware failure. There are many ways to set up high availability solutions. The core principle is that when one server fails, another email server is automatically started in its place within seconds. After that, it’s business as usual while the service provider fixes the problem server.
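
The core principle can be illustrated with a small health-check loop that promotes a standby server after several consecutive failed checks. This is a simplified sketch, not a description of any provider's high availability setup: the `FailoverMonitor` class, the server names, and the failure threshold are hypothetical, and real systems also have to handle shared storage, IP address takeover, and split-brain protection.

```python
from typing import Callable, Dict, List


class FailoverMonitor:
    """Track the active server and switch to a healthy standby when it fails."""

    def __init__(self, servers: List[str],
                 is_healthy: Callable[[str], bool],
                 max_failures: int = 3) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        self.is_healthy = is_healthy
        self.max_failures = max_failures
        self.active = self.servers[0]
        self.failures = 0

    def check(self) -> str:
        """Run one health check and return the server that should handle mail."""
        if self.is_healthy(self.active):
            self.failures = 0
            return self.active
        self.failures += 1
        if self.failures >= self.max_failures:
            # Promote the first healthy standby, if there is one.
            for candidate in self.servers:
                if candidate != self.active and self.is_healthy(candidate):
                    self.active = candidate
                    self.failures = 0
                    break
        return self.active


# Simulated health state for two hypothetical servers.
healthy: Dict[str, bool] = {"mail-a": True, "mail-b": True}
monitor = FailoverMonitor(["mail-a", "mail-b"], lambda name: healthy[name],
                          max_failures=2)

first = monitor.check()    # mail-a is healthy and stays active
healthy["mail-a"] = False  # simulate a hardware failure
second = monitor.check()   # first failed check: not yet at the threshold
third = monitor.check()    # second failed check: mail-b is promoted
```

Requiring more than one failed check before switching avoids failing over on a single dropped health probe, at the cost of a slightly longer outage when the failure is real.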

These high availability solutions can be more expensive, but they are entirely worth it when email is a business-critical channel.

Many other considerations should be addressed when developing a complete disaster recovery plan, such as a communication strategy, what to do when the office is offline, and periodically testing and reviewing the plan. If you have not thought about disaster recovery, we recommend setting aside time to conduct a risk analysis and take steps to secure sensitive data.

Contact LuxSci for a Consultation