Datacenter and Infrastructure Failure

Datacenter and infrastructure failure is a broad category that covers a variety of causes of email
downtime. The most common type of infrastructure failure was a power outage. Even in organizations
that employ backup generators, a brief power blip was enough to bring down the Exchange server and
interrupt email. Typically, however, power outages and other infrastructure failures, such as
water main breaks, caused extensive email downtime: over 75 hours on average, far longer than most
organizations would find tolerable to be without email.

Exchange and Active Directory Failures

The risk of downtime associated with Microsoft Active Directory (AD) corruption was readily
apparent in our research, accounting for roughly 23% of all email outages. Several customers
experienced significant system-wide downtime as the result of Active Directory-related corruption.
In each instance, Exchange-specific attributes or data were corrupted in a manner that disrupted
communications. In several of these cases, identification, repair, and recovery resulted in outages
exceeding 48 hours. Software-related email failure represented 14% of customer outages and was most
commonly the result of configuration errors or software corruption. Other common causes involved a
mixture of technological failure and human error: faulty software patches, failed upgrade efforts, and
failure from out-of-date drives.

Hardware Failure

A wide array of hardware and server-related failures contributed to customer outages. From
catastrophic drive failure to bad RAM (Random Access Memory), over a quarter of customer
outages could be traced to hardware failure. In many of these scenarios, customers had already
taken steps to mitigate hardware-related issues by building highly redundant servers, including dual
backplanes and redundant RAID (Redundant Array of Independent Disks) controllers.

Several items of note:

• Branch office messaging servers were often not as fully redundant as their datacenter counterparts.

• New or recently upgraded hardware was more commonly the source of server-related outages.

• Server sizing issues contributed in several cases to performance degradation or outages.

Internet and Connectivity Failures

Internet and connectivity failures, which included LAN (Local Area Network) and WAN (Wide Area
Network) outages, accounted for 11% of the email outages. These connectivity failures prevent users
from accessing an otherwise functioning server. Causes of connectivity loss include hub, switch, or
router failure, as well as cable or fiber that is broken or damaged by accidents such as
construction work (a backhoe severing cables) or by human error during moves or maintenance.
In one instance, construction down the street from a surveyed company resulted in the loss of both
primary and secondary WAN connections through two separate providers. Infrastructure failures can
also occur due to datacenter power outages and causes as mundane as termite infestation.

Storage/Database Corruption

Potential downtime due to database corruption is a well-known hazard for mail administrators. With the
typical customer having 0.75 TB or more of messaging data, this downtime can be significant. Most
of our customers have complex storage systems, so operation and maintenance are a challenge. Storage
and database corruption includes outages caused by SAN device failure taking out local data stores
(technical error), as well as SAN configuration errors (human error) causing data loss windows. When
storage systems failed, companies were faced with retention and policy compliance issues and were
forced to undergo costly and time-consuming recovery operations from backup tapes.