Understanding High Availability and Site Resilience

Applies to: Exchange Server 2010 SP3, Exchange Server 2010 SP2

Topic Last Modified: 2012-01-24

Mailbox databases and the data they contain are one of the most critical components (perhaps the most critical component) of any Exchange organization. In Microsoft Exchange Server 2010, you can protect mailbox databases and the data they contain by configuring your mailbox databases for high availability and site resilience. Exchange 2010 reduces the cost and complexity of deploying a highly available and site resilient messaging solution while providing higher levels of end-to-end availability and supporting large mailboxes. Building on the native replication capabilities introduced in Exchange Server 2007, the new high availability architecture in Exchange 2010 provides a simplified, unified framework for high availability and site resilience. Exchange 2010 integrates high availability into the core architecture of Exchange, enabling customers of all sizes and in all segments to economically deploy a messaging continuity service in their organization.

Looking for management tasks related to high availability and site resilience? Check out Managing High Availability and Site Resilience.

Contents

Key Terminology

Key Characteristics of the Exchange Server 2010 Solution

Database Mobility

Incremental Deployment

Database Availability Groups

Mailbox Database Copies

Active Manager

Changes to High Availability from Previous Versions of Exchange

High Availability for Non-Mailbox Server Roles

Site Resilience

End-to-End Availability

Key Terminology

The following terms apply:

Address Book service: A service on the Client Access server that provides a directory access endpoint for Microsoft Outlook clients.

Continuous replication - block mode: A new form of continuous replication in SP1 whereby as each update is written to the active database copy's active log buffer, it's also shipped to a log buffer on each of the passive mailbox copies. When the log buffer is full, each database copy builds, inspects and creates the next log file in the generation sequence.

Continuous replication - file mode: The name for the original form of continuous replication in the release to manufacturing (RTM) version of Exchange 2010, whereby closed transaction log files are pushed from the active database copy to one or more passive database copies.

Database availability group (DAG): A group of up to 16 Exchange 2010 Mailbox servers that hosts a set of replicated databases.

Database mobility: The ability of a single Exchange 2010 mailbox database to be replicated to and mounted on other Exchange 2010 Mailbox servers.

Datacenter: An Active Directory site.

Disaster recovery: Any process used to manually recover from a failure. This can be a failure that affects a single item, or it can be a failure that affects an entire physical location.

Exchange third-party replication API: An Exchange-provided API that enables use of third-party synchronous replication for a database availability group instead of continuous replication.

High availability: A solution that provides service availability, data availability, and automatic recovery from failures that affect the service or data (such as a network, storage, or server failure).

Incremental deployment: The ability to deploy high availability and site resilience after Exchange 2010 is installed.

Lagged mailbox database copy: A passive mailbox database copy that has a log replay lag time greater than zero.

Mailbox database copy: A mailbox database (.edb file and logs), which is either active or passive.

Mailbox resiliency: The name of a unified high availability and site resilience solution in Exchange 2010.

RPC Client Access service: A service on the Client Access server that provides a MAPI endpoint for Microsoft Outlook clients.

Site resilience: A manual disaster recovery process used to activate an alternate or standby datacenter when the primary datacenter is no longer able to provide a sufficient level of service to meet the needs of the organization. Also includes the process of re-activating a primary datacenter that has been recovered, restored or recreated. You can configure your messaging solution for high availability and enable site resilience using the built-in features and functionality in Exchange 2010.

Shadow redundancy: A transport server feature that provides redundancy for messages for the entire time they are in transit.

*over (pronounced "star over"): Short for switchovers and failovers. A switchover is a manual activation of one or more database copies. A failover is an automatic activation of one or more database copies after a failure.

Key Characteristics of the Exchange Server 2010 Solution

Exchange 2007 decreased the costs of high availability and made site resilience more economical by introducing technologies such as cluster continuous replication (CCR) and standby continuous replication (SCR). However, some challenges remained:

Windows failover clustering could be confusing because of its complexity.
Achieving a high level of uptime could require a high level of administrator intervention.
Each type of continuous replication was managed differently and separately.
Recovering from a failure of a single database on a large Mailbox server could result in a temporary disruption of service to all users on the Mailbox server.
The transport dumpster feature of the Hub Transport server could only protect messages destined for mailboxes in a CCR environment. If a Hub Transport server failed while processing messages and couldn't be recovered, data loss could result.

Exchange 2010 includes significant core changes that integrate high availability in the architecture, making it less costly and easier to deploy and maintain than previous versions of Exchange. Exchange 2010 includes a new unified platform for both high availability and site resilience.

With the significant core improvements made to Exchange 2010, the recommended maximum mailbox database size when using continuous replication has increased from 200 gigabytes (GB) in Exchange 2007 to 2 terabytes in Exchange 2010. With more companies realizing the greater value in large mailboxes (from 2 GB through 10 GB), significantly larger database sizes can quickly become a reality. Supporting larger databases means moving away from legacy recovery mechanisms, such as backup and restore, and moving to newer, faster forms of protection, such as data replication and server redundancy. Ultimately, the size of your mailbox databases depends on many factors that you identify during the Exchange 2010 planning process. For detailed planning guidance for mailboxes and Mailbox servers, see Mailbox Server Storage Design.

Exchange 2010 combines the key availability and resilience features of CCR and SCR into a single solution that handles both on-site and off-site data replication. Mailbox servers can be defined as part of a DAG to provide automatic recovery at the mailbox database level instead of at the server level. Other new high availability concepts are introduced in Exchange 2010, such as database mobility and incremental deployment.

Database Mobility

Exchange 2007 introduced many architectural changes designed to make deploying high availability and site resiliency solutions for Exchange faster and simpler. These improvements included an integrated Setup experience, optimized configuration settings, and the ability to manage most aspects of the high availability solution using native Exchange management tools.

However, management of an Exchange 2007 high availability solution required complex clustering concepts, such as the concept of moving network identities and managing cluster resources. In addition, when troubleshooting issues related to a clustered mailbox server, Exchange tools and cluster tools were used to review and correlate logs and events from two different sources: one from Exchange and one from the cluster.

Two other limiting aspects of the Exchange 2007 architecture have been evaluated and revised based on customer feedback:

Clustered Exchange 2007 servers required dedicated hardware. Only the Mailbox server role could be installed on a node in the cluster. This meant that a minimum of four Exchange servers was required to achieve full redundancy of the primary components of a deployment, that is, the core server roles (Mailbox, Hub Transport, and Client Access).
In Exchange 2007, failover of a clustered mailbox server occurred at the server level. As a result, if a single database failure occurred, the administrator had to fail over the entire clustered mailbox server to another node in the cluster (which resulted in brief downtime for all users on the server, and not just those users with a mailbox on the affected database), or leave the users on the failed database offline (potentially for hours) while restoring the database from backup.

Exchange 2010 has been engineered with the concept of database mobility. Database mobility expands the system's use of continuous replication by replicating a database to multiple, different servers that are grouped together. This model provides better protection of the database and increased availability. In this model, automatic failover protection and manual switchover control are provided at the mailbox database level instead of at the server level.

In the case of failures, other servers that have copies of the database can mount the database. As a result of this and other architectural changes, failover actions now complete much faster than in previous versions of Exchange. For example, failover of a clustered mailbox server in a CCR environment running Exchange 2007 with Service Pack 1 completes in about 2 minutes (assuming an intra-site failure where the IP address of the clustered mailbox server doesn't change). By comparison, failover of a mailbox database in an Exchange 2010 environment completes within 30 seconds (measured from the time when the failure is detected to when a database copy is mounted, assuming the copy is healthy and up to date with log replay). The combination of database-level failovers and significantly faster failover times improves an organization's overall uptime.

Incremental Deployment

Exchange 2010 introduces the concept of incremental deployment, which enables you to deploy service and data availability for all Mailbox servers and databases after Exchange is installed. Service and data redundancy is achieved by using new features in Exchange 2010 such as DAGs and database copies.

In previous versions of Exchange, service availability for the Mailbox server roles was achieved by deploying Exchange in a Windows failover cluster. To deploy Exchange in a cluster, you had to first build a failover cluster, and then install the Exchange program files. This process created a special Mailbox server called a clustered mailbox server (or Exchange Virtual Server in older versions of Exchange). If you had already installed the Exchange program files on a non-clustered server and you decided you wanted a clustered mailbox server, you had to build a cluster using new hardware, or remove Exchange from the existing server, install failover clustering, and reinstall Exchange.

Database Availability Groups

A DAG is the base component of the high availability and site resilience framework built into Exchange 2010. A DAG is a group of up to 16 Mailbox servers that hosts a set of databases and provides automatic database-level recovery from failures that affect individual databases. Any server in a DAG can host a copy of a mailbox database from any other server in the DAG. When a server is added to a DAG, it works with the other servers in the DAG to provide automatic recovery from failures that affect mailbox databases, such as a disk failure or server failure.

Exchange 2007 introduced a built-in data replication technology called continuous replication. Continuous replication, which was available in three forms (local, cluster, and standby), significantly reduced the cost of deploying a highly available Exchange infrastructure, and provided a much improved deployment and management experience over previous versions of Exchange. Even with these cost savings and improvements, however, running a highly available Exchange 2007 infrastructure still required much time and expertise because the integration between Exchange and Windows failover clustering wasn't seamless. In addition, customers wanted an easier way to replicate their e-mail data to a remote location, to protect their Exchange environment against site-level disasters.

Exchange 2010 uses the same continuous replication technology found in Exchange 2007. Exchange 2010 combines on-site data replication (CCR) and off-site data replication (SCR) into a single framework called a database availability group (DAG). After servers are added to a DAG, you can add replicated database copies incrementally (up to 16 total), and Exchange 2010 switches between these copies automatically, to maintain availability.
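
The limits described above can be pictured with a small model. The following Python sketch is illustrative only and is not an Exchange API: the Dag class and its method names are hypothetical, and the rule that a server hosts at most one copy of a given database is an assumption added for the sketch. Only the two limits of 16 come from the text.

```python
from dataclasses import dataclass, field

MAX_DAG_MEMBERS = 16  # from the text: up to 16 Mailbox servers per DAG
MAX_DB_COPIES = 16    # from the text: up to 16 copies of a database


@dataclass
class Dag:
    """Toy model of a DAG (hypothetical class, not an Exchange API)."""
    name: str
    members: list = field(default_factory=list)
    copies: dict = field(default_factory=dict)  # database name -> hosting servers

    def add_server(self, server: str) -> None:
        """Add a Mailbox server to the DAG, enforcing the member limit."""
        if server in self.members:
            raise ValueError(f"{server} is already a member of {self.name}")
        if len(self.members) >= MAX_DAG_MEMBERS:
            raise ValueError("a DAG holds at most 16 Mailbox servers")
        self.members.append(server)

    def add_copy(self, database: str, server: str) -> None:
        """Add a database copy on a DAG member, one server at a time."""
        if server not in self.members:
            raise ValueError(f"{server} is not a member of {self.name}")
        hosts = self.copies.setdefault(database, [])
        # Assumption for this sketch: one copy of a database per server.
        if server in hosts:
            raise ValueError(f"{server} already hosts a copy of {database}")
        if len(hosts) >= MAX_DB_COPIES:
            raise ValueError("a database has at most 16 copies")
        hosts.append(server)
```

In this model, servers join the DAG first and copies are added afterward, which mirrors the incremental approach the section describes.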

Unlike Exchange 2007, where clustered mailbox servers required dedicated hardware, Mailbox servers in a DAG can host other Exchange roles (Client Access, Hub Transport, and Unified Messaging), providing full redundancy of Exchange services and data with just two servers.

This new high availability architecture also provides simplified recovery from a variety of failures (disk-level, server-level, and datacenter-level), and the architecture can be deployed on a variety of storage types.

For more information about DAGs, see Understanding Database Availability Groups.

Mailbox Database Copies

The high availability and site resilience features first introduced in Exchange 2007 are used in Exchange 2010 to create and maintain database copies, so that you can achieve your availability goals in Exchange 2010. Exchange 2010 also introduces the concept of database mobility, which provides Exchange-managed database-level failovers.

Database mobility disconnects databases from servers and adds support for up to 16 copies of a single database, and it provides a native experience for adding database copies to a database. In Exchange 2007, a feature called database portability also enabled you to move a mailbox database between servers. A significant distinction between database portability and database mobility, however, is that with database mobility, all copies of a database have the same GUID.

Setting a database copy as the active mailbox database is known as a switchover. When a failure affecting a database occurs and a new database becomes the active copy, this process is known as a failover. This process also refers to a server failure in which one or more servers bring online the databases previously online on the failed server. When either a switchover or failover occurs, other Exchange 2010 server roles become aware of the switchover almost immediately and redirect client and messaging traffic to the new active database.

For example, if an active database in a DAG fails because of an underlying storage failure, Active Manager will automatically recover by failing over to a database copy on another Mailbox server in the DAG. If the database is outside the automatic mount criteria and can't be automatically mounted, you can manually activate a database copy.
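
The decision in this example can be sketched as follows. This is a deliberately simplified model and not the selection logic Active Manager actually uses: the DatabaseCopy fields, the max_lost_logs threshold (a stand-in for the automatic mount criteria), and the preference for the copy that is missing the fewest logs are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatabaseCopy:
    server: str
    healthy: bool
    copy_queue_length: int  # logs not yet copied from the active copy


def pick_failover_target(copies, failed_server, max_lost_logs) -> Optional[DatabaseCopy]:
    """Return a passive copy that may be mounted automatically.

    Returns None when no copy qualifies, which corresponds to the case
    where an administrator has to step in manually.
    """
    candidates = [
        c for c in copies
        if c.server != failed_server
        and c.healthy
        and c.copy_queue_length <= max_lost_logs  # hypothetical mount criterion
    ]
    # Assumption: prefer the candidate that is missing the fewest logs.
    return min(candidates, key=lambda c: c.copy_queue_length, default=None)
```

For instance, with the active copy on EX1 failed, a healthy copy on EX2 that is 9 logs behind, and a healthy copy on EX3 that is 1 log behind, a threshold of 6 selects EX3, while a threshold of 0 selects nothing and leaves the decision to the administrator.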

For more information about mailbox database copies, see Understanding Mailbox Database Copies.

Active Manager

In Exchange 2007 and previous versions, Exchange used the cluster resource management model to install, implement, and manage the Mailbox server high availability solution. Historically, building a highly available Mailbox server involved first building a Windows failover cluster, and then running Exchange Setup in clustered mode. In this mode, the Exchange cluster resource DLL file, exres.dll, would be registered and allow the creation of a clustered mailbox server (called an Exchange Virtual Server in legacy versions). When deploying legacy shared storage clusters or single copy clusters, additional steps for configuring storage were needed before and after failover cluster formation, and after clustered mailbox server and storage group formation.

Exchange 2010 includes a new component called Active Manager that provides functionality that replaces the resource model and failover management features provided by integration with the Cluster service in previous versions of Exchange. For more information about Active Manager, see Understanding Active Manager.

Changes to High Availability from Previous Versions of Exchange

There are several changes to the core architecture of Exchange 2010 that have a direct effect on how you configure Exchange for high availability, as well as a direct effect on how you perform site recovery. One significant change is the removal of clustered mailbox servers and of the Windows failover cluster resource model. Other significant changes include the globalization of databases and enhancements to the built-in continuous replication technology first introduced in Exchange 2007.

Removal of Clustered Mailbox Servers

In Exchange 2010, Exchange is no longer a clustered application, and the cluster resource model is no longer used for Exchange high availability. Exres.dll and all Exchange cluster resources it provided also no longer exist, including clustered mailbox servers. Instead, Exchange 2010 uses its own internal high availability model. Some components of Windows failover clustering are still used, but they are now integrated into other functionality by Exchange 2010.

Globalization of Databases

In Exchange 2010, a database is associated with a single, dedicated log stream, represented by a series of sequentially-named, 1-megabyte (MB) log files. The concept of storage groups has also been removed from Exchange 2010. As a result of these changes, Exchange databases have a dedicated log stream, and no longer share log streams with other databases.

Unlike in previous versions of Exchange, databases are no longer closely tied to a specific Mailbox server. In addition, databases are no longer identified by the Mailbox servers on which they reside, and server names are no longer part of database identities. As a result of these changes, databases are now global objects in Active Directory and in each Exchange organization. When using the Exchange Management Console, databases are now managed from the Mailbox node under the Organization Configuration node.

Each Mailbox server can host a maximum of 100 databases. This limit applies to the combined number of active and passive databases on the server. The recovery database doesn't count against the 100 database limit.
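
As a worked illustration of how the limit is counted, the following Python sketch tallies the databases on a server. The function names and the (name, kind) tuple layout are hypothetical; only the limit of 100 and the exclusion of the recovery database come from the text.

```python
MAX_DATABASES_PER_SERVER = 100  # from the text


def counted_databases(databases):
    """Count active and passive databases; recovery databases are excluded."""
    return sum(1 for _name, kind in databases if kind in ("active", "passive"))


def can_host_another(databases):
    """True if one more active or passive database fits under the limit."""
    return counted_databases(databases) < MAX_DATABASES_PER_SERVER
```

A server with 60 active databases, 39 passive databases, and a recovery database counts as 99 and can host one more; at 60 active and 40 passive it is at the limit.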

Changes to Continuous Replication in Exchange 2010 RTM

The continuous replication technology introduced in Exchange 2007 is also available in Exchange 2010. However, the feature has evolved considerably to support new high availability features and greater scalability. Some of these architectural changes include:

Because storage groups are removed in Exchange 2010, continuous replication now operates at the database level. Exchange 2010 still uses an Extensible Storage Engine (ESE) database that produces transaction logs replicated to one or more other locations and replayed into one or more mailbox database copies. Each mailbox database can have as many as 16 copies.
Log shipping no longer uses Server Message Block (SMB) and Windows file system notifications. Log shipping no longer uses a pull model, where the passive copy pulls a closed log file from the active copy. Instead, the passive copy uses TCP-based notifications to notify the active copy about which log files are required by the passive copy. The active copy then pushes the log files to each configured passive copy through the TCP socket.
Exchange 2010 continuous replication uses one administrator-defined TCP port for data transfer. In addition, Exchange 2010 includes built-in options for network encryption and compression for the data stream.
Seeding is no longer restricted to using only the active copy of the database. Passive copies of mailbox databases can now be specified as sources for database copy seeding and reseeding.
Database copies are for mailbox databases only. For redundancy and high availability of public folder databases, we recommend that you use public folder replication. Unlike CCR, where multiple copies of a public folder database couldn't exist in the same cluster, you can use public folder replication to replicate public folder databases between servers in a DAG.
In Exchange 2007, the Microsoft Exchange Replication service was responsible for replaying logs into passive database copies. When the passive copy was activated, the database cache that the Microsoft Exchange Replication service had built as a result of replay activity was lost when the Microsoft Exchange Information Store service mounted the database. This left the database cache in what is known as a cold state. The database cache, which is used to cache read/write operations, was small (cold) during this period and therefore had a significantly diminished ability to reduce read I/O operations. In Exchange 2010, the passive copy replay functionality previously performed by the Microsoft Exchange Replication service has been moved into the Microsoft Exchange Information Store service. As a result, a warm database cache is present and immediately available for use after a failover or switchover occurs.
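
The notify-then-push log shipping flow described in the list above can be pictured roughly as follows. This Python sketch is an assumption-laden simplification: the real exchange happens over a TCP socket, whereas here both copies are plain dictionaries keyed by log generation, and the function name is hypothetical.

```python
def notify_and_push(active_closed_logs, passive_logs):
    """Simulate one round of push-based log shipping.

    active_closed_logs: dict of generation number -> log contents on the active copy
    passive_logs:       dict of generation number -> log contents on the passive copy
    Returns the sorted list of generations that were pushed.
    """
    # The passive copy notifies the active copy which logs it requires:
    # here, everything after the highest generation it already holds.
    next_needed = max(passive_logs, default=0) + 1

    # The active copy pushes the required closed log files.
    pushed = {gen: data for gen, data in active_closed_logs.items() if gen >= next_needed}
    passive_logs.update(pushed)
    return sorted(pushed)
```

The point of the sketch is the direction of the data flow: the passive copy only states what it is missing, and the active copy does the sending.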

Several concepts used in Exchange 2007 continuous replication also remain in Exchange 2010. These include the concepts of failover management, divergence, the use of the automatic database mount dial, and the use of replication and client access (MAPI) networks.

Changes to Continuous Replication in Exchange 2010 SP1

In the RTM version of Exchange 2010 and in all versions of Exchange Server 2007, continuous replication operates by shipping copies of the log files generated by the active database copy to the passive database copies. Beginning with Exchange 2010 SP1, this form of continuous replication is known as continuous replication - file mode. SP1 also introduces a new form of continuous replication known as continuous replication - block mode. In block mode, as each update is written to the active database copy's active log buffer, it's also shipped to a log buffer on each of the passive mailbox copies. When the log buffer is full, each database copy builds, inspects, and creates the next log file in the generation sequence. In the event of a failure affecting the active copy, the passive copies will have been updated with most or all of the latest updates. To keep replication issues from affecting the client experience, the active copy doesn't wait for replication to complete.

Block mode dramatically reduces the latency between the time a change is made on the active copy and when the change is replicated to passive copies. In addition to replicating individual log file writes, block mode also changes the activation process for a passive copy. If a copy is in block mode when a failure occurs, the system uses whatever partial log content is available during the activation process. This prevents the current log file on the active copy from being a single point of failure.
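
The buffer behavior on a passive copy in block mode can be sketched as follows. This is a toy model under stated assumptions: the PassiveLogBuffer class is hypothetical, the buffer capacity is assumed to equal the 1-MB log file size, and the inspection step is omitted.

```python
LOG_FILE_BYTES = 1024 * 1024  # 1-MB log files, as described for Exchange 2010


class PassiveLogBuffer:
    """Toy model of a passive copy's log buffer in block mode (hypothetical)."""

    def __init__(self, capacity=LOG_FILE_BYTES):
        self.capacity = capacity
        self.buffer = bytearray()
        self.generation = 0
        self.log_files = []  # list of (generation, contents)

    def receive(self, update: bytes) -> None:
        """Accept an update shipped from the active copy's log buffer."""
        self.buffer.extend(update)
        # When the buffer is full, build the next log file in the sequence.
        while len(self.buffer) >= self.capacity:
            self.generation += 1
            contents = bytes(self.buffer[:self.capacity])
            self.log_files.append((self.generation, contents))
            del self.buffer[:self.capacity]

    def partial_content(self) -> bytes:
        """Partial log content that would be available during activation."""
        return bytes(self.buffer)
```

With a small capacity of 4 bytes, receiving b"ab" and then b"cde" produces one complete log file holding b"abcd" and leaves b"e" as partial content, which is the data that would otherwise exist only in the current log file on the active copy.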

The initial mode of operation is always file mode. Block mode is only active when continuous replication is up-to-date in file mode. The transition into and out of block mode is performed automatically by the log copier. When the passive copy requests the current log file, it indicates that continuous replication is up-to-date (the copy queue length is 0), and the system should automatically switch from file mode to block mode.
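
The transition rule can be expressed as a very small state machine. In this Python sketch the LogCopier class name is borrowed from the text, but its interface is hypothetical, and the fallback to file mode whenever the copy queue length rises above 0 is an assumption inferred from the statement that block mode is only active when replication is up to date.

```python
class LogCopier:
    """Toy model of the file mode / block mode transition (hypothetical interface)."""

    def __init__(self):
        # The initial mode of operation is always file mode.
        self.mode = "file"

    def observe(self, copy_queue_length: int) -> str:
        """Update the mode from the current copy queue length and return it."""
        if copy_queue_length == 0:
            self.mode = "block"  # replication is up to date
        else:
            self.mode = "file"   # assumption: fall back while catching up
        return self.mode
```

A copy that starts three logs behind stays in file mode until its copy queue length reaches 0, at which point the sketch switches it to block mode.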

You can determine if a passive database copy is in block mode by monitoring the Continuous replication - block mode Active performance counter under the MSExchange Replication performance object. Each database copy has its own instance of this counter. The value of the counter is set to 1 when the passive copy is in block mode and 0 when the passive copy is in file mode. You can also determine the value of this counter by using the Get-Counter or Get-WMIObject cmdlets, as shown in these examples:

Get-Counter -ComputerName <DAGMemberName> -Counter "\MSExchange Replication(*)\Continuous replication - block mode Active"
Get-WMIObject -ComputerName <DAGMemberName> Win32_PerfRawData_MSExchangeReplication_MSExchangeReplication | Where-Object {$_.ContinuousReplicationBlockModeActive -eq "1"} | Where-Object {$_.name -ne "_total"} | format-table Name,ContinuousReplicationBlockModeActive

Changes to Transport Dumpster from Exchange 2007

The Exchange 2010 Hub Transport server role includes a feature called the transport dumpster, which was first introduced in Exchange 2007. In Exchange 2007, the transport dumpster was designed to help protect against data loss by maintaining a queue of all recent e-mail messages sent to users whose mailboxes were protected by CCR or LCR. When a lossy failure occurred in either of these environments, the bulk of the data that would ordinarily have been lost as a result of the failure was automatically recovered by the transport dumpster.

The transport dumpster is used for replicated mailbox databases only. It doesn't protect messages sent to public folders, nor does it protect messages sent to recipients on mailbox databases that aren't replicated. The transport dumpster queue for a specific mailbox database is located on all Hub Transport servers in the Active Directory sites containing the DAG.

In Exchange 2007, messages were retained in the transport dumpster until the administrator-defined time limit or size limit was reached. In Exchange 2010, the transport dumpster now receives feedback from the replication pipeline to determine which messages have been delivered and replicated. As a message goes through Hub Transport servers on its way to a replicated mailbox database in a DAG, a copy is kept in the transport queue (mail.que) until the transaction logs representing the message have been successfully replicated to and inspected by all copies of the mailbox database. After the logs have been replicated to and inspected by all database copies, the messages in those logs are truncated from the transport dumpster. This keeps the transport dumpster queue smaller by maintaining only copies of messages whose transaction logs haven't yet been replicated.

Each DAG's Active Manager tracks the value for the last log inspected time on each passive database copy. The Active Manager client running on the Hub Transport server obtains this information from the DAG's Standby Active Manager (SAM) and converts that information into a time-based watermark. The Hub Transport server then compares the delivery time of messages in the transport dumpster with the watermark. If the delivery time of a message is older than the watermark, then the message is truncated from the transport dumpster.
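
The watermark comparison can be sketched in a few lines. This Python example is a minimal model, not the Hub Transport implementation: the function names and data shapes are hypothetical, and taking the earliest last log inspected time across the passive copies as the watermark is an assumption based on the requirement that logs be inspected by all copies before truncation.

```python
from datetime import datetime


def watermark(last_log_inspected):
    """Derive a time-based watermark from per-copy last log inspected times.

    last_log_inspected: dict of server name -> datetime
    Assumption: the earliest time is used so that every copy has caught up.
    """
    return min(last_log_inspected.values())


def truncate_dumpster(dumpster, last_log_inspected):
    """Drop messages whose delivery time is older than the watermark.

    dumpster: list of (message_id, delivery_time) tuples
    Returns the messages that remain in the transport dumpster.
    """
    mark = watermark(last_log_inspected)
    return [(msg_id, delivered) for msg_id, delivered in dumpster if delivered >= mark]
```

If one passive copy last inspected a log at 10:05 and another at 10:02, the sketch uses 10:02 as the watermark, so a message delivered at 10:00 is truncated while messages delivered at 10:03 and 10:06 are retained.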

The transport dumpster has also been enhanced to account for the changes to the Mailbox server role that enable a single mailbox database to move between Active Directory sites. DAGs can be extended to multiple Active Directory sites, and as a result, a single mailbox database in one Active Directory site can fail over to another Active Directory site. When this occurs, any transport dumpster redelivery requests will be sent to both Active Directory sites: the original site and the new site.

Changes to Routing Behavior When Hub Transport and Mailbox are Co-Located in a DAG

When the Hub Transport server is co-located with a Mailbox server that's a member of a DAG, there are changes in routing behavior to ensure that the resiliency features in both server roles will provide the necessary protection for messages sent to and received by users on that server. The Hub Transport server role was modified so that it now attempts to reroute a message for a local Mailbox server to another Hub Transport server in the same site if the Hub Transport server is also a DAG member and it has a copy of the mailbox database mounted locally. This extra hop was added to put the message in the transport dumpster on a different Hub Transport server.

For example, EX1 hosts the Hub Transport server role and Mailbox server role and is a member of a DAG. When a message arrives in transport for EX1 destined for a recipient whose mailbox is also on EX1, transport will reroute the message to another Hub Transport server in the site (for example, EX2), and that server will deliver the message to the mailbox on EX1.

There's a second, similar behavior change related to the Microsoft Exchange Mail Submission service. This service was modified so that it would not submit messages to a local Hub Transport server role when the Mailbox server or Hub Transport server is a member of a DAG. In this scenario, the behavior of transport is to load balance submission requests across other Hub Transport servers in the same Active Directory site, and fall back to a local Hub Transport server if there are no other available Hub Transport servers in the same site.
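
Both routing changes can be summarized in a short sketch. The function names and parameters below are hypothetical, the lists are assumed to contain only available Hub Transport servers in the same Active Directory site, and picking the first remote server stands in for whatever load balancing the product actually applies.

```python
def delivery_hop(local_server, hubs_in_site, is_dag_member, db_mounted_locally):
    """Choose which Hub Transport server delivers a message to a local mailbox."""
    others = [hub for hub in hubs_in_site if hub != local_server]
    if is_dag_member and db_mounted_locally and others:
        # Extra hop: the message lands in the transport dumpster
        # of a different Hub Transport server.
        return others[0]
    return local_server


def submission_candidates(local_server, hubs_in_site, is_dag_member):
    """Return the Hub Transport servers the Mail Submission service may use."""
    if not is_dag_member:
        return list(hubs_in_site)
    others = [hub for hub in hubs_in_site if hub != local_server]
    # Fall back to the local Hub Transport server only if no other is available.
    return others if others else [hub for hub in hubs_in_site if hub == local_server]
```

Using the servers from the example, delivery_hop("EX1", ["EX1", "EX2"], True, True) selects EX2, and submission_candidates("EX1", ["EX1"], True) falls back to EX1 because no other Hub Transport server is available in the site.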

High Availability for Non-Mailbox Server Roles

High availability for the Hub Transport, Edge Transport, Client Access, and Unified Messaging server roles is achieved through a combination of server redundancy, load balancing, and Domain Name System (DNS) round robin, as well as proactive server, service, and infrastructure management. In general, you can achieve high availability for these server roles by using the following strategies and technologies:

Edge Transport: You can deploy multiple Edge Transport servers and use multiple DNS MX resource records to load balance activity across those servers. You can also use Network Load Balancing (NLB) to provide load balancing and high availability for Edge Transport servers.
Client Access: You can use NLB or a third-party hardware-based network load balancing device for Client Access server high availability.
Hub Transport: You can deploy multiple Hub Transport servers for internal transport high availability. Resiliency has been designed into the Hub Transport server role in the following ways:
- Hub Transport server to Hub Transport server (intra-organization): Hub Transport server to Hub Transport server communication inside an organization automatically load balances between available Hub Transport servers in the target Active Directory site.
- Mailbox server to Hub Transport server (intra-Active Directory site): The Microsoft Exchange Mail Submission service on Mailbox servers automatically load balances between all available Hub Transport servers in the same Active Directory site.
- Unified Messaging server to Hub Transport server: The Unified Messaging server automatically load balances connections between all available Hub Transport servers in the same Active Directory site.
- Edge Transport server to Hub Transport server: The Edge Transport server automatically load balances inbound SMTP traffic to all Hub Transport servers in the Active Directory site to which the Edge Transport server is subscribed.
For additional redundancy (for example, applications that require an SMTP relay), you can create a DNS record (for example, relay.company.com), assign an IP address, and use a hardware load balancer to redirect that IP address to multiple Hub Transport servers. You can also use NLB for the client connectors on Hub Transport servers. When using a hardware load balancer, you need to confirm that no intra-organization traffic will be crossing the hardware load balancer because intra-organization traffic uses built-in load balancing algorithms (as previously described).
Unified Messaging: Unified Messaging deployments can be made more resilient by deploying multiple Unified Messaging servers where two or more are in a single dial plan. The Voice over IP (VoIP) gateways supported by Unified Messaging can be configured to route calls to Unified Messaging servers in a round-robin fashion. In addition, these gateways can retrieve the list of servers for a dial plan from DNS. In either case, the VoIP gateways will present a call to a Unified Messaging server and if the call isn't accepted, the call will be presented to another server, providing redundancy at the time the call is established.
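
The round-robin behavior with fallback described for VoIP gateways can be illustrated with a short Python sketch. This is a conceptual model only, not gateway or Exchange code: the function route_call, the server names, and the accepts_call callback are all hypothetical names introduced for illustration.

```python
from typing import Callable, Iterable, Optional


def route_call(servers: Iterable[str],
               accepts_call: Callable[[str], bool],
               start_index: int = 0) -> Optional[str]:
    """Offer a call to each server at most once, starting at start_index."""
    pool = list(servers)
    for offset in range(len(pool)):
        candidate = pool[(start_index + offset) % len(pool)]
        if accepts_call(candidate):
            return candidate
    return None  # no server in the dial plan accepted the call


# Hypothetical dial plan with three servers; um02 is offline.
dial_plan = ["um01", "um02", "um03"]
online = {"um01", "um03"}

# Successive calls start at successive positions (round-robin).
placements = [route_call(dial_plan, online.__contains__, start_index=i)
              for i in range(4)]
print(placements)
```

The second call starts at um02, which doesn't accept it, so the call is presented to um03 instead. That fallback is the redundancy at call-establishment time described above.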

Site Resilience

Exchange 2010 includes a unified platform for both high availability and site resilience. By combining the native site resilience support in Exchange 2010 with proper planning, a second datacenter can be rapidly activated to serve a failed datacenter's clients. A datacenter or site failure is managed differently from the types of failures that can cause a server or database failover. In a high availability configuration, automatic recovery is initiated by the system, and the failure typically leaves the messaging system in a fully functional state. By contrast, a datacenter failure is considered to be a disaster recovery event. Recovery must be manually performed and completed for the client service to be restored and for the outage to end. The process you perform is referred to as a datacenter switchover. As with many disaster recovery scenarios, prior planning and preparation for a datacenter switchover can simplify the recovery process and reduce the duration of the outage.

For details about planning and deploying site resilience, see Planning for High Availability and Site Resilience, Deploying High Availability and Site Resilience and Datacenter Switchovers.

End-to-End Availability

Exchange 2010 also includes many features designed to increase end-to-end availability of the system. These features include:

Shadow redundancy
Online move mailbox
Flexible mailbox protection
Incremental resync
Third-party replication API

Shadow Redundancy

In addition to the transport dumpster and routing behavior enhancements described previously, a new Hub Transport server feature named shadow redundancy has been added. Shadow redundancy provides redundancy for messages for the entire time they are in transit. The solution involves a technique similar to the transport dumpster. With shadow redundancy, the deletion of a message from the transport database is delayed until the transport server verifies that all of the next hops for that message have completed delivery. If any of the next hops fail before reporting successful delivery, the message is resubmitted for delivery to that next hop. For more information about shadow redundancy, see Understanding Shadow Redundancy.
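
The delayed-deletion idea can be sketched in Python as follows. This is a simplified model under stated assumptions, not the transport server's implementation: the ShadowQueue class, its method names, and the hop names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ShadowQueue:
    """Hypothetical model: keep each message until every next hop confirms delivery."""
    pending: Dict[str, Set[str]] = field(default_factory=dict)  # message id -> unconfirmed hops
    resubmitted: List[Tuple[str, str]] = field(default_factory=list)

    def send(self, message_id: str, next_hops: Set[str]) -> None:
        # The message is retained locally while any hop is unconfirmed.
        self.pending[message_id] = set(next_hops)

    def confirm_delivery(self, message_id: str, hop: str) -> None:
        hops = self.pending.get(message_id)
        if hops is None:
            return
        hops.discard(hop)
        if not hops:
            # Every next hop reported success, so the retained copy can be deleted.
            del self.pending[message_id]

    def hop_failed(self, hop: str) -> None:
        # Resubmit every message that the failed hop had not yet confirmed.
        for message_id, hops in self.pending.items():
            if hop in hops:
                self.resubmitted.append((message_id, hop))


shadow = ShadowQueue()
shadow.send("msg-1", {"hub-a", "hub-b"})
shadow.confirm_delivery("msg-1", "hub-a")
shadow.hop_failed("hub-b")
print(shadow.pending)
print(shadow.resubmitted)
```

In this model, msg-1 stays in the pending set after hub-a confirms delivery because hub-b has not, and the failure of hub-b triggers a resubmission for that hop only.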

Online Move Mailbox

Exchange 2010 includes a new feature that enables you to move mailboxes asynchronously. In Exchange 2007, when you used the Move-Mailbox cmdlet to move a mailbox, the cmdlet logged on to both the source database and the target database and moved the content from one mailbox to the other mailbox. There were several disadvantages to having the cmdlet perform the move operation:

Mailbox moves typically took hours to complete, and during the move, users weren't able to access their mailbox.
If the Command Prompt window used to run the Move-Mailbox cmdlet was closed, the move was terminated and had to be restarted.
The computer used to perform the move participated in the data transfer. If an administrator ran the cmdlet from his or her workstation, the mailbox data would flow from the source server to the administrator's workstation and then to the target server.

The New-MoveRequest cmdlet in Exchange 2010 can be used to perform asynchronous moves. Unlike in Exchange 2007, the cmdlet doesn't perform the actual move. The move is performed by the Microsoft Exchange Mailbox Replication service, a new service that runs on a Client Access server. The New-MoveRequest cmdlet sends requests to the Microsoft Exchange Mailbox Replication service. For more information about online mailbox moves, see Understanding Move Requests.
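
The separation between submitting a move request and performing the move can be modeled with a small Python sketch. It is a conceptual illustration, not the Mailbox Replication service: the MoveServiceModel class, its methods, the status strings, and the mailbox and database names are assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class MoveRequest:
    mailbox: str
    target_database: str
    status: str = "Queued"  # illustrative status values only


class MoveServiceModel:
    """Hypothetical stand-in for a service that performs queued moves."""

    def __init__(self) -> None:
        self._queue: Deque[MoveRequest] = deque()

    def new_move_request(self, mailbox: str, target_database: str) -> MoveRequest:
        # The caller only records the request and returns immediately;
        # no mailbox data passes through the caller.
        request = MoveRequest(mailbox, target_database)
        self._queue.append(request)
        return request

    def process_next(self) -> Optional[MoveRequest]:
        # The service, not the caller, performs the move at a later time.
        if not self._queue:
            return None
        request = self._queue.popleft()
        request.status = "Completed"
        return request


service = MoveServiceModel()
request = service.new_move_request("user@example.com", "DB02")
status_after_submit = request.status
service.process_next()
print(status_after_submit, "->", request.status)
```

Because the request is only queued at submission time, closing the session that submitted it would not cancel the move in this model, which mirrors the motivation given above.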

Flexible Mailbox Protection

There are several changes to the core architecture of Exchange 2010 that have a direct effect on how you will protect your mailbox databases and the mailboxes they contain.

One significant change is the removal of storage groups. In Exchange 2010, each database is associated with a single log stream, represented by a series of 1 megabyte (MB) log files. Each server can host a maximum of 100 databases.

Another significant change for Exchange 2010 is that databases are no longer closely tied to a specific Mailbox server. Database mobility expands the system's use of continuous replication by replicating a database to multiple, different servers. This provides better protection of the database and increased availability. In the case of failures, the other servers that have copies of the database can mount the database.
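
As a rough illustration of database mobility, the following Python sketch picks another server to mount a database when the server hosting the active copy fails. The selection rule shown (first healthy copy in list order) is an assumption made for brevity. In Exchange 2010 this decision is made by Active Manager and is more involved. The function and server names are hypothetical.

```python
from typing import Dict, List, Optional


def choose_server_to_mount(copy_servers: List[str],
                           healthy: Dict[str, bool],
                           failed_server: str) -> Optional[str]:
    """Return a server holding a healthy copy, or None if no copy is usable.

    Simplified assumption: the first healthy copy in list order wins.
    """
    for server in copy_servers:
        if server != failed_server and healthy.get(server, False):
            return server
    return None


# Hypothetical database with copies on three Mailbox servers.
copies = ["mbx01", "mbx02", "mbx03"]
health = {"mbx01": False, "mbx02": False, "mbx03": True}

print(choose_server_to_mount(copies, health, failed_server="mbx01"))
```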

The ability to have multiple copies of a database hosted on multiple servers means that if you have a sufficient number of database copies, you can use these copies as your backups. For more information about this strategy, see Understanding Backup, Restore and Disaster Recovery.

Incremental Resync

Exchange 2007 introduced the concepts of lost log resilience and incremental reseed. Lost log resilience is an internal component of ESE that enables you to recover and mount Exchange mailbox databases even if one or more of the most recently generated transaction log files have been lost or damaged. It works by delaying writes to the database file until the specified number of log generations have been created. The length of time that writes are delayed depends on how quickly logs are being generated.
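
The delayed-write mechanism can be sketched in Python as shown below. This is a simplified interpretation, not ESE code: the DelayedWriter class and its methods are hypothetical, and the model assumes an update is written to the database once the configured number of newer log generations exist.

```python
from typing import Dict, List, Tuple


class DelayedWriter:
    """Hypothetical model of delaying database writes by N log generations."""

    def __init__(self, delay_generations: int) -> None:
        self.delay = delay_generations
        self.current_generation = 0
        self.pending: List[Tuple[int, int, bytes]] = []  # (generation, page, data)
        self.database: Dict[int, bytes] = {}

    def log_update(self, page: int, data: bytes) -> None:
        # The update is recorded in the current log generation,
        # but is not yet written to the database file.
        self.pending.append((self.current_generation, page, data))

    def roll_log(self) -> None:
        # A new log generation was created; write updates that are now old enough.
        self.current_generation += 1
        still_pending = []
        for generation, page, data in self.pending:
            if self.current_generation - generation >= self.delay:
                self.database[page] = data
            else:
                still_pending.append((generation, page, data))
        self.pending = still_pending


writer = DelayedWriter(delay_generations=2)
writer.log_update(page=7, data=b"v1")
writer.roll_log()
after_one_roll = dict(writer.database)
writer.roll_log()
after_two_rolls = dict(writer.database)
print(after_one_roll, after_two_rolls)
```

Because the database file lags the log stream, losing the newest log files in this model loses only updates that were never written to the database, so the database remains consistent with the logs that survive.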

Exchange 2007 also introduced the concept of incremental reseed, which corrected divergences in the transaction log stream between a source and target storage group by relying on the delayed replay capabilities of lost log resilience. Incremental reseed couldn't correct divergences in the passive copy of a database after divergent logs had been replayed, so that case required a complete reseed. Unlike in Exchange 2007, no amount of log loss requires a full reseed in Exchange 2010.

In Exchange 2010, incremental resync is the new name for the feature that automatically corrects divergences in database copies under the following conditions:

After an automatic failover for all of the configured copies of a database
When a new copy is enabled and some database and log files already exist at the copy location
When replication is resumed following a suspension or restarting of the Microsoft Exchange Replication service

As a result of these changes, lost log resilience is now hard-coded to one log file for all Exchange 2010 mailbox databases.

When divergence between an active database and a copy of that database is detected, incremental resync performs the following tasks:

Searches historically in the log file stream to locate the point of divergence.
Locates the changed database pages on the diverged copy.
Reads the changed pages from the active copy, and then copies the necessary log files from the active copy.
Applies the database page changes to the diverged copy.
Runs recovery on the diverged copy and replays the necessary log files into the database copy.
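
The tasks above can be sketched in Python. This is a toy model under stated assumptions, not the Replication service's implementation: log files are represented by a generation number, a checksum, and the set of pages they modify, databases are dictionaries of page contents, and log replay is represented only by the returned list of generations. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class LogFile:
    generation: int
    checksum: str
    pages: FrozenSet[int]  # pages modified by this log file


def find_divergence(active: List[LogFile], copy: List[LogFile]) -> int:
    """Search from newest to oldest; return how many leading logs are common."""
    index = min(len(active), len(copy))
    while index > 0 and active[index - 1].checksum != copy[index - 1].checksum:
        index -= 1
    return index


def resync(active_logs: List[LogFile], copy_logs: List[LogFile],
           active_db: Dict[int, str], copy_db: Dict[int, str]) -> List[int]:
    point = find_divergence(active_logs, copy_logs)
    # Pages changed on the diverged copy by logs past the divergence point.
    changed: Set[int] = set()
    for log in copy_logs[point:]:
        changed |= log.pages
    # Replace those pages with the active copy's version.
    for page in changed:
        if page in active_db:
            copy_db[page] = active_db[page]
        else:
            copy_db.pop(page, None)
    # Generations to copy from the active copy and replay on the diverged copy.
    return [log.generation for log in active_logs[point:]]


active_logs = [LogFile(1, "a", frozenset({1})), LogFile(2, "b", frozenset({2})),
               LogFile(3, "c", frozenset({3}))]
copy_logs = [LogFile(1, "a", frozenset({1})), LogFile(2, "b", frozenset({2})),
             LogFile(3, "x", frozenset({4}))]
active_db = {1: "p1", 2: "p2", 3: "p3", 4: "p4"}
copy_db = {1: "p1", 2: "p2", 4: "p4-diverged"}

to_replay = resync(active_logs, copy_logs, active_db, copy_db)
print(to_replay, copy_db[4])
```

In this example the streams diverge at generation 3, so only page 4 is patched from the active copy and only generation 3 needs to be copied and replayed, rather than reseeding the whole database.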

Third-Party Replication API

Exchange 2010 also includes a new third-party replication API that enables organizations to use third-party synchronous replication solutions instead of the built-in continuous replication feature. Microsoft supports third-party solutions that use this API, provided that the solution provides the necessary functionality to replace all native continuous replication functionality that is disabled as a result of using the API. Solutions are supported only when the API is used within a DAG to manage and activate mailbox database copies. Use of the API outside of these boundaries is not supported. In addition, the solution must meet the applicable Windows hardware support requirements (test validation is not required for support).

When deploying a solution that uses the built-in third-party replication API, be aware that the solution vendor is responsible for primary support of the solution. Microsoft supports Exchange data for both replicated and non-replicated solutions. Solutions that use data replication must adhere to Microsoft's support policy for data replication, as described in Microsoft Knowledge Base article 895847, Multi-site data replication support for Exchange Server. In addition, solutions that utilize the Windows Failover Cluster resource model must meet Windows cluster supportability requirements as described in Microsoft Knowledge Base article 943984, The Microsoft Support Policy for Windows Server 2008 Failover Clusters.

Microsoft's backup and restore support policy for deployments that use third-party replication API-based solutions is the same as for native continuous replication deployments.

If you are a partner seeking information about the third-party API, contact your Microsoft representative. For information about partner products for Exchange 2010, see Microsoft Exchange Partners.
