Topic Last Modified: 2011-03-25

The metropolitan site resiliency solution has been tested and is officially supported by Microsoft; however, before deploying this topology, you should consider the following findings and recommendations.

Findings

Cluster failover worked as expected. No manual steps were required, with the exception of Group Chat Server, Archiving Server, and Monitoring Server. Front End Servers were able to reconnect to the back-end database servers after the failover and resume normal service. Microsoft Lync 2010 clients reconnected automatically.
Cluster failback worked as expected. It is important to ensure that storage has resynchronized before failback begins.

After failback, users will see a brief sign-out and sign-in sequence as they are transferred back to their usual Front End Server once it becomes available again.
When failover occurred, the Group Chat Channel service and Lookup service at the failover site had to be started manually. Additionally, the Group Chat Compliance Server setting had to be updated manually. For details, see Backing Up the Compliance Server in the Operations documentation.

Recommendations

Although testing used two nodes (one per site) in each SQL Server cluster, we recommend deploying additional nodes to achieve in-site redundancy for all components in the topology. For example, if the active SQL Server node becomes unavailable, a backup SQL Server node in the same site and part of the same cluster can assume the workload until the failed server is brought back online or replaced.
Although our testing used components provided by certain third-party vendors, the solution does not depend on or stipulate any particular vendors. As long as components are certified and supported by Microsoft, any qualifying vendor will do.
All individual components of the solution (for example, geographically dispersed cluster components) must be supported and, where appropriate, certified by Microsoft. This does not mean, however, that Microsoft will directly support individual third-party components. For component support, contact the appropriate third-party vendor.
Although a full-scale deployment was not tested, we expect published scale numbers for Lync Server 2010 to hold true. With that in mind, you should plan for enough capacity that the remaining servers can continue operation in the event of failover. For details, see Capacity Planning in the Planning documentation.
The information in this section should be used only as guidance. Before deploying this solution in a production environment, you should build and test it using your own topology.
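
As a rough illustration of the capacity recommendation above, the following Python sketch computes how much spare user capacity each site would have if it had to serve every user alone after a failover. The function name, the site labels, and all user and capacity figures are hypothetical and are not taken from Lync Server 2010 capacity guidance; substitute the numbers from your own capacity planning.

```python
def failover_headroom(users_per_site, capacity_per_site):
    """Return, per site, the spare user capacity if that site had to
    serve every user alone. A negative value means a shortfall."""
    total_users = sum(users_per_site.values())
    return {
        site: capacity - total_users
        for site, capacity in capacity_per_site.items()
    }


# Hypothetical figures, for illustration only.
users = {"primary": 6000, "secondary": 4000}
capacity = {"primary": 12000, "secondary": 8000}
headroom = failover_headroom(users, capacity)
```

With these made-up figures, the primary site could absorb the full load with room to spare, while the secondary site would fall 2,000 users short if the primary site failed.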

Note:
Microsoft does not support implementations of this solution where network and data-replication latency between the primary and secondary sites exceeds 20 ms, or where the bandwidth does not support the user model for your organization. When latency exceeds 20 ms, the end-user experience rapidly deteriorates. In addition, Archiving Server and Group Chat Compliance Server are likely to start falling behind, which may in turn cause Front End Servers and Group Chat Lookup servers to shut down.
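
The 20 ms limit in the note above can be checked against latency measurements taken between the two sites. The Python sketch below is only an illustration: the function name and the sample values are hypothetical, and judging the link by its worst sample is a conservative assumption rather than a documented Microsoft test method.

```python
MAX_SUPPORTED_LATENCY_MS = 20.0  # limit stated in the note above


def latency_within_limit(samples_ms, limit_ms=MAX_SUPPORTED_LATENCY_MS):
    """Return True only if no sample exceeds the limit.

    Assumption: the worst observed sample decides the result.
    """
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    return max(samples_ms) <= limit_ms


# Hypothetical inter-site measurements in milliseconds.
steady_link = [4.8, 6.1, 5.5, 7.2]
congested_link = [9.0, 14.3, 26.7, 11.2]

steady_ok = latency_within_limit(steady_link)
congested_ok = latency_within_limit(congested_link)
```

Here the first set of samples stays under the limit and the second does not, because one sample is above 20 ms.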
