Understanding Database Maintenance

Applies to: Exchange Server 2010 SP3

Topic Last Modified: 2012-08-24

Architectural changes that were made to the database engine in Microsoft Exchange Server 2010 significantly improve its performance and robustness. However, these architectural changes also change the behavior of database maintenance tasks from earlier versions of Exchange Server. This topic describes the database maintenance tasks that must be routinely performed against Exchange Server 2010 databases.

All the tasks that are described in this topic are collectively known as background database maintenance.

Database Compaction

Database compaction must be completed to free up unused space in the database file. Database compaction does not return that unused space to the file system. Instead, it frees pages in the database by compacting records into the fewest possible number of pages, and reduces the I/O that is required to access the pages. To do this, the ESE database engine uses the database metadata, which is the information that describes tables in the database. For each table, the ESE database engine examines every page in the table, and tries to move the records to logically ordered pages.

Database compaction is important because it reduces the time associated with backing up the database file, and it helps maintain a predictable database file size. A predictable file size is important for accurately sizing servers and storage.

Database compaction was redesigned for Exchange Server 2010. Most significantly, the operation now gives preference to data contiguity over the amount of compaction. In earlier versions of Exchange Server, the focus was greater on space compaction. This resulted in pages that were always in random order after the process reordered records into free space across pages. In combination with the store schema architecture, this random reordering meant that any request to pull a set of data (such as downloading items inside a folder) always resulted in random I/O. In Exchange Server 2010, random I/O is reduced by keeping records in order on the pages.

Also in earlier versions of Exchange Server, database compaction operations were performed during the online maintenance window. In Exchange 2010, database compaction is now a background process that runs continuously.

Database Defragmentation

Database defragmentation (also known as OLD v2 and B+ tree defragmentation) is a new maintenance task in Exchange Server 2010. Database defragmentation is important to maintain efficient utilization of disk resources over time (i.e., make the I/O more sequential instead of random) and to maintain the compactness of tables that are marked as sequential.

Database defragmentation is a background process that analyzes the database continuously as operations are performed, and then triggers asynchronous work when it is necessary. The process monitors all tables for free pages. If a table reaches a threshold at which a significantly high percentage of the total B+ Tree page count is free, the free pages are returned to the root. The process also works to maintain contiguity throughout a table set with sequential space hints (a table that was created with a known sequential usage pattern). If database defragmentation sees a scan/pre-read on a sequential table, and if the records are not stored on sequential pages within the table, the process defragments that section of the table by moving all the affected pages to a new extent in the B+ tree. You can use performance counters to see the low level of active work performed by database defragmentation when a steady state is reached.

Database defragmentation has the following controls to regulate how it completes tasks:

The maximum number of outstanding tasks: This keeps database defragmentation from doing too much work during the first pass if a very large change has occurred in the database.

A latency throttle of 100 ms: When the system is overloaded, database defragmentation will delay defragmentation work. Delayed work is completed the next time that the database follows that same operational pattern and the system has more resources.
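The two controls above can be pictured with a small model. The following Python sketch is only an illustration and is not how the ESE database engine is implemented: the class name, the task names, and the cap of two outstanding tasks are hypothetical (this topic does not state the actual limit), and the sketch assumes that work over either limit is deferred to a later pass.

```python
LATENCY_THROTTLE_MS = 100    # latency throttle described in this topic
MAX_OUTSTANDING_TASKS = 2    # hypothetical cap; the real value is not documented here


class DefragThrottle:
    """Toy model of the two defragmentation controls (illustrative only)."""

    def __init__(self):
        self.outstanding = []   # tasks accepted and still in flight
        self.deferred = []      # tasks postponed until a later pass

    def submit(self, task, observed_latency_ms):
        # Latency throttle: postpone work while the system looks overloaded.
        if observed_latency_ms > LATENCY_THROTTLE_MS:
            self.deferred.append(task)
            return "deferred"
        # Outstanding-task cap: limit how much work is in flight at one time.
        if len(self.outstanding) >= MAX_OUTSTANDING_TASKS:
            self.deferred.append(task)
            return "deferred"
        self.outstanding.append(task)
        return "accepted"


throttle = DefragThrottle()
results = [
    throttle.submit("table-A", observed_latency_ms=20),
    throttle.submit("table-B", observed_latency_ms=150),  # over the latency throttle
    throttle.submit("table-C", observed_latency_ms=30),
    throttle.submit("table-D", observed_latency_ms=30),   # over the task cap
]
print(results)
```

In this model, table-B is deferred because of latency and table-D is deferred because two tasks are already outstanding.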

Online database scanning (Database Checksumming)

Online database scanning (also known as database checksumming) is the process by which the database is read in large chunks, and every page is examined for physical page corruption. The main purpose of checksumming is to detect physical corruption and lost flushes that may not be detected by transactional operations (stale pages).
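The general idea of reading the database in large chunks and verifying every page can be sketched as follows. This Python example is an illustration under stated assumptions, not the ESE implementation: it uses CRC32 as a stand-in for the real page checksum, the page layout (a 4-byte checksum followed by data) and the chunk size of eight pages are hypothetical, and it covers only physical corruption, not lost flushes.

```python
import zlib

PAGE_SIZE = 32 * 1024    # Exchange 2010 database page size
PAGES_PER_CHUNK = 8      # hypothetical number of pages read per chunk


def make_page(payload):
    """Build a page: a 4-byte CRC32 header followed by zero-padded data."""
    body = payload.ljust(PAGE_SIZE - 4, b"\x00")
    return zlib.crc32(body).to_bytes(4, "little") + body


def scan(database):
    """Return the page numbers whose stored checksum does not match the data."""
    bad_pages = []
    chunk_size = PAGE_SIZE * PAGES_PER_CHUNK
    for chunk_start in range(0, len(database), chunk_size):
        # One large sequential read, then a check of every page inside it.
        chunk = database[chunk_start:chunk_start + chunk_size]
        for offset in range(0, len(chunk), PAGE_SIZE):
            page = chunk[offset:offset + PAGE_SIZE]
            stored = int.from_bytes(page[:4], "little")
            if stored != zlib.crc32(page[4:]):
                bad_pages.append((chunk_start + offset) // PAGE_SIZE)
    return bad_pages


pages = [make_page(f"record {i}".encode()) for i in range(20)]
damaged = bytearray(pages[13])
damaged[100] ^= 0xFF                 # simulate physical corruption in page 13
pages[13] = bytes(damaged)
print(scan(b"".join(pages)))
```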

Note:
In Exchange Server 2007 and earlier versions, the checksumming operations occurred during the backup process. However, this caused a problem for replicated databases because only the copy that was being backed up was checksummed. If the passive copy was backed up, the active copy was not being checksummed. To resolve this issue, a new optional online maintenance task named Online Maintenance Checksum was added to Exchange Server 2007 Service Pack 1 (SP1). For more information, see How to Configure Online Maintenance Database Scanning in Exchange 2007 SP1 and SP2.

In Exchange 2010, online database scanning checksums the database and performs cleanup operations that are needed after an Exchange 2010 store crash. Space can leak because of crashes. Online database scanning finds and recovers lost space. The system in Exchange 2010 is designed with the expectation that every database is fully scanned one time every seven days. A warning event is logged if a database is not completely scanned in this timeframe. In Exchange 2010, there are now two modes to run online database scanning on active database copies:

Run as the last task in the scheduled Mailbox Database Maintenance process: You can configure how long it runs by changing the Mailbox Database Maintenance schedule. You can use this option for smaller databases that are less than 1 terabyte and that require less time to be completely scanned.
Run the default behavior in the background 24 hours a day, seven days a week: This option works well for all database sizes, but we recommend this for large database sizes (1-2 TB). Exchange scans the database no more than one time per day. This read I/O is 100 percent sequential (which makes it easy on the disk), and equates to a scanning rate of about 5 megabytes (MB)/sec on most systems.
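To relate the scanning rate to the seven-day expectation, the following Python sketch estimates how long one full pass takes. It is a rough calculation only: it takes the approximate rate of 5 MB/sec at face value, assumes binary units (1 TB = 1,048,576 MB), and ignores pauses or throttling. The function name is illustrative.

```python
SCAN_RATE_MB_PER_SEC = 5       # approximate rate quoted in this topic
SCAN_WINDOW_DAYS = 7           # every database is expected to be scanned weekly
MB_PER_TB = 1024 * 1024        # assumes binary units
SECONDS_PER_DAY = 24 * 60 * 60


def full_scan_days(database_size_tb):
    """Estimate the days needed for one complete sequential pass."""
    seconds = database_size_tb * MB_PER_TB / SCAN_RATE_MB_PER_SEC
    return seconds / SECONDS_PER_DAY


for size_tb in (0.5, 1, 2):
    days = full_scan_days(size_tb)
    status = "within" if days <= SCAN_WINDOW_DAYS else "exceeds"
    print(f"{size_tb} TB: {days:.1f} days ({status} the {SCAN_WINDOW_DAYS}-day window)")
```

Under these assumptions, a 1-TB database takes roughly 2.4 days and a 2-TB database roughly 4.9 days, so both complete within the seven-day window.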

Note:
The Shell, EMC, and Jetstress refer to database checksumming as background database maintenance. To enable database checksumming in the EMC, select the Enable background database maintenance (24 X 7 ESE scanning) check box in Properties.
To enable database checksumming in the Shell, run the following command: Set-MailboxDatabase -Identity MDB1 -BackgroundDatabaseMaintenance $true
To enable database checksumming in Jetstress 2010, select the Run background database maintenance check box on the Select Test Type page.

Page Patching

Page patching replaces corrupted pages with healthy copies. Corrupted page detection is a function of database checksumming. In addition, corrupted pages are detected at run time when the page is stored in the database cache. Page patching works against highly available (HA) database copies. How a corrupted page is repaired depends on whether the HA database copy is active or passive.

Page patching process on active database copies

One or more corrupted pages are detected.
A marker is written into the active log file. This marker indicates the corrupted page number. It also indicates that the page requires replacement.
An entry is added to the page patch request list.
The active log file is closed.
The Replication service sends the log file to passive database copies.
The Replication service on a target Mailbox server receives the sent log file and inspects it.
The Information Store on the target server replays the log file up to the marker, retrieves its healthy version of the page, invokes the Replication service callback, and then ships the page to the source Mailbox server.
The source Mailbox server receives the healthy version of the page, confirms that an entry exists in the page patch request list, and then writes the page to the log buffer. Correspondingly, the page is inserted into the database cache.
The corresponding entry in the page patch request list is removed.
At this point, the database is considered patched. (At some later point, the checkpoint will advance, the database cache will be flushed, and the corrupted page on disk will be overwritten.)
Any other copy of this page (received from another passive copy) will be silently dropped. This is because no corresponding entry exists in the page patch request list.
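The role of the page patch request list in the steps above can be illustrated with a simplified model. The following Python sketch is not Exchange code: the class, method, and server names are hypothetical, and log shipping and log replay are reduced to direct method calls. It shows only why the first healthy page is applied and later copies of the same page are dropped.

```python
class ActiveCopy:
    """Toy model of the active copy's side of page patching."""

    def __init__(self):
        self.patch_requests = set()   # page numbers waiting for a healthy copy
        self.cache = {}               # page number -> page contents
        self.log = []                 # simplified stand-in for the log stream

    def detect_corruption(self, page_no):
        # Write a marker to the log and remember the outstanding request.
        self.log.append(("patch-request", page_no))
        self.patch_requests.add(page_no)

    def receive_page(self, page_no, data, sender):
        # Accept a page only if a request for it is still outstanding.
        if page_no not in self.patch_requests:
            return False              # silently dropped
        self.log.append(("page-patch", page_no, sender))
        self.cache[page_no] = data
        self.patch_requests.discard(page_no)
        return True


active = ActiveCopy()
active.detect_corruption(42)
first = active.receive_page(42, b"healthy", sender="passive-1")
second = active.receive_page(42, b"healthy", sender="passive-2")
print(first, second)
```

The first call returns True because page 42 is on the request list. The second call returns False because the entry was removed when the first copy was applied.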

Page patching process on passive database copies

On the Mailbox server on which the corrupted pages are detected, log replay is paused for the affected database copy.
The Replication service coordinates with the Mailbox server that is hosting the active database copy, and it retrieves the corrupted pages and the required log range from the active copy’s database header.
The Mailbox server updates the database header for the affected database copy, and it inserts the new required log range.
The Mailbox server notifies the Mailbox server that is hosting the active database copy about which log files it requires.
The Mailbox server receives the required log files, and it inspects them.
The Mailbox server injects the healthy versions of the database pages it retrieved from the active database copy. The pages are written to the log buffer. Correspondingly, the pages are inserted into the database cache.
The Mailbox server resumes log replay.

Page Zeroing

Database Page Zeroing is a security measure by which deleted pages in the database are overwritten with a pattern (zeroed). This makes discovering the data much more difficult.

In Exchange Server 2007 and earlier versions, page zeroing operations occur during the streaming backup process. Because they occur during the streaming backup process, page zeroing does not cause the generation of log files. This raises an issue for replicated databases because the passive copies never have their pages zeroed. Also, the active copies have their pages zeroed only if a streaming backup is performed. In Exchange Server 2007 SP1, we introduced a new optional online maintenance task to address this issue: Zero Database Pages during Checksum. When Zero Database Pages during Checksum is enabled, this task zeroes out pages during the online maintenance window, and then logs the changes. The changes are then replicated to the passive copies.

However, in the Exchange Server 2007 SP1 implementation, the zeroing process occurs during a scheduled maintenance window. This creates a delay between the time that a page is deleted and the time that it is zeroed. To remove this delay, the page zeroing task is a runtime event that operates continuously in Exchange Server 2010 SP1. Typically, the task now zeroes out pages at transaction time when a hard delete occurs.

In addition, database pages can be scrubbed during the online checksum process. The pages targeted in this case are as follows:

Deleted records that couldn’t be scrubbed during runtime because of dropped tasks (if the system is too overloaded) or because the store crashed before the tasks got to scrub the data.
Deleted tables and secondary indices. When these are deleted, we do not scrub their contents. Therefore, online checksum detects that these pages don’t belong to any valid object any longer and it scrubs the pages.

For more information about page zeroing in Exchange 2010, see Understanding Exchange 2010 Page Zeroing.

Use performance counters to track the background maintenance tasks

In Exchange Server 2010, events are not recorded for the defragmentation and compaction maintenance tasks. However, you can use performance counters to track the background maintenance tasks. The following table describes the performance counters to use under the MSExchange Database ==> Instances object.

Counter	Description
Database Maintenance Duration	The number of seconds that have passed since the maintenance started for this database. If the value is 0, maintenance has been finished for the day.
Database Maintenance Pages Bad Checksums	The number of non-correctable page checksums encountered during a database maintenance pass
Defragmentation Tasks	The count of background database defragmentation tasks that are currently running
Defragmentation Tasks Completed/sec	The rate at which background database defragmentation tasks are being finished

The following table describes the page zeroing counters to use under the MSExchange Database object:

Counter	Description
Database Maintenance Pages Zeroed	Indicates the number of pages zeroed by the database engine since the performance counter was invoked
Database Maintenance Pages Zeroed/sec	Indicates the rate at which pages are zeroed by the database engine

White space

In a database, you can have thousands of tables. You can have at least one table for every folder in every mailbox. The messages tables, folders tables, and attachments tables represent 90 percent of the space that is used in the database. These tables have the greatest percentage of free space (also known as white space) in the database.

To determine how much white space exists in a database, follow these steps:

Unmount the database.
Complete a space dump by using the Exchange Server Database Utilities (Eseutil) tool together with the /MS switch. For more information about how to do this, see How to Run Eseutil /M in File Dump Mode.

At the end of the dump file is a line similar to the following:

-----------------------------------------------------------------------

253
This is a summation of the total number of pages that are available in all the tables. Multiply this value by 32 KB (the Exchange 2010 database page size) to determine the true amount of white space in the database.
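As a worked example of this multiplication, the following Python sketch converts the available page count from the sample dump line (253) into megabytes. The function name is illustrative, and the calculation assumes the 32-KB page size used by Exchange 2010 databases.

```python
PAGE_SIZE_KB = 32   # Exchange 2010 database page size


def white_space_mb(available_pages):
    """Convert a count of available pages into megabytes of white space."""
    return available_pages * PAGE_SIZE_KB / 1024


print(white_space_mb(253))   # 253 pages x 32 KB = 8,096 KB
```

For 253 available pages, the result is about 7.9 MB of white space.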

Note:
For an example of the Eseutil dump file, see Determining the True Amount of Space in an Exchange Database.

After you determine how much white space is in a database, you may also want to reclaim the white space.

If you encounter a database that has a significant amount of white space, and you do not expect that the typical operations will reclaim the space, we recommend the following steps:

Create a new database and its associated database copies.
Move all mailboxes to the new database.
Delete the original database and its associated database copies.