Applies to: Exchange Server 2010 SP3, Exchange Server 2010 SP2

Topic Last Modified: 2012-11-21

You can use regular expressions in Microsoft Exchange Server 2010 transport rule predicates to match text patterns in different parts of a message (such as message headers, sender, recipients, message subject, and body). Predicates are used by conditions and exceptions to determine whether a configured action should be applied to an e-mail message.

Looking for management tasks related to transport rules? See Managing Transport Rules.

Contents

Simple Expressions vs. Regular Expressions

Regular Expressions in Exchange 2010

Creating a Transport Rule That Uses a Regular Expression

Simple Expressions vs. Regular Expressions

To understand regular expressions, you must first understand simple expressions. A simple expression is a specific value that you want to match exactly in a message. Predicates using simple expressions match specific words or strings. An example of a simple expression is the title of a document that your organization doesn't want to be distributed outside the organization, such as Yearly Sales Forecast.doc. A piece of data in an e-mail message must exactly match a simple expression to satisfy a condition or exception in transport rules.

A regular expression is a concise and flexible notation for finding patterns of text in a message. The notation consists of two basic character types:

  • Literal characters   Text that must exist in the target string. These are normal characters, as typed.

  • Metacharacters   One or more special characters that aren't interpreted literally. These indicate how the text can vary in the target string.

You can use regular expressions to quickly parse e-mail messages to find specific text patterns. This enables you to detect messages with specific types of content, such as social security numbers (SSNs), patent numbers, and phone numbers.

You can't reasonably match this data with a simple expression because a simple expression requires that you enter every possible variation of the value that you want to detect. In many cases, using simple expressions for such applications becomes a logistical challenge, and matching a large number of simple expressions in message content can be resource-intensive. Using regular expressions is generally more efficient. Instead of specifying all possible variations, you can configure the transport rule predicate to search for a text pattern.
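The difference can be sketched locally in PowerShell. This is only an illustration: the subjects and the seven-digit "patent number" format are hypothetical, and PowerShell's `-match` operator uses the .NET regular expression engine rather than the transport rule matcher, so it shows the idea and not Exchange's exact behavior.

```powershell
# Hypothetical message subjects, used only for illustration.
$subjects = @(
    'FW: Yearly Sales Forecast.doc',
    'Patent 7123456 draft',
    'Patent 8654321 draft'
)

# Simple expression: one fixed value, so it finds only that exact value.
$simple = 'Patent 7123456'
$simpleHits = @($subjects | Where-Object { $_.Contains($simple) })

# Regular expression: one pattern covers every value with the same shape.
$pattern = 'Patent \d\d\d\d\d\d\d'
$patternHits = @($subjects | Where-Object { $_ -match $pattern })

"Simple expression hits: $($simpleHits.Count)"
"Pattern hits:           $($patternHits.Count)"
```

Covering the second patent number with simple expressions would require adding it as another value, and so on for every number that might appear.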

Regular Expressions in Exchange 2010

In the Exchange Management Shell, you can use regular expressions in any predicate that accepts the Patterns predicate property. In the Exchange Management Console, you can use regular expressions with any condition or exception whose name contains the words "with text patterns". For more information about predicates, see Transport Rule Predicates.

Caution:
You must carefully test the regular expressions that you construct to make sure that they yield the expected results. An incorrectly configured regular expression could yield unexpected matches and cause unwanted transport rule behavior. This may result in undesirable actions being taken on messages and message content, potentially resulting in data loss when actions such as rejecting or bouncing a message are used. Also, complex regular expressions may affect mail transport performance. Test your regular expressions in a test environment before you implement them in production.

The following table lists the pattern strings that you can use to create a pattern-matching regular expression in Exchange 2010.

Pattern strings

Pattern string Description

\S

The \S pattern string matches any single character that's not a white-space character.

\s

The \s pattern string matches any single white-space character.

\D

The \D pattern string matches any single character that's not a numeric digit.

\d

The \d pattern string matches any single numeric digit.

\w

The \w pattern string matches any single Unicode character categorized as a letter or decimal digit.

\W

The \W pattern string matches any single Unicode character not categorized as a letter or a decimal digit.

|

The pipe ( | ) character performs an OR function.

*

The asterisk ( * ) character matches zero or more instances of the previous character. For example, ab*c matches the following strings: ac, abc, abbbbc.

( )

Parentheses act as grouping delimiters. For example, a(bc)* matches the following strings: a, abc, abcbc, abcbcbc, and so on.

\

A backslash is used as an escaping character before a special character. Special characters are characters used in pattern strings:

  • Backslash ( \ )

  • Pipe ( | )

  • Asterisk ( * )

  • Opening parenthesis ( ( )

  • Closing parenthesis ( ) )

  • Caret ( ^ )

  • Dollar sign ( $ )

For example, if you want to match a string that contains (525), you would type \(525\).

^

The caret ( ^ ) character indicates that the pattern string that follows the caret must exist at the start of the text string being matched.

For example, ^fred@contoso matches fred@contoso.com and fred@contoso.co.uk but not alfred@contoso.com.

$

The dollar sign ( $ ) character indicates that the preceding pattern string must exist at the end of the text string being matched.

For example, contoso.com$ matches adam@contoso.com and kim@research.contoso.com, but doesn't match kim@contoso.com.au.
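Several of the examples in the table can be tried locally before they go into a rule. The following sketch assumes that PowerShell's `-match` operator is an acceptable stand-in for a first check; it uses the .NET engine, not the transport rule matcher. One known difference: the table doesn't list the period as a special character, but .NET treats an unescaped period as "any character", so the sketch writes `contoso\.com$` to keep the period literal. The input strings are taken from the table or made up for illustration.

```powershell
# Each case: a pattern, an input string, and the result the table describes.
$cases = @(
    @{ Pattern = 'ab*c';          Text = 'ac';                       Expected = $true  },
    @{ Pattern = 'ab*c';          Text = 'abbbbc';                   Expected = $true  },
    @{ Pattern = '\(525\)';       Text = 'Extension (525)';          Expected = $true  },
    @{ Pattern = '\(525\)';       Text = 'Extension 525';            Expected = $false },
    @{ Pattern = '^fred@contoso'; Text = 'fred@contoso.co.uk';       Expected = $true  },
    @{ Pattern = '^fred@contoso'; Text = 'alfred@contoso.com';       Expected = $false },
    @{ Pattern = 'contoso\.com$'; Text = 'kim@research.contoso.com'; Expected = $true  },
    @{ Pattern = 'contoso\.com$'; Text = 'kim@contoso.com.au';       Expected = $false }
)

# Run every case and note whether the local result agrees with the table.
$results = foreach ($case in $cases) {
    $actual = $case.Text -match $case.Pattern
    New-Object PSObject -Property @{
        Pattern     = $case.Pattern
        Text        = $case.Text
        Matched     = $actual
        AsDescribed = ($actual -eq $case.Expected)
    }
}

$results | Format-Table Pattern, Text, Matched, AsDescribed -AutoSize
```

A row with AsDescribed set to False would indicate either a typing mistake in the pattern or a difference between the two engines, and is worth investigating before the pattern is used in a rule.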

Constructing Regular Expressions

By using the preceding table, you can construct a regular expression that matches the pattern of the data that you want to match. Working from left to right, examine each character or group of characters in the data that you want to match. Read the description of each pattern string to determine how it's applied to the data that you're matching. Then, determine which pattern string in the table represents that character or group of characters, and add that pattern string to the regular expression. When finished, you have a fully constructed regular expression.

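This example of a regular expression matches North American telephone numbers with the area code 425 in the formats 425 555-0100 and 425.555.0100. The area code is written as literal characters, so numbers with other area codes don't match.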

425(\s|.)\d\d\d(-|.)\d\d\d\d
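A quick local check of this pattern might look like the following. It is a sketch under two assumptions: `-match` (the .NET engine) is used in place of the transport rule matcher, and the periods are escaped as `\.` because .NET would otherwise treat them as "any character". The sample values are hypothetical.

```powershell
# .NET translation of the pattern above: periods are escaped to stay literal.
$pattern = '425(\s|\.)\d\d\d(-|\.)\d\d\d\d'

# Hypothetical sample values; the last two are not expected to match.
$samples = '425 555-0100', '425.555.0100', '(425) 555-0100', '206 555-0100'

# Record the outcome per sample so it can be inspected afterwards.
$outcome = @{}
foreach ($sample in $samples) {
    $outcome[$sample] = $sample -match $pattern
}

$samples | ForEach-Object { '{0,-16} {1}' -f $_, $outcome[$_] }
```

The value with parentheses around the area code is not matched because the pattern allows only a space or a period after 425, and the value with area code 206 is not matched because 425 is literal.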

You can expand on this example by adding the telephone format (425) 555-0100, which uses parentheses around the area code. This example of a regular expression matches all three telephone number formats.

\d\d\d(\s|.|-|\)|\)\s)\d\d\d(\s|.|-)\d\d\d\d

You can analyze the previous example as follows:

  • \d\d\d   This portion requires that exactly three numeric digits appear first.

  • (\s|.|-|\)|\)\s)   This portion requires that one separator follows the three-digit area code: a space, a period, a hyphen, a closing parenthesis, or a closing parenthesis followed by a space. The alternatives are contained in the grouping delimiters and separated by the pipe character, which means that only one of them can exist at this location in the string being matched. The opening parenthesis of (425) isn't part of the pattern; the match starts at the first digit.

  • \d\d\d   This portion requires that exactly three numeric digits appear next.

  • (\s|.|-)   This portion requires that a space, a period, or a hyphen exists after the three-digit number.

  • \d\d\d\d   This portion requires that exactly four numeric digits appear next.

The above regular expression will match the following sample values:

  • (425)555.0100

  • 425 555 0100

  • 425.555-0100

  • 425-555-0100

  • (425) 555-0100
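The three-format pattern can be checked locally in the same way. The sketch below is an approximation, not the transport rule matcher: it uses `-match` (the .NET engine), escapes each period as `\.` so that it stays literal, and writes the first separator group with a single opening parenthesis so that the grouping delimiters balance. The sample values are chosen for illustration.

```powershell
# .NET translation of the three-format pattern, with literal periods.
$pattern = '\d\d\d(\s|\.|-|\)|\)\s)\d\d\d(\s|\.|-)\d\d\d\d'

# Sample values; the last one has no separators and should not match.
$samples = @(
    '425 555-0100',
    '425.555.0100',
    '(425) 555-0100',
    '(425)555.0100',
    '425-555-0100',
    '4255550100'
)

# Split the samples into those the pattern finds and those it misses.
$matched = @($samples | Where-Object { $_ -match $pattern })
$missed  = @($samples | Where-Object { $_ -notmatch $pattern })

"Matched: $($matched -join ', ')"
"Missed:  $($missed -join ', ')"
```

Because the pattern isn't anchored with a caret or dollar sign, it is found inside longer text, which is why the values that begin with an opening parenthesis still match.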

Creating a Transport Rule That Uses a Regular Expression

This example creates a transport rule in the Shell that uses a regular expression to match SSNs in the subject or body of an e-mail message, and rejects any message that matches.

New-TransportRule -Name "Social Security Number Block Rule" -SubjectOrBodyMatchesPatterns '\d\d\d-\d\d-\d\d\d\d' -RejectMessageEnhancedStatusCode "5.7.1" -RejectMessageReasonText "This message has been rejected because of content restrictions" 

This example lets you view the new transport rule.

Get-TransportRule "Social Security Number Block Rule" | Format-List
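Because the rule rejects every message that matches, it can help to try the pattern against representative text before the rule is created. The following is a minimal local sketch with hypothetical subjects. It relies on PowerShell's `-match` operator, which uses the .NET engine, so the results are indicative only and don't replace testing the rule itself in a test environment.

```powershell
# The same pattern that the rule above passes to -SubjectOrBodyMatchesPatterns.
$pattern = '\d\d\d-\d\d-\d\d\d\d'

# Hypothetical subjects: an SSN-shaped value, an order number, a phone number.
$subjects = @(
    'Benefits form for 123-45-6789',
    'Order 1234-56-78901 shipped',
    'Call 425-555-0100 today'
)

# Report which subjects the pattern would find a match in.
$report = foreach ($subject in $subjects) {
    New-Object PSObject -Property @{
        Subject    = $subject
        WouldMatch = ($subject -match $pattern)
    }
}

$report | Format-Table Subject, WouldMatch -AutoSize
```

In this local check the order number also matches, because the pattern is found inside the longer run of digits (234-56-7890). Unexpected matches of this kind are what the earlier caution about testing is meant to catch.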