
Using Regular Expressions in Your Filter Rules


The Content Filtering system supports regular expression searches, a versatile syntax that lets you search not only for specific text strings but also for text patterns. Regular expressions mix plain text with special characters that indicate what kind of matching to perform, which can make your Content Filter rules more powerful and better targeted.

What are Regular Expressions?

A regular expression (regexp) is a text pattern made up of special characters, known as metacharacters, and alphanumeric text characters, or "literals" (abc, 123, and so on). The pattern is matched against text strings, and each match attempt either succeeds or fails. Regexps are used primarily for text matching and for search and replace.

Metacharacters are special characters that have specific functions within regular expressions. The regexp implementation in the MDaemon Content Filtering system supports the following metacharacters:

\ | () [] ^ $ * + ? . <>

Metacharacter

Description

\

When used before a metacharacter, the backslash ("\") causes the metacharacter to be treated as a literal character. This is necessary if you want the regular expression to search for one of the characters that are used as metacharacters. For example, to search for a literal "+", your expression must include "\+".

|

The alternation character (also called "or" or "bar") is used when you want the expression on either side of the character to match the target string. The regexp "abc|xyz" will match any occurrence of either "abc" or "xyz" when searching a text string.

[...]

A set of characters contained in brackets ("[" and "]") means that any single character in the set may match the searched text string. A dash ("-") between characters in the brackets denotes a range of characters. For example, searching the string "abc" with the regexp "[a-z]" will yield three matches: "a", "b", and "c". Using the expression "[az]" will yield only one match: "a".

^

Denotes the beginning of the line. In the target string "abc ab a", the expression "^a" will yield one match: the first character in the target string. The regexp "^ab" will also yield one match: the first two characters in the target string.

[^...]

The caret ("^") immediately following the left-bracket ("[") has a different meaning. It is used to exclude the remaining characters within brackets from matching the target string. The expression "[^0-9]" indicates that the target character should not be a digit.

(...)

Parentheses group part of a pattern, which affects the order of pattern evaluation, and also mark that part as a tagged expression that can be used in search and replace expressions.

The results of a search with a regular expression are kept temporarily and can be used in the replace expression to build a new string. In the replace expression, you can include "&" or "\0", which will be replaced by the entire sub-string found by the regular expression during the search. So, if the search expression "a(bcd)e" finds a sub-string match, then a replace expression of "123-&-123" or "123-\0-123" will replace the matched text with "123-abcde-123".

Similarly, you can use the special characters "\1", "\2", "\3", and so on in the replace expression. Each of these is replaced only by the result of the corresponding tagged expression instead of the entire sub-string match. The number following the backslash denotes which tagged expression you wish to reference (in the case of a regexp containing more than one tagged expression). For example, if your search expression is "(123)(456)" and your replace expression is "a-\2-b-\1", then a matching sub-string will be replaced with "a-456-b-123", whereas a replace expression of "a-\0-b" will replace it with "a-123456-b".

$

The dollar sign ("$") denotes the end of the line. In the text string "13 321 123", the expression "3$" will yield one match: the last character in the string. The regexp "123$" will also yield one match: the last three characters in the target string.

*

The asterisk ("*") quantifier indicates that the character to its left may occur zero or more times in a row. Thus, "1*abc" will match both "111abc" and "abc".

+

Similar to the asterisk quantifier, the "+" quantifier indicates that the character to its left must occur one or more times in a row. Thus, "1+abc" will match "111abc" but not "abc".

?

The question mark ("?") quantifier indicates that the character to its left may occur zero times or once. Thus, "1?abc" will match the text "abc", and it will match the "1abc" portion of "111abc".

.

The period or dot (".") metacharacter will match any single character. Thus ".+abc" will match "123456abc", and "a.c" will match "aac", "abc", "acc", and so on.
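
The examples from the table above can be reproduced outside MDaemon to see exactly what each metacharacter matches. The following sketch uses Python's `re` module as an assumed stand-in: it is not the PCRE library MDaemon uses, but for these simple patterns it behaves the same, with one noted exception (Python spells the whole-match reference "\g<0>" rather than "&" or "\0"). The variable names are illustrative only; `re.search` finds the first match anywhere in a string, while `re.fullmatch` requires the pattern to cover the whole string.

```python
import re

# Assumption: Python's re module stands in for MDaemon's PCRE library; the
# patterns below behave the same in both except where a comment says otherwise.

# \      Escape: "\+" matches a literal plus sign, not a quantifier.
escaped = re.findall(r"1\+1", "1+1=2")                        # ['1+1']

# |      Alternation: either side may match.
alternation = re.findall(r"abc|xyz", "abc 123 xyz")           # ['abc', 'xyz']

# [...]  A range versus an explicit set of characters.
char_range = re.findall(r"[a-z]", "abc")                      # ['a', 'b', 'c']
char_set = re.findall(r"[az]", "abc")                         # ['a']

# [^...] Negated set: any character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")                    # ['a', 'b']

# ^ $    Anchors; each (start, end) pair shows where the single match lies.
start_a = [m.span() for m in re.finditer(r"^a", "abc ab a")]       # [(0, 1)]
start_ab = [m.span() for m in re.finditer(r"^ab", "abc ab a")]     # [(0, 2)]
end_3 = [m.span() for m in re.finditer(r"3$", "13 321 123")]       # [(9, 10)]
end_123 = [m.span() for m in re.finditer(r"123$", "13 321 123")]   # [(7, 10)]

# * + ?  Quantifiers applied to the "1" on their left.
star = [re.search(r"1*abc", s).group() for s in ("111abc", "abc")]       # ['111abc', 'abc']
plus = [re.search(r"1+abc", s) is not None for s in ("111abc", "abc")]   # [True, False]
question = [re.search(r"1?abc", s).group() for s in ("abc", "111abc")]   # ['abc', '1abc']

# .      Any single character (by default, not a newline in Python's re).
dot_plus = re.fullmatch(r".+abc", "123456abc") is not None    # True
dot = [re.fullmatch(r"a.c", s) is not None for s in ("aac", "abc", "acc", "ac")]
# [True, True, True, False]

# (...)  Tagged expressions in a replace expression. Python's re.sub does not
# treat "\0" as the whole match, so MDaemon's "&" or "\0" becomes "\g<0>" here;
# "\1", "\2", and so on work the same way.
whole = re.sub(r"a(bcd)e", r"123-\g<0>-123", "abcde")         # '123-abcde-123'
swapped = re.sub(r"(123)(456)", r"a-\2-b-\1", "123456")       # 'a-456-b-123'
whole_pair = re.sub(r"(123)(456)", r"a-\g<0>-b", "123456")    # 'a-123456-b'
```

Running a pattern against sample text this way is a quick cross-check before entering it in a rule; the Test dialogs described below remain the authoritative check, since they use MDaemon's own engine.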

Eligible Conditions and Actions

Regular expressions may be used in any Header filter rule Condition, for example any rule using the "if the FROM HEADER contains" condition. Regular expressions may also be used in the "if the MESSAGE BODY contains" condition.

Regular expressions may be used in two Content Filter rule Actions: "Search and Replace Words in a Header" and "Search and Replace Words in the Message Body."

Regular expressions used in Content Filter rule conditions are always case insensitive.

Case sensitivity in regular expressions used in Content Filter rule actions is optional. When creating the regexp within the rule's action, you can enable or disable case sensitivity with the "Match case" option.
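
A minimal sketch of the two case-sensitivity behaviors, again assuming Python's `re` module as a stand-in for MDaemon's engine: the condition is modeled with `re.IGNORECASE` always on, while the action's "Match case" option decides whether that flag is passed. The sample subject text and variable names are made up for illustration.

```python
import re

subject = "FREE Offer Inside"  # illustrative header value, not real data

# Rule condition: always case-insensitive, so "free offer" finds "FREE Offer".
condition_hit = re.search(r"free offer", subject, flags=re.IGNORECASE) is not None

# Rule action with "Match case" off: the flag is passed, so "FREE" is replaced.
action_ignore_case = re.sub(r"free", "[filtered]", subject, flags=re.IGNORECASE)

# Rule action with "Match case" on: no flag, so "free" does not match "FREE".
action_match_case = re.sub(r"free", "[filtered]", subject)

print(condition_hit)       # True
print(action_ignore_case)  # [filtered] Offer Inside
print(action_match_case)   # FREE Offer Inside
```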

Configuring a Regexp in a Rule's Condition

To configure a header or message body condition to use a regular expression:

1. On the Create Rule dialog, click the checkbox that corresponds to the header or message body condition that you wish to insert into your rule.
2. In the summary area at the bottom of the Create Rule dialog, click the "contains specific strings" link that corresponds to the condition that you selected in step 1. This will open the Specify Search Text dialog.
3. Click the "contains" link in the "Currently specified strings..." area.
4. Choose "Matches Regular Expression" from the drop-down list box, and click OK.
5. If you need help creating your regexp or want to test it, click "Test regular expression." If you do not need the Test Regular Expression dialog, type your regexp into the text box provided, click Add, and then go to step 8.
6. Type your regular expression into the "Search expression" text box. To simplify the process, we have provided a shortcut menu that makes it easy to insert the desired metacharacters into your regexp. Click the ">" button to access this menu. When you choose an option from this menu, its corresponding metacharacter is inserted into the expression and the text insertion point is moved to the position that character requires.
7. Type any text that you wish to use to test your expression in the text area provided, and click Test. When you are finished testing your expression, click OK.
8. Click OK.
9. Continue creating your rule normally.

Configuring a Regexp in a Rule's Action

To configure a "Search and Replace Words in…" action to use a regular expression:

1. On the Create Rule dialog, click the checkbox that corresponds to the "Search and Replace Words in..." action that you wish to insert into your rule.
2. In the summary area at the bottom of the Create Rule dialog, click the "specify information" link that corresponds to the action that you selected in step 1. This will open the Search and Replace dialog.
3. If you chose the "Search...header" action in step 1, then use the drop-down list box provided to choose the header that you wish to search, or type a header into the box if the desired header isn't listed. If you did not choose the "Search...header" action in step 1 then skip this step.
4. Type the search expression that you wish to use in this action. To simplify the process, we have provided a shortcut menu that makes it easy to insert the desired metacharacters into your regexp. Click the ">" button to access this menu. When you choose an option from this menu, its corresponding metacharacter is inserted into the expression and the text insertion point is moved to the position that character requires.
5. Type the replace expression that you wish to use in this action. As with the search expression, a metacharacter shortcut menu is provided for this option as well. Leave this text box blank if you wish to delete a matched sub-string instead of replacing it with other text.
6. Click "Match case" if you want the expression to be case sensitive.
7. Click "Regular expression" if you want the search and replace strings to be treated as regular expressions. Otherwise, each will be treated as a simple sub-string search and replace: it will look for an exact literal match of the text rather than process it as a regular expression.
8. If you do not need to test your expression, skip this step. Otherwise, click "Run Test." On the Search and Replace Tester dialog, type your search and replace expressions and the text that you wish to test with, then click Test. When you are finished testing your regexps, click OK.
9. Click OK.
10. Continue creating your rule normally.
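
The "Match case" and "Regular expression" options in the steps above can be modeled with a small hypothetical function. `search_and_replace`, `match_case`, and `use_regex` are names invented for this sketch, not MDaemon settings or APIs, and Python's `re` module stands in for PCRE. With "Regular expression" off, the search text is escaped with `re.escape` and the replacement is inserted verbatim, which approximates the literal sub-string search and replace described in step 7.

```python
import re


def search_and_replace(search: str, replace: str, text: str,
                       match_case: bool = False, use_regex: bool = True) -> str:
    """Hypothetical model of a "Search and Replace Words in..." action.

    match_case mirrors the "Match case" option and use_regex mirrors the
    "Regular expression" option; neither name is an MDaemon setting.
    """
    flags = 0 if match_case else re.IGNORECASE
    if use_regex:
        # Regex mode: tagged expressions such as "\1" and "\2" are expanded.
        return re.sub(search, replace, text, flags=flags)
    # Literal mode: escape metacharacters in the search text, and return the
    # replacement verbatim (via a function) so backslashes in it are not expanded.
    return re.sub(re.escape(search), lambda _match: replace, text, flags=flags)


# Regex mode: swap the two tagged expressions.
print(search_and_replace(r"(\d{3})-(\d{4})", r"\2-\1", "Call 555-1234"))
# Literal mode: "+" is an ordinary character; case is ignored by default.
print(search_and_replace("a+b", "sum", "A+B and a+b", use_regex=False))
# Literal mode with "Match case" on: only the lowercase "a+b" is replaced.
print(search_and_replace("a+b", "sum", "A+B and a+b", match_case=True, use_regex=False))
# A blank replacement deletes the matched sub-string.
print(search_and_replace(r"\s*\[SPAM\]", "", "Hello [SPAM]"))
```

Passing an empty replacement, as in the last call, mirrors leaving the replace text box blank in step 5. Note that Python writes the whole-match reference as "\g<0>", so a replace expression using "&" or "\0" would need that adjustment before being tried in this sketch.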

MDaemon's regexp implementation uses the Perl Compatible Regular Expressions (PCRE) library. You can find more information on this implementation of regexps at: http://www.pcre.org/ and http://perldoc.perl.org/perlre.html.

For a comprehensive look at regular expressions, see Mastering Regular Expressions, Third Edition, published by O'Reilly Media, Inc.