Bayesian Classification

Bayesian Classification is unavailable when you have configured MDaemon to use another server's MDaemon Spam Daemon (MDSpamD) for Spam Filter processing. All Bayesian learning will be performed on the other server. See the Spam Daemon screen for more information.

The Spam Filter supports Bayesian learning, a statistical process that can optionally be used to analyze spam and non-spam messages in order to increase the reliability of spam recognition over time. You can designate one folder for spam messages and another for non-spam messages, and these folders can be scanned manually or automatically at regular intervals. All of the messages in those folders will be analyzed and indexed so that new messages can be compared to them statistically to determine the likelihood that they are spam. The Spam Filter can then increase or decrease a message's spam score based upon the results of its Bayesian comparison.

The Spam Filter will not apply a Bayesian classification to messages until a Bayesian analysis has been performed on the number of spam and non-spam messages designated on the Bayesian Auto-learning screen. This is necessary in order for the Spam Filter to have a sufficient pool of statistics to draw from when making the Bayesian comparison. Once you have given the system these messages to analyze, it will be sufficiently equipped to begin applying the results of a Bayesian comparison to each incoming message's spam score. As the system analyzes more messages, its Bayesian classifications will become more accurate.
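The general idea behind this kind of statistical comparison can be illustrated with a toy naive Bayes classifier. This is only a conceptual sketch and not the Spam Filter's actual algorithm: the function names, the whitespace tokenizer, the add-one smoothing, and the `min_trained` threshold (standing in for the message counts set on the Bayesian Auto-learning screen) are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Count, for each token, how many messages contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham, min_trained=2):
    """Estimate the probability that `text` is spam.

    Returns None until at least `min_trained` spam and non-spam messages
    have been analyzed (a hypothetical stand-in for the required minimum).
    """
    if n_spam < min_trained or n_ham < min_trained:
        return None
    # Start from the prior odds, then add per-token evidence in log space.
    log_odds = math.log(n_spam / n_ham)
    for token in set(text.lower().split()):
        p_spam = (spam_counts[token] + 1) / (n_spam + 2)  # add-one smoothing
        p_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))


spam = ["cheap pills now", "cheap watches now"]
ham = ["meeting agenda attached", "lunch meeting tomorrow"]
spam_counts, ham_counts = train(spam), train(ham)

print(spam_probability("cheap pills", spam_counts, ham_counts, len(spam), len(ham)))
print(spam_probability("meeting agenda", spam_counts, ham_counts, len(spam), len(ham)))
```

In this sketch a message that shares tokens with the known spam yields a probability above 0.5, and one that resembles the known non-spam yields a probability below 0.5. A filter could then raise or lower the message's spam score accordingly.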

Bayesian Classification

Enable Bayesian classification

Click this check box if you want each message's spam score to be adjusted based on a comparison to the currently known Bayesian statistics.

Schedule Bayesian learning for midnight each night

When this option is active, once each day at midnight the Spam Filter will analyze and then delete all messages contained in the spam and non-spam folders specified below. If you wish to schedule Bayesian learning for some other time interval then clear this option and use the Schedule Bayesian learning for once every XX hours option below. If you do not wish Bayesian learning to ever occur automatically, then clear this option and specify "0" hours in the option below.

Schedule Bayesian learning for once every XX hours (0=never)

If you wish Bayesian learning to occur at some time interval other than once each night at midnight, then clear the above option and specify a number of hours in this option instead. Each time that number of hours has elapsed, the Spam Filter will analyze and then delete all messages contained in the spam and non-spam folders specified below. If you do not wish Bayesian learning to ever occur automatically, then clear the above option and specify "0" hours in this option.
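The interaction between the two scheduling options can be summarized in a short sketch. This only models the behavior described above and is not MDaemon's implementation. The function and parameter names are hypothetical, and giving the midnight option precedence when both are set is an assumption.

```python
from datetime import datetime, timedelta


def next_learning_run(now, last_run, midnight_enabled, interval_hours):
    """Return when the next automatic learning pass would occur, or None."""
    if midnight_enabled:
        # Nightly option: the next run is the upcoming midnight.
        tomorrow = (now + timedelta(days=1)).date()
        return datetime.combine(tomorrow, datetime.min.time())
    if interval_hours == 0:
        # Midnight option cleared and "0" hours: never learn automatically.
        return None
    # Interval option: run once the given number of hours has elapsed.
    return last_run + timedelta(hours=interval_hours)


now = datetime(2024, 1, 1, 15, 0)
print(next_learning_run(now, now, True, 0))
print(next_learning_run(now, now, False, 6))
print(next_learning_run(now, now, False, 0))
```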

If for some reason you do not want the messages to be deleted after they are analyzed then you can prevent that by copying LEARN.BAT to MYLEARN.BAT in the \MDaemon\App\ subfolder and then deleting the two lines that begin with "if exist" near the bottom in that file. When the MYLEARN.BAT file is present in that folder MDaemon will use it instead of LEARN.BAT. See SA-Learn.txt in your \MDaemon\SpamAssassin\ subfolder for more information.
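The copy-and-edit step described above could be scripted roughly as follows. This is a sketch under assumptions: it treats "the two lines that begin with 'if exist' near the bottom" as the last two such lines in the file, and the function names are hypothetical. The contents of LEARN.BAT are not shown here, so review the resulting MYLEARN.BAT by hand before relying on it.

```python
from pathlib import Path


def drop_last_if_exist_lines(batch_text):
    """Remove the last two lines that begin with "if exist" (case-insensitive)."""
    lines = batch_text.splitlines(keepends=True)
    hits = [i for i, line in enumerate(lines)
            if line.lstrip().lower().startswith("if exist")]
    to_drop = set(hits[-2:])  # assumption: the two lines nearest the bottom
    return "".join(line for i, line in enumerate(lines) if i not in to_drop)


def make_mylearn(app_dir):
    """Write MYLEARN.BAT next to LEARN.BAT in the given App folder."""
    source = Path(app_dir) / "LEARN.BAT"
    target = Path(app_dir) / "MYLEARN.BAT"
    # newline="" preserves the file's original line endings on read and write.
    with open(source, newline="") as handle:
        text = handle.read()
    with open(target, "w", newline="") as handle:
        handle.write(drop_last_if_exist_lines(text))
    return target
```

To use it, call `make_mylearn` with the full path to your \MDaemon\App\ subfolder. LEARN.BAT itself is left unchanged.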

For more detailed information on heuristic spam filtering technology and Bayesian learning, visit:

http://www.spamassassin.org/doc/sa-learn.html

Don't learn from messages larger than XX bytes (0=no limit)

Use this option to designate a maximum message size for Bayesian analysis. Messages larger than this value will not be analyzed. Specify "0" in this option if you do not wish to implement any size restriction.
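The size rule, including the special meaning of "0", amounts to a one-line check. The function name below is hypothetical and the sketch only restates the option's documented behavior.

```python
def within_learning_limit(message_size, max_bytes):
    """Return True if a message of `message_size` bytes would be analyzed."""
    # A limit of 0 means no restriction; otherwise only messages
    # larger than the limit are skipped.
    return max_bytes == 0 or message_size <= max_bytes


print(within_learning_limit(50_000, 0))
print(within_learning_limit(50_000, 50_000))
print(within_learning_limit(50_001, 50_000))
```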

Learn

Click this button to initiate a manual Bayesian analysis of the designated folders rather than waiting for the automatic analysis.

Enable spam and ham forwarding addresses

Click this check box if you wish to allow users to forward spam and non-spam (ham) messages to designated addresses so that the Bayesian system can learn from them. The default addresses that MDaemon will use are "SpamLearn@<domain.com>" and "HamLearn@<domain.com>". Messages sent to these addresses must be received via SMTP from a session that is authenticated using SMTP AUTH. Further, MDaemon expects the messages to be forwarded to the above addresses as attachments of type "message/rfc822". Any message of another type that is sent to these email addresses will not be processed.
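Most mail clients can produce the required format through a "forward as attachment" command. For scripted submissions, the following sketch uses Python's standard `email` and `smtplib` modules to wrap a message as a "message/rfc822" attachment and send it over an authenticated session. The host name, port 587, use of STARTTLS, and the function names are assumptions for illustration. Adjust them to match your server.

```python
import smtplib
from email.message import EmailMessage


def build_learn_report(original, sender, learn_address):
    """Wrap `original` (an EmailMessage) as a message/rfc822 attachment."""
    report = EmailMessage()
    report["From"] = sender
    report["To"] = learn_address
    report["Subject"] = "Bayesian learning sample"
    report.set_content("Forwarded as an attachment for Bayesian learning.")
    # Attaching a message object produces a part of type message/rfc822.
    report.add_attachment(original)
    return report


def send_report(report, host, user, password, port=587):
    """Send the report using SMTP AUTH (host and port are hypothetical)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(report)


original = EmailMessage()
original["From"] = "sender@example.net"
original["To"] = "user@example.com"
original["Subject"] = "Cheap watches"
original.set_content("Buy now.")

report = build_learn_report(original, "user@example.com", "SpamLearn@example.com")
print([part.get_content_type() for part in report.iter_attachments()])
```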

You can change the addresses MDaemon uses by adding the following keys to the CFilter.INI file:

[SpamFilter]

SpamLearnAddress=MySpamLearnAddress@

HamLearnAddress=MyNonSpamLearnAddress@

Note: the last character of these values must be "@".
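The trailing "@" suggests that the domain is appended to each value, as in the default addresses shown earlier. The sketch below checks the two keys and shows the resulting addresses for a given domain. It is an illustration only: the function name is hypothetical, appending the domain is an assumption about how MDaemon uses the values, and a real CFilter.INI may contain content that Python's `configparser` handles differently than MDaemon does.

```python
import configparser


def learn_addresses(ini_text, domain):
    """Return the full learning addresses implied by the [SpamFilter] keys."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["SpamFilter"]
    result = {}
    for key in ("SpamLearnAddress", "HamLearnAddress"):
        value = section[key]
        if not value.endswith("@"):
            raise ValueError(f'{key} must end with "@": {value!r}')
        # Assumption: the domain is appended after the "@".
        result[key] = value + domain
    return result


ini_text = """\
[SpamFilter]
SpamLearnAddress=MySpamLearnAddress@
HamLearnAddress=MyNonSpamLearnAddress@
"""
print(learn_addresses(ini_text, "example.com"))
```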

Create

Click this button to create spam and non-spam Public IMAP Folders automatically, and to configure MDaemon to use them. The following folders will be created:

\Bayesian Learning.IMAP\

Root IMAP folder

\Bayesian Learning.IMAP\Spam.IMAP\

This folder is for false negatives (spam that doesn't score high enough to get flagged as such).

\Bayesian Learning.IMAP\Non-Spam.IMAP\

This folder is for false positives (non-spam messages that erroneously score high enough to get flagged as spam).

By default, access permission to these folders is only granted to local users of local domains and is limited to Lookup and Insert. The postmaster's default permissions are Lookup, Read, Insert, and Delete.

Path to known spam folder (false negatives):

This is the path to the folder that will be used for Bayesian analysis of known spam messages. Copy only messages that you consider to be spam to this folder. You should not automate the process of copying messages to this folder unless doing so via the Bayesian Auto-learning or Spam Honeypots options. Automating this process by some other means could potentially cause non-spam messages to be analyzed as spam, which would decrease the reliability of the Bayesian statistics.

Path to known non-spam folder (false positives):

This is the path to the folder that will be used for Bayesian analysis of messages that are definitely not spam. Only messages that you do not consider to be spam should be copied to this folder. You should not automate the process of copying messages to this folder unless doing so via the Bayesian Auto-learning options. Automating this process by some other means could potentially cause spam messages to be analyzed as non-spam, which would decrease the reliability of the Bayesian statistics.

Pub Folder

Click one of these buttons to designate one of your existing Public Folders as the Bayesian directory. This is an easy way for your users to place messages that were incorrectly categorized as spam or non-spam into your Bayesian directories for analysis. Note, however, that giving access to more people increases the likelihood that some messages will be put into the wrong folders, thus skewing the statistics and decreasing reliability.

If you rename a Public folder via a mail client, Windows Explorer, or some other means, then you must manually reset this path to the appropriate new folder name. If you rename a folder but do not change its path here, the Spam Filter will continue to use this path for the Bayesian folder instead of the new one.
