MailEnable Enterprise Guide
Spam Training Utility

MailEnable provides a command line utility that can be used to manage spam/non-spam dictionaries. This program is called MESPAMCMD.EXE and is located in the MailEnable BIN directory.

The spam training utility works only on files stored on disk. Disable the auto-training feature, or stop the MTA service, before updating the dictionary manually.

MESPAMCMD -[options] [dictionary, paths]

[c] = Create Dictionary

[v] = Verify messages in the specified folder against the nominated Dictionary

[s] = Score a single message against the nominated Dictionary

[m] = Merge Spam and NoSpam folders into nominated Dictionary

[r] = Notify the spam filter to reload the dictionary

[p] = Prune the Dictionary to allow insertion of more words

Example:

MESPAMCMD -c C:\TEST\ME.TAB C:\TEST\SPAM C:\TEST\NOSPAM

The following example compiles a dictionary using short style (8.3) paths:

MESPAMCMD -c C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\MailEn~1.TAB  C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\Spam C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\NoSpam

Note: The Spam Training Command Line Utility requires short style (8.3) file paths; that is, the paths cannot contain spaces.
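
Because a path containing a space would be split into separate arguments, it can help to check paths before calling the utility. The following Python sketch is illustrative only: the helper name build_mespamcmd_args is hypothetical, and it only assembles the argument list used in the examples above without running anything.

```python
def build_mespamcmd_args(option, *paths):
    """Return the MESPAMCMD argument list, rejecting paths that contain spaces."""
    if option not in ("c", "v", "s", "m", "r", "p"):
        raise ValueError("unknown option: " + option)
    for path in paths:
        if " " in path:
            # The utility needs the short (8.3) form, e.g. C:\Progra~1\...
            raise ValueError("path contains a space: " + path)
    return ["MESPAMCMD", "-" + option, *paths]


args = build_mespamcmd_args("c", r"C:\TEST\ME.TAB", r"C:\TEST\SPAM", r"C:\TEST\NOSPAM")
print(" ".join(args))
```

A long path such as C:\Program Files\... raises ValueError here, which is a prompt to substitute its short form before running the command.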

Using XML or Tab delimited files

Filtering dictionaries can be constructed as either XML or TAB delimited files.

XML files are slower to load, but may be preferable if the dictionary is managed externally. TAB files load much faster, so it is advisable to use the default TAB format. The filter determines whether the file is XML or TAB delimited from the file extension. The format for the XML files is:

<ELEMENTS>

  <ENTRIES W="[number of ham emails]" B="[number of spam emails]">

  <E W="[number in ham emails]" B="[number in spam emails]">word</E>

  <E W="[number in ham emails]" B="[number in spam emails]">word</E>

  …

  …

  </ENTRIES>

</ELEMENTS>
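
For external management of an XML dictionary, the layout above can be read with a standard XML parser. The following Python sketch assumes the structure exactly as shown; the sample words and counts are made up for illustration, and the function name read_dictionary is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative dictionary with made-up counts, following the layout above.
SAMPLE = """<ELEMENTS>
  <ENTRIES W="120" B="80">
    <E W="2" B="40">offer</E>
    <E W="35" B="1">meeting</E>
  </ENTRIES>
</ELEMENTS>"""


def read_dictionary(xml_text):
    """Return (ham_total, spam_total, {word: (ham_count, spam_count)})."""
    root = ET.fromstring(xml_text)
    entries = root.find("ENTRIES")
    words = {
        e.text: (int(e.get("W")), int(e.get("B")))
        for e in entries.findall("E")
    }
    return int(entries.get("W")), int(entries.get("B")), words


ham_total, spam_total, words = read_dictionary(SAMPLE)
print(ham_total, spam_total, words["meeting"])
```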

Verifying a dictionary

The command line utility can be used to validate a directory of messages against the dictionary. This will provide a percentage probability of spam for each message in the folder.

MESPAMCMD -v MailEn~1.TAB C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\Test         

Scoring a message

Scoring a single message is much like verifying a directory, except the second parameter is a message file rather than a directory.

An example of scoring a message follows:

MESPAMCMD -s MailEn~1.TAB C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\Test\1A38DF23D30845E0B5FF51530A266.MAI

Merging a dictionary

Merging a dictionary is much like creating a new dictionary, except that messages in the Spam and NoSpam directories are appended to the dictionary rather than re-creating it. This is useful to add new messages to the dictionary to refine Spam detection.

An example for merging new content with an existing spam dictionary follows:

MESPAMCMD -m MailEn~1.TAB C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\Spam C:\Progra~1\MailEn~1\Dictio~1\NewDic~1\NoSpam

Reloading a dictionary

The dictionary is held in memory, so if it is changed while the spam filter is running, the filter will not reload it until it is notified. The dictionary can be reloaded either by restarting the MTA service or by using the -r option of the MESPAMCMD program to tell the spam filter to reload it.

MESPAMCMD -r
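
A merge followed by a reload can be scripted as two calls. The Python sketch below is one possible arrangement, not a documented procedure: the function name merge_and_reload is hypothetical, and treating a non-zero exit code as failure is an assumption about how MESPAMCMD reports errors. The run parameter is injectable so the sequence can be tried without the utility installed.

```python
import subprocess


def merge_and_reload(dictionary, spam_dir, nospam_dir, run=subprocess.call):
    """Merge new messages into the dictionary, then ask the filter to reload it.

    `run` takes an argument list and returns an exit code.
    """
    merge = ["MESPAMCMD", "-m", dictionary, spam_dir, nospam_dir]
    reload_cmd = ["MESPAMCMD", "-r"]
    # Assumption: a non-zero exit code means the merge failed, so skip the reload.
    if run(merge) != 0:
        return False
    return run(reload_cmd) == 0


# Dry run: record the commands instead of executing them.
issued = []


def dry_run(args):
    issued.append(args)
    return 0


ok = merge_and_reload("MailEn~1.TAB", r"C:\TEST\SPAM", r"C:\TEST\NOSPAM", run=dry_run)
print(ok, len(issued))
```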

Pruning a dictionary

Pruning a dictionary removes items that cannot be used effectively to distinguish spam from non-spam. This is done by removing items that occur very rarely and items that occur almost equally in spam and non-spam emails. To prune, provide the path and filename of a dictionary file. After pruning, this file is overwritten with the new dictionary.

MESPAMCMD -p MailEn~1.TAB
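
The idea behind pruning can be illustrated with a short Python sketch. This is not MailEnable's algorithm: the function name prune, both thresholds, and the sample counts are assumptions chosen only to show the two removal rules described above.

```python
def prune(words, min_total=3, min_skew=0.2):
    """Keep words that are frequent enough and lean clearly to one class.

    words maps word -> (ham_count, spam_count). Both thresholds are
    illustrative assumptions, not MailEnable's actual values.
    """
    kept = {}
    for word, (ham, spam) in words.items():
        total = ham + spam
        if total < min_total:
            continue  # occurs too rarely to be informative
        if abs(ham - spam) / total < min_skew:
            continue  # occurs almost equally in spam and non-spam
        kept[word] = (ham, spam)
    return kept


sample = {"offer": (2, 40), "the": (50, 52), "zyxq": (0, 1)}
print(sorted(prune(sample)))
```

This sketch compares raw counts; a real implementation might first normalise by the number of spam and non-spam messages in the dictionary.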

Checking the dictionary

To check the dictionary, open the DIC.tab file in Notepad from the following location:
C:\Program Files\Mail Enable\Dictionaries\DIC.tab

To check the integrity of the file, make sure the first line shows the number of bad and good messages that have been added to the dictionary. The first number should equal the number of bad messages (spam) merged from the SPAM folder, and the second number should equal the number of good messages (ham) merged from the NOSPAM folder. The numbers on each line after the first are the total counts of a word/token found in good and bad messages.
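
The first-line check can also be automated. The Python sketch below assumes the first line holds two tab-separated integers, spam first and ham second, as described above; the function names and the sample line are hypothetical, and the layout of later lines is not parsed.

```python
def read_message_totals(first_line):
    """Parse the first line of a TAB dictionary into (spam_count, ham_count).

    Assumes two tab-separated integers, spam first, as described above.
    """
    fields = first_line.strip().split("\t")
    if len(fields) < 2:
        raise ValueError("expected two tab-separated numbers")
    return int(fields[0]), int(fields[1])


def totals_match(first_line, spam_merged, ham_merged):
    """Return True if the first line agrees with the merged message counts."""
    return read_message_totals(first_line) == (spam_merged, ham_merged)


# Made-up first line: 80 spam messages and 120 ham messages.
print(totals_match("80\t120\n", 80, 120))
```

In practice the first line would be read from the DIC.tab file, and the expected counts would be the number of message files in the SPAM and NOSPAM folders that were merged.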

 

 


© MailEnable Pty. Ltd. All Rights Reserved.