Bayesian Dictionary Specifications

Discussion forum for Enterprise Edition.
jfenwickar
Posts: 14
Joined: Fri Oct 07, 2005 9:19 pm
Location: Fort Smith, Arkansas, USA

Bayesian Dictionary Specifications

Post by jfenwickar » Mon Jan 19, 2009 8:06 pm

The first line shows the emails added from ham and spam honeypots.

Is it necessary that the numbers shown on this line reflect actual emails added?

Or is this number only used to report the "Bayesian Dict. Ham" and "Bayesian Dict. Spam" numbers in the MailEnable System Overview?

MailEnable-Ian
Site Admin
Posts: 9227
Joined: Mon Mar 22, 2004 4:44 am
Location: Melbourne, Victoria, Australia

Post by MailEnable-Ian » Mon Jan 19, 2009 10:50 pm

Hi,

The first line within the Bayesian dictionary file is the frequency of tokens that are added from messages. The dictionary allows MailEnable to analyze messages and provide a probability of a message being spam.

For example, if the token “FREE” occurs mostly in spam emails, but rarely in good emails and a new message has the token “FREE” in it, it is likely to be spam.
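As a rough illustration of that idea, here is a minimal sketch of a generic Bayesian token score. It is not MailEnable's actual formula; the function name, parameters, and the counts for "FREE" are all made up for the example.

```python
def token_spam_probability(spam_hits, ham_hits, spam_total, ham_total):
    """Estimate P(spam | token) from per-token hits and message totals.

    Generic textbook-style estimate, assumed for illustration only.
    """
    if spam_total <= 0 or ham_total <= 0:
        raise ValueError("message totals must be positive")
    # How often the token appears per spam message and per ham message.
    spam_freq = spam_hits / spam_total
    ham_freq = ham_hits / ham_total
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: treat as neutral
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical counts: "FREE" seen in 400 of 1000 spam and 10 of 1000 ham messages.
p_free = token_spam_probability(spam_hits=400, ham_hits=10,
                                spam_total=1000, ham_total=1000)
print(round(p_free, 3))
```

In a scheme like this the message totals act as divisors, so a score close to 1 means the token is far more common in spam than in good mail.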

The token values within the dictionary are not used by the overview utility. These values are taken from the MailEnable registry.
Regards,

Ian Margarone
MailEnable Support

first line of MailEnable.TAB

Post by jfenwickar » Tue Jan 20, 2009 12:23 am

My first line looks like:

Token 16883 12293
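For anyone who wants to inspect that line outside MailEnable, a minimal sketch of reading it follows. It assumes the three fields are whitespace-separated; which of the two numbers is the ham count and which is the spam count is not something I have confirmed, so the sketch leaves them unlabeled. The helper name is hypothetical.

```python
def parse_header(line):
    """Split a header such as 'Token 16883 12293' into a label and two counts."""
    parts = line.split()  # assumption: fields are separated by whitespace
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}")
    label, first, second = parts
    return label, int(first), int(second)


label, first_count, second_count = parse_header("Token 16883 12293")
print(label, first_count, second_count)
```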

If I stop the MTA, change the values, and then restart the MTA, the MailEnable System Overview changes to show the new numbers I typed.

I have been unable to discern any effect these numbers have on the Bayesian filtering, other than that training will add new Spam entries once I raise the Ham number to be equal to or greater than the Spam number.

The filter appears to be just as effective if I set these numbers to 0.

Am I missing something?

Post by MailEnable-Ian » Wed Jan 21, 2009 12:31 am

Hi,

As mentioned, the token values are taken from the dictionary file and then loaded into the MailEnable registry branch, e.g.:

HKEY_LOCAL_MACHINE\SOFTWARE\Mail Enable\Mail Enable\Agents\MTA\Filters\MTAFILTER\Counters

If you stop the MTA, change the values manually in the file, and restart the MTA, the values will be loaded into the registry for the overview utility to read, which is why you see the values changing in the overview utility. If you change these values to 0, this will affect the auto training and the Bayesian filter scoring, so I would suggest not modifying them manually.
Regards,

Ian Margarone
MailEnable Support

Training

Post by jfenwickar » Wed Jan 21, 2009 12:33 am

Ahh, I see.

So, manual training is the way to go for getting large batches of email examined, rather than processing them through the queues.

I had pretty much decided that already.

Thanks for the reply and help.

Jesse
