Bayesian Dictionary Will Not Update! WHY?

Discussions on webmail and the Professional version.
michelkenny
Posts: 42
Joined: Mon Sep 11, 2006 7:36 pm

Post by michelkenny »

I made a small modification to the script, in case anyone is interested. Instead of moving the excess messages to the old folders, never to be processed, I changed the script so that it moves them to an "excess" folder. The next time you run the script, it moves them back to the "new" folders, where they wait for more messages of the other kind so that they can be processed. Any leftovers go back into the excess folder. This way your excess messages will eventually get processed once there are enough messages of the other kind. It just felt like a waste to move them to the old folders whenever the two kinds were out of proportion for a while. This could be very inefficient on large servers, but mine is a small one and it seems to work well.

Anyway, here it is in case someone is interested, or if the author wants to improve it :)

Code:

@echo off
echo ****************************************************************
echo ******************   BAYESIAN FILTER UPDATE SCRIPT  ******************
echo ****************************************************************
echo.


rem ****************************************************************
rem ***************  CUSTOMIZE VARS FOR YOUR SYSTEM  *******************
rem ****************************************************************

rem ***  Set the Domains (ie, "Post-Office Names") you use as Spam and Ham Sources
rem ***  These must  match the spellings of the folders in the "Postoffices" folder
rem ***  Use spaces to separate Domain/Postoffice Names

SET Domains=mydomain.com yourdomain.net herdomain.org hisdomain.biz


rem ***  Set the 'minimum number of messages' required before updating the dictionary
rem ***  Ham & Spam messages will ACCUMULATE each time - until there are enough!

SET  MinMsgsBeforeProcessing=1000


rem ***  NOTE:  Be careful to NOT have any 'trailing spaces' when setting variables below

rem ***  Set the 'PostOffice'  and 'Dictionary' folders - WITHOUT a trailing "\"
SET  PO_Folder=E:\Email\Postoffices
SET  DIC_Folder=E:\Email\Dictionaries

rem ***  Set the 'Dictionary' path exactly the same as in the 'Bayesian Filter' Properties
SET Dictionary=%DIC_Folder%\Bayesian_Dic.TAB

rem ***  Set the 'Mailbox Names' you are using to collect Spam and Ham messages
rem ***  It is assumed you use the 'Inbox' of these mailboxes to store these emails
rem ***  If this is not so, then adjust the MOVE command(s) in the next section
SET  SpamMailbox=SPAM
SET  HamMailbox=HAM

rem ***  Set Spam and Ham 'Dictionary' and 'Storage' folders - WITHOUT a trailing "\"
SET  NewSpam=%DIC_Folder%\NewSpam
SET  NewHam=%DIC_Folder%\NewHam
SET  OldSpam=%DIC_Folder%\OldSpam
SET  OldHam=%DIC_Folder%\OldHam
SET  ExcessSpam=%DIC_Folder%\ExcessSpam
SET  ExcessHam=%DIC_Folder%\ExcessHam



rem ****************************************************************
rem **************  COLLECT THE HAM AND SPAM MESSAGES  ******************
rem ****************************************************************

rem ***  Move email stored by your configured filters to the dictionary folders

echo.
echo    MOVE  messages from Spam Source folder(s) to dictionary Spam folder...
echo.
FOR  %%D  IN  (%Domains%)  DO   MOVE /Y  "%PO_Folder%\%%D\MAILROOT\%SpamMailbox%\Inbox\*.mai"  "%NewSpam%\"

echo.
echo    MOVE messages from Ham Source folder(s) to dictionary Ham folder...
echo.
FOR  %%D  IN  (%Domains%)  DO   MOVE /Y  "%PO_Folder%\%%D\MAILROOT\%HamMailbox%\Inbox\*.mai"  "%NewHam%\"




rem *******************************************************************
rem **************  MOVE THE EXCESS HAM AND SPAM MESSAGES  ******************
rem *******************************************************************

echo.
echo    MOVE messages from excess Spam folder to dictionary Spam folder...
echo.
MOVE /Y  "%ExcessSpam%\*.mai"  "%NewSpam%\"

echo.
echo    MOVE messages from excess Ham folder to dictionary Ham folder...
echo.
MOVE /Y  "%ExcessHam%\*.mai"  "%NewHam%\"




rem *******************************************************
rem ***  IMPORTANT NOTE - DO NOT IGNORE OR THIS WON'T WORK *****
rem *******************************************************
rem
rem ***  You MUST add this registry entry to enable 'Delayed Expansion' ***
rem
rem    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Command Processor]
rem    "DelayedExpansion"=dword:00000001
rem
rem *******************************************************


rem ***  This section will count and move EXCESS Ham OR Spam to the excess folders
rem ***  The target is 1-to-1 -- equal quantities of Ham and Spam messages


rem ***  Initialize counter variables
SET  NewSpam_Count=0
SET  NewHam_Count=0
SET  ExcessSpam_Count=0
SET  ExcessHam_Count=0

rem ***  NOTE: "SET /A" lets us do math!  Use SET /? to see all available options

rem ***  COUNT the messages in the Dictionary folders
FOR  %%M  IN  ("%NewHam%\*.mai")  DO  SET /A  NewHam_Count += 1
FOR  %%M  IN  ("%NewSpam%\*.mai")  DO  SET /A  NewSpam_Count += 1

echo.
echo    There are %NewHam_Count% Ham messages and %NewSpam_Count% Spam messages
echo.


rem ***  If either message count is below the minimum required, then just exit

IF  %NewHam_Count%  LSS  %MinMsgsBeforeProcessing%  (
   ECHO There are only %NewHam_Count% Ham Messages - EXITING...
   GOTO  DONE
)

IF  %NewSpam_Count%  LSS  %MinMsgsBeforeProcessing%  (
   ECHO There are only %NewSpam_Count% Spam Messages - EXITING...
   GOTO  DONE
)


rem ***  Calculate number of 'Excess Messages' - only 1 can be 'positive'
SET /A  ExcessHam_Count=NewHam_Count - NewSpam_Count
SET /A  ExcessSpam_Count=NewSpam_Count - NewHam_Count

rem *** MOVE any Excess Messages to the excess folder - of whichever type is 'excess'

IF  %ExcessHam_Count%  GTR 0  (
   echo There are %ExcessHam_Count% TOO MANY HAM Messages
   FOR  %%M  IN  ("%NewHam%\*.mai")  DO  (
      rem ***  MUST delimit with "!" here (delayed expansion) to update in a loop
      IF  !ExcessHam_Count!  GTR 0  MOVE /Y  "%%M"  "%ExcessHam%\"
      SET /A  ExcessHam_Count = !ExcessHam_Count! - 1
   )
)

IF  %ExcessSpam_Count%  GTR 0  (
   echo There are %ExcessSpam_Count% TOO MANY SPAM Messages
   FOR  %%M  IN  ("%NewSpam%\*.mai")  DO  (
      rem ***  MUST delimit with "!" here (delayed expansion) to update in a loop
      IF  !ExcessSpam_Count!  GTR 0  MOVE /Y  "%%M"  "%ExcessSpam%\"
      SET /A  ExcessSpam_Count = !ExcessSpam_Count! - 1
   )
)




rem ****************************************************************
rem ***************  UPDATE & RELOAD THE DICTIONARY  *******************
rem ****************************************************************

echo.
echo.
echo ***  Tell  MTA filter to write current dictionary from memory to file.
echo.
MESPAMCMD  -w

echo.
echo ***  Process the new Spam and Ham messages to update the dictionary.
echo.
MESPAMCMD  -m  "%Dictionary%"  "%NewSpam%"  "%NewHam%"

echo.
echo ***  Tell MTA filter to reload the dictionary from the newly updated file
echo.
MESPAMCMD  -r
echo.




rem ****************************************************************
rem ****************  EMPTY THE DICTIONARY FOLDERS  ********************
rem ****************************************************************

echo.
echo ***  Move all the mail from the dictionary folders to the storage folders
echo ***  These will come in handy if you ever need to create a new dictionary

echo.
echo    MOVE NewHam to OldHam ...
echo.
MOVE /Y  "%NewHam%\*.*"  "%OldHam%\"

echo.
echo    MOVE NewSpam to OldSpam ...
echo.
MOVE /Y  "%NewSpam%\*.*"  "%OldSpam%\"




rem ***************  FINISHED -- PAUSE IF TESTING THE SCRIPT  *******************
:DONE

rem ***  Enable PAUSE while testing the batch file so you can see the output
rem ***  Otherwise the DOS window immediately closes upon completion
echo.
PAUSE


EXIT
Last edited by michelkenny on Wed Sep 27, 2006 1:17 am, edited 1 time in total.
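
A note on the registry requirement in the script above: a possible alternative, sketched here as an assumption and not as part of the posted script, is to turn on delayed expansion inside the batch file itself with SETLOCAL ENABLEDELAYEDEXPANSION, so that the "!" variables work without a registry change. The variable Demo_Count and the items a to e are made up for the demonstration.

```batch
@echo off
rem Sketch only: enable delayed expansion for this script instead of via the registry.
rem This relies on command extensions, which cmd.exe has switched on by default.
SETLOCAL ENABLEDELAYEDEXPANSION

rem Demo_Count is a hypothetical variable used only for this demonstration.
SET Demo_Count=3

rem Inside the loop the "!" form is re-read on every pass.
rem The percent form would keep the value it had when the loop was parsed.
FOR %%N IN (a b c d e) DO (
   IF !Demo_Count! GTR 0 echo Would move item %%N - !Demo_Count! left to move
   SET /A Demo_Count -= 1
)

ENDLOCAL
```

With the counter starting at 3, only the first three items should produce a line of output, which mirrors how the excess loops stop moving files once the counter reaches zero.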

ALLPRO
Posts: 23
Joined: Sun Apr 23, 2006 11:44 pm

Re: Revised Script

Post by ALLPRO »

Michael Kenny wrote:
I made a modification to the script... Instead of moving the excess messages to the old folders, never to be processed, I changed the script so that it moves them to an "excess" folder. Then next time you run the script it will move them back to the "new" folders
IF your server has roughly equal Spam/Ham or the ratio fluctuates back and forth, then this modification can milk a few more samples out of your email.

However in my case, there is always 5 or 10 times more Spam than Ham, so there is no need for me to reuse the 'extra' Spam in the future.

Ciao,

/Kevin

michelkenny
Posts: 42
Joined: Mon Sep 11, 2006 7:36 pm

Re: Revised Script

Post by michelkenny »

ALLPRO wrote:IF your server has roughly equal Spam/Ham or the ratio fluctuates back and forth, then this modification can milk a few more samples out of your email.

However in my case, there is always 5 or 10 times more Spam than Ham, so there is no need for me to reuse the 'extra' Spam in the future.
Yeah, I guess that's true. I made the changes for myself since I initially copied all of my users' sent items, which total 6000. I didn't want to waste them all, so I'm just letting the spam catch up (I only have about 2000 saved).

JasonCMX
Posts: 33
Joined: Fri Apr 09, 2004 12:22 pm
Location: Michigan, USA

Post by JasonCMX »

I'm getting an error at this line:
IF %ExcessHam_Count% GTR 0 (

The error:
0 was unexpected at this time.

ALLPRO
Posts: 23
Joined: Sun Apr 23, 2006 11:44 pm

Troubleshooting

Post by ALLPRO »

JasonCMX wrote:I'm getting an error at this line:
IF %ExcessHam_Count% GTR 0 (

The error:
0 was unexpected at this time.
I cannot say what is causing this error. The message doesn't make sense because '0' IS a valid 'count' value -- ie, there is nothing wrong with this expression: 'IF 0 GTR 0' - it would simply be false.

I suggest you double-check all the calculated values, like 'ExcessHam_Count'. Most variable values are echo'd to the screen so you can see them, and it is easy to add more output if needed.

You should also isolate the problem code to be sure it is where you think it is. Comment out the script section by section, starting from the end. Keep disabling sections until the error goes away, then re-enable things line by line until you can confirm the exact line, and even the exact word, that is causing the error. THEN...

A) Confirm that the variables in the offending code have been calculated correctly. For example, a 'count' should never be an 'empty string'.

B) Test your DOS environment by copying just the offending line into a new batch file - replacing all variables with hard-coded values. This will determine whether the error is caused by a bad value, OR whether your DOS environment will not run the command itself!

I hope these troubleshooting tips help. These kinds of problems are why I wrote the script to output everything as it goes - so that bad variable values can be spotted.
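
Tip B could look something like the sketch below, saved as its own small batch file. One possibility worth checking (a guess, not a confirmed diagnosis): if the variable expands to nothing, the line becomes "IF  GTR 0 (", and cmd.exe typically reports "0 was unexpected at this time." for that. The IF NOT DEFINED guard is an addition for illustration and is not in the posted script.

```batch
@echo off
rem Hypothetical stand-alone test file for the failing line.

rem Step 1: the same IF with a hard-coded value.
rem If this step fails, the environment is the problem and not the variable.
IF 0 GTR 0 (
   echo Hard-coded test: greater than zero
) ELSE (
   echo Hard-coded test: not greater than zero
)

rem Step 2: simulate a count that was never set, which leaves the variable empty.
SET ExcessHam_Count=

rem Guard added for illustration: give an undefined count a numeric default.
IF NOT DEFINED ExcessHam_Count SET ExcessHam_Count=0

IF %ExcessHam_Count% GTR 0 (
   echo Variable test: greater than zero
) ELSE (
   echo Variable test: not greater than zero
)

PAUSE
```

Removing the IF NOT DEFINED line and running the file again is a quick way to see whether an empty variable reproduces the same message on your server.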

kazmax
Posts: 14
Joined: Mon Feb 03, 2003 11:59 am
Location: Bracknell

Post by kazmax »

I am just setting up this script on my server. I nearly have it working, but have a question.

I have configured a filter called 'Spam' on the 'Filters' node of the 'Post Offices'. When the message has over a certain spam probability (90%), it does the following:

1) adds a prefix to the subject of the message
2) Adds a header (X-Spam) to the message
3) Mark message as spam
4) Forward message to 'spam@mydomain.com'

The above works fine.

I have another filter called 'Ham', which does the following

1) Process for ALL messages
2) Forward message to 'Ham@mydomain.com'

I'm having a slight problem with the latter, although it works as configured. Presumably I need to forward only HAM messages to the HAM folder? But there is no check for 'If the message has UNDER a certain SPAM probability'.

So how do I configure a filter to forward HAM messages which haven't failed the SPAM check?

I feel I'm missing something obvious!

Andrew

ALLPRO
Posts: 23
Joined: Sun Apr 23, 2006 11:44 pm

Post by ALLPRO »

kazmax wrote:I have another filter called 'Ham', which does the following

1) Process for ALL messages
2) Forward message to 'Ham@mydomain.com'
Including 'incoming' messages would feed the dictionary lots of Spam as Ham, because the current filters are NOT 100% accurate, which is why you are setting up the dictionary! Therefore, sending all the 'missed Spam' to the dictionary seems counter-productive to me.

This is why I collect Ham ONLY from 'outgoing' messages - then I know the messages are (theoretically) 100% Spam-free. I just added a filter to 'copy' all outgoing messages to my Ham mailbox.

Someone more knowledgeable may offer different advice, but this is what I do.

/Kevin

kazmax
Posts: 14
Joined: Mon Feb 03, 2003 11:59 am
Location: Bracknell

Post by kazmax »

Something seems a little odd here. I've implemented the script detailed above. These settings are relevant to what follows:

SET DIC_Folder=D:\MailEnable\Dictionaries
SET Dictionary=%DIC_Folder%\Bayesian_Dic.TAB

So I have my Bayesian filter file set to:

D:\MailEnable\Dictionaries\Bayesian_Dic.TAB

I also have this batch job running once an hour, and I'm seeing this file get smaller, not larger.

It was about 1.3 MB this morning. Mid-afternoon it was 546 KB. I just checked again and it is 346 KB. This doesn't seem right to me.

In the MailEnable Professional plug-in I have the dictionary set to the same file via the following:

MailEnable:MailEnable Management:Servers:localhost:Filters

Auto-training is off.

Am I supposed to be seeing this bayesian file shrink like this, or am I looking at a potential problem?

Andrew
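
One way to pin down when the file shrinks is to record its size each time the batch job runs. The lines below are a sketch that could be added to the script; the log file name Dictionary_Size.log is made up, and the Dictionary path is the one quoted in this post. Logging once before the MESPAMCMD -m step and once after it would show whether that step is where the size drops.

```batch
@echo off
rem Sketch only: append the dictionary's size in bytes to a log file.
SET Dictionary=D:\MailEnable\Dictionaries\Bayesian_Dic.TAB

rem Hypothetical log file name - change to suit.
SET SizeLog=D:\MailEnable\Dictionaries\Dictionary_Size.log

rem The ~z modifier on the FOR variable expands to the file size in bytes.
FOR %%F IN ("%Dictionary%") DO echo %DATE% %TIME%  %%~zF bytes>> "%SizeLog%"
```

Each run adds one line with the date, time and size, so a day of hourly runs gives a simple history to compare against the message counts the script prints.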

kazmax
Posts: 14
Joined: Mon Feb 03, 2003 11:59 am
Location: Bracknell

Post by kazmax »

Forgot to mention, I implemented the change suggested previously. My Ham emails are now only the emails sent from validated email addresses, not everything as I indicated before.

Andrew
