filtering blacklisted spam domains

Discussions on webmail and the Professional version.
Brad
Posts: 169
Joined: Fri Mar 04, 2005 4:56 pm

filtering blacklisted spam domains

Post by Brad »

Does anyone use the list of spam domains at http://wiki.mozilla.org/Spam_blacklist to block spam?

I have written an ASP script that strips out the domains (and surrounds them with asterisks) from
http://www.joewein.de/sw/bl-log.htm
and writes them to a text file that I use in a filter.
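Brad's ASP script wasn't posted, but the general approach can be sketched. The snippet below is a rough Python equivalent, not the original: the regex is only a guess at the page's layout, and the asterisk-wrapped output format is an assumption based on Brad's description.

```python
import re

def extract_domains(html):
    """Pull bare domain names out of a blocklist page.
    The pattern is only a guess at the page markup; adjust to fit."""
    pattern = re.compile(r"\b([a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|net|org|biz|info))\b")
    return sorted(set(pattern.findall(html.lower())))

def to_filter_lines(domains):
    """Wrap each domain in asterisks, the format Brad describes
    for his "KNOWN SPAM URLS" word file (an assumption)."""
    return ["*%s*" % d for d in domains]

sample = "<tr><td>cheap-meds.biz</td></tr><tr><td>spam-pills.com</td></tr>"
print(to_filter_lines(extract_domains(sample)))
# ['*cheap-meds.biz*', '*spam-pills.com*']
```

In practice you would fetch the page once a day, run the extraction, and write the resulting lines to the text file the filter reads.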

I named the filter "KNOWN SPAM URLS" and I check the message body for those domains. ME is filtering them very well, since the page above is updated daily with the spammers' latest registered domains.

I was wondering if anyone has done anything like this with the domains listed at http://wiki.mozilla.org/Spam_blacklist, or anything else similar to this idea.

trusnock
Posts: 132
Joined: Tue Jan 31, 2006 8:42 pm

Filtering blacklisted spam domains

Post by trusnock »

Brad,
Would you consider sharing that ASP script? That sounds like a great idea. The mozilla.org list uses RegExp matching, so unless the MailEnable filters understand Regular Expressions, it will not be easy to use the complete Mozilla list in a filter. But perhaps we could write a pickup event that uses a Windows port of grep or sed. This might be slow though... Has anybody tried this?
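Since the Mozilla list entries are regular expressions, a pickup-event script would need real regex matching rather than the plain substring matching of the filters. A minimal Python sketch of that matching step (the list format and function names are assumptions, not anything MailEnable provides):

```python
import re

def load_patterns(lines):
    """Compile one regex per non-empty, non-comment line,
    silently skipping entries that fail to compile."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            patterns.append(re.compile(line, re.IGNORECASE))
        except re.error:
            pass  # a malformed entry shouldn't kill the whole run
    return patterns

def is_spam(body, patterns):
    """Return True if any blacklist pattern matches the message body."""
    return any(p.search(body) for p in patterns)

rules = load_patterns([r"cheap-meds\.biz", "# comment line", ""])
print(is_spam("Visit http://cheap-meds.biz today!", rules))  # True
```

Even in a scripted pickup event, compiling thousands of patterns per message would be slow; caching the compiled list between messages would matter.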

-Tom Rusnock

Brad
Posts: 169
Joined: Fri Mar 04, 2005 4:56 pm

Post by Brad »

Well, I would love to, but since this involves stripping data out of someone else's webpage, I don't feel at liberty to publish code that would let everyone hit his website with multiple requests every day.
If I did, I'm sure it wouldn't be long before he changed his site to block this kind of operation.

However, I will send you the filter file that I am using, which contains all the known spam URLs. Just send me a PM with your email address and I will send you the latest file I am using in the filter at that point.

I may try to set up a page on one of my websites that would let anyone download the file daily (since it would be my bandwidth then) if enough people would use the file. I just don't have the time to set that up at the moment.

....

Brad
Posts: 169
Joined: Fri Mar 04, 2005 4:56 pm

Post by Brad »

Okay, I found the time to set up my Google website, and while I was testing it out I added the ME filter file to the page.

http://bradbamford.googlepages.com

trusnock
Posts: 132
Joined: Tue Jan 31, 2006 8:42 pm

Post by trusnock »

Great, thanks. By the way, according to this page (http://www.joewein.de/sw/bl-text.htm), it sounds like he's ok with people downloading the "new additions" list several times per day. If this is effective, though, I certainly think he deserves the donations he's asking for... We'll see what our users think of the results!
-Tom

moegal
Posts: 118
Joined: Mon Feb 09, 2004 10:30 pm

what filter setting

Post by moegal »

What filter settings are you using with this list? Is it

"Where the from header line contains specific words"

Thanks in advance,

Marty

Brad
Posts: 169
Joined: Fri Mar 04, 2005 4:56 pm

Post by Brad »

Actually, I am using it where the message body contains this text.

Spammers rarely send spam from the same IPs their websites are hosted on, so using this list on the From header wouldn't block very much.

The file contains domains that are known to appear in spam messages trying to get you to click a link pointing to the spammer's domain (website).

jbrochu
Posts: 113
Joined: Fri Mar 24, 2006 10:19 pm

Post by jbrochu »

Hey Brad. Thought I would say thanks for the post. I put this in place yesterday. Works well. For now I guess I will just download your txt file once a week.

I think everyone should use this file.

jbrochu
Posts: 113
Joined: Fri Mar 24, 2006 10:19 pm

Post by jbrochu »

Brad, do you experience CPU spikes while using this file? It is over 14,000 lines, and using external word files is slow to begin with. I think it's a great list, but it may not be practical.

Thoughts?

Joe

Brad
Posts: 169
Joined: Fri Mar 04, 2005 4:56 pm

Post by Brad »

Actually, I have experienced that problem. ME says they are changing the way filters are executed in Version 2, and it should handle filtering emails more efficiently.
What I am doing now is following ME's suggestions on how to control CPU spikes; search the forums, there's a lot of information about this. Basically, it just takes your server longer to process the filters, but it remains healthy (so it is a trade-off).

Also, what I will probably start doing is using only the latest domains (which are the most actively used by spammers anyway). That would work more efficiently with the current ME version. I'll probably try to rotate the latest 30 days' worth of newly created spam domains, which would mean far fewer lines in the file.
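Rotating a 30-day window could be as simple as keeping only the tail of the source list. The sketch below assumes the blocklist appends its newest domains at the end, which is an assumption about the source format:

```python
def latest_entries(lines, keep):
    """Return only the most recent `keep` lines, on the assumption
    that the blocklist appends its newest domains at the end."""
    return lines[-keep:] if 0 < keep < len(lines) else list(lines)

full_list = ["*old-domain%d.com*" % n for n in range(14000)]
full_list += ["*fresh-domain%d.com*" % n for n in range(500)]
window = latest_entries(full_list, 3000)
print(len(window), window[-1])  # 3000 *fresh-domain499.com*
```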

Also, I run this filter last, after all my other filters have run, and on most of the filters above it I have selected "Stop Processing Filters". So this filter only runs on messages that have made it past all my other "faster" spam filters.
But the amount of spam that gets past my other filters and is caught by this one is astonishing. So much so that it is worth the trouble of slowing down filter processing in order to catch it.

This filter catches several thousand messages a day that I would have missed otherwise. And if I had missed them, I wouldn't look very good to my customers... So what do you do?

I do whatever it takes, until something better comes along.

jbrochu
Posts: 113
Joined: Fri Mar 24, 2006 10:19 pm

Post by jbrochu »

Hey Brad, I have a few questions:

So if I have a filter that checks the subject line, and it comes up positive and the email is, say, deleted as I told it to, does that same email continue on through the other filters below it? I thought that if it was positive on a filter and I told it to delete it, it wouldn't go through the remaining filters.

Next, will you be supplying the smaller file with the popular spam domains on your Google site, like you do now with the large file?

I was going to cut it in thirds, maybe. If you confirm that you actually have to stop processing filters even when the email trips a filter before the rest, I will split the 14,000+ lines across 3 new filters; that way, if it hits in the first 5,000 and comes up positive, I can stop processing the remaining 10,000. Make sense?

Thanks

moegal
Posts: 118
Joined: Mon Feb 09, 2004 10:30 pm

ran too high

Post by moegal »

I tried the list and it took a steady 25% CPU with no end in sight. I had to remove the filter.

Marty

Brad
Posts: 169
Joined: Fri Mar 04, 2005 4:56 pm

Post by Brad »

jbrochu wrote: So if I have a filter that checks the subject line, and it comes up positive and the email is, say, deleted as I told it to, does that same email continue on through the other filters below it? I thought that if it was positive on a filter and I told it to delete it, it wouldn't go through the remaining filters.
Well, I have my filter set to add an X-SPAM header to the message, copy it to quarantine, delete the message, then stop processing filters.

The reason is that I noticed in the logs that messages returned positive in more than one filter even though I had deleted them. I don't know whether it is because of the copy to quarantine or not, so you could check the logs yourself, or just add "stop processing filters" to be safe.
jbrochu wrote: Next, will you be supplying the smaller file with the popular spam domains on your Google site, like you do now with the large file?
Possibly in the future. It will be when I can automate updating the file on my webserver; I don't have the time to update it on Google as often as I should. I plan on having it available as soon as I can find the time. The more demand there is, the faster I would work to get it done.
jbrochu wrote: I was going to cut it in thirds, maybe. If you confirm that you actually have to stop processing filters even when the email trips a filter before the rest, I will split the 14,000+ lines across 3 new filters; that way, if it hits in the first 5,000 and comes up positive, I can stop processing the remaining 10,000. Make sense?
Makes sense, but breaking the file into 3 may or may not help the CPU, because the filter now has to open, read, and close 3 files instead of just one. There is a mathematical answer to which is more efficient, but I don't know where the break-even point lies between file size and the number of smaller files that have to be opened. My guess is that your 3-file solution would be more efficient.
Last edited by Brad on Wed Mar 29, 2006 3:01 am, edited 1 time in total.

jbrochu
Posts: 113
Joined: Fri Mar 24, 2006 10:19 pm

Post by jbrochu »

Well, here's what I have done:

I downloaded Brad's latest file and split it into 3 files: URL-1.txt, URL-2.txt and URL-3.txt. I put 5,000 lines of domains in each file and dropped the remaining 280 or so domains.

I set up a filter for each list. These are my last 3 filters. I set each to notify me, delete the message, and stop processing filters.

I am guessing from what Brad said that even if I set the actions to notify an address and delete, any filters that follow would still be checked. With this in mind, breaking the file up should work better: if the domain is found in the first URL file, that saves checking the other 10,000 domains/lines.
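The split described above is easy to script. A minimal Python sketch (file naming follows the post; unlike the post, the tail chunk is kept rather than dropped):

```python
def split_filter_file(lines, chunk_size):
    """Break the filter list into chunk_size-line pieces,
    one per filter (URL-1.txt, URL-2.txt, ...)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

lines = ["*domain%d.com*" % n for n in range(15280)]
chunks = split_filter_file(lines, 5000)
print([len(c) for c in chunks])  # [5000, 5000, 5000, 280]
```

Each chunk would then be written out as URL-1.txt, URL-2.txt, and so on; with "stop processing filters" set on a match, a hit in the first file skips the remaining 10,000+ lines.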

Brad
Posts: 169
Joined: Fri Mar 04, 2005 4:56 pm

Post by Brad »

I just updated the large file (which is now larger) and also split it into 2 files: one covering the last 30 days and one covering everything older than 30 days.

Post Reply