Koozali.org: home of the SME Server

Spam Filter

Offline sonoracomm

  • *
  • 208
  • +0/-0
    • http://www.sonoracomm.com
Spam Filter
« on: May 23, 2006, 10:05:05 PM »
Hi all,

Does anyone have any recommendations for configuring spam filtering on SME 7?  The configuration panel text, unlike most server-manager panels, doesn't really help me, though once I understand the options better, it will probably make sense.

For me, experimentation with settings has not been easy or helpful.  Changes I have experimented with don't seem to work as you would think, though my testing was on RC1.

Is there any documentation that anyone could point me to specific to SME 7?

Yes, I've spent the requisite two hours searching the forum, documentation and buglist for info.  Most of what you find is not for SME 7.  Although I found several posts by knowledgeable folks admonishing posters to "search the forum" for answers, I didn't find any basic configuration information.

Thanks in advance,

G

Offline idyll

  • ***
  • 113
  • +0/-0
there is no documentation specific to SME
« Reply #1 on: May 24, 2006, 02:52:11 PM »
Hello.

I assume you are referring specifically to the page titled "Email settings"? I also assume you are running the latest build? Assuming both are true...

A good starting point: do you have any experience with SpamAssassin? If you don't, the SpamAssassin wiki is a good place to get a base education. If you do, the terminology should make more sense than your post implies.

The Spam Sensitivity setting is designed to get you started by setting the low and high thresholds the system uses to sort or delete messages. Each message is tagged with a numeric "rating" based on characteristics associated with spam.

Thus the "low" number is the mark for deciding whether an email is spam or not. The "high" number is the one to be careful with, as email rated higher than this integer is automatically deleted. Mail rated between these two numbers is, optionally, sorted to each user's "junkmail" folder.
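
To make the two thresholds concrete, here is a minimal Python sketch of the decision described above. It is only an illustration: the function name, the returned labels, the example values 5 and 12, and the way a score exactly on a threshold is handled are all assumptions, not the SME 7 defaults or its actual implementation.

```python
def sort_message(score, low, high, sort_to_junkmail=True):
    """Decide what happens to a message with the given spam score.

    low  -- score from which mail counts as spam (assumed inclusive)
    high -- score from which mail is deleted outright (assumed inclusive)
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if score >= high:
        return "delete"
    if score >= low:
        # Sorting to the per-user junkmail folder is optional.
        return "junkmail" if sort_to_junkmail else "inbox-tagged"
    return "inbox"
```

With thresholds of 5 and 12, a message scoring 7.5 would land in the junkmail folder, while one scoring 15 would never reach the user at all. That is why the high number deserves the most care.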

Does this 10,000 foot view help?

regards,

patrick
...

Offline sonoracomm

Spam Filter
« Reply #2 on: May 24, 2006, 07:54:52 PM »
Thanks much for your help!

As you surmised, I am not familiar with SA.

I was hoping for a configuration guide for SA as implemented in SME 7...so as not to reinvent the wheel, so to speak.  However, your info gave me more to work on, and I spent another couple of hours writing this howto:

http://www.sonoracomm.com/index.php?option=com_content&task=view&id=49&Itemid=32

I would appreciate any feedback to make this howto more accurate.  Please send corrections/additions to gcooper(at)sonoracomm(dot)com.

Thanks again for pointing me in the right direction,

G

p.s.  Here's an ASSP on SME howto I wrote.  I recently updated it for SME 7, but it needs testing.

http://www.sonoracomm.com/index.php?option=com_content&task=view&id=48&Itemid=32

Offline raem

  • *
  • 3,972
  • +4/-0
Spam Filter
« Reply #3 on: May 24, 2006, 11:06:07 PM »
sonoracomm

>... for older versions of SME, I still recommend ASSP.

I wouldn't recommend using anything older than 6.x as they are insecure.

ASSP & similar became pretty much deprecated by features incorporated into and added to sme 6.0.x, e.g. RBL, pattern matching etc., in conjunction of course with knuddi's antivirus & spam filter contribs and the additional tweaks mentioned in this howto.

http://mirror.contribs.org/smeserver//contribs/rmitchell/smeserver/howto/Mail%20system%20tweaks%20HOWTO%20for%20sme%20server.htm
...

Offline raem

Spam Filter
« Reply #4 on: May 24, 2006, 11:12:47 PM »
sonoracomm

The howto looks good. It will help novices trying to understand the concepts.

In my opinion RBL rejection is a fundamental part of spam & virus management.
I would strongly suggest adding the commands for enabling and configuring RBL on sme7 to your howto. As this is not enabled by default, and users may be unsure which RBLs to use, some information would be good. Search here on RBL to find the commands which I (& others) have posted previously.
...

Offline sonoracomm

Spam Filter
« Reply #5 on: May 25, 2006, 12:20:15 AM »
Hi Ray,

Thanks for your input.

Well, I spent another hour researching...and updated the howto.  I just added commands to enable the default blacklists as I didn't find any suggestions as to particular lists.

If you have any specific suggestions, they are most welcome.

As for ASSP on SME 6.x, it is incorrect to write it off as being obsoleted by RBLs.  Believe me when I tell you that if SA was not already included in SME 7, I'd be installing ASSP.  And I have literally dozens of installations in the field (SME and Windows) to base my opinion on.  ASSP is small, fast, elegant, easy to configure and uses many advanced techniques to fight spam.  It is said that the technology in ASSP obsoleted RBLs...and the results are easy to see.

I look forward to gaining more experience with SA so that I'll have a more balanced perspective...  ;-)

Thanks again and more (specific) comments/changes/fixes to this howto are most welcome, from anyone.

G

Offline idyll

RBLs
« Reply #6 on: May 25, 2006, 01:16:21 AM »
Do a search and you'll find these two topics....

Specifically: why are RBLs NOT enabled by default on the server? And can they slow down mail delivery? For perspective, these topics are useful for troubleshooting and for understanding why the developers opted not to enable them by default.  I use RBLs and they kick ass.

Bayes "learning" is also disabled by default. If enabled, the SA Bayes filter needs to "see", or learn, 200 SPAM and 200 HAM messages before it starts using what it has learned to score mail. This is when SA filtering becomes really, really granular and effective.  I find this feature to be glossed over on 50% of the SA installations I've seen. No clue why.

Brian Read posted a script Knuddi and I devised to "teach" SA about email which should have been tagged as spam. It's a simple, cron-driven Perl script. You have each user create a LearnAsSpam folder, and they drop any spam that was not tagged into that folder. It is collected by a cron job and taught to the Bayes process. Brian is creating the awesome SPAM statistics contrib, and the LearnAsSpam Perl script is in his contrib repository under his name.
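
For readers who want to see the shape of the idea, here is a rough Python sketch of a LearnAsSpam-style collector. It is not the Perl script from Brian's contrib. The users directory, the Maildir layout and the function names are assumptions for illustration; the only external command used is `sa-learn --spam`, which accepts a directory of messages.

```python
import os
import subprocess

# Assumed layout: one Maildir per user, with a ".LearnAsSpam" subfolder.
USERS_ROOT = "/home/e-smith/files/users"  # assumption, adjust to your system
FOLDER = ".LearnAsSpam"                   # assumption, matches the folder name above


def learn_commands(users_root=USERS_ROOT, folder=FOLDER):
    """Return one sa-learn command per user folder that holds messages."""
    commands = []
    if not os.path.isdir(users_root):
        return commands
    for user in sorted(os.listdir(users_root)):
        cur = os.path.join(users_root, user, "Maildir", folder, "cur")
        # Skip users without the folder and folders with nothing to learn.
        if os.path.isdir(cur) and os.listdir(cur):
            commands.append(["sa-learn", "--spam", cur])
    return commands


def run(commands):
    """Execute the prepared commands, stopping on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A cron job could call `run(learn_commands())` once a day. Which Bayes database `sa-learn` writes to depends on the account it runs under and the local SpamAssassin configuration, so check that before relying on a sketch like this.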

You should use both of the scripts if you administer multiple machines or domains, etc.

regards,

patrick
...

Offline sonoracomm

Re: RBLs
« Reply #7 on: May 25, 2006, 02:52:22 AM »
Hi Patrick,

Thanks much for the input.  I'd like to bounce a few comments off of you and Ray for your thoughts.

When I first ventured out as a spam fighter, I used RBLs exclusively.  I found back then that they stopped about 57% of the junk, and that seems to match Ray's more contemporary statistics.  Not bad results for a truly minimal effort.  Then I started using ASSP, which implemented RBLs...then it grew out of them after a while.  As long as the RBLs are up to date and well-maintained, they provide great value...but I'm no longer familiar with the reliability of individual RBLs, and I'd appreciate recommendations from folks who are, in order to improve the howto.

As for performance, I've noted much greater hits when spam was allowed in.  Also, as we are in the business, we routinely replace our customers' servers at three years of age (unless they are beefy, industrial SCSI RAID5 boxes, which we replace at 3-4 years of age).  This means that horsepower has not been/is not a real problem for me. I'll opt for the RBLs, unless there is a problem.

Bayesian "learning" filtering has always been part of my ASSP installations.  I always used the sample spam tar file to prepopulate the spam/notspam databases so ASSP took off running.  Maybe we should use the same technique with SA? Here's a URL to the sample database, if you're interested.

http://easynews.dl.sourceforge.net/sourceforge/assp/asspsmpl-0.1.tgz

I'm mainly looking for a way to add reasonably reliable spam filtering with as little maintenance overhead as possible.  I want to set it and forget it...unless the customer complains and is willing to pay for more of my time to fix the problem.  

Would simply turning on the Bayesian filtering work to that end?  I realize it won't be optimized for some time, but my customers seem OK with that concept...as long as the bulk of the junk gets knocked down immediately.   Should I change my recommendations in the howto to allow _less_ (or no) spam to be rejected in order to populate the Bayesian database?

As far as reporting spam/ham, I have found that it is the rare user who cares about this function.  They just delete the spam and move on.  Also, if the filter is working properly, it's a rare message that gets through as a false-negative (based on my experience with ASSP).  In this context, I would question the value of time spent setting up the LearnAsSpam functionality (and training users to use it).  Perhaps it's different for SA...

Thanks again to both you and Ray for your insights.

G

Offline raem

Re: RBLs
« Reply #8 on: May 25, 2006, 03:09:08 AM »
sonoracomm & idyll

A contribs forum search on spamhaus found this recent post

http://forums.contribs.org/index.php?topic=32054.0

which lists the RBL's I use.

sbl-xbl.spamhaus.org
relays.ordb.org
dnsbl.njabl.org
whois.rfc-ignorant.org
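
For anyone unsure what an RBL lookup actually does, here is a small Python sketch. A DNS blacklist is queried by reversing the octets of the connecting IPv4 address and appending the list's zone; an answer means the address is listed. The function names are made up for this example, and `192.0.2.99` below is just a documentation address.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name for an RBL lookup: reversed IPv4 octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a bad address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the zone returns an address record for this IP.

    Any lookup failure, including a network problem, counts as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False
```

For example, `dnsbl_query_name("192.0.2.99", "sbl-xbl.spamhaus.org")` gives `99.2.0.192.sbl-xbl.spamhaus.org`. Every list you enable adds one such DNS query per incoming connection, which is where the mail delivery speed question comes from.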


Do either/both of you use pattern matching executable content rejection (a contrib for sme 6 & default for sme7) as part of your spam & virus defence system?

It's like saying RBL knocks the stuffing out of them and pattern matching finishes them off.
About the only thing getting through to my servers are html viruses and these are caught by clamav. A handful of spam messages are filtered to the junkmail folders and 99% or more of these are correctly identified. False positives are infrequent.
...

Offline raem

Spam Filter
« Reply #9 on: May 25, 2006, 03:17:29 AM »
...

Offline idyll

reply
« Reply #10 on: May 25, 2006, 03:32:45 AM »
I use the default RBLs. The only RBL I would warn about would be SPAMCOP. They are really aggressive and I just prefer slightly less arbitrary email nuking  ;-)

I have noted almost zero pattern-matched rejections. I believe I have seen four. I have all the patterns enabled except for ZIP2, which is useful to allow to pass. I rarely see a virus, maybe one a month.

Ray - as I am using Brian Read's contrib to gather statistics, do you know the command syntax to specifically read pattern-matched rejections? I may be rejecting more but I have no clue how to gather that data manually as I rely upon Brian's statistic contrib.

In my case, the need to train Bayes is important as the spammers are continually tweaking their email to slip by my filters. I can share examples of this. Sonoracomm - the database you allude to is not going to plug into SA. But thanks for the creative thought!

My SA settings are custom, and rather harsh. I consider 4 to be the low edge and 12 the high edge. I incur zero errant deletions, but I still see perhaps 10 or fewer SPAM get by per week overall. These I drop into the folder I mentioned, and in a very short time these are learned and not seen again. It works great for the newer, more sophisticated spam email.

Have you seen the new type which are generally drug or "enhancement" related, which have a link and then a series of nonsense words? These can get by, very often, unless the SA system is trained, if they actually luck onto your literal email name. If they don't add up enough tokens, there is nothing to stop them. So they must be learned.

At least, this has been my experience.

I also have virtually no false positives.

Overall, I am /extremely/ pleased with the SME7 filtering capabilities. I don't believe it can get any better in any practical sense. Maybe I just like fiddling, but the auto-learn aspect, to me, is a small tweak which pays a nice dividend as well.

I previously used ASSP and found it to be very difficult to manage with a user base of diverse recipients. That may well have been just my inability to grok the concept. I also had serious CPU redlining, runaway race conditions which the developer never seemed willing to assist troubleshooting. In fact, the lack of developer support was my biggest beef. I found the overall Spam Assassin community to be not only vast, but far more interested in helping. Again, this is just my experience and I am pleased you have made it work to your satisfaction.

How many times has a great product been less than successful because of issues other than the product, per se? Many times, he says.....

Thanks for assembling the material you have assembled, I am SURE this will be read by hundreds of new users, to the mutual benefit of all.

regards,

patrick
...

Offline idyll

editorial
« Reply #11 on: May 25, 2006, 03:42:52 AM »
Here's my editorial take....

The whole issue of SPAM detection, etc. is rapidly approaching the levels associated with deep inspection on firewalls.

Our commercial installations (multinational) require an almost universal "reject all, accept the following exceptions" policy in order to preclude denial of service and the raft of other automated attacks.

Thus in my world, aggressive filtering and equally aggressive whitelisting is a sane approach to existing in the hostility of the net. It is not grossly intensive to maintain but it sure as heck is not set and forget.

regards,

patrick
...

Offline sonoracomm

Spam Filter
« Reply #12 on: May 25, 2006, 04:24:12 PM »
Thank you both again!

I made some changes to the howto based on both of your comments.

I also decided to keep this a simple (quickie) howto and point folks at other resources if they want to do more.  I decided to draw the line at enabling the Bayesian filter, as things went beyond 'quickie' at that point.  I'd include the Bayesian filtering if someone suggested some text for this document that made it look easy.  I made the cutoff when it appeared you have to enable console access for each user...  I know the RemoteUserAccess contrib does this easily, but...

Could you please check the howto one more time for accuracy?  I would sure appreciate it!

http://www.sonoracomm.com/index.php?option=com_content&task=view&id=49&Itemid=32

Thanks again for all your time and trouble,

G

p.s.  Updated howto 6/16/06 to include Bayesian filtering.

Offline azche24

  • *
  • 163
  • +0/-0
    • http://az-law.de
Thanks
« Reply #13 on: May 28, 2006, 03:48:28 PM »
Hi Sonoracomm and Ray Mitchell,

I would like to say thanks:

Information on spam filters in SME 7 was pretty widely scattered and is now bundled nicely in your howtos.

I do not know why RBL is not ON by default in SME7. But it should be possible even for Linux newbies to implement it from your howtos.
Alexander Ziemann, Berlin - DE

Offline raem

Re: reply
« Reply #14 on: May 29, 2006, 01:41:19 AM »
idyll

>..... do you know the command syntax to specifically read pattern-matched rejections?

Apply these to sme 7 server.

On an sme 6.0 server the (smtpfront-qmail/current) log file entry looks like this:

2006-05-28 18:12:41.622887500 smtpfront-qmail[2580]: 554 We don't accept email with executable content TVqQAAMAA - TVqQAAMAA (#5.3.4)


Here is an entry for the "badhelo" rejection in mailfront

2006-05-29 08:17:52.198154500 smtpfront-qmail[24711]: 553 Sorry, I don't believe that is who you are.


Here is an entry for the invalid recipient rejection from the dungog-mailblocking contrib, default behaviour now in sme 7.

2006-05-29 08:51:16.217505500 smtpfront-qmail[25639]: 553 Sorry, that is an invalid e-mail address
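
If you want to count these by hand rather than through a statistics contrib, something like the following Python sketch would do it. It only knows the three messages quoted above; the labels and the function name are made up for this example, and the wording in your own log may differ, so check a few real entries first.

```python
import re
from collections import Counter

# Phrases taken from the sample log entries above; the labels are illustrative.
PATTERNS = {
    "executable content": re.compile(r"554 We don't accept email with executable content"),
    "bad helo": re.compile(r"553 Sorry, I don't believe that is who you are"),
    "invalid recipient": re.compile(r"553 Sorry, that is an invalid e-mail address"),
}


def count_rejections(lines):
    """Count log lines per rejection type; unmatched lines are ignored."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break  # one label per line is enough
    return counts
```

Feed it the lines of the log file, for example `count_rejections(open(path))` where `path` points at your smtpfront-qmail `current` file, and print the resulting counter.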
...