Koozali.org: home of the SME Server

BayesFiltering for SpamAssasin

Offline compdoc

« Reply #30 on: October 28, 2006, 04:47:14 PM »
I just used d_gerst's script and had several problems. I thought I’d share my solutions in the hopes of helping others, and to get feedback.

d_gerst seems to use different locations for the learning folders than what I believe are the standard locations. That's not a big deal, as long as you account for it when tracking down problems and making changes...

I found that there's another way to enable Bayes filtering, using a contrib by Michael Weinberger (michaelw), as outlined here: http://forums.contribs.org/index.php?topic=33824.0  Anyway, now that d_gerst's script is installed, I'm going to stick with it, and hopefully it will all work after fixing the minor problems...

I downloaded and ran http://www.gerst.no-ip.com/SME7/spamassassin/install_sa-learn.sh

1) The folders hamlearning and junkmailmissed weren't created. That's no big deal; you can run this script by hand to create them:
sh /etc/mail/spamassassin/bayes_filter.sh
If you want to wait, the next time cron runs they will be created. Log back in to webmail (horde) to see the changes.
   
2) On my SME server, running ‘config show spamassassin’ showed UseBayes set to 0. I believe the script has a bug. It expands the template for /etc/mail/spamassassin/local.cf and then sets UseBayes 1. I think this should be done the other way around. You can correct this by simply running ‘expand-template /etc/mail/spamassassin/local.cf’. (don’t edit the local.cf directly) ‘config show spamassassin’ should now show UseBayes set to 1
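One way to confirm that the expanded template really agrees with the configuration database is to read the use_bayes directive straight out of local.cf. The helper below is only a sketch: the function name use_bayes_value is made up for illustration, and it assumes the directive appears as a plain 'use_bayes N' line, which is what the sa-learn error message about 'use_bayes 0' suggests.

```shell
# Print the value of the last active 'use_bayes' directive in a SpamAssassin
# config file (default: the local.cf path used in this thread).
# Prints nothing when the directive is absent or only commented out.
use_bayes_value() {
    cf="${1:-/etc/mail/spamassassin/local.cf}"
    awk '$1 == "use_bayes" { v = $2 } END { if (v != "") print v }' "$cf"
}
```

On the server the order would be 'config setprop spamassassin UseBayes 1', then 'expand-template /etc/mail/spamassassin/local.cf', then 'use_bayes_value', which should print 1 if the template picked up the setting.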

3) The log file /var/log/spamd/current showed errors when new mail arrived (scroll to the bottom of the log):

@4000000045423f1e3a2ff53c [8059] warn: bayes: cannot open bayes databases /etc/mail/spamassassin/bayes_* R/O: tie failed: Permission denied
@4000000045423f1e3aa2008c [8059] info: spamd: checking message <1644134838.20061027131629@greatbigsuccess.com> for qpsmtpd:1005
@4000000045423f1e3ac96e4c [8059] warn: bayes: cannot open bayes databases /etc/mail/spamassassin/bayes_* R/O: tie failed: Permission denied
@4000000045423f1f1ef4cc0c [8059] warn: bayes: cannot open bayes databases /etc/mail/spamassassin/bayes_* R/O: tie failed: Permission denied
@4000000045423f251a3ac66c [8059] warn: bayes: cannot open bayes databases /etc/mail/spamassassin/bayes_* R/W: tie failed: Permission denied

I had to borrow from mmccarn’s post to correct this:

chown spamd.spamd /etc/mail/spamassassin/bayes_*
chmod 750 /etc/mail/spamassassin/bayes_*
chown spamd.spamd /etc/mail/spamassassin/bayes.mutex
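
Before or after running the chown commands, it can help to see which Bayes files actually have the wrong owner. This is a rough sketch rather than anything from d_gerst's script; list_wrong_owner is a hypothetical helper name, and it relies on find supporting -maxdepth, which GNU find does.

```shell
# List bayes* files directly inside a directory that are NOT owned by the
# given user (a name or a numeric uid). Empty output means nothing to fix.
list_wrong_owner() {
    dir="$1"
    owner="$2"
    find "$dir" -maxdepth 1 -type f -name 'bayes*' ! -user "$owner"
}
```

For the case above, 'list_wrong_owner /etc/mail/spamassassin spamd' should print nothing once the ownership has been corrected.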

Anyway, everything seems to be working now. Reports sent from cron are showing tokens being gathered by sa-learn. You can see the same thing by running:  sa-learn --dump magic
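
To watch the counters grow without reading the whole dump, the output can be boiled down to the two message counts. This is a sketch under an assumption: it expects the usual 'sa-learn --dump magic' layout, where the value is in the third column and the line ends with the counter name (nspam, nham). The function name bayes_counts is made up here.

```shell
# Read 'sa-learn --dump magic' output on stdin and print how many spam and
# ham messages have been learned. Assumes the value sits in column 3 and the
# counter name (nspam / nham) is the last field on the line.
bayes_counts() {
    awk '$NF == "nspam" { s = $3 }
         $NF == "nham"  { h = $3 }
         END { printf "spam=%d ham=%d\n", s, h }'
}
```

Usage would be 'sa-learn --dump magic | bayes_counts'. By default SpamAssassin does not use the Bayes score until both counts reach 200 (the bayes_min_spam_num and bayes_min_ham_num settings).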

Using webmail, I just put mail that spamassassin marked with ***SPAM*** in the junkmail folder, move any marked mail that isn't actually spam to the hamlearning folder, and move any spam that was missed to the junkmailmissed folder.

I guess it's going to take a while for enough samples to get processed for it to learn, but I'm willing to wait...

Offline jonic

« Reply #31 on: October 29, 2006, 03:46:46 PM »
Quote
2) On my SME server, running ‘config show spamassassin’ showed UseBayes set to 0. I believe the script has a bug. It expands the template for /etc/mail/spamassassin/local.cf and then sets UseBayes 1. I think this should be done the other way around. You can correct this by simply running ‘expand-template /etc/mail/spamassassin/local.cf’. (don’t edit the local.cf directly) ‘config show spamassassin’ should now show UseBayes set to 1


I just installed the script myself, and 'config show spamassassin' showed UseBayes set to 1. However, when I manually ran '/etc/mail/spamassassin/bayes_filter.sh' I got the error "ERROR: configuration specifies 'use_bayes 0', sa-learn disabled". I then ran 'expand-template /etc/mail/spamassassin/local.cf', and the error went away.

I checked '/var/log/spamd/current' and the only error I've seen is 'warn: pyzor: check failed: internal error'. I really don't know what to do about it.

Offline okepc

« Reply #32 on: October 31, 2006, 05:02:27 PM »
I also get the error 'pyzor: check failed: internal error'.
I think it has something to do with:
http://www.nabble.com/Sporadic-pyzor-errors-t313050.html
I'm going to test it soon...

Offline compdoc

« Reply #33 on: October 31, 2006, 05:10:58 PM »
I've seen that error on mine too, but never often enough to be concerned...

djhyper

« Reply #34 on: November 01, 2006, 09:29:35 PM »
I tried those patches but I'm still getting those pyzor errors.

Offline jvels

« Reply #35 on: November 02, 2006, 11:10:12 AM »
Does anyone have the script? The link is broken :(

Does the script work?

Offline okepc

« Reply #36 on: November 02, 2006, 01:12:38 PM »
Only one of the patches was applied; the rest are probably integrated already.
But I solved the problem!
When I ran 'pyzor ping', it said something like 'server unavailable'.
Searching the pyzor project on SourceForge, I learned that the server configured by 'pyzor discover' is down.
A working server is 82.94.255.100:24441
Put that in your .pyzor/servers file (in my case there were two, one under /root and one in another location which I can't remember).
'pyzor ping' now says 200 OK.
'spamassassin -D --lint' says pyzor is working correctly.
P.S. You may still get an occasional pyzor timeout because the server is heavily loaded.
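
The edit to the servers file can be sketched as a small helper. This is only an illustration: set_pyzor_server is a made-up name, and the helper simply overwrites the servers file under the given home directory with one host:port line, keeping a copy of whatever was there before.

```shell
# Write a single host:port entry into <home>/.pyzor/servers, keeping a
# backup of any existing file as servers.bak.
set_pyzor_server() {
    home_dir="$1"
    server="$2"
    mkdir -p "$home_dir/.pyzor" || return 1
    if [ -f "$home_dir/.pyzor/servers" ]; then
        cp "$home_dir/.pyzor/servers" "$home_dir/.pyzor/servers.bak" || return 1
    fi
    printf '%s\n' "$server" > "$home_dir/.pyzor/servers"
}
```

For example, 'set_pyzor_server /root 82.94.255.100:24441' followed by 'pyzor ping' to check the result. The same call would be repeated for any other home directory that has its own .pyzor folder.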

Dirk

Offline okepc

« Reply #37 on: November 02, 2006, 01:13:47 PM »

Offline compdoc

« Reply #38 on: November 02, 2006, 06:34:34 PM »
d_gerst's website www.gerst.no-ip.com is gone. I could gather all the files and scripts and zip them, but since I didn't write them, I'm not sure if I should.

You might want to try the Michael Weinberger smeserver-spamassassin-features-0.0.2-0.noarch.rpm contrib instead. Search these forums for help using it...

http://mirror.contribs.org/smeserver/contribs/michaelw/sme7/

Offline compdoc

« Reply #39 on: November 13, 2006, 05:51:50 PM »
I modified Daniel Gerst's /etc/mail/spamassassin/bayes_filter.sh script.

The original was only gathering ham tokens from the hamlearning folder, which is where you're supposed to place good mail that was marked as spam (false positives).

I'm the only user on my SME mail server, so the ham count wasn't adding up very quickly. Spamassassin rarely marks good mail as spam (false positives) for me. Bayes filtering doesn't even start working until it has learned at least 200 ham messages (and 200 spam messages), and I was stuck at 100.

So, I started reading up on bayes filtering. Unless I got it wrong, ham can also include good mail, such as the mail you keep in your inbox.

I modified the script to include my Inbox by adding these two lines in the '# Start Script' section:

ham='/home/e-smith/files/users/'$user'/Maildir'
CollectSpam $user "Ham" $ham $learn_ham

Place those lines just above these lines:

ham='/home/e-smith/files/users/'$user'/Maildir/.hamlearning'
CollectSpam $user "Ham" $ham $learn_ham

I keep my inbox free of spam, but since the scripts run late at night, there's a chance that a few missed spam messages will sneak in during the night, and sa-learn will learn them as ham.

But, according to these docs, as long as the missed spam gets moved to the junkmailmissed folder, spamassassin will correct it automatically the next time the scripts run:

http://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html

Quote: "If you have previously learnt any of the messages as ham, SpamAssassin will forget them first, then re-learn them as spam."
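
The learning order described above (inbox and hamlearning as ham first, junkmailmissed as spam last) can be sketched independently of the original bayes_filter.sh. This is not d_gerst's CollectSpam function: learn_plan and run_learn_plan are hypothetical names, the .junkmailmissed folder name and the use of each folder's cur subdirectory are assumptions, and only the Maildir base path and .hamlearning come from the lines quoted above.

```shell
# Print, in order, the sa-learn mode and the Maildir folder to train from for
# one user. Ham comes first so that anything later found in .junkmailmissed
# is re-learned as spam. The .junkmailmissed name and /cur are assumptions.
learn_plan() {
    maildir="/home/e-smith/files/users/$1/Maildir"
    printf 'ham %s/cur\n' "$maildir"
    printf 'ham %s/.hamlearning/cur\n' "$maildir"
    printf 'spam %s/.junkmailmissed/cur\n' "$maildir"
}

# Run the plan for one user. With DRY_RUN=1 the commands are only printed.
run_learn_plan() {
    learn_plan "$1" | while read -r mode dir; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "sa-learn --$mode $dir"
        else
            sa-learn "--$mode" "$dir"
        fi
    done
}
```

Setting DRY_RUN=1 and calling 'run_learn_plan someuser' shows what would be learned without touching the Bayes database, which seems a sensible first step before pointing sa-learn at a whole inbox.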

I'd appreciate anyone who knows better letting me know if I'm wrong about anything I've written.

Also, there's mention of reporting spam to checksum services such as dcc, pyzor, or razor by using the 'spamassassin -r' option. It's mentioned on this webpage under the 'Training plus reporting' section:

http://wiki.apache.org/spamassassin/BayesInSpamAssassin
 
I don't believe this is necessary. I believe that since razor and pyzor are already running and checking all incoming mail, they are already learning. Or am I wrong?

Offline devtay

Problems with Spamfilter
« Reply #40 on: November 15, 2006, 08:59:39 PM »
At the risk of being told to search and read the forums, I have an issue with the spamfilter-stats-7.pl script. :D

First, I have installed michaelw's rpm for spamassassin features to set up my bayes learning.

Second, I have completed the Bayesian portion of sonoracom's howto.

Third, I can run (and it seems to work) the LearnAsSpam.pl script.

Here is my problem:
perl /usr/bin/spamfilter-stats-7.pl
Use of uninitialized value in string lt at /usr/bin/spamfilter-stats-7.pl line 222, <> line 1.

After typing the command, you have to hit Enter to get the error. Execution then halts again until you hit Enter again. This continues for the foreseeable future :D until you break out of the script.

Here is what I have checked so far:
[root@mail ~]# ls -l /var/log/qpsmtpd/*
-rwxr--r--  1 smelog smelog 4998001 Nov 14 02:36 /var/log/qpsmtpd/@400000004559800229646d8c.s
-rwxr--r--  1 smelog smelog 4998345 Nov 14 04:04 /var/log/qpsmtpd/@400000004559949d39a6b484.s
-rwxr--r--  1 smelog smelog 4998111 Nov 14 05:13 /var/log/qpsmtpd/@400000004559a4dc0d02297c.s
-rwxr--r--  1 smelog smelog 4998309 Nov 14 06:46 /var/log/qpsmtpd/@400000004559baa51b6cb2d4.s
-rwxr--r--  1 smelog smelog 4998301 Nov 14 07:43 /var/log/qpsmtpd/@400000004559c82039a6acb4.s
-rwxr--r--  1 smelog smelog 4998113 Nov 14 08:52 /var/log/qpsmtpd/@400000004559d83d2ce2c6ac.s
-rwxr--r--  1 smelog smelog 4998042 Nov 14 10:01 /var/log/qpsmtpd/@400000004559e8521b5d265c.s
-rwxr--r--  1 smelog smelog 4998287 Nov 14 11:34 /var/log/qpsmtpd/@400000004559fe4b1fccb964.s
-rwxr--r--  1 smelog smelog 4998075 Nov 14 13:05 /var/log/qpsmtpd/@40000000455a139c1884255c.s
-rw-r--r--  1 smelog smelog  119394 Nov 14 13:09 /var/log/qpsmtpd/current
-rw-------  1 smelog smelog       0 Aug 18 08:35 /var/log/qpsmtpd/lock
-rw-r--r--  1 smelog smelog       0 Aug 18 08:35 /var/log/qpsmtpd/state

[root@mail ~]# config show qpsmtpd
qpsmtpd=service
    Bcc=enabled
    BccMode=cc
    BccUser=logger
    DNSBL=enabled
    LogLevel=8
    MaxScannerSize=25000000
    RBLList=sbl-xbl.spamhaus.org,whois.rfc-ignorant.org,dnsbl.njabl.org,relays.ordb.org
    RHSBL=enabled
    RequireResolvableFromHost=no
    SBLList=dsn.rfc-ignorant.org
    access=public
    status=enabled
[root@mail ~]# config show spamassassin
spamassassin=service
    DNSAvailable=yes
    MessageRetentionTime=7
    OkLanguages=all
    OkLocales=all
    RejectLevel=12
    ReportSafe=0
    Sensitivity=custom
    SkipRBLChecks=0
    SortSpam=enabled
    Subject=[SPAM]
    SubjectTag=enabled
    TagLevel=5
    UseBayes=1
    status=enabled

I am running SME7 with SA 3.1.6. I am also using an SSH connection into the server as root. I have tried running this same script as a user, but access restrictions won't let me (from what I read you can now run it as root anyways). I did a chmod +x and chmod 777 just for good measure and no change.

Any help would be appreciated. I have been trying to get this stats script to run for some time now with no joy.

Later,
Dev
You can't stop what's coming. It ain't all waiting on you.

Offline compdoc

« Reply #41 on: November 15, 2006, 10:58:43 PM »
I use the sa-update script and spamfilter-stats-7.pl, but not LearnAsSpam.

Spamfilter-stats-7.pl runs without bayes filtering. And it runs without installing anything else, although I guess you should have at least DNSBL enabled.

My server has all the current updates, but my SA version is only 3.1.3. How did you manage 3.1.6?

The report generated by spamfilter-stats-7.pl shows:

/usr/bin/spamfilter-stats-7.pl Version : 0.5.1
Clam Version : ClamAV 0.88.6/2195/Tue Nov 14 12:53:04 2006
SpamAssassin Version : SpamAssassin version 3.1.3
running on Perl version 5.8.5

Here are my other settings:

qpsmtpd=service
    Bcc=disabled
    BccMode=cc
    BccUser=maillog
    DNSBL=enabled
    LogLevel=8
    MaxScannerSize=25000000
    RBLList=sbl-xbl.spamhaus.org,relays.ordb.org   (the other rbl lists can be a little too aggressive for me, and the more checks you perform, the slower)
    RHSBL=enabled
    RequireResolvableFromHost=no
    SBLList=dsn.rfc-ignorant.org
    access=public
    status=enabled

spamassassin=service
    DNSAvailable=yes
    MessageRetentionTime=15
    OkLanguages=all
    OkLocales=all
    RejectLevel=20
    ReportSafe=0
    Sensitivity=custom
    SkipRBLChecks=0
    SortSpam=enabled
    Subject=[SPAM]
    SubjectTag=enabled
    TagLevel=8
    UseBayes=1
    bayes_auto_learn=1
    bayes_auto_learn_threshold_nonspam=0.1
    bayes_auto_learn_threshold_spam=12.0
    required_score=5
    status=enabled
    use_auto_whitelist=0

Offline devtay

« Reply #42 on: November 16, 2006, 03:21:54 AM »
Quote from: "compdoc"
My server has all the current updates, but my SA version is only 3.1.3. How did you manage 3.1.6?

I am also at 3.1.3. I had been reading the spamassassin wiki before this post and 3.1.6 stuck in my head. Sorry. It looks like our settings are pretty close. I remember seeing the rpm change the autolearn properties, but they are not showing up in my config show output.

So from what you are saying there are actually two separate options to control spam.

1. the sa-learn stuff from michaelw's RPM
2. the sonoracom howto stuff on their website using LearnAsSpam

I am new to SME and have found there is a lot of information on this site. Sometimes there is too much for me to accurately comprehend. As such, I went the "easy way" and tried to go by the sonoracom stuff. I found that I had to enable Bayes because it was not enabled in the howto (that was not fun; the databases are not very user-friendly).

Anyways, I like the sonora stuff, but the only way to change spam to ham is to whitelist it. I would rather have a solution that works the same as the LearnAsSpam script except call it LearnAsHam. My Linux scripting skills are pretty basic so it will be some time before I can write one.

Offline compdoc

« Reply #43 on: November 16, 2006, 04:49:53 AM »
From what I can tell from the source code, michaelw's RPM only does a couple of things. Mainly, it runs these commands:

config setprop spamassassin UseBayes 1
config setprop spamassassin BayesAutoLearnThresholdSpam 4.00
config setprop spamassassin BayesAutoLearnThresholdNonspam 0.10
config setprop qpsmtpd DNSBL enabled
config setprop qpsmtpd RHSBL enabled

Then it sets some permissions, and not a lot else.

Spamassassin is already installed in SME, and the last two commands enabling DNSBL & RHSBL are really all you need to do to get the RBL lists and Spamassassin working.

Bayes filtering takes it a step further and helps fine-tune based on the spam you get, but you don't really need it. And michaelw's RPM doesn't install the scripts needed to run sa-learn, which gathers the bayes data from users. At least I don't think it does...

Be sure to watch the logs for errors, and to make sure your server isn't bouncing good mail. Open the file /var/log/spamd/current; that's the log that shows each incoming message and what happened to it (scroll to the bottom of the file). It also shows the score assigned to the message, like:

result: Y 38

In the server-manager, you enable Spam filtering and set Spam sensitivity to Custom. Any mail scoring above the rejection level you set is rejected. Mine's at 20, so a message scoring 38 (like the one above) will never make it to a mailbox.
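
As a rough illustration of how the score in the log relates to the rejection level, here is a small sketch. Both function names are invented, the pattern assumes the 'result: Y 38' shape shown above with an integer score, and treating a score exactly equal to the level as accepted is a guess based on the word "above".

```shell
# Pull the integer score out of a spamd 'result:' log line,
# e.g. "result: Y 38" gives 38.
score_from_result() {
    printf '%s\n' "$1" |
        sed -n 's/.*result: [^ ]* \(-\{0,1\}[0-9][0-9]*\).*/\1/p'
}

# Compare a score with the rejection level. A score exactly equal to the
# level counts as accepted here, which is an assumption.
classify_score() {
    score="$1"
    reject_level="$2"
    if [ "$score" -gt "$reject_level" ]; then
        echo reject
    else
        echo accept
    fi
}
```

With the numbers from this post, 'classify_score "$(score_from_result "result: Y 38")" 20' prints reject. Mail that is accepted may still be tagged as spam, depending on the tag level.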

As for Spamfilter-stats-7.pl, did you follow the instructions here:

http://mirror.contribs.org/smeserver/contribs/bread/mailstats/install_howto.txt

Offline jvels

« Reply #44 on: November 16, 2006, 09:14:21 AM »
Okay, so there is no reason to install the rpm?

Do I only need to run:

config setprop qpsmtpd DNSBL enabled
config setprop qpsmtpd RHSBL enabled


Or?