Koozali.org: home of the SME Server

BayesFiltering for SpamAssasin

d_gerst

BayesFiltering for SpamAssasin
« on: August 18, 2006, 10:16:35 PM »
Hello,

Is there any interest in an installation script for activating Bayes filtering with SpamAssassin?

Swert Knudsen doesn't activate this feature in his contribs.

What is that?
It lets you train SpamAssassin on missed spam and junk mail. Every user on an SME6/7 server can do this: all you have to do is move the missed spam into a special folder. SpamAssassin is then trained automatically on these mails, so that it recognizes the same kind of pattern the next time it receives mail from the internet.

If so, just let me know and I'll put it online.

Best regards,
Daniel

Offline mercyh

BayesFiltering for SpamAssasin
« Reply #1 on: August 18, 2006, 10:52:46 PM »
d_gerst,

There has been a lengthy off topic thread going on this. I would say that there is definitely interest.

Check out the thread here.

http://forums.contribs.org/index.php?topic=32158.0

Royce H.

d_gerst

BayesFiltering for SpamAssasin
« Reply #2 on: August 19, 2006, 12:11:13 PM »
Hello,

OK, I'll write an installation script tomorrow, because I still have to test it on SME7 (so far I have only tested it on SME6, but it should be the same). It will not have a GUI.

Supported features will be:
- Automatically create a new folder "JunkMailMissed" for each user, which is also used to train the Bayes filter
- Automatically train SpamAssassin with sa-learn (JunkMail and JunkMailMissed are both used for training)
- A cron job for automatic learning.

Best regards,
Daniel

d_gerst

BayesFiltering for SpamAssasin
« Reply #3 on: August 20, 2006, 05:53:41 PM »
Hello,

As I promised, here it is: an installation script for SpamAssassin's Bayes filter on SME7.

Features:
- Every SME user gets two new folders, junkmailmissed and hamlearning
- The folders junkmail, junkmailmissed and hamlearning are used to train the SpamAssassin Bayes filter
- All modifications needed to run the Bayes filter are made by the installation script
- Learning is driven by a cron job, which executes a learning script
- A report of each learning run is sent to the admin account
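
A rough sketch of what such a learning run might look like for one user, assuming the folder layout that appears later in this thread (/home/e-smith/files/users/USER/Maildir/.junkmail and so on). This is not the actual install_sa-learn.sh or bayes_filter.sh; the names learn_user, run, USERS_DIR and DRY_RUN are made up for illustration. With DRY_RUN=1 (the default here) it only prints the sa-learn commands instead of running them.

```shell
# Hypothetical sketch, not the real bayes_filter.sh.
USERS_DIR="${USERS_DIR:-/home/e-smith/files/users}"
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Train the Bayes database from one user's three learning folders.
# Messages in a Maildir folder live in its cur/ subdirectory once seen.
learn_user() {
    maildir="$USERS_DIR/$1/Maildir"
    run sa-learn --spam "$maildir/.junkmail/cur"
    run sa-learn --spam "$maildir/.junkmailmissed/cur"
    run sa-learn --ham  "$maildir/.hamlearning/cur"
}
```

Calling learn_user alice prints the three commands; setting DRY_RUN=0 on a real server would run them.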

Installation:
wget http://www.gerst.no-ip.com/SME7/spamassassin/install_sa-learn.sh
sh install_sa-learn.sh

Please test it and report bugs or success.

Best regards,
Daniel

Offline cool34000

BayesFiltering for SpamAssasin
« Reply #4 on: August 21, 2006, 01:56:25 AM »
Thx for the rpm...

I've installed it with no errors and all seemed to run fine.

Nevertheless, I can't find the folders which were supposed to be created...
Where are they?

Do I have to use IMAP instead of POP to see them?
I tried the webmail anyway, but it's the same: no folders other than junkmail...

Thx

d_gerst

BayesFiltering for SpamAssasin
« Reply #5 on: August 21, 2006, 06:47:34 AM »
Hello cool34000,

Yes you have to use IMAP instead of POP3

Best regards,
Daniel

d_gerst

BayesFiltering for SpamAssasin
« Reply #6 on: August 21, 2006, 07:38:13 AM »
Hello,

Please reinstall the script, because I found a small bug in the Bayes script. I apologize for this.

Best regards,
Daniel

Offline Mace

BayesFiltering for SpamAssasin
« Reply #7 on: August 21, 2006, 11:35:21 AM »
Thank you! This is a great contrib.

Regards,
Sterling

Offline cool34000

BayesFiltering for SpamAssasin
« Reply #8 on: August 21, 2006, 01:50:24 PM »
Thx for your reply, i'll try with IMAP after work and let you know if it worked...
No one is complaining, so I guess it should work for me too !

Offline mercyh

BayesFiltering for SpamAssasin
« Reply #9 on: August 21, 2006, 05:45:16 PM »
Cool34000,

Quote
Do I have to use IMAP instead of POP to see them?

You will need to use IMAP or webmail to move any missed spam to the junkmailmissed folder. (This cannot be done with POP as far as I know.) So you will need to set up your users with IMAP accounts to make it work. For the users I have who normally use POP and are not the type that are easily trained to change  :roll: , I have set up another e-mail account with IMAP and set only the junkmailmissed folder to show. The user then has a new tree in their e-mail client that contains only that folder. They can then drop any mail that they want learned as spam into that folder.

d_gerst

BayesFiltering for SpamAssasin
« Reply #10 on: August 21, 2006, 08:48:16 PM »
Hello,

I am still missing the spamd logging in /var/log/maillog. On an SME6 system logging works fine and you can see what spamd is doing, but on an SME7 system I don't see any logging output.

On SME6 you have to chown the bayes_db files for it to work correctly (I saw this in the maillog output). So how do I enable logging for spamd, to see whether it is working correctly on SME7?

Best regards,
Daniel

Offline cool34000

BayesFiltering for SpamAssasin
« Reply #11 on: August 21, 2006, 10:47:00 PM »
I tried IMAP a looooooong time ago... All I remembered about it was that IT'S STRANGE !!! When I first saw it, I thought it was a futuristic design... made in 1960! Remember old versions of Lotus... See what I mean?  :-D

Then tonight I tried it again to check the configuration... And WOW !!!
Now I know it's more powerful than POP !!! But still, it's a really strange way to show mail!

I could get used to it, but not my customers !!!
After 3 years of explaining webmail and Outlook to them, one of them asked me to empty his (Windows) recycle bin because someone (trying to mail him...) had told him that his mailbox was full... for a week...:hammer:


IMAP is too powerful; there's a better chance of getting them back to handwriting, where the only problems were ink and paper :roll:

Offline cool34000

BayesFiltering for SpamAssasin
« Reply #12 on: August 21, 2006, 11:20:57 PM »
I forgot the essential part:

The folders you mentioned are still not created for me...

Here is the weird thing:
My SME is a French version, so Horde is translated into French (I think this could matter).

In the webmail, I see 3 folders:
- Boite de réception (that's the French translation of inbox)
- Courrier indésirable (that's the French translation of junkmail)
- junkmail (not junkmailmissed)

In Outlook, I see this:
Boite de réception (that's the French translation of inbox)
Courrier indésirable (that's the French translation of junkmail)

Why am I seeing only 2 folders in Outlook and 3 in the webmail?

I didn't have the junkmail folder (the one written in English) before I installed your script (I also tried with the new one), so maybe your install assumes somewhere in the code that the junkmail folder has to be named junkmail? Am I right?

d_gerst

BayesFiltering for SpamAssasin
« Reply #13 on: August 22, 2006, 06:24:38 AM »
Hello,

Just use WinSCP or PuTTY and have a look in the user folders under /home/e-smith/files/users/username/Maildir; the folders must be there.

In Outlook you have to set an option (something like "display folder") to show them.
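
The check on the server can also be sketched as a small shell helper that lists which of the three folders are missing for each user. The function name missing_folders and the USERS_DIR variable are made up for illustration; the path and folder names are the ones used in this thread.

```shell
# Hypothetical helper: report missing learning folders per user.
USERS_DIR="${USERS_DIR:-/home/e-smith/files/users}"

missing_folders() {
    for maildir in "$USERS_DIR"/*/Maildir; do
        [ -d "$maildir" ] || continue          # skip if the glob matched nothing
        for folder in .junkmail .junkmailmissed .hamlearning; do
            [ -d "$maildir/$folder" ] || echo "missing: $maildir/$folder"
        done
    done
}
```

Running missing_folders as root prints one line per missing folder and nothing if all of them exist.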

Best regards,
Daniel

Offline azche24

BayesFiltering for SpamAssasin
« Reply #14 on: August 22, 2006, 03:10:34 PM »
Hi,
Quote from: "d_gerst"
/home/e-smith/files/users/username/Maildir there must be the folders.


Great contrib! The folders are created on the first run of sh_bayesfilter.sh, so you either have to wait until the next day or run the script manually first.
Alexander Ziemann, Berlin - DE

d_gerst

BayesFiltering for SpamAssasin
« Reply #15 on: August 22, 2006, 05:19:39 PM »
Hello,

That's correct, the folders will be created automatically. My script checks whether each folder exists. I apologize for not thinking of that.

Thank you very much for the kind hint, mercyh. I didn't think of that at first.

But has anyone seen log files produced by spamd on SME7? I am still missing the output that shows how long it took to scan mails with Bayes. On my SME6 production system it is generated.

Best regards,
Daniel

d_gerst

BayesFiltering for SpamAssasin
« Reply #16 on: August 22, 2006, 06:19:45 PM »
Hello,

There's a small bug in the crontab file; it should be 30 0 * * * *.

Please reinstall the script, or modify the file and restart cron:
1.) /etc/cron.d/sa-bayes_learning
2.) service crond restart
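
For reference, a file under /etc/cron.d uses the system crontab layout: five time fields, then the user to run as, then the command. The sketch below builds such a line for a daily run at 00:30. The root user and the script path (taken from later posts in this thread) are assumptions; the real contents of /etc/cron.d/sa-bayes_learning are not shown here.

```shell
# Assumed values; the real file's contents are not shown in this thread.
CRON_USER="root"
CRON_CMD="/etc/mail/spamassassin/bayes_filter.sh"

# Field order: minute hour day-of-month month day-of-week user command
cron_line() {
    printf '%s %s * * * %s %s\n' 30 0 "$CRON_USER" "$CRON_CMD"
}

cron_line    # prints: 30 0 * * * root /etc/mail/spamassassin/bayes_filter.sh
```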

Sorry!

Best regards,
Daniel

Offline cool34000

BayesFiltering for SpamAssasin
« Reply #17 on: August 22, 2006, 08:11:56 PM »
You were right azche24, I went to /etc/mail/spamassassin and manually launched the script bayes_filter.sh

The folders were created !!! As it's a test server, it was not online yesterday at 0:30, so the folders couldn't be created anyway !!! Bad luck !!!
All is OK in the webmail now!

Thanks to you d_gerst: by default Outlook doesn't show all folders!
I went to my SME IMAP inbox in Outlook; there is an option in IMAP FOLDERS: a box that needs to be unchecked. It's something about "only showing subscribed folders". I still don't know what those types of folders are, but yours aren't! Something like newsletters maybe? Whatever, this is not a topic about IMAP!

Finally, it works! So thanks to everyone for helping and sharing! :pint:
... I just hope one day I can help someone too :-?

Offline cool34000

BayesFiltering for SpamAssasin
« Reply #18 on: August 22, 2006, 10:18:33 PM »
I want to rename folders because their names aren't very explicit in french...

First of all, I want to be sure that I understand correctly...
junkmail is the spam folder
junkmailmissed is where I put missed spam (false negatives)
hamlearning is where I put good mail that was marked as spam (false positives)
Is that right?

Anyway, I edited bayes_filter.sh and found these lines:
Code:
ham='/home/e-smith/files/users/'$user'/Maildir/.hamlearning'
spam='/home/e-smith/files/users/'$user'/Maildir/.junkmail'
missed_spam='/home/e-smith/files/users/'$user'/Maildir/.junkmailmissed'

I want the folders to be named, respectively:
.SPAM faux positif
.Courrier indésirable
.SPAM faux négatif

Really easy, you're going to tell me... But how can I put spaces in folder names in the script ??? It treats everything after the space as an argument... What's the right syntax?
I'm not trained in Linux syntax and it does not look like m$ !!!

And is that even possible? I mean, for example, if I use the "auto-sort junkmail to junkmail folder" option in SME's server-manager, will it still work?
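
On the shell syntax: a space ends a word unless the value is quoted, so the path has to be quoted both where the variable is assigned and wherever it is used. A minimal sketch, with a made-up user name and a temporary directory standing in for /home/e-smith/files/users. Whether the IMAP server and the server-manager junkmail sorting accept renamed folders is a separate question that this does not answer, and accented characters may be stored on disk under an encoded name.

```shell
user="alice"                 # hypothetical user name
base=$(mktemp -d)            # stand-in for /home/e-smith/files/users

# Double quotes keep the spaces inside one word and still expand $base and $user.
ham="$base/$user/Maildir/.SPAM faux positif"

mkdir -p "$ham/cur"          # quoted: exactly one folder is created
ls -d "$ham"                 # quote the variable again on every use

# Unquoted, $ham would be split into three separate words at the spaces,
# which is why the script saw the rest of the name as extra arguments.
```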

Offline Smitro

BayesFiltering for SpamAssasin
« Reply #19 on: August 24, 2006, 05:31:56 AM »
I was just about to try this out, then I thought... wouldn't it be good if we could compact this down a little. Make it easier for people with more than the average amount of folders.

Is it worthwhile changing the layout of the folders mentioned to something like:
Code:

Junkmail ->
            - Junkmail
            - Spam Learning
            - Ham Learning

So one folder with 3 subfolders.  Or junk mail could be in the top folder and there could be 2 sub folders.

Just a suggestion... I thought it might clean people's folders up a little.

I don't get enough junk that I need to use these folders every time I login.

What do you think?
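
One way such a layout could be created, assuming the server follows the Maildir++ convention in which a subfolder is a dot-separated directory directly under Maildir (the existing .junkmail and .junkmailmissed names suggest this, but it is not confirmed in this thread). The helper name make_subfolder is made up, and the folder paths in the learning script would have to be changed to match.

```shell
# Assumption: Maildir++ naming, where the IMAP folder "junkmail/spamlearning"
# is the directory ".junkmail.spamlearning" directly under Maildir.
make_subfolder() {
    # $1 = Maildir path, $2 = parent folder name, $3 = subfolder name
    dir="$1/.$2.$3"
    mkdir -p "$dir/cur" "$dir/new" "$dir/tmp"
    echo "$dir"
}
```

For example, make_subfolder /home/e-smith/files/users/username/Maildir junkmail spamlearning. The new directories would presumably also need to be owned by the mail user, not root.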

Offline cool34000

BayesFiltering for SpamAssasin
« Reply #20 on: August 24, 2006, 01:38:44 PM »
That would be great !

Offline Smitro

BayesFiltering for SpamAssasin
« Reply #21 on: August 29, 2006, 02:28:37 PM »
d_gerst, your thoughts?

or, could you let us know how it's done.

Thanks.

Offline azche24

Auto-Delete Learned MissedSpam
« Reply #22 on: September 04, 2006, 08:12:59 AM »
Hi,

is there a way to delete the learned spam or ham in the folders created and used by this script?

This does not work here  :-(

And the "LearnAsSpam" script another user posted here could do that.
Alexander Ziemann, Berlin - DE

Offline cool34000

BayesFiltering for SpamAssasin
« Reply #23 on: September 05, 2006, 01:40:17 AM »
There is a cron to delete junkmail, don't know if it's activated by default, but maybe you could look there...

Offline azche24

BayesFiltering for SpamAssasin
« Reply #24 on: September 05, 2006, 08:55:20 AM »
Hi,
Quote from: "cool34000"
There is a cron to delete junkmail

no:

Code:
/etc/cron.d/purge_junkmail

and its related script only clean up the ../user/../junkmail folders.

I have to clean up my "junkmailmissed" and "hamlearning" folders. purge_junkmail does not do that.
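
One possible way to clean these two folders, sketched with find. The function name purge_learning_folders and the 7-day default are invented here; this is not what purge_junkmail or bayes_filter.sh does. It only removes files in cur/ and new/ whose modification time is older than the given number of days.

```shell
purge_learning_folders() {
    # $1 = a user's Maildir, $2 = age in days (default 7)
    days="${2:-7}"
    for folder in .junkmailmissed .hamlearning; do
        for sub in cur new; do
            [ -d "$1/$folder/$sub" ] || continue
            # -mtime +N matches files older than N whole days
            find "$1/$folder/$sub" -type f -mtime +"$days" -exec rm -f {} \;
        done
    done
}
```

For example, purge_learning_folders /home/e-smith/files/users/username/Maildir 7. Messages should only be purged after the learning run has seen them.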
Alexander Ziemann, Berlin - DE

XAPBob

BayesFiltering for SpamAssasin
« Reply #25 on: September 12, 2006, 11:21:06 PM »
Just a thought - it might be nice to move the files to *.bup instead of deleting them - and to kick the relevant script:
Quote
I have checked the firewall logs & smoothwall is definitely blocking the connect.

from the install script.

And I picked up:
Code:
Argument spam
netset: cannot include 192.168.1.25/32 as it has already been included
ERROR: configuration specifies 'use_bayes 0', sa-learn disabled

Learn missed spam...
No missed spam files available.

------------------------------------------------------------------------------

Show statistics...

ERROR: Bayes dump returned an error, please re-run with -D for more information
------------------------------------------------------------------------------


Which is peculiar - the script tries to set UseBayes, the second script then complains about use_bayes

Offline robwellesley

BayesFiltering for SpamAssasin
« Reply #26 on: October 03, 2006, 11:20:37 PM »
So here's an idea,

Is it possible then to create a user called learnspam and have other users forward spam to that user, and have the spamLearner check only that mailbox?

This would work better for pop3 users.

Same for Ham.

cron to delete mail nightly.

Rob

Offline brianr

BayesFiltering for SpamAssasin
« Reply #27 on: October 04, 2006, 12:19:11 PM »
Sounds possible; however, when this has been discussed before, it has been pointed out that SA might "learn" that spam is email which is forwarded. I guess the underlying issue is whether the headers from the original email are preserved, and also how the body is forwarded. I understand that some (proprietary) email clients are a bit lax in this area!!
Brian j Read
(retired, for a second time, still got 2 installations though)
The instrument I am playing is my favourite Melodeon.

Offline robwellesley

BayesFiltering for SpamAssasin
« Reply #28 on: October 04, 2006, 01:22:48 PM »
Yes,
I've since seen this...

http://spamassassin.apache.org/full/3.0.x/dist/doc/sa-learn.html#effective_training

...Point 4 - which is much better, as folk are more likely to be motivated to forward incorrectly identified HAM because it helps them.

In my experience most users are far too busy to engage in sorting SPAM/HAM into separate folders.

Offline hardijs

junkmailmissed etc
« Reply #29 on: October 19, 2006, 08:36:15 AM »
I have a question: does the script "manage" the junkmailmissed (etc.) messages, for example by erasing them, or is that the responsibility of whoever put them there?
I am getting an error in the daily report to the admin:

Code:

Clean up spam messages from learning folders
/etc/mail/spamassassin/bayes_filter.sh: line 152: /bin/rm: Argument list too long


Is this filtering user-level (i.e. each user has their own spam training) or is it system-wide (i.e. one user trains and all users benefit)?
In the second case, the system could learn to recognise messages from x as spam because of one user, whereas some other user receives them and expects x as a news source.
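
On the rm error: "Argument list too long" means a wildcard was expanded into more file names than a single command line can hold. A hedged sketch of a workaround, not the script's actual line 152 (which is not shown in this thread): let find hand the file names to rm in batches through xargs. The function name empty_folder is made up.

```shell
empty_folder() {
    # $1 = directory whose files should be removed,
    #      e.g. .../Maildir/.junkmailmissed/cur
    [ -d "$1" ] || return 0
    # -print0 / -0 keep file names with spaces or odd characters intact;
    # xargs splits the list into as many rm calls as needed.
    find "$1" -maxdepth 1 -type f -print0 | xargs -0 rm -f
}
```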

Offline compdoc

BayesFiltering for SpamAssasin
« Reply #30 on: October 28, 2006, 04:47:14 PM »
I just used d_gerst's script and had several problems. I thought I’d share my solutions in the hopes of helping others, and to get feedback.

d_gerst seems to use different locations for the learning folders than what I believe are the standard locations. That’s not a big deal, as long as you account for it when tracking down problems and making changes...

I found that there's another way to enable bayes filtering using a Michael Weinberger (michaelw) contrib. As outlined here:  http://forums.contribs.org/index.php?topic=33824.0  Anyway, now that it’s installed, I’m going to stick with the d_gerst script, and hopefully it will all work after fixing the minor problems...

I downloaded and ran http://www.gerst.no-ip.com/SME7/spamassassin/install_sa-learn.sh

1) The folders hamlearning and junkmailmissed weren’t created, but no big deal, you run this script by hand to create them:
sh /etc/mail/spamassassin/bayes_filter.sh
If you want to wait, the next time cron runs they will be created. Log back in to webmail (horde) to see the changes.
   
2) On my SME server, running ‘config show spamassassin’ showed UseBayes set to 0. I believe the script has a bug. It expands the template for /etc/mail/spamassassin/local.cf and then sets UseBayes 1. I think this should be done the other way around. You can correct this by simply running ‘expand-template /etc/mail/spamassassin/local.cf’. (don’t edit the local.cf directly) ‘config show spamassassin’ should now show UseBayes set to 1

3) the log file /var/log/spamd/current showed errors when new mail arrived (scroll to the bottom of the log):

@4000000045423f1e3a2ff53c [8059] warn: bayes: cannot open bayes databases /etc/mail/spamassassin/bayes_* R/O: tie failed: Permission denied
@4000000045423f1e3aa2008c [8059] info: spamd: checking message <1644134838.20061027131629@greatbigsuccess.com> for qpsmtpd:1005
@4000000045423f1e3ac96e4c [8059] warn: bayes: cannot open bayes databases /etc/mail/spamassassin/bayes_* R/O: tie failed: Permission denied
@4000000045423f1f1ef4cc0c [8059] warn: bayes: cannot open bayes databases /etc/mail/spamassassin/bayes_* R/O: tie failed: Permission denied
@4000000045423f251a3ac66c [8059] warn: bayes: cannot open bayes databases /etc/mail/spamassassin/bayes_* R/W: tie failed: Permission denied

I had to borrow from mmccarn’s post to correct this:

chown spamd.spamd /etc/mail/spamassassin/bayes_*
chmod 750 /etc/mail/spamassassin/bayes_*
chown spamd.spamd /etc/mail/spamassassin/bayes.mutex

Anyway, everything seems to be working now. Reports sent from cron are showing tokens being gathered by sa-learn. You can see the same thing by running:  sa-learn --dump magic
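
To pull just the message counts out of that dump, something like the following could work. It assumes the usual layout of the dump, where the nspam and nham lines end in "non-token data: nspam" / "non-token data: nham" and carry the count in the third column. The function name bayes_count is made up, and the sample input is illustrative only, not output from a real server.

```shell
# Print the number of learned messages of one kind ("nspam" or "nham")
# from `sa-learn --dump magic` output read on stdin.
bayes_count() {
    awk -v key="$1" '$0 ~ ("non-token data: " key "$") { print $3 }'
}

# Illustrative input only.
sample='0.000          0          3          0  non-token data: bayes db version
0.000          0        145          0  non-token data: nspam
0.000          0        100          0  non-token data: nham'

printf '%s\n' "$sample" | bayes_count nham    # prints 100
```

On a server this would be sa-learn --dump magic | bayes_count nspam. By default SpamAssassin does not use Bayes scores until it has learned 200 spam and 200 ham messages (the bayes_min_spam_num and bayes_min_ham_num settings), so these two counts are the ones to watch.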

Using webmail, I just place emails marked with ***SPAM*** by spamassassin in the junkmail folder, and place any marked with ***SPAM*** that aren’t spam in the hamlearning folder, and any that are missed in the junkmailmissed folder.

I guess it's going to take a while for enough samples to get processed for it to learn, but I’m willing to wait...

Offline jonic

BayesFiltering for SpamAssasin
« Reply #31 on: October 29, 2006, 03:46:46 PM »
Quote
2) On my SME server, running ‘config show spamassassin’ showed UseBayes set to 0. I believe the script has a bug. It expands the template for /etc/mail/spamassassin/local.cf and then sets UseBayes 1. I think this should be done the other way around. You can correct this by simply running ‘expand-template /etc/mail/spamassassin/local.cf’. (don’t edit the local.cf directly) ‘config show spamassassin’ should now show UseBayes set to 1


I just installed the script myself, and  ‘config show spamassassin’ showed UseBayes set to 1. However when I manually ran '/etc/mail/spamassassin/bayes_filter.sh' I got an error saying 'ERROR: configuration specifies 'use_bayes 0', sa-learn disabled'. I then ran 'expand-template /etc/mail/spamassassin/local.cf’, and got rid of the error.
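
Both reports point to the same ordering problem: the UseBayes property and the generated local.cf can disagree until the template is expanded again. The sequence can be sketched as follows, using only commands that appear in this thread; the wrapper functions and the DRY_RUN switch are illustrative, and with DRY_RUN=1 (the default here) the commands are only printed.

```shell
DRY_RUN="${DRY_RUN:-1}"     # 1 = print the commands instead of running them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

enable_bayes() {
    run config setprop spamassassin UseBayes 1            # 1. set the property first
    run expand-template /etc/mail/spamassassin/local.cf   # 2. then regenerate local.cf
    run svc -t /service/spamd                              # 3. signal spamd so it restarts
}

enable_bayes
```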

I checked '/var/log/spamd/current' and the only error I've seen was :
'warn: pyzor: check failed: internal error', and I really don't know what to do about it.

Offline okepc

BayesFiltering for SpamAssasin
« Reply #32 on: October 31, 2006, 05:02:27 PM »
I also have the error: pyzor: check failed: internal error'
I think it has something to do with:
http://www.nabble.com/Sporadic-pyzor-errors-t313050.html
Gonna test it soon...

Offline compdoc

BayesFiltering for SpamAssasin
« Reply #33 on: October 31, 2006, 05:10:58 PM »
I've seen that error on mine too - but never gotten it enough times to be concerned...

djhyper

BayesFiltering for SpamAssasin
« Reply #34 on: November 01, 2006, 09:29:35 PM »
I tried those patches but i'm still getting those pyzor errors.

Offline jvels

BayesFiltering for SpamAssasin
« Reply #35 on: November 02, 2006, 11:10:12 AM »
Does anyone have the script? The link is broken :(

Does the script work, or not?

Offline okepc

BayesFiltering for SpamAssasin
« Reply #36 on: November 02, 2006, 01:12:38 PM »
Only one of the patches was applied.
The rest are probably integrated already.
But I solved the problem!
When I ran pyzor ping....
It said something like server unavailable.
Searching the pyzor project on SourceForge, I learned that the server configured by pyzor discover is down!
A working server is 82.94.255.100:24441
Put that in your /.pyzor/servers file (in my case there were 2, one in /root and one in another location which I can't remember)
pyzor ping now says 200 OK
spamassassin -D --lint says pyzor is working correctly
P.S. You could get an occasional timeout on pyzor because the server is heavily loaded.
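
The steps above can be sketched as a small helper. It assumes the servers file sits in a .pyzor directory under the home directory of whichever account runs pyzor; the function name set_pyzor_server is made up, and the address is the one from this post, which may no longer be valid.

```shell
# Write a single server entry to a pyzor servers file.
# $1 = home directory of the account running pyzor, $2 = host:port
set_pyzor_server() {
    mkdir -p "$1/.pyzor"
    printf '%s\n' "$2" > "$1/.pyzor/servers"
}

# Example, to be run by hand:
#   set_pyzor_server /root 82.94.255.100:24441
#   pyzor ping
```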

Dirk

Offline okepc

BayesFiltering for SpamAssasin
« Reply #37 on: November 02, 2006, 01:13:47 PM »

Offline compdoc

BayesFiltering for SpamAssasin
« Reply #38 on: November 02, 2006, 06:34:34 PM »
d_gerst's website www.gerst.no-ip.com is gone. I could gather all the files and scripts and zip them, but since I didn't write them, I'm not sure if I should.

You might want to try the Michael Weinberger smeserver-spamassassin-features-0.0.2-0.noarch.rpm contrib instead. Search these forums for help using it...

http://mirror.contribs.org/smeserver/contribs/michaelw/sme7/

Offline compdoc

BayesFiltering for SpamAssasin
« Reply #39 on: November 13, 2006, 05:51:50 PM »
I modified Daniel Gerst's /etc/mail/spamassassin/bayes_filter.sh script.

The original was only gathering ham tokens from the hamlearning folder, which is where you're supposed to place good mail that was marked as spam (false positives).

I'm the only user on my SME mail server, so the ham count wasn't adding up very quickly. Spamassassin rarely marks good mail as spam (false positives) for me. Bayes filtering doesn't even start working until it has learned at least 200 ham messages, and I was stuck at 100.

So, I started reading up on bayes filtering. Unless I got it wrong, ham can also include good mail, such as the mail you keep in your inbox.

I modified the script to include my Inbox by adding these two lines in the '# Start Script' section:

ham='/home/e-smith/files/users/'$user'/Maildir'
CollectSpam $user "Ham" $ham $learn_ham

Place those lines just above these lines:

ham='/home/e-smith/files/users/'$user'/Maildir/.hamlearning'
CollectSpam $user "Ham" $ham $learn_ham

I keep my inbox free of spam, but since the scripts run late at night, there's a chance that a few missed spam messages will sneak in during the night, and sa-learn will learn them as ham.

But, according to these docs, as long as the missed spam gets moved to the junkmailmissed folder, spamassassin will correct it automatically the next time the scripts run:

http://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html

Quote: "If you have previously learnt any of the messages as ham, SpamAssassin will forget them first, then re-learn them as spam."

I'd appreciate it if anyone who knows better would let me know if I'm wrong about anything I've written.

Also, there's mention of passing the tokens to checksum services such as dcc, pyzor, or razor, by using the 'spamassassin -r' option. It's mentioned on this webpage under the 'Training plus reporting' section:

http://wiki.apache.org/spamassassin/BayesInSpamAssassin
 
I don't believe this is necessary. I believe that since razor and pyzor are already running and checking all incoming mail, they are already learning. Or am I wrong?

Offline devtay

Problems with Spamfilter
« Reply #40 on: November 15, 2006, 08:59:39 PM »
At the risk of being told to search and read the forums, I have an issue with the spamfilter-stats-7.pl script. :D

First, I have installed michaelw's rpm for spamassassin features to set up my bayes learning.

Second, I have completed the Bayesian portion of sonoracom's how to.

Third, I can run (and it seems to work) the LearnAsSpam.pl script.

Here is my problem:
perl /usr/bin/spamfilter-stats-7.pl
Use of uninitialized value in string lt at /usr/bin/spamfilter-stats-7.pl line 222, <> line 1.

After typing the execution line, you have to hit enter to get the error. Further, execution halts again until you hit enter again. This continues for the foreseeable future :D until you break out of the script.

Here is what I have checked so far:
[root@mail ~]# ls -l /var/log/qpsmtpd/*
-rwxr--r--  1 smelog smelog 4998001 Nov 14 02:36 /var/log/qpsmtpd/@400000004559800229646d8c.s
-rwxr--r--  1 smelog smelog 4998345 Nov 14 04:04 /var/log/qpsmtpd/@400000004559949d39a6b484.s
-rwxr--r--  1 smelog smelog 4998111 Nov 14 05:13 /var/log/qpsmtpd/@400000004559a4dc0d02297c.s
-rwxr--r--  1 smelog smelog 4998309 Nov 14 06:46 /var/log/qpsmtpd/@400000004559baa51b6cb2d4.s
-rwxr--r--  1 smelog smelog 4998301 Nov 14 07:43 /var/log/qpsmtpd/@400000004559c82039a6acb4.s
-rwxr--r--  1 smelog smelog 4998113 Nov 14 08:52 /var/log/qpsmtpd/@400000004559d83d2ce2c6ac.s
-rwxr--r--  1 smelog smelog 4998042 Nov 14 10:01 /var/log/qpsmtpd/@400000004559e8521b5d265c.s
-rwxr--r--  1 smelog smelog 4998287 Nov 14 11:34 /var/log/qpsmtpd/@400000004559fe4b1fccb964.s
-rwxr--r--  1 smelog smelog 4998075 Nov 14 13:05 /var/log/qpsmtpd/@40000000455a139c1884255c.s
-rw-r--r--  1 smelog smelog  119394 Nov 14 13:09 /var/log/qpsmtpd/current
-rw-------  1 smelog smelog       0 Aug 18 08:35 /var/log/qpsmtpd/lock
-rw-r--r--  1 smelog smelog       0 Aug 18 08:35 /var/log/qpsmtpd/state

[root@mail ~]# config show qpsmtpd
qpsmtpd=service
    Bcc=enabled
    BccMode=cc
    BccUser=logger
    DNSBL=enabled
    LogLevel=8
    MaxScannerSize=25000000
    RBLList=sbl-xbl.spamhaus.org,whois.rfc-ignorant.org,dnsbl.njabl.org,relays.ordb.org
    RHSBL=enabled
    RequireResolvableFromHost=no
    SBLList=dsn.rfc-ignorant.org
    access=public
    status=enabled
[root@mail ~]# config show spamassassin
spamassassin=service
    DNSAvailable=yes
    MessageRetentionTime=7
    OkLanguages=all
    OkLocales=all
    RejectLevel=12
    ReportSafe=0
    Sensitivity=custom
    SkipRBLChecks=0
    SortSpam=enabled
    Subject=[SPAM]
    SubjectTag=enabled
    TagLevel=5
    UseBayes=1
    status=enabled

I am running SME7 with SA 3.1.6. I am also using an SSH connection into the server as root. I have tried running this same script as a user, but access restrictions won't let me (from what I read you can now run it as root anyways). I did a chmod +x and chmod 777 just for good measure and no change.

Any help would be appreciated. I have been trying to get this stats script to run for sometime now with no joy.

Later,
Dev
You can't stop what's coming. It ain't all waiting on you.

Offline compdoc

BayesFiltering for SpamAssasin
« Reply #41 on: November 15, 2006, 10:58:43 PM »
I use the sa-update script and spamfilter-stats-7.pl, but not LearnAsSpam.

Spamfilter-stats-7.pl runs without bayes filtering. And it runs without installing anything else, although I guess you should have at least DNSBL enabled.

My server has all the current updates, but my SA version is only 3.1.3. How did you manage 3.1.6?

The report generated by spamfilter-stats-7.pl shows:

/usr/bin/spamfilter-stats-7.pl Version : 0.5.1
Clam Version : ClamAV 0.88.6/2195/Tue Nov 14 12:53:04 2006
SpamAssassin Version : SpamAssassin version 3.1.3
running on Perl version 5.8.5

Here's my other settings:

qpsmtpd=service
    Bcc=disabled
    BccMode=cc
    BccUser=maillog
    DNSBL=enabled
    LogLevel=8
    MaxScannerSize=25000000
    RBLList=sbl-xbl.spamhaus.org,relays.ordb.org   (the other rbl lists can be a little too aggressive for me, and the more checks you perform, the slower)
    RHSBL=enabled
    RequireResolvableFromHost=no
    SBLList=dsn.rfc-ignorant.org
    access=public
    status=enabled

spamassassin=service
    DNSAvailable=yes
    MessageRetentionTime=15
    OkLanguages=all
    OkLocales=all
    RejectLevel=20
    ReportSafe=0
    Sensitivity=custom
    SkipRBLChecks=0
    SortSpam=enabled
    Subject=[SPAM]
    SubjectTag=enabled
    TagLevel=8
    UseBayes=1
    bayes_auto_learn=1
    bayes_auto_learn_threshold_nonspam=0.1
    bayes_auto_learn_threshold_spam=12.0
    required_score=5
    status=enabled
    use_auto_whitelist=0

Offline devtay

BayesFiltering for SpamAssasin
« Reply #42 on: November 16, 2006, 03:21:54 AM »
Quote from: "compdoc"


My server has all the current updates, but my SA version is only 3.1.3. How did you manage 3.1.6?

I am also at 3.1.3; I was reading the spamassassin wiki before this post and 3.1.6 stuck in my head. Sorry. It looks like our settings are pretty close. I remember seeing the rpm change the autolearn properties, but they are not showing up in my config show output.

So from what you are saying there are actually two separate options to control spam.

1. the sa-learn stuff from michaelw's RPM
2. the sonoracom howto stuff on their website using learnasspam

I am new to SME and have found there is a lot of information on this site. Sometimes there is too much for me to accurately comprehend. As such, I went the "easy way" and tried to go by the sonoracom stuff. I found that I had to enable Bayes because it was not enabled in the how-to (that was not fun - the databases are not very user-friendly).

Anyways, I like the sonora stuff, but the only way to change spam to ham is to whitelist it. I would rather have a solution that works the same as the LearnAsSpam script except call it LearnAsHam. My Linux scripting skills are pretty basic so it will be some time before I can write one.
You can't stop what's coming. It ain't all waiting on you.

Offline compdoc

BayesFiltering for SpamAssasin
« Reply #43 on: November 16, 2006, 04:49:53 AM »
From what I can tell from the source code, michaelw's RPM only does a couple of things. Mainly, it runs these commands:

config setprop spamassassin UseBayes 1
config setprop spamassassin BayesAutoLearnThresholdSpam 4.00
config setprop spamassassin BayesAutoLearnThresholdNonspam 0.10
config setprop qpsmtpd DNSBL enabled
config setprop qpsmtpd RHSBL enabled

Then it sets some permissions, and not a lot else.

Spamassassin is already installed in SME, and the last two commands enabling DNSBL & RHSBL are really all you need to do to get the RBL lists and Spamassassin working.

Bayes filtering takes it a step further and helps fine-tune based on the spam you get, but you don't really need it. And michaelw's RPM doesn't install the scripts needed to run sa-learn, which gathers the bayes data from users. At least I don't think it does...

Be sure to watch the logs for errors, and to make sure your server isn't bouncing good mail. Open the file /var/log/spamd/current - that's the log that shows each incoming message and what happened to it (scroll to the bottom of the file). It also shows the score assigned to the message. Like:

result: Y 38

In the server-manager, you enable Spam filtering, and set Spam sensitivity to Custom. The rejection level will reject any mail above the number you set. Mine's at 20, so a message of 38 (like the one above) will never make it to a mailbox.
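
A small sketch of that comparison, for checking scores from the log by hand. It assumes the score is the integer that follows the Y/N style flag after "result:", as in the line above, and it reads "above" as strictly greater than; the exact boundary behaviour is not confirmed in this thread. The function names are made up.

```shell
# Extract the score from a spamd log line such as "... result: Y 38 ...".
score_from_line() {
    sed -n 's/.*result: [^ ]* \(-\{0,1\}[0-9][0-9]*\).*/\1/p'
}

# Compare an integer score with the reject level.
# Assumption: "above" is read as strictly greater than.
is_rejected() {
    if [ "$1" -gt "$2" ]; then echo rejected; else echo accepted; fi
}

score=$(echo 'result: Y 38' | score_from_line)
is_rejected "$score" 20     # prints: rejected
```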

As for Spamfilter-stats-7.pl, did you follow the instructions here:

http://mirror.contribs.org/smeserver/contribs/bread/mailstats/install_howto.txt

Offline jvels

  • ***
  • 130
  • +0/-0
    • http://vels.dk
BayesFiltering for SpamAssasin
« Reply #44 on: November 16, 2006, 09:14:21 AM »
Okay, so there is no reason to install the RPM?

Do I only need to run:
Code: [Select]

config setprop qpsmtpd DNSBL enabled
config setprop qpsmtpd RHSBL enabled


Or?

Offline compdoc

  • ****
  • 211
  • +0/-0
BayesFiltering for SpamAssasin
« Reply #45 on: November 16, 2006, 02:04:26 PM »
Yes. These commands are helpful:

config show qpsmtpd
config setprop qpsmtpd DNSBL enabled
config setprop qpsmtpd RHSBL enabled
#to change the DNSBL lists used:
config setprop qpsmtpd RBLList sbl-xbl.spamhaus.org,relays.ordb.org
#after setting any properties, you must run:
signal-event email-update
svc -t /service/qpsmtpd

Don't change the SBLList list used by RHSBL (the one server listed is the only one). I've read that RHSBL isn't as fast or as useful as DNSBL, but you decide...

There's also a white/black list you can modify to specifically allow/block domains, but it can be a lot of work to maintain. You must use caps for the words White and Black:

#adds to list:
db spamassassin setprop wbl.global *informit.com White *gfi.com White

#replaces list:
db spamassassin set wbl.global list *informit.com White *gfi.com White *800-flowers.net White *heartdetectives.com Black

#after making changes you must:
expand-template /etc/mail/spamassassin/local.cf
svc -t /service/spamd

#also handy:
db spamassassin show
config show spamassassin
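Since a misspelled "white" or a missing value is easy to type, a small pre-check along these lines could be run before the db command. This is just a sketch; check_wbl_pairs is a made-up name, and it only inspects its arguments and never touches the database.

```shell
# Hypothetical pre-check for wbl.global arguments: expects pattern/value pairs
# where every value is exactly "White" or "Black" (capitalised, as noted above).
# Returns 0 if the list looks sane, 1 otherwise.
check_wbl_pairs() {
    if [ $(( $# % 2 )) -ne 0 ]; then
        echo "odd number of arguments: every pattern needs White or Black" >&2
        return 1
    fi
    while [ $# -gt 0 ]; do
        case $2 in
            White|Black) ;;
            *) echo "bad value '$2' for '$1': use White or Black" >&2
               return 1 ;;
        esac
        shift 2
    done
    return 0
}

# Possible use on the server: run the db command only if the check passes.
#   check_wbl_pairs '*informit.com' White '*gfi.com' White &&
#       db spamassassin setprop wbl.global '*informit.com' White '*gfi.com' White
```

The patterns are quoted in the usage example so the shell cannot expand the leading * against file names in the current directory.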

Offline devtay

  • *
  • 145
  • +0/-0
BayesFiltering for SpamAssasin
« Reply #46 on: November 17, 2006, 05:27:21 PM »
Quote from: "compdoc"

As for Spamfilter-stats-7.pl, did you follow the instructions here:

http://mirror.contribs.org/smeserver/contribs/bread/mailstats/install_howto.txt


Yes, I did follow those instructions. Actually, I used the instructions at http://www.sonoracomm.com/index.php?option=com_content&task=view&id=49&Itemid=32  but they are for the same thing. I had to add to the instructions by chmod'ing both of the perl scripts as well.

I thought I may have missed something, so I used the link you posted and started over. First, I rm'd the files that were originally installed. Second, I checked for my bayes filtering to be enabled and the thresholds to be set. Third, I cd to the directories in the install_howto and wget the files into each of the directories. Fourth, I used the chmod commands on both of the scripts. I then typed LearnAsSpam.pl at the root prompt and it ran. Then, I typed spamfilter-stats-7.pl at the root prompt and nothing happened. I then hit the return key and got the error:
 
Use of uninitialized value in string lt at /usr/bin/spamfilter-stats-7.pl line 222, <> line 1.

This seems like a programming error to me. I recall from some of my programming experience that some languages don't like using variables that are not initialized. Further, it seems the program is going into an indefinite loop. Maybe it has to do with Linux and/or a bad permission I have set on my system. I will be the first to admit that my Linux skills are beginner. That is why I chose to use SME. Any ideas?  :?
You can't stop what's coming. It ain't all waiting on you.

Offline compdoc

  • ****
  • 211
  • +0/-0
BayesFiltering for SpamAssasin
« Reply #47 on: November 17, 2006, 08:58:46 PM »
The file mailstats.cron calls the script spamfilter-stats-7.pl, and passes file names to it to be processed. This is the actual code in mailstats.cron:

perl /usr/bin/spamfilter-stats-7.pl /var/log/qpsmtpd/*.s /var/log/qpsmtpd/current

'/var/log/qpsmtpd/*.s' means all files in the directory ending with .s
and '/var/log/qpsmtpd/current' means the current, live log.

When the current log grows to a certain size, it's renamed with a .s extension and a new log is started. If there was a problem, like a loss of power, the file gets named with a .u and isn't used again. You can delete any .u files, unless you want to keep them for your records.

Anyway, until your server receives enough mail to create a .s file, spamfilter-stats-7.pl won't be able to report on any. And it quits with an error.

If you want to run the script by hand, paste that code listed above...

Oh, and spamfilter-stats-7.pl runs without enabling bayes and without using the LearnAsSpam.pl script. It can be run on any server, and doesn't depend on those other scripts he wrote.
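If you do run it by hand, a guard like this avoids the error on a server that has no rotated logs yet. It is a rough sketch, not part of the howto: count_rotated and run_stats_if_ready are made-up names, and the default directory is simply the path quoted above.

```shell
# Hypothetical helper: count rotated logs (*.s) in a log directory.
count_rotated() {
    dir=$1
    n=0
    for f in "$dir"/*.s; do
        # If the glob matches nothing, $f is the literal pattern and -e fails.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Hypothetical wrapper: run the stats script only once a .s file exists.
run_stats_if_ready() {
    dir=${1:-/var/log/qpsmtpd}
    if [ "$(count_rotated "$dir")" -eq 0 ]; then
        echo "no rotated .s logs in $dir yet; nothing to report on"
        return 1
    fi
    perl /usr/bin/spamfilter-stats-7.pl "$dir"/*.s "$dir"/current
}

# Possible use:
#   run_stats_if_ready
```

Files ending in .u are ignored by the *.s pattern, which matches what the cron job does.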

Offline raem

  • *
  • 3,972
  • +4/-0
BayesFiltering for SpamAssasin
« Reply #48 on: November 18, 2006, 05:28:18 PM »
devtay

> I used the instructions at http://www.sonoracomm.com/index.php?option=com_content&task=view&id=49&Itemid=32  

That howto has all the steps needed. You should have followed it exactly, but you didn't: you introduced other steps of your own.


> I had to add to the instructions by chmod'ing both of the perl scripts as well.

You did not need to do that.


> I thought I may have missed something....

There is the problem. If you did all the steps in the howto, then your system is configured and you did not need to do anything else.

You tried running the scripts manually, but that is not what you are supposed to do.
Those scripts get run by cron jobs; the howto steps simply copy those scripts to your server.

You should install the RPM as per the very start of the howto:
rpm -Uvh http://mirror.contribs.org/smeserver/contribs/michaelw/sme7/smeserver-spamassassin-features-0.0.2-0.noarch.rpm

Manually create the LearnAsSpam folder in your IMAP email client, and then wait 24 hours for the scripts to run. You will then receive email reports.
...

Offline compdoc

  • ****
  • 211
  • +0/-0
BayesFiltering for SpamAssasin
« Reply #49 on: November 19, 2006, 05:51:03 PM »
If anyone is still using the d_gerst scripts to enable your bayes filtering, you might want to make some changes.

d_gerst created a security risk by placing the bayes token database files in the /etc/mail/spamassassin folder. The risk comes from anyone changing permissions on this folder to correct errors in the /var/log/spamd/current log that prevented bayes from working. He should have just used the default location.

And if you didn't change the permissions, then your bayes filtering isn't working anyway...

If you still want to use d_gerst's bayes_filter.sh script, you have to make changes in it to correct the paths. If interested, I'll post what you need to do...

The following isn't a script. Sorry, but you have to enter the commands by hand. If you use Windows, and edit over SSL, don't use Wordpad as your editor. Use Notepad, or Crimson Editor...

Code: [Select]

service spamd stop

# writes the bayes_journal file to the database

sa-learn --sync

# moves files

cd /etc/mail/spamassassin
mv bayes.mutex /var/spool/spamd/.spamassassin
mv bayes_seen /var/spool/spamd/.spamassassin
mv bayes_toks /var/spool/spamd/.spamassassin


# removes temp learning directories. These
# are automatically added back by the d_gerst
# script if you still use it.

rmdir spam
rmdir missedspam
rmdir ham


# change owner/group back to defaults

chown root:root /etc/mail/spamassassin
chown root:root /etc/mail/spamassassin/bayes_filter.sh

# If you want to delete the d_gerst scripts and put
# your server back to the way it was:

rm /etc/mail/spamassassin/bayes_filter.sh
rm /etc/cron.d/sa-bayes_learning

# restore templates
# These templates are used to build the spamassassin
# config file local.cf. If you don't use bayes filtering
# then they aren't used.

cd /etc/e-smith/templates/etc/mail/spamassassin/local.cf

# restore 10paths
# Edit the file 10paths, and replace the existing text.
# These aren't commands! Paste this text in:

bayes_path /var/spool/spamd/.spamassassin/bayes
bayes_file_mode 750
auto_whitelist_path /var/spool/spamd/.spamassassin/auto-whitelist
auto_whitelist_file_mode 750


# 10internal_networks
# The file 10internal_networks was deleted by the script
# install_sa-learn.sh
# If you want to continue using the d_gerst scripts, leave
# it deleted.
# To restore it, create a file named 10internal_networks and
# paste this in:

{ "internal_networks $LocalIP" }



# del 71BayesFilter
# install_sa-learn.sh added the file 71BayesFilter.
# It sets the bayes_path incorrectly. Delete it:

rm /etc/e-smith/templates/etc/mail/spamassassin/local.cf/71BayesFilter

expand-template /etc/mail/spamassassin/local.cf

# Your /etc/mail/spamassassin/local.cf file should now look similar to this:

#------------------------------------------------------------
#       !!DO NOT MODIFY THIS FILE!!
#
# Manual changes will be lost when this file is regenerated.
#
# Please read the developer's guide, which is available
# at http://wiki.contribs.org/development/
#
# Copyright (C) 1999-2006 Mitel Networks Corporation
#------------------------------------------------------------
dns_available yes
lock_method flock
ok_languages all
ok_locales all
bayes_path /var/spool/spamd/.spamassassin/bayes
bayes_file_mode 750
auto_whitelist_path /var/spool/spamd/.spamassassin/auto-whitelist
auto_whitelist_file_mode 750
report_safe 0
required_hits 6
rewrite_header Subject [SPAM]
skip_rbl_checks 0
clear_trusted_networks
trusted_networks 192.168.1.5 127.
use_auto_whitelist 0
use_bayes 1


# Start it all back up

signal-event email-update
svc -t /service/qpsmtpd
service spamd start

# check the tokens are still there

sa-learn --dump magic

# shows something like this:

0.000          0          3          0  non-token data: bayes db version
0.000          0       3625          0  non-token data: nspam
0.000          0        286          0  non-token data: nham
0.000          0     126486          0  non-token data: ntokens
0.000          0 1161470280          0  non-token data: oldest atime
0.000          0 1163950938          0  non-token data: newest atime
0.000          0 1163921405          0  non-token data: last journal sync atime
0.000          0 1163292565          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
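To check just the spam and ham counts after the move, the dump can be filtered. This is a sketch only: magic_value is a made-up name, and it assumes the column layout shown above, with the count in the third column. As far as I know, SpamAssassin by default does not apply Bayes scores until it has learned at least 200 spam and 200 ham messages, so these two numbers are the ones worth watching.

```shell
# Hypothetical helper: pull one single-word counter (nspam, nham, ntokens)
# out of "sa-learn --dump magic" output read from stdin.
# Assumes the layout shown above: value in column 3, label in the last column.
magic_value() {
    awk -v key="$1" \
        '$5 == "non-token" && $6 == "data:" && $7 == key && NF == 7 { print $3 }'
}

# Possible use on the server:
#   sa-learn --dump magic | magic_value nspam
#   sa-learn --dump magic | magic_value nham
#
# The atime values are Unix timestamps. One way to read them:
#   perl -le 'print scalar localtime 1163950938'
```

If the counts come back the same as before the files were moved, the database survived the relocation.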