Koozali.org: home of the SME Server

[SOLVED] Disabling and re-enabling Bayesian filtering makes Bayesian inoperable?

Offline Michail Pappas

Hello all,

I have a production 9.2 system running stock software. Since I had not reset the Bayes database for more than 5 years, I followed the wiki instructions to reset it at https://wiki.contribs.org/SME_Server:Documentation:FAQ:Section04#Reset_the_Bayes_Database but Bayesian filtering no longer seems to work.

Specifically, after the final signal-event email-update step, the contents of /var/spool/spamd/.spamassassin/ do not change (neither the timestamps nor the sizes).
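
One way to check this objectively is to record the size and modification time of each file before and after the reconfiguration and compare the two listings. A minimal sketch, assuming GNU stat (as on the CentOS base of SME Server); the helper name bayes_snapshot is made up for illustration:

```shell
# Hypothetical helper: print "path size mtime-epoch" for every file in the
# Bayes directory. Assumes GNU stat and its -c format option.
bayes_snapshot() {
    dir=${1:-/var/spool/spamd/.spamassassin}
    for f in "$dir"/*; do
        [ -e "$f" ] || continue      # skip the unexpanded glob of an empty directory
        stat -c '%n %s %Y' "$f"
    done
}

# Usage on the server (as root):
#   bayes_snapshot > /tmp/bayes.before
#   signal-event email-update
#   bayes_snapshot > /tmp/bayes.after
#   diff /tmp/bayes.before /tmp/bayes.after && echo "no change"
```

An empty diff only shows that the files were not rewritten during that window. It does not by itself prove that Bayes is disabled.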

Any ideas on what might be wrong? Thanks in advance.

« Last Edit: November 17, 2020, 08:37:23 AM by Michail Pappas »

Offline ReetP

What do your logs say?
...
1. Read the Manual
2. Read the Wiki
3. Don't ask for support on Unsupported versions of software
4. I have a job, wife, and kids and do this in my spare time. If you want something fixed, please help.

Bugs are easier than you think: http://wiki.contribs.org/Bugzilla_Help

If you love SME and don't want to lose it, join in: http://wiki.contribs.org/Koozali_Foundation

Offline mmccarn

I found that the files on my server in the bayes db folder had not been updated since 2018...
Code: [Select]
# ls -l /var/spool/spamd/.spamassassin/
total 9304
-rw-r----- 1 spamd spamd        6 Dec  7  2018 bayes.mutex
-rw-r----- 1 spamd spamd  1306624 Dec  7  2018 bayes_seen
-rw-r----- 1 spamd spamd 10571776 Dec  7  2018 bayes_toks

Some new db configuration settings were added to smeserver-spamassassin, and the default for bayes autolearn was changed:
Code: [Select]
# rpm -q --changelog smeserver-spamassassin |head -8
* Wed Jun 28 2017 Jean-Philipe Pialasse <tests@pialasse.com> 2.4.0-9.sme
- disable auto_learn by default when enabling Bayes [SME: 10360]
- added properties UseBayesAutoLearn, BayesAutoLearnThresholdSpam and
BayesAutoLearnThresholdNonSpam

* Wed Mar 08 2017 Daniel Berteaud <daniel@firewall-services.com> 2.4.0-8.sme
- Rewrite spamd run script to add support for --allow-tell [SME: 10138]

Adding two new config db settings and restarting email looks like it got bayes working again for me:
Code: [Select]
config setprop spamassassin UseBayesAutoLearn 1
config setprop spamd SpamLearning enabled
signal-event email-update


Offline Michail Pappas

Thanks for sharing this mate!
Quote
Adding two new config db settings and restarting email looks like it got bayes working again for me:
Code: [Select]
config setprop spamassassin UseBayesAutoLearn 1
config setprop spamd SpamLearning enabled
signal-event email-update
I've made these changes; I definitely did not have the UseBayesAutoLearn and SpamLearning configuration options in my setup. I ran the email-update reconfiguration as per the instructions, but I still do not see anything changing:
Code: [Select]
2020-11-16 09:47:30.851104500 Nov 16 09:47:30.851 [29113] info: spamd: clean message (-5.8/6.0) for qpsmtpd:1005 in 5.2 seconds, 24004 bytes.
2020-11-16 09:47:30.851372500 Nov 16 09:47:30.851 [29113] info: spamd: result: . -5 - DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,RCVD_IN_SORBS_WEB,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL scantime=5.2,size=24004,user=qpsmtpd,uid=1005,required_score=6.0,rhost=127.0.0.1,raddr=127.0.0.1,rport=41520,mid=<ef22726427df11ebbac4000af7a3a6f4-914527e0@facebookmail.com>,autolearn=no autolearn_force=no
I believe that with a score of -5 this should be trained as ham, but autolearn was not set for this message. Furthermore, the /var/spool/spamd/.spamassassin bayes_* files remain with the same (old) modification date.
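
To see how often autolearn fires at all, the score and the autolearn field can be pulled out of the "spamd: result:" lines. The SpamAssassin documentation notes that some rules (the Bayes rules themselves, and rules flagged userconf or noautolearn) are ignored when the autolearn score is computed, so the autolearn decision can differ from what the headline score suggests. A rough sketch, where the function name spamd_autolearn and the log path /var/log/spamd/current are my assumptions:

```shell
# Hypothetical helper: read spamd log lines on stdin and print
# "<score> <autolearn value>" for every "spamd: result:" line.
spamd_autolearn() {
    sed -n 's/.*spamd: result: [Y.] \(-\{0,1\}[0-9]\{1,\}\) .*autolearn=\([a-z]*\).*/\1 \2/p'
}

# Usage on the server (log path is an assumption):
#   spamd_autolearn < /var/log/spamd/current | sort | uniq -c
```

A long run of "no" values, even for clearly negative scores, would point at the autolearn score calculation rather than at a broken database.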

Offline Michail Pappas

IIRC, years ago I had set the spam threshold to 6 and the ham threshold to 0 or something. Two new db configuration settings were added, BayesAutoLearnThresholdSpam and BayesAutoLearnThresholdNonSpam. Which is somewhat strange, since things did work before even though these were set.

Digging around I saw that if these two are not configured, then scores of 6 and -1.15 for spam and ham are set in /etc/mail/spamassassin/local.cf. Which also seems ok...

Still not sure if bayesian training is in place or not. Bayes definitely does not work, since I do not observe the usual BAYES_xx SA tags in my emails (see previous message).

I'm adding my own /etc/mail/spamassassin/local.cf file here:
Code: [Select]
#------------------------------------------------------------
#              !!DO NOT MODIFY THIS FILE!!
#
# Manual changes will be lost when this file is regenerated.
#
# Please read the developer's guide, which is available
# at http://www.contribs.org/development/
#
# Copyright (C) 1999-2006 Mitel Networks Corporation
#------------------------------------------------------------
# bayes_learn_to_journal 1

dns_available yes
internal_networks x.y.z.w
lock_method flock
loadplugin     Mail::SpamAssassin::Plugin::TextCat
ok_languages en el
ok_locales en el gr
bayes_path /var/spool/spamd/.spamassassin/bayes
bayes_file_mode 750
report_safe 0
required_score 6
rewrite_header Subject [SPAM]
skip_rbl_checks 0
clear_trusted_networks
trusted_networks x.y.z.w 127/8 x.y/16

use_bayes 1
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam  -1.15
bayes_auto_learn_threshold_spam 6

add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ autolearn=_AUTOLEARN_
add_header all Details _REPORT_
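
For a quick look at only the Bayes-related settings that are actually in effect (ignoring commented-out lines such as bayes_learn_to_journal above), something like the following can help. This is just a sketch; the function name bayes_settings is invented here:

```shell
# Hypothetical helper: list active (uncommented) Bayes directives in a
# SpamAssassin config file. Defaults to the local.cf path used in this thread.
bayes_settings() {
    grep -E '^[[:space:]]*(use_bayes|bayes_[a-z_]+)[[:space:]]' \
        "${1:-/etc/mail/spamassassin/local.cf}"
}

# Usage on the server:
#   bayes_settings
#   spamassassin --lint    # separately checks that the configuration parses
```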
« Last Edit: November 16, 2020, 09:42:20 AM by Michail Pappas »

Offline ReetP

https://wiki.contribs.org/Email#Bayesian_Filtering
https://wiki.contribs.org/Learn

And check the bugs on the pages to see if they affect you (I haven't gone that far)

If that isn't it then you should really open a bug, though it won't get fixed on v9 now.

Sorry to be brief, but amongst a pile of other stuff I'm currently trying to get the latest spamassassin built on v10!

We probably ought to check this as well - I'll see if it is on v10 and get a bug listing added for it as well (so much to do, so few hands, so little time)

https://wiki.contribs.org/Sme-unjunkmgr

(answer - no it isn't but we can look at doing that)
« Last Edit: November 16, 2020, 11:30:45 AM by ReetP »

Offline Michail Pappas

No worries, SA works fine even without Bayesian filtering, I just wanted to improve upon it (plus it worked before resetting the Bayesian database, hence my inquiries as to what happened).

For the sake of completeness, my current config is the following:
Code: [Select]
# config show spamassassin
spamassassin=service
    DNSAvailable=yes
    MessageRetentionTime=90
    OkLanguages=en el
    OkLocales=en el gr
    RejectLevel=6.01
    ReportSafe=0
    Sensitivity=custom
    SkipRBLChecks=0
    SortSpam=enabled
    Subject=[SPAM]
    SubjectTag=enabled
    TagLevel=6
    TrustedNetworks=127/8 192.168/16
    UseAutoWhitelist=0
    UseBayes=1
    UseBayesAutoLearn=1
    status=enabled

I've not set BayesAutoLearnThresholdSpam and BayesAutoLearnThresholdNonspam since they default nicely to 6 and -1.15 respectively.

Offline mmccarn

I think you need to train the bayes database with 200 spam and 200 ham before autolearn will kick in.

My system is pretty low on spam:
Code: [Select]
# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0         68          0  non-token data: nspam
0.000          0      11247          0  non-token data: nham
0.000          0     129974          0  non-token data: ntokens
0.000          0 1446239702          0  non-token data: oldest atime
0.000          0 1605205640          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1605367708          0  non-token data: last expiry atime
0.000          0   11059200          0  non-token data: last expire atime delta
0.000          0     318499          0  non-token data: last expire reduction count
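
As far as I know, the 200/200 figure comes from SpamAssassin's bayes_min_spam_num and bayes_min_ham_num settings (both default to 200), which gate when the BAYES_xx rules are applied. A small sketch that reads the dump and compares the counters against that minimum; the function name bayes_ready is made up, and the threshold argument is only a convenience:

```shell
# Hypothetical helper: read "sa-learn --dump magic" output on stdin and report
# whether both counters reach the minimum (default 200, matching SpamAssassin's
# default bayes_min_spam_num / bayes_min_ham_num).
bayes_ready() {
    awk -v min="${1:-200}" '
        /non-token data: nspam$/ { nspam = $3 }
        /non-token data: nham$/  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            if (nspam >= min && nham >= min) print "bayes scoring active"
            else print "bayes scoring not yet active"
        }'
}

# Usage on the server:
#   sudo -u spamd sa-learn --dump magic | bayes_ready
```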

On my system, running 'sa-learn' manually on a mail folder updates two of the bayes files
(I've already "trained" using this folder, hence the result showing "Learned tokens from 0 message(s)" and the unchanged time/date on "bayes_seen"):
Code: [Select]
# cd /home/e-smith/files/users/mmccarn/Maildir/.junkmail/cur
# date; sudo -u spamd sa-learn --showdots --spam *
Mon Nov 16 08:24:00 EST 2020
..........................
Learned tokens from 0 message(s) (26 message(s) examined)

# ls -l /var/spool/spamd/.spamassassin/
total 5012
-rw-r----- 1 spamd spamd    2670 Nov 16 08:24 bayes.mutex
-rw-r----- 1 spamd spamd 1306624 Nov 14 10:47 bayes_seen
-rw-r----- 1 spamd spamd 5013504 Nov 16 08:24 bayes_toks

Offline Michail Pappas

Quote
I think you need to train the bayes database with 200 spam and 200 ham before autolearn will kick in.
Definitely, I'm still learning here:
Code: [Select]
# sa-learn --dump magic
netset: cannot include 127.0.0.0/8 as it has already been included
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0         10          0  non-token data: nham
0.000          0       2536          0  non-token data: ntokens
0.000          0 1605517125          0  non-token data: oldest atime
0.000          0 1605598209          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
The statistics above are from an almost 24-hour period for a server receiving around 1700 messages daily. Now, I'd swear that I used to see much, much more spam...
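
The atime values in the dump are Unix epoch seconds, so the window covered by the database can be read straight from the "oldest atime" and "newest atime" lines. A minimal sketch (the function name atime_span_hours is invented for illustration):

```shell
# Hypothetical helper: read "sa-learn --dump magic" output on stdin and print
# the number of hours between the oldest and newest token atime.
atime_span_hours() {
    awk '
        /non-token data: oldest atime$/ { oldest = $3 }
        /non-token data: newest atime$/ { newest = $3 }
        END { printf "%.1f\n", (newest - oldest) / 3600 }'
}

# Usage on the server:
#   sudo -u spamd sa-learn --dump magic | atime_span_hours
```

For the dump above this gives about 22.5 hours, which matches the "almost 24-hour" estimate.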

In any case, I've found why the bayes_* files remained with the same timestamp. These files changed when I ran sa-learn --sync. They also change when sa-learn is invoked with --dump magic...

Consider thread solved :)

Offline ReetP

This thread has been interesting as I've never used Bayes before.

If I ever get round to it I am trying to build the latest Spamassassin on v10.

Please keep an eye out as I'll need some testing!!!

I'll post a bug number here when I have imported it.

Offline ReetP

OK,

For your delectation & delight.

In SME 10 we have added smeserver-dovecot-extras

Import:
https://bugs.contribs.org/show_bug.cgi?id=11032

Still got a bug we are trying to resolve:
https://bugs.contribs.org/show_bug.cgi?id=11170

There is a test rpm in my test repo.

And also smeserver-unjunkmgr:
https://bugs.contribs.org/show_bug.cgi?id=11178

This needs testing & fixing.

If I get 5 minutes I'll try and fix the latest spamassassin too....

Please, get a VM and test.

Offline ReetP

Spamassassin 3.4.4 is done.

https://bugs.contribs.org/show_bug.cgi?id=11206

Badly needs:

Testing.
Refining.

Also needs testing with dovecot extras.

Please.

Get involved & help.

Offline Michail Pappas

Great news mate! Pity I can't afford any time to test :(

Offline ReetP

Quote
Great news mate! Pity I can't afford any time to test :(

I was going to have a good long rant here, but I won't.

It's ironic that you found time to fix your spamassassin. How did you manage that?

What makes you think any of us have the time to build v10?

Koozali SME is built by volunteers in their own time. None of us work for Koozali.

We also have pandemics, (and Brexit) and wives and children and family and friends and jobs.

We just don't make excuses.

As you have PM'd me for a Rocket account I'll set that up, and then perhaps you can come and see what we have been up to and see the hours that people have spent in their spare time trying to get this done.


Offline Michail Pappas

Quote
I was going to have a good long rant here, but I won't.
Good for both of us. ;)

Quote
It's ironic that you found time to fix your spamassassin. How did you manage that?
I didn't actually... Left it in that state, no time to cook things up. SA works even without bayesian.

Quote
What makes you think any of us have the time to build v10?

Koozali SME is built by volunteers in their own time. None of us work for Koozali.

We also have pandemics, (and Brexit) and wives and children and family and friends and jobs.

We just don't make excuses.

As you have PM'd me for a Rocket account I'll set that up, and then perhaps you can come and see what we have been up to and see the hours that people have spent in their spare time trying to get this done.
No need to explain to me what is pretty obvious :) Will discuss things privately.