Koozali.org: home of the SME Server

Spamassassin / Bayesian Filters / Internal Mail Server

Offline mmccarn

« on: September 22, 2006, 03:29:17 PM »
Is there an easy way to "train" the bayesian filters if I am using an "Internal Mail Server"?

All the discussions I see about training spamassassin seem to be designed for systems using local mail delivery...

I have several sites using SME 7 as a firewall / spam filter in front of Exchange servers, and would like to provide my users with an easy way to indicate SPAM and HAM from within Outlook.

My biggest site has approximately 300 user accounts, so I'm hoping for a solution that doesn't require that I duplicate all of these users on the SME server...

Offline mmccarn

« Reply #1 on: September 28, 2006, 02:58:35 PM »
[bump]Anyone?[/bump]

Offline mmccarn

« Reply #2 on: October 01, 2006, 06:21:23 PM »
I found this thread: http://forums.contribs.org/index.php?topic=32158.0, and this one: http://www.sonoracomm.com/index.php?option=com_content&task=view&id=49&Itemid=32 and have now done the following:
    cd ~
    wget http://mirror.contribs.org/smeserver/contribs/michaelw/sme7/smeserver-spamassassin-features-0.0.2-0.noarch.rpm

    yum localinstall smeserver-spamassassin-features-0.0.2-0.noarch.rpm

    config setprop spamassassin BayesAutoLearnThresholdSpam 12.00
     
    sa-learn --sync --dbpath /var/spool/spamd/.spamassassin -u spamd
    chown spamd.spamd /var/spool/spamd/.spamassassin/bayes_*
    chmod 750 /var/spool/spamd/.spamassassin/bayes_*
    chown spamd.spamd /var/spool/spamd/.spamassassin/bayes.mutex

    signal-event post-upgrade
    signal-event reboot

    Notes:
      - Thanks to Michael Weinberger for the smeserver-spamassassin-features rpm!
      - Thanks to sonoracomm for a great how-to!
      - I use "yum localinstall" instead of "rpm -Uvh" because someone who knows more about SME than I do recommended it.  It takes longer, and tells me to reboot when I'm done, but what do I know?
      - I change the default for "BayesAutoLearnThresholdSpam" from 4.00 to 12.00 because I dread having users call me claiming they are missing their email.
      - I found I had to change the permissions on "bayes.mutex" or I get "Permission denied" errors in /var/log/spamd/current after manually creating the bayes databases as shown above.
      - "sa-learn --dump magic" shows that I am, indeed, accumulating bayes tokens.
      - On one system I have the "Autolearn" and "Spam reject" thresholds both set to 12 - this system *is* learning spam with these settings.
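
For anyone who wants to check token accumulation without reading the raw dump, here is a minimal sketch. It assumes the usual "sa-learn --dump magic" layout, where each counter line ends in "non-token data: <name>" and the third column holds the value. The function name bayes_counts is made up for this example.

```shell
#!/bin/sh
# Sketch: pull the message and token counters out of "sa-learn --dump magic".
# Assumption: counter lines end in "non-token data: <name>" and the value
# sits in the third whitespace-separated column.

bayes_counts() {
    # Reads dump output on stdin, prints lines such as "nspam=120".
    awk '/non-token data: (nspam|nham|ntokens)$/ { print $NF "=" $3 }'
}

# Typical use on the server (database path taken from the steps above):
#   sa-learn --dump magic --dbpath /var/spool/spamd/.spamassassin | bayes_counts
```

As far as I know, SpamAssassin's default settings do not apply Bayes scores until 200 spam and 200 ham messages have been learned, so nspam and nham are the two numbers to watch.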
    Concerns:
      - One of my clients has accumulated a long "whitelist", and has a tendency to whitelist entire domains - like "*@aol.com".  I worry that this may result in corruption of the bayes database as spam from aol.com users is artificially scored -100 due to the whitelisting.
      - I still have no easy way for Exchange users to indicate SPAM or HAM for training the bayes database

    Wishlist:
    I'd love to have a way to have SME provide a SPAM Quarantine service in a manner similar to that used by "spamshark" and "postini", but this is way, way beyond me!

    SPAM Quarantine Server
      - SPAM above a certain score is stored on the server
      - a SPAM summary is emailed to the user regularly (daily, weekly, when the number of new spam = x) listing the date, time, sender, subject and spamassassin score of any spam received.  
      - Each line in the summary email is a clickable link that will open a specific message in a browser window directly from the quarantine server.
      - The user can "release" spam into their normal "Inbox"
      - User access to the quarantined emails is controlled by a self-generated password that works something like this:
        a) the user receives a SPAM summary email with a general SPAM quarantine login link at the top, and in which each message is a link to that specific message.
        b) the user clicks on any message, or on the login link
        c) SPAM quarantine login opens in a web browser.  The "username" is the email address, and is automatically filled in by the links in the SPAM summary.  The login page includes an option to "email me a new password".
        d) If the user clicks on the "new password" link, the system sends a random password to the user's email address
        e) Once the user has received their password, s/he logs in.
        f) Once logged in, the user can view, delete, or "release to Inbox" any message in the Quarantine.  S/he can also change settings like SPAM retention time, SPAM Quarantine password, SPAM threshold, etc.
      - The system administrator can allow or deny spam quarantine access to individual users, and can view, release or delete all quarantined SPAM for all users.
      - Account creation is automatic, and since the system provides a way for users to generate their own passwords, does not need to be integrated with the authentication mechanism of the ultimate mail server.
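
As a rough illustration of the summary email described above, the sketch below builds one summary line (date, sender, subject, score) from a single quarantined message file. It assumes one message per file and an "X-Spam-Status" header containing "score=N", which is the format I would expect from SpamAssassin 3.x. The function name summary_line is invented for this example, and folded (multi-line) headers are not handled.

```shell
#!/bin/sh
# Sketch: print "date | sender | subject | score N" for one message file.
# Assumptions: one message per file, headers not folded across lines,
# and an X-Spam-Status header of the form "Yes, score=15.2 required=5.0 ...".

summary_line() {
    # $1 = path to a single message file
    awk '
        /^$/ { exit }    # blank line ends the headers (END still runs)
        /^Date:/    { sub(/^Date:[ \t]*/, "");    date = $0 }
        /^From:/    { sub(/^From:[ \t]*/, "");    from = $0 }
        /^Subject:/ { sub(/^Subject:[ \t]*/, ""); subj = $0 }
        /^X-Spam-Status:/ && match($0, /score=[-0-9.]+/) {
            score = substr($0, RSTART + 6, RLENGTH - 6)
        }
        END { printf "%s | %s | %s | score %s\n", date, from, subj, score }
    ' "$1"
}

# Example (hypothetical quarantine directory):
#   for f in /path/to/quarantine/*; do summary_line "$f"; done
```

Turning each line into a clickable link, and the login and release steps, would still need a web front end; this only covers the listing.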

    Offline azche24

    « Reply #3 on: October 02, 2006, 06:13:56 AM »
    Hi,

    To me it makes no sense to use the SpamAssassin contrib by Michael in a setup like yours. It all depends on active users who check their /junkmail folders and put false positives into the hamlearning folder and missed spam into the junkmailmissed folder.

    You can only achieve that with direct access to these folders on the SME server, which to my knowledge is not possible via Exchange/Outlook.

    An autolearn level of 12.00 does not make sense either; you get all the spam.

    If you really want to use bayes filtering in this kind of setup, you should perhaps install assp on your SMTP system. It allows blacklisting and whitelisting by mail, and trains its wordlists by checking the actual traffic of your users.

    Your users can also individually report false positives as "notspam" by simply emailing them to a special account.
    Alexander Ziemann, Berlin - DE

    Offline mmccarn

    « Reply #4 on: October 02, 2006, 02:27:35 PM »
    Thanks; I've been too lazy so far to look into assp, but I guess I should!  It would be reasonably easy to set up at one of my sites...

    Michael's contrib just turns on system-wide autolearn; the "LearnAsSpam.pl" script mentioned in Sonoracomm's HowTo that processes the users' SPAM and HAM folders comes from "bread" (Brian Read from abandonmicrosoft.co.uk, I think), and can be found here: http://mirror.contribs.org/smeserver/contribs/bread/mailstats/

    My "Spam" score is 5, but my "autolearn" level is 12 - since I have no way for users to indicate a false positive, I'm hoping this will keep my server from "learning" the wrong stuff.

    I suppose I could set up "spam" and "notspam" email addresses and figure out how to process them using sa-learn --use-ignores, but that could be harder than learning how to install & use assp...
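
A minimal sketch of what processing the "spam" and "notspam" mailboxes might look like, assuming each mailbox ends up as a directory with one message per file. The directory paths and the function name learn_cmd are hypothetical; the database path and "-u spamd" are the ones used earlier in this thread. The function only prints the sa-learn command so it can be reviewed before anything touches the bayes database.

```shell
#!/bin/sh
# Sketch only: builds (but does not run) an sa-learn command for a
# directory of messages. DBPATH comes from the earlier steps; the
# mailbox directories in the usage comment are placeholders.
DBPATH=/var/spool/spamd/.spamassassin

learn_cmd() {
    # $1 = "spam" or "ham", $2 = directory holding one message per file
    case "$1" in
        spam|ham) ;;
        *) echo "usage: learn_cmd spam|ham DIR" >&2; return 1 ;;
    esac
    echo "sa-learn --$1 --dbpath $DBPATH -u spamd $2"
}

# Usage (hypothetical paths); pipe the output to sh as root to run it:
#   learn_cmd spam /path/to/spam-mailbox/cur
#   learn_cmd ham  /path/to/notspam-mailbox/cur
```

One caveat: this assumes the mailboxes hold the original messages. A plain Outlook forward replaces the original headers, so users would probably need to forward as an attachment, and the attachments would have to be unpacked before sa-learn sees them.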