Koozali.org: home of the SME Server

default Spamassassin auto-learn on PR1?

idyll
default Spamassassin auto-learn on PR1?
« on: March 21, 2006, 09:39:09 PM »
Can one of the developers advise if a new installation of 7.0 PR1 uses the default of needing to see 200 pieces of SPAM before auto-learning is enabled?

Auto-learning is clearly disabled on my new server.

I used a perl script (thanks Jesper!) and ran it against a LearnAsSpam folder in each home directory to teach my 6.0.1 server. The cron/script is currently failing as it reports auto-learning is disabled.
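
For readers who have not seen that approach, here is a minimal sketch of what such a cron job might do. It is not Jesper's script. The folder name comes from the post above, but the layout `Maildir/.LearnAsSpam/cur` under `/home/e-smith/files/users` is an assumption, and the sketch only prints the `sa-learn --spam` commands instead of running them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one "sa-learn --spam" command per user that has a LearnAsSpam
# folder. The directory layout is an assumption made for illustration.
sub learn_commands {
    my ($users_root) = @_;
    my @commands;
    opendir(my $dh, $users_root) or return @commands;
    for my $user (sort grep { !/^\./ } readdir($dh)) {
        my $folder = "$users_root/$user/Maildir/.LearnAsSpam/cur";
        next unless -d $folder;
        push @commands, [ 'sa-learn', '--spam', $folder ];
    }
    closedir($dh);
    return @commands;
}

# Print the commands rather than running them, so the sketch is safe to try.
my $root = shift(@ARGV) || '/home/e-smith/files/users';
print join(' ', @$_), "\n" for learn_commands($root);
```

Keeping each command as a list (rather than one shell string) would let a real script pass it straight to `system` without quoting problems.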

I can wait, as I know the system needs a minimum amount of SPAM before this is invoked; I am just asking for confirmation.

thanks

patrick
...

raem
Re: default Spamassassin auto-learn on PR1?
« Reply #1 on: March 22, 2006, 09:46:58 AM »
idyll

Depending on how the spam filter is configured in sme7, spam messages are rejected by the server and therefore will not be moved to the junkmail folder, as was the case in sme6 spam filter by knuddi.

If you enable RBL list rejection, you will be receiving a greatly reduced amount of spam anyway and therefore not see very many spam messages in the junkmail folder.
...

idyll
I'm not sure what you are answering
« Reply #2 on: March 22, 2006, 02:51:35 PM »
UPDATE - I found this inside the system logged in as root. Enter this while in /etc/mail/spamassassin

perldoc Mail::SpamAssassin::Conf

which pretty much lays it all out. I think this should be referenced for any and all SA questions. It clearly answered mine  ;-)

patrick

---------------------------------------------



I understand those issues very well. I did not make mention of RBLs or rejection thresholds.

The sa-learn features of the Bayes filters are OFF until they reach a threshold of a number of SPAM received. My site receives close to 1000 SPAM per day.

I have been using PR1 for five days and I am confused why the sa-learn is still disabled. A new installation will not be training the Bayes filters until this numeric threshold is reached.
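
As a rough aid for checking which Bayes options are in effect, the sketch below reads configuration text and reports four settings. The defaults are the ones documented in Mail::SpamAssassin::Conf: `bayes_min_spam_num` and `bayes_min_ham_num` (200 each) decide when Bayes results start being used, while `bayes_auto_learn` is a separate switch. Whether SME 7 sets any of these in `local.cf` is not something this thread confirms, so the override in the example is purely hypothetical.

```perl
use strict;
use warnings;

# Defaults as documented in Mail::SpamAssassin::Conf.
my %defaults = (
    use_bayes          => 1,
    bayes_auto_learn   => 1,
    bayes_min_spam_num => 200,
    bayes_min_ham_num  => 200,
);

# Return the effective value of each option after applying the given
# configuration text (for example the contents of local.cf).
sub bayes_settings {
    my ($config_text) = @_;
    my %settings = %defaults;
    for my $line (split /\n/, $config_text) {
        $line =~ s/#.*//;                       # drop comments
        if ($line =~ /^\s*(\w+)\s+(\S+)/ && exists $settings{$1}) {
            $settings{$1} = $2;
        }
    }
    return %settings;
}

# Hypothetical input: a site file that switches auto-learning off.
my %s = bayes_settings("use_bayes 1\nbayes_auto_learn 0  # hypothetical override\n");
printf "%-20s %s\n", $_, $s{$_} for sort keys %s;
```

On a real server the text would come from the files listed in the debug output, read in the same order SpamAssassin reads them.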

When I enter this

sa-learn -D --dump data, the output is this...

------------ snip ---------------------

[root@galadriel ~]# sa-learn --dump data
ERROR: Bayes dump returned an error, please re-run with -D for more information
[root@galadriel ~]# sa-learn -D --dump data
debug: SpamAssassin version 3.0.5
debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/sbin/e-smith', keeping.
debug: PATH included '/usr/local/sbin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/usr/X11R6/bin', keeping.
debug: PATH included '/root/bin', which doesn't exist, dropping.
debug: Final PATH set to: /sbin/e-smith:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin
debug: using "/etc/mail/spamassassin/init.pre" for site rules init.pre
debug: config: read file /etc/mail/spamassassin/init.pre
debug: using "/usr/share/spamassassin" for default rules dir
debug: config: read file /usr/share/spamassassin/10_misc.cf
debug: config: read file /usr/share/spamassassin/20_anti_ratware.cf
debug: config: read file /usr/share/spamassassin/20_body_tests.cf
debug: config: read file /usr/share/spamassassin/20_compensate.cf
debug: config: read file /usr/share/spamassassin/20_dnsbl_tests.cf
debug: config: read file /usr/share/spamassassin/20_drugs.cf
debug: config: read file /usr/share/spamassassin/20_fake_helo_tests.cf
debug: config: read file /usr/share/spamassassin/20_head_tests.cf
debug: config: read file /usr/share/spamassassin/20_html_tests.cf
debug: config: read file /usr/share/spamassassin/20_meta_tests.cf
debug: config: read file /usr/share/spamassassin/20_phrases.cf
debug: config: read file /usr/share/spamassassin/20_porn.cf
debug: config: read file /usr/share/spamassassin/20_ratware.cf
debug: config: read file /usr/share/spamassassin/20_uri_tests.cf
debug: config: read file /usr/share/spamassassin/23_bayes.cf
debug: config: read file /usr/share/spamassassin/25_body_tests_es.cf
debug: config: read file /usr/share/spamassassin/25_hashcash.cf
debug: config: read file /usr/share/spamassassin/25_spf.cf
debug: config: read file /usr/share/spamassassin/25_uribl.cf
debug: config: read file /usr/share/spamassassin/30_text_de.cf
debug: config: read file /usr/share/spamassassin/30_text_fr.cf
debug: config: read file /usr/share/spamassassin/30_text_nl.cf
debug: config: read file /usr/share/spamassassin/30_text_pl.cf
debug: config: read file /usr/share/spamassassin/50_scores.cf
debug: config: read file /usr/share/spamassassin/60_whitelist.cf
debug: using "/etc/mail/spamassassin" for site rules dir
debug: config: read file /etc/mail/spamassassin/local.cf
debug: config: read file /etc/mail/spamassassin/whitelist.cf
debug: using "/root/.spamassassin/user_prefs" for user prefs file
debug: plugin: loading Mail::SpamAssassin::Plugin::URIDNSBL from @INC
debug: plugin: registered Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x90168a4)
debug: plugin: loading Mail::SpamAssassin::Plugin::Hashcash from @INC
debug: plugin: registered Mail::SpamAssassin::Plugin::Hashcash=HASH(0x999f734)
debug: plugin: loading Mail::SpamAssassin::Plugin::SPF from @INC
debug: plugin: registered Mail::SpamAssassin::Plugin::SPF=HASH(0x996cbb8)
debug: plugin: Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x90168a4) implements 'parse_config'
debug: plugin: Mail::SpamAssassin::Plugin::Hashcash=HASH(0x999f734) implements 'parse_config'
debug: Score set 0 chosen.
debug: bayes: no dbs present, cannot tie DB R/O: /root/.spamassassin/bayes_toks
ERROR: Bayes dump returned an error, please re-run with -D for more information
[root@galadriel ~]#

--------- snip ---------------

The debug output says no dbs are present. This is due to sa-learn not being enabled.
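
One detail in that debug output is worth checking before drawing conclusions: sa-learn looked for `/root/.spamassassin/bayes_toks`, which is root's own state directory. A small sketch like the following lists which Bayes files exist in a given directory. The file name `bayes_toks` is taken from the log; `bayes_seen` is its companion in the default DBM storage. Which directory the mail scanner itself uses on SME 7 is an assumption to verify, not something shown here.

```perl
use strict;
use warnings;

# Return the Bayes database files that exist in a SpamAssassin state
# directory (empty list if none have been created yet).
sub bayes_db_files {
    my ($dir) = @_;
    return grep { -f "$dir/$_" } qw(bayes_toks bayes_seen);
}

# Default to the current user's directory, as sa-learn did in the log.
my $dir = shift(@ARGV) || (($ENV{HOME} || '/root') . '/.spamassassin');
my @found = bayes_db_files($dir);
print @found
    ? "found in $dir: @found\n"
    : "no Bayes database files in $dir\n";
```

Running it against each candidate directory shows quickly whether a database exists anywhere, or whether nothing has been learned yet.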

So - developers - was this by design? I realize the Bayes mechanism can be a CPU hog on very feeble systems but overall it is the key to long-term effectiveness.

At the very least, I think you will agree, this is a grey area and it would be awesome to have some details spelling out the many options in the documentation. The trade-offs, maybe some setting templates depicting the pros and cons of using this feature/filter or that one, etc., relative to the scale of a person's site and the horsepower of their server. It is certainly one of the most oblique features provided by this splendid server.

regards,

patrick
...

raem
Re: I'm not sure what you are answering
« Reply #3 on: March 23, 2006, 01:44:15 AM »
idyll

>... I did not make mention of RBLs or rejection thresholds.

That's why I mentioned them.


>....My site receives close to 1000 SPAM per day.

Here's the spam filter report for the last 24 hours from my sme6 server, which uses the same techniques as sme7. It shows that RBL rejection takes care of the majority of spam messages; very few get to the junkmail folders.

Total spam rejected   :      764 ( 67.85%)
       RBL rejected   :      674 ( 59.86%)
     Score above 15   :       30 (  3.93%)
Total ham accepted    :      362 ( 32.15%)
                        -------------------
Total emails processed:     1126 (   47/hr)
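
To make the percentages easier to read: in the script posted later in this thread, the RBL rejections are counted as part of the spam total, the "Score above 15" figure is relative to the spam count rather than to all mail, and the hourly rate is the total divided by the 24 hour period. The sketch below recomputes the figures from the counts in this report; the variable names are my own.

```perl
use strict;
use warnings;

# Counts copied from the report above.
my %count = (spam => 764, rbl => 674, above15 => 30, ham => 362);
my $total = $count{spam} + $count{ham};   # RBL rejections are already part of the spam count
my $hours = 24;

# Percentage with two decimals; guards against division by zero.
sub percent {
    my ($part, $whole) = @_;
    return $whole ? sprintf('%.2f', 100 * $part / $whole) : '0.00';
}

printf "Total spam rejected   : %8d (%6s%%)\n", $count{spam},    percent($count{spam},    $total);
printf "       RBL rejected   : %8d (%6s%%)\n", $count{rbl},     percent($count{rbl},     $total);
printf "     Score above 15   : %8d (%6s%%)\n", $count{above15}, percent($count{above15}, $count{spam});
printf "Total ham accepted    : %8d (%6s%%)\n", $count{ham},     percent($count{ham},     $total);
printf "Total emails processed: %8d (%5.0f/hr)\n", $total, $total / $hours;
```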


Here is the virus scanner report, as you can see it has very little to do, also mainly due to RBL & pattern matching rejection.

Total emails rejected :        2 (  0.40%)
             Problems :        0 (  0.00%)
          Quarantined :        2 (100.00%)
Total emails accepted :      494 ( 99.60%)
                        -------------------
Total emails processed:      496 (   21/hr)


> ...I realize the Bayes mechanism can be a CPU hog on very feeble
>  systems but overall it is the key to long-term effectiveness.

If you want to be effective against spam (& viruses) I think RBL rejection is essential on any server (and also the use of the spamassassin custom rejection setting).
In that case, I was pointing out that you will have (significantly) less spam actually getting into the system for Bayes analysis to do its work on.

Perhaps the developers feel the same way so that may be part of the reason they did not enable Bayes by default.
...

MSmith
default Spamassassin auto-learn on PR1?
« Reply #4 on: March 23, 2006, 10:03:32 AM »
Ray, do you know a way on SME7 to get those concise spam & virus reports?  Thanks!
...

raem
default Spamassassin auto-learn on PR1?
« Reply #5 on: March 23, 2006, 11:01:45 AM »
MSmith

I made the necessary changes to both those scripts and configured appropriate entries for antivirus in the configuration db, and for conf.global in the spamassassin db, and both scripts run OK.

I don't have any mail messages on my test sme7 server, so I can't see if the reports are picking up the data correctly (all fields show 0).
...

raem
default Spamassassin auto-learn on PR1?
« Reply #6 on: March 23, 2006, 12:05:52 PM »
deleted original post

Ray
...

raem
default Spamassassin auto-learn on PR1?
« Reply #7 on: March 24, 2006, 04:19:47 AM »
MSmith

>...do you know a way on SME7 to get those concise spam & virus reports?

As my earlier post has disappeared (maybe I accidentally edited it ??), I am reposting the info.

The reports are generated by scripts in spamfilter and antivirus for sme6 from jesper knudsen.
The scripts are
/usr/bin/antivirus-stats.pl
and
/usr/bin/spamfilter-stats.pl

These scripts are run by cron jobs in /etc/cron.d

These scripts can be copied to sme7 and modified to suit the changes in sme7,
i.e. the different db locations and log files. For the db files, use
/home/e-smith/db/spamassassin
and
/home/e-smith/db/configuration

Also refer to /var/log/qpsmtpd/current rather than smtpfront-qmail/current
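
The lines in /var/log/qpsmtpd/current start with a TAI64N label rather than a syslog date, which is why the modified script pipes the file through tai64nlocal before parsing it. As an alternative, the label can be decoded in Perl directly. This is a sketch under two assumptions: the label has the usual `@40000000` prefix (true for dates between 1970 and 2106), and the 10 second TAI offset matches what tai64nlocal subtracts. The sub names are hypothetical.

```perl
use strict;
use warnings;

# Convert a TAI64N label ("@" followed by 24 hex digits) to Unix epoch
# seconds. Only the low 32 bits of the seconds field are decoded, which
# assumes the "@40000000" prefix.
sub tai64n_to_epoch {
    my ($label) = @_;
    return unless $label =~ /^\@40000000([0-9a-f]{8})[0-9a-f]{8}$/i;
    return hex($1) - 10;    # 10 second offset, as assumed above
}

# Split a log line into (epoch, message). Returns an empty list if the
# line does not start with a decodable TAI64N label.
sub parse_log_line {
    my ($line) = @_;
    my ($label, $message) = $line =~ /^(\@[0-9a-f]{24})\s+(.*)$/i or return;
    my $epoch = tai64n_to_epoch($label);
    return defined $epoch ? ($epoch, $message) : ();
}

# Build a sample line from a known epoch value and decode it again.
my $sample = sprintf('@40000000%08x%08x example message', 1143072000 + 10, 0);
my ($epoch, $message) = parse_log_line($sample);
print scalar(gmtime($epoch)), "  $message\n";
```

Decoding in Perl avoids the temporary file in /var/tmp and gives epoch seconds directly, so no second round of date parsing with timelocal would be needed.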

You will also need to put entries into the spamassassin db for conf.global
and into the configuration db for antivirus.
The important thing is to enable reports or when you run the scripts they will do nothing.
See the corresponding entries in sme6 db's for what is required.

Perhaps the scripts need to be rewritten for sme7.
My test sme7 server does not have enough email entries in the log files to see if the reports that are being generated by the scripts are valid, but the reports do appear to run OK (after modifying the scripts).
...

brianr
default Spamassassin auto-learn on PR1?
« Reply #8 on: March 24, 2006, 07:49:10 AM »
Ray - could you at least publish the contents of the modified scripts?

I have been looking at doing a similar thing, but do not want to duplicate your effort.
Brian j Read
(retired, for a second time, still got 2 installations though)
The instrument I am playing is my favourite Melodeon.
.........

raem
default Spamassassin auto-learn on PR1?
« Reply #9 on: March 24, 2006, 08:21:31 AM »
brianr

You will also need to add a few of the missing folders to your sme7 server to prevent script errors.
The changes I made were not extensive and depending how you implement it into sme7, there may be more work required.
Two scripts follow:


spamfilter-stats.pl

#!/usr/bin/perl

#############################################################################
#
# This script provides daily SpamFilter statistics and deletes all users
# junkmails. Configuration of the script is done by the Spam Filter
# Server-Manager module
#
# This script has been developed
# by Jesper Knudsen at http://sme.swerts-knudsen.dk
#
# Revision History:
#
# August 13, 2003:      Initial version
# August 25, 2004:   fixed problem when hostname had non-ASCII chars
# March 23, 2006        Revised for sme7 RM
#############################################################################


# internal modules (part of core perl distribution)
use Getopt::Long;
use Pod::Usage;
use POSIX qw/strftime floor/;
use Time::Local;
use Date::Manip;
use strict;
use esmith::ConfigDB;
use Sys::Hostname;

my $hostname = hostname();


#Configuration section
my %opt = ();
$opt{'logfile'} = '/var/log/maillog';      # Log file
$opt{'sendmail'} = '/usr/sbin/sendmail';   # Path to sendmail stub
$opt{'from'} = 'Admin';            # Who is the mail from
$opt{'end'} = "";
$opt{'start'} = "yesterday";
$opt{'mail'} = "admin";
chomp($opt{'timezone'} = `date +%z`);   # backticks: run the shell command, keep its output
Date_Init("TZ=$opt{'timezone'}");

# Parameters for the Delete Junkmail functionality

my $file1 = "/home/e-smith/db/accounts";      #E-SMITH ACCOUNTS DATABASE
my $path = "/home/e-smith/files/users/";   #PATH TO USER DIRECTORIES
my $end_path_cur = "/Maildir/.junkmail/cur";       #END OF PATH STRING
my $end_path_new = "/Maildir/.junkmail/new";  #END OF PATH STRING

# end

my $sa_dbase = '/home/e-smith/db/spamassassin';
my $dbh = esmith::ConfigDB->open($sa_dbase) || die "Unable to open spamassassin configuration dbase.";
my %sa_conf = $dbh->get('conf.global')->props;

my $disabled = 1;
my $days_to_keep = 0;                   #How many days to keep junkmail

my $parameter = "";
my $value = "";
while (($parameter,$value) = each(%sa_conf)) {
  if ($parameter eq 'daily_report' && $value eq '1') {
   $disabled = 0;
  }
  if ($parameter eq 'delete_after') {
        $days_to_keep = $value;
  }
}


my $tstart = time;

# efficiency; don't rebuild the (constant) hash every loop iteration
my %month_list = ('Jan' => 0,
        'Feb' => 1,
        'Mar' => 2,
        'Apr' => 3,
        'May' => 4,
        'Jun' => 5,
        'Jul' => 6,
        'Aug' => 7,
        'Sep' => 8,
        'Oct' => 9,
        'Nov' => 10,
        'Dec' => 11);

#Local variables
my $YEAR = (localtime(time))[5]; # this is years since 1900

my $total = 0;
my $spamcount = 0;
my $spamavg = 0;
my $hamcount = 0;
my $hamavg = 0;
my $threshtotal = 0;
my $above15 = 0;
my $RBLcount = 0;

my %spambyhour = ();
my %hambyhour = ();

my ($start, $end) = parse_arg($opt{'start'}, $opt{'end'});

#---------------------------------------
# First scan the maillog file
#---------------------------------------

#Open log file
open(LOG, "< $opt{'logfile'}") or die "Can't open $opt{'logfile'}: $!\n";

LINE: while (<LOG>) {

# Agh... this is ugly.
  if (m/
^(\w{3})\s+             # Month
(\d+)\s+                # Day
(\d\d):(\d\d):(\d\d)\s+ # HH:MM:SS
(\S+)\s+                  # Hostname?
spamd\[\d+\]:\s+        # spamd[PID]
(clean\smessage|identified\sspam)\s  # Status
\(([-0-9.]+)\/([-0-9.]+)\)\s # Score, Threshold
for\s
\w+:\d+\s             # for daf:1000
in\s
[0-9.]+\sseconds,\s+
[0-9]+\sbytes\./x) {  # There's an extra space at the end for some reason.


    #Split line into components
    my $mon = $1;
    my $day = $2;
    my $hour = $3;
    my $min = $4;
    my $sec = $5;
    my $status = $7;
    my $score = $8;
    my $threshold = $9;

    # Convert to absolute time
    my $abstime = timelocal($sec, $min, $hour, $day, $month_list{$mon}, $YEAR);
    my $abshour = floor ($abstime / 3600); # Hours since the epoch

    #If date specified, only process lines matching date
    next LINE if ($abstime < $start);
    # We can assume that logs are chronological
    last if ($abstime > $end);

    #Total score
    $total++;

    if ($status eq "identified spam") {
      #Spam scores
      $spamcount++;
      $spamavg += $score;
      $spambyhour{$abshour}++;
      if ($score > 15 ){
        $above15++;
      }

    } elsif ($status eq "clean message") {
      #Nonspam scores
      $hamcount++;
      $hamavg += $score;
      $hambyhour{$abshour}++;
    } else {
      die "Strange error in regexp";
    }

    $threshtotal += $threshold;

  }


# SpamAssassin version 3.1x has changed the log file output (why???)

if (m/
^(\w{3})\s+             # Month
(\d+)\s+                # Day
(\d\d):(\d\d):(\d\d)\s+ # HH:MM:SS
(\S+)\s+                  # Hostname?
spamd\[\d+\]:\s+        # spamd[PID]
spamd:\s+                                 # SPAMASSASSIN 3.1 LOGGING
(clean\smessage|identified\sspam)\s  # Status
\(([-0-9.]+)\/([-0-9.]+)\)\s # Score, Threshold
for\s
\w+:\d+\s             # for daf:1000
in\s
[0-9.]+\sseconds,\s+
[0-9]+\sbytes\./x) {  # There's an extra space at the end for some reason.
   
   
    #Split line into components
    my $mon = $1;
    my $day = $2;
    my $hour = $3;
    my $min = $4;
    my $sec = $5;
    my $status = $7;
    my $score = $8;
    my $threshold = $9;
   
    # Convert to absolute time
    my $abstime = timelocal($sec, $min, $hour, $day, $month_list{$mon}, $YEAR);
    my $abshour = floor ($abstime / 3600); # Hours since the epoch
   
    #If date specified, only process lines matching date
    next LINE if ($abstime < $start);
    # We can assume that logs are chronological
    last if ($abstime > $end);
    #Total score
    $total++;
                 
    if ($status eq "identified spam") {
      #Spam scores
      $spamcount++;
      $spamavg += $score;
      $spambyhour{$abshour}++;
      if ($score > 15 ){
        $above15++;
      }

    } elsif ($status eq "clean message") {
      #Nonspam scores
      $hamcount++;
      $hamavg += $score;
      $hambyhour{$abshour}++;
    } else {
      die "Strange error in regexp";
    }

    $threshtotal += $threshold;

  }

}


#Done reading file
close(LOG);

#---------------------------------------
# Then scan the qpsmtpd log file
#---------------------------------------

system ("cat /var/log/qpsmtpd/current | /usr/local/bin/tai64nlocal > /var/tmp/sme-spamfilter.rbl.out");

open(LOG, "/var/tmp/sme-spamfilter.rbl.out") or die "Can't open qpsmtpd logfile\n";
     
LINE: while (<LOG>) {
   
# Agh... this is ugly.
  if (m/
^(\w{4})\-+      # Year
(\d+)\-+                # Month
(\d+)\s+                   # Day  
(\d\d):(\d\d):(\d\d).(\w{9})\s+ # HH:MM:SS
rblsmtpd\:\s+           # rblsmtpd
\w+/x) {  # There's an extra space at the end for some reason.


    #Split line into components
    my $year =$1;
    my $mon = $2;
    my $day = $3;
    my $hour = $4;
    my $min = $5;
    my $sec = $6;

    # Convert to absolute time
    my $abstime = timelocal($sec, $min, $hour, $day, $mon-1, $year);   # use the year parsed from the log line
    my $abshour = floor ($abstime / 3600); # Hours since the epoch


    #If date specified, only process lines matching date
    next LINE if ($abstime < $start);
    # We can assume that logs are chronological
    last if ($abstime > $end);

    #Total score
    $total++;
 
    #RBL score
    $RBLcount++;
   
      #Spam scores
      $spamcount++;
      $spambyhour{$abshour}++;
  }
}
#Done reading file
close(LOG);


#Calculate some numbers
$spamavg=$spamavg/$spamcount if $spamcount;
$hamavg=$hamavg/$hamcount if $hamcount;
my $threshavg=$threshtotal/$total if $total;
my $spampercent=(($spamcount/$total) * 100) if $total;
my $rblpercent=(($RBLcount/$total) * 100) if $total;
my $hampercent=(($hamcount/$total) * 100) if $total;
my $hrsinperiod=(($end-$start) / 3600);
my $emailperhour=($total/$hrsinperiod) if $total;
my $above15percent = (($above15/$spamcount) * 100) if $spamcount;


my $oldfh;
#Open Sendmail if we are mailing it
if ($opt{'mail'} && !$disabled) {
  open (SENDMAIL, "|$opt{'sendmail'} -oi -t -odq") or die "Can't open sendmail: $!\n";
  print SENDMAIL "From: $opt{'from'}\n";
  print SENDMAIL "To: $opt{'mail'}\n";
  print SENDMAIL "Subject: Spam Filter Statistics from $hostname - ",strftime("%F", localtime($start)), "\n\n";
  $oldfh = select SENDMAIL;
}

my $telapsed = time - $tstart;

if (!$disabled) {

   #Output results
   print  "Period Beginning : ", strftime("%c", localtime($start)), "\n";
   print  "Period Ending    : ", strftime("%c", localtime($end)), "\n";
        print  "SpamAssassin Version : ", `spamassassin -V`;
   print  "\n";
   printf "Reporting Period : %.2f hrs\n", $hrsinperiod;
   print  "--------------------------------------------------\n";
   print  "\n";
   printf "Total spam rejected   : %8d (%6.2f%%)\n", $spamcount, $spampercent || 0;
        printf "       RBL rejected   : %8d (%6.2f%%)\n", $RBLcount, $rblpercent || 0;
        printf "     Score above 15   : %8d (%6.2f%%)\n", $above15, $above15percent || 0;
   printf "Total ham accepted    : %8d (%6.2f%%)\n", $hamcount, $hampercent || 0;
   print  "                        -------------------\n";
   printf "Total emails processed: %8d (%5.f/hr)\n", $total, $emailperhour || 0;
   print  "\n";
   printf "Average spam threshold : %11.2f\n", $threshavg || 0;
   printf "Average spam score     : %11.2f\n", $spamavg || 0;
   printf "Average ham score      : %11.2f\n", $hamavg || 0;
   print "\n";
   print "Statistics by Hour\n";
   print "-------------------------------------\n";
   print "Hour                 Spam         Ham\n";
   print "-------------    --------    --------\n";

   my $hour = floor($start/3600);
   while ($hour < $end/3600) {
        printf("%s      %8d    %8d\n",
       strftime("%F, %H", localtime($hour*3600)),
       $spambyhour{$hour} || 0, $hambyhour{$hour} || 0);
        $hour++;
   }
   print "\n";

} # not disabled

if ($days_to_keep > 0) {
   Delete_Junkmail();
}

if (!$disabled) {

   print "\nDone. Report generated in $telapsed sec.\n\n";

   #Close Sendmail if it was opened
   if ($opt{'mail'}) {
        select $oldfh;
        close (SENDMAIL);
   }

}

#All done
exit 0;

#############################################################################
# Subroutines ###############################################################
#############################################################################

########################################
# Process parms                        #
########################################
sub parse_arg {
  my $startdate = shift;
  my $enddate = shift;

  my $secsinday = 86400;
  my $time = 0;

  my $start = UnixDate($startdate,"%s");
  my $end = UnixDate($enddate, "%s");

  if(!$start && !$end) {
    $end = time;
    $start = $end - $secsinday;
    return ($start, $end);
  }

  if(!$start) {
    $start = $end - $secsinday;
    return ($start, $end);
  }

  if(!$end) {
    $end = $start + $secsinday;
    return ($start, $end);
  }

  if($start > $end) {
    return ($end, $start);
  }

  return ($start, $end);

}

sub dbg {
  my $msg = shift;

  if ($opt{debug}) {
    print STDERR $msg;
  }
}


sub Delete_Junkmail {

my $deleted;
my $found;
my $junkmail_dir;
my $entry;
my $syscommand;
my $x;

        open (ORIGINAL, "$file1") or die "Can't open $file1: $!\n";      #OPEN FILE FOR READING
        my @original = <ORIGINAL>;         #READ FILE INTO AN ARRAY

        #PROCESS THE ARRAY
        foreach $x (@original) {
      
                #SPLIT THE RECORD TO RETRIEVE USER INFO  
                my @users_original = split /\|/, $x ;

                #SPLIT THE FIRST ENTRY TO RETRIEVE USERNAME AND TYPE (users/pseudonym/system)
                my @users = split /\=/, $users_original[0];

                #PROCESS THE RECORDS THAT ARE ACTUAL USERS
                if ($users[1] eq 'user') {

                        $deleted = 0;
                        $found = 0;

                        #Set path to the new mail folder
                        $junkmail_dir = "$path$users[0]$end_path_new";

                        # Now get the content list for the directory.
                        opendir(QDIR, "$junkmail_dir") or die "Couldn't read directory $junkmail_dir";
       
                        # Loop through this list looking for any *file* which hasn't been
                        # modified in the last $days_to_keep days.
                        while($entry = readdir(QDIR)) {
                                next if $entry =~ /^\./;
                                $entry = $junkmail_dir . '/' . $entry;
               
                                $syscommand = ("rm -f \"$entry\"");
                                $found++;
                       
                                if (-f $entry && (-M $entry > $days_to_keep)) {
                                        $deleted++;
                                        $found--;
               system("$syscommand");
                                }

       
                        }
                        closedir(QDIR);
       
                       
                        #Set path to the cur mail folder
                        $junkmail_dir = "$path$users[0]$end_path_cur";
                        # Now get the content list for the directory.
                        opendir(QDIR, "$junkmail_dir") or die "Couldn't read directory $junkmail_dir";
                               
                        # Loop through this list looking for any *file* which hasn't been
                        # modified in the last $days_to_keep days.
                        while($entry = readdir(QDIR)) {
                                next if $entry =~ /^\./;
                                $entry = $junkmail_dir . '/' . $entry;
                                 
                                $syscommand = ("rm -f \"$entry\"");
                         
                                $found++;
                                if (-f $entry && (-M $entry > $days_to_keep)) {
                                        $deleted++;
                                        $found--;
               system("$syscommand");
                                }

                        }
                        closedir(QDIR);
         
         if (!$disabled) {                                
                           printf "Deleted %d old spam email(s) from user \"%s\" ", $deleted, $users[0];
            printf "- %d email(s) left in junkmail folder\n", $found ;
         }
                }
        }
   close ORIGINAL;
}








antivirus-stats.pl

#!/usr/bin/perl

#############################################################################
#
# This script provides daily Antivirus statistics and deletes all old
# Quarantined and Problems emails. Configuration of the script is done by the
# Antivirus Server-Manager module
#
# This script has been developed
# by Jesper Knudsen at http://sme.swerts-knudsen.dk
#
# Revision History:
#
# August 13, 2003:      Initial version
# March 23, 2006        modified for sme7 RM
#############################################################################

# internal modules (part of core perl distribution)
use Getopt::Long;
use Pod::Usage;
use POSIX qw/strftime floor/;
use Time::Local;
use Date::Manip;
use strict;
use esmith::ConfigDB;
use Sys::Hostname;

my $hostname = hostname();

#Configuration section
my %opt = ();
$opt{'logfile'} = '/var/log/amavis-ng/amavis-ng.log';      # Log file
$opt{'sendmail'} = '/usr/sbin/sendmail';   # Path to sendmail stub
$opt{'from'} = 'Admin';            # Who is the mail from
$opt{'end'} = "";
$opt{'start'} = "yesterday";
$opt{'mail'} = "admin";
chomp($opt{'timezone'} = `date +%z`);   # backticks: run the shell command, keep its output
Date_Init("TZ=$opt{'timezone'}");

my $disabled = 1;
my $days_to_keep = 0;                   #How many days to keep junkmail

my $lastupdate = 'never';

our $db = esmith::ConfigDB->open
    || warn "Couldn't open SME configuration database (permissions problems?)";

# Read the antivirus record once; leave the report disabled if it is missing
my $av = $db->get('antivirus');
if ($av)
{
 $days_to_keep = $av->prop('AutoDelete') || 0;
 $disabled = 0 if (($av->prop('StatusReport') || '') eq 'yes');
}

my $tstart = time;

# efficiency; don't rebuild the (constant) hash every loop iteration
my %month_list = ('Jan' => 0,
        'Feb' => 1,
        'Mar' => 2,
        'Apr' => 3,
        'May' => 4,
        'Jun' => 5,
        'Jul' => 6,
        'Aug' => 7,
        'Sep' => 8,
        'Oct' => 9,
        'Nov' => 10,
        'Dec' => 11);

#Local variables
my $YEAR = (localtime(time))[5]; # this is years since 1900

my $total = 0;
my $spamcount = 0;
my $hamcount = 0;
my $infectedcount = 0;
my $problemscount = 0;
my %infectedbyhour = ();
my %problemsbyhour = ();
my %spambyhour = ();
my %hambyhour = ();

my ($start, $end) = parse_arg($opt{'start'}, $opt{'end'});

#---------------------------------------
# First scan the maillog file
#---------------------------------------

#Open log file
open(LOG, "< $opt{'logfile'}") or die "Can't open $opt{'logfile'}: $!\n";

LINE: while (<LOG>) {

# Agh... this is ugly.
  if (m/
^(\w{3})\s+             # Month
(\d+)\s+                # Day
(\d\d):(\d\d):(\d\d)\s+ # HH:MM:SS
(\S+)\s+                  # Hostname?
amavis\[\d+\]:\s+        # amavis[PID]
(AMAVIS::MTA::Qmail:|Quarantining\sinfected\smessage\sto)\s+  # Status
(\S+)\s+
/x) {  # There's an extra space at the end for some reason.

    #Split line into components
    my $mon = $1;
    my $day = $2;
    my $hour = $3;
    my $min = $4;
    my $sec = $5;
    my $status = $7;
    my $issue = $8;


    # Convert to absolute time
    my $abstime = timelocal($sec, $min, $hour, $day, $month_list{$mon}, $YEAR);
    my $abshour = floor ($abstime / 3600); # Hours since the epoch

    #If date specified, only process lines matching date
    next LINE if ($abstime < $start);
    # We can assume that logs are chronological
    last if ($abstime > $end);

    if ($status eq "Quarantining infected message to") {
   $total++;
   if ($issue =~ m/problems/) {
          $problemscount++;
          $infectedbyhour{$abshour}++;
   }
   else {
          $infectedcount++;
          $infectedbyhour{$abshour}++;
   }

    } elsif ($status eq "AMAVIS::MTA::Qmail:") {
      if ($issue =~ m/Accepting/) {
       $total++;
       $hamcount++;
       $hambyhour{$abshour}++;
      }
    }
  }
}
#Done reading file
close(LOG);

#Calculate some numbers
my $totalissues = $infectedcount+$problemscount;
my $spampercent=((($totalissues)/$total) * 100) if $total;
my $problemspercent=(($problemscount/($totalissues)) * 100) if $totalissues;
my $infectedpercent=(($infectedcount/($totalissues)) * 100) if $totalissues;
my $hampercent=(($hamcount/$total) * 100) if $total;
my $hrsinperiod=(($end-$start) / 3600);
my $emailperhour=($total/$hrsinperiod) if $total;


my $oldfh;
#Open Sendmail if we are mailing it
if ($opt{'mail'} && !$disabled) {
  open (SENDMAIL, "|$opt{'sendmail'} -oi -t -odq") or die "Can't open sendmail: $!\n";
  print SENDMAIL "From: $opt{'from'}\n";
  print SENDMAIL "To: $opt{'mail'}\n";
  print SENDMAIL "Subject: Antivirus Statistics from $hostname - ",strftime("%F", localtime($start)), "\n\n";
  $oldfh = select SENDMAIL;
}

my $telapsed = time - $tstart;

show_last_freshclam_update();

if (!$disabled) {

        #Output results
        print  "Period Beginning      : ", strftime("%c", localtime($start)), "\n";
        print  "Period Ending         : ", strftime("%c", localtime($end)), "\n";
        print  "Clam Version          : ", `freshclam -V`;
   print  "\n";
   printf "Reporting Period : %.2f hrs\n", $hrsinperiod;
   print  "--------------------------------------------------\n";
   print  "\n";
   printf "Total emails rejected : %8d (%6.2f%%)\n", $infectedcount+$problemscount, $spampercent || 0;
        printf "             Problems : %8d (%6.2f%%)\n", $problemscount, $problemspercent || 0;
        printf "          Quarantined : %8d (%6.2f%%)\n", $infectedcount, $infectedpercent || 0;

   printf "Total emails accepted : %8d (%6.2f%%)\n", $hamcount, $hampercent || 0;
   print  "                        -------------------\n";
   printf "Total emails processed: %8d (%5.f/hr)\n", $total, $emailperhour || 0;
   print  "\n";
   show_virus_variants();
   print "Statistics by Hour\n";
   print "--------------------------------------\n";
   print "Hour              rejected    accepted\n";
   print "-------------     --------    --------\n";

   my $hour = floor($start/3600);
   while ($hour < $end/3600) {
        printf("%s      %6d    %6d\n",
       strftime("%F, %H", localtime($hour*3600)),
       $infectedbyhour{$hour} || 0, $hambyhour{$hour} || 0);
        $hour++;
   }
   print "\n";

} # not disabled

Delete_Junkmail();

if (!$disabled) {

   print "\nDone. Report generated in $telapsed sec.\n\n";

   #Close Sendmail if it was opened
   if ($opt{'mail'}) {
        select $oldfh;
        close (SENDMAIL);
   }

}

#All done
exit 0;

#############################################################################
# Subroutines ###############################################################
#############################################################################

sub show_virus_variants
{
  my ($month, $day) = UnixDate("yesterday", "%b", "%e");
  my $mydate = "$month $day";

#  print($mydate);

  my $command = 'cat /var/log/amavis-ng/amavis-ng.log | grep "' . $mydate . '"  | grep -B 2 "Quarantining infected" | grep -v "AMAVIS" | grep -v "Quarantin" | grep -v !\-\-! | sed "s/^.*://" | sort | uniq -c | sort -r -g > /var/tmp/sme-antivirus.out';

  system ($command);

  #Open the temporary file written by the command above
  if (open(my $LOG, '<', "/var/tmp/sme-antivirus.out")) {

    print("Virus Statistics by name:\n");
    print("---------------------------------------------\n");

    while (my $line = <$LOG>) {

     if (not $line=~ m/--/) {
       print("Rejected " .$line);
     }
    }
    close($LOG);
    print("\n");
    #Remove the temporary file without spawning a shell
    unlink("/var/tmp/sme-antivirus.out");
  }
}

sub show_last_freshclam_update
{
  my $updatefile = '/usr/share/clamav/last_update';
  if(-f "$updatefile")
  {
    if(open(FILE, "$updatefile"))
    {
      $lastupdate = <FILE>;

      if($lastupdate =~ /([A-Za-z0-9\:\,\-\_\+\s]+)/)
      {
        $lastupdate = $1;
      }
      else
      {
        $lastupdate = 'unknown';
      }
      close(FILE);
    }
    else
    {
      warn "Cannot open last_update file (Permission problems?)";
    }
  }

#  print ("Clam Database last updated:\t" . $lastupdate);
  return '';
}

########################################
# Process parms                        #
########################################
sub parse_arg {
  my $startdate = shift;
  my $enddate = shift;

  my $secsinday = 86400;
  my $time = 0;

  my $start = UnixDate($startdate,"%s");
  my $end = UnixDate($enddate, "%s");

  if(!$start && !$end) {
    $end = time;
    $start = $end - $secsinday;
    return ($start, $end);
  }

  if(!$start) {
    $start = $end - $secsinday;
    return ($start, $end);
  }

  if(!$end) {
    $end = $start + $secsinday;
    return ($start, $end);
  }

  if($start > $end) {
    return ($end, $start);
  }

  return ($start, $end);

}

sub dbg {
  my $msg = shift;

  if ($opt{debug}) {
    print STDERR $msg;
  }
}

sub Delete_Junkmail {

my $problems_dir = '/var/spool/amavis-ng/problems';
my $quarantine_dir = '/var/spool/amavis-ng/quarantine';

my $deleted_problems = 0;
my $found_problems = 0;
my $deleted_quarantine = 0;
my $found_quarantine = 0;
my $entry;

my $report;

# Standardise the format of the directory name
die 'Path for quarantine_dir must be absolute' unless $quarantine_dir =~ /^\//;
$quarantine_dir =~ s/\/$//; # Delete trailing slash

# Now get the content list for the directory.
opendir(QDIR, $quarantine_dir) or die "Couldn't read directory $quarantine_dir";

# Loop through this list looking for any regular *file* which hasn't been
# modified in the last $days_to_keep days.
# Unfortunately this will do nothing if the filesystem is backed up using tar.
while($entry = readdir(QDIR)) {
        next if $entry =~ /^\./;
        $entry = $quarantine_dir . '/' . $entry;

        if (-f $entry) {
                $found_quarantine++;
        }

        if (-f $entry && (-M $entry > $days_to_keep) && $days_to_keep > 0) {
                $deleted_quarantine++;
                $found_quarantine--;
                # unlink avoids passing the file name through a shell
                unlink($entry) or warn "Could not delete $entry: $!\n";
        }
}
closedir(QDIR);

$found_quarantine = $found_quarantine/2;
if (!$disabled) {
 print "Deleted $deleted_quarantine old Quarantined email(s) - $found_quarantine email(s) left in Quarantine folder\n";
}

# Standardise the format of the directory name
die 'Path for problems_dir must be absolute' unless $problems_dir =~ /^\//;
$problems_dir =~ s/\/$//; # Delete trailing slash

# Now get the content list for the directory.
opendir(QDIR, $problems_dir) or die "Couldn't read directory $problems_dir";

# Loop through this list looking for any regular *file* which hasn't been
# modified in the last $days_to_keep days.
# Unfortunately this will do nothing if the filesystem is backed up using tar.

while($entry = readdir(QDIR)) {
        next if $entry =~ /^\./;
        $entry = $problems_dir . '/' . $entry;

        if (-f $entry) {
                $found_problems++;
        }

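        if (-f $entry && (-M $entry > $days_to_keep) && $days_to_keep > 0) {
                $deleted_problems++;
                $found_problems--;
                # unlink avoids passing the file name through a shell
                unlink($entry) or warn "Could not delete $entry: $!\n";
        }
}
closedir(QDIR);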

$found_problems = $found_problems/2;
if (!$disabled) {
 print "Deleted $deleted_problems old Problem email(s) - $found_problems email(s) left in Problems folder\n";
}
}
...

Offline brianr

  • *
  • 988
  • +2/-0
default Spamassassin auto-learn on PR1?
« Reply #10 on: March 27, 2006, 02:03:36 PM »
Ray

OK, I am making some progress with this, but please don't hold your breath!

I had already done some work for Jesper on these scripts, so I understand them a bit already.

It only needs to analyse /var/log/qpsmtpd/current, as /var/log/maillog no longer seems to get much information about individual mails.
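
For anyone wanting to experiment with that log, here is a minimal Perl sketch of bucketing its lines by hour. It assumes each line starts with a multilog-style tai64n timestamp ("@" followed by 24 hex digits); that format is an assumption, not something stated in this thread. The names tai64n_to_epoch and count_lines_by_hour are made up for illustration.

```perl
use strict;
use warnings;

# Convert a leading tai64n label ("@" plus 24 hex digits) to Unix epoch seconds.
# Returns undef when the text does not start with such a label.
sub tai64n_to_epoch {
    my ($text) = @_;
    return unless $text =~ /^\@([0-9a-f]{16})[0-9a-f]{8}/i;
    # The low 32 bits of the label hold the seconds; labels written by
    # multilog run 10 seconds ahead of Unix time.
    return hex(substr($1, 8, 8)) - 10;
}

# Count log lines per hour, using the same hour bucket (epoch / 3600) as the
# "Statistics by Hour" table in the script above. Lines without a timestamp
# are skipped.
sub count_lines_by_hour {
    my ($fh) = @_;
    my %byhour;
    while (my $line = <$fh>) {
        my $epoch = tai64n_to_epoch($line);
        next unless defined $epoch;
        $byhour{ int($epoch / 3600) }++;
    }
    return %byhour;
}
```

To try it, open /var/log/qpsmtpd/current for reading and pass the filehandle to count_lines_by_hour. Telling spam and virus rejections apart would still need patterns matched against real log lines, which this sketch does not attempt.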

I am thinking of merging the two so that the analysis covers virus and spam information in one table. Also, junkmail deletion is already done in SME7, so I thought it would only need to give a count of the junkmails left in the folders.
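
A count of the junkmails left could be sketched roughly as below. This is only an illustration, assuming the junkmail folder is a Maildir with new/ and cur/ subdirectories; the helper names count_files and count_maildir_messages are hypothetical.

```perl
use strict;
use warnings;

# Count regular files directly inside $dir, skipping dot entries.
# Returns 0 when the directory cannot be opened.
sub count_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return 0;
    # grep in scalar context returns the number of matching entries
    my $count = grep { !/^\./ && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return $count;
}

# A Maildir folder keeps newly delivered mail in new/ and seen mail in cur/.
sub count_maildir_messages {
    my ($folder) = @_;
    return count_files("$folder/new") + count_files("$folder/cur");
}
```

It would be called once per user, for example with a path such as /home/e-smith/files/users/USERNAME/Maildir/.junkmail; that path is an assumption and should be checked against the actual SME7 layout.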


I am only running a "minimal" SME7 test mail server at the moment, and it does not receive any spam or viruses, so could someone email me, or give me access to, a /var/log/qpsmtpd/current which has some spam and virus emails detected?

All suggestions and comments gratefully received.
Brian j Read
(retired, for a second time, still got 2 installations though)
The instrument I am playing is my favourite Melodeon.
.........

Offline brianr

  • *
  • 988
  • +2/-0
default Spamassassin auto-learn on PR1?
« Reply #11 on: March 28, 2006, 08:22:54 PM »
I have got a new script for people to try; the announcement is here:

http://forums.contribs.org/index.php?topic=31357.0

I am still looking for some logs to test it on.
.........