Spamassassin behaving strangely

Fri Jan 14 05:21:19 UTC 2011

On Thu, 2011-01-13 at 22:01 -0500, Genes MailLists wrote:
>   It's great to use procmail - however if SpamAssassin has tagged
> something as spam - there is no point in running sa-learn --spam on
> what SA has already identified as spam - only use it on things it missed.

Training a Bayesian filter with messages that it has classified
correctly is still beneficial.

Imagine a filter that was trained with exactly one ham and one spam
message. It's probably not going to do a very good job - but imagine you
get lucky and it correctly classifies a large number of messages.

Training the filter with those messages will still make it a much better
filter, because it then has a larger statistical sample of what your ham
and spam look like.
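
To make the argument above concrete, here is a toy sketch in Python. It
is a plain naive Bayes word counter with add-one smoothing, which is an
assumption chosen for brevity and not the algorithm SpamAssassin's Bayes
code actually uses. The class name TinyBayes and all the sample messages
are made up for illustration.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes word counter (hypothetical, NOT SpamAssassin's code)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, label, text):
        # Count every whitespace-separated token for the given class.
        self.counts[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text):
        # Needs at least one learned message per class.
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        spam_total = sum(self.counts["spam"].values()) + len(vocab)
        ham_total = sum(self.counts["ham"].values()) + len(vocab)
        # Start from the prior odds, then add one log-ratio per token.
        log_odds = math.log(self.messages["spam"] / self.messages["ham"])
        for word in text.lower().split():
            p_spam = (self.counts["spam"][word] + 1) / spam_total
            p_ham = (self.counts["ham"][word] + 1) / ham_total
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


bayes = TinyBayes()
bayes.learn("spam", "cheap pills online")
bayes.learn("ham", "meeting agenda attached")

incoming = "buy discount watches now"
before = bayes.spam_probability(incoming)  # 0.5: every word is unseen

# Feed it more messages, including ones it might have guessed correctly.
bayes.learn("spam", "buy discount watches")
bayes.learn("spam", "discount pills now")
bayes.learn("ham", "agenda for lunch meeting")
bayes.learn("ham", "notes attached now")
after = bayes.spam_probability(incoming)

print(before, after)
```

With one message per class the filter has no opinion about the incoming
message; after a few more training messages the same text scores clearly
as spam.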

That's the whole point of SpamAssassin's autolearn plugin. It will
automatically train itself as it filters, as long as a message scores
above a certain spam threshold or below a certain ham threshold.

You can configure it with:

bayes_auto_learn 1
bayes_auto_learn_threshold_spam    12.0
bayes_auto_learn_threshold_nonspam 0.1

Those are the default threshold values; you can change them to whatever
you want. Note that autolearning spam requires at least 3 points from
the header and 3 points from the body, so setting the spam threshold
lower than 6 will be the same as setting it to exactly 6.
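
The rule just described can be sketched roughly as follows. This is a
simplified model of the decision, not SpamAssassin's actual code: the
function name autolearn_decision is hypothetical, the exact boundary
comparisons (>= and <) are my assumption, and the real plugin computes
the score it compares a little differently from the final message score.

```python
def autolearn_decision(score, header_points, body_points,
                       threshold_spam=12.0, threshold_nonspam=0.1):
    """Return 'spam', 'ham', or 'no' for a scored message (simplified sketch)."""
    if score >= threshold_spam:
        # Spam is only learned with at least 3 header and 3 body points.
        if header_points >= 3.0 and body_points >= 3.0:
            return "spam"
        return "no"
    if score < threshold_nonspam:
        return "ham"
    # Scores between the two thresholds are not learned at all.
    return "no"


print(autolearn_decision(15.0, 8.0, 7.0))    # spam
print(autolearn_decision(13.0, 12.0, 1.0))   # no (too few body points)
print(autolearn_decision(-1.0, 0.0, 0.0))    # ham
# A spam threshold of 4 still needs 3 + 3 points, so it acts like 6:
print(autolearn_decision(5.0, 3.0, 2.0, threshold_spam=4.0))  # no
print(autolearn_decision(6.0, 3.0, 3.0, threshold_spam=4.0))  # spam
```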

You can see if your messages were autolearned by looking at the headers
(by default, the X-Spam-Status header); you will see one of these:

autolearn=ham
autolearn=spam
autolearn=no
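
If you want to check this in bulk rather than by eye, a small Python
sketch along these lines should do. It assumes the default
X-Spam-Status header layout; the function name autolearn_status and the
sample message (including its score, test name and version) are made up
for illustration.

```python
import re
from email import message_from_string


def autolearn_status(raw_message):
    """Return the autolearn value from X-Spam-Status, or None if absent."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # The header is usually folded over several lines; search across them.
    match = re.search(r"\bautolearn=(\w+)", status)
    return match.group(1) if match else None


# Made-up sample message with a folded X-Spam-Status header.
sample = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "X-Spam-Status: No, score=-1.2 required=5.0 tests=BAYES_00\n"
    "\tautolearn=ham version=3.3.1\n"
    "\n"
    "Message body.\n"
)

print(autolearn_status(sample))  # ham
```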

Brian