spamassassin/user_prefs

Tue Mar 23 10:17:35 UTC 2004

Charles Howse wrote:
> On Monday 22 March 2004 09:12 am, Nigel Wade wrote:
> 
>>Charles Howse wrote:
>>
>>>Hi,
>>>
>>>While reading another thread, I remembered I had no custom preferences
>>>for spamassassin, and decided to create some.
>>>
>>>I use the default settings for starting spamassassin at boot, and the
>>>following filters in KMail:
>>>1. In KMail menus, select Settings->Configure Filters
>>>2. Create a new filter with filter criteria:
>>>    <any header> matches regular expression .
>>>    (the regular expression is just the character "." meaning
>>>    "any character")
>>>    and filter action:
>>>    pipe through spamc
>>>    Uncheck the box "stop processing if this filter matches"
>>>3. Add a second filter below the one created in step 2, with criteria:
>>>    <any header> contains X-Spam-Flag: YES
>>>    and action:
>>>    move to folder trash
>>>    (or whatever you want to do with your spam)
>>>    check the "stop processing..." box
>>>
>>>These filters are working fine, except for those HTML spams that show
>>>lots of random words in the body when viewed in text mode.
>>>
>>>I was just wondering if anyone would like to share some _generic_
>>>preferences for ~/.spamassassin/user_prefs, or comment.
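The two KMail filters above amount to "scan every message with spamc, then act on the X-Spam-Flag header". Here is a minimal Python sketch of that logic. It assumes the spamc client is on PATH and spamd is running. The function names are hypothetical, and the scanner is passed in as a parameter so the header check can be tried without spamd:

```python
import subprocess
from email import message_from_bytes
from typing import Callable


def run_spamc(raw: bytes) -> bytes:
    """First filter: pipe the raw message through spamc (assumed on PATH).

    spamc normally echoes the message back with X-Spam-* headers added;
    if it cannot reach spamd it typically passes the message through unchanged.
    """
    result = subprocess.run(["spamc"], input=raw, capture_output=True)
    return result.stdout or raw


def is_spam(raw: bytes, scan: Callable[[bytes], bytes] = run_spamc) -> bool:
    """Second filter: does the scanned message carry X-Spam-Flag: YES?"""
    scanned = scan(raw)
    msg = message_from_bytes(scanned)
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"
```

As with the second KMail filter, anything without `X-Spam-Flag: YES` counts as ham. A spamc failure therefore lets the mail through instead of sending it to the trash.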
>>
>>The way to catch those is with Bayesian filtering. You need to train the
>>Bayesian filter on enough messages that it learns what is spam and what
>>is not (at least 1000 of each is a good rule of thumb for best
>>accuracy).
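To see why the filter needs examples of both spam and ham, here is a toy token-counting classifier. It is a generic naive Bayes sketch with Laplace smoothing, not SpamAssassin's actual Bayes code, which combines token probabilities differently. The class and method names are hypothetical:

```python
import math
import re
from collections import Counter


class ToyBayes:
    """Toy naive Bayes classifier; an illustration, not SpamAssassin's Bayes code."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str) -> list[str]:
        return re.findall(r"[a-z0-9']+", text.lower())

    def learn(self, text: str, label: str) -> None:
        """Record a message as 'spam' or 'ham' (the job sa-learn does for the real database)."""
        self.counts[label].update(self.tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        toks = self.tokenize(text)
        vocab = set(self.counts["spam"]) | set(self.counts["ham"]) | set(toks)
        v = max(len(vocab), 1)
        totals = {k: sum(c.values()) for k, c in self.counts.items()}
        # Start from the prior odds of spam vs ham (add-one smoothed).
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in toks:
            # Laplace smoothing: unseen tokens get a small nonzero probability.
            p_spam = (self.counts["spam"][tok] + 1) / (totals["spam"] + v)
            p_ham = (self.counts["ham"][tok] + 1) / (totals["ham"] + v)
            log_odds += math.log(p_spam / p_ham)
        # Numerically safe logistic function.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)
```

If only spam has been learned, every ham estimate falls back to the smoothing floor. The scores then say little about what your ordinary mail looks like.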
> 
> 
> For the sake of the original subject, I was interested in the user_prefs file.
> 
> 
> I'm periodically training it with sa-learn on the MissedSpam folder.  I'll 
> 'get there' sooner or later.
> 
> I have never seen a false positive in my FilteredSpam folder, so I see no need 
> to train it on what *is* spam.  Am I wrong?

It's most important to train it on anything it misclassifies, but it's
still a good idea to train it on spam and ham that it has identified
correctly, too. That keeps its database of spam and ham current. If you
stop training it, it will get steadily worse as spam evolves.
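A routine that covers misclassified mail and correctly sorted mail could be scripted along these lines. This is a sketch under some assumptions: that KMail keeps its folders as mbox files under ~/Mail, and that "inbox" is a hypothetical folder you have checked contains only ham. The helper names are also hypothetical. It only builds `sa-learn --spam`/`--ham` command lines unless `dry_run` is turned off:

```python
import subprocess
from pathlib import Path


def sa_learn_command(folder: Path, is_spam: bool, mbox: bool = True) -> list[str]:
    """Build one sa-learn command line for a mail folder."""
    cmd = ["sa-learn", "--spam" if is_spam else "--ham"]
    if mbox:
        # Treat the folder as an mbox file; omit for maildir/MH directories.
        cmd.append("--mbox")
    cmd.append(str(folder))
    return cmd


def train(folders: list[tuple[Path, bool]], dry_run: bool = True) -> list[list[str]]:
    """Learn every (folder, is_spam) pair; with dry_run, only return the commands."""
    commands = [sa_learn_command(folder, is_spam) for folder, is_spam in folders]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    mail = Path("~/Mail").expanduser()  # assumed KMail mail directory
    plan = [
        (mail / "MissedSpam", True),    # spam that slipped through
        (mail / "FilteredSpam", True),  # spam that was caught correctly
        (mail / "inbox", False),        # hypothetical folder of verified ham
    ]
    for cmd in train(plan):
        print(" ".join(cmd))
```

Learning a ham folder that still contains spam will mislead the filter, so check that folder before running it with `dry_run=False`.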

-- 
Nigel Wade