Re: automated spam detection


On Tue, 16 Feb 1999, Richard Kettlewell wrote:

> It'd be useful to have the same kind of ability with email.  To make
> this work there would have to be a central location where MTAs could
> register, in real time, some kind of identifying information about the
> messages they received: the MD5 hash of the body and selected header
> fields might be a good place to start.
> 
> Once enough instances of the same hash had been registered, the
> central server could notify subscribing MTAs that a particular message
> was deemed bulk mail, and they could then choose to ignore it.
> 

It would, however, mean that each site would have to accept the central
server's idea of exactly when something should be considered spam.

I once heard of an idea very similar to this one, although in that
scenario each SMTP server would contact the central server and tell it
something like "I got a mail with message body checksum A3F60E55",
whereupon the central server would answer "1435", indicating that it had
been told exactly the same thing 1435 times by other SMTP servers.
Internally it would of course update its database so that the number now
said "1436". An SMTP server could ask about many other things as well,
like various headers (From:, To:, Subject:, Received:) or the size of
the mail. One could even ask questions like "I got a message with the
checksum AABBCCDD. Is this message considered spam by <insert your
trusted party here>?", which means there could be networks of
spam-detecting agencies whose opinions you could rely on to filter spam
from your network. Opinions will probably vary on whether this is
useful, but still. The good thing about this scheme is that it would
always be up to the SMTP server itself to decide whether something is
spam; the central server would just provide information to help it make
that decision.
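A minimal sketch of the counting scheme described above, in Python. The names CentralCounter, report, SMTPServer and looks_like_spam, the "body-md5" key kind and the thresholds are all hypothetical; the central server is modelled as an in-memory counter rather than a networked service, and an MD5 of the body stands in for whatever checksum a real deployment would use.

```python
import hashlib
from collections import Counter


def body_checksum(body: str) -> str:
    """MD5 of the message body (the quoted message suggests MD5)."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


class CentralCounter:
    """Hypothetical central server: counts how often each fact was reported."""

    def __init__(self) -> None:
        # Keyed by (kind, value), e.g. ("body-md5", "a3f6...") or ("size", "2048").
        self.seen = Counter()

    def report(self, kind: str, value: str) -> int:
        """Return how many times this was reported before, then record this report."""
        previous = self.seen[(kind, value)]
        self.seen[(kind, value)] += 1
        return previous


class SMTPServer:
    """Each server applies its own threshold; the central server only supplies counts."""

    def __init__(self, central: CentralCounter, threshold: int) -> None:
        self.central = central
        self.threshold = threshold

    def looks_like_spam(self, body: str) -> bool:
        count = self.central.report("body-md5", body_checksum(body))
        return count >= self.threshold


central = CentralCounter()
strict = SMTPServer(central, threshold=2)
lenient = SMTPServer(central, threshold=1000)
print([strict.looks_like_spam("Buy now!") for _ in range(3)])  # [False, False, True]
print(lenient.looks_like_spam("Buy now!"))                     # False: 3 < 1000
```

Because the threshold lives in SMTPServer rather than in the central server, two sites that see the same counts can still disagree about what is spam.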

What I didn't like about the idea, however, was that it required an
SMTP server to contact the central server once for every message it
received. Of course, this should be done over UDP or something similar,
but the idea was still that the SMTP server, when receiving a mail,
would wait for the response from the central server before accepting it
for delivery, and I think that would lead to unacceptable congestion no
matter how distributed the centralized server system is and how much
computing power you throw at it. Doing it the way you suggest, letting
the central server notify clients when something is spam, would perhaps
work better in that respect, but it would, as I wrote above, mean that
the central server decided for the participants what was spam and what
wasn't, and I don't think that's something all sites are willing to
accept.
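For comparison, here is a rough sketch of the push model from the quoted message, under the assumption that "notify" simply means calling a callback. BulkMailRegistry, SubscribingMTA and message_hash are hypothetical names. Each MTA decides using only its local block list and registers the hash afterwards, so no per-message round trip blocks delivery; the cost is that the registry alone decides when a hash counts as bulk mail.

```python
import hashlib


def message_hash(body: str) -> str:
    return hashlib.md5(body.encode("utf-8")).hexdigest()


class BulkMailRegistry:
    """Hypothetical central server for the push model: it decides what is bulk mail."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts = {}        # hash -> number of registrations
        self.declared = set()   # hashes already announced as bulk mail
        self.subscribers = []   # callbacks that take a hash

    def subscribe(self, callback) -> None:
        self.subscribers.append(callback)

    def register(self, h: str) -> None:
        self.counts[h] = self.counts.get(h, 0) + 1
        if self.counts[h] >= self.threshold and h not in self.declared:
            self.declared.add(h)
            for notify in self.subscribers:   # push the verdict to every MTA, once
                notify(h)


class SubscribingMTA:
    def __init__(self, registry: BulkMailRegistry) -> None:
        self.registry = registry
        self.blocked = set()
        registry.subscribe(self.blocked.add)

    def receive(self, body: str) -> bool:
        """Decide from the local block list first, then register the hash."""
        h = message_hash(body)
        accepted = h not in self.blocked
        self.registry.register(h)
        return accepted


registry = BulkMailRegistry(threshold=3)
mta_a, mta_b = SubscribingMTA(registry), SubscribingMTA(registry)
print([mta_a.receive("Buy now!"), mta_b.receive("Buy now!"),
       mta_a.receive("Buy now!"), mta_b.receive("Buy now!")])
# [True, True, True, False]
```

The only threshold here is the one inside BulkMailRegistry, so every subscribing MTA inherits the central server's verdict.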

It's a difficult problem. I've been advocating a looser, trust-based
system where each site has its own filtering rules and informs the sites
that trust it when it has encountered something it considers or
suspects is spam. For example, site A receives 51 mail messages with
identical message bodies. It is configured to automatically warn its
'friends' about messages that appear more than 50 times, but it might
not consider a message spam, and start rejecting it, until it has
appeared 100 times. A warns system B that "Message AABBCCDD has been
seen here 51 times", which means that B can immediately increase *its*
counter for that message by 51. That might lead B to warn C in turn, or
the counter might get high enough that B starts treating the message as
pure spam. Message body checksums are of course only one way of
detecting (some) spam; there are lots of other variables that could be
used.
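A rough sketch of the trust-based scheme, assuming a simple in-process model where each site holds direct references to the sites that trust it. Site, record, warn_threshold and reject_threshold are hypothetical names; the thresholds of 50 and 100 come from the example above. Each site forwards a warning only once per checksum, which keeps the sketch from looping but also means later sightings are not re-reported.

```python
from collections import defaultdict


class Site:
    """A mail site in a hypothetical trust network."""

    def __init__(self, name: str, warn_threshold: int = 50, reject_threshold: int = 100) -> None:
        self.name = name
        self.warn_threshold = warn_threshold      # warn friends above this many sightings
        self.reject_threshold = reject_threshold  # treat as spam from this many sightings
        self.counts = defaultdict(int)            # checksum -> sightings (local + reported)
        self.friends = []                         # sites that trust us and want warnings
        self.warned = set()                       # checksums we have already warned about

    def add_friend(self, site: "Site") -> None:
        self.friends.append(site)

    def record(self, checksum: str, n: int = 1) -> None:
        """Add n sightings, either local ones or ones reported by a trusted site."""
        self.counts[checksum] += n
        if self.counts[checksum] > self.warn_threshold and checksum not in self.warned:
            self.warned.add(checksum)
            for friend in self.friends:
                # "Message <checksum> has been seen here <count> times"
                friend.record(checksum, self.counts[checksum])

    def is_spam(self, checksum: str) -> bool:
        return self.counts[checksum] >= self.reject_threshold


a, b, c = Site("A"), Site("B"), Site("C")
a.add_friend(b)   # B trusts A
b.add_friend(c)   # C trusts B
for _ in range(51):
    a.record("AABBCCDD")
print(a.counts["AABBCCDD"], b.counts["AABBCCDD"], c.counts["AABBCCDD"])  # 51 51 51
for _ in range(49):
    b.record("AABBCCDD")
print(b.is_spam("AABBCCDD"), c.is_spam("AABBCCDD"))  # True False
```

A real implementation would also have to avoid counting the same sightings twice when a report reaches a site along more than one path (say A to B to C and also A directly to C); this sketch simply adds whatever it is told.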

I think that with good software and flexible filtering options, loose
trust-based networks like this could work quite well without requiring
that people rely on some gigantic centralized system. Then again, I've
never been a fan of centralized systems in the first place so maybe I'm
just biased :-)

  /Ragnar




