Singpolyma

Technical Blog

Thinking About Decentralised SPAM Protection


We all hate SPAM. We all love Akismet. GMail is also great at killing SPAM. Why are Akismet and GMail so great? They have huge databases of SPAM, reported by their many users, to train their filters on.

There’s only one problem: they’re commercial and closed. It’s the same old story: if they go down or turn evil, we’re screwed.

Solution: decentralise.

I’ve been thinking this could work in three parts.

First, write a plugin for WordPress (and other platforms) that logs all SPAM in the WordPress database and lets anyone easily access this list in standard formats. It could hook into Akismet and other existing solutions to record what they mark as SPAM, as well as what users manually mark as SPAM or ham.
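To make "standard formats" a little more concrete, here is a minimal sketch of what the plugin’s public export could look like. A real WordPress plugin would be written in PHP; Python is used here only for illustration, and the `SpamRecord` fields, the `export_feed` function and the JSON layout are all hypothetical, not an existing format.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class SpamRecord:
    # Hypothetical fields; a real plugin would read these from the comments table.
    author: str
    author_url: str
    content: str
    verdict: str    # "spam" or "ham"
    source: str     # who made the call, e.g. "akismet" or "user"
    logged_at: str  # ISO 8601 timestamp

    def __post_init__(self):
        if self.verdict not in ("spam", "ham"):
            raise ValueError(f"unknown verdict: {self.verdict!r}")


def export_feed(site_url, records):
    """Serialise logged verdicts as a JSON document that other servers can fetch."""
    feed = {
        "site": site_url,
        "generated": datetime.now(timezone.utc).isoformat(),
        "records": [asdict(r) for r in records],
    }
    return json.dumps(feed, indent=2)
```

Recording `source` alongside `verdict` keeps Akismet’s calls and users’ manual corrections distinguishable, so consumers can decide how much to trust each.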

Second, create a site that simply lists, with links, all the sites publishing SPAM data.
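If that listing site also offered a machine-readable version, servers could discover feeds automatically. As a sketch, assuming a hypothetical plain-text format with one feed URL per line and `#` marking comment lines, a server might parse it like this:

```python
from urllib.parse import urlparse


def parse_directory(text):
    """Return unique http(s) feed URLs from a hypothetical plain-text directory.

    Blank lines and lines starting with '#' are ignored; order is preserved.
    """
    feeds = []
    seen = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parsed = urlparse(line)
        # Skip anything that is not an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if line not in seen:
            seen.add(line)
            feeds.append(line)
    return feeds
```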

Third, create simple server software that scrapes the publishing sites, accepts bulk submissions of data, offers a public API for individual SPAM submissions (like Akismet), or does some combination of these. This server could also include filter logic that trains itself on the collected data and offers its own public API, or that job could be left to separate servers that rely on these ones.
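The post leaves the filter logic open. As one possibility, swapping in a technique the post does not name, a server could train a multinomial naive Bayes classifier on the pooled SPAM/ham records it collects. The `NaiveBayesFilter` class and its method names are hypothetical; this is a minimal sketch, not a production filter.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class NaiveBayesFilter:
    """Multinomial naive Bayes over word tokens, with add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        if label not in self.LABELS:
            raise ValueError(f"unknown label: {label!r}")
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.LABELS:
            # Smoothed prior, so an empty class never yields log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                count = self.word_counts[label][token]
                # The extra +1 reserves probability mass for unseen words.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Normalise in log space to avoid underflow on long comments.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)
```

Usage: call `train(record["content"], record["verdict"])` for every record scraped or submitted, then expose `spam_probability` through the server’s public API.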

The big thing is that all of this code be open source so that anyone can run a server. Each server could scrape from all the publishing sites, or publishing sites could cache a list of operating servers to submit to. Either way, we end up with a multiple-server environment with distributed data and load.
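For the push option, a publishing site would loop over its cached server list and tolerate servers that are down, so no single server becomes a point of failure. In this sketch the `/submit` endpoint is a hypothetical path, and the transport is injectable so the loop can be exercised without a network.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable


def http_send(server: str, record: dict, timeout: float = 5.0) -> None:
    """POST one record as JSON to a hypothetical <server>/submit endpoint."""
    req = urllib.request.Request(
        server.rstrip("/") + "/submit",
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises URLError/HTTPError (both OSError subclasses) on failure.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()


def submit_to_servers(record: dict, servers: Iterable[str],
                      send: Callable[[str, dict], None] = http_send) -> Dict[str, bool]:
    """Push one record to every cached server; one failure must not block the rest."""
    results = {}
    for server in servers:
        try:
            send(server, record)
            results[server] = True
        except OSError:
            # Server unreachable or returned an HTTP error: note it and move on.
            results[server] = False
    return results
```

The returned map could be used to prune servers that keep failing from the cached list.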

14 Responses

factoryjoe

Exactly. BUT, you also need a trust filter to some extent, or else these databases will be overrun with false positives… If we could use the blogroll as the source for spam data, THEN we’re making progress! 😉
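factoryjoe’s trust-filter idea could be sketched by weighting each report by whether the reporting site is trusted, for example because it appears in your blogroll, and only accepting a SPAM verdict once enough trusted weight agrees. The function name, weights and threshold below are hypothetical choices for illustration.

```python
from collections import defaultdict


def trusted_verdicts(reports, trusted_sites, trusted_weight=1.0,
                     untrusted_weight=0.1, threshold=2.0):
    """Return the comment ids that enough trusted weight has marked as SPAM.

    reports: iterable of (comment_id, reporting_site, verdict) tuples,
    where verdict is "spam" or "ham". Ham reports subtract weight.
    """
    scores = defaultdict(float)
    for comment_id, site, verdict in reports:
        weight = trusted_weight if site in trusted_sites else untrusted_weight
        scores[comment_id] += weight if verdict == "spam" else -weight
    return {cid for cid, score in scores.items() if score >= threshold}
```

With these weights, two trusted sites agreeing outweigh many unknown ones, which limits how far a flood of false reports can push the shared data.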

Aditya Mukherjee

How come I didn’t see this post before… O_o

The bane of all open-source solutions is that you can’t put food on your table through the goodwill of others alone. There is only so much software you can make open source. Automattic has made the entire WordPress package open, so you can imagine the amount of revenue they’re losing anyway. It would be unfair to want them to open their only ‘closed’ revenue stream.

What do you plan to do with the spam data anyway? 😛
