Friday, November 09, 2007

Filtering Stupidity on the Internet

The StupidFilter project aims to create an open-source Bayesian filter to "detect rampant stupidity in written English." This filter could be used on internet comment boards to eliminate comments that have little or no meaningful content. For examples of the sort of thing they're looking to eliminate, you can view random entries from their stupidity database, collected from comments on YouTube.
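For readers who haven't met a Bayesian text filter before, here is a rough sketch of the general idea in Python. To be clear, this is not the StupidFilter's code: the class name `ToyBayesFilter`, the tokenizer, and the four training comments are all made up for illustration, and a real filter would be trained on thousands of graded comments rather than four.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Case is preserved so "LOL" and "lol" count as different tokens, and each
    # "!" or "?" is its own token so repeated punctuation adds up.
    return re.findall(r"[A-Za-z']+|[!?]", text)


class ToyBayesFilter:
    """Hypothetical two-class naive Bayes classifier with add-one smoothing."""

    LABELS = ("stupid", "ok")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        self.token_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_odds(self, text):
        """Log-odds that text is 'stupid'; positive means 'stupid' is more likely.

        Assumes each label has been trained on at least one comment.
        """
        vocab = set(self.token_counts["stupid"]) | set(self.token_counts["ok"])
        totals = {label: sum(self.token_counts[label].values()) for label in self.LABELS}
        # Start from the prior odds, then add one term per known token.
        score = math.log(self.doc_counts["stupid"] / self.doc_counts["ok"])
        for token in tokenize(text):
            if token not in vocab:
                continue  # tokens never seen in training carry no evidence
            p_stupid = (self.token_counts["stupid"][token] + 1) / (totals["stupid"] + len(vocab))
            p_ok = (self.token_counts["ok"][token] + 1) / (totals["ok"] + len(vocab))
            score += math.log(p_stupid / p_ok)
        return score


# Invented training comments, standing in for a graded database.
toy = ToyBayesFilter()
toy.train("LOL OMG this is SO funny!!!", "stupid")
toy.train("omg lol u r dumb!!!!", "stupid")
toy.train("I think this argument is sound.", "ok")
toy.train("The second point is interesting, but I disagree.", "ok")

shouty_score = toy.log_odds("LOL that is SO dumb!!!")
calm_score = toy.log_odds("I think the point is interesting")
print(shouty_score > 0, calm_score < 0)
```

The filter never "reads" the comment. It only asks whether the tokens in it showed up more often in the stupid pile or the acceptable pile during training.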

A few questions and answers from their FAQ:

Isn't filtering stupidity elitist?

Yes. Yes, it is. That's sort of the whole point.

So what do you plan to filter?

The idea is that the most egregiously stupid comments will also be the easiest to detect while remaining ignorant of context; comments with too much or too little capitalization, too many text-message abbreviations, excessive use of "LOL," exclamation points, and so on.

Won't people just try to defeat the filter, the way spammers try to get around spam filtering?

We certainly hope they will -- that implies they're no longer generating text statistically likely to be stupid. It's true that an obvious attack on the StupidFilter would be to salt a short, stupid comment with a long excerpt copy-pasted from, say, Project Gutenberg, but we think it's reasonable to count on the laziness of the stupidest commenters not to do this.

Aren't you just trying to eliminate comments and discourse that you consider to be stupid?

As much as that might be nice, no. The StupidFilter does not understand, in a meaningful sense, the text that it parses, and our graders select comments that are formally stupid -- that is, their diction, not their content, marks them as stupid. It is not our intent to eliminate debate or disagreement, but rather to programmatically enforce a certain quality of expression. Put another way: The StupidFilter will cheerfully approve an eloquent, properly-capitalized defense of mandatory, state-subsidized rocket-launcher ownership for all schoolchildren.
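The "formal" signals the FAQ mentions (capitalization, text-message abbreviations, exclamation points) are easy to measure without understanding a word of the comment. The sketch below is my own guess at what such measurements might look like, not anything taken from the project: the function name `formal_features`, the `TEXT_SPEAK` word list, and the sample comments are assumptions. It also shows why the copy-paste attack from the FAQ would work in principle: padding a shouty comment with ordinary prose waters down every per-word rate.

```python
import re

# Hypothetical abbreviation list; a real filter would learn these from data.
TEXT_SPEAK = {"u", "r", "ur", "omg", "lol", "plz", "thx"}


def formal_features(comment):
    """Measure surface features only; the meaning of the comment is ignored."""
    letters = [c for c in comment if c.isalpha()]
    words = re.findall(r"[A-Za-z']+", comment)
    n_words = max(len(words), 1)  # avoid dividing by zero on empty input
    return {
        "upper_ratio": sum(c.isupper() for c in letters) / max(len(letters), 1),
        "exclaim_per_word": comment.count("!") / n_words,
        "text_speak_per_word": sum(w.lower() in TEXT_SPEAK for w in words) / n_words,
    }


shouty = "OMG LOL U R SO DUMB!!!!"
calm = "I disagree with the second point."
# Opening line of Pride and Prejudice, standing in for a pasted excerpt.
excerpt = ("It is a truth universally acknowledged, that a single man in "
           "possession of a good fortune, must be in want of a wife.")

shouty_f = formal_features(shouty)
calm_f = formal_features(calm)
padded_f = formal_features(shouty + " " + excerpt)

for name in shouty_f:
    print(name, round(shouty_f[name], 2), round(calm_f[name], 2), round(padded_f[name], 2))
```

In each row the padded comment lands somewhere between the shouty one and the calm one, which is exactly the dilution the FAQ is counting on lazy commenters not to bother with.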