Sunday, March 18, 2007

5 months after Firefox 2 was released, misspellings on Digg are down 10%

It's amazing when the effects of a single software release can be seen so clearly. Firefox 2 added a spell check feature that resembles MS Word's, underlining misspelled words in text input boxes. As 65% of the Digg community uses Firefox, it shouldn't be a surprise to see an improvement in spelling.

About the methodology:
This graph tracks the misspellings on the first page of comments on front page stories according to the dictionary provided by aspell (with the exception of the word "digg", which was ignored). Approximately 30,000 articles and 4 GB of comments were processed to create this graph.
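For anyone who wants to try something similar, here is a rough sketch of the counting step in Python. It is not the script used for this graph: the `dictionary` argument is a plain set of known words standing in for the aspell dictionary, and the tokenizing rule and the function name `misspelling_counts` are assumptions made for illustration.

```python
import re

# Words that are skipped on purpose, as described above.
IGNORED = {"digg"}

def misspelling_counts(comment, dictionary):
    """Return (misspelled, total) word counts for one comment.

    `dictionary` is a set of lowercase known words; it stands in for
    the aspell word list (an assumption for this sketch).
    """
    # Assumed tokenizer: runs of letters and apostrophes, lowercased.
    words = [w.strip("'") for w in re.findall(r"[a-z']+", comment.lower())]
    words = [w for w in words if w]
    misspelled = [w for w in words
                  if w not in dictionary and w not in IGNORED]
    return len(misspelled), len(words)
```

Note that in this sketch "digg" still counts toward the total number of words; it just never counts as a misspelling.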

Some observations:
The decline leading up to August might be the result of people using a Firefox beta, but I doubt there were enough early adopters to cause such a decline.

The increase in misspelled words seems to loosely correlate with the growth of Digg as graphed by Alexa. A Reddit user noted that Digg, and to a lesser extent, Reddit, are entering the Eternal September; interestingly enough, spelling suffers in September. A blogger commented that the Digg demographic consists of CS dropouts. I'm sure it has more variety than that, but the general consensus is that Digg has a large number of college students, a fact supported by this poll, all of which lends some credence to the Eternal September hypothesis.

Spelling has a high standard deviation relative to the percentage of misspelled words: 0.42%. Each data point is the number of misspelled words in a day divided by the total number of words that day. On average, there are around 75-100 stories per day. With so many words going into a single data point, it's surprising that the points were often 0.5% apart, and more than 1% apart at times. It took a 30-day rolling average to smooth out most of the bumps; that's around 2000 articles. Part of this is just due to the scale of the chart; normally half a percent isn't noticeable, but this chart has a maximum y value of 9%.
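The daily data point and the smoothing can be sketched like this. The function names are made up for the example, and the window here is a trailing one (each point averages the current day and the days before it), which is an assumption; a centered window would shift the curve but smooth it the same way.

```python
def daily_rate(misspelled, total):
    """Percentage of misspelled words for one day."""
    return 100.0 * misspelled / total if total else 0.0

def rolling_average(values, window=30):
    """Trailing rolling average; early points use the days available so far."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the day that just fell out of the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

Feeding a list of `daily_rate` values, one per day, into `rolling_average` gives a smoothed series of the same length.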

Caveat Legens:
Post hoc ergo propter hoc is a logical fallacy. This blog post uses it. You can't draw any solid conclusions from this graph. This is just evidence that supports a hypothesis.
