Thursday, March 8, 2007

Top 25 Words Used in Digg Titles

Ever wonder which words show up most often in Digg titles? Well, wonder no more. Here are the results for front-page stories from the past year (March 6, 2006 - March 6, 2007).

The first list is lightly edited. Words like "the" and "a" aren't that interesting, so they're left out of the top 25, but the full unedited list below the top 25 includes them for completeness.
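Mechanically, the only difference between the two lists is whether very common words are skipped before counting. Here is a minimal Python sketch of that idea, assuming titles are lowercased and split on runs of letters and digits. The `STOP_WORDS` set and the `top_words` function are hypothetical: they are not the script that produced these numbers, and the exact tokenization and word list used for this post aren't known.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the set actually used for the edited top 25 isn't stated.
STOP_WORDS = {"the", "to", "of", "a", "in", "for", "on", "and", "i", "is", "with", "your", "you", "it", "from"}


def top_words(titles, n=25, skip_stop_words=True):
    """Count words across all titles and return the n most frequent as (word, count) pairs."""
    counts = Counter()
    for title in titles:
        # Assumed tokenization: lowercase, then keep runs of letters and digits ("2", "ps3", "360").
        for word in re.findall(r"[a-z0-9]+", title.lower()):
            if skip_stop_words and word in STOP_WORDS:
                continue
            counts[word] += 1
    return counts.most_common(n)


titles = [
    "The new Wii launch in Japan",
    "How to get a free Wii",
    "Google buys the new video site",
]
print(top_words(titles, n=3))                         # edited ranking
print(top_words(titles, n=3, skip_stop_words=False))  # unedited ranking
```

Passing `skip_stop_words=False` keeps every word, which corresponds to the unedited ranking.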

The top 25 [noteworthy] Digg title words

Rank Word Occurrences
1 new 1923
2 2 1725
3 how 1324
4 Wii 1201
5 Google 983
6 video 897
7 Apple 893
8 Linux 779
9 Microsoft 685
10 year 623
11 free 621
12 Mac 616
13 world 611
14 top 597
15 game 541
16 PS3 537
17 iTunes 522
18 Windows 519
19 Nintendo 488
20 launch 473
21 Digg 464
22 first 460
23 web 459
24 iPod 454
25 pictures 448

Unedited top 100 Digg title words

Rank Word Occurrences
1 the 6900
2 to 6289
3 of 4345
4 a 3988
5 in 3576
6 for 3321
7 on 2538
8 and 2064
9 new 1923
10 i 1922
11 2 1725
12 is 1631
13 with 1572
14 your 1434
15 how 1324
16 wii 1201
17 you 1096
18 it 1030
19 from 986
20 google 983
21 video 897
22 apple 893
23 at 841
24 linux 779
25 3 731
26 by 719
27 be 693
28 microsoft 685
29 10 676
30 get 661
31 year 623
32 free 621
33 mac 616
34 world 611
35 not 606
36 up 604
37 top 597
38 do 587
39 1 568
40 us 542
41 game 541
42 ps3 537
43 iTunes 522
44 can 521
45 what 521
46 windows 519
47 will 507
48 an 502
49 out 499
50 why 490
51 nintendo 488
52 that 488
53 are 477
54 launch 473
55 digg 464
56 first 460
57 web 459
58 ipod 454
59 picture 448
60 more 445
61 about 442
62 bush 439
63 photo 439
64 war 433
65 5 432
66 over 432
67 say 432
68 now 426
69 all 424
70 as 418
71 go 407
72 xbox 403
73 vista 391
74 games 390
75 360 389
76 released 387
77 make 377
78 no 375
79 best 366
80 time 365
81 open 360
82 this 360
83 have 341
84 online 340
85 may 327
86 most 323
87 sony 323
88 internet 318
89 ever 317
90 computer 316
91 x 314
92 way 312
93 man 311
94 one 305
95 4 301
96 firefox 301
97 into 293
98 os 293
99 live 291
100 after 290
101 has 288
102 tv 288
103 pc 286

Notable entries just missing the list

102 ubuntu 285
103 million 282
106 source 276
108 iraq 273
112 itunes 261
114 youtube 259

2 comments:

M. said...

Someone told me that the numbers here are incorrect. Want to tell me more about the methodology you used?

Bob said...

M:

I'll get around to it within the next few weeks. Briefly, though...

Digg lets you browse the top stories for the past 365 days. I downloaded all of those pages and parsed them.

Some of the pages weren't valid XML/XHTML, so some stories may have gone uncounted.

Additionally, the pages changed over the two days I spent downloading them, so a few stories were probably counted twice and a few never counted.

I basically ignored that, though, because over 100,000 stories is a pretty good sample.
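Both problems described above, strict XML parsing skipping malformed pages and stories shifting between pages mid-download, can be softened in a scraper. The following is a minimal sketch, not the script used for this post. It assumes each story title sits in an `<a class="story-title">` link whose `href` is unique per story; that markup is hypothetical (Digg's actual page structure isn't described here), and `TitleExtractor` and `unique_titles` are illustrative names. It uses Python's standard `html.parser`, which tolerates unclosed or mismatched tags rather than rejecting the page, and keys stories by link so a story that appears on two pages is only counted once.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect (href, title) pairs from <a class="story-title"> links.

    The "story-title" class is a hypothetical marker; real pages need their own selector.
    """

    def __init__(self):
        super().__init__()
        self.stories = []  # (href, title) pairs in page order
        self._href = None  # href of the title link currently open, if any
        self._text = []    # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "story-title":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.stories.append((self._href, "".join(self._text).strip()))
            self._href = None


def unique_titles(pages):
    """Parse every downloaded page and return each story's title once, keyed by its link."""
    seen = {}
    for html in pages:
        parser = TitleExtractor()
        parser.feed(html)  # lenient: unclosed or mismatched tags don't abort parsing
        parser.close()
        for href, title in parser.stories:
            # A story that shifted onto another page mid-download keeps its first title.
            seen.setdefault(href, title)
    return list(seen.values())


# Story /s/2 appears on both pages, and page2 has an unclosed <p>.
page1 = ('<ul><li><a class="story-title" href="/s/1">New Wii launch</a>'
         '<li><a class="story-title" href="/s/2">Google buys video site</a></ul>')
page2 = ('<ul><li><a class="story-title" href="/s/2">Google buys video site</a>'
         '<li><a class="story-title" href="/s/3">Linux &amp; the Mac</a><p>unclosed</ul>')
print(unique_titles([page1, page2]))
```

Keying on the link rather than the title text avoids merging two different stories that happen to share a headline.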