Saturday, March 31, 2007

What reddit ads are going to look like: a sidebar hosted by doubleclick.net

Digging through some js, I noticed a page that reddit is going to use for ads: http://static.reddit.com/ad.html.

It isn't anything special, just an ad for Wired hosted by doubleclick.net.

Sorry for linkjacking, but this seems to be a page reddit isn't excited about users knowing about.

Wednesday, March 28, 2007

One of the hardest bugs to catch in C: Integer overflow

Here's some simple C code that implements an array accessed through a few functions. (There are some casts I left out for clarity. They should be (unsigned char *).)







static void **array, **end;

int allocate (unsigned int size) ...

int deallocate () ...

int write (unsigned int index, void *data)
{
    if (array + index < end) {
        array[index] = data;
        return 1;
    } else {
        return 0;
    }
}



It appears to do what it should: it makes a bounds check before writing to memory and returns an error when the index is out of bounds.

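Here's the bug: what if index + the beginning of the array overflows? The sum wraps around, resulting in a modulo-equivalent value that's before the beginning of the array, so the check passes and the write lands outside the array. (Strictly speaking, C leaves pointer arithmetic that runs past the end of an object undefined; wrapping is just what typically happens in practice.)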

This type of bug is called an integer overflow. When writing code, the most legible order for an expression isn't always the safest. The best way to prevent bugs like this is to write the check out as it is above, then ask yourself what you know about the values involved and how you can ensure the result stays within a certain range. Since end will always be greater than or equal to array, simple algebra lets you reorder the inequality to read (index < end - array). It might not be as easy to read, but since the end of the array is always at or after the beginning, end - array can never be negative or wrap around, so this code, which is algebraically equivalent, won't have the same vulnerability.
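Here is a minimal sketch of the reordered check. The names demo_allocate, demo_deallocate, and safe_write are hypothetical stand-ins, not the original functions (whose bodies the post leaves out), and calloc is just one assumed way to get the storage.

```c
#include <stddef.h>
#include <stdlib.h>

static void **array, **end;

/* Hypothetical allocator: reserves `size` slots and records the end pointer. */
int demo_allocate(unsigned int size)
{
    array = calloc(size, sizeof *array);
    if (array == NULL) {
        end = NULL;
        return 0;
    }
    end = array + size;
    return 1;
}

/* Hypothetical deallocator: frees the slots and clears both pointers. */
int demo_deallocate(void)
{
    free(array);
    array = end = NULL;
    return 1;
}

/* Bounds-checked write using the reordered inequality. */
int safe_write(unsigned int index, void *data)
{
    if (array == NULL)
        return 0;   /* nothing allocated */

    /* end >= array always holds, so end - array is a non-negative
       element count; no sum is formed, so nothing can wrap around. */
    if (index < (size_t)(end - array)) {
        array[index] = data;
        return 1;
    }
    return 0;
}
```

With four slots allocated, safe_write accepts indices 0 through 3 and rejects everything else, including a huge index that would have pushed array + index past the top of the address space.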

For those who say this example is contrived, I have encountered real world code that requires logic like this. True, this example could just store size, but in some cases, perhaps where the elements in the array are of a dynamic size, the size can vary, so the end of the array is needed.

Remember those times in math class when kids asked when they'd ever need to know about algebraic inequalities and modular arithmetic? Yes, programming doesn't require lots of math, but it requires some. On one hand, writing C requires enough knowledge of computer architecture to know about integer overflows, and to know that they're not only useful but sometimes needed; on the other, it requires the mathematical background to understand and fix these problems.

Sunday, March 25, 2007

Funny Wikipedia Image Caption

Damn you Firefox 2 Spellchecker!!

The Firefox spellchecker's nice most of the time, but its suggestion algorithm isn't as good as the Microsoft one, and it strangely thinks some of the words I use aren't really words.

  • Entendre
  • Stoners (but not stoner)
  • Advisor (but not adviser, both are correct, -or is the correct Latin form)
  • Indices
  • Fiance
  • Fiancee
  • jure/iure (but not facto)
  • de (but not facto)
  • omelette (but not omelet)
  • doughnut
  • quesadilla
  • millennia
  • cultivars

That's it for now, but this list seems to be ever growing, so I'm sure there'll be a Part II.

Tuesday, March 20, 2007

If you thought the Verizon .01 cents was bad, this is worse

Foreword: this took place prior to y2k7, and I'm sorry, I wish I had recorded this, but I didn't.

One of the best ways to stop text message spam is to call your cell provider and get those $0.20 back. It costs them far more than $0.20 to give you the money back, and it encourages them to prosecute the spammers and have better spam filters.

I called Sprint, my cell provider, to get this fixed. Since my account was new, they offered to give me a free month of data access. I went for it, and once I hung up, I proceeded to start playing with my new data plan.

When I got my next month's bill, it was higher than I expected it to be, about $50 higher. I called up their support line. When I asked, they told me I used the data plan at around 12:30 on Saturday (I forgot the day). I said that's not right, I called on Friday. Well, it turns out that they recorded the time in central standard time. I then asked when I used my data plan and was told 11:35 on Friday. I said I'm sure I used it after my phone call. I had to return a call to get a $0.20 credit, and I created a text file with the information I needed. Interestingly, the timestamp was around 9:30.

Now the fun part: I asked them what time zone they're on (CST) and what time it was. The time they gave me was 3 hours from mine (PST); they gave me EST. I spent 10 minutes arguing with them about what time it was. I pointed them to time.gov. Java wasn't working for them. I kept trying to tell them, and they kept saying it was the wrong time. I asked where they were based, and I was told the Philippines. I told them that they had two systems running on different time zones, that this was the reason for the mistake, and asked why I should pay when their system clearly has at least one time mistake.

There was some good news that came out of this. In contrast to the Verizon case, with a single call and without escalating the case (I only talked to one rep.), I was credited on my next bill for the data.

Clearly, outsourcing has problems like this. Screwing up time zones is easy enough, but there's a good chance that whoever coded it was never in at least one of the relevant time zones. This is where good customer service comes in. While this case is as outrageous as the Verizon incident, I had it resolved within half an hour.

Monday, March 19, 2007

Two small values related to global warming

Fact 1: If all the oil consumed were combusted, the resulting water would raise the sea level about 1 mm.

Fact 2: In burning oil, approximately 3/10,000 of the oxygen in Earth's atmosphere has been consumed. That's around 75ppm. This number was found from the number of barrels of oil consumed. We have seen the CO2 ppm increase by about 100 since 1970 (about 15% of oil consumption occurred before 1970).

Corollary: Neither of these statistics accounts for a significant portion of either the increasing CO2 in the atmosphere or the rising sea level.

Corollary: Based on the measured increase in CO2 and the calculated decrease in oxygen due to combustion of oil, 32% of the CO2 increase can be attributed to burning oil (not including coal).

Basically, these serve as a reminder of just how big the Earth is, but also that the increase in CO2 in the air is very likely due to the consumption of oil and coal.

Caveat Legens:
I made a large number of simplifications. I assure you that the general idea is right; the fact that the theoretical value was near the measured value suggests that. Just don't believe this fully; it should just give you an approximate range for the actual values.

Sources:
(I had to do the math on my own. It was a combination of aspects of calculus, statistics, and chemistry. Feel free to check it for me. I often made simplifying estimates, like the oil consumed in the past 10 years and the oil consumed before 1970.)

We have consumed around 1.1 trillion barrels of oil since 1900.
http://www.gravmag.com/oil.html, http://en.wikipedia.org/wiki/World_economy

By mass, around 1/7 of oil is hydrogen. This neglects alcohols and double bonds. For methane, it's 1/4 (methane has the most hydrogen by mass of all hydrocarbons)

What's in a barrel of oil?

Sunday, March 18, 2007

5 months after Firefox 2 was released, misspellings on Digg are down 10%

It's amazing when the effects of a single software release can be seen so clearly. Firefox 2 added a spell check feature that resembles MS Word's, underlining misspelled words in text input boxes. As 65% of the Digg community uses Firefox, it shouldn't be a surprise to see an improvement in spelling.

About the methodology:
This graph tracks the misspellings on the first page of comments on front page stories according to the dictionary provided by aspell (with the exception of the word "digg", which was ignored). Approximately 30,000 articles and 4 GB of comments were processed to create this graph.

Some observations:
The decline leading up to August might be a result of users using a Firefox beta, but I doubt there would be enough early adopters to cause such a decline.

The increase in misspelled words seems to loosely correlate with the growth of Digg as graphed by Alexa. A Reddit user noted that Digg, and to a lesser extent, Reddit, are entering the Eternal September; interestingly enough, spelling suffers in September. A blogger commented that the Digg demographic consists of CS dropouts. I'm sure it has more variety than that, but the general consensus is that Digg has a large number of college students, a fact supported by this poll, all of which lends some credence to the Eternal September hypothesis.

Spelling has a high standard deviation relative to the percentage of misspelled words: 0.42%. Each data point represents the quotient of the number of misspelled words in a day and the number of words in a day. On average, there are around 75-100 stories per day. With so many words making it into a single data point, it's surprising that the points were often 0.5% apart, and more than 1% apart at times. It took a 30 day rolling average to smooth out most of the bumps; that's around 2000 articles. Part of this is just due to the scale of the chart; normally half a percent isn't noticeable, but this chart has a maximum y value of 9%.
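For readers who want to reproduce the smoothing, here is a rough sketch of the two steps described above: the daily quotient and a trailing rolling average. The function names are made up for illustration, and averaging over however many days are available at the start of the series is an assumption; the post doesn't say how the first 29 days were handled.

```c
#include <stddef.h>

/* One data point: misspelled words as a percentage of all words that day. */
double misspelling_rate(unsigned long misspelled, unsigned long total_words)
{
    if (total_words == 0)
        return 0.0;   /* no comments that day */
    return 100.0 * (double)misspelled / (double)total_words;
}

/* Trailing rolling average: out[i] is the mean of the last `window`
   daily values ending at day i (fewer at the start of the series). */
void rolling_average(const double *daily, size_t n, size_t window, double *out)
{
    double sum = 0.0;

    if (window == 0)
        return;       /* nothing sensible to compute */

    for (size_t i = 0; i < n; i++) {
        sum += daily[i];
        if (i >= window)
            sum -= daily[i - window];   /* drop the day that left the window */
        size_t count = (i + 1 < window) ? i + 1 : window;
        out[i] = sum / (double)count;
    }
}
```

Calling rolling_average with window set to 30 on the daily rates gives the kind of smoothed series described above.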

Caveat Legens:
Post hoc ergo propter hoc is a logical fallacy. This blog post uses it. You can't draw any solid conclusions from this graph. This is just evidence that supports a hypothesis.

Life without a printer: How I escaped the rising cost of printer ink

One of the things I got for college was a printer. When you make the checklist of things, it's around the top; what good is your computer alone when your term paper needs to be printed? I used it a few times, but never all that often. After storing it over the summer, the ink dried out. I stored it two more summers, but I wasn't about to spend $45 at the campus book store for ink. I finally sold it on eBay; I got about $40 for a three-year-old printer (without ink) that originally sold for $100. As a college student, $40 is a big deal.

That was my last printer.

It wasn't my last because I had no more printing to do; it was my last because I used cheaper printers. The college I went to has many large laser printers (even a few color laser printers) throughout campus, and even one in my dorm. I learned where they were, how to print to them, and how reliable they were. At times, I found myself in a lecture finishing papers 20 minutes before they were due. Before I left, I printed the essay and picked it up on the way to class.

Several changes in technology have made the printer less of a necessity.
  • Network ubiquity
  • Network speed
  • PDF ubiquity
Networks used to be too slow and too rare to replace a printer; most people didn't have the ability to send a document anywhere, and even if they could, it might not have been in a format the recipient could read.

Following these advances, it has become possible to outsource printer needs. Companies that do a large amount of printing can afford printers that are faster, higher quality, and use cheaper ink.

Here are some tips to survive in a somewhat more paperless world:
  • Try to cut back on printing. If you need to store a copy of something, digital copies take less space, can be secured, and can be stored on a server that keeps redundant copies. Send emails when you can. That said, I think most people reading this already do those things.
  • Print documents at work. This one is a little ethically questionable, but it will save you money.
  • For documents, Kinkos (and possibly other companies) offers a service that allows you to print to one of their printers from home.
  • Need to send a letter to someone? USPS offers a service called NetPost that will mail a document you supply.
  • For pictures, there are many companies out there that will print your photos. If you print any pictures, this tip will probably save you the most money. Between the amount of ink it takes and the cost of photo paper, a company that specializes in printing photos can commoditize the process and save you money.
  • Think about how often you print, and if you can even justify owning a printer. If you think the prices at Kinkos are high, consider how many pages you'd have to print to pay for that $60 HP inkjet. They charge around $0.10 per page, but that also includes the paper. If the life of a printer is 3 years, you'd have to print 200 pages per year to break even, and that assumes the half filled cartridge the printer came with lasts.
  • If you find you really need a printer, opt for a black and white laser printer. Since you really need it, you undoubtedly print enough to justify a $200 (at least) printer. While it isn't completely true that the per page cost of laser printers is lower than inkjets, it generally is.
Basically, printer manufacturers make money one way: take a loss on the printer, make money on the ink. You lose money two ways: you have to buy the ink, but you also have to buy the printer.
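The break-even arithmetic from the tips above can be sketched in a few lines. The function name is made up for illustration, and like the estimate in the list it assumes ink and paper for the home printer cost nothing extra.

```c
/* Pages per year at which buying a printer costs the same as paying a
   print shop per page. Ignores the home printer's ink and paper. */
double break_even_pages_per_year(double printer_cost,
                                 double shop_price_per_page,
                                 double printer_life_years)
{
    if (shop_price_per_page <= 0.0 || printer_life_years <= 0.0)
        return 0.0;   /* no meaningful answer for these inputs */
    return printer_cost / shop_price_per_page / printer_life_years;
}
```

With a $60 printer, $0.10 per page, and a 3 year life, break_even_pages_per_year(60.0, 0.10, 3.0) comes out to about 200 pages per year, the figure used above. Once you add the cost of replacement cartridges, the break-even point only moves higher.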

Friday, March 16, 2007

Funny Windows Error


Umm, terminate Windows?!

Needless to say, Dr. Watson came to visit, but Dr. Reset had to fix the problem.

Thursday, March 15, 2007

Ignorance Sticker

Walking to work a few days ago, I saw a car with this bumper sticker:


First off, I'm against making a statement with a car other than "look at my car." Putting something potentially offensive on something worth $20,000 is just stupid.

Medium aside, I applaud both the car owner and the sticker's creator for addressing one of Bush's policies, not treating him as the embodiment of what liberals stand against--a catalyst unifying liberals (I'm looking at you, Rock Against Bush/Punk Voter).

Onto my actual point: ignoring the fact that all spending bills originate from the House of Representatives, the federal government has absolutely no duty to give money to students. According to this government web site,
The U.S. Constitution leaves the responsibility for public K-12 education with the states.
a statement echoing the Tenth Amendment:
The powers not delegated to the United States by the Constitution, nor prohibited by it to the States, are reserved to the States respectively, or to the people.

Lesson of the day: educate yourself on the issues before publicly presenting them. Yes, schools get federal funding, but it isn't required; all the responsibility lies with the states.

It always bugs me that newspapers and organizations endorse candidates (and in the case of Punk Voter, protest one). People should educate themselves on the candidates and issues, then decide for themselves. If you aren't informed on either the topics or the candidates, don't vote for or against them (strange things can happen when a park ranger runs against an "educator" for the school board), and don't display a bumper sticker that advertises not only your political beliefs, but also just how informed you really are.

Saturday, March 10, 2007

Why the Democrats Won't Impeach Bush

First, while the Democrats took control of Congress after the 2006 midterm elections, they took control with a less than confident majority of votes. In the House, the Republicans lost 3.6% of the popular vote--not exactly a landslide. Impeachment polls aren't all that common; Googling bush impeach poll doesn't yield any recent mainstream media polls on the first page (contrast this with approval ratings). After the drawn-out non-binding resolution to not support the action in Iraq, the current Congress has shown that it lacks the political dedication and courage (balls, cojones, if you will) to carry through with meaningful action. Between their lack of gravitas and limited support, the second to last thing they will do is impeach Bush (the last being removing him from office).

The second reason is half comedic. Bush is a political paradox. The people who are so quick to call him an idiot are the same ones who claim he's the mastermind of everything going wrong. Only Bush can not like black people, have black cabinet members, provide little immediate federal response to the attack on the WTC and be praised, provide more support to New Orleans and be cursed. When praising Bush is appropriate, he is easy to praise. When things go wrong, he's easy to blame. When jokes are needed, Bush is an idiot. When conspiracies and scandals emerge, Bush is the villain behind them. The Democrats can't impeach Bush because they could no longer place political blame so well on a single person. Back to the first reason, while Bush's ideas might not be the best, the Democrats have even less confidence in their ideas, and provided Bush is in power, when something doesn't go well, he can take the blame.

The Product that Limits its own Sales: Condoms

Sorry this is a short post, but there isn't that much explaining to do.

The problem with selling condoms is simple: you're selling a product that explicitly tries to limit its future market. Sure, the condom manufacturers are selling something associated with something very fun and very cool, but not even tobacco companies (...they're cool, available, and *addictive*. The job is almost done for us.) have this problem; by the time you might die, your kids are already hooked.

So condom companies have a dilemma. On one hand, selling a defective product would help future sales. On the other, selling a defective product labels your company's product as inferior and hurts current sales, and the entire plan fails.

I wonder if some actuaries figured out the ideal effectiveness: effective enough for people to not complain, but ineffective enough to boost future sales.

Thursday, March 8, 2007

Top 25 Words Used in Digg Titles

Ever wonder what the most frequently used words in Digg titles were? Well, wonder no more. Here are the results from the front page articles in the past year (March 6, 2006 - March 6, 2007).

This list is slightly edited. Words like "the" and "a" aren't that interesting, but they're included in the unedited list below the top 25 for completeness.
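The post doesn't show how the counts were produced, so the following is only a sketch of one way to tally title words. It assumes words are runs of letters and digits compared case-insensitively (the unedited list is lowercase and includes entries like "2" and "360"), and every name and size limit in it is made up for illustration.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 1024   /* illustrative capacity, not a real limit */
#define MAX_LEN   32     /* longer words are truncated */

struct word_count {
    char word[MAX_LEN];
    unsigned count;
};

static struct word_count table[MAX_WORDS];
static size_t table_size;

/* Bump the count for `word`, adding it if it hasn't been seen yet. */
static void add_word(const char *word)
{
    for (size_t i = 0; i < table_size; i++) {
        if (strcmp(table[i].word, word) == 0) {
            table[i].count++;
            return;
        }
    }
    if (table_size < MAX_WORDS) {
        strcpy(table[table_size].word, word);   /* word is shorter than MAX_LEN */
        table[table_size].count = 1;
        table_size++;
    }
}

/* Split one title into lowercase alphanumeric words and count each. */
void count_title(const char *title)
{
    char word[MAX_LEN];
    size_t len = 0;

    for (const char *p = title; ; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalnum(c)) {
            if (len < MAX_LEN - 1)
                word[len++] = (char)tolower(c);
        } else {
            if (len > 0) {          /* a word just ended */
                word[len] = '\0';
                add_word(word);
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
}

/* Order by count, highest first; ties are broken alphabetically. */
static int by_count_desc(const void *a, const void *b)
{
    const struct word_count *x = a, *y = b;
    if (x->count != y->count)
        return (x->count < y->count) ? 1 : -1;
    return strcmp(x->word, y->word);
}

void rank_words(void)
{
    qsort(table, table_size, sizeof table[0], by_count_desc);
}

/* Entry at a zero-based rank after rank_words(), or NULL past the end. */
const struct word_count *ranked_word(size_t rank)
{
    return rank < table_size ? &table[rank] : NULL;
}
```

Feed every title through count_title, call rank_words once, and read the results with ranked_word. The linear search in add_word is fine for a sketch but would be slow for a year of titles; a hash table would be the usual choice there.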

The top 25 [noteworthy] Digg title words

Rank Word Occurrences
1 new 1923
2 2 1725
3 how 1324
4 Wii 1201
5 Google 983
6 video 897
7 Apple 893
8 Linux 779
9 Microsoft 685
10 year 623
11 free 621
12 Mac 616
13 world 611
14 top 597
15 game 541
16 PS3 537
17 iTunes 522
18 Windows 519
19 Nintendo 488
20 launch 473
21 Digg 464
22 first 460
23 web 459
24 iPod 454
25 pictures 448

Unedited top 100 Digg title words

Rank Word Occurrences
1 the 6900
2 to 6289
3 of 4345
4 a 3988
5 in 3576
6 for 3321
7 on 2538
8 and 2064
9 new 1923
10 i 1922
11 2 1725
12 is 1631
13 with 1572
14 your 1434
15 how 1324
16 wii 1201
17 you 1096
18 it 1030
19 from 986
20 google 983
21 video 897
22 apple 893
23 at 841
24 linux 779
25 3 731
26 by 719
27 be 693
28 microsoft 685
29 10 676
30 get 661
31 year 623
32 free 621
33 mac 616
34 world 611
35 not 606
36 up 604
37 top 597
38 do 587
39 1 568
40 us 542
41 game 541
42 ps3 537
43 iTunes 522
44 can 521
45 what 521
46 windows 519
47 will 507
48 an 502
49 out 499
50 why 490
51 nintendo 488
52 that 488
53 are 477
54 launch 473
55 digg 464
56 first 460
57 web 459
58 ipod 454
59 picture 448
60 more 445
61 about 442
62 bush 439
63 photo 439
64 war 433
65 5 432
66 over 432
67 say 432
68 now 426
69 all 424
70 as 418
71 go 407
72 xbox 403
73 vista 391
74 games 390
75 360 389
76 released 387
77 make 377
78 no 375
79 best 366
80 time 365
81 open 360
82 this 360
83 have 341
84 online 340
85 may 327
86 most 323
87 sony 323
88 internet 318
89 ever 317
90 computer 316
91 x 314
92 way 312
93 man 311
94 one 305
95 4 301
96 firefox 301
97 into 293
98 os 293
99 live 291
100 after 290
98 has 288
99 tv 288
100 pc 286

Notable entries just missing the list

102 ubuntu 285
103 million 282
106 source 276
108 iraq 273
112 itunes 261
114 youtube 259