## Saturday, March 31, 2007

### What reddit ads are going to look like: a sidebar hosted by doubleclick.net

Digging through some js, I noticed a page that reddit is going to use for ads: http://static.reddit.com/ad.html.

It isn't anything special, just an ad for Wired hosted by doubleclick.net.

Sorry for linkjacking, but this seems to be a page reddit isn't excited about users knowing about.

## Wednesday, March 28, 2007

### One of the hardest bugs to catch in C: Integer overflow

Here's some simple C code that implements an array that can be accessed through a few functions. (There are some casts I left out for clarity. They should be (unsigned char *).)

    static void **array, **end;

    int allocate (unsigned int size) ...
    int deallocate () ...

    int write (unsigned int index, void *data)
    {
        if (array + index < end) {
            array[index] = data;
            return 1;
        } else {
            return 0;
        }
    }

It does what it should: it makes a bounds check before writing to memory, and it returns an error when the index is out of bounds.

Here's the bug: what if the index plus the beginning of the array overflows? The sum will wrap around, resulting in a modulo-equivalent value that's before the beginning of the array, so the check passes even though the index is far out of bounds.

This type of bug is called an integer overflow. When writing code, sometimes the most legible order for expressions isn't the safest. The best way to prevent bugs like this is to write it out as it is above, then ask yourself what you know about the values of those numbers, and how you can ensure the result will be within a certain range. Since end will always be greater than or equal to array, using simple algebra, you can reorder the inequality to read (index < end - array). It might not be as easy to read, but since the end of the array is always after the beginning, i.e. greater, the difference can't overflow, so this code, which is algebraically equivalent, won't have the same vulnerability.
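Here is a minimal sketch of the reordered check. The bodies of allocate and deallocate are my own guesses, since the example above elides them, and I renamed write to array_write (a hypothetical name) so it doesn't clash with the POSIX function. I also used size_t instead of unsigned int for the index.

```c
#include <stddef.h>
#include <stdlib.h>

/* Same layout as the example above: a start pointer and an end pointer. */
static void **array = NULL, **end = NULL;

/* Hypothetical body: the original elides it. Returns 1 on success, 0 on failure. */
int allocate(size_t size)
{
    if (size == 0)
        return 0;
    array = calloc(size, sizeof *array);
    if (array == NULL)
        return 0;
    end = array + size;
    return 1;
}

/* Hypothetical body: releases the storage and resets both pointers. */
void deallocate(void)
{
    free(array);
    array = end = NULL;
}

/* Bounds check rewritten as (index < end - array).
 * end >= array always holds, so the subtraction cannot wrap around,
 * no matter how large index is. */
int array_write(size_t index, void *data)
{
    if (array != NULL && index < (size_t)(end - array)) {
        array[index] = data;
        return 1;
    }
    return 0;
}
```

With four elements allocated, array_write(3, p) succeeds, while array_write(4, p) and array_write((size_t)-1, p) both return 0 without touching memory.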

For those who say this example is contrived, I have encountered real world code that requires logic like this. True, this example could just store size, but in some cases, perhaps where the elements in the array are of a dynamic size, the size can vary, so the end of the array is needed.

Remember those times in math when kids asked when they'd need to know about algebraic inequalities and modulo arithmetic? Yes, programming doesn't require lots of math, but it requires some. On one hand, writing C requires enough knowledge of computer architecture to know about integer overflows, and how they're not only good, they're needed, while on the other, it requires the mathematical background to understand and fix these problems.

## Sunday, March 25, 2007

### Damn you Firefox 2 Spellchecker!!

The Firefox spellchecker's nice most of the time, but its suggestion algorithm isn't as good as the Microsoft one, and it strangely thinks some of the words I use aren't really words.

• Entendre
• Stoners (but not stoner)
• Advisor (but not adviser, both are correct, -or is the correct Latin form)
• Indices
• Fiance
• Fiancee
• jure/iure (but not facto)
• de (but not facto)
• omelette (but not omelet)
• doughnut
• millennia
• cultivars

That's it for now, but this list seems to be ever growing, so I'm sure there'll be a Part II.

## Tuesday, March 20, 2007

### If you thought the Verizon .01 cents was bad, this is worse

Foreword: this took place prior to y2k7, and I'm sorry, I wish I had recorded this, but I didn't.

One of the best ways to stop text message spam is to call your cell provider and get those \$0.20 back. It costs them far more than \$0.20 to give you the money back, and it encourages them to prosecute the spammers and have better spam filters.

I called Sprint, my cell provider, to get this fixed. Since my account was new, they offered to give me a free month of data access. I went for it, and once I hung up, I proceeded to start playing with my new data plan.

When I got my next month's bill, it was higher than I expected it to be, about \$50 higher. I called up their support line. When I asked, they told me I used the data plan at around 12:30 on Saturday (I forgot the day). I said that's not right, I called on Friday. Well, it turns out that they recorded the time in central standard time. I then asked when I used my data plan and was told 11:35 on Friday. I said I'm sure I used it after my phone call. I had to return a call to get a \$0.20 credit, and I created a text file with the information I needed. Interestingly, the timestamp was around 9:30.

Now the fun part: I asked them what time they're on (CST) and what time it was. The time they gave me was 3 hours from me (PST); they gave me EST. I spent 10 minutes arguing with them about what time it was. I pointed them to time.gov. Java wasn't working for them. I kept trying to tell them, and they kept saying it was the wrong time. I asked where they were based, and I was told the Philippines. I told them that what happened is that they had two systems running on different time zones, and that was the reason for the mistake, and why should I pay when their system clearly has at least one time mistake.

There was some good news that came out of this. In contrast to the Verizon case, with a single call and without escalating the case (I only talked to one rep.), I was credited on my next bill for the data.

Clearly, outsourcing has problems like this. Screwing up time zones is easy enough, but there's a good chance that whoever coded it was never in at least one of the relevant time zones. This is where good customer service comes in. While this case is as outrageous as the Verizon incident, I had it resolved within half an hour.

## Monday, March 19, 2007

### Two small values related to global warming

Fact 1: If all the oil consumed were combusted, the resulting water would raise the sea level about 1 mm.

Fact 2: In burning oil, approximately 3/10,000th of the oxygen in Earth's atmosphere has been consumed. That's around 75 ppm. This number was found from the number of barrels of oil consumed. We have seen the CO2 ppm increase by about 100 since 1970 (about 15% of oil consumption occurred before 1970).

Corollary: Neither of these statistics accounts for a significant portion of either the increasing CO2 in the atmosphere or the rising sea level.

Corollary: based on the measured increase in CO2 and the calculated decrease in oxygen due to combustion of oil, 32% of the CO2 increase can be attributed to burning oil (not including coal).

Basically, these serve as a reminder of just how big the Earth is, but also that the increase in CO2 in the air is very likely due to the consumption of oil and coal.

Caveat Legens:
I made a large number of simplifications. I assure you that the general idea is right; the fact that the theoretical value was near the measured value suggests that. Just don't believe this fully; it should just give you an approximate range for the actual values.

Sources:
(I had to do the math on my own. It was a combination of aspects of calculus, statistics, and chemistry. Feel free to check it for me. I often made simplifying estimates, like the oil consumed in the past 10 years and the oil consumed before 1970.)

We have consumed around 1.1 trillion barrels of oil since 1900.
http://www.gravmag.com/oil.html, http://en.wikipedia.org/wiki/World_economy

By mass, around 1/7 of oil is hydrogen. This neglects alcohols and double bonds. For methane, it's 1/4 (methane has the most hydrogen by mass of all hydrocarbons)

What's in a barrel of oil?
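For anyone who wants to check Fact 1, here is a rough sketch of the arithmetic. The barrel count and the 1/7 hydrogen fraction come from the notes above; the crude oil density and the ocean surface area are my own assumed round numbers, and the function name is made up for illustration.

```c
/* Rough estimate, in millimetres, of the sea-level rise from the water
 * produced by burning a given number of barrels of oil.
 * Assumes all of that water ends up in the ocean. */
double sea_level_rise_mm(double barrels)
{
    const double litres_per_barrel = 158.987;   /* definition of the oil barrel */
    const double oil_density_kg_per_l = 0.85;   /* ASSUMPTION: typical crude oil */
    const double hydrogen_fraction = 1.0 / 7.0; /* from the notes above */
    const double water_per_hydrogen = 9.0;      /* 18 g of H2O per 2 g of H */
    const double water_density_kg_per_m3 = 1000.0;
    const double ocean_area_m2 = 3.61e14;       /* ASSUMPTION: approximate ocean surface */

    double oil_kg = barrels * litres_per_barrel * oil_density_kg_per_l;
    double water_kg = oil_kg * hydrogen_fraction * water_per_hydrogen;
    double water_m3 = water_kg / water_density_kg_per_m3;

    return water_m3 / ocean_area_m2 * 1000.0;   /* metres to millimetres */
}
```

With 1.1 trillion barrels, these assumptions give roughly half a millimetre, which is the same order of magnitude as the figure in Fact 1. Different density or area estimates will move the result around, so treat it as a range check and not a measurement.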

## Sunday, March 18, 2007

### 5 months after Firefox 2 was released, misspellings on Digg are down 10%

It's amazing when the effects of a single software release can be seen so clearly. Firefox 2 added a spell check feature that resembles MS Word's, underlining words in text input boxes that are misspelled. As 65% of the Digg community use Firefox, it shouldn't be a surprise to see an improvement in spelling.

This graph tracks the misspellings on the first page of comments on front page stories according to the dictionary provided by aspell (with the exception of the word "digg", which was ignored). Approximately 30,000 articles and 4 GB of comments were processed to create this graph.

Some observations:
The decline leading up to August might be a result of users using a Firefox beta, but I doubt there would be enough early adopters to cause such a decline.

The increase in misspelled words seems to loosely correlate with the growth of Digg as graphed by Alexa. A Reddit user noted that Digg, and to a lesser extent, Reddit, are entering the Eternal September; interestingly enough, spelling suffers in September. A blogger commented that the Digg demographic consists of CS dropouts. I'm sure it has more variety than that, but the general consensus is that Digg has a large number of college students, a fact supported by this poll, all of which lends some credence to the Eternal September hypothesis.

Spelling has a high standard deviation relative to percentage of misspelled words: 0.42%. Each data point represents the quotient of the number of misspelled words in a day and the number of words in a day. On average, there are around 75-100 stories per day. With so many words making it into a single data point, it's surprising that the points were often 0.5% apart, and more than 1% apart at times. It took a 30 day rolling average to smooth out most of the bumps; that's around 2000 articles. Part of this is just due to the scale of the chart; normally half a percent isn't noticeable, but this chart has a maximum y value of 9%.
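The two calculations described above can be sketched roughly like this. The function names are hypothetical, and I'm assuming a trailing window (each point averages the days up to and including it); the post doesn't say whether the window was trailing or centered.

```c
#include <stddef.h>

/* Daily misspelling rate in percent: misspelled words divided by all words.
 * Returns 0 for a day with no words at all. */
double misspelling_rate(unsigned long misspelled, unsigned long total)
{
    return total == 0 ? 0.0 : 100.0 * (double)misspelled / (double)total;
}

/* Trailing rolling average (ASSUMPTION: trailing, not centered).
 * out[i] is the mean of the last `window` values ending at in[i];
 * the first few points average over however many days exist so far. */
void rolling_average(const double *in, double *out, size_t n, size_t window)
{
    double sum = 0.0;

    if (window == 0)
        return;
    for (size_t i = 0; i < n; i++) {
        sum += in[i];
        if (i >= window)
            sum -= in[i - window];   /* drop the value that left the window */
        size_t count = (i + 1 < window) ? i + 1 : window;
        out[i] = sum / (double)count;
    }
}
```

Feeding the daily rates through rolling_average with a window of 30 would give one smoothed point per day, each covering roughly 2000 articles at the story counts mentioned above.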

Caveat Legens:
Post hoc ergo propter hoc is a logical fallacy. This blog post uses it. You can't draw any solid conclusions from this graph. This is just evidence that supports a hypothesis.

### Life without a printer: How I escaped the rising cost of printer ink

One of the things I got for college was a printer. When you make the checklist of things, it's around the top; what good is your computer alone when your term paper needs to be printed? I used it a few times, but never all that often. After storing it over the summer, the ink dried out. I stored it two more summers, but I wasn't about to spend \$45 at the campus book store for ink. I finally sold it on eBay; I got about \$40 for a three-year-old printer (without ink) that originally sold for \$100. As a college student, \$40 is a big deal.

That was my last printer.

It wasn't my last because I had no more printing to do; it was my last because I used cheaper printers. The college I went to has many large laser printers (even a few color laser printers) throughout campus, and even one in my dorm. I learned where they were, how to print to them, and their reliability. At times, I found myself in a lecture finishing papers 20 minutes before they were due. Before I left, I printed the essay and picked it up on the way to class.

Several changes in technology have made the printer less of a necessity.
• Network ubiquity
• Network speed
• PDF ubiquity
Networks used to be too slow and too rare to get by without a printer; most people didn't have the ability to send a document anywhere, and even if they could, it might not have been in a format the recipient could read.

Following these advances, it has become possible to outsource printer needs. Companies that do a large amount of printing can afford printers that are faster, higher quality, and use cheaper ink.

Here are some tips to survive in a somewhat more paperless world:
• Try to cut back on printing. If you need to store a copy of something, digital copies take less space, can be secured, and can be stored on a server that keeps redundant copies. Send emails when you can. That said, I think most people reading this already do those things.
• Print documents at work. This one is a little ethically questionable, but it will save you money.
• For documents, Kinkos (and possibly other companies) offers a service that lets you print to one of their printers from home.
• Need to send a letter to someone? USPS offers a service called NetPost that will mail a document you supply.
• For pictures, there are many companies out there that will print your photos. If you print any pictures, this tip will probably save you the most money. Between the amount of ink it takes and the cost of photo paper, a company that specializes in printing photos can commoditize the process and save you money.
• Think about how often you print, and if you can even justify owning a printer. If you think the prices at Kinkos are high, consider how many pages you'd have to print to pay for that \$60 HP inkjet. They charge around \$0.10 per page, but that also includes the paper. If the life of a printer is 3 years, you'd have to print 200 pages per year to break even, and that assumes the half filled cartridge the printer came with lasts.
• If you find you really need a printer, opt for a black and white laser printer. Since you really need it, you undoubtedly print enough to justify a \$200 (at least) printer. While it isn't completely true that the per page cost of laser printers is lower than inkjets, it generally is.
Basically, printer manufacturers make money one way: take a loss on the printer, make money on the ink. You lose money two ways: you have to buy the ink, but you also have to buy the printer.
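The break-even figure in the tips above is easy to recompute for your own numbers. This is only a sketch with a made-up function name, and like the estimate above it ignores the cost of ink and paper for the printer you own.

```c
/* Pages per year at which buying a printer costs the same as paying a
 * print shop per page. Ignores ink and paper for the owned printer,
 * so the real break-even point is higher.
 * Returns 0 if the inputs don't make sense. */
double break_even_pages_per_year(double printer_cost,
                                 double per_page_price,
                                 double lifetime_years)
{
    if (per_page_price <= 0.0 || lifetime_years <= 0.0)
        return 0.0;
    return printer_cost / per_page_price / lifetime_years;
}
```

Plugging in the \$60 printer, \$0.10 per page, and a 3 year life gives about 200 pages per year, matching the number above.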

## Friday, March 16, 2007

### Funny Windows Error

Umm, terminate Windows?!

Needless to say, Dr. Watson came to visit, but Dr. Reset had to fix the problem.

## Thursday, March 15, 2007

### Ignorance Sticker

Walking to work a few days ago, I saw a car with this bumper sticker:

First off, I'm against making a statement with a car other than "look at my car." Putting something potentially offensive on something worth \$20,000 is just stupid.

Medium aside, I applaud both the car owner and the sticker's creator for addressing one of Bush's policies, not treating him as the embodiment of what liberals stand against--a catalyst unifying liberals (I'm looking at you, Rock Against Bush/Punk Voter).

Onto my actual point: ignoring the fact that all spending bills originate from the House of Representatives, the federal government has absolutely no duty to give money to students. According to this government web site,
The U.S. Constitution leaves the responsibility for public K-12 education with the states.
a statement echoing the Tenth Amendment:
The powers not delegated to the United States by the Constitution, nor prohibited by it to the States, are reserved to the States respectively, or to the people.

Lesson of the day: educate yourself on the issues before publicly presenting them. Yes, schools get federal funding, but it isn't required; all the responsibility lies with the states.

It always bugs me that newspapers and organizations endorse candidates (and in the case of Punk Voter, protest one). People should educate themselves on the candidates and issues, then decide for themselves. If you aren't informed on either the topics or the candidates, don't vote for or against them (strange things can happen when a park ranger runs against an "educator" for the school board), and don't display a bumper sticker that advertises not only your political beliefs, but also just how informed you really are.

## Saturday, March 10, 2007

### Why the Democrats Won't Impeach Bush

First, while the Democrats took control of Congress after the 2006 midterm elections, they took control with a less than confident majority of votes. In the House, the Republicans lost 3.6% of the popular vote--not exactly a landslide. Impeachment polls aren't all that common; Googling bush impeach poll doesn't yield any sites of recent mainstream media polls on the first page (contrast this to approval ratings). After the drawn out non-binding resolution to not support the action in Iraq, the current Congress has shown that it lacks the political dedication and courage (balls, cojones, if you will) to carry through with meaningful action. Between their lack of gravitas and limited support, the second to last thing they will do is impeach Bush (the last being removing him from office).

The second reason is half comedic. Bush is a political paradox. The same people who are so quick to call him an idiot are the same who claim he's the mastermind of everything going wrong. Only Bush can not like black people, have black cabinet members, provide little immediate federal response to the attack on the WTC and be praised, provide more support to New Orleans and be cursed. When praising Bush is appropriate, he is easy to praise. When things go wrong, he's easy to blame. When jokes are needed, Bush is an idiot. When conspiracies and scandals emerge, Bush is the villain behind them. The Democrats can't impeach Bush because they could no longer place political blame so well on a single person. Back to the first reason, while Bush's ideas might not be the best, the Democrats have even less confidence in their ideas, and provided Bush is in power, when something doesn't go well, he can take the blame.

### The Product that Limits its own Sales: Condoms

Sorry this is a short post, but there isn't that much explaining to do.

The problem with selling condoms is simple: you're selling a product that explicitly tries to limit its future market. Sure, the condom manufacturers are selling something associated with something very fun and very cool, but not even tobacco companies (...they're cool, available, and *addictive*. The job is almost done for us.) have this problem; by the time you might die, your kids are already hooked.

So condom companies have a dilemma. On one hand, selling a defective product would help future sales. On the other, selling a defective product labels your company's product as inferior and hurts current sales, and the entire plan fails.

I wonder if some actuaries figured out the ideal effectiveness: effective enough for people to not complain, but ineffective enough to boost future sales.

## Thursday, March 8, 2007

### Top 25 Words Used in Digg Titles

Ever wonder what the most frequently used words in Digg titles were? Well, wonder no more. Here are the results from the front page articles in the past year (March 6, 2006 - March 6, 2007).

The list is slightly edited. Words like "the" and "a" aren't that interesting, but they're included in the unedited list below the top 25 for completeness.
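For the curious, counting title words takes very little code. The sketch below is not the program used for these results; it is a simplified illustration with made-up names, a small fixed-size table, and a linear search, and it assumes words are split on anything that isn't a letter or digit and compared case-insensitively.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 1024  /* ASSUMPTION: small fixed table, enough for a demo */
#define MAX_LEN   32    /* longer words are truncated */

struct word_count {
    char word[MAX_LEN];
    unsigned count;
};

static struct word_count table[MAX_WORDS];
static size_t table_size = 0;

/* Record one occurrence of an already lowercased word. */
static void add_word(const char *word)
{
    for (size_t i = 0; i < table_size; i++) {
        if (strcmp(table[i].word, word) == 0) {
            table[i].count++;
            return;
        }
    }
    if (table_size < MAX_WORDS) {
        strcpy(table[table_size].word, word);  /* caller keeps it under MAX_LEN */
        table[table_size].count = 1;
        table_size++;
    }
}

/* Split a title on non-alphanumeric characters and count each word.
 * Simplification: "it's" is counted as "it" and "s". */
void count_title(const char *title)
{
    char buf[MAX_LEN];
    size_t len = 0;

    for (const char *p = title; ; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalnum(c)) {
            if (len < MAX_LEN - 1)
                buf[len++] = (char)tolower(c);
        } else {
            if (len > 0) {
                buf[len] = '\0';
                add_word(buf);
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
}

/* Look up how many times a lowercased word has been seen. */
unsigned count_of(const char *word)
{
    for (size_t i = 0; i < table_size; i++)
        if (strcmp(table[i].word, word) == 0)
            return table[i].count;
    return 0;
}

/* Order by count, highest first; ties are broken alphabetically. */
static int by_count_desc(const void *a, const void *b)
{
    const struct word_count *x = a, *y = b;
    if (x->count != y->count)
        return (x->count < y->count) ? 1 : -1;
    return strcmp(x->word, y->word);
}

void sort_counts(void)
{
    qsort(table, table_size, sizeof table[0], by_count_desc);
}
```

Call count_title once per title, then sort_counts; the first 25 or 100 entries of table are the ranking. A real run over a year of titles would want a hash table instead of the linear search.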

The top 25 [noteworthy] Digg title words

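| Rank | Word | Occurrences |
|------|------|-------------|
| 1 | new | 1923 |
| 2 | 2 | 1725 |
| 3 | how | 1324 |
| 4 | Wii | 1201 |
| 5 | Google | 983 |
| 6 | video | 897 |
| 7 | Apple | 893 |
| 8 | Linux | 779 |
| 9 | Microsoft | 685 |
| 10 | year | 623 |
| 11 | free | 621 |
| 12 | Mac | 616 |
| 13 | world | 611 |
| 14 | top | 597 |
| 15 | game | 541 |
| 16 | PS3 | 537 |
| 17 | iTunes | 522 |
| 18 | Windows | 519 |
| 19 | Nintendo | 488 |
| 20 | launch | 473 |
| 21 | Digg | 464 |
| 22 | first | 460 |
| 23 | web | 459 |
| 24 | iPod | 454 |
| 25 | pictures | 448 |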

Unedited top 100 Digg title words

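| Rank | Word | Occurrences |
|------|------|-------------|
| 1 | the | 6900 |
| 2 | to | 6289 |
| 3 | of | 4345 |
| 4 | a | 3988 |
| 5 | in | 3576 |
| 6 | for | 3321 |
| 7 | on | 2538 |
| 8 | and | 2064 |
| 9 | new | 1923 |
| 10 | i | 1922 |
| 11 | 2 | 1725 |
| 12 | is | 1631 |
| 13 | with | 1572 |
| 14 | your | 1434 |
| 15 | how | 1324 |
| 16 | wii | 1201 |
| 17 | you | 1096 |
| 18 | it | 1030 |
| 19 | from | 986 |
| 20 | google | 983 |
| 21 | video | 897 |
| 22 | apple | 893 |
| 23 | at | 841 |
| 24 | linux | 779 |
| 25 | 3 | 731 |
| 26 | by | 719 |
| 27 | be | 693 |
| 28 | microsoft | 685 |
| 29 | 10 | 676 |
| 30 | get | 661 |
| 31 | year | 623 |
| 32 | free | 621 |
| 33 | mac | 616 |
| 34 | world | 611 |
| 35 | not | 606 |
| 36 | up | 604 |
| 37 | top | 597 |
| 38 | do | 587 |
| 39 | 1 | 568 |
| 40 | us | 542 |
| 41 | game | 541 |
| 42 | ps3 | 537 |
| 43 | iTunes | 522 |
| 44 | can | 521 |
| 45 | what | 521 |
| 46 | windows | 519 |
| 47 | will | 507 |
| 48 | an | 502 |
| 49 | out | 499 |
| 50 | why | 490 |
| 51 | nintendo | 488 |
| 52 | that | 488 |
| 53 | are | 477 |
| 54 | launch | 473 |
| 55 | digg | 464 |
| 56 | first | 460 |
| 57 | web | 459 |
| 58 | ipod | 454 |
| 59 | picture | 448 |
| 60 | more | 445 |
| 61 | about | 442 |
| 62 | bush | 439 |
| 63 | photo | 439 |
| 64 | war | 433 |
| 65 | 5 | 432 |
| 66 | over | 432 |
| 67 | say | 432 |
| 68 | now | 426 |
| 69 | all | 424 |
| 70 | as | 418 |
| 71 | go | 407 |
| 72 | xbox | 403 |
| 73 | vista | 391 |
| 74 | games | 390 |
| 75 | 360 | 389 |
| 76 | released | 387 |
| 77 | make | 377 |
| 78 | no | 375 |
| 79 | best | 366 |
| 80 | time | 365 |
| 81 | open | 360 |
| 82 | this | 360 |
| 83 | have | 341 |
| 84 | online | 340 |
| 85 | may | 327 |
| 86 | most | 323 |
| 87 | sony | 323 |
| 88 | internet | 318 |
| 89 | ever | 317 |
| 90 | computer | 316 |
| 91 | x | 314 |
| 92 | way | 312 |
| 93 | man | 311 |
| 94 | one | 305 |
| 95 | 4 | 301 |
| 96 | firefox | 301 |
| 97 | into | 293 |
| 98 | os | 293 |
| 99 | live | 291 |
| 100 | after | 290 |
| 98 | has | 288 |
| 99 | tv | 288 |
| 100 | pc | 286 |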

Notable entries just missing the list

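| Rank | Word | Occurrences |
|------|------|-------------|
| 102 | ubuntu | 285 |
| 103 | million | 282 |
| 106 | source | 276 |
| 108 | iraq | 273 |
| 112 | itunes | 261 |
| 114 | youtube | 259 |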