March 3, 2005
By Karen Kenworthy
I would have sworn we'd gotten together sometime during the last month. But when I settled down to write tonight, I realized we haven't talked since January 25th! Time sure flies.
What happened? Well, when a computer programmer disappears for a long time, there are only a few possible explanations:
1. A lengthier-than-usual alien abduction.
2. A lavish, extended vacation.
3. Several round-the-clock programming sessions.
Seriously, for a true nerd, there's only one possible explanation -- number three.
And so it was with me. I started last month working on a couple of long-range projects. These are the sorts of programming puzzles that take days, weeks, and even months to unravel. I turn to one of these every chance I get, hoping to finish them some day.
But then, a couple of weeks ago, an e-mail message from Michael Horowitz appeared in my in-box. Innocently, he said "I just wanted to let you know that I mentioned your URL Discombobulator program" in a new article posted on his personal web site.
The link he provided was intriguing:
And so was the article found there. "Links that Lie" is an excellent explanation of how spammers, and other ne'er-do-wells, disguise the addresses of their evil web sites, making them appear to be innocent, often well-known internet destinations.
We've talked before about how to defeat these tricks. In fact, my URL Discombobulator program was created just for that purpose. Give it a suspect URL, and it peels back layers of deception, exposing the actual Internet hideouts of these scoundrels.
I won't go into detail, here, about these tricks. You can read my previous comments on the subject in back issues of this newsletter. Look for their links on the URL Discombobulator's home page at:
And be sure to check out Michael's great article on the subject, at the link mentioned above.
Running In Circles
When you read Michael's article, you'll see he provides several examples of shrouded URLs -- links that deliberately use obscure web browser features to hide their true destinations. In all but one case, Michael shows how, with a little work or the help of a good program, you can unmask these deceptive links.
But one URL stumped Michael. Worse, it stumped the URL Discombobulator too. There was no doubt the URL was hiding something. But what? Clearly, the Discombobulator wasn't finished. And I had some work to do ...
[Alert: Portions of the explanation are rated N5 on the open-ended Nerdiness Scale. Reading this passage could cause temporary dizziness or disorientation, especially among individuals weakened by previous exposure to excessive Nerdiness levels. On the other hand, it could make you the life of your next party or family get-together.]
I won't show the actual naughty URL here (I don't want anyone to click the nefarious link and accidentally visit its site). But here are the first few characters of the link:
Notice how the reference to www.google.com is repeated? The weasel who devised this URL is taking advantage of a feature of the www.google.com site called "redirection". Google allows you to pass a new web address as part of a link. Place the new address after the sequence "url?q=", and Google will quietly redirect your web browser to that new site.
In the example shown above, the "new" site is the same as the old -- www.google.com. In fact, in the example Michael found, www.google.com appears three times, meaning a browser would take three quick trips to www.google.com. Not until the end of the third trip would Google finally redirect your browser to the sneaky URL's final destination.
As you've probably guessed, the new URL Discombobulator sees through these redirections, correctly identifying the final destination. In addition to redirection via Google.com's site, the program also recognizes redirections through Yahoo.com and Citibank.com. As a bonus, it even reports the number of times a dishonest URL will redirect your browser, before letting it finally come to rest.
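For the programmers in the audience, here is a minimal Python sketch of the peeling idea described above. It is not the Discombobulator's actual code, and it makes some loud simplifying assumptions: it only looks for the "url?q=" sequence, it never checks which site is doing the redirecting, and the function name unwrap_redirects and the example.com address are made up for illustration.

```python
REDIRECT_MARKER = "url?q="  # the sequence that precedes the new address


def unwrap_redirects(url, marker=REDIRECT_MARKER):
    """Return (final_destination, number_of_redirections).

    Hypothetical helper: peels one layer per loop, assuming everything
    after the first marker is the address the browser is sent to next.
    """
    hops = 0
    while marker in url:
        url = url.split(marker, 1)[1]
        hops += 1
    return url, hops


# Illustrative link only: three trips through Google, then a placeholder site.
chained = "http://www.google.com/url?q=" * 3 + "http://example.com/"
destination, hops = unwrap_redirects(chained)
print(destination, hops)
```

A sturdier version would confirm the host name before peeling each layer, so that an innocent address that merely contains "url?q=" is left alone.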
Breaking The Code
The Discombobulator had to learn one more trick to completely decode Michael's troublesome URL.
To understand, let's take a look at the first several characters of the link's final destination, after all its redirections have been removed:
Weird, isn't it?
At first glance, this looks like an old trick, called "hexadecimal encoding" (or "hex encoding"). This little-known browser feature allows any character to be replaced by a single percent sign ("%") followed by exactly two "hexadecimal digits" (any of the normal digits 0 through 9, plus the letters A through F).
For example, a link might contain the sequence "%4B". Your browser will translate this to a single letter "K", since 4B is the hexadecimal representation of the capital letter K.
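If you'd like to check that arithmetic yourself, here is a short Python illustration. It uses the standard library's urllib.parse.unquote function, which performs the same kind of translation a browser does. The variable names are just for this example.

```python
from urllib.parse import unquote

# Hexadecimal 4B is decimal 75, the ASCII code for capital K.
code = int("4B", 16)
letter = chr(code)

# Going the other way: build the hex-encoded form of a character.
encoded = "%" + format(ord("K"), "02X")

# One pass of decoding, as a browser would do it.
decoded = unquote("%4B")

print(code, letter, encoded, decoded)
```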
The Discombobulator has long understood this trick. It automatically translates hex encoded characters into their more familiar counterparts. The program even displays an "ASCII Code Table", revealing the hexadecimal equivalents of all common characters.
But the mystery portions of Michael's link aren't ordinary hexadecimal substitutions. In the fragment I showed a moment ago, there are three consecutive percent signs, not one. And they are followed by four hex digits, not two.
So, what do these odd strings mean, and how can they be decoded?
After a little head-scratching, the answer came to me. Remember, the original URL is redirected three times. As a result, its characters are processed by your browser three times. And each time through the browser, a few incremental changes are made.
For example, the first time around your browser correctly identifies three properly hex encoded characters. The first, "%33", is translated into the single character "3". The second, "%37", becomes the single character "7". And the final sequence, "%35", becomes the digit "5".
After these replacements, our original text looks like this:
Because of redirections, our browser now processes this resulting text. Once again, it makes three substitutions: "%34" becomes "4", "%74" becomes the lower-case letter "t", and the capital letter "T" replaces "%54".
Now, after this second round of replacements, our text is beginning to come into focus. Here's what we have now, ready for a third pass through our browser (thanks to the final redirection):
All that's left is one more substitution. This time, the sequence "%48" is replaced by its equivalent, the letter "H". Finally, the characters at the beginning of our URL look like this:
Allowing for a little juvenile license, mixing upper- and lower-case letters, it's exactly what we'd expect to see! Our discombobulated URL begins with the familiar "http://" prefix, indicating the link will take us to a web site.
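The three passes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Discombobulator's own method: the function names decode_once and decode_fully are hypothetical, and the sample fragment is one I built to match the substitutions described in this walk-through (three percent signs followed by four hex digits), not text copied from the actual link.

```python
import re

# A percent sign followed by exactly two hexadecimal digits.
HEX_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_once(text):
    """Replace each %XX sequence with its character, in a single pass."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_fully(text, max_passes=10):
    """Decode again and again until nothing changes.

    Returns the result of every pass, so the list's length is the
    number of layers of hex encoding that were peeled away.
    """
    stages = []
    for _ in range(max_passes):
        decoded = decode_once(text)
        if decoded == text:
            break
        stages.append(decoded)
        text = decoded
    return stages


# Assumed, illustrative fragment (not the real link).
fragment = "%%%3348%%374%%354"
stages = decode_fully(fragment)
for number, stage in enumerate(stages, start=1):
    print(number, stage)
# prints:
# 1 %%348%74%54
# 2 %48tT
# 3 HtT
```

The max_passes limit is a safety net, so a pathological input cannot keep the loop spinning. Note that chr() treats each hex pair as a single character, which is fine for plain ASCII text like this but does not handle multi-byte encodings.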
Naturally, the new URL Discombobulator can now see through this repetitive hex encoding masquerade. And there's more to say about the new version -- especially its new "WhoIs" button!
But that will have to wait until we meet again. Until then, why not drop by the URL Discombobulator's home page and give the new version a try? It's easy. Just follow this (already discombobulated) link:
As always, the program is free for personal/home use. If you're a programmer, you can download the updated Visual Basic source code too!
Better yet, get the latest version of every Power Tool on a brand-new, shiny CD. You'll even get three bonus Power Tools, not available anywhere else. The source code of every Power Tool, the text of every issue of my newsletter, and some of my articles written for Windows Magazine, are also included. And owning the CD grants you a special license to use all my Power Tools at work.
Best of all, buying a CD is the easiest way to support the KarenWare.com web site, Karen's Power Tools, and this newsletter! To find out more, visit:
Until we meet again, don't take any rides on alien spacecraft, unless you know the aliens really well. And if you see me on the 'net, be sure to wave and say "Hi!"