July 26, 2005

By Karen Kenworthy

IN THIS ISSUE

Whew! Up here in the northern hemisphere, summer has definitely arrived. Just outside the protective walls of the secluded Power Tools workshop, experts predict the temperature may reach 103 degrees Fahrenheit today!

Those experts should come inside, where it's always nice and cool. And if you're caught outside in this heat too, come on over. There's an icy pitcher of lemonade, and a cool plate of fresh sugar cookies, just waiting to be shared ...

Yum! More Hash!

A warm serving of corned beef hash may not be the most appetizing meal during the heat of the summer. But longtime readers may still remember Karen's Hasher fondly. It's a cool little program that computes something called a "hash value" -- a small number representing a bit of text or the data found in a file.

Hash values, or "hashes", are amazing numbers. They serve as digital fingerprints, uniquely identifying the data used to compute them.

If two files have the same hash value, the contents of the two files are almost certainly identical too. But if the contents of two files differ, even by just one bit among billions of billions, their hash values will be dramatically different.
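For the curious, here's a minimal Python sketch of that sensitivity. It isn't the Hasher's own code. It simply uses Python's standard hashlib module, and the sample text and helper names (md5_hex, differing_bits) are made up for illustration. It flips a single bit of a short message and counts how many bits of the two MD5 hashes differ.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 hash of data as 32 hexadecimal digits."""
    return hashlib.md5(data).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions where two equal-length hex hashes differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

original = b"There's an icy pitcher of lemonade"
# Flip only the lowest bit of the first byte ('T' becomes 'U').
altered = bytes([original[0] ^ 0x01]) + original[1:]

hash_original = md5_hex(original)
hash_altered = md5_hex(altered)

print(hash_original)
print(hash_altered)
print(differing_bits(hash_original, hash_altered), "of 128 bits differ")
```

With a well-behaved hash formula, the count usually lands somewhere near half of the 128 bits, even though only one bit of the input changed.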

Thanks to this exquisite sensitivity to differences, hashes are often used to protect the integrity of evidence. Detectives routinely use programs like my Hasher to compute the hash value of every computer file they uncover, before they examine the file further.

Later, during a trial, they can prove the file's contents haven't changed while in police custody. If the file's current hash value is the same as the one computed the day the file was discovered, no changes have been made.

You and I can use hashes to detect changes too. For example, ask my Hasher to compute the hash value of every file in your \Windows folder, and store the results in a disk file. Later, you can ask the Hasher to compute those hashes again, and compare the new hashes to the older values. If any hash value has changed, the content of the corresponding file has changed too.
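The same idea can be sketched in a few lines of Python. This is only an illustration, not how the Hasher itself works: the function names, the JSON results file, and the choice of MD5 are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path

def hash_file(path, algorithm="md5", chunk_size=65536):
    """Hash a file's contents in chunks, so large files don't fill memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def snapshot(folder, algorithm="md5"):
    """Map each file's path (relative to folder) to its hash value."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): hash_file(path, algorithm)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }

def save_snapshot(hashes, results_file):
    """Store the computed hashes in a disk file (JSON is an assumed format)."""
    Path(results_file).write_text(json.dumps(hashes, indent=2), encoding="utf-8")

def compare_with_saved(folder, results_file, algorithm="md5"):
    """Return (changed, missing, added) file names since the saved snapshot."""
    old = json.loads(Path(results_file).read_text(encoding="utf-8"))
    new = snapshot(folder, algorithm)
    changed = sorted(name for name in old if name in new and old[name] != new[name])
    missing = sorted(name for name in old if name not in new)
    added = sorted(name for name in new if name not in old)
    return changed, missing, added
```

A typical run would call save_snapshot(snapshot(folder), results_file) once, then compare_with_saved(folder, results_file) whenever you're curious. Keep the results file outside the folder being hashed, or it will show up in its own snapshot. On a folder like \Windows, some files may refuse to be read, which this sketch doesn't handle.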

Hashes can also be used to rapidly compare files "at a distance". It's a trick used by my Web Update program, and by similar services such as Windows Update too.

These services automatically distribute and install recently updated programs and other files. But how can a computer or web site, located thousands of miles away, know if your computer needs a new version of a particular file?

It's a simple trick. First, the remote computer, where the latest version of every file is stored, sends a short list. The list includes the names of the files it can provide, and the latest hash value of each.

Armed with this master list, software running on your computer calculates hash values for your copies of the files on the list. Finally, it compares the hashes it computes for your files with the hashes appearing in the list.

If a file's two hash values are the same, your copy of the file is identical to the copy residing on the remote server. In other words, your copy of the file is up-to-date.

But if a file's two hash values differ, your version of the file is not the same as the latest version residing on the distant server. You have an old version of the file. Software running on your computer can then automatically request the latest version of the file, and install it on your computer when it arrives.

Thanks to hashes, only the files you actually need are transmitted to your computer. And no information about your current files is sent to the faraway server. Instead, it sends information about its files, and your computer does the rest!
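Here's a rough Python sketch of the comparison step just described. It is not the actual code of Web Update or Windows Update. The master list is modeled as a simple dictionary of file names and hash values, the function names are hypothetical, and treating a file you don't have at all as "needs updating" is my own assumption.

```python
import hashlib
from pathlib import Path

def file_hash(path, algorithm="md5"):
    """Compute a file's hash value, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def files_needing_update(master_list, local_folder, algorithm="md5"):
    """List the files whose local hash differs from the server's hash.

    master_list maps each file name to the latest hash value, as sent
    by the remote computer. Nothing about the local files is sent back.
    """
    outdated = []
    for name, server_hash in master_list.items():
        local_path = Path(local_folder) / name
        if not local_path.is_file():
            outdated.append(name)   # assumption: a missing file counts as outdated
        elif file_hash(local_path, algorithm) != server_hash.lower():
            outdated.append(name)   # hashes differ, so the contents differ
    return outdated
```

Only the names returned by files_needing_update would then be requested from the server. Everything else stays put.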

Almost Certainly?

The sharp-eyed among us noticed I used the phrase "almost certainly" a moment ago. Yes, it's true. In theory, two files can have the same hash value, even if their contents differ.

But such a "collision" is very unlikely. Even the simplest popular hash formula, known as MD5 (Message Digest version 5), produces hash values consisting of 128 binary ones and zeroes. This yields an unimaginable number of unique hash values -- a total of 340,282,366,920,938,463,463,374,607,431,768,211,456 to be exact.

If every person on earth frantically created 1,000 new disk files every second, it would take 1,797,100,385,499,003,074 years to create that many different files -- one for every available hash value. And frankly, I suspect a few less-motivated folks would give up early, forcing the rest of us to work even longer.
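If you'd like to check this kind of arithmetic yourself, Python's whole numbers never overflow, so the sums are easy. The sketch below counts the 128-bit patterns (hash values run from 0 up to 2**128 - 1, so there are 2**128 of them) and estimates the years needed. The population and the length of a year are my own round assumptions, so the last digits of the result depend on them.

```python
MD5_BITS = 128
possible_hashes = 2 ** MD5_BITS   # every pattern of 128 ones and zeroes

# Assumptions for illustration only:
PEOPLE = 6_000_000_000                  # rough world population
FILES_PER_PERSON_PER_SECOND = 1_000     # the rate used in the text above
SECONDS_PER_YEAR = 31_557_600           # 365.25 days of 86,400 seconds

files_per_year = PEOPLE * FILES_PER_PERSON_PER_SECOND * SECONDS_PER_YEAR
years_needed = possible_hashes // files_per_year

print(f"{possible_hashes:,} possible hash values")
print(f"{years_needed:,} years to make one file for each")
```

Under these assumptions the answer comes out at roughly 1.8 quintillion years, in line with the figure above.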

Still, some people worry about the risk of hash collisions. Accidental collisions, though possible, are not a serious problem. But intentional collisions -- one file deliberately created to have the same hash value as another -- are another matter. They can upset some encryption schemes, and also upset the folks who rely upon them.

That's one reason why mathematicians have worked hard to improve "hash algorithms", the formulas used to compute hash values. Today, there are six popular hash formulas, each providing more security, and a smaller chance of collisions, than the one before it.

We've already mentioned the oldest popular hash algorithm, MD5. It's still widely used today. But recently, members of a new family of algorithms, known as SHA (Secure Hash Algorithms), have begun to replace MD5.

The oldest member of the SHA family is now known as SHA-1. Its hash values contain a total of 160 bits. The remaining SHA formulas have names that indicate the number of ones and zeroes their hash values contain: SHA-224, SHA-256, SHA-384 and SHA-512.
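To see the six formulas side by side, here's a small Python sketch using the standard hashlib module, which happens to offer all six. The sample text and the helper names are just for illustration.

```python
import hashlib

# The six hash formulas named above, by their hashlib names.
ALGORITHMS = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512"]

def hash_with_all(data: bytes) -> dict:
    """Compute the hash of data with each of the six algorithms."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}

def bits_in_hash(name: str) -> int:
    """Number of ones and zeroes in a hash value from this algorithm."""
    return hashlib.new(name).digest_size * 8

for name, value in hash_with_all(b"Karen's Hasher").items():
    print(f"{name:7} {bits_in_hash(name):3} bits  {value}")
```

Each hash is printed as hexadecimal digits, four bits apiece, so an MD5 value shows 32 digits and a SHA-512 value shows 128.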

The new version of my Hasher can compute all six types of hash values. It's your choice how many bits you'd like to receive. And just in case you're wondering how many different 512-bit values the SHA-512 algorithm can produce, here it is:

13,407,807,929,942,597,099,574,024,998,205,846,127,479,365,820,592,393,377,723,561,443,721,764,030,073,546,976,801,874,298,166,903,427,690,031,858,186,486,050,853,753,882,811,946,569,946,433,649,006,084,096

By some estimates, that's 2,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 times as many as there are atoms in the known universe!

Verify and Associate

I mentioned earlier that some folks use hashing programs to detect changes to files on their disk drives. Until recently, this process involved a bit of hand work. But the newest version of my Hasher makes this job easy.

First, ask the program to compute the current hash values of any folders and files whose content you want to protect. Then save those computations to a disk file, by clicking the Hasher's "Save Results to Disk" button.

Later, when you want to test your files, simply click the Hasher's "Verify Saved Hash File" tab, then ask the program to open the file where you saved the earlier results. Finally, click the program's "Verify Hashes" button.

That's it! The program will automatically compute the current hash value of each file, and compare it to the hash value computed earlier. Any discrepancies are noted on-screen. If none are found, the files are all unchanged.

There's also a way to make hash verifications even easier. If you click the Hasher's "Settings" tab, you'll see new checkboxes allowing you to "Associate" the program with any of the common hash filename extensions (.md5, .sha1, .sha224, .sha256, .sha384 and .sha512).

Once you've asked the program to make one of these associations, just double-clicking a saved hash file's icon will automatically run my Hasher, and cause it to load the file ready for verification!
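For readers who like to tinker, verification can be sketched in Python along these lines. This is not the Hasher's code, and it makes a couple of loud assumptions: the saved hash file holds one "hash value, then file name" pair per line (the layout written by common tools such as md5sum), and the file names are relative to the folder holding the hash file. The layout the Hasher itself saves may differ. The algorithm is picked from the hash file's extension.

```python
import hashlib
from pathlib import Path

# The common hash filename extensions, mapped to hashlib algorithm names.
EXTENSION_TO_ALGORITHM = {
    ".md5": "md5", ".sha1": "sha1", ".sha224": "sha224",
    ".sha256": "sha256", ".sha384": "sha384", ".sha512": "sha512",
}

def compute_hash(path, algorithm):
    """Compute a file's current hash value, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_hash_file(hash_file_path):
    """Return (file name, status) pairs; status is OK, CHANGED or MISSING."""
    hash_file_path = Path(hash_file_path)
    algorithm = EXTENSION_TO_ALGORITHM[hash_file_path.suffix.lower()]
    results = []
    for line in hash_file_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue                       # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()    # md5sum-style binary marker
        target = hash_file_path.parent / name
        if not target.is_file():
            status = "MISSING"
        elif compute_hash(target, algorithm) == expected.lower():
            status = "OK"
        else:
            status = "CHANGED"
        results.append((name, status))
    return results
```

Any file reported as CHANGED has different contents than it did when the hash file was written.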

This trick lets you quickly verify hash files you've saved, and also those that sometimes accompany large downloadable files from web sites such as Project Gutenberg:

    http://www.gutenberg.org/

If you'd like to put the new Hasher through its paces, download your copy from its home page at:

    https://www.karenware.com/powertools/pthasher

As always, the program is free for personal/home use. If you're a programmer, you can download its complete Visual Basic source code too!

You can also get the latest version of every Power Tool, including the new Hasher, on a shiny CD. These include three bonus Power Tools, not available anywhere else. The source code of every Power Tool, every issue of my newsletter, and some articles I wrote for Windows Magazine, are also on the CD. And owning the CD grants you a license to use all my Power Tools at work.

Best of all, buying a CD is the easiest way to support the KarenWare.com web site, Karen's Power Tools, and this newsletter. To find out more, visit:

    https://www.karenware.com/licenseme

Until we meet again, stay out of the sun, and drink lots of lemonade. And if you see me on the 'net, be sure to wave and say "Hi!"