February 10, 2006
By Karen Kenworthy
Who is that stranger, standing beside my brother Bill? He's friendly enough. And he's having a good time. You'd think he was a member of the family. But I'll be darned if I know who he is.
On the other hand, his quick smile and easy laugh are mighty familiar. The way he stands, moves his hands. And look at those twinkling eyes! He reminds me a lot of Bill, and his youngest son.
Wait a minute! This is no stranger. In the last few months, he's grown several inches. He's thinner too, and his face has changed. But there's no doubt about it. The handsome young man standing before me has to be my nephew, and Bill's son, Patrick!
With so many nephews and nieces, these sorts of stories aren't uncommon. If I don't see one of these kids at least every few weeks, they can change beyond recognition. Identifying who's giving me a hug at a family gathering can be quite a challenge.
Fortunately, our computer friends don't have to cope with Patrick's changing size and appearance. They recognize him by his user name and password, which don't change nearly as often as his shoe size.
But computers do have their own special identity problems. Countless times each day, a computer must trust that data read from a disk is exactly the same data it wrote to that disk a few minutes, hours, or even days before. Ones and zeros sent to a printer, received from a mouse, or sailing across the Internet, are expected to travel safely, arriving at their destinations intact - identical to the bits that started the journey.
Unfortunately, these hopes and dreams are sometimes dashed. Information stored on disk drives can mutate if the disk's speed of rotation fluctuates. The disk's delicate coating, where bits are represented by small magnetized regions, can have microscopic flaws that unexpectedly alter data stored on the disk.
When traveling over wires, data can be corrupted by stray signals from radios, electric motors, magnets, and even the computer itself. Data stored in a computer's memory can be changed by power glitches, or even cosmic rays.
On the surface, our computers appear able to display pictures, play music, and manage text. But we're not fooled by this sleight of hand. We know that, underneath, our binary buddies are only capable of dealing with 1s and 0s - what we call "bits".
And it's these 1s and 0s that must remain intact if our computer data is to be preserved. If a 0 improperly flips and becomes a 1, or a 1 is scrambled and turns into a 0, the data has been damaged. Good bits can also disappear completely, or bogus bits can mysteriously appear among the valid ones (and zeros).
The effect of these data errors depends on several factors, including the number of bits with the wrong value, and the type of data that's gone bad.
A single wrong bit in a program file might make the program crash each time it runs. Even worse, the program might appear to run correctly, but produce wrong results.
Data files are vulnerable too. Even a small error in a bank's computer files can dramatically increase an account balance. This might seem a trifle, hardly worth mentioning, if it happens to your account. But what about those poor souls that own the bank?
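To see how much mischief one bad bit can cause, here is a minimal Python sketch. The helper name flip_bit and the sample balance are made up for this illustration. It simply inverts a single bit in a short piece of text.

```python
# A minimal sketch of a single-bit error. flip_bit is a hypothetical helper
# written for this illustration, not part of any library.

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted.

    Bit 0 is the lowest-order bit of the first byte.
    """
    damaged = bytearray(data)
    byte_pos, bit_pos = divmod(bit_index, 8)
    damaged[byte_pos] ^= 1 << bit_pos
    return bytes(damaged)


original = b"Balance: $100"
# Flip bit 3 of byte 10, the character "1" (binary 0011 0001).
damaged = flip_bit(original, 10 * 8 + 3)

print(original)  # prints b'Balance: $100'
print(damaged)   # prints b'Balance: $900'
```

One flipped bit turns the character "1" into "9". The damaged text is the same length as the original and looks perfectly innocent.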
If only a few errors appear in an hour-long video stream, we might never notice. Was that a fly landing on Michael Jackson's nose? Or just a tiny error in the video data?
A small glitch in a music file might mean a note is slightly flat or sharp, or too loud or too soft. Or perhaps Michael's just having a bad day.
But several bad bits, especially clustered near one another, can make a mess of a movie or song.
[Nerdy Fashion Tip: Data can be stored on a disk, where it's often called a "file". Data can be sent along a wire or optical fiber, where it may be called a "transmission". Chunks of data stored in other locations, such as a computer's memory, may go by other names. But today's fashionable nerd uses the word "stream" to refer to all sequences of data, regardless of where they're found.]
In most cases, data errors shouldn't be overlooked. But how can a computer spot a single bad bit, floating in a sea of innocent-looking 1s and 0s?
Over the years, a lot of ingenious ways have been developed, allowing our computers to sift data, separating the good from the bad. Not surprisingly, these are called "Error Detection" techniques.
Each trick involves adding extra "error detection" bits to our data. Sometimes these bits are assigned to entire files, protecting millions or even billions of bits. Or the bits may be used to monitor much smaller chunks of information, often containing as few as eight data bits. Frequently, more than one error detection technique is used at the same time, resulting in several groups of error detection bits, placed in several strategic locations.
It's helpful to think of these extra bits as sentinels. They don't store your data. Instead, they live invisibly within your data, patiently standing guard. As long as your data is intact, they are silent. But if any unauthorized changes occur in the data they've been assigned to protect, they instantly sound the alarm!
OK, they don't really sound an alarm. After all, they're just bits. And bits have a very limited range of expression: 0 or 1.
They don't alert our computer, either. To verify protected data, our computer must play detective. It examines both the data and the error detection bits. If the story told by the error detection bits "checks out" -- matches the story told by the data -- the data bits may be on the level.
But if the stories don't match, the whole batch of bits is suspect. Maybe one of the data bits has gone bad, maybe an error detection bit isn't telling the truth. But something's fishy and the bunch has to go.
So, what tricks do these clever sentinels use, to detect changes? And what stories do they tell? Here are short dossiers about some of the most popular techniques:
[Techie Tip: If the descriptions of these techniques seem, well, a bit technical, don't worry. The most important point is that there are several techniques in use today, each with their own strengths and weaknesses. You can now skip what follows, and go directly to the next section ("Shortcomings"), if your pupils begin to dilate, or your breathing becomes slow and shallow.]
Repetition
One of the oldest tricks is repetition or redundancy -- simply transmitting or storing the data more than once. This was often used by early space-exploring satellites. To ensure their precious data safely traversed billions of miles to a sensitive earth station, they simply transmitted the data two or more times. Unfortunately, this technique seriously reduces the amount of useful data that can be sent or stored during a given amount of time.
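Here is a rough Python sketch of the repetition idea. The function names send_repeated and copies_agree, and the sample reading, are inventions for this example. The receiver does nothing fancier than compare the copies it was given.

```python
# A minimal sketch of error detection by repetition. The names and the
# sample message are hypothetical, chosen only for illustration.

def send_repeated(message: bytes, times: int = 3) -> list:
    """Pretend to transmit the same message several times."""
    return [message] * times


def copies_agree(copies: list) -> bool:
    """Return True only if every received copy is identical to the first."""
    first = copies[0]
    return all(copy == first for copy in copies)


received = send_repeated(b"TEMP=-63C")
print(copies_agree(received))   # prints True

# Simulate damage to one copy somewhere along the way.
received[1] = b"TEMP=-68C"
print(copies_agree(received))   # prints False

# Only one copy in three carries new information.
useful_fraction = len(b"TEMP=-63C") / sum(len(copy) for copy in received)
print(useful_fraction)
```

With three copies, two-thirds of everything sent is spent on protection. That's the cost mentioned above.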
Length or Block Count
One of the simplest ways to protect data is to add a "count". It might be a count of the number of bits or bytes the data contains. Or, if the data is broken down into larger blocks of data, the number of blocks might be added.
This technique is easy to implement, and is used by many Internet protocols, and internally within many types of data files. But it's also easily fooled. It only detects missing or extra bits or blocks. It has no way of knowing a 1 should be a 0, or a 0 should be a 1.
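A count can be sketched in a few lines of Python. The layout used here (a 4-byte count in front of the data) and the names add_count and count_matches are assumptions made for this example, not a description of any particular protocol or file format.

```python
# A minimal sketch of a length count. The 4-byte big-endian prefix is an
# assumed layout for illustration only.

def add_count(payload: bytes) -> bytes:
    """Place a 4-byte count of the payload's bytes in front of the payload."""
    return len(payload).to_bytes(4, "big") + payload


def count_matches(stream: bytes) -> bool:
    """Check that the stored count equals the number of bytes that follow it."""
    if len(stream) < 4:
        return False
    declared = int.from_bytes(stream[:4], "big")
    return declared == len(stream) - 4


stream = add_count(b"Hello, Patrick")
print(count_matches(stream))        # prints True

truncated = stream[:-3]             # three bytes went missing
print(count_matches(truncated))     # prints False

altered = stream[:4] + b"Jello, Patrick"   # same length, wrong contents
print(count_matches(altered))       # prints True: the count is fooled
```

The last line shows the weakness. The altered data has the right number of bytes, so the count has no complaint.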
Checksums, Cyclic Redundancy Check (CRC)
All these techniques involve doing some sort of calculation, using the bits that make up the data. The results of the calculation, usually 16 to 32 bits in length, become the error detection bits.
These techniques reliably detect all 1-bit errors, and are sensitive to errors affecting multiple bits. But they can be fooled. Changes to several bits can produce data that yields the same checksum or CRC as the original data.
Still, these techniques are easy to implement. They can also monitor large amounts of data (millions or billions of bytes), using a relatively small number (usually 16 or 32) of sentinel bits. You'll find this trick used in disk drives, within Internet transmissions, and elsewhere.
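To make this concrete, here is a small Python sketch comparing two calculations. The first, simple_checksum, is a made-up 16-bit checksum that just adds up the bytes. The second is the CRC-32 provided by Python's standard zlib module. The sample messages are invented for the demonstration.

```python
import zlib


def simple_checksum(data: bytes) -> int:
    """A toy 16-bit checksum: add up all the byte values, keep the low 16 bits."""
    return sum(data) % 65536


original = b"Transfer $12 to Patrick"
one_changed = b"Transfer $13 to Patrick"   # a single bit differs
swapped = b"Transfer $21 to Patrick"       # two neighboring bytes trade places

# Both calculations notice the single-bit change.
print(simple_checksum(original) != simple_checksum(one_changed))  # prints True
print(zlib.crc32(original) != zlib.crc32(one_changed))            # prints True

# Adding bytes gives the same total in any order, so the toy checksum is fooled.
print(simple_checksum(original) == simple_checksum(swapped))      # prints True

# CRC-32 is sensitive to position, and catches the swap.
print(zlib.crc32(original) != zlib.crc32(swapped))                # prints True
```

A CRC is harder to fool than a simple sum, but with only 32 sentinel bits, some other pattern of changes will still produce a matching value.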
Cryptographic Hashes
This is one of the most modern, and effective, error detection techniques. We've talked about these hashes before. Like checksums and CRCs, they are the result of complex computations performed on the original data. But these calculations produce between 128 and 512 bits.
In practice, they cannot be fooled. They detect all errors, whether one or a billion bits are changed. They also detect bits that are added or removed. These hashes are also reasonably compact, protecting any number of bits with just 128 to 512 sentinel bits.
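Here is a minimal Python sketch using the standard hashlib module. The choice of SHA-256 (a 256-bit hash) and the helper name fingerprint are assumptions made for this example. Other hashes in the 128 to 512 bit range work the same way.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of data as 64 hexadecimal digits (256 bits)."""
    return hashlib.sha256(data).hexdigest()


original = b"Transfer $12 to Patrick"
swapped = b"Transfer $21 to Patrick"     # two bytes trade places
extended = original + b"!"               # one byte added

stored = fingerprint(original)           # saved alongside the data

print(fingerprint(original) == stored)   # prints True: data is intact
print(fingerprint(swapped) == stored)    # prints False: change detected
print(fingerprint(extended) == stored)   # prints False: extra byte detected
print(len(stored) * 4)                   # prints 256, the number of sentinel bits
```

No matter how large the data, the hash stays the same size: 256 bits here.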
Parity
Here, data bits are divided into small groups (often containing eight data bits, or one "byte"). Each group is assigned its own sentinel bit. The value of this "parity bit" depends on the number of 1s found in the other eight bits.
If the "even parity" scheme is used, the value of the parity bit is chosen so that the total number of 1s in the group is an even number. For example, if the group of data bits contains five 1s, the parity bit will be set to 1 also, making the total number of 1s an even number (six).
If the "odd parity" scheme is chosen, the parity bit's value ensures the total number of 1s in the group is an odd number. For example, if the group of data bits contains three 1s, the parity bit would be set to 0, keeping the total number of 1s an odd number (three).
This scheme is still used in serial communications (for example, between your computer and an external modem), and inside some disk drives. It was also used in early personal computer memories.
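The two parity schemes can be sketched in Python like this. The function names parity_bit and group_ok are hypothetical, and the bits are written as text strings of "0" and "1" purely to keep the example readable.

```python
# A minimal sketch of even and odd parity. Names are hypothetical, and
# bits are represented as strings such as "10110110" for readability.

def parity_bit(data_bits: str, scheme: str = "even") -> int:
    """Choose the parity bit so the total count of 1s is even (or odd)."""
    ones = data_bits.count("1")
    if scheme == "even":
        return ones % 2          # add a 1 only if the count is currently odd
    return (ones + 1) % 2        # odd scheme: add a 1 only if the count is even


def group_ok(data_bits: str, parity: int, scheme: str = "even") -> bool:
    """Recompute the parity bit and compare it with the one that was stored."""
    return parity_bit(data_bits, scheme) == parity


print(parity_bit("10110110", "even"))   # five 1s, prints 1 (total becomes six)
print(parity_bit("00101001", "odd"))    # three 1s, prints 0 (total stays three)

stored = parity_bit("10110110", "even")
print(group_ok("10110110", stored))     # prints True: group is intact
print(group_ok("10110111", stored))     # prints False: one flipped bit is caught
print(group_ok("10110101", stored))     # prints True: two flipped bits slip past
```

The last line shows a limit of the scheme. When two bits in the same group flip, the count of 1s is still even, so a single parity bit can't tell anything is wrong.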
Shortcomings
Error detection is a big help. But merely detecting errors in our computer data isn't enough. Somehow, the errors must be fixed.
Sometimes the fix is simple. In the case of data transmitted across the Internet, our computer may be able to request or send a fresh copy, one that hopefully will arrive intact.
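As a rough Python sketch of "ask again until a good copy arrives": the FlakyChannel class below is a made-up stand-in for a network connection that damages its first delivery, and the use of CRC-32 as the sentinel is an assumption for this example rather than a description of how any particular Internet protocol works.

```python
import zlib


class FlakyChannel:
    """Hypothetical stand-in for a network link that damages early deliveries."""

    def __init__(self, payload: bytes, bad_deliveries: int = 1):
        self.payload = payload
        self.bad_left = bad_deliveries

    def request(self):
        """Deliver the data along with the CRC-32 computed by the sender."""
        data, crc = self.payload, zlib.crc32(self.payload)
        if self.bad_left > 0:
            self.bad_left -= 1
            data = bytes([data[0] ^ 0x01]) + data[1:]   # one bit flips in transit
        return data, crc


def fetch_with_retry(channel, max_attempts: int = 3):
    """Keep requesting until the data matches its CRC, or give up."""
    for attempt in range(1, max_attempts + 1):
        data, crc = channel.request()
        if zlib.crc32(data) == crc:
            return data, attempt
    raise IOError("no intact copy received")


data, attempts = fetch_with_retry(FlakyChannel(b"Hi, Karen!"))
print(data, attempts)   # prints b'Hi, Karen!' 2
```

The first copy fails its check and is thrown away. The second arrives intact.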
If data stored on our computer disk goes bad, we may have good, backup copies safely recorded on tape, CD, DVD, or other backup media. In that case, we simply have to find the copy, and copy it to our disk, replacing the damaged data.
But such simple fixes aren't always possible. Re-sending transmitted data isn't always practical. This is often the case when data represents some "live" event, such as a television picture. Retransmission is also difficult, or impossible, when data has traveled a great distance. This data may have arrived from such out-of-town locations as a satellite orbiting the earth, or passing far from earth on a voyage through the solar system.
And of course, as we all know, there isn't always a backup copy of important data stored on our disk drives. It sometimes seems bits are most likely to fail, in the moments just before a backup copy is made.
In these cases, when damaged data can't be replaced, it must somehow be "repaired".
Fortunately, some very clever folks have found ways to fix broken data. Once an error has been detected, the original data can be reconstructed without seeing a copy of the original information. Naturally, these tricks are called "Error Correction" techniques.
Unfortunately, we've run out of time. Talk about error correction will have to wait until our next get-together.
In the meantime, don't forget my collection of Power Tools. One, Karen's Hasher, even computes those "cryptographic hashes" we talked about a little bit ago.
You'll find links to them all on the Power Tools home page:
As always, each program is free for personal/home use. And you can download its complete Visual Basic source code too!
You can also get the latest version of every Power Tool on a shiny CD. These include three bonus Power Tools, not available anywhere else. The source code of every Power Tool, every issue of my newsletter, and some articles I wrote for Windows Magazine, are also on the CD. And owning the CD grants you a license to use all my Power Tools at work.
Best of all, buying a CD is the easiest way to support the KarenWare.com web site, Karen's Power Tools, and this newsletter. To find out more, visit:
Until we meet again, don't change too much. If you see Patrick, give him a hug for me. And, if you see me on the 'net, be sure to wave and say "Hi!"