September 25, 2003
By Karen Kenworthy
I've always been fascinated by numbers. Big or small, imaginary or real, binary, decimal, or hex, I love them all.
Fortunately, computers, those machines I work with every day, are literally full of numbers. Sure, most of those numbers are 0 or 1. But there are a lot of them. And with a little ingenuity, those two digits can be combined to make something wonderful.
And confusing. :)
A Byte is a Byte is a Byte, Right?
But don't blame the computers. A lot of confusion is caused by the words we humans use to describe the amazing digital world of our computers. To see what I mean, let's take a look at a few examples ...
Bit: A bit is the smallest unit of data. It can only have one of two values: 0 or 1.
Nothing could be simpler than a bit, right? And, truth be told, most of the time the definitions shown above are true. The word "bit" is a contraction of "binary digit", meaning a digit that can have only two values. And a bit is "indivisible", meaning it cannot be divided into smaller units of data.
But the world is seldom as simple as it seems. Deep inside your computer, and many other electronic devices, are digital devices using what engineers call "tri-state" logic. The signals processed by these circuits can have three discrete states!
[Nerdy Extra: The three possible states are low (near 0 volts), high (near Vcc), and high impedance, in which the output is effectively disconnected from the circuit.]
Communications over fiber optic cables often employ multi-state logic too. By varying polarization, frequency, and other characteristics, each individual pulse of light can carry much more information than a traditional, solitary 1 or 0.
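To put a number on "much more information", here is a minimal Python sketch. It assumes an idealized signal in which every state is equally likely and perfectly distinguishable. The function name bits_per_symbol is made up for this illustration. It is not taken from any real communications library.

```python
import math

def bits_per_symbol(states: int) -> float:
    """Bits of information carried by one pulse that can take `states` values.

    Assumes an idealized, noise-free signal (an assumption for this sketch).
    """
    if states < 1:
        raise ValueError("a signal needs at least one state")
    return math.log2(states)

# A plain binary pulse carries one bit. A 16-state pulse carries four.
for states in (2, 3, 4, 16):
    print(f"{states:>2} states -> {bits_per_symbol(states):.3f} bits per pulse")
```

Each doubling of the number of states adds just one more bit per pulse, which is why real systems combine several characteristics at once.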
Byte: Each byte contains exactly eight bits.
Today, bytes often do contain eight bits. But throughout the history of electronic storage and communication, this hasn't always been the case.
Long before computers were born, back in 1870, Emile Baudot invented a way to encode text using just five bits per character. Later, the "Baudot code" was used by Teletype machines, TDD terminals and some HAM radio equipment.
[Another Nerdy Extra: If Baudot's name sounds familiar, that's because the term "baud" (a signaling rate measured in symbols per second, and often loosely used to mean "bits per second") is named in his honor.]
Some early computers were even more stingy than Baudot, placing as few as four bits in each byte. The pioneering IBM model 1401 computer, debuting in the early 1960's, lavished 6 bits on each byte. Later, the groundbreaking Digital Equipment Corporation (DEC) model PDP-10 used a whopping 9 bits per byte!
It wasn't until 1965, with the release of the legendary IBM model 360, that the 8-bit byte became the industry (though still not universal) standard it is today.
Even now, many bytes contain one or more hidden bits, called "parity bits", bringing the total number of bits needed to store a byte to as many as nine or ten. Parity bits don't actually store our data. Instead, they store extra information used by our computer hardware to detect errors in the byte's other eight "data bits".
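Here is a small Python sketch of how a single parity bit can catch an error. It assumes "even parity", one common scheme, in which the extra bit is chosen so the total count of 1 bits is even. Real memory hardware does this in circuitry, and the function names here are invented for the example.

```python
def even_parity_bit(byte: int) -> int:
    """Return the parity bit that makes the total number of 1 bits even."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return bin(byte).count("1") % 2

def has_error(byte: int, stored_parity: int) -> bool:
    """True if the byte no longer matches the parity bit stored with it."""
    return even_parity_bit(byte) != stored_parity

data = 0b01000001                 # the letter "A": two 1 bits
parity = even_parity_bit(data)    # so the parity bit is 0

corrupted = data ^ 0b00000100     # flip one data bit in "storage"

print(has_error(data, parity))       # the untouched byte passes the check
print(has_error(corrupted, parity))  # the single flipped bit is detected
```

One parity bit only detects an odd number of flipped bits. If two bits flip at once, the byte passes the check even though it is wrong.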
Word: A word is a computer's favorite-sized group of bits. The computer's circuits are optimized to compare, move, add, subtract, and multiply this number of bits simultaneously.
A long time ago this definition was always correct. Even today, computers with 32 bits in each Word are called "32-bit" computers. The Words of newer 64-bit computers hold 64 bits of data.
But many programmers, especially those writing programs for Windows, have a different idea of what a Word should be. To them, "Word" often refers to a block containing just 16 bits of data, regardless of the preferences of the computer hardware.
Why? Old habits die hard. In the early days of Windows, and its predecessor MS-DOS, 16-bit computers were the standard. Back then, a "Word" really was 16 bits long. When 32-bit computers appeared, many programmers simply continued using "Word" to mean just 16 bits.
So what do these programmers call the 32-bit block of data handled most efficiently by today's 32-bit computers? It is called a "Double Word", or DWORD. As 64-bit computers arrive, we're beginning to hear a new term. The phrase "Quad Word" (QWORD) is coming into vogue, meaning a block the size of four 16-bit Words (4 x 16 = 64).
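For the curious, the Windows-style sizes can be checked with a few lines of Python. This sketch uses the standard struct module, whose "H", "I" and "Q" format codes describe unsigned 16-, 32- and 64-bit integers. Mapping those codes to the names WORD, DWORD and QWORD is my own labeling for this example.

```python
import struct

# "<" asks struct for its standard sizes, independent of the local hardware.
SIZES = {
    "WORD": struct.calcsize("<H"),    # unsigned 16-bit integer
    "DWORD": struct.calcsize("<I"),   # unsigned 32-bit integer
    "QWORD": struct.calcsize("<Q"),   # unsigned 64-bit integer
}

for name, size_in_bytes in SIZES.items():
    words = size_in_bytes // SIZES["WORD"]
    print(f"{name:<5} = {size_in_bytes} bytes = {size_in_bytes * 8} bits "
          f"= {words} x 16-bit Word")
```

Notice the sizes stay the same no matter what kind of computer runs the code. That is exactly the point: to these programmers a "Word" is a fixed 16 bits, whatever the hardware prefers.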
In case that's not confusing enough, some cutting-edge computers use variable-size Words. They can manipulate any amount of data, from 1 to 128 bits or more, with equal ease.
OK, experts quibble about the meaning of bit, byte and Word. But these are small matters, no more than a few bits apiece. When it comes to really important amounts of data, surely then everyone agrees, right? Well, not exactly ...
For example, take a look at this table that appeared in last week's newsletter. It shows how the sizes of past, present, and future disk drives are measured:
1 kilobyte (KB) = 1,000 bytes
1 megabyte (MB) = 1,000 kilobytes
1 gigabyte (GB) = 1,000 megabytes
1 terabyte (TB) = 1,000 gigabytes
1 petabyte (PB) = 1,000 terabytes
1 exabyte (EB) = 1,000 petabytes
1 zettabyte (ZB) = 1,000 exabytes
1 yottabyte (YB) = 1,000 zettabytes
Today, drives whose capacities are measured in kilobytes are often found in museums. There are plenty of disks storing megabytes of data in closets and attics around the world, and in a few computers. The hard disks found in most personal computers today can hold several gigabytes, while a few folks have drive arrays able to store a few terabytes.
Larger capacities, those measured in petabytes, exabytes, zettabytes and yottabytes are, for the moment, reserved for the future. But I've no doubt they are already on some engineers' drawing board, and some IT persons' wish list.
Look closely at each of these units, and you'll see that each unit's name consists of a short prefix ("kilo", "mega", "giga", etc.), followed by the word "byte". As I'm sure you know, these prefixes are multipliers, indicating how many bytes each unit represents.
Those who learned their weights and measures outside the United States, and a few others, will immediately realize something else. These prefixes are the ones used by the Metric System, also known as the International System of Units, or just the "SI".
According to this system, each prefix represents a different multiple of one thousand. A one "kilogram" cabbage weighs exactly 1,000 grams, or about 772 scruples. A one megawatt generator produces 1,000,000 (1,000 x 1,000) watts, or about 10,197 pronies.
A bathysphere that withstands one gigapascal, or 1,000,000,000 (1,000 x 1,000 x 1,000) pascals of pressure could descend to an undersea depth of 334,562 feet without crushing its occupants. And a disk drive holding one terabyte of data must have a capacity of at least 1,000,000,000,000 (1,000 x 1,000 x 1,000 x 1,000) bytes.
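The decimal prefixes follow such a regular pattern that a few lines of Python can generate the whole disk-drive table. This is just a sketch. The list and the function name si_bytes are invented for the illustration.

```python
# SI prefixes in order. Each one is 1,000 times the one before it.
SI_PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def si_bytes(prefix: str) -> int:
    """Number of bytes in one <prefix>byte, using decimal (SI) multipliers."""
    power = SI_PREFIXES.index(prefix) + 1   # kilo -> 1, mega -> 2, ...
    return 1000 ** power

for prefix in SI_PREFIXES:
    print(f"1 {prefix}byte = {si_bytes(prefix):,} bytes")
```

Python integers never overflow, so even the yottabyte line prints every one of its 24 zeros.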
The Power of Two
It all sounds simple, doesn't it? But of course, it isn't.
To see why, get out your scanning electron microscope. Now remove one of your computer's memory circuits (those little boards called SIMMs or DIMMs). See those black rectangles on each board? They're "chips", complex electric circuits enclosed in a protective block of plastic or ceramic.
Pry one of those blocks loose from its board, and crack it open to reveal the silicon chip inside. Place the chip in the vacuum chamber of your microscope, and take a look.
Cool, eh? See those pretty grids -- shimmering rectangular arrays of identical structures? Those are the chip's memory cells. Each cell stores a single bit!
Keep looking. See how the array of cells is perfectly square, with the same number of columns and rows? Not only that, the number of cells in each column and row is always a power of two.
It's easy to figure the total capacity of the chip. Just count the number of cells. Or, if you're in a hurry, multiply the number of columns by the number of rows. For example, if your chip has 1,024 (a power of two) columns and 1,024 rows, it can store 1,048,576 bits.
Hmm ... that number is very close to 1,000,000, isn't it? In fact, it's as close to one million as the size of a memory chip can get. I guess that's why, many years ago, engineers started referring to this many bits as one megabit.
A memory chip with 32,768 (another popular power of two) cells along each side of its array can store 1,073,741,824 bits. That's close to 1,000,000,000. So close, engineers decided to ignore the difference and call it one gigabit.
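If your electron microscope is in the shop, you can check the arithmetic in Python instead. This sketch assumes the idealized square array described here. The function names are made up for the example.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... (numbers with exactly one 1 bit in binary)."""
    return n > 0 and (n & (n - 1)) == 0

def chip_bits(side: int) -> int:
    """Capacity in bits of a square array with `side` cells along each edge."""
    return side * side

for side, nickname, round_number in (
    (1_024, "megabit", 1_000_000),
    (32_768, "gigabit", 1_000_000_000),
):
    bits = chip_bits(side)
    extra = (bits - round_number) / round_number * 100
    print(f"{side:,} x {side:,} = {bits:,} bits "
          f"(one '{nickname}', {extra:.1f}% over {round_number:,})")
```

The gap between the round number and the real capacity grows with each step up: under 5% for the megabit, over 7% for the gigabit.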
In the fullness of time, we all came to measure the size of our computer memories in these oddball units. A megavolt is 1,000,000 (1,000 x 1,000) volts. But a megabyte is 1,048,576 (1,024 x 1,024) bytes. The difference is so small, who'll notice?
Well, a lot of folks. Officially, SI prefixes always mean some multiple of 1,000. They should never be used to represent a power of two, such as 1,024 or 1,048,576.
Disk drive manufacturers follow this rule. And it's easy to see why. Hard disks aren't square, and there's no reason to adjust their capacity to be an exact power of two. To them, a megabyte is 1,000,000 (1,000 x 1,000) bytes, exactly.
Bizarrely, the makers of 3.5-inch diskette drives are of two minds. To them, a megabyte contains 1,024,000 (1,024 x 1,000) bytes! Apparently, their philosophy is, if you can't please everyone, annoy everyone. :)
But the folks who make and sell memory chips are the real rebels. They stubbornly insist on stuffing 1,048,576 (1,024 x 1,024) bytes into every megabyte. Sure, it makes computer users happy, giving us a little bit more for our money. But it sure annoys the logical folks who manage international standards.
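To see the three competing megabytes side by side, here is a short Python sketch. The diskette geometry (2 sides, 80 tracks per side, 18 sectors per track, 512 bytes per sector) is my own addition. It is the usual layout of a high-density 3.5-inch diskette, the kind sold as "1.44 MB". The constant names are invented for the example.

```python
MB_DISK = 1000 * 1000      # hard disk makers: 1,000,000 bytes
MB_MEMORY = 1024 * 1024    # memory makers:    1,048,576 bytes
MB_FLOPPY = 1024 * 1000    # diskette makers:  1,024,000 bytes

# Assumed geometry of a high-density 3.5-inch diskette (not from the text).
FLOPPY_BYTES = 2 * 80 * 18 * 512

print(f"Diskette capacity: {FLOPPY_BYTES:,} bytes")
for label, megabyte in (("hard disk", MB_DISK),
                        ("memory", MB_MEMORY),
                        ("diskette", MB_FLOPPY)):
    print(f"  in {label:<9} megabytes: {FLOPPY_BYTES / megabyte:.3f} MB")
```

Only the hybrid 1,024,000-byte megabyte gives the familiar "1.44" printed on the diskette label.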
That's why a group called the International Electrotechnical Commission (IEC) has defined a new set of standard prefixes. These represent powers of 1,024, perfect for measuring the capacity of memory chips. Here are some examples, including the standard spelling and abbreviations of each:
1 Kibibyte (KiB) = 1,024 bytes
1 Mebibyte (MiB) = 1,024 Kibibytes
1 Gibibyte (GiB) = 1,024 Mebibytes
1 Tebibyte (TiB) = 1,024 Gibibytes
1 Pebibyte (PiB) = 1,024 Tebibytes
1 Exbibyte (EiB) = 1,024 Pebibytes
1 Zebibyte (ZiB) = 1,024 Exbibytes
1 Yobibyte (YiB) = 1,024 Zebibytes
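Here is a small Python sketch that puts the IEC abbreviations to work, turning a raw byte count into the largest binary unit that fits. The function name format_iec and the two-decimal formatting are choices made for this example, not part of any standard.

```python
IEC_PREFIXES = ["Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"]

def format_iec(n_bytes: int) -> str:
    """Express a byte count using the largest IEC binary unit that fits."""
    value = float(n_bytes)
    unit = "B"
    for prefix in IEC_PREFIXES:
        if value < 1024:
            break                 # the current unit is already large enough
        value /= 1024             # step up to the next binary unit
        unit = prefix + "B"
    return f"{value:.2f} {unit}"

print(format_iec(1_048_576))          # 1,024 x 1,024 bytes of memory
print(format_iec(1_000_000_000_000))  # a "1 terabyte" disk, in binary units
```

The second line shows why the two systems cause so much grumbling. A drive holding one decimal terabyte comes out at roughly 931 GiB, well short of a full tebibyte.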
The IEC is not the body that controls the Metric System. But it does have the support of the IEEE (Institute of Electrical and Electronics Engineers), and other influential organizations. NIST (the U.S. Government's National Institute of Standards and Technology) has also endorsed this new standard. All hope this scheme will catch on, and reduce confusion among computer users.
But I'm skeptical. Try saying "kibibits" out loud, in public, and you'll hear why. I'm afraid it just sounds silly. Unfortunately, some cures are worse than the disease. :)
Wow! It's almost time to go. But until we part, don't forget to visit the web site and check out the latest Power Tools. I've recently updated three programs (Window Watcher, 'Net Monitor, and Drive Info). To get your free copies, visit their home pages at:
And if you're a programmer, don't forget to download their free Visual Basic source code too.
Better yet, get the latest version of every Power Tool, including all the newest versions, on a shiny CD. The disc also contains three bonus Power Tools not available anywhere else. You'll find the source code of every Power Tool, every back issue of my newsletter, and even some of my original Windows Magazine articles! The CD also includes a special license that lets you use your Power Tools at work.
Buying a CD is also the easiest way to support the KarenWare.com web site, and this newsletter. To find out more, visit:
That's all for now! But don't despair. We'll get together again in about 604,799,982,968 microseconds. In the meantime, if you see me on the 'net, be sure to wave and say "Hi!"