April 10, 2000
By Karen Kenworthy
My good buddy Bob sent me a note the other day. Now don't get me wrong. Bob is a wonderful family man. As upright as they come. But like everyone else these days, he's inundated with junk e-mail. And some of it is pretty unsavory.
To Bob's credit, he's not interested in invitations to "peek inside Stacy's dorm room." I know, because he told me so. But being a programmer-type, Bob was puzzled by the bizarre Web addresses often mentioned in some of these solicitations. Addresses like:
How, he asked, can these work? After all, each Web address must contain a domain name. And a domain name can't consist of just digits. At a minimum, a domain name must contain one letter of the alphabet, and end with a period, followed by one of the valid top-level domain names (.com, .net, .edu, .uk, etc.). The result is supposed to be something that looks like this:
The funny Web addresses violate all those rules. Or do they?
The rules of the Internet are spelled out in a special set of documents known as RFCs (Requests For Comments). These documents got that name because each RFC was originally a simple proposal. Someone, somewhere (usually at a university), suggested a way to make a part of the Internet work. Or they proposed a solution to a particular problem. Each proposal was written down, assigned a number, and then circulated among other pointy-headed folks who spend their days worrying about networks, computers, and stuff like that.
Some RFCs were eventually rejected. Perhaps the proposed "solution" didn't really work. Or someone may have had an even better idea, which supplanted the original proposal. But many RFCs, often after repeated changes (the result of the requested "comments"), were eventually adopted. They became, and still are, the rules of Internet-land.
Most of these RFCs are pretty obscure. Take a look at RFC0815, "IP datagram reassembly algorithms," for example. Or my personal favorite, RFC0825 "Request for comments on Requests For Comments." Only pretty serious folks ever read these.
But a few RFCs have affected the daily life of nearly everyone reading this letter. One is RFC1738, entitled "Uniform Resource Locators (URL)." It tells us how to name locations on the Internet. Or, as the authors so elegantly put it: "This document specifies a Uniform Resource Locator (URL), the syntax and semantics of formalized information for location and access of resources via the Internet."
OK, maybe their description wasn't so elegant. What they were trying to say is this: The Internet is -full- of neat stuff. Wouldn't it be great if we gave a name to each place a neat thing is stored? We could call those names "URLs!"
Returning to RFC1738, a URL looks like this:
Look familiar? Of course not. That wasn't a real URL. It's a diagram of sorts, showing how URLs are formed. It breaks a URL down into its smallest pieces, and shows where each piece belongs.
As you can see, it all starts with something called the "protocol." This little bit of text doesn't tell where to find information. Instead, it tells you how to retrieve the information once you've found it. You're already familiar with the protocol known as "http:". It appears at the beginning of every Web address, such as https://www.karenware.com. HTTP stands for HyperText Transfer Protocol, the shipping technique computers use when sending Web pages across the Internet. And yes, as you've already guessed, HTTP is described in an RFC. Two, in fact, RFC1945 and RFC2616.
Other protocols used in URLs include ftp: (File Transfer Protocol), news: and nntp: (NetNews newsgroups), and gopher: (a now mostly-obsolete protocol used to request searches of remote computers). Another common protocol is mailto:, used to specify an e-mail address.
What's next? Except in the case of mailto:, the protocol is followed by two slashes ("//"). Don't ask me why. It just is. The slashes probably let computers know the protocol portion of the URL is over, and the remaining portion is about to begin. But since protocol names always end with a colon (":"), it seems the additional separator characters are unnecessary. On the other hand, they are cute, and give URLs a certain exotic look. :)
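For programmer-types like Bob, here is a rough sketch of how a program might pick the protocol off the front of a URL and check for the two slashes. It is written in Python and leans on the standard library's urlsplit function. The ftp: and mailto: addresses are made-up examples, and the two helper names are hypothetical, chosen only for this illustration.

```python
from urllib.parse import urlsplit

# Sample URLs for illustration. Only the first comes from this letter;
# the ftp: and mailto: addresses are invented.
samples = [
    "https://www.karenware.com",
    "ftp://ftp.example.com/pub/readme.txt",
    "mailto:someone@example.com",
]

def protocol_of(url):
    """Return the protocol part of a URL, without the trailing colon."""
    return urlsplit(url).scheme

def has_double_slash(url):
    """Report whether two slashes come right after the protocol and its colon."""
    scheme = protocol_of(url)
    rest = url[len(scheme) + 1:]  # skip the protocol name and the colon
    return rest.startswith("//")

for url in samples:
    print(protocol_of(url), has_double_slash(url))
# prints:
# https True
# ftp True
# mailto False
```

Notice that the colon alone is enough for the program to find the end of the protocol. The slashes are checked separately, which fits the hunch that they aren't strictly needed as a separator.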
What's In A Name?
Right after the protocol (and the two slashes) come two optional parts of a URL you may not have seen before: user and password. If a Web, ftp or other site requires you to log in before accessing a particular item, it will prompt you for your name and password. It will, that is, unless you specify them as part of the URL. To do that just type your user name, followed by a colon (":"), your password, and finally an at sign ("@"). The result might look something like this:
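To see how software might pull those pieces back apart, here is a small Python sketch using the standard library's urlsplit function. The user name, password, host, and file path are all invented for illustration, and login_details is a hypothetical helper name, not part of any standard.

```python
from urllib.parse import urlsplit

def login_details(url):
    """Return (user, password, host) from a URL; missing parts come back as None."""
    parts = urlsplit(url)
    return parts.username, parts.password, parts.hostname

# Hypothetical URL: "bob", "secret", and the host name are made up.
print(login_details("ftp://bob:secret@ftp.example.com/reports/april.txt"))
# prints: ('bob', 'secret', 'ftp.example.com')

# A URL with no user or password simply yields None for both.
print(login_details("https://www.karenware.com"))
# prints: (None, None, 'www.karenware.com')
```

Keep in mind that a password typed into a URL is visible to anyone who can see the address, so this is a convenience, not a security feature.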
Next comes the best part of the URL -- the name of the computer where the information is stored. Experts call computers connected to the Internet "hosts." The rest of us just call them computers.
What does a host name look like? That's the crux of Bob's question. We're most familiar with something called a "domain name," a collection of letters and numbers, separated by a few dots. For example, www.karenware.com.
Domain names usually have three parts. The right-most, called the top-level domain, will always be ".com", ".net", ".org", ".edu", ".mil", or one of the two-character country-specific top-level domains (such as .us, .uk, or .de). These give clues about the computer's owner. For example, .com domain names are supposed to belong to U.S.-based companies (though often they do not). Domain names ending in .mil are owned by branches of the United States Armed Forces. And all .edu domains are colleges or universities.
In the middle of a domain name you'll usually find the name of a company or computer network. In essence, this portion of the domain name specifies a group of computers. In the example above, this portion of the domain is karenware, a small group of computers that I use to distribute this newsletter, and host its web site.
The left-most portion of a domain name identifies a specific computer within a group. The address www.microsoft.com, for example, refers to the computer named www, on the network named microsoft, belonging to a well-known U.S. company. Often the computer name will indicate the type of information stored on the computer, such as www (for World Wide Web pages). Other computers have names like secure.mycompany.com, indicating they carry out secure transactions, or mail.mycompany.com, if they process e-mail. One of the computers in the winmag network, that's a part of the World Wide Web, is named bbs.winmag.com. It got its name by supporting our online discussion forums, something old-timers call "Bulletin Board Systems."
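The three-part breakdown described above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: it handles names with exactly three parts and rejects anything else, even though real domain names can have more or fewer. The function name split_domain is hypothetical.

```python
def split_domain(name):
    """Split a three-part domain name into (computer, group, top_level)."""
    labels = name.lower().split(".")
    if len(labels) != 3:
        # Assumption for this sketch: only three-part names are handled.
        raise ValueError("this sketch only handles three-part names")
    computer, group, top_level = labels
    return computer, group, top_level

for name in ["www.karenware.com", "bbs.winmag.com", "www.microsoft.com"]:
    computer, group, top_level = split_domain(name)
    print(f"{name}: computer={computer}, group={group}, top-level=.{top_level}")
# prints:
# www.karenware.com: computer=www, group=karenware, top-level=.com
# bbs.winmag.com: computer=bbs, group=winmag, top-level=.com
# www.microsoft.com: computer=www, group=microsoft, top-level=.com
```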
OK. That explains URLs like https://www.karenware.com, or even http://myname:email@example.com. But what about those "funny" URLs Bob asked about? Well, they'll have to wait until next week. It's getting late, and Bob has to get up early tomorrow.
But don't worry. If you'd like to learn more about URLs in the meantime, check out these URLs:
They'll put you to sleep faster than anything I know, but they are an interesting piece of computer history. And while you're cruising the 'net, drop by my home page at https://www.karenware.com/ and check out the latest Power Tools. Oh, and don't forget: Wherever you go this week, be sure to wave and say "Hi!" I'll be looking for you!