HTTP/HTML/CGI
The Rules of the Web

by Bruce A. Hake and Gordon W. Hake of Hake Internet Projects, LLC.

Summary of the basic rules of the World Wide Web:

  1. Standards: Who Makes The Rules?
  2. HTTP: Hypertext Transfer Protocol
  3. HTML: Hypertext Markup Language
  4. CGI: Common Gateway Interface


1. Standards: Who Makes The Rules?

IETF. The Internet Engineering Task Force (IETF) is the Internet's protocol engineering and development arm. It is a large international community of network designers, operators, vendors, and researchers concerned with the evolution and smooth operation of the Internet. Membership is open to all interested persons.

IESG, the Internet Society, and the IAB. IETF's internal management is handled by the Internet Engineering Steering Group (IESG). Operational management of the Internet standards process is handled by the IESG under the auspices of the Internet Society. The Internet Architecture Board (IAB) is a body of the Internet Society responsible for overall Internet architecture that also helps adjudicate disputes in the standards process.

Internet-Drafts and Requests for Comments (RFCs). There are two kinds of "official" Internet documents: Internet-Drafts and Requests for Comments (RFCs). Internet-Drafts have no formal status and can be changed or deleted at any time. See Guidelines to Authors of Internet-Drafts. The IETF's Secretariat maintains an Internet-Drafts index.

RFCs, the IAB's official document series, are the closest thing the Internet has to authoritative rules. They are archived permanently (i.e., they are never deleted, and once published, an RFC will never change). Technically, not all RFCs are standards (see the IETF structure and Internet standards process for complete information, or browse the RFC Editor Web pages at ISI). The InterNIC's Database and Directory Services, provided by AT&T, has an RFC index.

The World Wide Web Consortium (W3C). The World Wide Web Consortium (W3C) is another important standard-setter for the Web. It produces specifications and reference software. W3C is funded by industrial members but its products are freely distributed. It is administered by the MIT Laboratory for Computer Science and by INRIA (the French National Institute for Research in Computer Science and Control), in collaboration with CERN (the European Laboratory for Particle Physics, on the border between France and Switzerland), where the World Wide Web was invented in 1989.



2. HTTP: Hypertext Transfer Protocol

The Hypertext Transfer Protocol (HTTP) is the protocol (set of rules) that governs the "under-the-hood" aspects of the World Wide Web. HTTP is a "client-server" protocol designed to facilitate communication of "hypertext" (interlinked documents) across multiple platforms (i.e., among different computers that can be running different operating systems). "Client-server" means that client programs (called "Web browsers") interact with a central Web server program, which serves up documents and other files, such as sound or graphics files, upon request.
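
To make the request/response cycle concrete, here is a minimal sketch in Python (not something the Web requires, just a convenient illustration; the host name www.example.com is a placeholder for any Web server you can reach). The client opens a connection, asks for a document, and reads back the server's reply, which is exactly what a browser does behind the scenes.

    # A minimal sketch of one HTTP request/response exchange.
    # The host name below is only a placeholder.
    import http.client

    conn = http.client.HTTPConnection("www.example.com", 80)
    conn.request("GET", "/index.html")        # the client's request
    response = conn.getresponse()             # the server's reply
    print(response.status, response.reason)   # e.g. 200 OK
    print(response.read()[:200])              # first bytes of the document
    conn.close()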

Because HTTP is a generalized, cross-platform protocol, a single browser program (such as Netscape) can be used to communicate with many different flavors of Web server program. By the same token, a single server program (such as O'Reilly's WebSite for the Windows NT operating system) can communicate with many different flavors of Web browser program running on different kinds of computers with different operating systems.

Incidentally, in Internet parlance the word "server" does double duty. It refers to server programs serving various Internet protocols (e.g., HTTP or Web server, FTP or File Transfer Protocol server, DNS or Domain Name System server). It also refers to the main server computers on which server programs run.

The foundational protocol for data transmission over the global Internet is TCP/IP (Transmission Control Protocol/Internet Protocol). Data is transmitted over the Internet in "packets"--groups of digital signals (bits, i.e., 1's or 0's). A host computer (host) connected to the Internet communicates with other hosts in a high-speed, more-or-less continuous stream of signals. Among other things, TCP/IP defines how certain patterns of signals are to be interpreted as marking the start (header) of a new, discrete information packet.

TCP/IP and related protocols also define how various kinds of software programs running on each host relate to one another. All the various programs that comprise a computer's Internet connectivity are said to reside in a "stack," referring to the network communications hierarchy that starts with the raw "hardware layer" (programs that interpret incoming signals from a network communications hardware port) and rises through several levels up to the top "application layer," the user interface.

When you connect to another computer over the World Wide Web, a signal from the Web browser program on the topmost "application layer" of your computer's TCP/IP stack is passed to the program on the next lower network layer, down through the layers of the stack (seven in the classic OSI reference model), through the network card, out to the hub and router, through the cables and routers in between, then back up through the TCP/IP stack on the remote computer to the Web server program running on that computer's application layer. Thus, in the split second it takes your signal to reach the remote Web server program, that signal is interpreted and passed along by at least 14 intervening programs.
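
To see where HTTP sits in that picture, the sketch below (again Python, with a placeholder host name) writes a bare-bones HTTP request directly to a TCP connection. Everything below that call -- breaking the data into packets, routing them, and reassembling the reply -- is handled by the lower layers of the stack.

    # Hand an application-layer message (an HTTP request) to the TCP/IP stack
    # and let the lower layers do the packetizing, routing, and reassembly.
    import socket

    sock = socket.create_connection(("www.example.com", 80))
    sock.sendall(b"GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n")
    reply = b""
    while True:
        chunk = sock.recv(4096)    # the reply comes back up the stack in pieces
        if not chunk:
            break
        reply += chunk
    sock.close()
    print(reply.decode("latin-1")[:300])   # status line and response headers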

HTTP is evolving toward something akin to a master protocol for Internet user interfaces, because it subsumes other, earlier methods for distributing information. For instance, an HTTP server needs to know about the FTP protocol, but an FTP server does not need to know about the HTTP protocol. The National Center for Supercomputing Applications (NCSA) publishes an excellent compendium, Everything You Need to Know to Write an HTTP Server.



3. HTML: Hypertext Markup Language

Whereas HTTP is the "under-the-hood" part of the Web, HTML (Hypertext Markup Language) is what makes the Web work on the outside. HTML is the standardized system of codes added to ordinary text documents to turn them into "HTMLized" hypertext documents. For example, one set of HTML codes can turn a portion of text into an active hypertext link; clicking on that text calls up another document. Other HTML codes give formatting instructions, telling browser programs how to display the text.
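
For instance, the following fragment of HTML markup (the address is only an illustration) turns one phrase into a hypertext link and marks another for bold display:

    <p>Visit the <a href="http://www.hake.com/">Hake Internet Projects</a>
    home page for <b>more information</b>.</p>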

Learning basic HTML is ridiculously easy. Starting from scratch, you can learn to code working Web pages in an hour. On the other hand, it takes talent and lots of hard work to get really good at it. One reason "quick and dirty" HTML coding seems to work is that Web browser programs are increasingly intelligent about fixing coding errors. However, professional-level HTML coding must be cognizant of the peculiarities of many different browsers, in addition to the standard specifications for HTML, because non-standard code will not work at all on some browsers, or will misbehave in ugly ways.

In theory, HTML permits an author to code hypertext documents that will look the same to users of a wide variety of browsers. The neophyte HTML coder can often quickly produce documents that look OK in the browser she is using, and so concludes that the coding is correct. In practice, the differences between browsers are so extreme that a great deal of knowledge and experience must go into "defensive coding," so that HTML documents will look right on the widest possible range of browsers. Since browser technology is evolving with blinding speed, a good HTML coder needs to be constantly experimenting with many new browser versions.

There are so many excellent, free resources for learning HTML on the Web, it is amazing that so many expensive books are sold on the topic. Two good places to start are the MIT HTML tutorial and Gordon Hake's W3 Writer tutorial.

Those Web pages each give links to numerous resources on virtually every aspect of HTML theory and practice, including the current orthodox HTML version 2.0 specification and "extensions" to HTML proposed by various vendors. HTML theory and practice are constantly evolving, so check back from time to time for new developments. A good place to start is the HTML section on Yahoo.



4. CGI: Common Gateway Interface

The Common Gateway Interface (CGI) is an Internet standard for external gateway programs that interface with information servers such as HTTP servers. In other words, it is a standardized way to write small programs, called "CGI scripts," that run on the computer where the Web server program is running whenever a user follows a hypertext link that invokes them. The current version is designated CGI/1.1.

The CGI script can do something all by itself, such as create a custom HTML page, which it returns to the Web server program for display to the user who clicked the CGI link. Alternatively, the CGI script can act as a gateway to another program running on the Web server's computer, such as a database program or word processing program. For instance, when the Web server detects that a CGI link has been clicked, it calls the CGI script and passes it information such as a query the user entered into an online form. The CGI script passes that data along to a database program, waits for an answer, and passes the answer back to the Web server program, which in turn passes the information back to the user's Web browser program. This process is generally invisible to the user.
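
As a deliberately minimal sketch of that flow, the script below -- written here in Python, though any CGI-capable language would serve, and using a made-up form field name "query" -- reads one field that the server passes in (via the QUERY_STRING environment variable, as the CGI standard specifies for GET requests) and returns a small custom HTML page. Its output goes back to the Web server, which relays it to the user's browser.

    #!/usr/bin/env python3
    # Minimal CGI sketch: read one form field and return a custom HTML page.
    # The field name "query" is only an illustration.
    import os
    import html
    from urllib.parse import parse_qs

    fields = parse_qs(os.environ.get("QUERY_STRING", ""))
    query = fields.get("query", ["(nothing)"])[0]

    # A CGI script emits an HTTP header block, a blank line, then the document.
    print("Content-Type: text/html")
    print()
    print("<html><body>")
    print("<p>You searched for: %s</p>" % html.escape(query))
    print("</body></html>")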

CGI scripts are most commonly written in "procedural," interpreted (non-compiled) languages like Perl or the UNIX Korn shell. They can also be written in a compiled language like Visual Basic or C++, in a specialty development language such as Borland's Delphi, or even as primitive DOS batch files. In our opinion, the best language for CGI scripts is the interpreted language REXX, which is available for VM, UNIX, Windows NT, and other network operating systems. For a good guide, see the Stanford Guide to Writing CGI Scripts in REXX and Perl.

CGI scripts can be fairly involved. On a server with limited RAM, for example, one can use a CGI script to implement a fast, memory-efficient custom database application that avoids the need to keep a large database program running. For an example, see the Immigration Lawyers on the Web Searchable Lawyer Database, which was programmed entirely in REXX.




Hake Internet Projects, LLC
Silver Spring, MD USA
301-589-7766 voice · 301-589-9798 fax
webmaster@hake.com



Copyright © 1995-97 Hake Internet Projects, LLC. All rights reserved worldwide.