12.1 A Typical CGI Interaction
For an example of a CGI application, suppose you create a guestbook
for your web site. The guestbook page asks users to submit their
first name and last name using a fill-in form composed of two input
text fields. Figure 12-1 shows the form you might
see in your browser window.
The HTML that produces this form might read as follows:
<HTML><HEAD><TITLE>Guestbook</TITLE></HEAD>
<BODY>
<H1>Fill in my guestbook!</H1>
<FORM METHOD="GET" ACTION="/cgi-bin/guestbook.pl">
<PRE>
First Name: <INPUT TYPE="TEXT" NAME="firstname">
Last Name: <INPUT TYPE="TEXT" NAME="lastname">
<INPUT TYPE="SUBMIT"> <INPUT TYPE="RESET">
</PRE>
</FORM>
</BODY></HTML>
The form is written using special
"form" tags (discussed in detail in
Chapter 6):
The <form> tag defines the
method used for the form (either GET or POST)
and the action to take when the form is
submitted—that is, the URL of the CGI program to pass the
parameters to.
The <input> tag can be used in many
different ways. In its first two invocations, it creates a text input
field and defines the variable name to associate with the
field's contents when the form is submitted. The
first field is given the variable name
"firstname," and the second field
is given the name "lastname."
In its last two invocations, the <input> tag
creates a Submit button and a Reset button.
The </form> tag indicates the end of the
form.
When the user presses the Submit button, data entered into the
<input> text fields is passed to the CGI
program specified by the action attribute of the
<form> tag (in this case, the
/cgi-bin/guestbook.pl program).
12.1.1 Transferring the Form Data
Parameters to a CGI program
are transferred either in the URL or in the body text of the request.
The method used to pass parameters is determined by the
method attribute to the
<form> tag. The GET method says to transfer
the data within the URL itself; for example, under the GET method,
the browser might initiate the HTTP transaction as
follows:
GET /cgi-bin/guestbook.pl?firstname=Joe&lastname=Schmoe HTTP/1.1
See Chapter 17 for more information on HTTP
transactions.
The POST method says to use the body portion of the HTTP request to
pass parameters. The same transaction with the POST method would read
as follows:
POST /cgi-bin/guestbook.pl HTTP/1.1
... [More headers here]

firstname=Joe&lastname=Schmoe
In both
examples, you should recognize the firstname and
lastname variable names that were defined in the
HTML form, coupled with the values entered by the user. An ampersand
(&) is used to separate the variable=value pairs.
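As a rough illustration of how the variable=value pairs are assembled, the following Python sketch encodes the two guestbook fields and decodes them again using the standard urllib.parse module. The sample values, including the second name with a space and an ampersand in it, are assumptions chosen for the demonstration; the escaping of special characters it shows is standard form encoding, which this section does not otherwise cover.

```python
from urllib.parse import parse_qs, urlencode

# Field names come from the guestbook form; the values are sample input.
fields = {"firstname": "Joe", "lastname": "Schmoe"}

# Encode the pairs the way a browser does for a form submission:
# name=value, joined with "&".
query = urlencode(fields)
print(query)  # firstname=Joe&lastname=Schmoe

# Going the other way: split the string back into names and values.
# parse_qs returns a list per name because a name may appear more than once.
decoded = parse_qs(query)
print(decoded)

# Spaces and reserved characters are escaped so "&" and "=" stay unambiguous.
tricky = urlencode({"firstname": "Mary Ann", "lastname": "O&Brien"})
print(tricky)  # firstname=Mary+Ann&lastname=O%26Brien
```

The same string appears after the question mark in a GET request and as the body of a POST request.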
The server now passes the variable=value
pairs to the CGI program. It does this either through Unix
environment variables or in standard input (STDIN). If the CGI
program is called with the GET method, parameters are expected to be
embedded in the URL of the request, and the server transfers them to
the program by assigning them to the QUERY_STRING environment
variable. The CGI program can then retrieve the parameters from
QUERY_STRING as it would read any environment variable (for example,
from the %ENV associative array in Perl). If the
CGI program is called with the POST method, parameters are expected
to be embedded into the body of the request, and the server passes
the body text to the program as standard
input.
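The retrieval step might look something like the following Python sketch. It is one possible approach, not a definitive implementation: the function name read_form_data is hypothetical, and the sketch assumes the server sets the standard CGI variables REQUEST_METHOD and CONTENT_LENGTH alongside QUERY_STRING. The environment and input stream are passed in as arguments so the logic can be tried without a web server.

```python
import io
from urllib.parse import parse_qs


def read_form_data(environ, stdin):
    """Return the submitted form fields as a dict of name -> list of values."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the server feeds the request body to the program on STDIN.
        # CONTENT_LENGTH (assumed to be set by the server) gives its size in bytes.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("utf-8")
    else:
        # GET: the server copies everything after "?" into QUERY_STRING.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


# Simulated requests. A real CGI program would import os and sys and call
# read_form_data(os.environ, sys.stdin.buffer) instead.
get_env = {"REQUEST_METHOD": "GET",
           "QUERY_STRING": "firstname=Joe&lastname=Schmoe"}
print(read_form_data(get_env, io.BytesIO(b"")))

body = b"firstname=Joe&lastname=Schmoe"
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
print(read_form_data(post_env, io.BytesIO(body)))
```

Both calls yield the same fields, which is the point: the program's processing code need not care which method delivered the data.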
(Other environment variables defined by the server for CGI programs
are listed later in this chapter. These variables store such
information as the format and length of the input, the remote host,
the user, and various client information. They also store the server
name, the communication protocol, and the name of the software
running the server.)
The CGI program needs to retrieve the information as appropriate and
then process it. The sky's the limit on what the CGI
program actually does with the information it retrieves. It might
return an anagram of the user's name, or tell them
how many times their name uses the letter
"t," or it might just compile the
name into a list that the programmer regularly sells to
telemarketers. Only the programmer knows for sure.
12.1.2 Creating Virtual Documents
Regardless of
what the CGI program does with its input, it's
responsible for giving the browser something to display when
it's done. It must either create a new document to
be served to the browser or point to an existing document. On Unix,
a CGI program sends its output to standard output (STDOUT) as a data
stream that consists of two parts. The first part is either a full or
partial HTTP header that (at minimum) describes the format of the
returned data (e.g., HTML, ASCII text, GIF, etc.). A blank line
signifies the end of the header section. The second part is the body
of the output, which contains the data conforming to the format type
reflected in the header. For
example:
Content-type: text/html

<HTML>
<HEAD><TITLE>Thanks!</TITLE></HEAD>
<BODY><H1>Thanks for signing my guest book!</H1>
...
</BODY></HTML>
In this case, the only header line generated is
Content-type, which gives the media format of the
output as HTML (text/html). This line is essential
for every CGI program that returns a document, since it tells the
browser what kind of format to expect. The blank line separates the
header from the body text
(which, in this case, is in HTML format as advertised). See Chapter 17 for a listing of other media formats that are
commonly recognized on the Web.
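A program producing this kind of output could be sketched in Python as follows. The function name thank_you_page is hypothetical, and greeting the user by name is an assumption added for illustration (the sample above does not do so). Because the name comes from the user, the sketch escapes it before placing it in the page.

```python
import html


def thank_you_page(firstname, lastname):
    """Build the CGI output: header line, blank line, then the HTML body."""
    # Escape user input so that it cannot inject markup into the page.
    name = html.escape(f"{firstname} {lastname}")
    header = "Content-type: text/html\n"
    body = (
        "<HTML>\n"
        "<HEAD><TITLE>Thanks!</TITLE></HEAD>\n"
        f"<BODY><H1>Thanks for signing my guest book, {name}!</H1>\n"
        "</BODY></HTML>\n"
    )
    # The extra "\n" is the blank line that ends the header section.
    return header + "\n" + body


# A CGI program writes this to standard output for the server to pick up.
print(thank_you_page("Joe", "Schmoe"), end="")
```

Building the output as a single string keeps the header, the blank line, and the body in one place, which makes it harder to forget the separator.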
Notice that it does not matter to the web server what language the
CGI program is written in. On Unix platforms, the most popular
language for CGI programming is Perl. Other languages used on Unix
are C, C++, Tcl, and Python. On Macintosh computers, programmers use
AppleScript and C/C++, and on Microsoft Windows, programmers use
Visual Basic, Perl, and C/C++. As long as there's a
way in a programming language to get data from the server and send
data back, you can use it for CGI.
The server transfers the results of the CGI program back to the
browser. The body text is not modified or interpreted by the server
in any way, but the server generally supplies additional headers with
information such as the date, the name and version of the server,
etc. See Chapter 17 for a list of valid HTTP
response headers.
A CGI program can also supply a complete HTTP header itself, in which
case the server does not add any additional headers but instead
transfers the response verbatim as returned by the CGI program. (The
server may need to be configured to allow this behavior.)
Here is the sample output of a program generating an HTML virtual
document, with a complete HTTP header:
HTTP/1.1 200 OK
Date: Fri, 28 Jun 1996 11:12:21 GMT
Server: Apache/2.0.36
Content-type: text/html
Content-length: 2041

<HTML>
<HEAD><TITLE>Thanks!</TITLE></HEAD>
<BODY>
<H1>Thanks for signing my guestbook!</H1>
...
</BODY>
</HTML>
The header contains the communication protocol, the date and time of
the response, and the server name and version. (The 200
OK is a status code defined by
HTTP to communicate the status of a request, in this case
successful. See Chapter 17 for a list of valid HTTP
status codes.) Most importantly, it also contains the content type
and the length of the enclosed data in bytes (which equals the number
of characters only when every character occupies a single byte).
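To make the relationship between the header fields and the body concrete, here is a minimal Python sketch of a program assembling a complete response on its own. The function name full_response and the server string ExampleServer/1.0 are placeholders, not names from any real server. The sketch computes Content-length from the encoded body, since the header counts bytes and a character outside ASCII can occupy more than one byte.

```python
from email.utils import formatdate


def full_response(body_text, server="ExampleServer/1.0"):
    """Build a complete HTTP response with every header supplied by the program."""
    # Content-length counts bytes, so encode first and measure the result.
    body = body_text.encode("utf-8")
    headers = [
        "HTTP/1.1 200 OK",
        # formatdate(usegmt=True) yields a date such as
        # "Fri, 28 Jun 1996 11:12:21 GMT" for the current time.
        f"Date: {formatdate(usegmt=True)}",
        f"Server: {server}",
        "Content-type: text/html; charset=utf-8",
        f"Content-length: {len(body)}",
    ]
    # Header lines end with CRLF; an empty line separates headers from body.
    head = "\r\n".join(headers) + "\r\n\r\n"
    return head.encode("ascii") + body


page = "<H1>Thanks, Zo\u00eb!</H1>"
response = full_response(page)
# The character count and the byte count differ because of the accented letter.
print(len(page), "characters,", len(page.encode("utf-8")), "bytes")
```

Measuring the encoded body rather than the original string avoids advertising a length that is too short whenever the page contains non-ASCII text.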
As seen in Figure 12-2, the result is that after
users click the Submit button, they see the message contained in the
HTML section of the response thanking them for signing the guestbook.