Chapter 17. HTTP
The Hypertext Transfer Protocol (HTTP) is the language web clients and servers use
to communicate with each other. It is essentially the backbone of the
World Wide Web. While HTTP is largely the realm of server and client
programming, a firm understanding of HTTP is also important for CGI
programming. In addition, HTTP details sometimes surface to
users, for example, when server error codes are reported in a
browser window.
This chapter covers all the basics of HTTP. For absolutely complete
coverage of HTTP and all its surrounding technologies, see
HTTP: The Definitive Guide by David Gourley and
Brian Totty, with Marjorie Sayer, Sailu Reddy, and Anshu Aggarwal
(O'Reilly).
All HTTP transactions follow the same general format. Each client
request and server response has three parts: the request or response
line, a header section, and the entity body. The client initiates a
transaction as follows:
The client contacts the server at a designated port number (by
default, 80). It sends a document request by specifying an HTTP
command called a method, followed by a document
address, and an HTTP version number. For example:
GET /index.html HTTP/1.1
This makes use of the GET method to request the
document index.html using Version 1.1 of HTTP.
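As a rough illustration of the three fields in a request line, here is a minimal Python sketch of how a server might split one apart. The function name parse_request_line is hypothetical, and the parser is deliberately strict: it assumes exactly one space between fields and does not accept the older HTTP/0.9 form, which had no version field.

```python
def parse_request_line(line):
    """Split an HTTP request line into (method, address, version)."""
    # Drop the trailing CRLF, then split on single spaces.
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, address, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"unrecognized protocol version: {version!r}")
    return method, address, version


print(parse_request_line("GET /index.html HTTP/1.1\r\n"))
# ('GET', '/index.html', 'HTTP/1.1')
```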
HTTP methods are discussed in more detail later in this chapter.
Next, the client sends optional header information to inform the
server of its configuration and the document formats it will accept.
All header information is given line by line, each with a header name
and value. For example, this header information sent by the client
indicates its name and version number and specifies several document
preferences:
User-Agent: Mozilla/4.05(WinNT; I)
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
The client sends a blank line to end the header.
After sending the request and headers, the client may send additional
data. This data is mostly used by CGI programs that use the POST
method. It may also be used by clients like Netscape Navigator
Professional Edition to publish an edited page back onto the web
server.
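Putting the request line, headers, blank line, and optional data together, a complete request can be assembled by hand. The sketch below uses a hypothetical helper, build_request, which is not part of any library. It assumes HTTP/1.1, where the client must also send a Host header, and it adds a Content-Length header (so the caller should not supply one) to tell the server how many bytes of data follow the blank line. HTTP lines end in a carriage return plus line feed (CRLF).

```python
def build_request(method, address, headers, body=b""):
    """Assemble a raw HTTP/1.1 request as bytes (hypothetical helper)."""
    lines = [f"{method} {address} HTTP/1.1"]
    # One "Name: value" header per line.
    for name, value in headers.items():
        lines.append(f"{name}: {value}")
    if body:
        # Tell the server how many bytes of data follow the blank line.
        lines.append(f"Content-Length: {len(body)}")
    # CRLF ends the last header line; the second CRLF is the blank line.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# A POST to a hypothetical CGI script on a placeholder host.
request = build_request(
    "POST",
    "/cgi-bin/register.cgi",
    {"Host": "www.example.com",
     "Content-Type": "application/x-www-form-urlencoded"},
    b"name=Joe&city=Boston",
)
print(request.decode("ascii"))
```

A GET request with no data simply ends at the blank line.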
The server responds in the following way to the
client's request:
The server replies with a status line containing three fields: HTTP
version, status code, and description. The HTTP version indicates the
version of HTTP the server is using to respond. The status code is a
three-digit number that reports the result of the client's request.
The description following the status code is simply human-readable
text that describes the status code. For example:
HTTP/1.1 200 OK
This status line indicates that the server uses Version 1.1 of HTTP
in its response. A status code of 200 means that the
client's request was successful, and the requested
data will be supplied after the headers.
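On the client side, the status line can be taken apart the same way. In this minimal sketch (again a hypothetical helper), the split is limited to two so that a description containing spaces stays in one piece, and the status code is converted to an integer, since programs should act on the code rather than on the description text.

```python
def parse_status_line(line):
    """Split a status line into (version, status code, description)."""
    # maxsplit=2 keeps multi-word descriptions such as "Not Found" intact.
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"malformed status line: {line!r}")
    version, code = parts[0], parts[1]
    # The status code must be exactly three ASCII digits.
    if len(code) != 3 or not all(c in "0123456789" for c in code):
        raise ValueError(f"bad status code: {code!r}")
    # The description is only for humans, so tolerate an empty one.
    description = parts[2] if len(parts) == 3 else ""
    return version, int(code), description


print(parse_status_line("HTTP/1.1 200 OK\r\n"))
# ('HTTP/1.1', 200, 'OK')
```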
After the status line, the server sends header information to the
client about itself and the requested document. For example:
Date: Sun, 20 Sep 1998 08:17:58 GMT
Server: NCSA/1.5.2
Last-modified: Wed, 17 Jun 1998 21:53:08 GMT
Content-type: text/html
Content-length: 2482
A blank line ends the header. If the client's request is successful, the requested
data is sent. This data may be a copy of a file or the response from
a CGI program. If the client's request could not be
fulfilled, the additional data may be a human-readable explanation of
why the server could not fulfill the request.
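Because a blank line separates the header from the data, a client can split a complete response at the first empty line. The sketch below is a simplified, hypothetical parser. It assumes the whole response is already in memory, decodes header bytes as ISO-8859-1, lowercases header names (HTTP header names are case-insensitive, which is why Content-type and Content-Type mean the same thing), keeps only the last value of a repeated header, ignores folded continuation lines, and uses Content-length, when present, to trim the data.

```python
def parse_response(raw):
    """Split a raw HTTP response into (status line, headers, data)."""
    # The first CRLF CRLF marks the blank line that ends the header.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("response has no blank line ending the header")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive; store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    # If the server gave a length, keep exactly that many bytes of data.
    if "content-length" in headers:
        body = body[:int(headers["content-length"])]
    return status_line, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Server: NCSA/1.5.2\r\n"
       b"Content-type: text/html\r\n"
       b"Content-length: 13\r\n"
       b"\r\n"
       b"<p>Hello</p>\n")
status, headers, data = parse_response(raw)
print(status, headers["content-type"], data)
```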
In HTTP 1.0, after the server has finished sending the requested
data, it disconnects from the client, and the transaction is over
unless a Connection: Keep-Alive header is sent.
Beginning with HTTP 1.1, however, the default is for the server to
maintain the connection and allow the client to make additional
requests. Since many documents embed other documents (inline images,
frames, applets, etc.), this saves the overhead of the client having
to repeatedly connect to the same server just to draw a single page.
Under HTTP 1.1, therefore, the transaction might cycle back to the
beginning, until either the client or server explicitly closes the
connection.
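The difference between the two versions comes down to a default and an override. A minimal sketch of the decision follows; the function name is hypothetical, it assumes the headers are in a dictionary keyed by lowercase header name, and it treats any version other than HTTP/1.0 as following the HTTP 1.1 rule.

```python
def connection_persists(version, headers):
    """Return True if the connection should stay open after this response.

    `headers` maps lowercase header names to their values.
    """
    # Connection can carry several comma-separated tokens, compared case-insensitively.
    tokens = {t.strip().lower() for t in headers.get("connection", "").split(",")}
    if version == "HTTP/1.0":
        # HTTP 1.0: close by default unless keep-alive was requested.
        return "keep-alive" in tokens
    # HTTP 1.1 (assumed for any other version): stay open unless "close" is sent.
    return "close" not in tokens


print(connection_persists("HTTP/1.0", {}))                           # False
print(connection_persists("HTTP/1.0", {"connection": "Keep-Alive"}))  # True
print(connection_persists("HTTP/1.1", {}))                           # True
print(connection_persists("HTTP/1.1", {"connection": "close"}))      # False
```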
Being a stateless protocol, HTTP does not maintain any information
from one transaction to the next, so the next transaction needs to
start all over again. The advantage is that an HTTP server can serve
many more clients in a given period of time, since
there's no additional overhead for tracking sessions
from one connection to the next. The disadvantage is that more
elaborate CGI programs need to use hidden input fields (as described
in Chapter 6) or mechanisms such as cookies
(described later in this chapter) to maintain information from one
transaction to the next.
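To make the statelessness concrete, here is a hypothetical sketch of a CGI-style handler that remembers nothing between requests. Instead, it writes its state (a simple counter) into a hidden input field, so the browser sends the value back with the next form submission. The script path /cgi-bin/counter.cgi and the field name count are placeholders, and the function only builds the HTML fragment; it does not implement the CGI interface itself.

```python
from html import escape
from urllib.parse import parse_qs


def handle_form(form_data):
    """Build the next form page from submitted, URL-encoded form data."""
    fields = parse_qs(form_data)
    # No server-side session: the current count arrives with the request.
    count = int(fields.get("count", ["0"])[0]) + 1
    # Store the updated count in a hidden field for the next request to carry.
    return (
        '<form method="POST" action="/cgi-bin/counter.cgi">\n'
        f'<input type="hidden" name="count" value="{escape(str(count))}">\n'
        '<input type="submit" value="Next">\n'
        "</form>\n"
    )


print(handle_form("count=2"))
```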