18.1 Understanding Apache
The Apache
distribution consists of the source for the core binary,
httpd, the standard set of modules, and numerous
additional header and configuration files. You can compile the server
for your particular architecture and preferences using the
configure, make, make install routine common to
building open source software. The latest version of
gcc or another up-to-date ANSI C compiler is
required to compile and build
Apache.
However, you may not have to compile Apache from source. Most Linux
distributions and Mac OS X ship with Apache already installed. Furthermore,
binaries are available for most popular platforms. Refer to
www.apache.org for details.
By itself, httpd doesn't do
more than listen for requests and deliver files as is. Apache is
designed to load special modules to implement
additional functionality. These modules define much of the behavior
of the Apache server. A set of standard modules is distributed with
the server, including a set of core modules that is automatically
compiled into the server binary. Apache will call on modules as
needed to perform a dedicated task, such as user authentication or
database queries.
18.1.1 Loading Modules
Modules
must be compiled first to be used by the server, and can be loaded in
two ways: statically or dynamically. Modules can be statically built
directly into the server binary at compile time:
./configure --enable-module
./configure --disable-base_module
./configure --enable-modules=module_list
Alternatively, you can compile modules as DSOs
(Dynamic Shared Objects) and load them as needed at run-time
(when the server is started or restarted) by identifying them with
the LoadModule directive in the configuration
file.
To build a module as a shared object when you compile the server, use:
./configure --enable-MODULE=shared
DSO modules may also be compiled with apxs
(Apache Extension Tool) at any time outside of the Apache source
tree. See the Apache documentation for full details on
apxs.
18.1.2 Server Configuration
At startup, Apache reads the main
server configuration file httpd.conf. You can
control the behavior of the server and its modules by inserting or
modifying the directives within this file. Additional configuration
can occur on a directory-specific level using
.htaccess files. These are configuration files
like httpd.conf, but the directives they contain
apply only to the directory where they reside. This allows for
delegation of control over separate content areas of a single server,
and may simplify server management.
The Apache server uses one other configuration file,
mime.types, to determine what MIME types should
be associated with what file suffixes (see Chapter 17).
The
configuration files contain directives, which are one-line commands
that tell the server what to do. In addition to the directives
themselves, the configuration files may contain any number of blank
lines or comment lines beginning with a hash mark
(#). Although directive names are not
case-sensitive, we use the case conventions in the default files.
Example copies of each of these files are included with the server
software distribution, which you can refer to for more information.
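The syntax rules above can be illustrated with a minimal Python sketch. It is not how Apache itself parses its configuration; it handles only simple one-line directives, and the sample text is made up for illustration.

```python
def parse_directives(text):
    """Return (name, arguments) pairs for each simple directive line."""
    directives = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines that begin with a hash mark.
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 1)
        name = parts[0]
        args = parts[1] if len(parts) > 1 else ""
        # Directive names are not case-sensitive, so normalize them.
        directives.append((name.lower(), args))
    return directives


# Hypothetical sample; section containers are out of scope for this sketch.
sample = """
# Basic server settings
Port 80

ServerName webnuts.oreilly.com
servername webnuts.oreilly.com
"""

parsed = parse_directives(sample)
```

Both spellings of ServerName come back under the same lowercase name, which reflects the case-insensitivity of directive names.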
The first things Apache needs from the configuration file are basics
like the listening port, server name, the default locations for
content, logs, and other important files, and what modules to load.
After that, the wider server functionality is configured. This
includes access control, virtual hosts, special resource handling,
and module-specific directives.
Here are some basic directives you might find in the
httpd.conf configuration file:
ServerType standalone
Port 80
ServerAdmin webmaster@oreilly.com
ServerName webnuts.oreilly.com
User nobody
Group nobody
Each directive here specifies a property of the
server's configuration and binds it to a default
setting or value. Since these directives exist on their own in the
configuration file, their context is that of the whole server. Many
directives will appear in special subsections that limit their scope.
Directives that define subsections are bracketed, XML-like elements.
For example:
<Directory /docs>
Deny From All
</Directory>
This configuration section sets a directive for requests to a single
directory /docs. Many configuration sections
apply to locations of files on the server, such as
<Files>,
<Location>, and
<Directory>. Other configuration sections
define virtual servers (<VirtualHost>) or
contain directives specific to a module
(<IfModule>).
All server configuration can occur in the
httpd.conf file, but you may want to allow
special configuration of only certain parts of your server—you
could let a user configure some aspects of how documents in her
directory are served. By default, Apache looks for
.htaccess files in every directory it serves a
file from. An .htaccess file may contain any
configuration directives that the server configuration file permits
through the AllowOverride directive. For example, if
httpd.conf contained the line:
AllowOverride AuthConfig
most of the directives from the user authorization modules
(Auth*) could be used in an
.htaccess file to limit access to the files in
that directory. This is exactly equivalent to using the same
directives within a specified <Directory>
section in httpd.conf.
Since .htaccess files affect the directory they
are in and any subdirectories, they have a cascading effect on
configuration. A directive in a lower-level
.htaccess works only if an
AllowOverride setting in
httpd.conf that covers that directory permits it. The cascade places
increased load on the server, which must search for .htaccess
files in the current and parent-level directories and parse them
for every request. If you want to
completely ignore .htaccess files, use
AllowOverride None in
httpd.conf.
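To see why this lookup adds work to every request, here is a minimal Python sketch that lists the .htaccess files that would be consulted for one file. It assumes overrides are enabled starting at a directory we call top; the function name and the paths are hypothetical.

```python
from pathlib import PurePosixPath


def htaccess_candidates(top, file_path):
    """List the .htaccess paths from `top` down to the file's directory."""
    top_dir = PurePosixPath(top)
    target_dir = PurePosixPath(file_path).parent
    # Raises ValueError if the file does not live under `top`.
    relative = target_dir.relative_to(top_dir)
    candidates = [top_dir / ".htaccess"]
    current = top_dir
    for part in relative.parts:
        current = current / part
        candidates.append(current / ".htaccess")
    return [str(path) for path in candidates]


# Hypothetical document tree, for illustration only.
paths = htaccess_candidates("/www/docs", "/www/docs/projects/golf/index.html")
```

One file three levels down already means three lookups, which is the cost that AllowOverride None avoids.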
18.1.3 Handling Requests
On Unix systems, the Apache daemon
httpd is normally started as the system superuser
(root). This is often done at startup through entries in the system
initialization files. On Windows, the Apache service is called
apache and runs with administrator privileges.
Once started, Apache's job is to listen for requests
on any address and port to which it has been configured. When
handling a request from a specific client, Apache spawns a separate
process to handle the connection. This spawned process, however,
doesn't run as the superuser; for security reasons,
it instead runs as a restricted user that serves files to the client.
Apache normally has five such processes waiting for connections;
hence, after startup, you will see one process
(httpd) running as root and five processes owned
by the Apache user ID, which stand ready to service requests. You can
reconfigure that number, as well as the minimum and maximum number of
idle service processes, with the StartServers,
MinSpareServers, and
MaxSpareServers directives. Each process handles
HTTP requests from clients, such as GET or POST, which
retrieve or modify content on the server.
All resources available to visiting browsers (HTML documents, images,
etc.) reside by default under a single root directory defined by the
DocumentRoot directive. This defines the base
directory that is prepended to a URL path to locate a file on the
server. Most URL mapping is as simple as locating a file under the
document root, but more complex mapping can be defined through
aliasing, redirection, and URL rewriting using the
mod_alias and mod_rewrite
modules.
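The simple case of URL mapping can be sketched in a few lines of Python. This is only an illustration of prepending the document root to a URL path; it ignores aliasing, redirection, and rewriting, and the document root shown is a hypothetical value.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def map_url_to_file(document_root, url):
    """Prepend the document root to the path portion of a URL."""
    path = unquote(urlsplit(url).path)
    # Collapse "." and ".." segments so the result stays under the root.
    normalized = posixpath.normpath("/" + path.lstrip("/"))
    return posixpath.join(document_root, normalized.lstrip("/"))


# Hypothetical DocumentRoot value.
DOCUMENT_ROOT = "/usr/local/www/htdocs"
mapped = map_url_to_file(DOCUMENT_ROOT, "http://www.oreilly.com/catalog/index.html?x=1")
```

The query string is dropped before mapping, since only the path portion of the URL identifies a file.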
18.1.4 Access Control
Webmasters often find the need to
restrict some or all of the data on their servers to authorized
users. Access can be controlled by requiring username and password
information or by restricting the originating IP address of the
client request. The mod_access and
mod_auth core modules provide basic access
control for Apache.
Access control is usually confined to specific directories of the
document tree. You can place authorization directives in
httpd.conf within
<Directory> sections, or within
.htaccess files in the restricted directory
itself (using AllowOverride AuthConfig).
This example shows the directives used to configure username and
password access to a specific directory:
<Directory /projects>
Options All
AuthType Basic
AuthName "Editorial Group"
AuthUserFile /usr/local/etc/httpd.conf/.htpasswd
AuthGroupFile /usr/local/etc/httpd.conf/.htgroup
require group editors
</Directory>
The AuthType
directive specifies the type of authentication used.
"Basic" authentication describes
the simple authorization scheme used by Apache where user password
files are created with the htpasswd program.
AuthName specifies the authorization
"Realm". The realm can describe
many different server locations so that an authorized user does not
have to re-supply his password information as he navigates.
AuthUserFile provides the user/password file
location, and AuthGroupFile provides the group
file location. require sets the restriction to
only members of the group
"editors".
The following configuration section limits access to a directory to
requests from a specific domain:
<Directory /projects/golf>
order deny,allow
deny from all
allow from .golf.org
</Directory>
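The effect of order deny,allow in the section above can be sketched in Python. This is a simplified model, not Apache's implementation: it assumes that rules are either the keyword all or a domain suffix with a leading dot, and the hostnames are hypothetical.

```python
def host_matches(rule, hostname):
    """Match the keyword 'all' or a domain suffix such as '.golf.org'."""
    rule = rule.lower()
    hostname = hostname.lower()
    if rule == "all":
        return True
    if rule.startswith("."):
        return hostname.endswith(rule)
    return hostname == rule


def is_allowed(hostname, deny_rules, allow_rules):
    """Model 'order deny,allow': deny rules first, then allow rules."""
    allowed = True  # a host that matches no rule is allowed in this order
    if any(host_matches(rule, hostname) for rule in deny_rules):
        allowed = False
    # Allow rules are evaluated last, so they override a matching deny.
    if any(host_matches(rule, hostname) for rule in allow_rules):
        allowed = True
    return allowed


deny_rules = ["all"]
allow_rules = [".golf.org"]
```

With these rules, a request from www.golf.org is admitted and a request from any other domain is refused, which matches the intent of the configuration section.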
18.1.5 Password and Group Files
A password file is needed for user and
group-level authentication. The location and name of the password
file are specified with the AuthUserFile
directive. The easiest and most common way to create a password file
or add passwords is to use the htpasswd program
that is distributed with the server. If a password file already
exists for a location, you can type:
htpasswd pathname username
The program then asks you to type the
password you wish for the given username twice, and the username and
encrypted password are stored in the file.
If a password file does not exist yet, you can create one by typing
the same command with the -c option (e.g.,
htpasswd -c pathname
username). But be careful, since the
-c option will create a new file without checking
if one already exists, thereby overwriting any existing passwords.
Password files created with htpasswd are
similar to Unix password files. Keep in mind, however, that there is
no correspondence between valid users and passwords on a Unix server,
and users and passwords on an Apache web server. You do not need an
account on the Unix server to access the web server.
You can bundle several users into a single named group by creating a
group file. The location and name of the group file are specified
with the AuthGroupFile directive. Each line of a
group file specifies the group name, followed by a colon, followed by
a list of valid usernames that belong to the group:
groupname: username1 username2 username3 ...
Each user in a group needs to be entered into the Apache password
file. When a group authentication is required, the server accepts any
valid username/password from the group.
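The group file format is simple enough to read with a short Python sketch. The function name, group names, and usernames below are hypothetical and serve only to illustrate the layout.

```python
def parse_group_file(text):
    """Map each group name to the list of usernames on its line."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, separator, members = line.partition(":")
        if not separator:
            continue  # not a "groupname: users" line
        # Usernames are separated by whitespace.
        groups.setdefault(name.strip(), []).extend(members.split())
    return groups


# Hypothetical group file contents.
sample_groups = """editors: val linda robert
production: sarah
"""

groups = parse_group_file(sample_groups)
```

A user such as val would still need an entry in the password file before the server accepted her as a member of editors.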
The htpasswd user authentication scheme is
known as the basic authentication method for
HTTP servers. Apache allows other types of authentication methods,
which are configured with a similar set of directives.
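On the wire, basic authentication sends the username and password joined by a colon and base64-encoded in an Authorization header. The Python sketch below shows that encoding from the client's side and the matching decode; the function names are our own, and the credentials are the familiar example pair from the HTTP authentication specification.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an Authorization header for Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_auth(header_value):
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the username from the password.
    username, _, password = decoded.partition(":")
    return username, password


header = basic_auth_header("Aladdin", "open sesame")
```

Because base64 is an encoding and not encryption, anyone who can read the request can recover the password.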
18.1.6 Virtual Hosting
Apache also has the ability to
perform virtual hosting. This allows a single
httpd process to serve multiple IP addresses or
hostnames. Virtual hosting may seem complicated, but it is
straightforward to set up. In each
configuration file, you can group directives that apply only to
a particular virtual host. For example, you can specify a separate
DocumentRoot directive for each virtual host,
such that someone connecting to www.oreilly.com
is served one set of documents, while another client connecting to
www.onlamp.com receives another, even though the
content for each of these sites is served by the same server on the
same machine.
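The idea can be sketched in Python as a lookup from the requested hostname to a document root. This models only name-based selection and is not how Apache chooses a virtual host internally; the fallback root is a hypothetical value.

```python
# Hostname-to-document-root table for the two sites discussed above.
VIRTUAL_HOSTS = {
    "www.oreilly.com": "/usr/local/www/virtual/htdocs/oreilly",
    "www.onlamp.com": "/usr/local/www/virtual/htdocs/onlamp",
}

# Hypothetical fallback for requests that match no virtual host.
DEFAULT_ROOT = "/usr/local/www/htdocs"


def document_root_for(host_header):
    """Pick a document root from the Host header of a request."""
    # Drop an optional ":port" suffix and ignore case.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_ROOT)


oreilly_root = document_root_for("www.oreilly.com")
onlamp_root = document_root_for("WWW.ONLAMP.COM:80")
```

Both lookups run in the same process, which mirrors the point that a single server answers for both sites.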
To create a virtual server, simply enclose
httpd.conf directives related to the server in a
<VirtualHost> section. Here is an example
httpd.conf configuration that will set up two
virtual servers:
ServerName www.oreilly.com
AccessConfig /dev/null
ResourceConfig /dev/null
<VirtualHost www.oreilly.com>
ServerAdmin webmaster@oreilly.com
DocumentRoot /usr/local/www/virtual/htdocs/oreilly
ServerName www.oreilly.com
ErrorLog /usr/local/www/virtual/htdocs/oreilly/error_log
TransferLog /usr/local/www/virtual/htdocs/oreilly/transfer_log
</VirtualHost>
<VirtualHost www.onlamp.com>
ServerAdmin webmaster@onlamp.com
DocumentRoot /usr/local/www/virtual/htdocs/onlamp
ServerName www.onlamp.com
ErrorLog /usr/local/www/virtual/htdocs/onlamp/error_log
TransferLog /usr/local/www/virtual/htdocs/onlamp/transfer_log
</VirtualHost>
18.1.7 Log Files
Apache creates two log files by
default: the error log and the access log. The
server's error log records any errors the server
encounters during execution. The access log records all client
requests made to the server. You can set the locations of these files
with the ErrorLog and CustomLog
directives.
Access logs are highly configurable. The LogFormat
directive allows you to specify which data is recorded for each
server transaction. For example, the following directive:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
configures the access log to record information in the Common Log
Format, which includes such data as the client host, user ID, time of
request, the request line, the status code of the server's
response, and the number of bytes sent.
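A log line in this format can be taken apart with a regular expression. The Python sketch below is one way to do it, assuming well-formed lines; the field names are our own labels for the format codes, and the sample line is illustrative.

```python
import re

# One named group per field of "%h %l %u %t \"%r\" %>s %b".
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_clf(line):
    """Return a dict of Common Log Format fields, or None if no match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" in the size field means no bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample_line = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
record = parse_clf(sample_line)
```

From here it is a short step to counting requests per host or totaling bytes sent.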