Web Page Essentials

Creating XHTML documents
Understanding document type definitions
Using meta tags
Attaching external documents
Working with the body section
Using CSS for web page backgrounds
Commenting your work


Starting with the essentials

You might wonder what’s meant by this tutorial title: web page essentials. This tutorial will run through everything you need to do with a web page prior to working on the layout and content, including creating the initial documents, attaching external documents to HTML files, and dealing with the head section of the web page. Little of this is a thrill with regard to visual design, which is why many designers ignore the topics we’ll cover, or stick their fingers in their ears, hum loudly, and wish it would all go away (and then probably get rather odd looks from nearby colleagues). However, as the tutorial’s title states, everything we’ll be talking about is essential for any quality web page, even if you don’t see exciting things happening visually.

This tutorial also explores web page backgrounds, which, although they should be used sparingly and with caution, often come in handy. It’s worth bearing in mind that some aspects discussed here will crop up later in the book. For example, CSS techniques used to attach backgrounds to a web page can be used to attach a background to any web page element (be that a div, table, heading, or paragraph). But before we get into any CSS shenanigans, we’ll put our CSS cheerleading team on hold and look at how to properly construct an XHTML document.


Document defaults

As mentioned in tutorial 1, we’ll be working with XHTML markup in this book rather than HTML. Although XHTML markup differs slightly from HTML, the file suffix for XHTML web pages remains .html (or .htm if you swear by old-fashioned 8.3 DOS naming techniques). Although XHTML’s stricter rules make it easier to work with than HTML, you need to be aware of the differences in the basic document structure. In HTML, many designers are used to starting out with something like the following code:

<html>
<head>
<title></title>
</head>
<body>
</body>
</html>

But in XHTML, a basic, blank document awaiting content may well look like this (although there are variations):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title></title>
</head>
<body>
</body>
</html>

Although this is similar to the minimal HTML document, there are important differences. The most obvious is found at the beginning of the document: a DOCTYPE declaration that states what document type definition (DTD) you are following (and no, I’m not shouting; DOCTYPE is spelled in all caps according to the W3C).

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">


The DTD indicates to a web browser what markup you’re using, thereby enabling the browser to accurately display the document in question (or at least as accurately as it can—as shown in tutorial 9, browsers have various quirks, even when you’re using 100% validated markup).

Next is the html start tag, which contains both a namespace and a language declaration. The first of those is intended to reduce the ambiguity of defined elements within the web page. (In XML, elements can mean different things, depending on what technology is being used.) The language declaration indicates the (default) language used for the document’s contents. This can assist various devices, for example helping a screen reader pronounce words on a page correctly, rather than guessing what the language is. (Also, internal content can have language declarations applied to override the default, for example when embedding some French within an English page.) The xml:lang attribute is a reserved attribute of XML, while the lang attribute is a fallback, used for browsers that lack XML support. Should the values of the two attributes differ, xml:lang outranks lang.

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
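
The per-element override mentioned earlier might look something like the following sketch. The sentence and the French phrase are made up for illustration; the only parts that matter are the xml:lang and lang attributes, which are given the same value on the element whose content is in another language.

```html
<!-- The page default is English, set on the html start tag.
     This span overrides it for a single French phrase. -->
<p>She shrugged and said <span xml:lang="fr" lang="fr">c'est la vie</span>.</p>
```

Setting both attributes to the same value keeps XML-aware devices and those that only understand lang in agreement.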

You’ll also notice that a meta tag appears in the head section of the document:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

To pass validation tests, you must declare your content type, which can be done using this meta element. Here, the defined character set is UTF-8 (Unicode), the recommended default encoding, and one that supports many languages and characters (so many characters needn’t be converted to HTML entities). There are other sets in use, too, for the likes of Hebrew, Nordic, and Eastern European languages, and if you’re using them, the charset value would be changed accordingly. Although www.iana.org/assignments/character-sets provides a thorough character set listing, and www.czyborra.com/charsets/iso8859.html contains useful character set diagrams, it’s tricky to wade through it all, so listed here are some common values and their associated languages:

ISO-8859-1 (Latin1): Western European and American, including Afrikaans, Albanian, Basque, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, Galician, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish.
ISO-8859-2 (Latin2): Central and Eastern European, including Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak, and Slovene.
ISO-8859-3 (Latin3): Southern European, including Esperanto, Galician, Maltese, and Turkish. (See also ISO-8859-9.)
ISO-8859-4 (Latin4): Northern European, including Estonian, Greenlandic, Lappish, Latvian, and Lithuanian. (See also ISO-8859-10.)
ISO-8859-5: Cyrillic, including Bulgarian, Byelorussian, Macedonian, Russian, Serbian, and Ukrainian.
ISO-8859-6: Arabic.
ISO-8859-7: Modern Greek.
ISO-8859-8: Hebrew.
ISO-8859-9 (Latin5): Western European. Based on ISO-8859-1, but replaces Icelandic-specific characters with Turkish ones.
ISO-8859-10 (Latin6): Nordic, including Icelandic, Inuit, and Lappish.

For an overview of the ISO-8859 standard, see http://en.wikipedia.org/wiki/ISO_8859.
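
As a rough sketch of what changing the charset value looks like in practice, here is the blank XHTML document from earlier adapted for a hypothetical Polish-language page saved as ISO-8859-2. The choice of Polish and the pl language code are assumptions made purely for the example; only the charset value and the language attributes differ from the UTF-8 version.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="pl" lang="pl">
<head>
<!-- The charset value must match the encoding the file was actually saved in -->
<meta http-equiv="content-type" content="text/html; charset=iso-8859-2" />
<title></title>
</head>
<body>
</body>
</html>
```

If the declared character set and the file’s real encoding disagree, accented characters are likely to display incorrectly, so it’s worth checking your editor’s save settings against the meta element.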


DOCTYPE declarations explained

XHTML 1.0 offers you three choices of DOCTYPE declaration: XHTML Strict, XHTML Transitional, and XHTML Frameset. In the initial example, the DOCTYPE declaration is the first thing in the web page. This is always how it should be—you should never have any content or HTML elements prior to the DOCTYPE declaration. (An exception is the XML declaration; see the section “What about the XML Declaration?” later in this tutorial.)

XHTML Strict
For code purists, this is the DTD that does not allow the use of presentational markup or deprecated elements:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

It forces a stricter way of working, but tends to ensure greater browser compatibility when you play by its rules.

XHTML Transitional
In common usage, this friendly DTD enables you to get away with using deprecated elements, and is useful for those rare occasions where you’d otherwise be banging your head against a brick wall, trying to work out how to get around using one of those few still-useful old tags:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Note that even if you end up solely using strict markup, the transitional DTD still ensures browsers generally render elements correctly.
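
To make the difference between the two DTDs a little more concrete, here is a small sketch. The first paragraph uses the deprecated align attribute and font element, which the Transitional DTD accepts but the Strict DTD rejects. The second moves the same presentation into CSS. The class name offer, the color value, and the text are hypothetical, chosen only for illustration.

```html
<!-- Accepted by XHTML Transitional only: presentational attribute and element -->
<p align="center"><font color="#990000">Special offer</font></p>

<!-- Accepted by both Strict and Transitional: presentation handled by CSS.
     The style element belongs in the head section of the document. -->
<style type="text/css">
.offer { text-align: center; color: #990000; }
</style>

<p class="offer">Special offer</p>
```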

XHTML Frameset
Frames are a relic, and are rarely used online. However, for backward compatibility and for those designers who still use them, there is a frameset-specific DTD (individual pages within a frameset require one of the aforementioned DTDs):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

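For reference, a minimal frameset document might look something like the following sketch. The file names navigation.html and content.html, and the column widths, are hypothetical. Note that the frameset element takes the place of the body, and the noframes element provides fallback content for devices that can’t display frames.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title></title>
</head>
<!-- Two columns: a fixed-width navigation frame and a content frame -->
<frameset cols="200,*">
<frame src="navigation.html" />
<frame src="content.html" />
<noframes>
<body>
<p>This site uses frames. <a href="content.html">Go to the content</a>.</p>
</body>
</noframes>
</frameset>
</html>
```
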
HTML DOCTYPEs
If you wish to work with HTML markup rather than XHTML, your documents still need a DOCTYPE to pass validation. The three DOCTYPEs for HTML 4.01 more or less match those for XHTML: Strict, Transitional, and Frameset.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">

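For comparison with the blank XHTML document shown earlier, a minimal HTML 4.01 Strict page might look like the following sketch. The placeholder paragraph is an assumption added for the example: unlike XHTML 1.0, HTML 4.01 Strict expects at least one block-level element inside the body. Note also that the meta element has no trailing slash, and there is no namespace or xml:lang attribute.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title></title>
</head>
<body>
<!-- Placeholder content so the body is not empty -->
<p>Content goes here.</p>
</body>
</html>
```
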
Partial DTDs
Always include full DTDs. Some older web design packages and online resources provide incomplete or outdated ones that can switch browsers into “quirks” mode, in which the browser renders your site as though it were written with browser-specific, old-fashioned markup and CSS (as opposed to complying strictly with web standards). The argument for quirks mode was largely down to backward compatibility. For example, it enabled Internet Explorer 6 to display CSS layouts with the box model used by Internet Explorer 5. This type of fix is today considered archaic; see tutorial 9 for modern methods of backward compatibility, including conditional comments. For more on quirks mode, read Wikipedia’s article at http://en.wikipedia.org/wiki/Quirks_mode. For the record, an example of an incomplete DTD looks like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"/DTD/xhtml1-transitional.dtd">

In this case, the URI (web address) is relative. Unless you have the DTD in the relevant place on your own website, the browser will display any page that includes this DTD in quirks mode. (And, quite frankly, if you do have the DTD on your website instead of using the one on the W3C’s site, you are very odd indeed.) The same thing happens if you leave out the DOCTYPE declaration entirely. Therefore, always include a DTD and always ensure it’s complete.