HTML Basics
Handling special characters presents a bit of a challenge. For example, how can one use any of the markup characters like >, < or & in an HTML document without the browser thinking they are being used as markup? How am I able to write a phrase like the following?
Please remember to put a <p> at the beginning of every paragraph & a
</p> at the end.
Fortunately, SGML provides a mechanism for dealing with characters that are in some way problematic: entity references.
Entities are named parts of a marked-up document, which is another way of saying that an entity is a name that refers to some other bit of text (anything from a phrase to a novel). In the simplest case (and this is that simplest case), entities are simply short mnemonic codes that represent some other simple, but difficult to type or difficult to process, string of text. To use one of those names, simply prepend an & to the name and append a semicolon (these are known, not surprisingly, as entity references). The browser will then display the appropriate character.
For a large number of documents, the only entities needed are the four that HTML defines mainly for the special purpose of including in a document characters that could be confused with markup:
Entity  Meaning       Use      Which is rendered
amp     ampersand     &amp;    &
gt      greater than  &gt;     >
lt      less than     &lt;     <
quot    double quote  &quot;   "
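As a rough illustration of the four entities above, here is a minimal Python sketch. Python and its standard `html` module are not part of this tutorial; they are used here only as an assumed convenience to show which entity references would have to appear in the HTML source for the sample phrase to display as intended.

```python
import html

# The phrase as we want the reader to see it in the browser.
phrase = "Please remember to put a <p> at the beginning of every paragraph & a </p> at the end."

# html.escape replaces &, < and > (and, by default, quote characters)
# with the corresponding entity references.
escaped = html.escape(phrase)
print(escaped)

# html.unescape performs the reverse step, roughly what a browser does on display.
assert html.unescape(escaped) == phrase
```

The printed line is what one would type into the HTML document; the browser then shows the original phrase.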
This gets us past the simple problem of including markup characters in a text without their being interpreted as markup, but there are other uses for entities in HTML.
In principle, HTML has as its base character set the character set
known as ISO Latin 1, which contains many of the common accented
characters of a number of European languages. Unfortunately, using these
characters in an HTML document is more complicated than it might be. How can one use
characters like Ä, ø etc.?
HTML defines a mnemonic name for each of the accented characters. To use these, you simply put an & before the name and a semicolon after; thus
Die T&auml;tigkeit des Restaurators basiert in der materiellen
Bewahrung von Kultur- und Kunstg&uuml;tern im
&ouml;ffentlichen, kirchlichen und privaten Besitz durch
Untersuchung, Erfassung, Konservierung, Restaurierung, Wartung,
Beratung und Erforschung und der diesbez&uuml;glichen
Dokumentation. Die T&auml;tigkeit des Restaurators besteht in
Ausnahmef&auml;llen auch in der wissenschaftlich fundierten
Rekonstruktion von Kultur- und Kunstg&uuml;tern
is rendered:
Die Tätigkeit des Restaurators basiert in der materiellen
Bewahrung von Kultur- und Kunstgütern im öffentlichen,
kirchlichen und privaten Besitz durch Untersuchung, Erfassung,
Konservierung, Restaurierung, Wartung, Beratung und Erforschung und
der diesbezüglichen Dokumentation. Die Tätigkeit des
Restaurators besteht in Ausnahmefällen auch in der
wissenschaftlich fundierten Rekonstruktion von Kultur- und
Kunstgütern
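The substitution described above can be sketched mechanically. The following Python snippet is only an illustration under stated assumptions: it relies on the standard library table `html.entities.codepoint2name`, which follows the HTML 4.01 entity list (a later and larger list than the one this tutorial describes), and the helper name `to_named_refs` is hypothetical.

```python
from html.entities import codepoint2name

def to_named_refs(text):
    """Replace every character that has a mnemonic name with its entity reference."""
    parts = []
    for ch in text:
        # Look up the mnemonic name for this character's code point, if any.
        name = codepoint2name.get(ord(ch))
        parts.append(f"&{name};" if name else ch)
    return "".join(parts)

print(to_named_refs("Die Tätigkeit des Restaurators"))
print(to_named_refs("Kultur- und Kunstgütern"))
```

Note that the same table also covers the markup characters, so an ampersand in the input comes out as an `amp` entity reference.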
Unlike element tags, entity names are case sensitive:
&Uuml; refers to an upper case letter and
&uuml; to a lower case one. The ampersand,
&, is represented by &amp;, but there is no
&AMP;.
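Case sensitivity is easy to check against a table of entity names. The sketch below assumes Python's `html.entities.name2codepoint`, which mirrors the HTML 4.01 entity list rather than the exact HTML version discussed here; the helper `lookup` is a hypothetical name introduced for the example.

```python
from html.entities import name2codepoint

def lookup(name):
    """Return the character for an entity name, or None if the name is not defined."""
    codepoint = name2codepoint.get(name)
    return chr(codepoint) if codepoint is not None else None

# Names that differ only in case are distinct entries, or missing altogether.
for name in ("Uuml", "uuml", "amp", "AMP"):
    print(name, "->", lookup(name))
```

In this table the upper case and lower case umlaut names map to different characters, and the all-capitals ampersand name is simply absent.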
There are a number of characters that are in the ISO Latin 1 character set but are not in the set that have been assigned names (in this version of the HTML specification). These characters, which include some very useful symbols, are listed in the Table of Numeric character references for HTML. To use these, we need a slightly less human-friendly format: we prepend the characters & and # to the numeric value of the character (i.e. the position of the character in the ISO Latin 1 character set) and append a semicolon, which is much harder to say than to do.
In practice, simply look in the table, find the character you want and copy (or even cut and paste) the text from the rightmost column into your text:
&#191;Get it? If so, you win a prize of
&#165;5000&#189;, or
perhaps you'd prefer it in &#163;s
is rendered:
¿Get it? If so, you win a prize of ¥5000½, or perhaps you'd prefer it in £s
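The numeric form can also be generated rather than looked up, since the number is just the character's position in the character set. The following Python sketch is an illustration only: `to_numeric_refs` is a hypothetical helper, and it assumes that the Unicode code point returned by `ord` matches the ISO Latin 1 position, which holds for values up to 255.

```python
import html

def to_numeric_refs(text):
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

source = to_numeric_refs("¿Get it? If so, you win a prize of ¥5000½")
print(source)

# Decoding the references gives back the text a browser would display.
print(html.unescape(source))
```

The first printed line matches the hand-written source above; the second is the rendered text.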