HTML Basics
Stanford University Libraries & Academic Information Resources

Entity references

Handling special characters presents a bit of a challenge. For example, how can one use any of the markup characters like >, < or & in an HTML document without the browser thinking they are being used as markup? How am I able to write a phrase like the following?

    Please remember to put a <p> at the beginning of every paragraph & a
    </p> at the end.

Fortunately, SGML provides a mechanism for dealing with characters that are in some way problematic: entity references.

Entities are named parts of a marked-up document, which is another way of saying that each one is a name that refers to some other bit of text (anything from a phrase to a novel). In the simplest case (and this is that simplest case), entities are short mnemonic codes that represent some other string of text that is simple but difficult to type or difficult to process. To use one of those names, simply prepend an & to the name and append a semicolon (the result is known, not surprisingly, as an entity reference). The browser will then display the appropriate character.

For a large number of documents, the only entities needed are the four that HTML defines for including characters that could otherwise be confused with markup:

    Entity      Meaning             Use         Rendered as

    amp         ampersand           &amp;       &
    gt          greater than        &gt;        >
    lt          less than           &lt;        <
    quot        double quote        &quot;      "
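
The substitution in the table can also be done mechanically. The following is a minimal sketch in Python, assuming the standard library's html.escape function is an acceptable stand-in for doing the replacement by hand; the helper name escape_markup is hypothetical and chosen only for this illustration.

```python
import html


def escape_markup(text: str) -> str:
    """Replace &, < and > (and quotes) with entity references."""
    # quote=True also converts the double quote to &quot;
    # (and the single quote to a numeric reference).
    return html.escape(text, quote=True)


phrase = (
    "Please remember to put a <p> at the beginning of every paragraph & a "
    "</p> at the end."
)
escaped = escape_markup(phrase)
print(escaped)
```

The printed text is what would be typed into the HTML source so that the browser displays the original phrase rather than treating <p> and </p> as markup.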

This gets us past the simple problem of including markup characters in a text without their being interpreted as markup, but there are other uses for entities in HTML.

Characters with diacritics

In principle, HTML has as its base character set the character set known as ISO Latin 1, which contains many of the common accented characters of a number of European languages. Unfortunately, using these characters in an HTML document is more complicated than it might be. How can one use characters like Ä, ø etc.?

HTML defines a mnemonic name for each of the accented characters. To use one, you simply put an & before the name and a semicolon after it, thus

    Die T&auml;tigkeit des Restaurators basiert in der materiellen
    Bewahrung von Kultur- und Kunstg&uuml;tern im
    &ouml;ffentlichen, kirchlichen und privaten Besitz durch
    Untersuchung, Erfassung, Konservierung, Restaurierung, Wartung,
    Beratung und Erforschung und der diesbez&uuml;glichen
    Dokumentation. Die T&auml;tigkeit des Restaurators besteht in
    Ausnahmef&auml;llen auch in der wissenschaftlich fundierten
    Rekonstruktion von Kultur- und Kunstg&uuml;tern

is rendered:

    Die Tätigkeit des Restaurators basiert in der materiellen
    Bewahrung von Kultur- und Kunstgütern im öffentlichen,
    kirchlichen und privaten Besitz durch Untersuchung, Erfassung,
    Konservierung, Restaurierung, Wartung, Beratung und Erforschung und
    der diesbezüglichen Dokumentation. Die Tätigkeit des
    Restaurators besteht in Ausnahmefällen auch in der
    wissenschaftlich fundierten Rekonstruktion von Kultur- und
    Kunstgütern
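
What the browser does with these names can be imitated outside the browser. The sketch below assumes Python's standard html.unescape function as a stand-in for the browser's entity handling; the variable names are illustrative only, and the sample text is a few words taken from the passage above.

```python
import html

# A few words from the passage above, written with entity references.
source = "T&auml;tigkeit, Kunstg&uuml;tern, Ausnahmef&auml;llen"

# html.unescape replaces each entity reference with the character it names.
rendered = html.unescape(source)
print(rendered)
```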

Things to remember

Unlike element tags, entity names are case sensitive: &Uuml; refers to the upper case letter and &uuml; to the lower case one. The ampersand, &, is represented by &amp;, but there is no &AMP;.
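
Case sensitivity can be checked against a table of entity names. This sketch assumes Python's html.entities.name2codepoint dictionary, which holds the HTML 4 entity names, as a reasonable proxy for the set described here; later HTML specifications define additional names, so the result reflects only that older set.

```python
from html.entities import name2codepoint

# The same letters in different case name two different characters.
upper = chr(name2codepoint["Uuml"])  # upper case U with diaeresis
lower = chr(name2codepoint["uuml"])  # lower case u with diaeresis

# Only the lower case spelling of "amp" appears in this table.
has_lower_amp = "amp" in name2codepoint
has_upper_amp = "AMP" in name2codepoint

print(upper, lower, has_lower_amp, has_upper_amp)
```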

Other special characters

There are a number of characters that are in the ISO Latin 1 character set but are not in the set that have been assigned names (in this version of the HTML specification). These characters, which include some very useful symbols, are listed in the Table of Numeric character references for HTML. To use these, we need a slightly less human-friendly format: we prepend the characters & and # to the numeric value of the character (i.e. the position of the character in the ISO Latin 1 character set) and append a semicolon, which is much harder to say than to do.

In practice, simply look in the table, find the character you want, and copy (or even cut and paste) the text from the rightmost column into your text:

    &#191;Get it? If so, you win a prize of
    &#165;5000&#189;, or
    perhaps you'd prefer it in &#163;s

is rendered:

    ¿Get it? If so, you win a prize of ¥5000½, or perhaps you'd prefer it
    in £s
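
The numeric form can also be produced without consulting the table. The sketch below relies on the fact that, for ISO Latin 1 characters, the position in the character set equals the value returned by Python's ord; the helper name numeric_reference is hypothetical, and html.unescape is used only to confirm the round trip.

```python
import html


def numeric_reference(char: str) -> str:
    """Build a numeric character reference: &, #, the position, then ;"""
    return f"&#{ord(char)};"


# The four symbols used in the example above.
for symbol in "\u00bf\u00a5\u00bd\u00a3":  # inverted ?, yen, one half, pound
    print(symbol, numeric_reference(symbol))

# Decoding the references gives the characters back.
decoded = html.unescape("&#165;5000&#189;")
print(decoded)
```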