Elements and Attributes
HTML, XML, and MathML
Many people are somewhat familiar with HTML-style syntax. In HTML, one mixes
keywords in angle brackets with the text to be displayed to indicate logical
sections like paragraphs and titles. Different kinds of logical blocks display
in different styles. Often, one can specify variants on a theme by adding
attributes in the start tag of a particular block. For example, in HTML, the
start and end tags <table> and </table> mark a table section, and you can
specify variations by adding attributes, as in <table width="85%">.
MathML uses a very similar style of markup. In MathML, because of the nature
of the subject matter, the ratio of tags to text is much higher than in HTML,
but the start tag/end tag syntax and the use of attributes are the same.
There are a few small differences, which we will go over below. These stem
from the fact that the HTML syntax follows the rules of SGML while MathML
follows the rules of XML. Both SGML and XML are systems for defining markup
languages like HTML and MathML. SGML has been in use for a long time,
especially in industry and government. However, it is quite complicated, so a
simplified version tailored to Web applications, called XML, was formulated and
is fast replacing SGML in many contexts.
There are many good reasons why MathML is an application of XML. Among them
is that XML is widely supported in Web-related software of all kinds. By casting
MathML as an XML application, it is possible to use standard browser extension
machinery to implement math rendering.
The downside of XML-style syntax is that it is tedious and error-prone to
enter by hand, just like complicated HTML. However, with tools like MathType
and MathFlow, it is generally not necessary to edit much MathML directly by
hand.
A MathML Syntax Primer
In MathML there are two kinds of elements. Most elements have start
and end tags of the form:
<element_name> ... </element_name>
These elements can have other data in between the start and end tag, such as
text, extended characters, or other elements.
The other type of MathML element is an empty element of the form:
<element_name/>
These elements have just one tag, which looks like a hybrid between a start
and an end tag.
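As a rough illustration of the two element forms, the sketch below feeds both
to a generic XML parser. It assumes Python's standard xml.etree.ElementTree
module as the parser, which is not part of WebEQ, and uses the MathML elements
mi and mspace as sample element names.

```python
import xml.etree.ElementTree as ET

# An element with start and end tags enclosing its content.
with_content = ET.fromstring("<mi>x</mi>")

# An empty element written as a single tag, and the same element written
# as a start/end pair with nothing in between.
single_tag = ET.fromstring("<mspace/>")
tag_pair = ET.fromstring("<mspace></mspace>")

print(with_content.tag, repr(with_content.text))
print(single_tag.tag, len(single_tag), single_tag.text)
print(tag_pair.tag, len(tag_pair), tag_pair.text)
```

To this parser, the single-tag form and the empty start/end pair describe the
same element: neither has child elements or text.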
All MathML elements accept a few attributes, and some accept a dozen
or more. Attributes generally specify additional optional information about the
element. Each attribute has a name and a value. When used with an element that
has both start and end tags, the attributes go in the start tag between the
element name and the final >. In empty elements, attributes go
in between the element name and the final />.
Attribute values must always be enclosed in quotes. In XML, either double or
single quotes are permitted. For technical reasons involving how browsers work
today, WebEQ tools generally use single quotes.
These two templates illustrate the general format for attributes:
<element_name attrib_name1='val1' attrib_name2='val2' ... >
and
<element_name attrib_name='value'/>
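The quoting rule can be checked with any XML parser. The following is a
minimal sketch, again assuming Python's xml.etree.ElementTree rather than a
WebEQ tool; the mspace and mo elements and their width, height, and stretchy
attributes are used only as sample markup.

```python
import xml.etree.ElementTree as ET

# Single and double quotes are interchangeable around attribute values.
single = ET.fromstring("<mspace width='1em' height='2ex'/>")
double = ET.fromstring('<mspace width="1em" height="2ex"/>')
print(single.attrib == double.attrib)

# Attributes on a non-empty element go in the start tag.
in_start_tag = ET.fromstring("<mo stretchy='false'>(</mo>")
print(in_start_tag.get("stretchy"))

# An unquoted attribute value is not well-formed XML and is rejected.
try:
    ET.fromstring("<mo stretchy=false>(</mo>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False
print(unquoted_ok)
```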
Most MathML attribute values are required to be in a particular format, such
as a positive integer, or one of a short list of keywords like "true"
and "false." The proper format for a given attribute is listed in the Presentation
Element Reference section. WebEQ Editor and Publisher will also
automatically generate the proper attribute format.
The final thing you need to know about MathML syntax is how the actual text
and symbol characters needed for mathematical formulas are encoded. First of
all, characters and symbols can only appear inside a handful of special MathML
elements called token elements.
Consider this example:
<mrow>
<mi>a</mi>
<mo>+</mo>
<mi>b</mi>
</mrow>
Most MathML elements, like the outer mrow element, expect to
find only other MathML elements in their content. By contrast, the mi
and mo elements are tokens, and their content consists of
characters and symbols.
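One way to see the distinction is to parse the example and list the character
data found directly inside each element. This is only a sketch using Python's
xml.etree.ElementTree; the helper direct_text is a hypothetical name introduced
here for illustration.

```python
import xml.etree.ElementTree as ET

expr = ET.fromstring(
    "<mrow>\n"
    "  <mi>a</mi>\n"
    "  <mo>+</mo>\n"
    "  <mi>b</mi>\n"
    "</mrow>"
)

def direct_text(element):
    """Character data directly inside the element, ignoring whitespace."""
    parts = [element.text or ""] + [child.tail or "" for child in element]
    return "".join(parts).strip()

# Print each element's name, number of child elements, and direct text.
for element in expr.iter():
    print(element.tag, len(element), repr(direct_text(element)))
```

The mrow element reports three child elements and no text of its own, while
each mi and mo token reports no children and one character of content.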
Within token elements, one can have plain text characters, which display as
themselves, or special entity references. Entity references are just
keywords in a special format, which represent extended characters. Examples of
entity references are &alpha; and &cap;, which stand for a lowercase Greek
alpha and the intersection sign, respectively. MathML renderers like WebEQ,
with access to symbol fonts, will
display the actual extended character glyph in the place of the entity
reference.
The format for an entity reference is a keyword preceded by an ampersand
(&) and followed by a semicolon (;). That is, a generic entity reference
looks like: &entity_name;. It is also possible to use a numeric format, which
refers to the symbol by its Unicode code point (for example, &#x3B1; for a
lowercase alpha).
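The numeric format can be tried out with a generic XML parser. The sketch below
assumes Python's xml.etree.ElementTree. Note that this parser is given no
MathML DTD, so it knows nothing about the named entities, and the last step
shows it rejecting one; a MathML renderer supplies those definitions itself.

```python
import xml.etree.ElementTree as ET

# Hexadecimal and decimal numeric references to U+03B1, Greek small alpha.
hex_form = ET.fromstring("<mi>&#x3B1;</mi>")
dec_form = ET.fromstring("<mi>&#945;</mi>")
print(hex_form.text == dec_form.text == "\u03b1")

# A named entity is undefined to a parser that has not loaded the MathML
# entity declarations, so parsing fails here.
try:
    ET.fromstring("<mi>&alpha;</mi>")
    named_ok = True
except ET.ParseError:
    named_ok = False
print(named_ok)
```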
Most MathML entity names are nearly identical to LaTeX symbol names. To write
a LaTeX symbol such as \alpha in the form used by MathML, remove the initial
backslash, add an ampersand to the beginning, and add a semicolon to the end
of the word. Thus, \alpha becomes &alpha;.
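The renaming rule is mechanical enough to sketch in a few lines of Python. The
function latex_to_entity is a hypothetical helper written for this
illustration. To display the result, the sketch borrows Python's html.unescape,
which uses the HTML named-character table as a stand-in for the MathML entity
list; that the two tables agree is an assumption that holds for the two symbols
shown but, as "nearly identical" suggests, not for every LaTeX name.

```python
import html

def latex_to_entity(latex_name):
    """Turn a LaTeX symbol name such as \\alpha into the form &alpha;."""
    if not latex_name.startswith("\\"):
        raise ValueError("expected a LaTeX name starting with a backslash")
    # Drop the backslash, then add the ampersand and the semicolon.
    return "&" + latex_name[1:] + ";"

for name in (r"\alpha", r"\cap"):
    entity = latex_to_entity(name)
    # html.unescape is a stand-in lookup, not a MathML renderer.
    print(name, entity, html.unescape(entity))
```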
The complete list of MathML entity references is long and comprehensive, with
more than 1,800 symbols. WebEQ can render about 400 of these.
Next Steps
Since syntax without any substance is hard to focus on, here are the main points
in review:
- MathML elements either have start and end tags which enclose
their content, or use a single empty tag.
- Attributes may be specified in a start or empty tag. Attribute
values must be enclosed in quotes.
- All character data must be enclosed in token elements. Extended characters
are encoded as entity references.
Now that we understand MathML syntax well enough to read it, in the next
section, Boxes, Boxes and More Boxes, we turn our attention to the
presentation elements: what they mean and how they are used.