
Elements and Attributes

HTML, XML, and MathML

Many people are somewhat familiar with HTML-style syntax. In HTML, one mixes keywords in angle brackets with the text to be displayed to indicate logical sections like paragraphs and titles. Different kinds of logical blocks display in different styles. Often, one can specify variants on a theme by adding attributes in the start tags of a particular block. For example, in HTML, the start and end tags <table> and </table> mark a table section, and you can specify variations by adding attributes like <table width="85%">.

MathML uses a very similar style of markup. In MathML, because of the nature of the subject matter, the ratio of tags to text is much higher than in HTML, but the start tag/end tag syntax and the use of attributes are the same.

There are a few small differences, which we will go over below. These stem from the fact that HTML syntax follows the rules of SGML, while MathML follows the rules of XML. Both SGML and XML are systems for defining markup languages like HTML and MathML. SGML has been around a long time, especially in industry and government. However, it is quite complicated, so a simplified version called XML, tailored to Web applications, has been formulated and is fast replacing SGML in many contexts.

There are many good reasons why MathML is an application of XML. Among them is that XML is widely supported in Web-related software of all kinds. By casting MathML as an XML application, it is possible to use standard browser extension machinery to implement math rendering.

The downside of XML-style syntax is that it is tedious and error-prone to enter it by hand, just like complicated HTML. However, with tools like MathType and MathFlow, it is generally not necessary to directly edit much MathML by hand.

A MathML Syntax Primer

In MathML there are two kinds of elements. Most elements have start and end tags of the form:

   <element_name> ... </element_name>
These elements can have other data in between the start and end tag, such as text, extended characters, or other elements.

The other type of MathML element is an empty element of the form:

   <element_name/>
These elements have just one tag, which looks like a hybrid between a start and an end tag.
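
To see the two element forms side by side, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree parser. The element names msqrt and mspace are chosen only for illustration and do not come from the text above. Python is used simply as a convenient way to check that the markup is well-formed XML.

```python
import xml.etree.ElementTree as ET

# An element with start and end tags, holding another element as content.
with_content = ET.fromstring("<msqrt><mi>x</mi></msqrt>")

# An empty element: a single tag and no content.
empty = ET.fromstring("<mspace/>")

# To an XML parser, an empty tag is equivalent to a start tag
# immediately followed by its end tag.
expanded = ET.fromstring("<mspace></mspace>")

# Number of child elements in each case.
child_counts = (len(with_content), len(empty), len(expanded))
print(child_counts)  # (1, 0, 0)

# Both spellings of the empty element serialize identically.
print(ET.tostring(empty) == ET.tostring(expanded))  # True
```

The empty-tag form is a shorthand, so a tool reading MathML should treat both spellings the same way.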

All MathML elements accept a few attributes, and some accept a dozen or more. Attributes generally specify additional optional information about the element. Each attribute has a name and a value. When used with an element that has both start and end tags, the attributes go in the start tag between the element name and the final >. In empty elements, attributes go in between the element name and the final />.

Attribute values must always be enclosed in quotes. In XML, either double or single quotes are permitted. For technical reasons involving how browsers work today, WebEQ tools generally use single quotes.

These two templates illustrate the general format for attributes:

   <element_name attrib_name1='val1' attrib_name2='val2' ... >
and
   <element_name attrib_name='value'/>

Most MathML attribute values are required to be in a particular format, such as a positive integer, or one of a short list of keywords like "true" and "false." The proper format for a given attribute is listed in the Presentation Element Reference section. WebEQ Editor and Publisher will also automatically generate the proper attribute format.
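
The attribute rules above can be checked with a short sketch, again assuming Python and its standard xml.etree.ElementTree parser. The attributes used here (linethickness on mfrac, width on mspace) are illustrative choices, and is_well_formed is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET

# The same element with single and with double quotes around the value.
single = ET.fromstring("<mfrac linethickness='2'><mi>a</mi><mi>b</mi></mfrac>")
double = ET.fromstring('<mfrac linethickness="2"><mi>a</mi><mi>b</mi></mfrac>')
print(single.attrib == double.attrib)  # True

# An empty element carries its attributes inside its single tag.
space = ET.fromstring("<mspace width='1em'/>")
print(space.attrib)  # {'width': '1em'}


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Leaving the quotes off an attribute value is an XML syntax error.
print(is_well_formed("<mspace width='1em'/>"))  # True
print(is_well_formed("<mspace width=1em/>"))    # False
```

Note that the parser only checks that a value is quoted. Whether the value has the proper format for that attribute (an integer, a keyword, and so on) is a separate MathML-level question that this sketch does not test.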

The final thing you need to know about MathML syntax is how the actual text and symbol characters needed for mathematical formulas are encoded. First of all, characters and symbols can only appear inside a handful of special MathML elements called token elements.

Consider this example:

   <mrow>
     <mi>a</mi>
     <mo>+</mo>
     <mi>b</mi>
   </mrow>
Most MathML elements, like the outer mrow element, expect to only find other MathML elements in their content. By contrast, the mi and mo elements are tokens, and their content consists of characters and symbols.
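
The distinction between token elements and other elements can be made concrete by walking the example as a tree. This is a minimal sketch in Python with the standard xml.etree.ElementTree parser. TOKEN_ELEMENTS is an illustrative subset of the token element names rather than the complete list, and describe is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

MATHML = """
<mrow>
  <mi>a</mi>
  <mo>+</mo>
  <mi>b</mi>
</mrow>
"""

# Illustrative subset of the MathML token elements (assumption: not exhaustive).
TOKEN_ELEMENTS = {"mi", "mn", "mo", "mtext"}


def describe(element, depth=0):
    """Return one line per element, marking tokens and containers."""
    indent = "  " * depth
    if element.tag in TOKEN_ELEMENTS:
        lines = [f"{indent}{element.tag}: token, text={element.text!r}"]
    else:
        lines = [f"{indent}{element.tag}: container, {len(element)} children"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


report = describe(ET.fromstring(MATHML))
print("\n".join(report))
# mrow: container, 3 children
#   mi: token, text='a'
#   mo: token, text='+'
#   mi: token, text='b'
```

The characters a, +, and b appear only as the text of mi and mo, while mrow contributes structure and holds no characters of its own apart from layout whitespace.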

Within token elements, one can have plain text characters, which display as themselves, or special entity references. Entity references are just keywords in a special format, which represent extended characters. Examples of entity references are &alpha; and &cap;, which stand for a lowercase Greek alpha and the intersection sign, respectively. MathML renderers like WebEQ, with access to symbol fonts, will display the actual extended character glyph in place of the entity reference.

The format for an entity reference is a keyword preceded by an ampersand (&) and followed by a semicolon (;). That is, a generic entity reference looks like: &entity_name;. It is also possible to use a numeric format, which refers to the symbol by its Unicode code point; for example, &#x3B1; denotes the lowercase Greek alpha.

Most of the MathML entity names are nearly identical to LaTeX symbol names. To write a LaTeX symbol such as \alpha in the form used by MathML, remove the initial backslash, add an ampersand to the beginning, and add a semicolon to the end of the word. Thus, \alpha becomes &alpha;.
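
The renaming rule, together with the numeric format, can be sketched in a few lines of Python. Both functions are hypothetical helpers written for this example. The lookup uses html.entities.name2codepoint from the Python standard library, which is the HTML 4 entity table and serves here only as a stand-in for the much longer MathML list. It happens to contain alpha and cap, but it should not be assumed to cover every MathML name.

```python
from html.entities import name2codepoint


def latex_to_entity(latex_name):
    """Apply the rule above: drop the backslash, add & and ; around the name."""
    if not latex_name.startswith("\\"):
        raise ValueError("expected a LaTeX name beginning with a backslash")
    return "&" + latex_name[1:] + ";"


def entity_to_numeric(entity):
    """Rewrite a named entity reference as a hexadecimal numeric reference."""
    name = entity[1:-1]                # strip the leading & and trailing ;
    codepoint = name2codepoint[name]   # KeyError if the name is not in the table
    return f"&#x{codepoint:04X};"


print(latex_to_entity(r"\alpha"))    # &alpha;
print(entity_to_numeric("&alpha;"))  # &#x03B1;
print(entity_to_numeric("&cap;"))    # &#x2229;
```

Because the names are only nearly identical, a mechanical conversion like this is a starting point, and each result should still be checked against the MathML entity list.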

The complete list of MathML entity references is long and comprehensive, with more than 1,800 symbols. WebEQ can render about 400 of these.

Next Steps

Since syntax without any substance is hard to focus on, here are the main points in review:
  • MathML elements either have start and end tags which enclose their content, or use a single empty tag.
  • Attributes may be specified in a start or empty tag. Attribute values must be enclosed in quotes.
  • All character data must be enclosed in token elements. Extended characters are encoded as entity references.
Now that we understand MathML syntax well enough to read it, in the next section, Boxes, Boxes and More Boxes, we turn our attention to the presentation elements, what they mean, and how they are used.
Copyright © 1996-2017 Design Science. All rights reserved.