
# The Big Picture

## Presentation and Content

Think about trying to help a student with a math problem over the phone. Your first challenge is to make sure you are both talking about the same thing, and there are two natural approaches. You can say things like "use the chain rule to write down the derivative of f composed with g", or, if the student is really at sea, you can say "write f prime, open paren, g of x, close paren, g prime of x". The first method tries to communicate the sense or meaning, and leaves the notation up to the student. The second method tries to convey the notation, so that by looking at it, the student can grasp the sense.

In MathML, these two styles of encoding are called content encodings and presentation encodings. Which kind of encoding is most appropriate for a given task will depend on the situation. MathML allows an author to use either kind of encoding, or mix them in a hybrid.

There are 30 MathML presentation elements, with about 50 attributes. These elements encode mathematical notation. Most represent templates or patterns for laying out subexpressions. For example, the mfrac element forms a fraction from two expressions by putting one over the other with a line in between. Using presentation elements, you can precisely control how an expression will look when displayed in a browser or printed on paper. Unfortunately, as with any layout-based markup language, it is all too easy to get an expression to look right without getting the underlying structure right. In some cases this won't matter, but a badly encoded expression is less likely to be spoken properly by a voice synthesizer, evaluated in a computer algebra system, or used by other applications that need to know something of the sense of an expression rather than just its appearance.

For content markup, there are around 100 elements, with about a dozen attributes. Many of these elements come in families and represent mathematical operations and functions, such as plus and sin. Others represent mathematical objects like set and vector. Content markup is intended to facilitate applications other than display, such as computer algebra and speech synthesis. As a consequence, when using content markup, it is harder to control directly how an expression will be displayed.

The WebEQ editor is presently set up to generate presentation markup. It is possible to use it to edit content encodings as well, but that is not what it is currently designed to do.

## Expression Trees

If you look at a lot of math notation, you will soon notice that although there are many math symbols, there are only a few ways of arranging them: rows, subscripts and superscripts, fractions, matrices, and a few others. Of course, these notational patterns or schemata often appear nested inside one another, such as a square root of a fraction, and they generally have a number of parameters that depend on the context, such as the amount to shift a superscript for inline math vs. displayed math. The important point is that even complicated, nested expressions are built up from a handful of simple schemata.

MathML presentation elements encode the way an expression is built up from nested layout schemata. The best way to understand how this works is to look at an example:

`(a + b)²`

This expression naturally breaks into a "base," the (a + b), and a "script," which is the single character '2' in this case. The base decomposes further into a sequence of two characters and three symbols. Of course, the decomposition process terminates with indivisible expressions such as digits, letters, or other symbol characters.

The MathML presentation encoding of this expression is:

```
<msup>
  <mfenced>
    <mi>a</mi>
    <mo>+</mo>
    <mi>b</mi>
  </mfenced>
  <mn>2</mn>
</msup>
```

The top-level structure is an expression with a superscript. This is encoded by the fact that the outermost tags in the MathML mark-up are the `<msup>` and `</msup>` tags. The mark-up in between the start tag and the end tag defines the base and the superscript.

The first subexpression is an mfenced element, which displays its contents surrounded by parentheses. The second expression is the character 2, enclosed in `<mn>` tags, which tell a renderer to display it like a number. Similarly, the subexpressions contained in the mfenced element are all individual characters, wrapped in tags to indicate that they should be displayed as identifiers (`<mi>`) and operators (`<mo>`) respectively.
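
To make the earlier point about structure concrete, here is a minimal Python sketch (Python and its standard `xml.etree.ElementTree` module are assumptions of this illustration, not part of WebEQ). It compares the encoding above with a hypothetical careless encoding that would look much the same on screen but attaches the superscript only to the closing parenthesis. The helper name `superscript_base` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Structured encoding from the example above: the base is the whole fenced sum.
GOOD = "<msup><mfenced><mi>a</mi><mo>+</mo><mi>b</mi></mfenced><mn>2</mn></msup>"

# Hypothetical careless encoding: the superscript is attached only to ")".
BAD = ("<mrow><mo>(</mo><mi>a</mi><mo>+</mo><mi>b</mi>"
       "<msup><mo>)</mo><mn>2</mn></msup></mrow>")


def superscript_base(markup):
    """Return the text content of the base (first child) of the first msup."""
    root = ET.fromstring(markup)
    msup = root if root.tag == "msup" else root.find(".//msup")
    base = msup[0]  # by position, the first child of msup is the base
    return "".join(base.itertext())


print(superscript_base(GOOD))  # a+b
print(superscript_base(BAD))   # )
```

An application that needs the sense of the expression, such as a speech synthesizer, can recover "a plus b, squared" from the first encoding, but only "close paren, squared" from the second.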

Though we won't go into this until later, the content markup for the same example might be:

```
<apply>
  <power/>
  <apply>
    <plus/>
    <ci>a</ci>
    <ci>b</ci>
  </apply>
  <cn>2</cn>
</apply>
```

As you can see, content markup uses the same kind of syntax as presentation markup. Each layout schema or content construction corresponds to a pair of start and end tags (except for so-called empty elements like `<plus/>`, which we will encounter later). The markup for subexpressions is enclosed between the start and end tags, and the order in which they appear determines what roles they play; for example, the first child is the base and the second child is the superscript in an msup schema.
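
Because content markup records meaning and child order determines roles, a program can compute with it directly. The following Python sketch is an illustration only: it handles just the five elements used in the example, and the `evaluate` function and the variable values are assumptions introduced here. It is not a general MathML processor.

```python
import xml.etree.ElementTree as ET

CONTENT = """
<apply>
  <power/>
  <apply>
    <plus/>
    <ci>a</ci>
    <ci>b</ci>
  </apply>
  <cn>2</cn>
</apply>
"""


def evaluate(node, env):
    """Evaluate a tiny subset of content markup: apply, plus, power, ci, cn."""
    if node.tag == "cn":
        return float(node.text)
    if node.tag == "ci":
        return env[node.text.strip()]
    if node.tag == "apply":
        # The first child names the operator; the remaining children are operands.
        operator, *operands = list(node)
        values = [evaluate(child, env) for child in operands]
        if operator.tag == "plus":
            return sum(values)
        if operator.tag == "power":
            base, exponent = values  # order matters: base first, exponent second
            return base ** exponent
        raise ValueError(f"unsupported operator: {operator.tag}")
    raise ValueError(f"unsupported element: {node.tag}")


# Hypothetical values for the identifiers a and b.
result = evaluate(ET.fromstring(CONTENT.strip()), {"a": 1.0, "b": 2.0})
print(result)  # 9.0
```

Swapping the two operands of the outer `apply` would compute 2 raised to the power (a + b) instead, which is why the order of child elements is significant.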

As the indentation of the MathML examples suggests, it is natural to think about MathML expressions as tree structures. Each node in the tree corresponds to a particular layout schema, and its "branches" or child nodes correspond to its subexpressions.

This abstract expression tree is a handy thing to have in the back of your mind. It also describes how the MathML tags should be nested to encode the expression, and how typesetting "boxes" should be nested on the screen to display the notation.
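
One way to see the expression tree is to have a program walk the markup and print one line per node. The sketch below does this for the presentation example, assuming Python's standard `xml.etree.ElementTree` parser; the function name `tree_lines` is hypothetical.

```python
import xml.etree.ElementTree as ET

PRESENTATION = (
    "<msup><mfenced><mi>a</mi><mo>+</mo><mi>b</mi></mfenced><mn>2</mn></msup>"
)


def tree_lines(node, depth=0):
    """Return one indented line per node, visiting the tree depth-first."""
    text = (node.text or "").strip()
    label = f"{node.tag} '{text}'" if text else node.tag
    lines = ["  " * depth + label]
    for child in node:
        lines.extend(tree_lines(child, depth + 1))
    return lines


print("\n".join(tree_lines(ET.fromstring(PRESENTATION))))
# msup
#   mfenced
#     mi 'a'
#     mo '+'
#     mi 'b'
#   mn '2'
```

The printed indentation mirrors the nesting of the tags: each level of indentation is one step down the tree.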

## Next Steps

Before we go on, and start getting into the details, let's review the main points from this section:
• Presentation mark-up is for describing math notation, and content mark-up is for describing mathematical objects and functions.
• In presentation mark-up, expressions are built up using layout schemata, which tell how to arrange their subexpressions, e.g. as a fraction or a superscript.
• The way the MathML layout schemata are nested together is naturally described by an expression tree, where each node represents a particular schema, and its branches represent its subexpressions.

Now that we have an idea of the big picture, it is time to get more specific. The next section, Elements and Attributes, describes the syntax of MathML mark-up in more detail.