MathML and MathType

Paul Topping pault@dessci.com

Design Science, Inc. http://www.mathtype.com

January 21, 1999

Currently, there is no effective way of expressing standard mathematical notation in Web pages. Equations can be displayed as GIF images, but they print poorly, slow down page downloads, and don't adapt to the browser user's font choices.

MathML is a potential solution to the problem. Since April 1998, MathML has been a W3C (World Wide Web Consortium) Recommendation for the representation of mathematics. It is based on XML (Extensible Markup Language), a successor to HTML (Hypertext Markup Language), the language of the Web. MathML can express both the presentation of mathematics and, up through high school level mathematics, its meaning. MathML is human-readable but is designed to be written by software rather than by humans.

MathType 4.0 can generate MathML for use in authoring Web pages with mathematics. It will do so via a new translator mechanism. Translators are defined using a simple language and may be customized by the end user. Several MathML translator definition files are supplied with MathType 4.0 and will produce MathML presentation tags.

With support for XML/MathML by the major browser vendors and authoring tool suppliers, MathML will be a good mechanism for bringing mathematics to the Web.

- Mathematics on the Web
- XML and MathML
- Support for MathML in MathType 4.0
- Web Browser Support for MathML
- References

Standard mathematical notation is used by millions of educators, students, engineers, scientists, and businesspeople around the world. It is the language of science. Although almost all modern word processing software supports the creation and editing of mathematical notation, there is virtually no support for it in Web page authoring software. The main reason for this gap is that HTML (Hypertext Markup Language, the language used to define Web pages) provides no way of expressing math notation.

Today, most Web pages that include math are created by adding links to GIF images of equations. GIF is the Graphics Interchange Format and is the standard image file format for line art (as opposed to JPEG, which is best for photographic images) on the Web. There are several tools available for creating equations as GIF images, including our MathType 3.5 product.

However, GIF images of equations are far from ideal and have several disadvantages:

- Although browsers print normal text at the full resolution of the printer, much as a word processor would, GIF images are printed as bitmaps, showing no more detail than appears on the screen.
- The fonts used in a GIF equation are fixed at authoring time, while the font size of the body text of the document is under the browser user's control.
- Downloading of a page containing several GIF equations is relatively slow because GIF images are not an efficient storage format for equations and downloading each one requires a separate transaction with the Web server.
- It is impossible to do searches on textual characters or phrases in the equation as there is no text in the GIF image.
- It is impossible to transfer the equation to other software for manipulation as mathematics, as the mathematical structure is not preserved.

It has long been recognized by the designers of the World Wide Web that the right way to support mathematical notation is to make it part of the Web page language, HTML. Some years ago, Dave Raggett of the W3C (World Wide Web Consortium, the organization responsible for creating and disseminating the standards that define the Web) proposed an extension of HTML that would allow math to be expressed. For complicated reasons, the proposal was never accepted. Since then, two things have happened relevant to math on the Web:

- XML (Extensible Markup Language) has been created as a successor to HTML as the language of the Web. The key feature of XML is that it is designed to be extended into new domains, like mathematics.
- A W3C Math Working Group has written a specification for MathML [1], an XML-based language for expressing mathematical concepts. It was accepted by the W3C as a Recommendation in April 1998.

XML brings the power of SGML (Standard Generalized Markup Language) to the Web. SGML was invented to solve the problems that governments and other large institutions were having in managing the large volumes of textual data they have to deal with. HTML was actually designed using some of the ideas that originated with SGML and its predecessors. Its designers did not follow all the guidelines dictated by the SGML approach, because those guidelines were not important in the early days of the Web. Now that the Web is starting to mature, the advantages of the SGML approach are beginning to be appreciated by the Web community. Fortunately, because HTML already has many of SGML's features, moving to XML will be easier than it would otherwise be.

XML was developed as a successor to HTML with the following advantages:

- It allows the structure of textual data to be expressed, not just its formatting.
- It can be extended to cover new kinds of Web data, such as mathematics.

While a detailed description of XML is beyond the scope of this article (see [2] and [3] for good introductions), a simple example might convey the main idea. In HTML, one might show the author of a document as bold text by surrounding it with bold tags: `<b>John Q. Public</b>`. In XML, you might surround the name with "author" tags: `<author>John Q. Public</author>`. The formatting of the name would be specified by associating font, size, and character style information with the author tag via a style sheet mechanism (currently CSS [4], eventually XSL [5]). The important difference between the HTML and XML ways of handling the author's name is that XML captures the meaning of the chunk of text. One obvious application of this is in searching. With XML documents, one could search for all documents with a given author. With HTML, the best you could hope for would be to find all documents that contain the author's name, which would include documents that merely reference the author's work.
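To make the searching point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The two documents and the `article` and `body` element names are invented for illustration; only the `author` element comes from the example above. A structural search looks inside `author` elements only, while an HTML-style text search also matches the document that merely cites the author.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML documents: one written by John Q. Public,
# one written by someone else that only mentions him.
DOCS = {
    "paper1.xml": "<article><author>John Q. Public</author>"
                  "<body>On widgets.</body></article>",
    "paper2.xml": "<article><author>Jane Roe</author>"
                  "<body>As John Q. Public showed, widgets are useful.</body></article>",
}

def by_author(docs, name):
    """Structural search: match the name only inside <author> elements."""
    return [filename for filename, source in docs.items()
            if any(a.text == name for a in ET.fromstring(source).iter("author"))]

def containing(docs, name):
    """HTML-style search: match the name anywhere in the document's text."""
    return [filename for filename, source in docs.items()
            if name in "".join(ET.fromstring(source).itertext())]

print(by_author(DOCS, "John Q. Public"))   # ['paper1.xml']
print(containing(DOCS, "John Q. Public"))  # ['paper1.xml', 'paper2.xml']
```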

As both major Web browser manufacturers, Microsoft and Netscape, have pledged support for XML, it is destined to become an important World Wide Web standard. Several useful projects have already been based on XML. Peter Murray-Rust's Chemical Markup Language [6] can be used to capture chemical structures. Microsoft has based its Channel Definition Format [7] on XML, allowing Web data to be "broadcast" to your browser. Others, including MathML, are in the works.

The MathML specification was written by the W3C Math Working Group [1]. In April 1998, the W3C raised it to Recommendation status.

MathML has two main goals: to present mathematical notation and to serve as a medium of exchange between scientific and mathematical software. Toward that end, MathML defines a set of XML elements and attributes (together called markup) that fall into two categories: presentation markup and content markup. Presentation markup describes mathematical expressions from the point of view of their two-dimensional layout, whereas content markup is intended to capture the meaning of the mathematics.

Because the body of mathematical knowledge and meaning is constantly expanding, it would be impossible to capture the meaning of all mathematics with 50 MathML elements and their attributes. To keep content markup to a reasonable size, the designers of MathML have restricted the mathematics it attempts to cover to high school level mathematics. This is probably adequate for most of the mathematics that can practically be exchanged between computer programs that generate and/or accept mathematical equations and calculate with them.
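Because content markup records what an expression means rather than how it looks, a program can compute with it directly. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` and handling only the `apply`, `plus`, `times`, `power`, `ci` and `cn` content elements; it is a toy evaluator, not a MathML processor. It encodes the left-hand side of x^2 + 4x + 4 = 0 in content markup and evaluates it for a chosen value of x.

```python
import operator
import xml.etree.ElementTree as ET
from functools import reduce

# Content markup for x^2 + 4x + 4 (the left-hand side only).
CONTENT = """
<apply>
  <plus/>
  <apply><power/><ci>x</ci><cn>2</cn></apply>
  <apply><times/><cn>4</cn><ci>x</ci></apply>
  <cn>4</cn>
</apply>
"""

# Each operator element maps to a binary Python function; n-ary plus and
# times are folded left to right, and power receives exactly two arguments.
OPERATORS = {"plus": operator.add, "times": operator.mul, "power": operator.pow}

def evaluate(elem, env):
    """Evaluate a tiny subset of content markup; env maps variable names to numbers."""
    if elem.tag == "cn":
        return float(elem.text)
    if elem.tag == "ci":
        return env[elem.text.strip()]
    if elem.tag == "apply":
        op, *args = list(elem)
        return reduce(OPERATORS[op.tag], [evaluate(arg, env) for arg in args])
    raise ValueError(f"unsupported element: {elem.tag}")

expr = ET.fromstring(CONTENT.strip())
print(evaluate(expr, {"x": -2}))  # 0.0, so x = -2 solves the equation
```

Presentation markup alone would not support this, because it records layout (for example, that a 2 is written as a superscript) rather than the operation of raising x to a power.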

For uses where expressing mathematical meaning is unimportant or impractical, MathML has presentation markup. Much of mathematical notation is ambiguous unless interpreted by human authors and readers, and even then only with respect to some sub-field of mathematics or science. For example, a bar over a letter might mean the inverse of some signal in electronics, whereas in other areas it might signify the value of a variable at the previous step of an iterative algorithm. The immediate goal of presentation markup is to describe mathematical notation just well enough for a Web browser (or an add-on software module to a Web browser) to display it.

Below is MathML presentation markup for the following simple equation:

$$x^{2} + 4x + 4 = 0$$

The presentation tags generally start with "m", followed by "o" for operator, "i" for identifier, "n" for number, and so on. The "mrow" tags group their contents into horizontal rows.

```xml
<mrow>
  <mrow>
    <msup>
      <mi>x</mi>
      <mn>2</mn>
    </msup>
    <mo>+</mo>
    <mrow>
      <mn>4</mn>
      <mo>&InvisibleTimes;</mo>
      <mi>x</mi>
    </mrow>
    <mo>+</mo>
    <mn>4</mn>
  </mrow>
  <mo>=</mo>
  <mn>0</mn>
</mrow>
```

Although this may seem verbose, remember that it is not intended that humans type this language. Instead, it is expected that it will be created using software tools like MathType.
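Since the markup is meant to be produced and consumed by software, even a short program can walk its structure. As a minimal sketch, assuming Python's standard `xml.etree.ElementTree` and handling only the `mrow`, `msup`, `mi`, `mn` and `mo` elements used in the example above, the following flattens the presentation markup back into linear text. The invisible-times entity is written here as the Unicode character U+2062, because a plain XML parser cannot resolve MathML entity names without the MathML DTD.

```python
import xml.etree.ElementTree as ET

INVISIBLE_TIMES = "\u2062"

# The presentation markup from the example, with U+2062 standing in
# for the invisible-times entity.
MATHML = """
<mrow>
  <mrow>
    <msup><mi>x</mi><mn>2</mn></msup>
    <mo>+</mo>
    <mrow><mn>4</mn><mo>\u2062</mo><mi>x</mi></mrow>
    <mo>+</mo>
    <mn>4</mn>
  </mrow>
  <mo>=</mo>
  <mn>0</mn>
</mrow>
"""

def to_linear(elem):
    """Render a small subset of presentation markup as linear text."""
    if elem.tag in ("mi", "mn", "mo"):
        text = (elem.text or "").strip()
        # Invisible times has no visible glyph, so it contributes nothing.
        return "" if text == INVISIBLE_TIMES else text
    if elem.tag == "mrow":
        return "".join(to_linear(child) for child in elem)
    if elem.tag == "msup":
        base, exponent = list(elem)
        return f"{to_linear(base)}^{to_linear(exponent)}"
    raise ValueError(f"unsupported element: {elem.tag}")

root = ET.fromstring(MATHML.strip())
print(to_linear(root))  # x^2+4x+4=0
```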

Once MathML becomes more of a reality (that is, once it is supported in browsers, editing tools, calculation applications, and so on), people will be able to use it to create some wonderful applications:

Proper browser support for MathML will allow technical documents, such as journal articles, to be created as web pages. These will be better than documents produced with current methods such as PDF (Adobe Acrobat), IBM Techexplorer, and Mathematica Reader, because these programs take over the entire browser window. With MathML, math can be copied from such web pages into the user's own work and used as the basis for further calculation and analysis.

It will be possible to create web pages that implement fancy calculators that show mathematical expressions in standard math notation. Other pages can demonstrate math, science, and engineering concepts where the user plugs in numbers and mathematical expressions, clicks a button, and sees the results presented graphically.

Teachers can prepare tests as web pages, making use of any of the techniques outlined above. As a counter to cheating, numbers and variables can be changed algorithmically in order to present a slightly different test to each student, while still distinguishing right answers from wrong ones.

We at Design Science are working on a major new version of our popular MathType software, with the Windows version due for release in February 1999. One of its most important new features is a powerful translator mechanism. Earlier versions of MathType have had built-in conversion to the TeX language, a powerful but hard-to-use typesetting system for technical documentation. In MathType 4.0, we have extended this capability to allow the translation of its equations into many other languages. The translation process is controlled by a "translator definition file", a text file containing simple translation commands. A MathType installation allows for any number of translator definition files, giving several powerful advantages:

- Translators for virtually any mathematical notation can be created. In fact, Design Science will be providing four versions of its TeX translator: Plain TeX, LaTeX, AMS-TeX, and AMS-LaTeX. Some MathType customers are using the mechanism to generate the math notation specified in several SGML DTDs (Document Type Definitions).
- New translators can be created from scratch or by renaming and modifying an existing translator.
- Translators may be exchanged among users via email and the Web.
- Design Science will be updating and improving its own translators and distributing them via its Web site, http://www.mathtype.com.
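MathType's translator definition language is not shown in this article, so the following is only a hypothetical sketch of the general idea, written in Python: each "translator" is a table of templates keyed by node type in a toy equation tree, and the same tree is rendered as TeX or as MathML simply by swapping tables. The tree format, the table names (`TEX_TRANSLATOR`, `MATHML_TRANSLATOR`) and the template syntax are all invented for illustration; they are not MathType's.

```python
# Hypothetical translator tables: one template per node kind.
# Python format syntax is used; doubled braces emit literal { and }.
TEX_TRANSLATOR = {
    "row": "{items}",
    "sup": "{base}^{{{exp}}}",
    "id": "{text}",
    "num": "{text}",
    "op": "{text}",
    "times": "",                    # implied multiplication prints nothing
}

MATHML_TRANSLATOR = {
    "row": "<mrow>{items}</mrow>",
    "sup": "<msup>{base}{exp}</msup>",
    "id": "<mi>{text}</mi>",
    "num": "<mn>{text}</mn>",
    "op": "<mo>{text}</mo>",
    "times": "<mo>&InvisibleTimes;</mo>",
}

# x^2 + 4x + 4 = 0 as a toy tree of (kind, payload) tuples.
EQUATION = ("row", [
    ("sup", ("id", "x"), ("num", "2")),
    ("op", "+"), ("num", "4"), ("times", ""), ("id", "x"),
    ("op", "+"), ("num", "4"),
    ("op", "="), ("num", "0"),
])

def translate(node, templates):
    """Render an equation tree with whichever translator table is supplied."""
    kind = node[0]
    if kind == "row":
        items = "".join(translate(child, templates) for child in node[1])
        return templates["row"].format(items=items)
    if kind == "sup":
        return templates["sup"].format(base=translate(node[1], templates),
                                       exp=translate(node[2], templates))
    return templates[kind].format(text=node[1])

print(translate(EQUATION, TEX_TRANSLATOR))     # x^{2}+4x+4=0
print(translate(EQUATION, MATHML_TRANSLATOR))
```

Because the translation logic never changes, supporting a new target language in this sketch means writing a new table rather than new code. Unlike hand-written presentation markup, this flat output does not group subexpressions into nested `mrow` elements; doing so would require a richer tree.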

MathType 4.0 includes several MathML translators, one for each of the web browser plug-ins that are currently available:

- **WebEQ compatible:** WebEQ is a suite of Java tools and applets for processing MathML and putting interactive math on the Web. WebEQ has the fullest MathML implementation available, lacking only some of the rarer symbols. The WebEQ compatible translator currently produces the best results with MathType 4.0. Visit the WebEQ web site at http://www.webeq.com/.
- **IBM Techexplorer compatible:** Techexplorer is both a plug-in and a stand-alone scientific document browser. Techexplorer has excellent LaTeX support and many other features, but it currently has only prototype MathML support. In particular, its symbol character support is very limited. Simple MathType equations will translate and render fine, but you shouldn't expect to use it with complicated expressions. Visit IBM's Techexplorer web site at http://www-4.ibm.com/software/network/techexplorer/.
- **Amaya compatible:** Amaya is the World Wide Web Consortium's testbed browser. Pre-compiled Amaya binaries are available only for Linux and Solaris, but Amaya has many interesting and avant-garde features arising from the ongoing Web technology development at the W3C. Amaya supports the presentation part of MathML, but uses non-standard entity names. See http://www.w3.org/.
- **STIX compatible:** The STIX compatible translator generates standard MathML, with some extra symbol character names coming from the STIX consortium. The STIX consortium is a collection of scientific and technical publishers who have submitted an extensive list of math character names (based on the tables in the MathML proposal) to the Unicode Committee. STIX has stated its intention to develop a freely available font set containing all of the official MathML character entities, together with the extensions it has proposed to Unicode. Consequently, the STIX character names come as close as possible to a definitive list of math entity names, and these are the names the MathType STIX-compatible MathML output translator uses.

All of these MathML translators are much the same. They differ chiefly in the "wrapper" code required by their corresponding plug-ins. Eventually, it is expected that MathType will ship with only one MathML translator.

Although MathML is now a W3C Recommendation, MathML support is somewhat experimental at this point for several reasons:

- Browsers do not directly support MathML (with the exception of Amaya, the W3C's experimental testbed browser).
- Support for the character entity sets upon which MathML relies is limited in current browsers.

We at Design Science and the W3C Math Working Group (Math WG) are working on solutions to these problems. To keep up-to-date on MathML support, visit the Math WG's page at http://www.w3.org/Math/.

Design Science will be updating its translators as MathML support in web browsers matures. Although MathType 4.0 will not contain a translator that can convert arbitrary MathML into MathType equations, it will be able to edit MathML material generated by its MathML translator. Eventually, we plan to support two-way conversion of both presentation and content markup.

Although there are browser plug-ins available that will display MathML (see Support for MathML in MathType 4.0), they do not provide a completely satisfactory solution for displaying MathML in web pages. Each of them has one or more of the following limitations:

- Limited character set. Many mathematical symbols cannot be displayed.
- Display of characters depends on particular fonts being installed on the client system.
- Printing is limited to screen resolution and, therefore, looks no better than GIF images.
- Math placed in a line of text does not align properly with the baseline of the text.
- Each plug-in requires "glue" code around each equation that invokes the plug-in.
- Equation display does not respect "ambient" browser properties like the user's font choices, screen/document width, etc.
- Clumsy document preparation steps.
- No links to calculation engines like Maple and Mathematica.

As of this writing, the makers of the most popular web browsers, Microsoft and Netscape, are devoting some time and energy to making it possible to add MathML support to their next-generation, version 5.0 browsers. Once this is complete, in theory, software developers will be able to add much more complete and powerful support for MathML into these browsers. We at Design Science expect this to happen by the end of 1999.

On a somewhat separate front, there are several technologies being worked on by various W3C groups for which the major browser makers have pledged support:

- SVG (Scalable Vector Graphics): This is an XML-based language for drawing vector graphics in web pages. It is expected to be at least as powerful as PostScript and to use a similar imaging model. Once browsers support SVG, script code can be embedded in web pages to convert MathML into SVG, essentially using SVG as a graphic output medium.
- DOM (Document Object Model): This is an API that scripts can use to access ambient page properties such as column width and font size. It also allows access to MathML chunks embedded in the page. The DOM is being developed in levels. Although current browsers claim to support DOM Level 1, it is not powerful enough to do some of the things needed for good MathML support. DOM Level 2 promises to rectify the situation.
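The combination described above (finding MathML chunks through the DOM, then drawing them with SVG) can be sketched roughly as follows. This is written in Python for brevity rather than as in-page script, and it is a toy under stated assumptions: it uses Python's `xml.dom.minidom` (a DOM Level 1 style implementation) on a made-up page, handles only `mi`, `mn`, `mo`, `mrow` and `msup`, and replaces real font measurement with invented metrics (a fixed glyph advance of 0.6 em, exponents raised by 0.4 em and scaled to 70%). A real converter would query the browser for font metrics and other ambient properties.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

SVG_NS = "http://www.w3.org/2000/svg"
BASE_SIZE = 16            # assumed base font size, in SVG user units
ADVANCE = 0.6             # assumed glyph width as a fraction of font size
INVISIBLE_TIMES = "\u2062"

# A made-up page with one MathML chunk (x^2 + 4x + 4 = 0) embedded in text.
PAGE = (
    "<html><body><p>Solve <math><mrow><mrow>"
    "<msup><mi>x</mi><mn>2</mn></msup><mo>+</mo>"
    "<mrow><mn>4</mn><mo>\u2062</mo><mi>x</mi></mrow><mo>+</mo><mn>4</mn>"
    "</mrow><mo>=</mo><mn>0</mn></mrow></math> for x.</p></body></html>"
)

def element_children(node):
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]

def text_of(node):
    return "".join(c.data for c in node.childNodes
                   if c.nodeType == c.TEXT_NODE).strip()

def layout(node, x, y, size, out):
    """Append SVG <text> elements for node at (x, y); return the next pen x."""
    tag = node.tagName
    if tag in ("mi", "mn", "mo"):
        text = text_of(node)
        if text == INVISIBLE_TIMES:
            return x  # no glyph and no advance
        style = "italic" if tag == "mi" else "normal"
        out.append(f'<text x="{x:g}" y="{y:g}" font-size="{size:g}" '
                   f'font-style="{style}">{escape(text)}</text>')
        return x + len(text) * size * ADVANCE
    if tag in ("math", "mrow"):
        for child in element_children(node):
            x = layout(child, x, y, size, out)
        return x
    if tag == "msup":
        base, exponent = element_children(node)
        x = layout(base, x, y, size, out)
        # Raise and shrink the exponent, roughly as a typesetter would.
        return layout(exponent, x, y - 0.4 * size, 0.7 * size, out)
    raise ValueError(f"unsupported element: {tag}")

def math_to_svg(math_node):
    out = []
    width = layout(math_node, 0, 1.5 * BASE_SIZE, BASE_SIZE, out)
    return (f'<svg xmlns="{SVG_NS}" width="{width:g}" height="{2 * BASE_SIZE:g}">'
            + "".join(out) + "</svg>")

doc = minidom.parseString(PAGE)
for math_node in doc.getElementsByTagName("math"):  # DOM Level 1 lookup
    print(math_to_svg(math_node))
```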

When (and if) some or all of the above becomes a reality, it should be possible for software developers to add MathML support to browsers in a completely standards-compliant way. Then it will simply be a matter of waiting for the promised support for these standards to materialize.

- [1] W3C's HTML-Math Working Group, http://www.w3.org/Math/
- [2] Presenting XML, Richard Light, 1997, Sams.net Publishing (ISBN 1-57521-334-6)
- [3] XML: Principles, Tools, and Techniques, Dan Connolly (ed.), 1997, O'Reilly and Assoc., Inc. (ISBN 1-56592-349-9)
- [4] Cascading Style Sheets (CSS), http://www.w3.org/TR/REC-CSS1
- [5] Extensible Stylesheet Language (XSL), http://www.w3.org/Submission/1997/13/
- [6] Chemical Markup Language (CML), http://xml.coverpages.org/cml.html
- [7] Channel Definition Format (CDF), http://en.wikipedia.org/wiki/Channel_Definition_Format
- [8] "HTML-Math", Robert R. Miner and Patrick D. F. Ion, article in [3].