to STM Publishing

Paul Topping,

Design Science, Inc.

http://www.dessci.com

August 3, 2004

STM (Scientific, Technical, Medical) publishers, like all for-profit businesses, are looking to increase profits by either cutting costs, increasing sales, or both. To cut costs, they look to the application of improved computer and software technology (eg, introducing XML-based workflows). Increasing sales is a tougher problem requiring a creative solution. Product differentiation is harder in STM publishing than in most other business areas because competitors have access to the same authors and subject knowledge, while the print medium does not allow much scope for adding value.

The presence and success of the World Wide Web changes everything. With its ability to deliver information in many varied forms, it has the power to break the value addition logjam. So far, STM publishers have taken advantage of the new medium simply by using it to avoid the substantial cost of paper and printing. However, this approach does not add much value from the reader's perspective. Yes, a tree is saved, but surely the web has greater potential than that!

In this paper, we will show how value can be added by enriching online text with other kinds of information — mathematical meaning, in particular. While other media, such as video and sound, may be used to add value to web publishing, they are costly to produce and are not necessarily of interest to scientists and engineers. Authors normally do not include such media with their work and, therefore, STM publishers are solely responsible for adding them to the product. They usually do not have expertise in these technologies and must outsource such work, making it even more costly. On the other hand, since STM authors do supply mathematics with their work, incorporating mathematical meaning in web content mostly requires publishers to simply adjust their workflow so as not to discard it before it reaches the reader.

The most important document format for web publishing is, of course, HTML. While other media (eg, graphics, video, sound) can be embedded in web pages and other kinds of documents (eg, PDF, spreadsheets, word processing documents) can be delivered via the web, HTML is the glue that binds them all together. Although great strides have been made since the invention of HTML to add more powerful formatting facilities, HTML still has no facilities for formatting mathematical notation. This is a problem for STM publishers, as the presence of mathematical notation is one of the unique characteristics of STM content.

The World Wide Web Consortium (W3C) [1] sets most of the standards for the web. In 1998, the W3C's Math Working Group finished the MathML 1.0 Specification (superseded in 2001 by MathML 2.0 [2]). MathML is one of several XML-based languages intended by the W3C to extend HTML. To learn more about MathML, visit the W3C's math home page [3] or see our articles, "MathML for Math and Science Communication" [4] and "A Gentle Introduction to MathML" [5].

Although MathML is useful in any XML-based exchange of mathematical information, it was always the hope of the MathML community to see it displayed directly in web pages. As most observers of computer technology know, it is one thing to invent a standard but another to make software vendors support it. Up until recently, MathML support within browsers has been absent. But two events have occurred this year that combine to make HTML+MathML a viable platform for web publishing:

- **Netscape 7.0 and later includes native MathML display support.** Netscape 7.0 and later [6] is available for free download and is based on the Mozilla [7] open-source browser project.
- **MathPlayer 2.0 adds MathML support to Microsoft's Internet Explorer browser (Windows version 6.0 and later).** MathPlayer is available as a free download [8] from our web site.

While application of new software technology to the production process has great potential to cut costs, real savings are notoriously hard to realize. Adding value to the product, if it can be done, should be more attractive to the STM publisher.

The first stage in publishing's transition to the web is to use some form of electronic paper. Adobe's PDF as a content delivery medium exemplifies this approach. Its advantages include faithfulness to print and the ease with which it can be produced by an existing print workflow. It is perfect for delivering the electronic equivalent of print journals and books.

As good as PDF is, it has its disadvantages:

- Its faithfulness to print makes it harder to read online. Its columns of text do not reflow to adjust for different browser environments.
- The Acrobat Reader takes over the browser window making PDF content less integrated with other web content.
- The PDF format is limited in its ability to combine non-text media, such as video and sound.
- Although PDF text can be manipulated by the reader, mathematical notation must be displayed as graphics, limiting its usefulness to the reader.

The bottom line on PDF is that, while it is cheap to produce and duplicates print media well, it does not take full advantage of the potential represented by the web.

Scientists, engineers, researchers, and educators are the market for STM publications. Libraries may purchase them but they ultimately serve the same group. Unlike the readers of novels, whose sharing consists of the occasional book report, STM readers are driven by the desire to share information. Science and technology move forward by researchers making small steps and then reporting them to each other. Although words are by far the main medium for such information sharing, mathematics is also a key component. In many ways, the words are there to support the mathematics.

It is surprising, then, that publishers don't try to make the mathematical part of their content more useful to their readership. The reason, of course, is that today's STM readers are satisfied with just being able to read the math. However, just as scientists and engineers have adopted web searching as a primary tool, other web-enabled technologies and practices will soon become essential to their work. The ability to work with the mathematics in publications will soon be important to STM readers.

Although the ability to copy text is an important feature of most of the computer software we use, copying text from someone else's work is considered cheating even in the STM world. Although scientists want their ideas disseminated as widely as possible, they do not want their exact words duplicated (except in the context of a review, of course). Their attitude toward copying mathematics is different. Like their ideas, the math is present in the publication as a base for others to work with and build upon. Although math displayed as a graphic can be copied, it can't be calculated, analyzed, or graphed. When MathML is used to display math in the web page, on the other hand, the meaning of the mathematics is available and all of these operations become possible. This is the essence of our claim that MathML adds value to STM publishing.

The fraction, *x*/2, is represented in MathML as:

    <math>
      <mfrac>
        <mi>x</mi>
        <mn>2</mn>
      </mfrac>
    </math>

As you can see, MathML is somewhat verbose, like HTML, and, also like HTML, although it can be typed in directly, MathML is usually created using tools such as equation editors or converted from some other representation.
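As a rough illustration of how a tool might generate this markup instead of an author typing it, the following Python sketch builds the same fraction with the standard library's `xml.etree.ElementTree`. The helper name `build_fraction` is hypothetical and not part of any product mentioned here; real equation editors and converters do far more.

```python
import xml.etree.ElementTree as ET


def build_fraction(numerator: str, denominator: str) -> ET.Element:
    """Build Presentation MathML for a simple fraction (hypothetical helper)."""
    math = ET.Element("math")
    mfrac = ET.SubElement(math, "mfrac")
    # <mi> marks an identifier and <mn> a number, as in the example above.
    ET.SubElement(mfrac, "mi").text = numerator
    ET.SubElement(mfrac, "mn").text = denominator
    return math


markup = ET.tostring(build_fraction("x", "2"), encoding="unicode")
print(markup)
```

The sketch assumes the numerator is always an identifier and the denominator always a number, which is enough to reproduce the example but not a general rule.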

MathML consists of two sub-languages, Presentation MathML and Content MathML. Both kinds of MathML describe mathematical structure but with differing emphasis. The two sub-languages can be used separately or together.

Presentation MathML focuses on the formatting aspects of mathematical notation. The fraction example above uses Presentation MathML. The "mfrac" element specifies a particular notation, that of two sub-expressions separated by a horizontal or diagonal bar. Although this notation commonly means that the first sub-expression is divided by the second, only the notation to be used is specified by Presentation MathML.

Content MathML focuses on mathematical meaning. The following example uses Content MathML to express the mathematical operation commonly associated with the earlier example:

    <math>
      <apply>
        <divide/>
        <ci>x</ci>
        <cn>2</cn>
      </apply>
    </math>

Although this operation is commonly expressed in notation as a fraction, only the mathematical operation is being specified. Alternate notations exist for this operation (eg, *x* ÷ 2).
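Because Content MathML names the operation and not its appearance, software can compute with it directly. The sketch below is a minimal Python evaluator for the Content MathML example above. It assumes un-namespaced markup, handles only four arithmetic operators, and takes variable values from a dictionary; the function name `evaluate` and the `OPERATORS` table are illustrative choices, not part of MathML or of any product.

```python
import operator
import xml.etree.ElementTree as ET

# Only a handful of Content MathML operators are handled in this sketch.
OPERATORS = {
    "plus": operator.add,
    "minus": operator.sub,
    "times": operator.mul,
    "divide": operator.truediv,
}


def evaluate(node: ET.Element, variables: dict) -> float:
    """Recursively evaluate a small subset of Content MathML."""
    if node.tag == "math":
        return evaluate(node[0], variables)
    if node.tag == "cn":  # a number
        return float(node.text.strip())
    if node.tag == "ci":  # an identifier, looked up in the caller's dictionary
        return float(variables[node.text.strip()])
    if node.tag == "apply":  # first child is the operator, the rest are operands
        op = OPERATORS[node[0].tag]
        args = [evaluate(child, variables) for child in node[1:]]
        if len(args) < 2:
            raise ValueError("this sketch needs at least two operands")
        result = args[0]
        for arg in args[1:]:
            result = op(result, arg)
        return result
    raise ValueError(f"unsupported element: {node.tag}")


content = "<math> <apply> <divide/> <ci>x</ci> <cn>2</cn> </apply> </math>"
half = evaluate(ET.fromstring(content), {"x": 10})
print(half)  # 5.0
```

The same markup rendered as an image would give a program nothing to work with, which is the practical difference the two sub-languages are meant to address.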

What does MathML look like in a browser? The short answer is "like mathematics", of course. Here is a partial screen shot of MathML displayed in Internet Explorer using our MathPlayer software:

It is important to note that the non-math text shown above is displayed using plain old HTML, allowing mathematical notation to be fully integrated into normal web pages. If that were all MathML could do, it would still be an improvement over PDF because it doesn't take over the entire page in the browser. Text and math can reflow to fit the width of the browser window. If the user sets the browser to display text at a larger size, the math will also be displayed at that larger size.
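To make the idea of math sitting inside ordinary page text concrete, here is a small Python sketch that assembles an XHTML paragraph with a MathML fraction in the middle of a sentence. The two namespace URIs are the standard W3C ones; the function name `page_with_fraction`, the `m` prefix, and the sample sentence are assumptions made for illustration. How a particular browser or plug-in expects MathML to be declared in a page is outside the scope of this sketch.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def page_with_fraction() -> str:
    """Return a tiny XHTML page whose paragraph contains an inline fraction."""
    # Ask ElementTree to serialize XHTML as the default namespace and
    # MathML under the (arbitrary) prefix "m".
    ET.register_namespace("", XHTML_NS)
    ET.register_namespace("m", MATHML_NS)

    html = ET.Element(f"{{{XHTML_NS}}}html")
    body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
    para = ET.SubElement(body, f"{{{XHTML_NS}}}p")
    para.text = "Half of x is written "

    # The math is just another child of the paragraph, so it flows with the text.
    math = ET.SubElement(para, f"{{{MATHML_NS}}}math")
    mfrac = ET.SubElement(math, f"{{{MATHML_NS}}}mfrac")
    ET.SubElement(mfrac, f"{{{MATHML_NS}}}mi").text = "x"
    ET.SubElement(mfrac, f"{{{MATHML_NS}}}mn").text = "2"
    math.tail = " in the running text."

    return ET.tostring(html, encoding="unicode")


page = page_with_fraction()
print(page)
```

Since the equation is markup inside the paragraph and not a fixed-size image, a renderer is free to reflow and scale it along with the surrounding words.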

Above, we claimed that MathML makes it possible for the reader to do more with the mathematics. Let's look at how this works:

When the reader right-clicks on a MathPlayer-rendered equation, a menu is displayed. The Copy MathML command copies the underlying MathML of the equation to the Windows clipboard, ready to be pasted into any program that accepts MathML. The latest versions of the two major computer algebra systems, Mathematica [9] and Maple [10], both accept MathML via the clipboard. MathML may also be pasted directly into the reader's favorite HTML editor for use in new content and into WebEQ [11], Design Science's popular MathML editor.

MathPlayer 2.0 also has a Commands sub-menu on its right-click menu:

In the current version of MathPlayer, the commands are limited to opening the equation in MathType [12] or WebEQ [11], Design Science's own products. These items will only be enabled if the reader has the corresponding software product installed on their computer. In future versions of MathPlayer, we expect to add more commands to this menu that will allow the reader to directly calculate, graph, and analyze with the math. Remember, this is already possible with the current version using copy and paste via the clipboard. Now that MathML can be published on the web, we expect more software vendors to add MathML support to their products in the near future. We are working with such vendors, as well as readers and publishers, to help define future items on MathPlayer's Commands menu.

According to a Microsoft survey [13], 17% of computer users have a mild visual difficulty or impairment, and 9% have a severe visual difficulty or impairment. Visually impaired readers use applications called screen readers to speak the text on web pages using a computer-synthesized voice. The use of images such as GIFs to display mathematics in web pages prevents screen readers from speaking the mathematics, thereby preventing the reader from understanding the paper. Representing the math using MathML makes the math accessible.
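The following Python sketch suggests why markup helps here: with the structure available, a program can produce words for a speech synthesizer, which is impossible from the pixels of a GIF. It is a toy under stated assumptions, not a description of how MathPlayer or any screen reader works. The function name `speak`, the phrasing rules, and the small operator table are all hypothetical, and only a few Presentation MathML elements are covered.

```python
import xml.etree.ElementTree as ET

# Spoken names for a few operators; anything else is passed through as-is.
OPERATOR_WORDS = {"+": "plus", "-": "minus", "=": "equals"}


def speak(node: ET.Element) -> str:
    """Turn a small subset of Presentation MathML into words."""
    children = list(node)
    if node.tag in ("mi", "mn"):
        return (node.text or "").strip()
    if node.tag == "mo":
        symbol = (node.text or "").strip()
        return OPERATOR_WORDS.get(symbol, symbol)
    if node.tag == "mfrac":
        return f"{speak(children[0])} over {speak(children[1])}"
    if node.tag == "msup":
        return f"{speak(children[0])} to the power {speak(children[1])}"
    if node.tag == "msqrt":
        return "the square root of " + " ".join(speak(c) for c in children)
    # <math>, <mrow> and anything unrecognised: read the children in order.
    return " ".join(speak(c) for c in children)


fraction = "<math> <mfrac> <mi>x</mi> <mn>2</mn> </mfrac> </math>"
spoken = speak(ET.fromstring(fraction))
print(spoken)  # x over 2
```

Producing speech that is unambiguous for nested expressions is considerably harder than this, which is part of what makes the problem a research topic.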

In late 2003, Design Science was awarded an NSF grant to research ways of making mathematics accessible [14] to the visually impaired. This research has started to bear fruit. MathPlayer now enables the math in a web page to be read using the major screen readers for the Windows platform. While we still have a long way to go with this research, we've made a strong start.

Another benefit of using MathML to encapsulate math knowledge in online content is that it makes math searchable. Everyone knows the value of searching for text on the web. Why shouldn't searching for math also be valuable? After all, the mathematics in STM content carries a good deal of its meaning. In late 2003, Design Science was awarded another NSF grant to hold a workshop to gather requirements for math-based searching [15]. This workshop was attended by MathML experts, STM publishers, professional society members, and library scientists, making it evident that there is a lot of interest in math searching. Although this effort is still in its infancy, we expect it to become a powerful tool for future scientific research and education.
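One very simple way to picture math searching is structural matching: reduce each expression to a canonical string and look for the query among the sub-expressions of a document. The Python sketch below does exactly that for un-namespaced MathML. The names `signature` and `contains_math` are hypothetical, and exact subtree matching is an assumption chosen for brevity; it does not reflect the requirements gathered at the workshop, and a real search engine would need to cope with equivalent notations, renamed variables, and ranking.

```python
import xml.etree.ElementTree as ET


def signature(node: ET.Element) -> str:
    """Canonical string for an element: tag, text, and child signatures."""
    text = (node.text or "").strip()  # ignore whitespace used for layout
    inner = "".join(signature(child) for child in node)
    return f"({node.tag}:{text}{inner})"


def contains_math(document_mathml: str, query_mathml: str) -> bool:
    """True if the query occurs as a subtree of the document's math."""
    query = signature(ET.fromstring(query_mathml))
    document = ET.fromstring(document_mathml)
    return any(signature(node) == query for node in document.iter())


paper = (
    "<math><mrow><mi>y</mi><mo>=</mo>"
    "<mfrac><mi>x</mi><mn>2</mn></mfrac></mrow></math>"
)
hit = contains_math(paper, "<mfrac> <mi>x</mi> <mn>2</mn> </mfrac>")
miss = contains_math(paper, "<mfrac> <mi>x</mi> <mn>3</mn> </mfrac>")
print(hit, miss)  # True False
```

None of this is possible when the equation is stored as an image, since there is no structure to compare.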

Accessibility is already a requirement for most STM publications, especially those for education. In addition, government standards, such as Section 508 of the US government's Rehabilitation Act [16], require that online materials be accessible. For publications containing mathematics, MathML is a key technology for achieving accessibility.

We feel that STM publishers can increase the value of their products to readers by publishing online content in the HTML+MathML format. Scientists, engineers, educators, and students are driven by a need to share information with their colleagues. Mathematical notation is an important part of STM content, perhaps a defining characteristic. Recent advances in web browser technology have made HTML+MathML a viable delivery medium for STM content. All that remains is for publishers to take advantage of it.

In a related white paper, MathML Workflows in STM Publishing [17], we describe how MathML content can be created, edited, and published in the STM publishing context.

1. World Wide Web Consortium (W3C), http://www.w3.org
2. MathML 2.0, http://www.w3.org/TR/2001/REC-MathML2-20010221
3. W3C's math home page, http://www.w3.org/Math
4. "MathML for Math and Science Communication", http://www.dessci.com/en/reference/webmath/tech/mathml.htm
5. "A Gentle Introduction to MathML", http://www.dessci.com/en/reference/mathml/default.htm
6. Netscape 7.0 and later, http://channels.netscape.com/ns/browsers/default.jsp
7. Mozilla open-source browser project, http://www.mozilla.org
8. MathPlayer, http://www.dessci.com/mathplayer
9. Mathematica, http://www.wolfram.com
10. Maple, http://www.maplesoft.com
11. WebEQ, formerly http://www.dessci.com/en/products/webeq/ (since replaced by MathFlow Components)
12. MathType, http://www.dessci.com/en/products/mathtype/default.htm
13. Microsoft's accessibility statistics, http://www.microsoft.com/enable/research/computerusers.aspx
14. NSF grant to research ways of making mathematics accessible, http://www.dessci.com/en/company/press/releases/031209.htm
15. NSF grant to hold a workshop to gather requirements for math-based searching, http://www.dessci.com/en/company/press/releases/031201.htm
16. US government's Rehabilitation Act, Section 508, http://www.section508.gov/
17. "MathML Workflows in STM Publishing", http://www.dessci.com/en/reference/white_papers/mathml_workflows.htm