Going Wordless at the Advanced Mathematica Summer School
July 31, 2008 — André Kuzniarek, Director of Document and Media Systems
No, not a vow of silence, but rather, some suggestions about how to move documents from Microsoft Word into Mathematica.
A number of us Wolfram Research staffers contributed to our recent Summer School effort by sharing mentoring duties. In my case I worked with Richard Werthamer, a physicist who is publishing a book on the science of casino gambling strategies. His project includes programs verifying his research, and he’s eager to translate them into Mathematica in order to exploit all the new dynamics and plotting features of Version 6. At the same time, he quite naturally wants to move his existing manuscript into Mathematica notebook form to deliver a computable document, combining text and interactive Mathematica content distributable on the Mathematica Player platform.
Richard’s situation is pretty common. He prepared his manuscript with MS Word, and a great new feature delivered in Mathematica 6.0.3 allows for the exchange of MathML on the clipboard with MS Word 2007 straight “out of the box”. In other words, after creating a formula in Word using its new native math typesetting system, simply select the formula, copy, then switch to Mathematica to paste into a notebook.
The result will be in traditional typeset format, though set in bold because it is an input cell, which means it can be evaluated. The result of the evaluation depends on how unambiguous the input is, but there remains enormous potential for computation of otherwise static expressions. Wolfram Research led the development of MathML and it’s great to see MS Word embrace this standard for cross-application sharing of information. And of course, copy and paste works equally well in the reverse direction.
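The same round trip can be tried without the clipboard. The following is a minimal sketch, assuming a small hand-written Presentation MathML string as a stand-in for what Word would place on the clipboard; the variable names mathml and expr are illustrative only.

```mathematica
(* Hand-written MathML fragment standing in for a formula copied from Word 2007 *)
mathml = "<math xmlns='http://www.w3.org/1998/Math/MathML'><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn></math>";

(* Interpret the MathML string as a Mathematica expression *)
expr = ToExpression[mathml, MathMLForm]

(* Reverse direction: generate MathML text from the expression *)
ExportString[expr, "MathML"]
```

As with pasted formulas, how well the interpretation step works depends on how unambiguous the MathML is.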
In Richard’s specific case, we were dealing with a manuscript containing hundreds of inline and display formulas composed in an older version of Word using MathType. While MathType is no longer used by Word for math typesetting, legacy material can still exploit Mathematica’s MathML support with MS Word 2007 or older versions, as well as Mathematica’s import of XML documents, because like Mathematica, MathType has supported MathML translations for a number of years already. With the MathType window open, select the Preferences > Translators menu and configure MathML 2.0 with no namespace as the translation method. Formulas can then be copied to the clipboard and pasted into any recent version of Mathematica.
While copy and paste is fine for a small number of formulas, a whole book is better handled by exporting it as XML and importing the result into Mathematica. Using MathType in Word, select Publish to MathPage under the MathType menu, and choose “MathML using: XHTML+MathML” in the equations area of the dialog. In Mathematica, use Import[filename, "XML"]. Specify the "XML" argument explicitly, even if the file is named with the .xml extension, so that Mathematica does not parse it as the HTML variant that it is, but rather brings it in as SymbolicXML. The result is an expression that can be manipulated in any number of ways to format text and math formulas as a notebook expression.
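As a rough illustration of that import step, the sketch below pulls the formulas out of the resulting SymbolicXML. The file name manuscript.xml is hypothetical, and the pattern allows for the math tag appearing either as a plain string or as a {namespace, name} pair, since the exact form depends on how namespaces are declared in the exported file.

```mathematica
(* Hypothetical file name: the XHTML+MathML file written by Publish to MathPage *)
file = "manuscript.xml";

(* Name the "XML" format explicitly so the result is SymbolicXML *)
xml = Import[file, "XML"];

(* Collect every math element, whether or not its tag carries a namespace *)
formulas = Cases[xml, XMLElement["math" | {_, "math"}, _, _], Infinity];

(* How many formulas were found *)
Length[formulas]

(* Serialize the first formula back to MathML text and interpret it *)
If[formulas =!= {},
  ToExpression[ExportString[First[formulas], "XML"], MathMLForm]
]
```

Going through ExportString and ToExpression is just one convenient route that reuses the same MathML interpretation as a clipboard paste; other approaches are possible.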
Given SymbolicXML based on imported XHTML+MathML, the commands in this notebook offer a quick and simple extraction of text, headings, and math to get started on creating an interactive Mathematica document. I thank Buddy Ritchie in our document systems group for providing this code and comments.
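To give a flavor of what such an extraction can look like, here is a minimal sketch. It is not the code from the notebook mentioned above: the sample markup, the helper names textOf and toCell, and the choice of the "Section" and "Text" cell styles are all assumptions made for illustration.

```mathematica
(* Tiny invented stand-in for an imported XHTML+MathML document *)
sample = "<html><body><h1>Odds</h1><p>The house edge is small.</p></body></html>";
xml = ImportString[sample, "XML"];

(* Recursively gather character data, skipping embedded formulas *)
textOf[XMLElement["math" | {_, "math"}, _, _]] := "";
textOf[XMLElement[_, _, children_List]] := StringJoin[textOf /@ children];
textOf[s_String] := s;
textOf[_] := "";

(* Turn headings and paragraphs into notebook cells *)
toCell[e : XMLElement["h1" | {_, "h1"}, _, _]] := Cell[textOf[e], "Section"];
toCell[e : XMLElement["p" | {_, "p"}, _, _]] := Cell[textOf[e], "Text"];

(* Walk the document and convert each heading or paragraph element *)
cells = Cases[xml,
   e : XMLElement["h1" | "p" | {_, "h1" | "p"}, _, _] :> toCell[e],
   Infinity];

(* Open the cells as a new notebook *)
NotebookPut[Notebook[cells]]
```

A real manuscript would need more cases (further heading levels, inline formulas kept in place rather than skipped), but the same pattern of matching XMLElement expressions and emitting Cell expressions carries over.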
Postscript: At this time Word 2007 does not export XML/MathML directly on its own, producing instead XML with Microsoft’s OMML (Office Math Markup Language), which is an internal representation of math in Word documents. Microsoft provides XSLTs in the Microsoft Office application layout, specifically omml2mml.xsl, which can be used to process XML produced by Word into XHTML+MathML. In the future we will see about making this transformation happen directly using Mathematica import tools, but until then, further discussion about this conversion process and related resources can be found here:
• XHTML and MathML from Office 2007
• Science and Nature Have Difficulties with Word 2007 Mathematics