The Wolfram Language Worldwide Translations Project
December 18, 2015 — Eila Stiegler, Quality Analysis Manager, Wolfram|Alpha Quality Analysis
It has been quite a while since I graduated from college in Germany with a degree in mathematics. Of course, I have plenty of memories of long study nights, difficult homework assignments, and a general lack of a social life. But I also vividly remember having to take programming classes. I had done my best to avoid these for as long as I could. But when they became part of my curriculum, I could not continue ignoring them. Not being a native English speaker, I was not just dealing with the concept of programming, which was completely abstract to me—I also had to find my way around function names always given in English. Though I struggled in those classes, I successfully graduated, and years later am now part of a project that would have helped me tremendously back then: the Wolfram Language Worldwide Translations Project.
The Wolfram Language Worldwide Translations Project provides any non-English-speaking programming novice with an effortless way into the Wolfram Language. It aims to introduce the Wolfram Language while at the same time addressing any lack of English language skills.
How does one typically learn to program? In my experience, new students were given a piece of code with an explanation of its purpose. That way they could familiarize themselves with the coding structure and functions. To support this way of learning, Wolfram Research has added functionality that lets you enable Wolfram Language code annotations in a language of your choosing. This is an ongoing effort, and we plan to cover as many languages as possible. We have already added support for Japanese, Traditional and Simplified Chinese, Korean, Spanish, Russian, Ukrainian, Polish, German, French, and Portuguese.
As part of this project, we added menu translations in Traditional Chinese and Spanish to the already-available Japanese and Simplified Chinese translations.
Back to my struggling days: had I, for example, been given the code behind “Major Multinational Languages,” an example of the Wolfram Demonstrations Project, I would have been able to view the annotated version in German. The annotations do not change or limit the functionality of the code. It is still fully evaluatable and editable, with code captions updating on the fly:
If I felt adventurous and wanted to test my eight years of Russian, I could try that as well:
You can conveniently enable this new feature by cell, by notebook, or even globally. This way I was able to compare the same piece of code above, annotated in German, to its Russian version. You can find code annotations as part of our autocompletion as well. I opted for French in this case:
A look at string length differences
One issue raised at the very beginning of this project was the length of the translations. The descriptive, camel-cased English symbol names make it challenging to keep translations to a reasonable length.
Let’s take Spanish to illustrate the issue. The symbol String was translated as “cadena de caracteres.” This translation is already much longer than its English original. Since a multitude of system symbols contain the substring “String,” e.g., StringFreeQ and StringReplacePart, you can imagine what lengths these translations can reach.
Let’s compare the Russian and Japanese translations of $FrontEnd. Conveniently, the translations are not just accessible interactively but also programmatically through WolframLanguageData’s "Translations" property:
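As a rough sketch of that programmatic access, a lookup might be written as follows. The property name "Translations" comes from the text. Passing the symbol name as a string is an assumption, and the shape of the returned data is not described here, so inspect the result before building on it.

```wolfram
(* Request the translated captions for a single symbol.
   Assumption: the symbol can be specified by its name as a string. *)
translations = WolframLanguageData["$FrontEnd", "Translations"]

(* List the available properties to confirm that "Translations" is among them *)
WolframLanguageData["Properties"]
```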
The string lengths of the two translations differ by 66. That raises the question: how do our translations compare in length to their English counterparts? Let’s first load all Wolfram Language symbols as well as their translations:
Now let’s have a look at how the string lengths of the translations compare to the underlying English symbols:
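The comparison itself needs only a few built-in functions. The sketch below is not the code used for the plots in this post. It assumes the translations are already available as pairs of English name and translated caption, and lengthDifferences is a hypothetical helper name. The only real pair used is the Spanish one quoted earlier.

```wolfram
(* lengthDifferences is a hypothetical helper, not a built-in function *)
lengthDifferences[pairs : {{_String, _String} ..}] :=
  (StringLength[#[[2]]] - StringLength[#[[1]]]) & /@ pairs

(* One real pair from the text; extend this list with pairs
   obtained from WolframLanguageData *)
pairs = {{"String", "cadena de caracteres"}};

differences = lengthDifferences[pairs]
(* {14} *)

(* With a full list of pairs, a histogram shows the distribution *)
Histogram[differences]
```

Positive values mean the caption is longer than the symbol name, and negative values mean it is shorter.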
Clearly the Asian writing systems allow for much more condensed translations. On the other hand, digging deeper into the minimum and maximum of the string length differences between Polish and English, we can find the following (hover over the ListPlot for tooltips that compare the English names with their Polish translations):
There are currently 251 cases of translation pairs where the two elements match in length. Here are a few examples:
This is the pair with the greatest string length difference, an astounding 75:
Let’s look at string length differences in all available languages:
Given these discrepancies, we can find some interesting tidbits about languages and their relationships in this data.
Which language has the largest percentage of whitespace? Korean:
And which language beats all others in average string length? German:
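Both statistics are easy to compute once the captions for a language are collected in a list. The following is a minimal sketch under that assumption. The helper names whitespaceFraction and meanLength are hypothetical, and the single caption is the Spanish one quoted earlier, used only to show the calls.

```wolfram
(* Hypothetical helpers for summarizing a list of captions *)
whitespaceFraction[captions : {__String}] :=
  N[Total[StringCount[captions, WhitespaceCharacter]] /
    Total[StringLength[captions]]]

meanLength[captions : {__String}] := N[Mean[StringLength[captions]]]

(* The one caption quoted in the text *)
captions = {"cadena de caracteres"};

whitespaceFraction[captions]  (* 2 spaces in 20 characters: 0.1 *)
meanLength[captions]          (* 20. *)
```

Applying both helpers to each language's captions and sorting the results would give the kind of ranking described above.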
Here’s a quick visualization:
Given these discrepancies in translation lengths, how did we accommodate the differences in our interface? Let’s return to our example code for “Major Multinational Languages” with German code captions enabled. Any time the length of a code caption exceeds the length of the English original, we trim the caption with ···. Upon hovering over the code caption, it is fully expanded and emphasized in bold:
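The trimming rule can be sketched in a few lines. This is only an illustration of the idea, not the front end's implementation: trimCaption is a hypothetical helper, and cutting the caption at exactly the length of the English name is an assumption, since the text does not say where the cut falls.

```wolfram
(* trimCaption is a hypothetical helper; where exactly the front end
   cuts a caption is an assumption *)
trimCaption[caption_String, name_String] :=
  If[StringLength[caption] > StringLength[name],
    StringTake[caption, StringLength[name]] <> "···",
    caption]

trimCaption["cadena de caracteres", "String"]
(* "cadena···" *)
```

Captions that are no longer than the symbol name pass through unchanged.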
Using the new WordCloud functionality in the Wolfram Language, we can get graphical overviews of translations with words sized according to the frequency of their use. Taking the translations of the 120 most frequently used symbols, and making use of the recently added GeoGraphics functionality, the Wolfram Language allows us to generate word clouds in country shapes. Here’s a look at Germany and Portugal:
We can take it a bit further and place the country-shaped word cloud on the actual country polygon. This works quite nicely, as can be seen for Spain:
After playing with the translations and looking at them from different angles, the logical next step is to see if we can produce code that does not just show annotations but also uses the translations in lieu of the English Wolfram Language symbol names. And sure enough, with the programmatic version of the translations at your fingertips, you can easily implement such a function, TranslateCodeCompletely. Given a piece of code and a target language as arguments, our new function returns a full translation. For the sake of showing complete translations, we are avoiding symbol shortcuts:
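The implementation of TranslateCodeCompletely is not reproduced here, but the underlying idea can be illustrated with a much simpler string-based stand-in. In the sketch below, translateCodeSketch is a hypothetical function that takes the code as a string and a dictionary of translations as an association, rather than a target language. The dictionary holds only the Spanish translation quoted earlier.

```wolfram
(* translateCodeSketch is a hypothetical stand-in, not the post's
   TranslateCodeCompletely. It replaces every identifier-like token
   that has an entry in the dictionary and leaves all others alone. *)
translateCodeSketch[code_String, dict_Association] :=
  StringReplace[code,
    w : ((LetterCharacter | DigitCharacter | "$") ..) :> Lookup[dict, w, w]]

(* Dictionary with the one translation quoted in the text *)
spanish = <|"String" -> "cadena de caracteres"|>;

translateCodeSketch["Head[\"abc\"] === String", spanish]
(* "Head[\"abc\"] === cadena de caracteres" *)
```

Because this version works on plain text, it would also replace matching words inside string literals. A more careful version would operate on held expressions or box structures instead.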
Here is the code producing the different string length histograms above—in Korean. User-defined symbols appear as they were, whereas system symbols are fully translated and emphasized through the use of a gray background. You can mouse over them in the attached PDF to read the original symbol name:
An incredible side effect for the Wolfram Language programming wizard: if you already have a firm grasp of our symbols and how to use them, this might in turn give you a chance to “programmatically” learn a new language…
I hope this functionality is going to help a great number of new users and will pave their way into the world of the Wolfram Language. Going forward, we do not just intend to extend the collection of languages. We are planning to add translations to more aspects of the Wolfram Language as well—for example, more menu items and shortcuts. We’d be happy to hear back from you about what languages you would like to see the Wolfram Language translated into. And stay tuned: there might be future opportunities for you to contribute.