Wolfram Blog
Eila Stiegler

The Wolfram Language Worldwide Translations Project

December 18, 2015 — Eila Stiegler, Quality Analysis Manager, Wolfram|Alpha Quality Analysis

It has been quite a while since I graduated from college in Germany with a degree in mathematics. Of course, I have plenty of memories of long study nights, difficult homework assignments, and a general lack of a social life. But I also vividly remember having to take programming classes. I had done my best to avoid these for as long as I could. But when they became part of my curriculum, I could not continue ignoring them. Not being a native English speaker, I was not just dealing with the concept of programming, which was completely abstract to me; I also had to find my way around function names that were always given in English. Though I struggled in those classes, I successfully graduated, and years later I am now part of a project that would have helped me tremendously back then: the Wolfram Language Worldwide Translations Project.

The Wolfram Language Worldwide Translations Project provides any non-English-speaking programming novice with an effortless way into the Wolfram Language. It aims to introduce the Wolfram Language while at the same time addressing any lack of English language skills.

How does one typically learn to program? In my experience, new students are given a piece of code with an explanation of its purpose, so that they can familiarize themselves with the coding structure and functions. To support this way of learning, Wolfram Research has added functionality that lets you enable Wolfram Language code annotations in a language of your choosing. This is an ongoing effort, and we plan to cover as many languages as possible. We have already added support for Japanese, Traditional and Simplified Chinese, Korean, Spanish, Russian, Ukrainian, Polish, German, French, and Portuguese.

As part of this project, we added menu translations in Traditional Chinese and Spanish to the already-available Japanese and Simplified Chinese translations.

The annotations

Back to my struggling days: had I, for example, been given the code behind “Major Multinational Languages,” an example from the Wolfram Demonstrations Project, I would have been able to view the annotated version in German. The annotations do not change or limit the functionality of the code. It is still fully evaluatable and editable, with code captions updating on the fly:

Annotated version in German

If I felt adventurous and wanted to test my eight years of Russian, I could try that as well:

Annotation in Russian

You can conveniently enable this new feature by cell, by notebook, or even globally. This way I was able to compare the same piece of code above, annotated in German, to its Russian version. You can find code annotations as part of our autocompletion as well. I opted for French in this case:

Annotations as part of autocompletion

A look at string length differences

One issue that was raised right at the beginning of this project was the length of the translations. The descriptive, camel-cased design of the English symbol names can pose challenges when trying to keep translations to a reasonable length.

Let’s take Spanish to illustrate the issue. The symbol String was translated as “cadena de caracteres.” This translation is already much longer than its English original. Now, taking into account that we have a multitude of system symbols containing the substring “string,” e.g., StringFreeQ and StringReplacePart, you can imagine what lengths these translations can reach.

Let’s compare the Russian and Japanese translations of $FrontEnd. Conveniently, the translations are not just accessible interactively but also programmatically through WolframLanguageData’s "Translations" property:

Comparing Russian and Japanese translations of $FrontEnd
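
As a minimal sketch of the programmatic access described above: the property name "Translations" comes from the post, while the choice of Plot as the symbol and the variable name are illustrative assumptions. The call needs access to the Wolfram Knowledgebase, and the exact shape of the result may differ between versions.

```wolfram
(* Look up the available translations of one symbol name.
   "Plot" is an arbitrary example symbol, not one singled out in the post. *)
translations = WolframLanguageData["Plot", "Translations"]
```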

The string lengths of the two translations differ by 66 characters. That raises the question: how do our translations compare in length to their English counterparts? Let’s first load all Wolfram Language symbols as well as their translations:

Loading all Wolfram Language symbols and their translations

Now let’s have a look at how the string lengths of the translations compare to the underlying English symbols:

String lengths of the translations compared to the underlying English symbols
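
The kind of comparison described above could be sketched roughly as follows. This is not the code from the post: it works on a tiny hand-written association instead of the full symbol list, and only the String entry is taken from the text; the other two pairs are invented placeholders.

```wolfram
(* Placeholder data: only "String" -> "cadena de caracteres" comes from the post;
   the other two entries are invented for illustration. *)
pairs = <|"String" -> "cadena de caracteres", "List" -> "lista", "Plot" -> "diagrama"|>;

(* Length difference per symbol: translation length minus English name length. *)
lengthDifferences =
  Association @ KeyValueMap[#1 -> StringLength[#2] - StringLength[#1] &, pairs];

(* Smallest and largest difference, and a histogram of all differences. *)
MinMax[Values[lengthDifferences]]
Histogram[Values[lengthDifferences]]
```

A positive difference means the translation is longer than the English name; a real analysis would build `pairs` from the translation data for one language rather than by hand.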

Clearly the Asian writing systems allow for much more condensed translations. On the other hand, digging deeper into the minimum and maximum of the string length differences between Polish and English, we can find the following (hover over the ListPlot for tooltips that compare the English names with their Polish translations):

Numbers of pairs per difference in string length
String length pairs

There are currently 251 cases of translation pairs where the two elements match in length. Here are a few examples:

Examples of pairs where two elements match in length

This is the pair with the greatest string length difference, an astounding 75 characters:

Pair with the greatest string length difference
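
Finding the equal-length pairs and the pair with the greatest difference might look something like the sketch below. The helper name `diff` and the sample data are assumptions made for illustration ("Abc" -> "xyz" is a deliberately artificial pair); the post's own code is not reproduced here.

```wolfram
(* Placeholder pairs; only the "String" entry is taken from the post. *)
pairs = <|"String" -> "cadena de caracteres", "Abc" -> "xyz", "Plot" -> "diagrama"|>;

(* Hypothetical helper: translation length minus English name length. *)
diff[name_String, translation_String] := StringLength[translation] - StringLength[name];

(* Pairs whose two elements have the same length. *)
sameLength = Select[Normal[pairs], diff @@ # == 0 &];

(* Pair(s) with the greatest length difference. *)
largest = MaximalBy[Normal[pairs], diff @@ # &];
```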

Let’s look at string length differences in all available languages:

Minimum of difference in string length
Maximum of difference in string length

Given these discrepancies, we can find some interesting tidbits about languages and their relationships in this data.

For example, which five symbols are closest in length to their translations across all languages? They include Byte, ColorQ, ListQ, and Ball:

Five symbols closest in length to their translations in all languages

Which language has the largest percentage of whitespace characters? Korean:

Language with largest percentage of white spaces

And what language beats all others in average string length? German:

Language with best average string length
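
The two statistics above (share of whitespace and average string length per language) could be computed along these lines. The function names, the language labels, and the sample strings are all placeholders, and counting only the plain space character is an assumption about what "white space" means here.

```wolfram
(* Placeholder data: language label -> list of translated names (all invented). *)
sample = <|
   "LanguageA" -> {"aa bb", "cccc"},
   "LanguageB" -> {"dddddd dd", "ee ee ee"}|>;

(* Fraction of all characters that are plain spaces. *)
whitespaceShare[strings_List] :=
  Total[StringCount[strings, " "]]/Total[StringLength[strings]];

(* Mean number of characters per translated name. *)
averageLength[strings_List] := Mean[StringLength[strings]];

(* Mapping over the association keeps the language labels as keys. *)
shares = whitespaceShare /@ sample;
averages = averageLength /@ sample;

(* Language with the largest whitespace share. *)
First[Keys[TakeLargest[shares, 1]]]
```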

Here’s a quick visualization:

Visualization of string length

Given these discrepancies in translation lengths, how did we accommodate the differences in our interface? Let’s return to our example code for “Major Multinational Languages” with German code captions enabled. Any time the length of a code caption exceeds the length of the English original, we trim the caption with ···. Upon hovering over the code caption, it is fully expanded and emphasized in bold:

Expanded view of caption code
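
To make the trimming rule concrete, here is a rough sketch. The function name `trimCaption` is hypothetical, and how many characters the front end actually keeps before the dots is not stated in the post; keeping as many characters as the English original has is an assumption.

```wolfram
(* Hypothetical helper: shorten a caption that is longer than the English original.
   Assumption: keep as many characters as the original name has, then append "···". *)
trimCaption[caption_String, original_String] :=
  If[StringLength[caption] <= StringLength[original],
   caption,
   StringTake[caption, StringLength[original]] <> "···"];

trimCaption["cadena de caracteres", "String"]
```

Captions that are no longer than the English name are returned unchanged.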

Word clouds

Using the new WordCloud functionality in the Wolfram Language, we can get graphical overviews of translations with words sized according to the frequency of their use. Taking the translations of the 120 most frequently used symbols, and making use of the recently added GeoGraphics[] functionality, the Wolfram Language allows us to generate word clouds in country shapes. Here’s a look at Germany and Portugal:

Germany and Portugal word clouds
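
A plain, non-country-shaped version of such a word cloud might be sketched as follows. The list of translations is a placeholder (only “cadena de caracteres” appears in the post), and the way the post obtains the country outlines is not shown, so this sketch leaves the shape out.

```wolfram
(* Placeholder translations; only the first entry comes from the post. *)
translations = {"cadena de caracteres", "longitud de cadena", "reemplazar cadena"};

(* Split each translation into words; WordCloud sizes words by how often they occur. *)
words = Flatten[StringSplit[translations]];
cloud = WordCloud[words]
```

WordCloud also accepts a shape as a second argument, which is presumably the hook for fitting the words into a country outline.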

We can take it a bit further and place the country-shaped word cloud on the actual country polygon. This works quite nicely, as can be seen for Spain:

Spain word cloud

Full translations

After playing with the translations and looking at them from different angles, the logical next step is to see whether we can produce code that does not just show annotations but also uses the translations in place of the English Wolfram Language symbol names. And sure enough, with the programmatic version of the translations at your fingertips, you can easily implement such a function, TranslateCodeCompletely. Given a piece of code and a target language as arguments, our new function returns a full translation. For the sake of showing complete translations, we are avoiding symbol shortcuts:

TranslateCodeCompletely
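
The implementation of TranslateCodeCompletely is not reproduced here. As a very rough illustration of the idea only, the sketch below replaces symbol names in a code string using a lookup table. The name `translateCodeSketch` and the one-entry dictionary are assumptions; a real implementation would work on the parsed expression, handle `$`-prefixed names, and leave string literals alone.

```wolfram
(* Hypothetical helper: replace every identifier found in the dictionary
   by its translation; identifiers without an entry are left unchanged.
   Caveat: this naive string approach also touches text inside string literals. *)
translateCodeSketch[code_String, dictionary_Association] :=
  StringReplace[code,
   name : (LetterCharacter ~~ WordCharacter ...) :> Lookup[dictionary, name, name]];

(* One-entry dictionary using the Spanish translation quoted in the post. *)
spanish = <|"String" -> "cadena de caracteres"|>;

translateCodeSketch["Head[\"abc\"] === String", spanish]
```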

Here is the code producing the different string length histograms above—in Korean. User-defined symbols appear as they were, whereas system symbols are fully translated and emphasized through the use of a gray background. You can mouse over them in the attached PDF to read the original symbol name:

Code producing different string lengths for histogram

An incredible side effect for the Wolfram Language programming wizard: if you are already well versed in our symbols and their use, this might in turn give you a chance to “programmatically” learn a new language…

I hope this functionality is going to help a great number of new users and will pave their way into the world of the Wolfram Language. Going forward, we do not just intend to extend the collection of languages. We are planning to add translations to more aspects of the Wolfram Language as well—for example, more menu items and shortcuts. We’d be happy to hear back from you about what languages you would like to see the Wolfram Language translated into. And stay tuned: there might be future opportunities for you to contribute.

Download this post as a Computable Document Format (CDF) file.

Download additional code for this post.

Posted in: Wolfram News

3 Comments


Michael Kelly

A brilliant blog, Eila. I really enjoyed reading it and look forward to using this facility to help me understand other languages.

Michael

Posted by Michael Kelly    December 18, 2015 at 2:35 pm
Robert

In the good old days of IT and programming, only English was used.
Sadly, then came the Babylonian confusion of tongues in IT.
Now we struggle with translated software systems, e.g.:
decimal point vs. comma, localized names of programming statements and functions, etc.
I have wondered ever since who really likes this sort of confusion.

Posted by Robert    December 22, 2015 at 10:04 am
Bruce Miller

Is Burmese ( Myanmar ) supported?

Posted by Bruce Miller    January 15, 2016 at 12:50 am

