Code Length Measured in 14 Languages
Update: See our latest post on How the Wolfram Language Measures Up.
I stumbled upon a nice project called Rosetta Code. Their stated aim is “to present solutions to the same task in as many different languages as possible, to demonstrate how languages are similar and different, and to aid a person with a grounding in one approach to a problem in learning another.”
After amusing myself by contributing a few solutions (Flood filling, Mean angle, and Sum digits of an integer being some of mine), I realized that the data hidden in the site provided an opportunity to quantify a claim that I have often made over the years—that Mathematica code tends to be shorter than equivalent code in other languages. This is due to both its high-level nature and built-in computational knowledge.
Here is what I found.
Mathematica code is typically less than a third of the length of the same tasks written in other languages, and often much better.
Before the comments section fills up with objections, I should state that there are many sources of bias in this approach, not least of which are bias in the creation of tasks, bias in the kinds of persons who provide solutions, and selectivity in which tasks have been solved. But if we worry about such problems too much, we never do anything!
It should also be said that short code is not the same thing as good code. But short good code is better than long good code, and short bad code is a lot better than long bad code!
Now, I start importing some key web pages. The first lists all the languages supported by the project. I use the special “Hyperlinks” option to HTML Import, and then string match away links of the wrong type.
There is a special page for each language that lists completed tasks, so I do something similar to that…
…and extend the command to take a list of languages and return tasks that have been completed for all of them.
The next step isn’t necessary, but you have to process all that slow internet access at some point, and I prefer to get it out of the way at the start by systematically calling every import that I will need to do. I will also dump the data to disk in a compact binary .mx file, so that I can come back to it without having to re-scrape the website. This is a good point to break for some lunch while it works!
Now that all the data gathering is done, we can start analyzing it. First, how many tasks have been completed by the Mathematica community?
That’s a good number; the most complete on the site is Tcl with 694 tasks. More importantly, there are plenty of tasks that have been completed in both Mathematica and other key languages. This is vital for the like-for-like comparison that I want to do. For example, there are 440 tasks that have a solution in both Mathematica and C.
The thorny part of this problem is extracting the right information from crowdsourced, handwritten wiki pages. Correctly written pages wrap the code in a <lang> tag, with a rather inconsistent argument for the language type. But some of them are not correctly tagged, and for those I have to look at the position of code blocks relative to appearance of the language names in section headings. All that results in this ugly bit of XML pattern matching. I’m sure I could do it better, but it seems to work.
The <lang> tag, when it has been used, is usually the language name in lowercase, without spaces. But not always! So I have to map some of the special cases.
For completely un-marked-up code, or where the solution is descriptive or is an image rather than code, this will return an empty string, and we will treat these as if no solution was provided. With the exception of LabVIEW (where all solutions are images), I suspect that this is fairly unbiased by language, but probably biased toward excluding very small problems.
Here is the code in action, extracting my solution for “flood filling”:
The next thing we need are some metrics for code length. The industry norm is “lines of code”:
But that is as much a measure of code layout as length (at least for languages like Mathematica that can put more than one statement on a line), so non-white space characters counts might be better.
That disadvantages Mathematica a bit, with its long, descriptive command names (a good thing), so I will also implement a “token” count metric—where a token is a word separated by any non-letter characters.
Here is that piece of code measured by each of the metrics.
The line count doesn’t match what you see above because it is counting lines in the original website, and the narrow page design of the Wolfram Blog is causing additional line wrapping.
Now to generate comparison data for two languages, we just extract the code for each and measure it and repeat this for every task the two languages have in common.
If we look at the first three tasks that Mathematica and C have in common, we see that the Mathematica solution has fewer characters in each case.
Here is all the Mathematica versus C data.
There is a lot of noise, but one thing is clear—nearly every Mathematica solution is shorter than the C solution. Some of the outliers are caused by multiple solutions being given for the same language, which my code will just add together.
The best way to deal with such outliers is to do all our smoothing and averaging using Median.
This shows an interesting trend. As the tasks get longer in C, they get longer in Mathematica, but not in a linear way. It looks like the formula for estimating Mathematica code length is 5.5√c, where c is the number of characters in the C solution.
You see similar behavior compared to other languages.
This is perhaps not surprising, since some tasks are extremely simple. There is little difference between one language and another for assigning a variable, or accessing an array. But there is more opportunity to benefit from Mathematica‘s high-level abstractions, in larger tasks like “Implement the game Minesweeper.” This trend is unlikely to continue though; for very large projects, they should start to scale more linearly at the ratio reached for the typical size of individual code modules within the project.
There are 474 languages listed in the website. Too many to be bothered with this kind of analysis, and quite a lot have too few solutions to analyze. I am going to look at a list of popular languages, and some computation-oriented languages. My, somewhat arbitrary, choices are:
To make a nice table, I need to reduce the data down to a single number. I have two approaches. One is to reduce all comparisons to a ratio (length of code in language A) / (length of code in language B) and find the median of these values over all tasks. The other approach is to argue that code length only matters for longer problems, and to do the same, but only for the top 50% of tasks by average code length.
And finally, here are the tables looking at all the permutations of code-length metric and averaging method.
In all cases, the number represents how many times longer the code in the language at the top of the chart is compared to the language on the left of the chart. That is, big numbers mean the language on the left is better!
Despite the many possible issues with the data, it is an independent source (apart from the handful of solutions that I provided) with code that was not contrived to be short above all other considerations (as happens in code golf comparisons). Perhaps as close to a fair comparison as we are likely to get. If you want to contribute a program to Rosetta Code, take a look at unsolved tasks in Mathematica, or improve one of the existing ones.
While the “Large tasks – Line count ratio” gives the most impressive result for Mathematica, I think that the “Large tasks – Character count ratio” is the really the fairest comparison. But however you slice it, Mathematica is presenting shorter code, on average, than these other languages. On average, five to ten times shorter than the equivalent in C or C++, and that should mean shorter development time, lower code complexity, and easier maintenance.