Modern data scientists are often self-limited by their choice of methods and technology: traditional statistical tools with specific uses that only apply to numerical data. This outmoded process provides answers to only a small subset of possible questions. MPDS turns this approach around, starting with key questions and broadly exploring with different methods and data types. The exploration process is then automated, giving you the freedom to iterate on new questions and find better answers more quickly.

MPDS with Wolfram technology brings a variety of sophisticated algorithms and interfaces to bear on your data, increasing the scope and accuracy of your analytics while also streamlining development, collaboration and publishing. We remove the constraints of standard workflows by offering the full range of analyses, visualizations and deployment targets. This leads to unique, actionable insights that can be shared across organizations.

Like all interactive courses from Wolfram U, the MPDS course provides a variety of resources that run on our cloud and notebook technology. It includes 21 videos ranging in length from 3 to 12 minutes, each with a transcript and a course notebook with copyable code. A scratch notebook is also provided for trying out code and taking notes as you go. The course is split into 5 sections: an introduction to the MPDS workflow, followed by detailed examination of the various stages.

Throughout the course, you’ll find out how you can use the Wolfram Language to:

- Build an end-to-end data science workflow
- Wrangle and clean different types of data
- Assemble a multiparadigm toolkit for analysis and visualization
- Examine and combine data from multiple sources
- Perform visual exploratory data analysis
- Communicate results effectively with a variety of visualizations and interactive interfaces

Each section (excluding the introduction) ends with a quiz to test your knowledge and encourages you to learn more about the relevant Wolfram Language functionality through our extensive Documentation Center. You can watch the lecture videos, take the quizzes and track your progress toward certification in the course certificate window. After watching all 21 videos and completing the four quizzes, you’ll receive a personalized downloadable course certificate.

We are also working on two further levels of certification, available to those who work through course exercises associated with each lecture video and complete an MPDS project from a list of topics to be graded by one of our experts. These advanced certifications will be available in the near future, so stay tuned!

This Wolfram U course is completely free and open; all you need to get started is a Wolfram ID and a browser. With our bite-sized modules, convenient navigation tools and progress tracking, you can learn at your own pace. So embrace the multiparadigm approach today for a more flexible, integrated workflow—giving you real, quantifiable answers to data science problems too complex for traditional methods.

For hands-on experience with expert instructors and mentors, join us at the Wolfram Data Science Boot Camp, July 29–August 16 in Champaign, Illinois.

Since it was first launched about ten years ago, Wolfram|Alpha has been one of the most useful sites on the web. You can use it to do arithmetic, solve differential equations, find out how many calories there are in a cake, track the airplanes near your current location, track any given constellation, find out how many runs Ken Griffey Jr. scored in 1995 and even perform calculations that make absolutely no sense.

In October 2009, a few months after the website launched, we released Wolfram|Alpha 1.0 for the iPhone. Today, we are announcing the latest evolution in Wolfram|Alpha for your iOS phone or tablet, Version 2.0, which is available now on the iOS App Store.

The iOS app is not, nor has it ever been, an app that just displays the website in a web view. It is a fully native client that uses Apple’s UIKit framework for drawing the user interface and Apple’s Foundation framework for interacting with the Wolfram|Alpha back end. It supports tracking your query history and favorite queries, and even allows you to query with images. Version 2.0 makes one of Wolfram|Alpha’s most popular use cases better than ever, and makes additional improvements to the user experience.

We know that many of you like to use Wolfram|Alpha to obtain step-by-step solutions for math and science problems. You query the problem, and Wolfram|Alpha will provide the solution, plus a button that says **Step-by-step solution**. Touching that button will reveal the work necessary to get the solution. The following image shows what this looked like in the previous version of the iOS app.

(Note: all screenshots were taken on an iPad. The app also runs on an iPhone; the iPhone UI has only minor differences, in order to fit the smaller screen size.)

Here are the query results in Version 2.0. You may notice the new button design:

In Wolfram|Alpha 1.8 and earlier, the step-by-step solutions were presented in the query results, and touching the button to show the steps revealed all the steps at once. In Wolfram|Alpha 2.0, step-by-step solutions are presented in a brand-new view, one step at a time:

You can touch the **Show all steps** button to see every step at once, if you’d prefer. Touching **Next step** will show you the next step in the sequence. In addition to the new steps, we now provide hints if you want to figure out a step by yourself:

You can also turn off hints by touching the **Hide hints** button in the top-right corner of the new step-by-step view. Touching **Next step** while a hint is up will show you the true next step:

If a step has intermediate steps, you’ll see an orange, rounded rectangle around it, like you see in the previous screenshot. (The pop-up you can also see in the window only appears once.) You can touch the orange rectangle or the **Show intermediate steps** text to reveal the intermediate steps:

You can hide the intermediate steps by touching the little gray X, by touching the orange rectangle a second time or by touching the **Hide intermediate steps** text. If there are multiple sets of intermediate steps, touching a different orange rectangle opens its steps and closes the ones currently shown.

When you load a new step, the app scrolls down to show it. The steps are scrollable, so you can scroll back up to see previous steps if you get too far ahead of yourself:

And when you get to the last step, the **Next step** button becomes **Start over**. Touching that button sends you back to step one and hides the other steps:

Some step-by-step solutions have multiple forms. When you see a button near the top with a downward-facing caret in its title, you can touch that button to change the form, like this:

The new step-by-step solutions require that you subscribe to Wolfram|Alpha Pro. If you already have a Wolfram|Alpha Pro subscription, then you’re all set. Just sign in with your Wolfram ID prior to running your query, and the new step-by-step solutions will be available to you. We have a new in-app sign-in form for that purpose:

If you don’t have a Wolfram|Alpha Pro subscription, you can still get the old step-by-step solutions. All of the original functionality still remains. Every now and then, when you get step-by-step solutions, the app will put up a view advertising a Wolfram|Alpha Pro subscription to you. You can now also subscribe by touching the **Go Pro Now** button in the new Account view:

If you do subscribe to Wolfram|Alpha Pro in the app, your subscription will also work when you sign in with your Wolfram ID on the Wolfram|Alpha website.

If you don’t have a subscription or a Wolfram ID, you can create a free ID using our new in-app sign-up form, which appears when you touch the **New user? Create a Wolfram ID** button in the sign-in form. You can also create one from your web browser and sign in to it inside the app.

We think you’ll like the updated step-by-step solutions. But that’s not the only new feature we have in Wolfram|Alpha 2.0:

- In the Examples view, the segmented control that was used to switch to other tabs, like History and Favorites, has been replaced with a tab bar. That actually involved a pretty large code rewrite.

- The navigation bar title now changes to say “Wolfram|Alpha Pro” when you are signed in to an account with a Pro subscription.

- If you have a Pro subscription, you’ll also get the image-as-input feature (the camera button you see next to the query bar) thrown in for free. If you don’t want to subscribe to Pro, then the feature is still available as a one-time, in-app purchase, just as it was in Version 1.8.
- If you touch the button in the query bar, the app will now rerun your query, just like it does on the website.
- The built-in examples have been updated to contain all the latest examples and categories that are up on the website.
- As usual, there have been various bug fixes. If we missed a spot, then please send us your feedback using the in-app feedback form. We do read all of your comments.

Wolfram|Alpha 2.0 is available for your iPhone, iPod Touch or iPad running iOS 10 or later. The app runs at full resolution on all shipping iOS devices, supports landscape orientation on the iPhone and supports Slide Over and Split View on the iPad. The app requires an internet connection in order to perform queries, but you can perform arithmetic in the query bar and get results while the device is offline. Upgrades from Version 1.x are free.

In case you’re wondering, we’re also working on bringing Wolfram|Alpha 2.0 to Android phones. It’s not out yet, but you’ll see it soon. Until then, you can download the current release from Google Play, and the 2.0 upgrade will be free once it’s out.

Download Wolfram|Alpha 2.0 on the iOS App Store!

With the recent announcement of the all-new Raspberry Pi 4, we are proud to announce that our latest development, Version 12 of Mathematica and the Wolfram Language, is available for you to use when you get your hands on the Raspberry Pi 4.

Mathematica 12 is a major milestone in our journey that has spanned 30 years, significantly extending the reach of Mathematica and introducing a whole array of new features, including significant expansion of numerical, mathematical and geometric computation, audio and signal processing, text and language processing, machine learning, neural networks and much more. Version 12 gives Mathematica users new levels of power and effectiveness. With thousands of different updates across the system, and 278 new functions in 103 areas, there is so much to explore.

Mathematica 12 performs significantly faster on the Raspberry Pi 4 than previous versions. We found that, averaged across the 15 tests in our benchmark, the Raspberry Pi 4 runs Mathematica 12 about twice as fast (100% faster)—with certain tests performing even better than that! We are as excited as ever to see the amazing programs and applications users develop using the new Mathematica 12 and Raspberry Pi 4. The full table of the benchmark can be seen here:

We’re pleased to have partnered with Raspberry Pi for more than five years, and one of our collaborative efforts is Wolfram Language Projects for Raspberry Pi. These are small- to medium-sized projects that can be undertaken by anyone who wants an introduction to Mathematica and the Wolfram Language. They range from creating weather dashboards to building tools that use machine learning like sentiment analysis, or using AI for facial recognition. And if you would like to delve deeper, you can run command-line scripts and even do parallel computing.

Stay tuned for upcoming examples of the Wolfram Language on the Raspberry Pi, including explorations of our shared projects!

Check out all of our projects, and see how much you can do with the Wolfram Language on the Raspberry Pi!

This week, I won some money applying a mathematical strategy to a completely unpredictable gambling game. But before I explain how, I need to give some background on last-mover advantage.

Some time ago, I briefly considered doing some analysis of the dice game Yahtzee. But I was put off by the discovery that several papers (including this one) had already enumerated the entire game state graph to create a strategy for maximizing the expected value of the score (which is 254.59).

However, maximizing the expected value of the score only solves the solo Yahtzee game. In a competitive game, and in many other games, we are not actually trying to maximize our score—we are trying to win, and these are not always the same thing.

To understand this, let’s make a super-simplified version of Yahtzee. In Yahtzee, you throw five dice to try and make poker-like hands that score points. Crucially, if you don’t succeed, you can pick up some or all of the dice and rethrow them up to two times—but if you do, you can’t go back to their previous values.

In my simplified version, we will have only one die and get only one rethrow, and the score is the die value (like the “chance” option in Yahtzee). To make it more nuanced, we will use a 100-sided die. The Wolfram Language lets me represent the distribution of scores after two throws as a symbolic distribution:

✕
throwTwiceDistribution[t_] := Block[{a, b}, TransformedDistribution[ Piecewise[{{a, a > t}}, b], {a \[Distributed] DiscreteUniformDistribution[{1, 100}], b \[Distributed] DiscreteUniformDistribution[{1, 100}]}]] |

In this distribution, *t* represents our threshold for throwing again. It seems pretty obvious that if we want to maximize our expected score, the threshold is 50. If we throw 50 or less on the first throw, we are more likely to improve our score with a second throw than to make it worse. We can compute with our distribution to generate some sample outcomes:

✕
RandomVariate[throwTwiceDistribution[50], 20] |

And calculate the expected value:

✕
Mean[throwTwiceDistribution[50]] |

We can check that our intuition is right by comparing the outcomes of different threshold choices. The maximum expected score is 63 when the threshold is 50:

✕
Max[Table[Mean[throwTwiceDistribution[i]], {i, 100}]] |

✕
ListPlot[Table[Mean[throwTwiceDistribution[i]], {i, 1, 100}], AxesLabel -> {"Threshold", "Expected\nvalue"}] |
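The optimum can also be checked by hand: keeping any first throw above *t* and otherwise taking the rethrow's expectation of 50.5 gives an expected score of 50.5 + *t*(100 − *t*)/200, which peaks at *t* = 50 with value 63. Here is a quick cross-check in plain Python—an independent sketch of the same rules, not part of the original Wolfram Language code:

```python
def expected_score(t):
    """Expected final score when any first throw <= t is rethrown."""
    # A first throw above t is kept; otherwise the rethrow contributes
    # the mean of a 100-sided die, 50.5.
    return sum(a if a > t else 50.5 for a in range(1, 101)) / 100

# Brute-force search over all thresholds.
best = max(range(1, 101), key=expected_score)
```

Both routes agree with the Wolfram Language result: a threshold of 50 and an expected score of 63.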

So far, so obvious. Why is this not the end of the story? Well, like many games, Yahtzee and my simplified version are “sequential games.” Player 2 is actually playing a different game than Player 1. The rules are the same, but the situation is different.

When Player 2 comes to the table, Player 1’s outcome is already known. Player 2’s aim is not to get the best score, but is rather only to beat Player 1’s score. So the threshold for throwing must be Player 1’s score. If we are losing after our first throw, it doesn’t matter how unlikely we are to improve—we *must* try again. (Anyone who has played Yahtzee knows the situation of being so far behind at the end of the game that only two Yahtzees—five dice the same—in a row can save them. We know we won’t get it, but we try anyway.) Equally, if we have won after one throw, then we don’t throw again, even if that is expected to give us a better score.

The expected value of this strategy is actually lower than Player 1’s:

✕
Mean[ParameterMixtureDistribution[throwTwiceDistribution[t], t \[Distributed] throwTwiceDistribution[50]]] // N |

But if we simulate outcomes and look only at the sign of the difference in scores (so that 1 represents a win for Player 2, –1 a win for Player 1, and 0 a draw), we see that Player 2 wins more than 53% of the time:

✕
relativeCounts[t_] := Counts[Sign[Table[ player1 = RandomVariate[throwTwiceDistribution[t]]; player2 = RandomVariate[throwTwiceDistribution[player1]]; player2 - player1, {10000}]]]/10000.; |

✕
relativeCounts[50] |

Using their last-mover advantage, Player 2 has traded a few big wins for more, smaller wins. But it’s winning that counts, not the size of the win. (There’s actually a little more optimization to be done if Player 2’s first throw is a draw above 50, which might improve Player 2’s chances by another 0.0025.)
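The head-to-head result is easy to reproduce outside the Wolfram Language. Here is a minimal Monte Carlo sketch in Python (the function names are mine), assuming the same rules: Player 1 rethrows at 50 or below, and Player 2 rethrows whenever not strictly ahead:

```python
import random

def throw_twice(threshold, rng):
    # 100-sided die; rethrow once if the first result is <= threshold.
    a = rng.randint(1, 100)
    return a if a > threshold else rng.randint(1, 100)

rng = random.Random(1)
trials = 200_000
p2_wins = sum(
    1 for _ in range(trials)
    # Player 1 plays the expected-score optimum; Player 2 uses
    # Player 1's score as the rethrow threshold.
    if (p1 := throw_twice(50, rng)) < throw_twice(p1, rng)
)
p2_win_rate = p2_wins / trials
```

Under these rules the exact figure can also be worked out: given Player 1's final score *s*, Player 2 wins with probability (100² − *s*²)/100², and averaging over Player 1's score distribution gives a win rate of about 53.5%, consistent with the simulation.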

In a reminder that one cannot always trust intuition, I assumed this really was the end of the story—that Player 1 could do nothing about Player 2’s advantage, since he has no advance knowledge of Player 2’s outcome. But a quick experiment demonstrates that Player 1 does know something about the distribution of Player 2’s outcomes, and can optimize his play a little:

✕
player1WinRates = Table[relativeCounts[i][-1], {i, 1, 100}]; |

Knowing that the odds are against him, Player 1 is now a little more reckless. He must throw again if he gets less than 61 on his first throw:

✕
Position[player1WinRates, Max[player1WinRates]] |

It’s still a losing game for Player 1, but slightly less so:

✕
Max[player1WinRates] |

This is now the Nash equilibrium point (where neither side can improve on their tactics, even though they are fully aware of their opponent’s tactics). Interestingly, we see that neither player is trying to optimize their expected score, as they would in a solo game.
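One caveat: with only 10,000 samples per threshold, the simulated win rates near the maximum differ by less than the sampling noise, so the located optimum is fuzzy. Under the same rules (Player 1 rethrows first throws at or below a threshold *t*; Player 2 rethrows whenever not strictly ahead, ties included), Player 1's win rate can be enumerated exactly. Here is a Python sketch of that enumeration (my own formulation, not the original Wolfram Language analysis):

```python
def p1_win_rate(t):
    # Exact probability that Player 1 (rethrow threshold t) beats a
    # Player 2 who rethrows any first throw that is not strictly winning.
    total = 0.0
    for s in range(1, 101):
        # Player 1's final score is s either as a kept first throw
        # (only possible when s > t) or as a uniform rethrow.
        p_s = (1 / 100 if s > t else 0) + (t / 100) * (1 / 100)
        # Player 2 keeps any first throw above s (and wins); otherwise
        # rethrows, and Player 1 wins only if the rethrow lands below s.
        total += p_s * (s / 100) * ((s - 1) / 100)
    return total

best = max(range(1, 101), key=p1_win_rate)
```

This puts the exact optimum at a threshold of 58, with a win rate of about 46.2% for Player 1 (versus about 45.8% at a threshold of 50). Thresholds from the mid-50s to the low 60s differ by far less than the roughly ±0.005 noise of a 10,000-sample estimate, which is why a simulated maximum can land a few points away.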

The existing solutions for solo Yahtzee involve enumerating around 11 billion outcomes—feasible with modest time and compute resources. But even the two-player game has over 2^48 states, putting it beyond a brute-force solution.

Now back to my gambling challenge, and how I made some money.

I was at a village quiz night, and as is common at such events around here, there was an extra game in the interval to raise more money. In this particular game, everyone who wants to play puts in £1 and stands up. They indicate “heads” or “tails” by placing their hands on their head or body, and the game-master flips a coin. Everyone who is wrong sits down, and the process is repeated until one person is left standing. The winner takes half the money, and the rest goes to a good cause.

Math won’t help me predict the coin, but like Yahtzee, the score is not the point: it is winning that matters, and we can use last-mover knowledge to gain an advantage.

Fortunately, I was standing at the back of the room, and observed that about 60% of the people chose tails, so I chose heads. Each round I chose the least popular option. I am no more likely to be correct; but when I am, I am closer to winning than if I had gone with the majority. On average, I need fewer correct answers to win than anyone else. Let’s analyze….

✕
winP[n_Integer, p_] := Once[Module[{h}, Expectation[1/2 winP[Min[h, n - h], p], h \[Distributed] BinomialDistribution[n, p]]]]; |

✕
winP[0, _] = 1; |

My rule says my chances of winning against *n* players is 1/2 times the chance of winning against the number of players who survived the round with me. My opponents are split according to the `BinomialDistribution`.

I was playing against 44 other players. And assuming that people are unbiased, my chances of winning were 4%:

✕
winP[44, 1/2] // N |

That doesn’t sound good, but it is much better than the probability of 1/45 (0.022) achieved by random guesses. And with a payout of 22.5 to 1, it makes the expected value of playing 0.957, which is almost break-even.
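The recursion is simple enough to cross-check in any language. Here is a memoized Python sketch (the names are mine; the logic mirrors `winP` above, with *p* the crowd's bias toward one side):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def win_p(n, p):
    # Chance of being the last one standing against n opponents when
    # always backing the minority side of a fair coin flip.
    if n == 0:
        return 1.0  # no opponents left: we have already won
    return 0.5 * sum(                            # 1/2: our side must survive
        comb(n, h) * p**h * (1 - p)**(n - h)     # opponents split h vs n - h
        * win_p(min(h, n - h), p)                # the minority survives with us
        for h in range(n + 1)
    )
```

For 44 unbiased opponents this lands near the 4% quoted above, and the same function handles a biased crowd—any bias in the room only helps the minority player.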

It’s more obvious if you consider the extreme case. If 40 people all choose heads and I choose tails, I have a 1/2 chance of winning. The other 40 people have to share the other 1/2 chance among them, giving them a 1/80 chance of winning.

That calculation assumes that other people played randomly and without bias. One feature of the game is that everyone who has survived so far has shared the same prior guesses, and people are not very good at being random—something I relied upon in my rock–paper–scissors blog post. Anecdotally, it felt like around 60% of people shared the same guess most of the time, but for 45 players, only a 5% bias is needed to make the game a winning proposition:

✕
winP[44, 0.55]*22.5 |

The relative advantage of this strategy increases with the number of players. And for the observed bias level, I am three times more likely to win when there are 100 players:

✕
ListPlot[Table[winP[i - 1, 0.6]*i, {i, 1, 100}], AxesLabel -> {"Players", "Relative\nadvantage"}] |

And even if our opponents are unbiased, playing purely at random, the expected value of the game becomes positive once there are at least 68 players:

✕
ListPlot[Table[winP[i - 1, 1/2]*i/2, {i, 1, 150}], Epilog -> InfiniteLine[{{0, 1}, {1, 1}}], AxesLabel -> {"Players", "Expected\nvalue"}] |

In many competitive situations, it is important to remember that you are trying to optimize the win, rather than the way you measure the win. For example, many voting systems, such as my own country’s and that of the United States, allow leaders to be elected with a minority of votes, as long as they win more of the regional contests.

It is also important to be aware of how environments are changed by sequential moves. The first move can be advantageous if it reduces the options for later players (such as dominating space in chess, or capturing early adopter customers in a marketplace). But second movers come to the game with more information, such as the market’s reaction to a product launch, or insight into your opponents’ strategy, as in a bidding or negotiation situation.

I won £22.50 at the quiz night; not exactly life-changing, but if “high-stakes group heads-or-tails” games catch on in Las Vegas, I am ready!

Optimize your own chances of winning with Version 12 of the Wolfram Language, with a host of additions and improvements to probability and statistics functionality.

Wolfram Community is our favorite, continually growing forum to share and show support for projects using the Wolfram Language, connect with other Mathematica aficionados and find solutions for coding questions. It’s also a great platform for sharing computational innovations that can benefit your local community—or beyond. We’ve collected some of the exciting ways Wolfram Community members have been giving back through Wolfram technology—check them out!

It’s fairly simple to discuss climate change on a global scale, but figuring out impact at the local level can be challenging. John Shonder, business development manager at energy provider NORESCO, took advantage of the range of tools built into the Wolfram Language to comprehensively process and depict data from the National Oceanic and Atmospheric Administration (NOAA), which maintains a dataset of monthly average temperatures for every county in the 48 contiguous states from 1895 to the present. Find your county to see the change in mean annual temperature over the past hundred years.

Traffic-related deaths are on the rise around the world, and several cities have already begun to use three-dimensional painted variations of the standard pedestrian crosswalks to nudge drivers into paying more attention while driving across road junctions. Erik Mahieu used `Manipulate` to create a 3D crosswalk template with many adjustable parameters, an excellent tool for designing anamorphic markings that keep pedestrians safe in your town.

Martijn Froeling, a biomedical engineer at the University Medical Center Utrecht in the Netherlands, has been developing QMRITools for years, so it’s exciting to see the finished product posted on Wolfram Community! QMRITools can be used for processing and visualizing quantitative MRI data. Martijn developed this toolbox in the context of muscle, nerve and cardiac magnetic resonance imaging, with the primary goals of allowing for fast and batch data processing, as well as facilitating development and prototyping of new functions.

Public health officials currently measure ambulance coverage using fairly outdated techniques. Emergency response times could be improved with a more realistic description using isochrones, which represent all locations that can be traveled to within a certain time limit. William Rudman, a student from King’s College London, built isochrones to show how current official measures overestimate and underestimate coverage in various areas of the South Side of Chicago.

Wolfram’s Jeff Bryant used math and the Wolfram Language to design the surface of an egg and map images of planets’ surfaces onto it. While Easter has come and gone, the Wolfram Community conversation about decorating Easter eggs to look like planets carried on for some time. Check out the post’s comments section for several alternative decorating designs, created by other members!

There are many different styles of learning, and Mark Greenberg, an educational games developer and retired educator from Arizona, kept those in mind while taking a computational dive into teaching poetry. Using the Wolfram Language to graph patterns used by poets, Mr. Greenberg shows how to visually represent subtle poetic concepts, which can be otherwise difficult to teach. Try this tool out to teach elision, secondary stress, masculine/feminine rhyme and more!

Kotaro Okazaki, an inventor working at Fujitsu Limited in Japan, was inspired by the recently released Version 12 of the Wolfram Language to use his favorite new functions, `GeometricScene` and `FindGeometricConjectures`, to test nine famous theorems of geometry. Some of the theorems he tested in this post include the Finsler–Hadwiger theorem, Thales’s theorem and the Brahmagupta theorem. Be sure to read the comments section to see how he tested an additional theorem suggested by another Community member.

Henrik Schachner, a physicist from the Radiation Therapy Center in Weilheim, Germany, was inspired by earlier posts on Wolfram Community to knit images using the Wolfram Language. He explored a computational art challenge that culminated in the invention of a new way to use image processing to design patterns for string art.

Henrik’s post engaged many other users, who flooded the comments section with questions, compliments and their own variations of Henrik’s project. Martijn Froeling, whose work using MRI data was discussed earlier in this post, joined the conversation to figure out a way to add color to Henrik’s black-and-white example. Medical experts from different countries collaborating on a computationally charged art project is just what you’d expect from Wolfram Community.

If you haven’t yet signed up to be a member of Wolfram Community, please do so! You can join in on similar discussions, post your own work in groups that cover your interests and browse the complete list of Staff Picks.

Want to get information about similar Community posts delivered straight to your inbox? Subscribe to the *Wolfram Community Insider* here.

Geocomputation is an indispensable modern tool for analyzing and viewing large-scale data such as population demographics, natural features and political borders. And if you’ve read some of my other posts, you can probably tell that I like working with maps. Recently, a Wolfram Community member asked:

“How do I make an interactive map of the Byzantine Empire through the years?”

To figure out a solution, we’ll tap into the Wolfram Knowledgebase for some historical entities, as well as some of the high-level geocomputation and visualizations of the Wolfram Language. Once we’ve created our brand-new function, we’ll submit it to the Wolfram Function Repository for anyone to use.

First, let’s see what knowledge is available on the Byzantine Empire. We can grab the appropriate `Entity` using natural language input (Ctrl + =):

✕
empire = Entity["HistoricalCountry", "ByzantineEmpire"]; |

As with most entities, we can use the `"Properties"` element here for a convenient list of built-in information:

✕
empire["Properties"] |

In this case, we’re mainly looking for the political borders; for geographic entities, this is normally stored as the `"Polygon"` property. We can use `GeoGraphics` for a quick preview:

✕
GeoGraphics[empire["Polygon"]] |

The Byzantine Empire (a “historical country”) no longer exists, but this is a snapshot of the empire at its largest state. `Dated` provides access to historical properties by year. Using `All` as the second argument, we can get a list of all polygons over the lifetime of the empire, indexed by year (with `DeleteMissing` removing empty entries):

✕
polygonList = DeleteMissing[Dated[empire, All]["Polygon"], 1, 2]; |

This list contains both dates and polygons, so splitting it into two separate lists will help simplify our code later:

✕
dateList = DateObject /@ polygonList[[All, 1]]; polygonList = polygonList[[All, 2]]; |

To add an additional layer of info to our map, let’s also get a list of the modern countries that overlap with the empire’s former territory:

✕
countryList = empire["CurrentCountries"] |

Now we have all the data we need to start creating our interactive maps; let’s see how we can tweak styles for a well-polished final result.

All of our maps should have a consistent plot range that covers the empire at its peak. We can compute the bounds for our plot using `GeoBounds`, adding a five-degree buffer of space around the edges:

✕
bounds = GeoBounds[polygonList, Quantity[5, "AngularDegrees"]] |

To visualize the countries, we’ll use `GeoListPlot` with the appropriate `GeoRange`, applying the `"Satellite"` setting for `GeoBackground` and adding interactive `Tooltip` labels with `GeoLabels`:

✕
countryMap = GeoListPlot[countryList, GeoBackground -> "Satellite", GeoLabels -> (Tooltip[#1, #2["Name"]] &), PlotStyle -> Gray, GeoRange -> bounds] |

As shown in the previous section, we can use `GeoGraphics` to show the empire’s polygon. Using `GeoStyling` directives, we give the polygon a look to distinguish it from the background. The Wolfram Language provides a number of ways to represent colors; for simplicity, we’ll use the built‐in `Orange`. Since this is going to display in front of the other graphic, we’ll set `GeoBackground` to `None`:

✕
empireOverlay = GeoGraphics[{GeoStyling[{ FaceForm[{Opacity[0.7], Orange}], EdgeForm[{Dashed, Darker@Orange}]}], empire["Polygon"]}, GeoBackground -> None, GeoRange -> bounds] |

To make a label for the graphic, we can generate a `Grid` with the name of the empire, the year (standardized using `DateString`), and the total `GeoArea` of the polygon:

✕
date = Extract[dateList, FirstPosition[polygonList, empire["Polygon"]]] |

✕
Grid[{ {empire["Name"], SpanFromLeft}, {"Year: ", DateString[date, {"Year", " ", "CEBCE"}]}, {"Area: ", GeoArea@empire["Polygon"]} }] |

Then we add a frame and wrap the result in `Inset` to define its position; the second argument `{Left, Bottom}` refers to the bottom-left corner of the graphic, and the third argument `{-1.1, -1.2}` gives some extra space around the label:

```wolfram
Inset[Framed[%], {Left, Bottom}, {-1.1, -1.2}]
```

Adding a few more styling options, we can create a function to generate the appropriate labels:

```wolfram
makeLabel[name_, date_, area_] :=
 Inset[
  Framed[
   Grid[{
     {Style[name, 14], SpanFromLeft},
     {Style["Year: ", Gray], DateString[date, {"Year", " ", "CEBCE"}]},
     {Style["Area: ", Gray], QuantityForm[area // Round, "Abbreviation"]}},
    Alignment -> Left, BaseStyle -> {"Text", Orange, 10}],
   FrameStyle -> GrayLevel[0.9], Background -> White, RoundingRadius -> 5],
  {Left, Bottom}, {-1.1, -1.2}]
```

```wolfram
label = makeLabel[empire["Name"], date, GeoArea[empire["Polygon"]]]
```

Finally, we combine all the graphics with `Overlay`, setting the third argument to `1` to keep the tooltips active in the first layer. Our inline label can be added to the polygon using `Show` and `Epilog` for a fully styled and labeled interactive map:

```wolfram
Overlay[{countryMap,
  Show[empireOverlay, Epilog -> {label}, GeoRange -> bounds]}, All, 1]
```

Adapting this code slightly, we can make an animation of the empire’s territory over time. First we’ll use `Table` to make lists of empire overlays and labels for each year:

```wolfram
empireOverlays = Table[
   GeoGraphics[{GeoStyling[{FaceForm[{Opacity[0.7], Orange}],
       EdgeForm[{Dashed, Darker@Orange}]}], p},
    GeoBackground -> None], {p, polygonList}];
```

```wolfram
labels = Table[
   makeLabel[empire["Name"], dateList[[d]], GeoArea[polygonList[[d]]]],
   {d, Length@dateList}];
```

Then we use `Animate` to display the combined graphics in sequence:

```wolfram
Animate[
 Overlay[{countryMap,
   Show[empireOverlays[[i]], Epilog -> {labels[[i]]},
    GeoRange -> bounds]}, All, 1],
 {{i, 1, "Year"}, 1, Length[labels], 1}]
```

On some systems (and for some empires), this animation may not perform very well. That’s because these raw `GeoGraphics` objects are computationally complex, and the animation is trying to overlay two of them in every frame. One easy trick for improving the performance of a heavy-duty animation is to apply `Rasterize` to all the frames and use `ListAnimate` to assemble them:

```wolfram
frames = Table[
   Rasterize[
    Overlay[{countryMap,
      Show[empireOverlays[[i]], Epilog -> {labels[[i]]},
       GeoRange -> bounds]}, All, 1]], {i, 1, Length[labels]}];
ListAnimate[frames]
```

This smooths out the final animation, but it also takes substantially longer to evaluate and removes our interactive tooltips. Deciding between more features, faster loading or better performance can be tricky, but fortunately we don’t need to choose—yet. For now, it’s sufficient to note that the interactive version works better in a desktop session, whereas the rasterized version is more appropriate for cloud/web deployment.

Although we started with the Byzantine Empire in mind, this entire workflow can actually work for many of the historical country entities in the language. Others might find this useful—so we’ll publish it through the Wolfram Function Repository for quick access from any Wolfram Language interface.

We can launch the `ResourceFunction` template notebook by selecting File > New > Repository Item > Function Repository Item. First we’ll give the function a name—let’s go with `HistoricalCountryAnimate`—and describe what it does:

Our code goes in the Definition section as a `SetDelayed` (`:=`) expression, structured like so:

```wolfram
HistoricalCountryAnimate[arg_, opts___] := Module[{vars}, expr]
```

For this function, our argument (`arg`) should be the historical country entity, so we restrict it with the `_Entity` pattern:

```wolfram
HistoricalCountryAnimate[empire_Entity, opts___] := Module[{vars}, expr]
```

Using `OptionsPattern`, we can provide a few options (`opts`) for our different output types. We’ll add a `"Tooltips"` option for toggling interactivity, along with the standard `GeoBackground` option, each with a default value:

```wolfram
HistoricalCountryAnimate[empire_Entity,
  OptionsPattern[{"Tooltips" -> True, GeoBackground -> "Satellite"}]] :=
 Module[{vars}, expr]
```

Most of the code within the `Module` follows what we’ve done in this post so far—defining all local variables (`vars`) and entering the rest of the code to be evaluated (`expr`). The tooltips, for example, can be switched on and off with an `If` expression:

```wolfram
GeoListPlot[countryList,
 GeoLabels -> If[OptionValue["Tooltips"],
   Tooltip[#1, #2["Name"]] &, None], ...]
```

`GeoBackground` can be passed to the graphics using the appropriate `OptionValue` expression:

```wolfram
countryMap = GeoListPlot[countryList,
  GeoBackground -> OptionValue[GeoBackground], ...]
```

Using similar strategies, we can add any number of options to our function. Check out the published function in the Wolfram Function Repository to see some additional ideas, including a “Rasterize” option for producing a non-interactive version.

In the Documentation section, we add our single usage case, along with details about the function’s behavior (in this case, a potentially long evaluation time) and possible options:

The submission notebook also provides space for usage examples, keywords, related symbols and additional information. Once everything is filled out, clicking the Submit to Repository button converts the notebook and sends it for review. This particular function has already been accepted for publication—you can now access it directly from Wolfram Language 12:

```wolfram
ResourceFunction["HistoricalCountryAnimate"]
```

With that, we have a fully implemented and published function for displaying an interactive map of the Byzantine Empire or any other historical country. Using the Wolfram Language’s range of geographic entities, visualizations and styles, we can create any number of similar functions. And with the unified structure of the language, it’s easy to apply strategies for a single entity to an entire class. So take a few minutes to explore the Maps & Cartography reference guide and some of the latest geovisualization features. If you make something useful, don’t forget to share it in the Wolfram Function Repository!

Have questions on a project using the Wolfram Language? Share them here and browse other questions from Wolfram Community!

We’re on an exciting path these days with the Wolfram Language. Just three weeks ago we launched the Free Wolfram Engine for Developers to help people integrate the Wolfram Language into large-scale software projects. Now, today, we’re launching the Wolfram Function Repository to provide an organized platform for functions that are built to extend the Wolfram Language—and we’re opening up the Function Repository for anyone to contribute.

The Wolfram Function Repository is something that’s made possible by the unique nature of the Wolfram Language as not just a programming language, but a full-scale computational language. In a traditional programming language, adding significant new functionality typically involves building whole libraries, which may or may not work together. But in the Wolfram Language, there’s so much already built into the language that it’s possible to add significant functionality just by introducing individual new functions—which can immediately integrate into the coherent design of the whole language.

To get it started, we’ve already got 532 functions in the Wolfram Function Repository, in 26 categories:

Just like the 6000+ functions that are built into the Wolfram Language, each function in the Function Repository has a documentation page, with a description and examples:

Go to the page, click to copy the “function blob”, paste it into your input, and then use the function just like a built-in Wolfram Language function (all necessary downloading etc. is already handled automatically in Version 12.0):

```wolfram
ResourceFunction["LogoQRCode"]["wolfr.am/E72W1Chw",
 CloudGet["https://wolfr.am/EcBjBfzw"]]
```

And what’s critical here is that in introducing `LogoQRCode` you don’t, for example, have to set up a “library to handle images”: there’s already a consistent and carefully designed way to represent and work with images in the Wolfram Language—that immediately fits in with everything else in the language:

```wolfram
Table[ImageTransformation[
  ResourceFunction["LogoQRCode"]["wolfr.am/E72W1Chw",
   ColorNegate[CloudGet["https://wolfr.am/EcBjBfzw"]]],
  #^k &], {k, 1, 2, .25}]
```

I’m hoping that—with the help of the amazing and talented community that’s grown up around the Wolfram Language over the past few decades—the Wolfram Function Repository is going to allow rapid and dramatic expansion in the range of (potentially very specialized) functions available for the language. Everything will leverage both the content of the language, and the design principles that the language embodies. (And, of course, the Wolfram Language has a 30+ year history of design stability.)

Inside the functions in the Function Repository there may be tiny pieces of Wolfram Language code, or huge amounts. There may be calls to external APIs and services, or to external libraries in other languages. But the point is that when it comes to user-level functionality everything will fit together, because it’s all based on the consistent design of the Wolfram Language—and every function will automatically “just work”.

We’ve set it up to be as easy as possible to contribute to the Wolfram Function Repository—essentially just by filling out a simple notebook. There’s automation that helps ensure that everything meets our design guidelines. And we’re focusing on coverage, not depth—and (though we’re putting in place an expert review process) we’re not insisting on anything like the same kind of painstaking design analysis or the same rigorous standards of completeness and robustness that we apply to built-in functions in the language.

There are lots of tradeoffs and details. But our goal is to optimize the Wolfram Function Repository both for utility to users, and for ease of contribution. As it grows, I’ve no doubt that we’ll have to invent new mechanisms, not least for organizing a large number of functions, and finding the ones one wants. But it’s very encouraging to see that it’s off to such a good start. I myself contributed a number of functions to the initial collection. Many are based on code that I’ve had for a long time. It only took me minutes to submit them to the Repository. But now that they’re in the Repository, I can—for the first time ever—immediately use the functions whenever I want, without worrying about finding files, loading packages, etc.

We’ve had ways for people to share Wolfram Language code since even before the web (our first major centralized effort was MathSource, built for Mathematica in 1991, using CD-ROMs, etc.). But there’s something qualitatively different—and much more powerful—about the Wolfram Function Repository.

We’ve worked very hard for more than 30 years to maintain the design integrity of the Wolfram Language, and this has been crucial in allowing the Wolfram Language to become not just a programming language, but a full-scale computational language. And now what the Wolfram Function Repository does is to leverage all this design effort to let new functions be added that fit consistently into the framework of the language.

Inside the implementation of each function, all sorts of things can be going on. But what’s critical is that to the user, the function is presented in a very definite and uniform way. In a sense, the built-in functions of the Wolfram Language provide 6000+ consistent examples of how functions should be designed (and our livestreamed design reviews include hundreds of hours of the process of doing that design). But more than that, what ultimately makes the Wolfram Function Repository able to work well is the symbolic character of the Wolfram Language, and all the very rich structures that are already built into the language. If you’ve got a function that deals with images—or sparse arrays, or molecular structures, or geo positions, or whatever—there’s already a consistent symbolic representation of those in the language, and by using that, your function is immediately compatible with other functions in the system.

Setting up a repository that really works well is an interesting meta-design problem. Give too little freedom and one can’t get the functionality one wants. Give too much freedom and one won’t be able to maintain enough consistency. We’ve had several previous examples that have worked very well. The Wolfram Demonstrations Project—launched in 2007 and now (finally) running interactively on the web—contains more than 12,000 contributed interactive demonstrations. The Wolfram Data Repository has 600+ datasets that can immediately be used in the Wolfram Language. And the Wolfram Neural Net Repository adds neural nets by the week (118 so far) that immediately plug into the `NetModel` function in the Wolfram Language.

All these examples have the feature that the kind of thing that’s being collected is well collimated. Yes, the details of what actual Demonstration or neural net or whatever one has can vary a lot, but the fundamental structure for any given repository is always the same. So what about a repository that adds extensions to the Wolfram Language? The Wolfram Language is set up to be extremely flexible—so it can basically be extended and changed in any way. And this is tremendously important in making it possible to quickly build all sorts of large-scale systems in the Wolfram Language. But with this flexibility comes a cost. Because the more one makes use of it, the more one ends up with a separated tower of functionality—and the less one can expect that (without tremendous design effort) what one builds will consistently fit in with everything else.

In traditional programming languages, there’s already a very common problem with libraries. If you use one library, it might be OK. But if you try to use several, there’s no guarantee that they fit together. Of course, it doesn’t help that in a traditional programming language—as opposed to a full computational language—there’s no expectation of even having consistent built-in representations for anything but basic data structures. But the problem is bigger than that: whenever one builds a large-scale tower of functionality, then without the kind of immense centralized design effort that we’ve put into the Wolfram Language, one won’t be able to achieve the consistency and coherence needed for everything to always work well together.

So the idea of the Wolfram Function Repository is to avoid this problem by just adding bite-sized extensions in the form of individual functions—that are much easier to design in a consistent way. Yes, there are things that cannot conveniently be done with individual functions (and we’re soon going to be releasing a streamlined mechanism for distributing larger-scale packages). But with everything that’s already built into the Wolfram Language there’s an amazing amount that individual functions can do. And the idea is that with modest effort it’s possible to create very useful functions that maintain enough design consistency that they fit together and can be easily and widely used.

It’s a tradeoff, of course. With a larger-scale package one can introduce a whole new world of functionality, which can be extremely powerful and valuable. But if one wants to have new functionality that will fit in with everything else, then—unless one’s prepared to spend immense design effort—it’ll have to be smaller scale. The idea of the Wolfram Function Repository is to hit a particular sweet spot that allows for powerful functionality to be added while making it manageably easy to maintain good design consistency.

We’ve worked hard to make it easy to contribute to the Wolfram Function Repository. On the desktop (already in Version 12.0), you can just go to File > New > Repository Item > Function Repository Item and you’ll get a “Definition Notebook” (programmatically, you can also use `CreateNotebook["FunctionResource"]`):

There are two basic things you have to do: first, actually give the code for your function and, second, give documentation that shows how the function should be used.

Press the Open Sample button at the top to see an example of what you need to do:

Essentially, you’re trying to make something that’s like a built-in function in the Wolfram Language. Except that it can be doing something much more specific than a built-in function ever would. And the expectations for how complete and robust it is are much lower.

But you’ll need a name for your function, that fits in with Wolfram Language function naming principles. And you’ll need documentation that follows the same pattern as for built-in functions. I’ll say more later about these things. But for now, just notice that in the row of buttons at the top of the Definition Notebook there’s a Style Guidelines button that explains more about what to do, and there’s a Tools button that provides tools—especially for formatting documentation.

When you think you’re ready, press the Check button. It’s OK if you haven’t gotten all the details right yet. Because Check will automatically go through and do lots of style and consistency checks. Often it will make immediate suggestions for you to approve (“This line should end with a colon” and it’ll offer to put the colon in). Sometimes it will ask you to add or change something yourself. We’ll be continually adding to the automatic functionality of Check, but basically its goal is to try to ensure that anything you submit to the Function Repository is already guaranteed to follow as many of the style guidelines as possible.

OK, so after you run Check, you can use Preview. Preview generates a preview of the documentation page that you’ve defined for your function. You can choose to create a preview either in a desktop notebook, or in the cloud. If you don’t like something you see in the preview, just go back and fix it, and press Preview again.

Now you’re ready to deploy your function. The Deploy button provides four options:

The big thing you can do is to submit your function to the Wolfram Function Repository, so it’s available to everyone forever. But you can also deploy your function for more circumscribed use. For example, you can have the function just deployed locally on your computer, so it will be available whenever you use that particular computer. Or you can deploy it to your cloud account, so it will be available to you whenever you’re connected to the cloud. You can also deploy a function publicly through your cloud account. It won’t be in the central Wolfram Function Repository, but you’ll be able to give anyone a URL that’ll let them get your function from your account. (In the future, we’ll also be supporting organization-wide central repositories.)

OK, let’s say you’re ready to actually submit your function to the Wolfram Function Repository. Then, needless to say, you press Submit to Repository. So what happens then? Well, your submission immediately goes into a queue for review and approval by our team of curators.

As your submission goes through the process (which will typically take a few days) you’ll get status messages—as well as maybe suggestions. But as soon as your function is approved, it’ll immediately be published in the Wolfram Function Repository, and available for anyone to use. (And it’ll show up in New Functions digests, etc. etc.)

We have very high standards for the completeness, robustness—and overall quality—of the 6000+ functions that we’ve painstakingly built into the Wolfram Language over the past 30+ years. The goal of the Wolfram Function Repository is to leverage all the structure and functionality that already exists in the Wolfram Language to add as many as possible, much more lightweight, functions.

Yes, functions in the Wolfram Function Repository need to follow the design principles of the Wolfram Language—so they fit in with other functions, and with users’ expectations about how functions should work. But they don’t need to have the same completeness or robustness.

In the built-in functions of the Wolfram Language, we work hard to make things be as general as possible. But in the Wolfram Function Repository, there’s nothing wrong with having a function that just handles some very specific, but useful, case. `SendMailFromNotebook` can accept notebooks in one specific format, and produce mail in one specific way. `PolygonalDiagram` makes diagrams only with particular colors and labeling. And so on.

Another thing about built-in functions is that we go to great pains to handle all the corner cases, to deal with bad input properly, and so on. In the Function Repository it’s OK to have a function that just handles the main cases—and ignores everything else.

Obviously it’s better to have functions that do more, and do it better. But the optimization for the Function Repository—as opposed to for the built-in functions of the Wolfram Language—is to have more functions, covering more functionality, rather than to deepen each function.

What about testing the functions in the Function Repository? The expectations are considerably lower than for built-in functions. But—particularly when functions depend on external resources such as APIs—it’s important to be continually running regression tests, which is what automatically happens behind the scenes. In the Definition Notebook, you can explicitly give (in the Additional Information section) as many tests as you want, defined either by input and output lines or by full symbolic `VerificationTest` objects. In addition, the system tries to turn the documentation examples you give into tests (though this can sometimes be quite tricky, e.g. for a function whose result depends on random numbers, or the time of day).
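For instance, a test given in the Definition Notebook might look like the following minimal sketch—the tested expression, expected value and `TestID` here are purely illustrative, not from any particular repository function:

```wolfram
(* a minimal, illustrative regression test; the input and expected
   output are hypothetical examples, not a real submission *)
VerificationTest[
 StringReverse["hello"],
 "olleh",
 TestID -> "string-reverse-basic"]
```

A `VerificationTest` object like this evaluates its first argument and checks it against the expected result, so it can be rerun automatically whenever the function or its external dependencies change.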

There’ll be a whole range of implementation complexity to the functions in the Function Repository. Some will be just a single line of code; others might involve thousands or tens of thousands of lines, probably spread over many subsidiary functions. When is it worth adding a function that takes only very little code to define? Basically, if there’s a good name for the function—that people would readily understand if they saw it in a piece of code—then it’s worth adding. Otherwise, it’s probably better just to write the code again each time you need to use it.

The primary purpose of the Function Repository (as its name suggests) is to introduce new functions. If you want to introduce new data, or new entities, then use the Wolfram Data Repository. But what if you want to introduce new kinds of objects to compute with?

There are really two cases. You might want a new kind of object that’s going to be used in new functions in the Function Repository. And in that case, you can always just write down a symbolic representation of it, and use it in the input or output of functions in the Function Repository.

But what if you want to introduce an object and then define how existing functions in the Wolfram Language should operate on it? Well, the Wolfram Language has always had an easy mechanism for that, called upvalues. And with certain restrictions (particularly for functions that don’t evaluate their arguments), the Function Repository lets you just introduce a function, and define upvalues for it. (To set expectations: getting a major new construct fully integrated everywhere in the Wolfram Language is typically a very significant undertaking, that can’t be achieved just with upvalues—and is the kind of thing we do as part of the long-term development of the language, but isn’t what the Function Repository is set up to deal with.)
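As a minimal sketch of the upvalue mechanism itself (the `interval` symbol here is a hypothetical stand-in, not a Function Repository function), one can attach a definition to a symbol that tells an existing function like `Mean` how to operate on it:

```wolfram
(* hypothetical symbolic object: attach an upvalue to interval so that
   the built-in Mean knows how to operate on it *)
interval /: Mean[interval[a_, b_]] := (a + b)/2

Mean[interval[0, 10]]
(* evaluates to 5 via the upvalue, without redefining Mean itself *)
```

Because the definition is associated with `interval` rather than `Mean`, it travels with the new object and doesn’t modify the built-in function.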

But, OK, so what can be in the code for functions in the Function Repository? Anything built into the Wolfram Language, of course (at least so long as it doesn’t pose a security risk). Also, any function from the Function Repository. But there are other possibilities, too. A function in the Function Repository can call an API, either in the Wolfram Cloud or elsewhere. Of course, there’s a risk associated with this. Because there’s no guarantee that the API won’t change—and make the function in the Function Repository stop working. And to recognize issues like this, there’s always a note on the documentation page (under Requirements) for any function that relies on more than just built-in Wolfram Language functionality. (Of course, when real-world data is involved, there can be issues even with this functionality—because actual data in the world changes, and even sometimes changes its definitions.)

Does all the code for the Wolfram Function Repository have to be written in the Wolfram Language? The code inside an external API certainly doesn’t have to be. And, actually, nor even does local code. In fact, if you find a function in pretty much any external language or library, you should be able to make a wrapper that allows it to be used in the Wolfram Function Repository. (Typically this will involve using `ExternalEvaluate` or `ExternalFunction` in the Wolfram Language code.)
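As a rough sketch of that wrapper pattern (the function name and the particular Python call here are assumptions for illustration):

```wolfram
(* hypothetical wrapper: expose one function from Python's standard
   statistics module as a top-level Wolfram Language function *)
pythonMedian[data_List] := ExternalEvaluate["Python", <|
   "Command" -> "lambda xs: __import__('statistics').median(xs)",
   "Arguments" -> {data}|>]
```

The Wolfram Language side handles starting the external session and converting arguments and results, so users of `pythonMedian` never need to touch Python directly.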

So what’s the point of doing this? Basically, it’s to leverage the whole integrated Wolfram Language system and its unified design. You get the underlying implementation from an external library or language—but then you’re using the Wolfram Language’s rich symbolic structure to create a convenient top-level function that makes it easy for people to use whatever functionality has been implemented. And, at least in a perfect world, all the details of loading libraries and so on will be automatically taken care of through the Wolfram Language. (In practice, there can sometimes be issues getting external languages set up on a particular computer system—and in the cloud there are additional security issues to worry about.)

By the way, when you first look at typical external libraries, they often seem far too complicated to just be covered by a few functions. But in a great many cases, most of the complexity comes from building up the infrastructure needed for the library—and all the functions to support that. When one’s using the Wolfram Language, however, the infrastructure is usually already built in, and so one doesn’t need to expose all those support functions—and one only needs to create functions for the few “topmost” applications-oriented functions in the library.

If you’ve written functions that you use all the time, then send them in to the Wolfram Function Repository! If nothing else, it’ll be much easier for you to use the functions yourself. And, of course, if you use the functions all the time, it’s likely other people will find them useful too.

Of course, you may be in a situation where you can’t—or don’t want to—share your functions, or where they access private resources. And in such cases, you can just deploy the functions to your own cloud account, setting permissions for who can access them. (If your organization has a Wolfram Enterprise Private Cloud, then this will soon be able to host its own private Function Repository, which can be administered within your organization, and set to force review of submissions, or not.)

Functions you submit to the Wolfram Function Repository don’t have to be perfect; they just have to be useful. And—a bit like the “Bugs” section in classic Unix documentation—there’s a section in the Definition Notebook called “Author Notes” in which you can describe limitations, issues, etc. that you’re already aware of about your function. In addition, when you submit your function you can include Submission Notes that’ll be read by the curation team.

Once a function is published, its documentation page always has two links at the bottom: “Send a message about this function”, and “Discuss on Wolfram Community”. If you send a message (say reporting a bug), you can check a box saying you want your message and contact information to be passed to the author of the function.

Often you’ll just want to use functions from the Wolfram Function Repository like built-in functions, without looking inside them. But if you want to “look inside”, there’s always a Source Notebook button at the top. Press it and you’ll get your own copy of the original Definition Notebook that was submitted to the Function Repository. Sometimes you might just want to look at this as an example. But you can also make your own modifications. Maybe you’ll want to deploy these on your computer or in your cloud account. Or maybe you’ll want to submit these to the Function Repository, perhaps as a better version of the original function.

In the future, we might support Git-style forking in the Function Repository. But for now, we’re keeping it simpler, and we’re always having just one canonical version of each function. And basically (unless they abandon it and don’t respond to messages) the original author of the function gets to control updates to it—and gets to submit new versions, which are then reviewed and, if approved, published.

OK, so how does versioning work? Right now, as soon as you use a function from the Function Repository its definition will get permanently stored on your computer (or in your cloud account, if you’re using the cloud). If there’s a new version of the function, then when you next use the function, you’ll get a message letting you know this. And if you want to update to the new version, you can do that with `ResourceUpdate`. (The “function blob” actually stores more information about versioning, and in the future we’re planning on making this conveniently accessible.)

One of the great things about the Wolfram Function Repository is that any Wolfram Language program anywhere can use functions from it. If the program appears in a notebook, it’s often nice to format Function Repository functions as easy-to-read “function blobs” (perhaps with appropriate versioning set).

But you can always refer to any Function Repository function using a textual `ResourceFunction[...]`. And this is convenient if you’re directly writing code or scripts for the Wolfram Engine, say with an IDE or textual code editor. (And, yes, the Function Repository is fully compatible with the Free Wolfram Engine for Developers.)

Inside, the Wolfram Function Repository uses exactly the same Resource System framework as all our other repositories (Data Repository, Neural Net Repository, Demonstrations Project, etc.). And like everything else in the Resource System, a `ResourceFunction` is ultimately based on a `ResourceObject`.

Here’s a `ResourceFunction`:

```wolfram
ResourceFunction["StringIntersectingQ"]
```

It’s somewhat complicated inside, but you can see some of what’s there using `Information`:

```wolfram
Information[ResourceFunction["StringIntersectingQ"]]
```

So how does setting up a resource function work? The simplest is the purely local case. Here’s an example that takes a function (here, just a pure function) and defines it as a resource function for this session:

```wolfram
DefineResourceFunction[1 + # &, "AddOne"]
```

Once you’ve made the definition, you can use the resource function:

```wolfram
ResourceFunction["AddOne"][100]
```

Notice that in this function blob, there’s a black icon. This indicates the function blob refers to an in-memory resource function defined for your current session. For a resource function that’s permanently stored on your computer, or in a cloud account, there’s a gray icon. And for an official resource function in the Wolfram Function Repository, there’s an orange icon.

OK, so what happens when you use the Deploy menu in a Definition Notebook? First, it’ll take everything in the Definition Notebook and make a symbolic `ResourceObject` out of it. (And if you’re using a textual IDE—or a program—you can also explicitly create the `ResourceObject`.)

Deploying locally on your computer uses `LocalCache` on the resource object to store it as a `LocalObject` in your file system. Deploying in your cloud account uses `CloudDeploy` on the resource object, and deploying publicly in the cloud uses `CloudPublish`. In all cases, `ResourceRegister` is also used to register the name of the resource function so that `ResourceFunction["name"]` will work.

If you press Submit to Function Repository, then what’s happening underneath is that `ResourceSubmit` is being called on the resource object. (And if you’re using a textual interface, you can call `ResourceSubmit` directly.)

By default, the submission is made under the name associated with your Wolfram ID. But if you’re submitting on behalf of a group or an organization, then you can set up a separate Publisher ID, and you can instead use this as the name to associate with your submissions.

Once you’ve submitted something to the Function Repository, it’ll go into the queue for review. If you get comments back, they’ll usually be in the form of a notebook with extra “comment cells” added. You can always check on the status of your submission by going to the Resource System Contributor Portal. But as soon as it’s approved, you’ll be notified (by email), and your submission will be live on the Wolfram Function Repository.

At first, it might seem like it should be possible to take a Definition Notebook and just put it verbatim into the Function Repository. But actually there are quite a few subtleties—and handling them requires doing some fairly sophisticated metaprogramming, symbolically processing both the code defining the function, and the Definition Notebook itself. Most of this happens internally, behind the scenes. But it has some consequences that are worth understanding if you’re going to contribute to the Function Repository.

Here’s one immediate subtlety. When you fill out the Definition Notebook, you can just refer to your function everywhere by a name like `MyFunction`—that looks like an ordinary name for a function in the Wolfram Language. But for the Function Repository documentation, this gets replaced by `ResourceFunction["MyFunction"]`—which is what users will actually use.

Here’s another subtlety: when you create a resource function from a Definition Notebook, all the dependencies involved in the definition of the function need to be captured and explicitly included. And to guarantee that the definitions remain modular, one needs to put everything in a unique namespace. (Needless to say, the functions that do all this are in the Function Repository.)

Usually you’ll never see any evidence of the internal context used to set up this namespace. But if for some reason you return an unevaluated symbol from the innards of your function, then you’ll see that the symbol is in the internal context. However, when the Definition Notebook is processed, at least the symbol corresponding to the function itself is set up to be displayed elegantly as a function blob rather than as a raw symbol in an internal context.

The Function Repository is about defining new functions. And these functions may have options. Often these options will be ones (like, say, `Method` or `ImageSize`) that have already been used for built-in functions, and for which built-in symbols already exist. But sometimes a new function may need new options. To maintain modularity, one might like these options to be symbols defined in a unique internal context (or to be something like whole resource functions in their own right). But to keep things simple, the Function Repository allows new options to be given in definitions as strings. And, as a courtesy to the final user, these definitions (assuming they’ve used `OptionValue` and `OptionsPattern`) are also processed so that when the functions are used, the options can be given not only as strings but also as global symbols with the same names.

Most functions just do what they do each time they are called. But some functions need initialization before they can run in a particular session—and to deal with this there’s an Initialization section in the Definition Notebook.

Functions in the Function Repository can immediately make use of other functions that are already in the Repository. But how do you set up definitions for the Function Repository that involve two (or more) functions that refer to each other? Basically you just have to deploy them in your session, so you can refer to them as `ResourceFunction["`*name*`"]`. Then you can create the examples you want, and then submit the functions.

Today we’re just launching the Wolfram Function Repository. But over time we expect it to grow dramatically, and as it grows there are a variety of issues that we know will come up.

The first is about function names and their uniqueness. The Function Repository is designed so that—like for built-in functions in the Wolfram Language—one can refer to any given function just by giving its name. But this inevitably means that the names of functions have to be globally unique across the Repository—so that, for example, there can be only one `ResourceFunction["MyFavoriteFunction"]` in the Repository.

This might seem like a big issue. But it’s worth realizing it’s basically the same issue as for things like internet domains or social network handles. And the point is that one simply has to have a registrar—and that’s one of the roles we’re playing for the Wolfram Function Repository. (For private versions of the Repository, their administrators can be registrars.) Of course an internet domain can be registered without having anything on it, but in the Function Repository the name of a function can only be registered if there’s an actual function definition to go with it.

And part of our role in managing the Wolfram Function Repository is to ensure that the name picked for a function is reasonable given the definition of the function—and that it fits in with Wolfram Language naming conventions. We’ve now had 30+ years of experience in naming built-in functions in the Wolfram Language, and our curation team brings that experience to the Function Repository. Of course, there are always tradeoffs. For example, it might seem nice to have a short name for some function. But it’s better to “name defensively” with a longer, more specific name, because then it’s less likely to collide with something one wants to do in the future.

(By the way, just adding some kind of contributor tag to disambiguate functions wouldn’t achieve much. Because unless one insists on always giving the tag, one will end up having to define a default tag for any given function. Oh, and allocating contributor tags again requires global coordination.)

As the Wolfram Function Repository grows, one of the issues that’s sure to arise is the discoverability of functions. Yes, there’s search functionality (and Definition Notebooks can include keywords, etc.). But for built-in functions in the Wolfram Language there’s all sorts of cross-linking in documentation which helps “advertise” functions. Functions in the Function Repository can link to built-in functions. But what about the other way around? We’re going to be experimenting with various schemes to expose Function Repository functions on the documentation pages for built-in functions.

For built-in functions in the Wolfram Language, there’s also a level of discoverability provided by the network of “guide pages” that give organized lists of functions relevant to particular areas. It’s always complicated to appropriately balance guide pages—and as the Wolfram Language has grown, it’s common for guide pages to have to be completely refactored. It’s fairly easy to put functions from the Function Repository into broad categories, and even to successively break up these categories. But it’s much more valuable to have properly organized guide pages. It’s not yet clear how best to produce these for the whole Function Repository. But for example `CreateResourceObjectGallery` in the Function Repository lets anyone put up a webpage containing their “picks” from the repository:

The Wolfram Function Repository is set up to be a permanent repository of functions, where any function in it will always just work. But of course, there may be new versions of functions. And we fully expect some functions to be obsoleted over time. The functions will still work if they’re used in programs. But their documentation pages will point to new, better functions.

The Wolfram Function Repository is all about providing new functions quickly—and exploring new frontiers for how the Wolfram Language can be used. But we fully expect that some of what’s explored in the Function Repository will eventually make sense to become built-in parts of the core Wolfram Language. We’ve had a slightly similar flow over the past decade from functionality that was originally introduced in Wolfram|Alpha. And one of the lessons is that to achieve the standards of quality and coherence that we insist on for anything built into the Wolfram Language is a lot of work—that usually dwarfs the original implementation effort. But even so, a function in the Function Repository can serve as a very useful proof of concept for a future function built into the Wolfram Language.

And of course the critical thing is that a function in the Function Repository is something that’s available for everyone to use right now. Yes, an eventual built-in function could be much better and stronger. But the Function Repository lets people get access to new functions immediately. And, crucially, it lets those new functions be contributed by anyone.

Earlier in the history of the Wolfram Language this wouldn’t have worked so well. But now there is so much already built into the language—and so strong an understanding of the design principles of the language—that it’s feasible to have a large community of people add functions that will maintain the design consistency to make them broadly useful.

There’s incredible talent in the community of Wolfram Language users. (And, of course, that community includes many of the world’s top people in R&D across a vast range of fields.) I’m hoping that the Wolfram Function Repository will provide an efficient platform for that talent to be exposed and shared. And that together we’ll be able to create something that dramatically expands the domain to which the computational paradigm can be applied.

We’ve taken the Wolfram Language a long way in 30+ years. Now, together, let’s take it much further. And let’s use the Function Repository—as well as things like the Free Wolfram Engine for Developers—as a platform for doing that.


You know what’s harder than learning the piano? Learning the piano without a piano, and without any knowledge of music theory. For me, acquiring a real piano was out of the question; I had neither the funds nor space in my small college apartment. So naturally, it looked like I would have to build one myself—digitally, of course. And luckily, I had Mathematica, Unity and a few hours to spare. Because working in Unity is incredibly quick and efficient with the Wolfram Language and UnityLink, I’ve created a playable section of piano, and even learned a bit of music theory in the process.

First, I determined that building the piano requires the following:

- Audio for each musical note
- Geometry for the piano keys
- A portable, interactive, real-time-rendering audio and 3D-physics engine

The first two can be accomplished trivially in the Wolfram Language. As for the last one, I opted to use the newly introduced UnityLink—a powerful link between the Wolfram Language and the real-time development platform Unity. With UnityLink, it’s now possible to combine the Wolfram Language’s strengths in computation, audio and geometry with Unity’s real-time rendering, audio and physics, and its efficient packaging of all three into standalone applications for web, desktop, mobile and console platforms.

Before I dive into the code, let’s explore some of the background on the piano and the musical notes it plays. Understanding the theory behind the physical piano will help us to better recreate it digitally in Unity.

The piano traces its origins back to early 18th-century Italy, where it was invented by Bartolomeo Cristofori. Since then, it’s undergone many design changes, eventually resulting in a (mostly) standardized key configuration.

The modern piano has a total of 88 keys, 52 of which are white and are used to play the natural notes (A, B, C, D, E, F and G). The remaining 36 keys are black and are used to play the accidentals (A♯/B♭, C♯/D♭, D♯/E♭, F♯/G♭ and G♯/A♭). The ♯ and ♭ symbols stand for *sharp* and *flat*, respectively. Here you can see all 88 keys with their corresponding notes labeled:

The notes can be further divided into octaves, each of which contains 12 keys. Two keys with the same note but in different octaves will have different pitches. The octaves of a piano are color-coded in this diagram:

A piano contains seven full octaves, with four extra keys on the ends. These extra keys allow the scales of A minor and C major to be played in all seven octaves.
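As a quick sanity check on those numbers, here is a short Python sketch (purely illustrative, outside the Wolfram Language used in the rest of this post) that generates the standard 88-key layout from A0 to C8 and counts the white and black keys:

```python
# Assumption: the standard 88-key span runs from A0 up to C8.
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Lay out nine octaves of chromatic note names, then slice out A0..C8.
all_keys = [name + str(octave) for octave in range(9) for name in CHROMATIC]
keys = all_keys[all_keys.index("A0"): all_keys.index("C8") + 1]

white = [k for k in keys if "#" not in k]  # natural notes
black = [k for k in keys if "#" in k]      # accidentals
```

Counting the two lists recovers the 52 white and 36 black keys quoted above.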

In this blog post, and for simplicity’s sake, I’ll focus on a single musical scale (ordered list of notes), but you can apply this method to create the entire piano. Let’s use one of the most common scales—the C major scale. This scale contains only the natural notes in the order C, D, E, F, G, A and B. Any C note can be chosen as the start of the scale. Here, I’m going to use the C note in the fourth octave (also known as C4 or middle C):

If you take a closer look, you can see that this subsection of our piano contains all seven natural notes and all five accidentals. Note that I also included the C key from the next octave (C5) in the scale, as this helps “round off” the scale:
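The scale itself is easy to generate programmatically. Here is a small Python sketch (illustrative only, not part of the original piano code) that walks the whole- and half-step pattern of a major scale up the chromatic notes, producing exactly C4 through C5:

```python
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # whole- and half-step pattern of a major scale

def major_scale(root="C", octave=4):
    # Start on the root, then move up by each step of the pattern,
    # bumping the octave number each time we wrap past B.
    idx, notes = CHROMATIC.index(root), []
    for step in [0] + MAJOR_STEPS:
        idx += step
        octave += idx // 12
        idx %= 12
        notes.append(CHROMATIC[idx] + str(octave))
    return notes
```

Calling `major_scale("C", 4)` yields the eight notes of the C major scale, including the rounding-off C5.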

Whew! With the background out of the way, I can finally get to the code. To get the sounds of the piano keys I use the symbol `SoundNote`, which can generate any note from a large collection of instruments. For a single note, you simply give it the note name, duration and instrument. When wrapped in `Audio`, it creates an audio object that can be played directly in a notebook:

```wolfram
Audio[SoundNote["C", 3, "Piano"]]
```

To get a note in a specific octave, you simply concatenate the octave number to the end of the note name. For instance, I can get all the natural notes in the fourth octave using the code shown here:

```wolfram
naturalNotes = {"C", "D", "E", "F", "G", "A", "B"};
Table[Audio[SoundNote[note <> "4"]], {note, naturalNotes}] // AudioJoin
```

The exact shape and dimensions of piano keys vary by manufacturer. I opted to keep things simple by approximating each key as a prism. The advantage of using prisms is that I only need to specify the base polygon and extrude upward. However, ensuring no keys overlap requires five base polygon variations:

All that’s left is to convert the base polygons into 3D prisms. This can be done easily using `RegionProduct` to multiply the polygons by a line segment with a given height:

```wolfram
line = BoundaryMeshRegion[{{0}, {height}}, Point[{{1}, {2}}]];
regions = Table[
   RegionBoundary[RegionProduct[BoundaryMeshRegion[polygon], line]],
   {polygon, polygons}];
Row[regions]
```

Now that I have the audio and geometry, it’s time to combine them in Unity to make a working piano. As I mentioned previously, this is made possible with UnityLink.

With Unity installed, loading UnityLink is as simple as a single function call:

```wolfram
Needs["UnityLink`"]
```

I start by opening a new Unity project, which I’ve named `"MyPiano"`:

```wolfram
UnityOpen["MyPiano"]
```

With the project open, I can now send and receive data from Unity. I will eventually want to create my piano in a Scene—a 3D environment that can act as a menu, game level or any other distinct part of a Unity application. But before I create my Scene, I have to first transfer the audio and geometry content I created earlier to Unity. Once it has been added, I will be free to use it in my Scene.

While not required, it’s good practice to keep your Unity project organized with subdirectories in the project’s Assets directory. The Assets directory contains all of the assets used in the project (textures, audio clips, meshes, etc.). In the line shown here, I create five directories in the Assets directory using `CreateUnityAssetDirectory`:

```wolfram
CreateUnityAssetDirectory[{"Meshes", "Audio", "Materials", "Scenes", "Scripts"}];
```

Now I go about transferring the audio. I do this by passing the `Audio` of each note to the function `CreateUnityAudioClip`, which automatically converts it to Unity’s AudioClip object and stores it in the Assets directory. These AudioClip objects are represented as `UnityAudioClip` expressions in the Wolfram Language:

```wolfram
notes = {"C4", "D4", "E4", "F4", "G4", "A4", "B4",
   "C#4", "D#4", "F#4", "G#4", "A#4", "C5"};
clips = Association[Table[
    audio = Audio[SoundNote[note, 2.5, "Piano"]];
    note -> CreateUnityAudioClip[File["Audio/note_" <> note], audio],
    {note, notes}]];
clips // Short
```

Next, I transfer the geometry of my piano keys. This time, however, I use `CreateUnityMesh` to automatically convert my `MeshRegions` to Unity’s Mesh objects, represented as `UnityMesh` expressions in the Wolfram Language.

```wolfram
meshes = Table[
   CreateUnityMesh[File["Meshes/mesh_" <> ToString[i]], regions[[i]]],
   {i, Length[regions]}];
meshes // Short
```

I do something similar to create a black and a white material, as well as a script component for controlling the user interaction with the piano keys. I’ve left these out for brevity, but the full code can be found in the downloadable notebook for this post.

With all of the Assets transferred, I can finally make the Scene for my piano. I start by creating a new default Scene:

```wolfram
CreateUnityScene[File["Scenes/Piano"]]
```

If you’re new to Unity, here’s a brief description of Scenes. Scenes contain Game Objects, which in turn act as containers for Components. You can think of the Scene as an environment, Game Objects as the things in that environment and Components as the behaviors of those things.

In my piano Scene, I’m going to make a Game Object for each key. I’ll then attach the script component I created earlier to each of these game objects, so they make sound and move when the user interacts with them.

I could just add each key one at a time; however, that would prove to be tedious and difficult to extend in the future. Instead, I define the information about each white key and each black key in two lists. I can then iterate over these lists to create each key automatically. For each key, I specify the computer keyboard key it corresponds to, the musical note it should play and the index of the mesh it should use. Note that the mesh index for black keys is implicitly assumed to be 5:

```wolfram
whiteKeys = {
   <|"Keycode" -> "q", "Note" -> "C4", "Mesh" -> 3|>,
   <|"Keycode" -> "w", "Note" -> "D4", "Mesh" -> 4|>,
   <|"Keycode" -> "e", "Note" -> "E4", "Mesh" -> 2|>,
   <|"Keycode" -> "r", "Note" -> "F4", "Mesh" -> 3|>,
   <|"Keycode" -> "t", "Note" -> "G4", "Mesh" -> 4|>,
   <|"Keycode" -> "y", "Note" -> "A4", "Mesh" -> 4|>,
   <|"Keycode" -> "u", "Note" -> "B4", "Mesh" -> 2|>,
   <|"Keycode" -> "i", "Note" -> "C5", "Mesh" -> 1|>};
blackKeys = {
   <|"Keycode" -> "2", "Note" -> "C#4"|>,
   <|"Keycode" -> "3", "Note" -> "D#4"|>,
   Null,
   <|"Keycode" -> "5", "Note" -> "F#4"|>,
   <|"Keycode" -> "6", "Note" -> "G#4"|>,
   <|"Keycode" -> "7", "Note" -> "A#4"|>};
```

To keep my Scene organized, I also group all of my keys under a parent game object named `"Piano Scale"`:

```wolfram
parent = CreateUnityTransform["Piano Scale"]
```

I iterate over all the white keys first:

```wolfram
Do[
  key = whiteKeys[[i]];
  name = "Key " <> key["Note"] <> " (White)";
  go = CreateUnityGameObject[name, meshes[[key["Mesh"]]]];
  go[["Transform", "Position"]] = {(i - 1)*(whiteWidth + gap), 0, 0};
  go[["Transform", "Parent"]] = parent;
  script = CreateUnityComponent[go, "PianoKey"];
  script[["Key"]] = key["Keycode"];
  script[["Clip"]] = clips[key["Note"]],
  {i, Length[whiteKeys]}]
```

This is followed by the black keys:

```wolfram
Do[
  key = blackKeys[[i]];
  If[key =!= Null, (* Null marks the E-F gap, which has no black key *)
   name = "Key " <> key["Note"] <> " (Black)";
   go = CreateUnityGameObject[name, meshes[[5]]]; (* black keys share mesh 5 *)
   (* placed between neighboring white keys; the exact offsets are in the
      downloadable notebook *)
   go[["Transform", "Position"]] = {(i - 0.5)*(whiteWidth + gap), 0, 0};
   go[["Transform", "Parent"]] = parent;
   script = CreateUnityComponent[go, "PianoKey"];
   script[["Key"]] = key["Keycode"];
   script[["Clip"]] = clips[key["Note"]]],
  {i, Length[blackKeys]}]
```

For each key, I create a Game Object with the appropriate mesh using `CreateUnityGameObject`. After setting the position of this Game Object, I attach the custom script I created earlier by passing the Game Object and script name to `CreateUnityComponent`. I finish by specifying the keycode and audio clip for that key.

And just like that, I have a working (partial) piano. However, it doesn’t look as good as it could. To remedy this, I adjust the object materials along with the lighting and camera (full code in the downloadable notebook). With this, we get the final result:

Now that looks better! Before moving on, I also want to save all the changes I just made to my Scene by calling `SaveUnityScene`:

```wolfram
SaveUnityScene[]
```

To test the piano in the Unity editor, I can use `UnityPlay` and `UnityStop` to switch between the Play and Edit modes. When I’m satisfied with the results, I can build the project to a standalone application using `UnityBuild`.

The following command will automatically build the project to a file in my project directory for my current platform (macOS):

```wolfram
UnityBuild[]
```

With the build successful, I can immediately open and play my piano application:

```wolfram
SystemOpen[%["Application"]]
```

One of the advantages of working in Unity is its ability to build to numerous platforms without having to change your code. If you can play a game on a platform, odds are that Unity can build to it.

It can even be built to run in a web browser. Go ahead and try it!

This small section of the piano can easily be extended to a full piano keyboard. With more than 160 styles and percussions available in `SoundNote`, you could also build other instruments or even combine them into a single synthesizer.

To start working with UnityLink in the Wolfram Language, visit the online documentation page or try out one of the sample projects. There’s so much you can do with the built-in interface, and I look forward to seeing what projects you come up with on Wolfram Community!

Version 12 brings a host of major new areas into the Wolfram Language, including a seamless interface to the Unity game engine. Start coding today with Wolfram|One or Mathematica, on the desktop or in the Wolfram Cloud.

I wrote a blog post about the disputed *Federalist Papers*. These were the 12 essays (out of a total of 85) with authorship claimed by both Alexander Hamilton and James Madison. Ever since the landmark statistical study by Mosteller and Wallace published in 1963, the consensus opinion has been that all 12 were written by Madison (the Adair article of 1944, which also takes this position, discusses the long history of competing authorship claims for these essays). The field of work that gave rise to the methods used often goes by the name of “stylometry,” and it lies behind most methods for determining authorship from text alone (that is to say, in the absence of other information such as a physical typewritten or handwritten note). In the case of the disputed essays, the pool size, at just two, is as small as can be. Even so, these essays have been regarded as difficult for authorship attribution due to many statistical similarities in style shared by Hamilton and Madison.

Since late 2016 I have worked with a coauthor, Catalin Stoean, on a method for determining authorship from among a pool of candidate authors. When we applied our methods to the disputed essays, we were surprised to find that the results did not fully align with consensus. In particular, the last two showed clear signs of joint authorship, with perhaps the larger contributions coming from Hamilton. This result is all the more plausible because we had done validation tests that were close to perfect in terms of correctly predicting the author for various parts of those essays of known authorship. These validation tests were, as best we could tell, more extensive than all others we found in prior literature.

Clearly a candidate pool of two is, for the purposes at hand, quite small. Not as small as one, of course, but still small. While our method might not perform well if given hundreds or more candidate authors, it does seem to do well at the more modest (but still important) scale of tens of candidates. The purpose of this blog post is to continue testing our stylometry methods—this time on a larger set of candidates, using prior Wolfram Blog posts as our data source.

As I gave some detail in the “Disputed *Federalist Papers*” blog (DFPb), I’ll recap in brief. We start by lumping certain letters and other characters (numerals, punctuation, white space) into 16 groups. Each of these is then regarded as a pair of digits in base 4 (this idea was explained, with considerable humor, in Tom Lehrer’s song “New Math” from the collection “That Was the Year That Was”). The base-4 digits from a given set of training texts are converted to images using a method called the Frequency Chaos Game Representation (FCGR, a discrete pixelation variant of Joel Jeffrey’s Chaos Game Representation). The images are further processed to give much smaller lists of numbers (technical term: vectors), which are then fed into a machine learning classifier (technical term: black magic). Test images are similarly processed, and the trained classifier is then used to determine the most likely (as a function of the training data and method internals) authorship.
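To make that pipeline concrete, here is a toy Python sketch of the FCGR step (our actual implementation is in the Wolfram Language, and the 16-class grouping below is an invented stand-in, not the grouping we actually use). Each character class becomes a pair of base-4 digits, each digit picks a corner of the unit square, and the chaos-game walk’s cell-visit counts form the image:

```python
from collections import Counter

# Toy stand-in for the 16-class character grouping (the real grouping,
# which also covers numerals, punctuation and white space, differs).
GROUPS = {c: i for i, c in enumerate("abcdefghijklmnop")}

def to_base4_digits(text):
    digits = []
    for ch in text:
        g = GROUPS.get(ch)
        if g is None:
            continue  # ignore characters outside the toy alphabet
        digits.extend(divmod(g, 4))  # one class -> a pair of base-4 digits
    return digits

def fcgr(digits, k):
    """Frequency Chaos Game Representation on a 2^k x 2^k grid: each
    base-4 digit picks a corner of the unit square, the walk moves
    halfway toward that corner, and we count visits per grid cell."""
    corners = {0: (0.0, 0.0), 1: (0.0, 1.0), 2: (1.0, 0.0), 3: (1.0, 1.0)}
    n = 2 ** k
    counts = Counter()
    x = y = 0.5
    for d in digits:
        cx, cy = corners[d]
        x, y = (x + cx) / 2, (y + cy) / 2
        counts[(min(int(x * n), n - 1), min(int(y * n), n - 1))] += 1
    return counts
```

The resulting count grid is the “image” that later gets reduced to a vector and fed to the classifier.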

Shortly after we posted the DFPb, I gave a talk about our work at the 2018 Statistical Language and Speech Processing (SLSP) conference (our paper appeared in the proceedings thereof). The session chair, Emmanuel Rayner, was kind enough to post a nice description of this approach. It was both complimentary to our work, and complementary to my prior DFPb discussion of the method (how often does one get to use such long homonyms in the same sentence?), with the added benefit that it is a more clear description than I have given. With his permission, I include a link to this write‐up.

So how well does this work? Short answer: quite well. It is among the highest-scoring methods in the literature on all of the standard benchmark tests that we tried (we used five, not counting the *Federalist Papers*, and that is more than one sees in most of the literature). There are three papers from the past two years that show results that surpass this one on some of these tests. These papers all use ensemble scoring from multiple methods. Maybe someday this one will get extended or otherwise incorporated into such a hybridized approach.

The Wolfram Language code behind all this is not quite ready for general consumption, but I should give some indication of what is involved.

1. We convert the text to a simpler alphabet by changing capitals to lowercase and removing diacritics. The important functions are aptly named `ToLowerCase` and `RemoveDiacritics`.
2. Replacement rules do the substitutions needed to bring us to a 16-character “alphabet.” For example, we lump the characters {b, d, p} into one class using the rule `"p" | "d" → "b"`.
3. More replacements take this to base-4 pairs, e.g. {`"g"` → {0, 0}, `"i"` → {0, 1}, …}.
4. Apply the Frequency Chaos Game Representation algorithm to convert the base-4 strings to two-dimensional images. It amazes me that a fast implementation, using the Wolfram Language function `Compile`, is only around 10 lines of code. Then again, it amazes me that functional programming constructs such as `FoldList` can be handled by `Compile`. Possibly I am all too easily amazed.
5. Split the blog posts into training and test sets. Split individual posts further into multiple chunks (this is so we get sufficiently many inputs for the training).
6. Process the training images into numeric vectors using the `SingularValueDecomposition` function (`SVD`, for short). This is commonly used in data science to reduce dimension. The wonderful thing is that it gives good results despite going from two dimensions to one.
7. Use `NetChain` and `NetTrain` to construct and train a neural network to associate the numeric vectors with the respective blog authors.
8. Use information from the `SVD` of step 6 to convert test images.
9. Run the trained neural net to assess authorship of the test texts. This gives probability scores. We aggregate scores from all chunks associated with a particular blog, and the highest score determines the final guess at the authorship of that blog.

We turn now to the Wolfram blog posts. These have been appearing since around 2007, and many employees past and present have contributed to this collection. The blog posts are composed of text, Wolfram Language code (input and output), and graphics and other images. For purposes of applying the code I developed, it is important to have only text; while the area of code authorship is interesting in its own right, and is a growing field, it is quite likely outside the capabilities of the method I use and definitely outside the scope of this blog post. My colleague Chapin Langenheim put some blogspertise (something I lack) to work and was able to provide me with text‐only data—that is, blog posts stripped of code and pictures (thanks, Chapin!).

As the methodology requires some minimal amount of training data, I opted to use only authors who had contributed at least four blog posts. Also, one does not want a large imbalance of data that might bias the classifier in training, so I restricted the data to a maximum of 12 blog posts per author. As it happens, some of the posts are not in any way indicative of the author’s style. These include interviews, announcements of Wolfram events (conferences, summer school programs), lists of Mathematica‐related new books, etc. So I removed these from consideration.
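In other words, the selection amounts to the following (a hypothetical Python paraphrase with invented names, not the actual code):

```python
def select_authors(posts_by_author, min_posts=4, max_posts=12):
    """Drop authors with too few usable posts, and cap the rest so that
    no single author dominates the training data."""
    return {author: posts[:max_posts]
            for author, posts in posts_by_author.items()
            if len(posts) >= min_posts}
```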

With unusable blog posts removed, there are 10 authors with 8 or more published posts, 19 authors with 6 or more and 41 authors with at least 4. I will show results for all three groups.

It might be instructive to see images created from blog posts written by the group of most prolific authors. Here we have paired initial and final images from the training data used for each of those 10 authors.

The human eye will be unlikely to make out differences that distinguish between the authors. The software does a reasonable job, though, as we will soon see.

I opted to use around three-quarters of the blog posts from each author for training, and the remaining for testing. More specifically, I took the largest fraction that did not exceed three-quarters for training—so if an author wrote 10 posts, for example, then seven were used for training and three for testing. The selections were done at random, so there should be no overall effect of chronology, e.g. from using exclusively the oldest for training.
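That split rule is easy to state in code. Here is a hedged Python sketch (the names are mine, not from the actual implementation): take the largest whole number of posts not exceeding the chosen fraction for training, after a random shuffle.

```python
import math
import random

def split_posts(posts, frac=0.75, seed=0):
    """Randomly assign the largest whole number of posts not exceeding
    `frac` of the total to training; the remainder is the test set."""
    n_train = math.floor(frac * len(posts))
    shuffled = list(posts)
    random.Random(seed).shuffle(shuffled)  # random, so no chronology effect
    return shuffled[:n_train], shuffled[n_train:]
```

For an author with 10 posts this yields 7 for training and 3 for testing, matching the example above.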

In order to give a larger set of training data inputs, each blog was then chopped into three parts of equal length (chunks). The same was done for the test set posts. This latter is not strictly necessary, but it seems to be helpful for assessing authorship of all but very short text. The reason is that parts of a text might fool a given classifier, but this is less likely when probability scores from separate parts are then summed on a per-text basis.
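The per-text aggregation is simple to express. A minimal Python sketch (illustrative only; the real scoring comes from the Wolfram Language neural net) sums each author’s chunk probabilities per text and picks the highest total, so a single “fooled” chunk need not flip the overall verdict:

```python
from collections import defaultdict

def aggregate_chunk_scores(chunk_scores):
    """chunk_scores: iterable of (text_id, {author: probability}) pairs,
    one per chunk. Sum each author's probabilities per text and return
    the author with the highest total for each text."""
    totals = defaultdict(lambda: defaultdict(float))
    for text_id, probs in chunk_scores:
        for author, p in probs.items():
            totals[text_id][author] += p
    return {text_id: max(scores, key=scores.get)
            for text_id, scores in totals.items()}
```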

There are, as noted earlier, 10 authors with at least eight blog posts (I am in that set, at the bare minimum not counting this one). Processing for the method in use on this set takes a little over a minute on my desktop machine. After running the classifier, results are that 85.7% of blog chunks are correctly recognized. This is, in my opinion, not bad, though also not great. Aggregating chunk scores for each blog gives a much better result: there are 28 test blog posts (two or three per author), and all 28 are correctly classified as to author.

Now our set of candidates is larger and our average amount of training data has dropped. The correctness results do drop, although they remain respectable. Around 82.6% of chunks are correctly recognized. For the per‐blog aggregated scores, we correctly recognize 41 of 46, with three more being the second “guess” (that is, the author with the second-highest assessed probability).

Here we start to fare poorly. This is not due so much to the relatively larger candidate pool (at 41) as to the fact that, for many authors, the amount of training data is just not sufficient. We now get around 60% correct recognition at the chunk level, with 64% of the blog posts (50 out of 78) correctly classified according to summed chunk scores, and another 17% going to the second guess.

Here we step into what I will call “blog introspection.” I took the initial draft of this blog post, up to but not including this subsection (since this and later parts were not yet written). After running through the same processing (but not breaking into chunks), we obtain an image.

When tested with the first classifier (the one trained on the 10 most prolific authors), the scores for the respective authors are as follows:

This is fairly typical of the method: several scores are approximately equal at some baseline, with a small number (one, in this case) being somewhat larger. That outlier is the winner. As it happens, that outlier was me, so this blog authorship was correctly attributed.

The experiments indicate how a particular method of stylometry handles a data source with a moderate number of authors (which tends to make the problem easier), but also does not have large amounts of training data per author (making the task more difficult). We have seen an expected degradation in performance as the task becomes harder. Results were certainly not bad, even for the most difficult of the experiment variants.

The method described here and, in more detail, in the DFPb (and in published literature), is only one method among many in the field. While it has outperformed nearly all others on several benchmarks, it is by no means the last word in authorship attribution via stylometry. A possible future direction might involve hybridizing with one or more different methods (wherein some sort of “ensemble” vote determines the classification).
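The ensemble idea could be as simple as a majority vote across independent attribution methods. An illustrative Python sketch (the individual classifiers here are stand-ins, not real implementations):

```python
from collections import Counter

def ensemble_attribute(classifiers, text):
    """Have each attribution method vote for an author;
    the author with the most votes wins."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]

# toy stand-ins for real attribution methods
methods = [lambda t: "alice", lambda t: "bob", lambda t: "alice"]
ensemble_attribute(methods, "some disputed text")  # -> "alice"
```

In practice one might weight the votes by each method's per-chunk probability scores rather than counting them equally.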

It should be noted that the training and test data are in a certain sense homogeneous: all come from blog posts that are relevant to Wolfram Research. Another open area is whether, or to what extent, training from one area (where there may be an abundance of data for a given set of authors) might apply to test data from different genres. This is of some interest in the world of digital forensics, and might be a topic for another day.

Any approach to data science can only be as effective as the computational tools driving it; luckily for us, we had the Wolfram Language at our disposal. Leveraging its universal symbolic representation, high-level automation and human readability—as well as its broad range of built-in computation, knowledge and interfaces—streamlined our process to help bring Wolfram|Alpha to fruition. In this post, I’ll discuss some key tenets of the multiparadigm approach, then demonstrate how they combine with the computational intelligence of the Wolfram Language to make the ideal workflow for not only discovering and presenting insights from your data, but also for creating scalable, reusable applications that optimize your data science processes.

Given all the buzzwords floating around over the past few years—automated machine learning, edge AI, adversarial neural networks, natural language processing—you might be tempted to grab a sleek new method from arXiv and call it a solution. And while this can be a convenient way to solve a specific problem at hand, it also tends to limit the range of questions you can answer. One main goal of the multiparadigm approach is to remove those kinds of constraints from your workflow, instead letting questions and curiosity drive your analysis.

Leading with questions is easiest when you start from a high level. Wolfram Language functions use built-in parameter selection, which lets you focus on the overarching task rather than the technical details. Your workflow might require data from any number of sources and formats; `Import` automatically detects the file type and structure of your data:

```wolfram
Import["surveydata.csv"] // Shallow
```

`SemanticImport` goes even further, interpreting expressions in each field and displaying everything in an easy-to-view `Dataset`:

```wolfram
data = SemanticImport["surveydata.csv"]
```

Apply `FindDistribution` to your data, and it auto-selects a fitting method to give you an approximate distribution:

```wolfram
dist = FindDistribution[ages = Normal@data[All, "Age"]]
```

Another quick line of code generates a `SmoothHistogram` plot comparing the actual data to the computed distribution:

```wolfram
SmoothHistogram[{ages, RandomVariate[dist, Length[ages]]},
 PlotLegends -> {"Data", "Computed"}]
```

You can then ask yourself, “What is going on in that graph?”, drill down with more computation and get more info about the fit:

```wolfram
DistributionFitTest[ages, dist, "TestDataTable"]
```

Go deeper by trying specific distributions:

```wolfram
Column[Table[
  DistributionFitTest[ages, FindDistribution[ages, TargetFunctions -> {d}],
   "TestDataTable"],
  {d, {NormalDistribution, PascalDistribution, NegativeBinomialDistribution}}]]
```

The simple input-output flow of a Wolfram Notebook helps move the process forward, with each step building on the previous computation. Every evaluation gives clear output that can be used immediately in further analysis, letting you code as fast as you think (or, at least, as fast as you type).

Using a question-answer workflow with human-readable functions and interactive coding gives you unprecedented freedom for computational exploration.

Although it’s easy to think of data science as a numbers game, the best insights often come from exploring images, audio, text, graphs and other data. In most systems, this involves either tracking down and switching between frameworks to support different data types or writing custom code to convert everything to the appropriate type and structure.

But again, underlying technical details shouldn’t be the focus of your data science workflow. To simplify the process, the Wolfram Data Framework (WDF) expresses everything the same way: tables and matrices, text, images, graphics, graphs and networks are all represented as symbols:

This reduces the time and effort for reading standard data types, sorting through unstructured data or handling mixed data types. Wolfram Language functions generally work on a variety of data types, such as `LearnDistribution` (big brother to `FindDistribution`), which can generate a probability distribution for a set of images:

```wolfram
images = ResourceData["CIFAR-100"][[All, 1]];
```

```wolfram
ld = LearnDistribution[images]
```

The distribution can be used to examine the likelihood of a given image being from the set:

```wolfram
Grid[Table[{i, RarerProbability[ld, i]},
  {i, CloudGet["https://wolfr.am/DLdsWtJA"]}]]
```

You can even generate new representative samples:

```wolfram
RandomVariate[ld, 10]
```

WDF also makes it easy to construct high-level entities with uniformly structured data. This is especially useful for representing complex real-world concepts:

```wolfram
us = Entity["Country", "UnitedStates"];
RandomSample[us["Properties"], 5]
```

You can use high-level natural language input for immediate access to entities and their properties:

```wolfram
(* entered via natural language input: "all us states" *)
states = EntityList[EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"]]
```

Entities and associated data values can then be used and combined in computations:

```wolfram
GeoRegionValuePlot[
 states -> EntityProperty["AdministrativeDivision", "Population"],
 GeoRange -> us]
```

You can build entire computational workflows based on this curated knowledge. Custom entities made with `EntityStore` have the same flexibility. With data unified through WDF, you won’t have to worry about the size, type or structure of data—leaving more time for finding answers.

Discovery comes from trying new ideas, so a truly discovery-focused data science workflow should go beyond standard areas like statistics and machine learning. You can find more by testing and combining different computational techniques on your data—borrowing from geocomputation, time series analysis, signal processing, network analysis and engineering. To do that, you need algorithms for every subject and discipline, as well as the ability to change techniques without a major code rewrite.

Fortunately, Wolfram Language code uses the same symbolic structure as its data, ensuring maximum compatibility with no preprocessing required. Computational methods (e.g. model selection, data resampling, plot styles) are also standardized and automated across a range of functionality. Syntax, logic and conventions are uniform no matter what domain you’re exploring:

For a case in point, look at our data exploration of sensors from the ThrustSSC supersonic car. Standard data partitioning and time series techniques were useful in translating and understanding the data. But we also opted to try a few unconventional approaches, such as using `CommunityGraphPlot` to group together similar datasets:

A dive into signal processing—specifically, wavelet analysis—led to the discovery of certain discontinuities in the vibrational frequency of a wheel:

As it turned out, these gaps represented the wheel’s top edge crossing the sound barrier—a phenomenon that was understood by the engineers but had not been verified by previous analyses. Even a quick exploration using a broad, high-level toolset can provide better insights with less expertise (and a *lot* less code) than more siloed approaches.

Data science doesn’t stop with the discovery of answers; you need to interpret and share your results before anyone can act on them. That means presenting the right information to the right people in the right way, whether it’s sending a few static visualizations, deploying an interactive desktop or web app, generating an automated report or making a full write-up of your project. One major goal of the multiparadigm approach is to express insights in the clearest way possible, regardless of context.

For the basic cases, the Wolfram Language has visualizations for a range of analyses, with high-level plot themes and the ability to add labels, frames and other details all inline:

```wolfram
cities = EntityClass["City",
    {EntityProperty["City", "Country"] -> Entity["Country", "UnitedStates"],
     EntityProperty["City", "Population"] -> TakeLargest[10]}]["Entities"];
```

```wolfram
BubbleChart[EntityValue[cities, {"GiniIndex", "Area", "PerCapitaIncome"}],
 PlotTheme -> "Marketing", ColorFunction -> "TemperatureMap",
 ChartLabels -> Callout[EntityValue[cities, "Name"]],
 PlotLabel -> Style["Gini Index Data: 10 Largest U.S. Cities", "Title", 24]]
```

And using functions like `Manipulate`, you can interactively explore additional parameters to find patterns in the data:

```wolfram
With[{params = {
    EntityProperty["City", "HousingAffordability"],
    EntityProperty["City", "MedianHouseholdIncome"],
    EntityProperty["City", "PopulationByEducationalAttainment"],
    EntityProperty["City", "PublicTransportationAnnualPassengerMiles"],
    EntityProperty["City", "UnemploymentRate"],
    EntityProperty["City", "RushHours"]}},
 Manipulate[
  BubbleChart[EntityValue[cities, {"GiniIndex", y, z}]],
  {y, params}, {z, params}]]
```

Documenting your project’s workflow can be equally important; a clear, detailed narrative gives real-world context to an analysis and makes it understandable to a broad audience. Combining interactive, human-readable code with well-organized plain-language narrative creates what Stephen Wolfram calls a computational essay:

This kind of high-level document is easy to produce using Wolfram Notebooks, which let you mix code, text, images, interfaces and other expressions in a hierarchical cell structure. With built-in interactivity and live code, the audience can follow the same discovery process as the author.

Computational essays are typical of the kind of output produced by the multiparadigm approach. But sometimes you need more information with fewer words—say, a financial dashboard:

From there, you can send the notebook to anyone for interactive viewing with Wolfram Player. Or deploy it as a web form in the Wolfram Cloud, add permissions for your colleagues and set up an automated report. You could even set up an API so others can create their own interfaces. It’s all built into the language.

Every interface has its unique use for data scientists and end users. The Wolfram Language gives you access to the full spectrum of interfaces for analyzing and reporting on your data—and they can be made permanently accessible for interactive viewing from anywhere, making them ideal for sharing ideas with the wider world.

Following these principles leads to an optimized workflow that includes every aspect of the data science process, from data to deployment. Taking it a step further, the multiparadigm approach uses automation as much as possible, both simplifying the task at hand and making subsequent explorations easier.

This brings us back to Wolfram|Alpha: an adaptive web application that accepts a broad range of input styles, automatically chooses the appropriate data source for a given task and runs optimized computations on that data to provide answers at scale. When viewed as a whole, the system exemplifies the multiparadigm approach.

For instance, take a question involving revenue and GDP:

In this case, the system must first interpret the natural language statement, retrieving the entities and functions necessary to compute the ratio of revenue to GDP during a given time period. But beyond having access to the right data, it must be able to bring those different data sources together instantaneously when a response is needed.

Another example is revenue forecasting:

On top of the steps from the previous example, this computation also uses automated model selection, in this case choosing log-normal random walks. And in both cases, the system returns additional information to the user, all organized in a high-level report.

Wolfram|Alpha can be used in this way for all kinds of analysis, at any scale, always using the latest algorithms and data—making the full data science process available through simple natural language queries.

Ultimately, the best insights come from augmenting human curiosity with intelligent computation—and that’s exactly what multiparadigm data science in the Wolfram Language achieves. The result is a scalable, start-to-finish computational workflow designed around human understanding and usability.

Making high-level computation more accessible leads to the democratization of data science, giving anyone with questions immediate access to answers. The multiparadigm approach is more than just a new method for data science: by creating and sharing high-level tools for interactive exploration, it makes possible a new generation of data science.

So what kind of insights does your data hold? There’s only one way to find out: start exploring!

Preview Wolfram U’s upcoming open online course to learn more about the multiparadigm data science project workflow.
