New Wolfram U Course Explores Data Visualization

After a few months of brainstorming ideas, developing notebooks and scripts and refining videos through several rounds of editing and refilming, we are pleased to announce that the Visual Explorations in Data Science massive open online course (MOOC) is now available.

The two guiding principles of this course are visualization and an example-driven approach. We teach data science hands-on, with examples that gradually introduce technical features and consistently emphasize visualization. The course consists of a dozen case studies that span fields from geography to engineering and topics from flag similarity to periodic trends.

Another point stressed in the course is that data is literally everywhere. That’s why its topics cover a wide range of fields: society and culture, engineering, weather, physical science, geography and geology, and last but not least, finance and money. In this post, we’ll explore just a few examples that introduce the course’s style and unique approach.

Which Languages Have the Longest Words?

As with any data science challenge, the first step—and usually the biggest headache—is finding clean and (if you’re lucky) curated data. To answer the question of which languages have the longest words, we’ll begin with a list of words across languages. And, as we emphasize repeatedly in the course, a characteristic trait of the Wolfram Language is its ready access to the vast Wolfram database of knowledge about the world.

In this case, you can use the built-in function WordList to extract clean, organized data. Before answering the question, however, let’s take a peek at the distribution of word lengths across languages:

[Figure: multiple language histograms]
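
The tallying behind such a histogram can be sketched in a few lines. The snippet below is a minimal Python illustration, not the Wolfram Language code from the post; the function name `length_histogram` and the toy word list are hypothetical stand-ins for the data WordList would return.

```python
from collections import Counter


def length_histogram(words):
    """Return {word length: number of words}, ordered by length."""
    counts = Counter(len(word) for word in words)
    return dict(sorted(counts.items()))


# Hypothetical toy sample, invented for illustration only.
# It is NOT the WordList data used in the post.
sample_words = ["cat", "dog", "house", "apple", "river", "extraordinary"]
histogram = length_histogram(sample_words)

# Crude text rendering: one '#' per word of that length.
for length, count in histogram.items():
    print(f"{length:2d} | {'#' * count}")
```

Running the same tally on one word list per language gives the per-language distributions that the histograms compare.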


The next step is to build probability distributions based on these histograms. A cursory glance at the English histogram shows there are very few words over 15 letters. So, let’s set the threshold for a “very long” word at 15:

[Wolfram Language code: Module expression]
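
As a rough sketch of the threshold step, the Python snippet below estimates the chance of meeting a “very long” word as the fraction of words with more than 15 letters. It is an illustration under stated assumptions, not the post’s Wolfram Language code: the name `long_word_fraction`, the strict “more than 15” reading of the threshold, and the tiny word lists are all assumptions made for the example.

```python
def long_word_fraction(words, threshold=15):
    """Fraction of words with strictly more than `threshold` letters."""
    if not words:
        raise ValueError("word list is empty")
    long_count = sum(1 for word in words if len(word) > threshold)
    return long_count / len(words)


# Hypothetical toy lists, invented for illustration only.
# Real word lists are far larger and give very different fractions.
toy_lists = {
    "English": ["cat", "house", "river", "internationalization"],
    "German": [
        "Haus",
        "Geschwindigkeitsbegrenzung",
        "Donaudampfschifffahrt",
        "Rindfleischetikettierung",
    ],
}

fractions = {lang: long_word_fraction(ws) for lang, ws in toy_lists.items()}

# How many times more likely a very long word is in one list than the other.
ratio = fractions["German"] / fractions["English"]
```

Dividing one language’s fraction by another’s is the kind of comparison behind a statement such as “hundreds of times more likely”; the toy ratio here says nothing about the real languages.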


A few language families stand out for their share of very long words: Germanic languages (including German, Danish, Faroese and Swedish) and Uralic languages (such as Finnish and Hungarian) have the largest share; Romance languages (like Spanish, French and Italian) fall in the middle; and Semitic languages (like Arabic and Hebrew) have the smallest. If you crunch the numbers, it turns out you’re hundreds of times more likely to encounter a very long word in German or Finnish than in Hebrew or Arabic.

Through interesting examples like this one, we hope to offer a unique look into how data science can be used in different scenarios and show that it is very much a practical science. Throughout all course videos, we encourage curiosity, even asking students to do their own side explorations and suggesting some possible questions to investigate.

Now let’s explore a totally different concept that demands a completely different type of data.

Which US City Is the Windiest?

Sure, everyone says it’s Chicago. But we set out to find the true windiest city using 2019 weather data, and what we found was shocking! From the get-go, visualizing the wind speed distributions of big cities showed that Chicago is not the windiest of them.

You can generate the following insightful visualization with only a few lines of Wolfram Language code. Here we obtain the wind data of the largest US cities throughout 2019 and visualize the wind speed distribution of each city in a box-whisker chart:

[Wolfram Language code: Module expression]
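
A box-whisker chart summarizes each distribution by its minimum, quartiles and maximum. The Python sketch below computes that five-number summary with the standard library; it is not the post’s Wolfram Language code, and the city labels and wind speeds are hypothetical values invented for illustration, not 2019 weather data. The choice of the "inclusive" quartile method is also an assumption, since plotting tools differ on this.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, quartiles and maximum: the numbers a box-whisker chart draws."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical wind speeds (km/h) for two made-up cities.
wind_speeds = {
    "City A": [8, 12, 16, 20, 24],
    "City B": [10, 15, 20, 25, 30],
}

summaries = {city: five_number_summary(v) for city, v in wind_speeds.items()}

# Rank cities by median wind speed, windiest first.
ranking = sorted(summaries, key=lambda city: summaries[city]["median"], reverse=True)
```

Comparing medians and quartile ranges across cities, rather than a single peak gust, is what lets a chart like this show which city is consistently windier.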


After more analysis, it was evident San Francisco is much windier than Chicago—at least in 2019. Perhaps we should name San Francisco the new Windy City!

Which Isotopes Are Stable?

This last example is much more technical. The course video Exploring the Stability of Isotopes uses a wide variety of visualizations to demonstrate isotope stability and the trends behind it.

One example is the popular illustration of the “belt” or “band” of stability, a region in the plot of atomic number versus neutron number that harbors particularly stable isotopes:

[Wolfram Language code: Module expression]
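
The trend that shapes the band can be seen from just a handful of points. The Python sketch below, which is not the post’s Wolfram Language code, takes a few well-known stable isotopes and computes their neutron-to-proton ratio; the short hand-picked list and the helper name `neutron_to_proton_ratio` are assumptions made for illustration, whereas the course works with full isotope data.

```python
# (element symbol, mass number A, atomic number Z) for a few
# well-known stable isotopes, hand-picked for illustration.
STABLE_ISOTOPES = [
    ("He", 4, 2),
    ("C", 12, 6),
    ("O", 16, 8),
    ("Ca", 40, 20),
    ("Fe", 56, 26),
    ("Sn", 120, 50),
    ("Pb", 208, 82),
]


def neutron_to_proton_ratio(mass_number, atomic_number):
    """N/Z, where the neutron number is N = A - Z."""
    neutrons = mass_number - atomic_number
    return neutrons / atomic_number


ratios = {
    f"{symbol}-{a}": neutron_to_proton_ratio(a, z)
    for symbol, a, z in STABLE_ISOTOPES
}

for name, ratio in ratios.items():
    print(f"{name:7s} N/Z = {ratio:.2f}")
```

Light stable isotopes sit near N = Z, while heavier ones need proportionally more neutrons, which is why the band bends away from the N = Z line as the atomic number grows.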


As you can see, a few lines of code produce a clean visualization.

From technical scientific inquiries like this one to explorations of cultural references, the course’s content was deliberately designed to cater to many different tastes and interests and to show the versatility of data science.

In the spirit of this blog post, we encourage you to use the Wolfram Language to investigate something you’re curious about. The Wolfram Language has built-in knowledge that spans everything from countries and celebrities to Pokémon characters. Who knows what new insights you may develop? Go out there and explore the world’s data! If you’re in need of a guide for doing this, we encourage you to take Visual Explorations in Data Science and learn valuable skills for your next data adventure.

Finally, creating this course would not have been possible without many subject matter experts and great teamwork across different groups at Wolfram. We’d like to thank Michael Trott for his insights into possible examples, the Wolfram U team for their nonstop support and the video team for their technical feedback.

Check out Wolfram U for a wealth of free interactive courses, video courses and special events.
