Wolfram Blog: News, views, & ideas from the front lines at Wolfram Research

Scientific Bug Hunting in the Cloud: An Unexpected CEO Adventure
By Stephen Wolfram, April 16, 2015

The Wolfram Cloud Needs to Be Perfect

The Wolfram Cloud is coming out of beta soon (yay!), and right now I’m spending much of my time working to make it as good as possible (and, by the way, it’s getting to be really great!). Mostly I concentrate on defining high-level function and strategy. But I like to understand things at every level, and as a CEO, one’s ultimately responsible for everything. And at the beginning of March I found myself diving deep into something I never expected…

Here’s the story. As a serious production system that lots of people will use to do things like run businesses, the Wolfram Cloud should be as fast as possible. Our metrics were saying that typical speeds were good, but subjectively when I used it something felt wrong. Sometimes it was plenty fast, but sometimes it seemed way too slow.

We’ve got excellent software engineers, but months were going by, and things didn’t seem to be changing. Meanwhile, we’d just released the Wolfram Data Drop. So I thought, why don’t I just run some tests myself, maybe collecting data in our nice new Wolfram Data Drop?

A great thing about the Wolfram Language is how friendly it is for busy people: even if you only have time to dash off a few lines of code, you can get real things done. And in this case, I only had to run three lines of code to find a problem.

First, I deployed a web API for a trivial Wolfram Language program to the Wolfram Cloud:

In[1]:= CloudDeploy[APIFunction[{}, 1 &]]

Then I called the API 50 times, measuring how long each call took (% here stands for the previous result):

In[2]:= Table[First[AbsoluteTiming[URLExecute[%]]], {50}]

Then I plotted the sequence of times for the calls:

In[3]:= ListLinePlot[%]

And immediately there seemed to be something crazy going on. Sometimes the time for each call was 220 ms or so, but often it was 900 ms, or even twice that long. And the craziest thing was that the times seemed to be quantized!

I made a histogram:

In[4]:= Histogram[%%, 40]

And sure enough, there were a few fast calls on the left, then a second peak of slow calls, and a third “outcropping” of very slow calls. It was weird!

I wondered whether the times were always like this. So I set up a periodic scheduled task to do a burst of API calls every few minutes, and put their times in the Wolfram Data Drop. I left this running overnight… and when I came back the next morning, this is what I saw:

Graph of API calls, showing strange, large-scale structure
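A scheduled task like that takes only a few lines of Wolfram Language. Here is a minimal sketch of such a setup, not the actual code from this post; it assumes url is the CloudObject returned by In[1] above:

bin = CreateDatabin[]; (* a fresh databin to collect the timings *)
RunScheduledTask[
  DatabinAdd[bin, Table[First[AbsoluteTiming[URLExecute[url]]], {10}]],
  300] (* a burst of 10 calls every 300 seconds *)

Something like bin["TimeSeries"] then retrieves the accumulated timings with their timestamps.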

Even weirder! Why the large-scale structure? I could imagine that, for example, a particular node in the cluster might gradually slow down (not that it should), but why would it then slowly recover?

My first thought was that perhaps I was seeing network issues, given that I was calling the API on a test cloud server more than 1000 miles away. So I looked at ping times. But apart from a couple of weird spikes (hey, it’s the internet!), the times were very stable.

Ping times

 

Something’s Wrong inside the Servers

OK, so it must be something on the servers themselves. There’s a lot of new technology in the Wolfram Cloud, but most of it is pure Wolfram Language code, which is easy to test. But there’s also generic modern server infrastructure below the Wolfram Language layer. Much of this is fundamentally the same as what Wolfram|Alpha has successfully used for half a dozen years to serve billions of results, and what webMathematica started using nearly a decade before that. But the Wolfram Cloud is a more demanding computational system, so it’s set up slightly differently.

And my first suspicion was that this different setup might be causing something to go wrong inside the webserver layer. Eventually I hope we’ll have pure Wolfram Language infrastructure all the way down, but for now we’re using a webserver system called Tomcat that’s based on Java. And at first I thought that perhaps the slowdowns might be Java garbage collection. Profiling showed that there were indeed some “stop the world” garbage-collection events triggered by Tomcat, but they were rare, and were taking only milliseconds, not hundreds of milliseconds. So they weren’t the explanation.

By now, though, I was hooked on finding out what the problem was. I hadn’t been this deep in the trenches of system debugging for a very long time. It felt a lot like doing experimental science. And as in experimental science, it’s always important to simplify what one’s studying. So I cut out most of the network by operating “cloud to cloud”: calling the API from within the same cluster. Then I cut out the load balancer, that dispatches requests to particular nodes in a cluster, by locking my requests to a single node (which, by the way, external users can’t do unless they have a Private Cloud). But the slowdowns stayed.

So then I started collecting more-detailed data. My first step was to make the API return the absolute times when it started and finished executing Wolfram Language code, and compare those to absolute times in the wrapper code that called the API. Here’s what I saw:

The blue line shows the API-call times from before the Wolfram Language code was run; the gold line, after.

I collected this data in a period when the system as a whole was behaving pretty badly. And what I saw was lots of dramatic slowdowns in the “before” times—and just a few quantized slowdowns in the “after” times.

Once again, this was pretty weird. It didn’t seem like the slowdowns were specifically associated with either “before” or “after”. Instead, it looked more as if something was randomly hitting the system from the outside.

One confusing feature was that each node of the cluster contained (in this case) 8 cores, with each core running a different instance of the Wolfram Engine. The Wolfram Engine is nice and stable, so each of these instances was running for hours to days between restarts. But I wondered if perhaps some instances might be developing problems along the way. So I instrumented the API to look at process IDs and process times, and then for example plotted total process time against components of the API call time:

Total process time plotted against components of the API call time

And indeed there seemed to be some tendency for “younger” processes to run API calls faster, but (particularly noting the suppressed zero on the x axis) the effect wasn’t dramatic.

 

What’s Eating the CPU?

I started to wonder about other Wolfram Cloud services running on the same machine. It didn’t seem to make sense that these would lead to the kind of quantized slowdowns we were seeing, but in the interest of simplifying the system I wanted to get rid of them. At first we isolated a node on the production cluster. And then I got my very own Wolfram Private Cloud set up. Still the slowdowns were there. Though, confusingly, at different times and on different machines, their characteristics seemed to be somewhat different.

On the Private Cloud I could just log in to the raw Linux system and start looking around. The first thing I did was to read the results from the “top” and “ps axl” Unix utilities into the Wolfram Language so I could analyze them. And one thing that was immediately obvious was that lots of “system” time was being used: the Linux kernel was keeping very busy with something. And in fact, it seemed like the slowdowns might not be coming from user code at all; they might be coming from something happening in the kernel of the operating system.

So that made me want to trace system calls. I hadn’t done anything like this for nearly 25 years, and my experience in the past had been that one could get lots of data, but it was hard to interpret. Now, though, I had the Wolfram Language.

Running the Linux “strace” utility while doing a few seconds of API calls gave 28,221,878 lines of output. But it took just a couple of lines of Wolfram Language code to knit together start and end times of particular system calls, and to start generating histograms of system-call durations. Doing this for just a few system calls gave me this:

System-call durations--note the clustering...

Interestingly, this showed evidence of discrete peaks. And when I looked at the system calls in these peaks they all seemed to be “futex” calls—part of the Linux thread synchronization system. So then I picked out only futex calls, and, sure enough, saw sharp timing peaks—at 250 ms, 500 ms, and 1 s:

System-call durations for just the futex calls--showing sharp timing peaks

But were these really a problem? Futex calls are essentially just “sleeps”; they don’t burn processor time. And actually it’s pretty normal to see calls like this that are waiting for I/O to complete and so on. So to me the most interesting observation was actually that there weren’t other system calls that were taking hundreds of milliseconds.
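For the record, the knitting-and-histogramming step can be sketched roughly as follows. This is a simplified stand-in, not the actual analysis code: it assumes the strace output was saved with the -T flag (which appends each call’s duration in angle brackets), and the file name is hypothetical:

lines = ReadList["strace.out", String]; (* output of: strace -T -f -p <pid> *)
futexDurations = Flatten[StringCases[lines,
   "futex(" ~~ ___ ~~ "<" ~~ d : NumberString ~~ ">" :> ToExpression[d]]];
Histogram[futexDurations, 40] (* durations in seconds *)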

 

The OS Is Freezing!

So… what was going on? I started looking at what was happening on different cores of each node. Now, Tomcat and other parts of our infrastructure stack are all nicely multithreaded. Yet it seemed that whatever was causing the slowdown was freezing all the cores, even though they were running different threads. And the only thing that could do that is the operating system kernel.

But what would make a Linux kernel freeze like that? I wondered about the scheduler. I couldn’t really see why our situation would lead to craziness in a scheduler. But we looked at the scheduler anyway, and tried changing a bunch of settings. No effect.

Then I had a more bizarre thought. The instances of the Wolfram Cloud I was using were running in virtual machines. What if the slowdown came from “outside The Matrix”? I asked for a version of the Wolfram Cloud running on bare metal, with no VM. But before that was configured, I found a utility to measure the “steal time” taken by the VM itself—and it was negligible.

By this point, I’d been spending an hour or two each day for several days on all of this. And it was time for me to leave for an intense trip to SXSW. Still, people in our cloud-software engineering team were revved up, and I left the problem in their capable hands.

By the time my flight arrived there was already another interesting piece of data. We’d divided each API call into 15 substeps. Then one of our physics-PhD engineers had compared the probability for a slowdown in a particular substep (on the left) to the median time spent in that substep (on the right):

Bars on the left show the probability for a slowdown in particular substeps; bars on the right show the median time spent in each of those substeps

With one exception (which had a known cause), there was a good correlation. It really looked as if the Linux kernel (and everything running under it) was being hit by something at completely random times, causing a “slowdown event” if it happened to coincide with the running of some part of an API call.

So then the hunt was on for what could be doing this. The next suspicious thing noticed was a large amount of I/O activity. In the configuration we were testing, the Wolfram Cloud was using the NFS network file system to access files. We tried tuning NFS, changing parameters, going to asynchronous mode, using UDP instead of TCP, changing the NFS server I/O scheduler, etc. Nothing made a difference. We tried using a completely different distributed file system called Ceph. Same problem. Then we tried using local disk storage. Finally this seemed to have an effect—removing most, but not all, of the slowdown.

We took this as a clue, and started investigating more about I/O. One experiment involved editing a huge notebook on a node, while running lots of API calls to the same node:

Graph of system time, user time, and API time spent editing a huge notebook--with quite a jump while the notebook was being edited and continually saved
The result was interesting. During the period when the notebook was being edited (and continually saved), the API times suddenly jumped from around 100 ms to 500 ms. But why would simple file operations have such an effect on all 8 cores of the node?

 

The Culprit Is Found

We started investigating more, and soon discovered that what seemed like “simple file operations” weren’t—and we quickly figured out why. You see, perhaps five years before, early in the development of the Wolfram Cloud, we wanted to experiment with file versioning. And as a proof of concept, someone had inserted a simple versioning system named RCS.

Plenty of software systems out there in the world still use RCS, even though it hasn’t been substantially updated in nearly 30 years and by now there are much better approaches (like the ones we use for infinite undo in notebooks). But somehow the RCS “proof of concept” had never been replaced in our Wolfram Cloud codebase—and it was still running on every file!

One feature of RCS is that when a file is modified even a tiny bit, lots of data (even several times the size of the file itself) ends up getting written to disk. We hadn’t been sure how much I/O activity to expect in general. But it was clear that RCS was making it needlessly more intense.

Could I/O activity really hang up the whole Linux kernel? Maybe there’s some mysterious global lock. Maybe the disk subsystem freezes because it doesn’t flush filled buffers quickly enough. Maybe the kernel is busy remapping pages to try to make bigger chunks of memory available. But whatever might be going on, the obvious thing was just to try taking out RCS, and seeing what happened.

And so we did that, and lo and behold, the horrible slowdowns immediately went away!

So, after a week of intense debugging, we had a solution to our problem. And repeating my original experiment, everything now ran cleanly, with API times completely dominated by network transmission to the test cluster:

Clean run times! Compare this to the In[3] image above.

 

The Wolfram Language and the Cloud

What did I learn from all this? First, it reinforced my impression that the cloud is the most difficult—even hostile—development and debugging environment that I’ve seen in all my years in software. But second, it made me realize how valuable the Wolfram Language is as a kind of metasystem, for analyzing, visualizing and organizing what’s going on inside complex infrastructure like the cloud.

When it comes to debugging, I myself have been rather spoiled for years—because I do essentially all my programming in the Wolfram Language, where debugging is particularly easy, and it’s rare for a bug to take me more than a few minutes to find. Why is debugging so easy in the Wolfram Language? I think, first and foremost, it’s because the code tends to be short and readable. One also typically writes it in notebooks, where one can test out, and document, each piece of a program as one builds it up. Also critical is that the Wolfram Language is symbolic, so one can always pull out any piece of a program, and it will run on its own.
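A toy illustration of that last point, constructed for this summary:

prog = Hold[ListLinePlot[Table[RandomReal[], {50}]]];
piece = Extract[prog, {1, 1}, Hold] (* pull out the Table piece, still unevaluated *)
ReleaseHold[piece] (* ...and it runs perfectly well on its own *)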

Debugging at lower levels of the software stack is a very different experience. It’s much more like medical diagnosis, where one’s also dealing with a complex multicomponent system, and trying to figure out what’s going on from a few measurements or experiments. (I guess our versioning problem might be the analog of some horrible defect in DNA replication.)

My whole adventure in the cloud also very much emphasizes the value we’re adding with the Wolfram Cloud. Because part of what the Wolfram Cloud is all about is insulating people from the messy issues of cloud infrastructure, and letting them instead implement and deploy whatever they want directly in the Wolfram Language.

Of course, to make that possible, we ourselves have needed to build all the automated infrastructure. And now, thanks to this little adventure in “scientific debugging”, we’re one step closer to finishing that. And indeed, as of today, the Wolfram Cloud has its APIs consistently running without any mysterious quantized slowdowns—and is rapidly approaching the point when it can move out of beta and into full production.


To comment, please visit the copy of this post at the Stephen Wolfram Blog »

Wolfram|Alpha Personal Analytics for Facebook: Last Chance to Analyze Your Friend Network
By Alan Joyce, April 14, 2015

Wolfram|Alpha’s Facebook analytics ranks high among our all-time most popular features. By now, millions of people have used Wolfram|Alpha to analyze their own activity and generate detailed analyses of their Facebook friend networks. A few years ago, we took data generously contributed by thousands of “data donors” and used the Wolfram Language’s powerful tools for social network analysis, machine learning, and data visualization to uncover fascinating insights into the demographics and interests of Facebook users.

At the end of this month, however, Facebook will be deprecating the API we relied on to extract much of this information.

Personal analytics for Facebook

You’ll still be able to generate an analysis of most of your own activity on Facebook, but you won’t have access to any information about your friends (except their names) unless they’ve also authorized our Facebook app. So in most cases, we won’t have enough data to generate a meaningful friend network graph, or to compute statistics about location, age, marital status, or other personal characteristics of your group of Facebook friends.

We completely support Facebook’s decision to increase the default security of users’ data, even though it will dramatically shorten reports for many people. So if you haven’t run your report in a while, or if you haven’t yet discovered what your report can tell you about yourself, we strongly suggest you ask Wolfram|Alpha to compute your Facebook personal analytics soon, while its full functionality is still available. And by all means, encourage friends and family who haven’t viewed their own Facebook analytics to do so—it’ll make everyone’s reports richer and more detailed.

New in the Wolfram Language: GrammarRules
By Jeremy Michelson, April 10, 2015

The Wolfram Language provides tools for programmatic handling of free-form input. For example, Interpreter, which was introduced in Version 10.0, converts snippets of text into computable Wolfram Language expressions. In smart form fields, this functionality can automatically translate input like “forty-two” into the Wolfram Language expression 42.

But what does it take to perform more complicated operations or customize responses and actions? For that you need a grammar. The grammar indicates the structure that should be matched and the action that should be taken using information extracted from the match.

A grammar gives you natural language control over your computer so that you can process language snippets to yield functions that perform commands. For example, telling your computer to “open a website” requires mapping snippets like “open” and “a website” to the Open command and the URL of a website.

Once a grammar is constructed, it is uploaded to the Wolfram Cloud where it can be used from anywhere to process natural language inputs.

In the following example, I’ve implemented natural language commands for visiting websites and saving notebooks:

Natural language commands for visiting websites and saving notebooks
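The grammar itself appears above only as an image. As a hedged reconstruction, it might look something like this; the token type and the actions are guesses rather than the code from the image:

grammar = CloudDeploy[GrammarRules[{
   "open" ~~ site : GrammarToken["URL"] :> (Print[site]; SystemOpen[site]),
   "save" :> NotebookSave[EvaluationNotebook[]]}]];
(* then: GrammarApply[grammar, "open the wolfram website"] *)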

With the grammar deployed to the cloud, I can visit the Wolfram company website with the natural language command “open the wolfram website”:

Visit the Wolfram company website with the natural language command open the wolfram website

And I can save the notebook I’m working on with the command “save”:

Save the notebook being worked on with command save

Notice the use of GrammarToken to recognize “the wolfram website” as a URL, in the same fashion as Interpreter. This match is fed to the Print statement and SystemOpen via the grammar action on the right-hand side of the rule.

Many parsers fail in the presence of ambiguity, but the Wolfram Language’s GrammarRules and GrammarApply thrive on it. For example, consider a grammar that helps schedule flights between specified locations:

A grammar that helps schedule flights between specified locations

The grammar parses a natural language request, returning the GeoPosition of the departure and destination locations:

Grammar parses a natural language request, returning the GeoPosition of the departure and destination locations

But not all city names are as unambiguous as “Chicago” and “London.” The Wolfram Language knows of five cities named “Shelbyville” and 28 named “Springfield” in the United States. Yet GrammarApply chooses a Shelbyville and a Springfield that are likely to be most appropriate for me, using the same logic that Wolfram|Alpha does:

GrammarApply chooses the most likely Shelbyville and Springfield

To see or choose from all the choices, use AmbiguityFunction:

Choose from all Springfields and Shelbyvilles using AmbiguityFunction

As you can see, GrammarRules accomplishes a lot with a small amount of code.

See the Interpreter documentation for the rich list of built-in interpreters that your grammars can use to understand natural language expressions for dates, math, currencies, and much more. GrammarRules is supported in Version 10.1 of the Wolfram Language and Mathematica, and is rolling out soon in all other Wolfram products.

New in SystemModeler: FMI Import
By Johan Rhodin, April 8, 2015

An important emerging standard has been rapidly adopted by industry: the Functional Mock-up Interface (FMI). It’s an independent standard allowing model exchange between different tools. We introduced FMI export with Version 4.0 of SystemModeler. Exporting your model as a Functional Mock-up Unit (FMU) serves many purposes. First and foremost, it can be used in other tools and programming languages. It also protects your intellectual property by compiling the model code to a binary, which is useful when exchanging models with customers and collaborators. Now with Version 4.1 of SystemModeler, we are happy to announce that we also support FMI import.

Use subsystems from other tools in FMI import

FMI import allows you to use subsystems from other tools in your modeling and simulation workflow with SystemModeler and the Wolfram Language. Explore imported models within SystemModeler by changing parameters and observing the outcomes. Post-process and visualize simulation data from imported models with the Wolfram Language. With automated reports and cloud features, sharing the results and the insight gained from the simulations is a straightforward task.

Let’s take a look at what importing and using a Functional Mock-up Unit can look like. Drag and drop the unit into SystemModeler and follow the dialogs to import the model into the class browser. This is what it looks like when we import a model exported from Simulink using the FMI Toolbox from Modelon:

Imported model exported from Simulink, with the FMI Toolbox from Modelon

The unit is now available inside SystemModeler, and we can connect it to other components. In this case it’s a model of a cruise control. We connect it to a model of a car to see how well it can control the speed of the car.

Functional Mock-up Unit inside SystemModeler connected to cruise control in a car

Next we can perform a parameter sweep in the Wolfram Language to analyze the speed deviations when trying the controller at different speeds. These plots show speed deviations for a car going downhill:

Speed deviations in car going downhill
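A sweep like that takes only a few lines with the WSMLink` package. Here is a rough sketch; the model and variable names are placeholders, not the ones from this example:

Needs["WSMLink`"]
sims = Table[
  WSMSimulate["CruiseControlledCar", {0, 60},
   WSMParameterValues -> {"referenceSpeed" -> v}], {v, 20, 100, 20}];
WSMPlot[sims, {"speedDeviation"}] (* overlay the deviation at each speed *)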

Another way to view the data is to create a DensityPlot showing the relative speed deviation from the reference speed.

DensityPlot showing the relative speed deviation from the reference speed

The plot shows that the cruise control performs best on flat or slightly uphill grades. This region is colored green, the color for less than 1% speed deviation. If you drive at 40 km/h on a 2-degree incline, you would be in this region. The worst performance can be found in the red region, with more than 5% speed deviation. If you drive at 30 km/h on a 3-degree decline, you would be in this region.

FMI import and export are included in SystemModeler and do not require any extra purchases or special add-ons. With FMI, simulations from modeling and simulation experts can be deployed to a large number of consumers who can use the models in their analysis and design tasks. Look here for an example of how to export a model from SystemModeler and use it in another tool.

Solar Eclipses from Past to Future, Earth to Jupiter
By Vitaliy Kaurov, April 2, 2015

Eclipse splash graphic

You may have heard that on March 20 there was a solar eclipse. Depending on where you are geographically, a solar eclipse may or may not be visible. If it is visible, local media hype the event a little, telling people how and when to observe it, what the weather conditions will be, and other relevant details. If the eclipse is not visible in your area, there is a high chance it will draw very little attention. But people on Wolfram Community come from all around the world, and all—novices and experienced users and developers—take part in these conversations. And it is a pleasure to witness how knowledge of the subject and of Wolfram technologies and data from different parts of the world are shared.

Five discussions arose recently on Wolfram Community that are related to the latest solar eclipse. They are arranged below in the order they appeared on Wolfram Community. The posts roughly reflect on anticipation, observation, and data analysis of the recent eclipse, as well as computations for future and extraterrestrial eclipses.

I will take almost everything here from the Wolfram Community discussions, summarizing important and interesting points, and sometimes changing the code or visuals slightly. For complete details, I encourage you to read the original posts.

First, before the total solar eclipse happened on March 20, 2015, Wolfram’s own Jeff Bryant and Francisco Rodríguez explained how to see where geographically the eclipse would be totally or partially visible. Using GeoEntities, Francisco was also able to highlight in green the countries from which at least the partial solar eclipse would be visible:

Using GeoEntities to see where  geographical visibility is of March 20, 2015 eclipse

Map showing visibility of eclipse using GeoGraphics function
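The input itself is shown above only as an image. A hedged sketch of the kind of code involved, with property and option names from memory (treat them as assumptions):

eclipse = SolarEclipse[DateObject[{2015, 3, 20}], "PartialPhasePolygon",
  EclipseType -> "Total"];
GeoGraphics[{GeoStyling[Opacity[0.4], Green],
  Polygon[GeoEntities[eclipse, "Country"]], (* countries touched by partiality *)
  GeoStyling[Opacity[0.4], Red], eclipse}]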

Jeff Bryant is in the US and Francisco Rodríguez is in Peru, so as you can see above, neither was able to see even the partial solar eclipse. The intense red area shows the visibility of the total eclipse, and the lighter red is the partial eclipse. I consoled them by telling them that quite soon—in the next decade—almost all countries in the world, including the US and Peru, will be able to observe at least a partial phase of a total solar eclipse:

Future global visibility of total and partial solar eclipses

Visual representation of future partial and total solar eclipses

Another great way to visualize chronological events is with a new Wolfram Language function, TimelinePlot. I’ve considered the last few years and the next few years, and have plotted the countries and territories (according to the ISO 3166-1 standard) where a total solar eclipse will be visible, as well as when:

TimelinePlot showing future total solar eclipses

Visual of TimelinePlot future total solar eclipses
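The plotting code appears only as an image, but a minimal TimelinePlot sketch conveys the idea; the events here are a hand-picked subset (the dates are correct, though the full ISO 3166-1 country lists from the post are omitted):

TimelinePlot[<|
  "Faroe Islands/Svalbard" -> DateObject[{2015, 3, 20}],
  "Indonesia" -> DateObject[{2016, 3, 9}],
  "United States" -> DateObject[{2017, 8, 21}],
  "Chile/Argentina" -> {DateObject[{2019, 7, 2}], DateObject[{2020, 12, 14}]}|>]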

The image above shows the incredible powers of computational infographics. You see right away that a spectacular total solar eclipse will span the US from coast to coast on August 21, 2017 (see a related discussion below). You can also see that Argentina and Chile will get lucky, viewing a total eclipse twice in a row. Most subtly and curiously, the recent solar eclipse is unique in the sense that it covered two territories almost completely: the Faroe Islands and Svalbard. This means any inhabitant of these territories could have seen the total eclipse from any geo location, cloudiness permitting. Usually it’s quite the opposite: the observational area of a total eclipse is much smaller than the territory area it spans, and most of the inhabitants would have to travel to observe the total eclipse (fortunately, no visas needed). The behavior of the Solar System is very complex. The NASA data on solar eclipses goes just several thousand years into the past and future, losing precision drastically due to the chaos phenomenon.

At the time of the eclipse, I was in Odesa, Ukraine, which was in the partial eclipse zone. I made a separate post showing my position relative to the eclipse zone and grabbing a few photos of the eclipse. Using the orthographic GeoProjection, it’s easy to show that the total eclipse zone did not really cover any majorly populated places, passing mostly above ocean water. The black line shows the boundary of the partial eclipse visibility, which covered many populated territories:

Using GeoProjection to show my position relative to eclipse zone

Visual showing location related to eclipse zone

The Faroe Islands were in the zone of the total solar eclipse, and above I show the shortest path, or geodesic, between the islands and my location. In a separate post (see further discussion below), Marco Thiel posted a link to mesmerizing footage of the total solar eclipse, shot from an airplane (to avoid any cloudiness) by a BBC crew while flying above the Faroe Islands. Francisco actually showed in a comment how to compute the distance from Odesa to the partial eclipse border:

Using GeoDistance to compute distance from Odesa to partial eclipse border

My photos, shot with a toy camera, were of course nothing like the BBC footage. Dense cloud coverage above Ukraine permitted only a few glimpses of the chipped-off Sun. Most images were very foggy, but ImageAdjust did a perfect job of removing the pall. A sample unedited photo is available for download in my Wolfram Community post:

Using ImageAdjust on solar eclipse photos

Solar eclipse images filtered with ImageAdjust

By the way, can you guess why you see the candy below? As I said in my post, the kids in my neighborhood in Ukraine observed the eclipse through the wrapper of this and other similar types of Ukrainian candy. The candy is cheap, and the wrap is opaque enough to keep eyes safe when the Sun brightens in the patches between the clouds. Do you remember using floppy disks? It was typical in the past to look at the Sun through the film of a floppy disk.

Candy wrappers used to see solar eclipses through

And this is where the conversation got picked up by our users. Sander Huisman, a physicist from the University of Twente in the Netherlands, asked a great question: “Wouldn’t it be cool if you could find your location just from the photos? We can calculate the coverage of the Sun for each of your photos, and inside the photo we can also find the time when it was taken. Using those two pieces of information, we should be able to identify the location of your photo, right?” I did not know how to go about such calculations, but Marco Thiel, an applied mathematician from the University of Aberdeen, UK, posted another discussion, Aftermath of the solar eclipse. Marco and Henrik Schachner, a physicist from the Radiation Therapy Center in Weilheim, Germany, tried to at least estimate the percentage of Sun coverage using image processing and computational geometry functionality. This is the first part of the problem. If you have an idea of how to solve the second part, finding a location from a photo timestamp and percentage of the Sun cover, please join the discussion and post on Wolfram Community. Marco and Henrik used photos from Aberdeen, which was very close to the total eclipse zone.

Estimating percentage of Sun coverage using image processing and computational geometry functionality
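Their actual code is in the linked discussion. As a rough sketch of the idea, one might binarize the bright crescent and compare its pixel area to the area of the smallest disk enclosing it; this crude estimate only works while the uncovered crescent still spans the full solar diameter, and the file name is hypothetical:

img = Import["eclipse.jpg"];
crescent = Binarize[ColorConvert[img, "Grayscale"]];
visibleArea = Total[ImageData[crescent], 2]; (* count of bright pixels *)
diskArea = Area[BoundingRegion[ImageValuePositions[crescent, 1], "MinDisk"]];
coverage = 1. - visibleArea/diskArea (* fraction of the Sun covered *)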

Even though he was so close, Marco did not have a chance to capture the partial eclipse due to high cloudiness. By irony and luck, the photos he used came from a US student from Cornell University, Tanvi Chheda, who spent a semester abroad at Marco’s university. She grabbed the shots with her iPad, and what wonderful images they are, with the eclipse and birds together. Thank you, Tanvi, for sharing them on Wolfram Community! Here is one:

Image of eclipse from Tanvi Chheda

Well, that’s the turbulent nature of Wolfram Community—something interesting is always happening, and happening quite fast. I’ll summarize the main subject of Marco’s post in a moment (see the original Community post for more images and eclipse coverage estimation), but as Marco wrote: “Even before today’s eclipse, there were reports warning that Europe might face large-scale blackouts because the power grids would be strained by a lack of solar power. This is why I decided to use Mathematica to analyze some data about the effects on the power grid in the UK. I also used data from the Netatmo Weather Station to analyze variations in the temperature in Europe due to the eclipse.”

Marco owns a Netatmo Weather Station, and had written about its usage in an earlier post. He used an API to get data from many stations throughout Europe, and also tapped into the public data from the power grid. One of his interesting findings was a strong correlation between the eclipse period and a sharp rise in hydroelectric power production:

Correlation between eclipse period and hydroelectric power

For more observations, code, data, and analysis, I encourage you to read through the original post. There, Marco also touched on the subject of global warming and the relevance of high-resolution crowd-sourced data. To visualize the diversity of the discussion, I imported the whole text and used the new Wolfram Language function WordCloud:

Using WordCloud to show the diversity of a Community discussion

WordCloud showing diversity of topics in Community post
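A hedged reconstruction of a call using those three tricks (the exact option values in the image may differ, and text stands for the imported discussion):

WordCloud[text,
 WordOrientation -> {{-Pi/4, Pi/4}}, (* trick 1: a range of word angles *)
 ScalingFunctions -> (#^0.7 &), (* trick 2: a simple power law *)
 ColorFunction -> ColorData["DeepSeaColors"],
 Background -> ColorData["DeepSeaColors"][0]] (* trick 3: fade into the background *)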

It’s nice that the Wolfram Language code, as well as the text, is getting parsed, and you can see the most frequently used functions. In the code above, there are three handy tricks. First, the option WordOrientation has diverse settings for words’ directions. Second, the option ScalingFunctions can give the layout a good visual appeal, and the simple power law I’ve chosen is often more flexible than the logarithmic one. The third trick is subtler: choosing the background color to be the “bottom” color of the ColorFunction used. Then not only do the sizes of the words stress their weights, but they also fade into the background.

From the TimelinePlot infographics above, you can see that a total eclipse will span the US from northwest to southeast on August 21, 2017. I made yet another Wolfram Community post showcasing some computations with this eclipse. You should take a look at the original for all the details, but here is an image of all US counties that will be spanned during the total eclipse. Each county is colored according to the history of cloud cover above it from 2000 to 2015. This serves as an estimate for the probability of clear visibility of the eclipse. The colder the colors, the higher the chance of clear skies. That’s very approximate, though, especially taking into account the unreliability of weather stations. GeoEntities is a very nice function that selects only those geographical objects that intersect with the polygon of the total eclipse. Below is quite a cool graphic that I think only the Wolfram Language can build in a few lines of code:

Computing 2017 eclipse path and historical cloud coverage for areas

Map of historical cloud coverage and 2017 solar eclipse path

And now that we’ve looked into the past and the future of the total solar eclipses, is there anything left to ponder? As it turns out, yes—the extraterrestrial solar eclipses! We live in unique times and on a unique planet with the angular diameter of its only Moon and its only Sun pretty much identical. I mentioned above a documentary where a BBC crew shot a video of the total solar eclipse from an airplane above the Faroe Islands. Quoting the show host, Liz Bonnin, right from the airplane: “There is no other planet in the Solar System that experiences the eclipse like this one… even though the Sun is 400 times bigger than the Moon, at this moment in our Solar System’s history, the Moon happens to be 400 times closer to the Earth than the Sun, and so they appear the same size…”

So can we verify that our planet is unique? In a recent Wolfram Community post, Jeff Bryant addressed this question. He made some computations using PlanetData and PlanetaryMoonData to investigate the solar eclipses on other planets. The main goal is to compare the angular diameter of the Sun to the angular diameter of the Moon in question, when observed from the surface of the planet in question. He used the semimajor axis of the Moon’s orbit as an estimate of the Moon’s distance from its host planet. Please see the complete code in the original post. Here I mention the final results. For Earth, we have an almost perfect ratio of 1, meaning that the Moon exactly covers the Sun in a total eclipse:

Angular diameter of the Sun compared to the angular diameter of the Moon on Earth
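In small-angle form, the ratio is just (moon radius/moon distance) divided by (Sun radius/Sun distance). A hedged sketch of the computation, with property names as I recall them (the post’s actual code is linked above):

moonSunRatio[moon_, planet_] := Module[{rm, dm, rs, ds},
  rm = PlanetaryMoonData[moon, "Radius"];
  dm = PlanetaryMoonData[moon, "SemimajorAxis"];
  rs = Quantity[695700., "Kilometers"]; (* solar radius *)
  ds = PlanetData[planet, "DistanceFromSun"];
  N[(rm/dm)/(rs/ds)]]
moonSunRatio["Moon", "Earth"] (* ≈ 1 *)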

Now here is Mars’ data. The largest Moon, Phobos, is only 0.6 the diameter of the Sun viewed from the surface of Mars, so it can’t completely cover the Sun:

Angular diameter on Sun compared to the Moons on Mars

With human missions to Mars becoming more realistic, would you not be curious how a solar eclipse looks over there? Here are some spectacular shots captured by NASA’s Mars rover Curiosity of Phobos, passing right in front of the Sun:

NASA's Mars rover Curiosity of Phobos

NASA/JPL-Caltech/Malin Space Science Systems/Texas A&M Univ.

These are the sharpest images of a solar eclipse ever taken from Mars. As you can see, Phobos covers the Sun only partially (60%, according to our calculations), as seen from the surface of Mars. Such a solar eclipse is called a ring, or annular, type. Jupiter’s data seems more promising:

Angular diameter of the Sun compared to the Moons of Jupiter

Jupiter’s Moon Amalthea is the closest, with a ratio of 0.9; yet even when its orbit allows covering a full 90% of the Sun, the spectacular coronas of Earth eclipses are probably not visible. During a total Earth solar eclipse, the solar corona can be seen by the naked eye:

Amalthea total solar eclipse
Image Courtesy of Luc Viatour

Do you have a few ideas of your own to share or a few questions to ask? Join Wolfram Community—we would love to see your contributions!

Download this post as a Computable Document Format (CDF) file.

Join the Wolfram Student Ambassador Program!
By Danielle Rommel, March 31, 2015

Are you a student and a technology junkie? If so, keep reading! The Wolfram Student Ambassador Program allows exemplary students the opportunity to further their tech career by acting as the face of Wolfram at their universities (plus earn some great swag, opportunities, and prizes).

For this pilot program, we are searching for one representative each from colleges and universities all around North America. We are looking for the top tier of technical talent, the peak of perfection, the coolest of coders. The ideal candidate will have 10–14 hours to dedicate to the program each month. They are already a leader on campus, charismatic and loved by all, and with an undying passion for Wolfram technologies.

A Wolfram Student Ambassador will:

  • teach and inspire others to use Wolfram technology in new and innovative ways
  • host talks/workshops/meet-ups that promote the use of the Wolfram Technology Stack across multiple fields of study
  • collaborate with other student ambassadors on coding projects, events, and a newsletter
  • meet monthly with our Student Ambassador Coordinator to provide and receive feedback and insight on program goals and initiatives
  • build excitement through official Student Ambassador social media accounts, as well as your own

interns from 2014

What do you get?

In addition to building up your resumé, the Wolfram Student Ambassador Program has some great perks:

  • real-world experience that will set you apart as a professional in the technology world
  • lasting connections with fellow student ambassadors and leading innovators in the industry
  • participation in quarterly video conference training with our top developers
  • free access to additional online training as well as new and existing Wolfram technologies
  • opportunities to guest blog, present projects, and publish results
  • admission to our annual Wolfram Technology Conference
  • cool Wolfram swag every month as a Student Ambassador in good standing

Do you have what it takes?

Here’s a checklist—are you:

  • articulate, brilliant, collaborative, daring, engaging, and friendly?
  • familiar with the Wolfram Technology Stack, including Mathematica, Wolfram Programming Cloud, Wolfram|Alpha, SystemModeler, etc.?
  • able to understand the capabilities of each product (while not necessarily needing to know how to write code for everything)?
  • proficient enough to lead explorations and hands-on workshops?
  • enrolled as a full-time student?
  • available 10–14 hours per month?
  • willing and able to learn new technologies?
  • able to thrive on social media such as Facebook, Twitter, Quora, Reddit, LinkedIn, etc.?
  • already active in the campus tech community?

If this is you—apply here.
We can’t wait to meet you!

New in the Wolfram Language: PlotThemes for Gauges
By Tim Shedelbower, March 27, 2015

Array of gauges

The first gauge I remember was a blue wrist watch I received from my parents as a child. Their hope was probably to correct my tardiness, but it proved valuable for more important tasks such as timing bicycle races. Today digital gauges help us analyze a variety of data on smart phones and laptops. Battery level, signal strength, network speed, and temperature are some of the common data elements constantly monitored.

Gauges have been a part of the Wolfram Language for a few years.

Multi-column customizable gauges

PlotTheme is an exciting new addition to gauges that provides instant styling. A theme name is the only item required. The theme automatically applies the necessary options to create a pre-styled gauge.

Using PlotTheme to create a pre-styled gauge
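For example, a single option sets the whole look ("Marketing" is one of the documented theme names):

AngularGauge[32, {0, 60}, PlotTheme -> "Marketing"]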

Here is a sample of the themes for AngularGauge.

themes for gauges using the AngularGauge function

Incorporating gauges into your work is simple. In fact, you might find using multiple gauges in the same notebook a common occurrence. Set the theme for an entire notebook with the global variable $PlotTheme. For example, to set all gauges of a notebook to the “Web” theme, just evaluate $PlotTheme = “Web”. $PlotTheme can also be used for a cluster of gauges within a single cell, as in the following time zone example.

Using $PlotTheme as example to create time zone example
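A sketch of a similar cluster, with arbitrarily chosen cities:

$PlotTheme = "Web";
Row[ClockGauge[TimeZoneConvert[Now, #], PlotLabel -> #] & /@
  {"America/Chicago", "Europe/London", "Asia/Tokyo"}]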

As with time, the weather always seems to have our attention. A weather dashboard is a convenient way of monitoring current weather conditions. Construct the dashboard using WeatherData, which is included in the Wolfram Language. It gives current and historical weather data for all standard weather stations worldwide. AngularGauge will display the wind direction and speed, while VerticalGauge displays the temperature. GeoGraphics is used for the location.

Using WeatherData to create a dashboard displaying weather with gauges in a specific location
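A minimal sketch of such a dashboard; the location and gauge ranges are arbitrary:

city = Entity["City", {"Chicago", "Illinois", "UnitedStates"}];
Row[{
  AngularGauge[QuantityMagnitude[WeatherData[city, "WindSpeed"]], {0, 60},
   GaugeLabels -> "km/h"],
  VerticalGauge[QuantityMagnitude[WeatherData[city, "Temperature"]], {-30, 45},
   GaugeLabels -> "\[Degree]C"],
  GeoGraphics[GeoMarker[city], ImageSize -> Small]}]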

Interested in building your own weather station? Arnoud Buzing explains the details in a previous blog post. Interested in styling your own gauges? I can help. You might be wondering if it’s possible to change a particular aspect of a theme. User options automatically override PlotTheme, so altering a theme component or color is absolutely possible and encouraged. In essence, a theme can be a starting point for creating your own gauge styles.

Creating your own gauge style within PlotTheme

The world is full of constantly changing data, so what better way to visualize the data than with a colorful gauge from the Wolfram Language. PlotTheme handles the task of styling, so implementing a gauge has never been easier. Visit the Gauges documentation page for more information. PlotTheme details can be found in the options section of each gauge documentation page. The gauges are AngularGauge, BulletGauge, ClockGauge, HorizontalGauge, ThermometerGauge, and VerticalGauge.

Version 10.1 of the Wolfram Language is now supported in Mathematica and rolling out in all other Wolfram products.

Reliability Analysis in SystemModeler 4.1
By Jan Brugård, March 25, 2015

Today we are proud to announce the release of Wolfram SystemModeler 4.1. We will present some of the news in blog posts, beginning with this one, in which we will highlight the new reliability functionality.

We will illustrate this with an example, and you can try it out by downloading a trial version of SystemModeler and this example model, and a trial of the Wolfram Hydraulic library.

Most people probably have experiences with things they bought and liked, but that then suddenly failed for some reason. During the last few years we have both experienced this problem, including a complete engine breakdown in Johan’s car (the engine had to be replaced), and Jan’s receiver, which suddenly went completely silent (the receiver had to be sent in for repair and have its network chip replaced).

In both cases it caused problems for the customers (us) as well as for the producer. These are just a couple of examples, and we’re sure you have your own.

amplifier, satelitte, airplane
Consumer electronics, satellite systems, and flight systems all have different reasons for valuing reliability.

In general, a failure might imply warranty costs, like replacing the network chip of the receiver; huge complications in repairing, as for the car engine, or even more for a satellite; or even risk of human life, as with airplanes.

This raises the question of how combining system simulation models with uncertainty quantification can be used to improve system reliability and functionality.

With the addition of system reliability analysis to SystemModeler, the reliabilities of systems can be computed from the reliabilities of the components. Let’s have a look.

Let’s start at the component level with a hydraulic pipe, and compute the probability that the hydraulic pipe fails:

diagram for pipe in normal, restricted, leaking, and blocked operations
Diagram for a pipe with normal operation, restricted operation, leaking operation, and blocked operation.

This is a relatively small and simple component, with three different failure modes: it can leak, it can be blocked, or it can be restricted.

Here’s a system incorporating three pipes in which we can examine the different failure modes:

system incorporating three pipes to examine different failure modes
A model with three pipes, one cylinder, and one pump. Pumping the fluid will lead to the cylinder pushing its rod out and a change of the measured position.

normal simulation vs simulation with blocked pipe
Fault detected! The measured position is not moving at all in the simulation with the blocked pipe.

By looking at what the simulation results should be compared to what they are, we can detect different failures and generate a list of candidates for what the culprit is. This is studied in the area of fault diagnosis and failure detection, which we won’t pursue here. In the remainder of this post, we’ll focus instead on the overall reliabilities of systems like these.

The pipe can be illustrated as a traditional fault tree, where failure in any of the leaf nodes results in system failure:

fault tree for a pipe
Fault tree for a pipe.

In the new Reliability view in SystemModeler, we can specify the lifetime distributions of the individual components:

reliability view in SystemModeler
The Reliability view in SystemModeler, where lifetime distributions are assigned to individual components.

Next we construct the fault tree for the pipe by specifying that a leak, or a restriction, or a blockage will lead to system failure:

fault tree construction specifying a leak will lead to system failure
Reliability view for a component with multiple lifetime distributions inside it. Here the fault tree is specified, by entering the Boolean expression for the configuration.

Now the fault tree is available for analysis in the Wolfram Language:

fault tree analysis in Wolfram Language

The WSMModelReliability function can return a FailureDistribution (when using a fault tree), a ReliabilityDistribution (when using a reliability block diagram), or the lifetime distribution of a component. The traditional way to illustrate the reliability of components or systems is by using the SurvivalFunction, which describes the probability that the system works at time t. For one pipe, it looks like this:

using SurvivalFunction for probability that the system works at time t
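Sketched in code (the exact WSMModelReliability call signature and model name here are assumptions):

Needs["WSMLink`"]
pipeDist = WSMModelReliability["Hydraulics.Pipe"];
Plot[SurvivalFunction[pipeDist, t], {t, 0, 50000},
 AxesLabel -> {"time (hours)", "survival probability"}]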

This distribution behaves as any probability distribution in the Wolfram Language. More than 30 properties can be computed from it, for example, the conditional probability that the pipe will last for longer than 20,000 hours given that it worked at 10,000 hours:

conditional probability that pipe will las longer than 20,000 hours

(The \[Conditioned] sign is the Conditioned operator, and \[Distributed] is the Distributed descriptor. The code above could be read out as: “The probability that a basic pipe still works after 20,000 hours if it worked for the first 10,000 hours”.)
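In plain input form, that computation looks roughly like this, with pipeDist standing for the distribution obtained above:

Probability[x > 20000 \[Conditioned] x > 10000, x \[Distributed] pipeDist]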

Systems are, of course, made up of many pipes. Here is the schematic for the hydraulic power plant of a Cessna aircraft flap system, which incorporates several basic pipe components:

schematic for hydraulic power plant of Cessna aircraft flap system
The hydraulic power plant of a Cessna aircraft flap system, with one tank, two pumps, multiple valves, and fifteen pipes.

SystemModeler automatically detects that the pipes in the power plant have reliability annotations and can compute the reliability of the entire system from them. The first question we’ll ask is how much worse the reliability of the hydraulic power system will be compared to that of an individual pipe:

reliability of one pipe vs hydraulic power system
Comparison of the reliability of one pipe and the hydraulic power system.

We can see that a system with many pipes performs far worse than a single one, which is not completely unexpected. This is an illustration of the “weakest link” phenomenon: failure in one pipe will cause system failure.

If we look at the same components in the flap system of the aircraft, we see a similar story.

Next we put the hydraulic power plant and the flap system together (a total of 75 components). In SystemModeler this is as easy as specifying that we want “hydraulicPower and flaps”.

reliability view for full Cessna aircraft
Reliability view for the full Cessna aircraft. Here the reliability distribution is specified using the two components “hydraulicPower” and “flaps”.

reliability functions diagrams of 2 systems

reliability functions for different parts of the system
The reliability functions for the different parts of the system.

The reliability of the combined system is less than the reliabilities of the flap and hydraulic power subsystems, a property that generalizes to all systems and subsystems when using connections that depend on a single failure.

Finally, let us find out which components are most cost-effective to improve. The Wolfram Language includes nine different importance measure functions, starting from the very basic StructuralImportance and going to more advanced measures. Let’s find out which failure in the basic pipe to improve:

improvement potential for the different failues in a pipe
The improvement potential for the different failures in a pipe.
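Curves like these can be computed with the built-in importance measures. A hedged sketch, assuming pipeDist is the pipe’s fault-tree distribution (as sketched earlier) and the component order matches the fault tree:

Plot[Evaluate[ImprovementImportance[pipeDist, t]], {t, 0, 50000},
 PlotLegends -> {"leak", "restriction", "blockage"}]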

The improvement importance describes how much the system reliability would be increased by replacing a component with a perfect component. The improvement importance is a relative measure, so for a figure to make sense, it has to be put in context with the other components in the system. From the plot it’s clear that figuring out ways to avoid the pipe becoming restricted would improve the reliability for the system the most. We can do the same thing for the full system and compare the flap system to the hydraulic power system:

improvement potential for hydraulic power system
The improvement potential for the hydraulic power system is strictly higher than the flap system’s improvement potential.

From this plot we can learn a couple of things. First, it pays off more to improve the hydraulic power system compared to the flap system throughout the full lifetime of the system. Second, it actually pays off more and more, as the ratio between the power plant and the flaps starts at 1.66 (hard to see in the plot, but easier when comparing the real numbers) and from there is strictly increasing. For example, at time 3,788 h, when the hydraulic power plant has the highest value, the ratio between the two is 2.08, and at time 10,000 h the ratio is 3.38.

Reliability analysis can show you where to concentrate your engineering effort to produce the most reliable products, estimate where failure will happen, and price warranties accordingly.

For more on what’s new in SystemModeler 4 as well as examples, free courses, and fully functional trial software, check out the SystemModeler website.

Further Reading

In a previous blog post, “Modeling Aircraft Flap System Failure Scenarios with SystemModeler,” the impact of an electrical failure was studied, and the blog post “Reliability Mathematics with Mathematica” gives an in-depth look on the reliability analysis functionality in Mathematica. Finally, in the free course “Modeling Safety-Critical Systems,” you can learn how component faults can be modeled and how their effect on system behavior can be simulated.

Download this post as a Computable Document Format (CDF) file, and its accompanying models.

A Rat Race, or a Great Way to Start the Day http://blog.wolfram.com/2015/03/24/a-rat-race-or-a-great-way-to-start-the-day/ http://blog.wolfram.com/2015/03/24/a-rat-race-or-a-great-way-to-start-the-day/#comments Tue, 24 Mar 2015 13:56:39 +0000 Mariusz Jankowski http://blog.internal.wolfram.com/?p=23986 Recently, during a particularly severe patch of winter weather and much too much shoveling of snow off my driveway, I decided, with help from the Wolfram Language, to bring back memories of fairer weather by looking at commuting to work on a bicycle.

This past year, I finally succumbed to the increasingly common practice of recording personal activity data. Over the last few years, I’d noted that my rides had become shorter and easier as the season progressed, so I was mildly interested in verifying this improvement in personal fitness. Using nothing more than a smartphone and a suitable application, I recorded 27 rides between home and work, and then used the Wolfram Language to read, analyze, and visualize the results.

Here is a Google Earth image showing my morning bike route covering a distance of a little under 11 miles, running from east to west.

Morning commute to work

To transfer data from the smartphone, I used GPX (GPS Exchange Format), a file format supported by major GPS device manufacturers and available in many software applications. A typical GPX file includes time-stamped location and elevation data, which the Wolfram Language returns as a list of rules descriptively named Geometry, PointElevation, and PointTimestamp.

This displays a fragment of one of the GPX data files:

GPX data file with time-stamped location and elevation data
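The import step itself is a one-liner; a minimal sketch (with a hypothetical file name) looks like this:

In[1]:= ride = Import["morning-ride.gpx"];

In[2]:= track = "Geometry" /. ride;             (* the recorded track geometry *)
In[3]:= elevations = "PointElevation" /. ride;  (* elevation at each track point *)
In[4]:= times = "PointTimestamp" /. ride;       (* time stamp of each track point *)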

Taking advantage of new geographic data and computation functionality in the Wolfram Language and the available time and position data, I quickly and easily created an animation of the day’s ride (for details of the functions, position, and time data, see the Initializations in the attached CDF). Click the Play button to view the animation.

Creating animation for morning bike ride
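One way such an animation can be sketched (track is assumed here to be the list of GeoPosition points extracted from the imported geometry):

In[1]:= Animate[
          GeoGraphics[{Thick, Blue, GeoPath[track],
            Red, PointSize[Large], Point[track[[i]]]}],
          {i, 1, Length[track], 1}]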

More interestingly, and with just a bit more effort, I next compared all the morning rides of the season in a single animated race, in effect a rat race to work! The season’s leader is shown in yellow, of course!

All morning rides in a race to work

The results of this rat race are as follows:

Results of bike riding rat race

Now for a quick peek at ride times in chronological order. This clearly supports my earlier observation that ride times improved as the season progressed, and as I logged more miles on the road bike.

Ride times in chronological order displayed as bar graph

While the preceding calculations and visualizations are nice, we can do much more. The GPX files contain time-stamped elevation data, which not only allows easy visualization of the usual road profile but, even better, enables detection of the uphill and downhill segments of the route via new signal processing peak-detection functionality.

First, here is the standard road profile:

Standard road profile with visualization

Prior to locating the peaks and valleys, I smooth the elevation data so as to capture only the large-scale local maxima and minima in the signal. To accomplish this, I first use uniform linear resampling to correct for the highly irregular intervals at which the data was captured, then blur it with a GaussianFilter.

Smooth elevation data capturing large-scale maxima and minima only
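Roughly, assuming distances and elevations are the per-point lists recovered from the GPX data (with strictly increasing distances):

In[1]:= profile = Interpolation[Transpose[{distances, elevations}],
          InterpolationOrder -> 1];       (* piecewise-linear model of the profile *)

In[2]:= grid = Range[Min[distances], Max[distances],
          (Max[distances] - Min[distances])/999];
In[3]:= uniform = profile /@ grid;        (* uniform linear resampling *)

In[4]:= smoothed = GaussianFilter[uniform, 15];  (* smoothing radius picked by eye *)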

The smoothing operation removes spurious peaks and valleys in the elevation data. I determine the remaining large-scale peaks using FindPeaks and segment the route into ascending and descending sections.

Determining large-scale peaks with FindPeaks
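In outline, the valleys can be found as peaks of the negated signal:

In[1]:= peaks = FindPeaks[smoothed];
In[2]:= valleys = {#[[1]], -#[[2]]} & /@ FindPeaks[-smoothed];

(* each ascending section runs from a valley to the following peak *)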

This shows the uphill and downhill sections of the morning ride on an elevation plot:

Uphill and downhill sections on elevation plot

For an arguably even more useful visualization, here are the rising and falling segments of the ride on a map:

Rise and fall segments of bike ride on map

The approximate distances of the uphill and downhill sections are readily available from the ascend and descend lists computed earlier. The following result confirms the simple truth that going to work is always harder (i.e., uphill), and therefore less pleasant than the return trip.

Results from uphill downhill going to and from work

I am already looking forward to the next season of riding and more fun in analyzing my data using the Wolfram Language.

Download this post as a Computable Document Format (CDF) file.

New in the Wolfram Language: WikipediaData http://blog.wolfram.com/2015/03/20/new-in-the-wolfram-language-wikipediadata/ http://blog.wolfram.com/2015/03/20/new-in-the-wolfram-language-wikipediadata/#comments Fri, 20 Mar 2015 17:44:28 +0000 Alan Joyce http://blog.internal.wolfram.com/?p=24883 Since the inception of Wolfram|Alpha, Wikipedia has held a special place in its development pipeline. We usually use it not as a primary source for data, but rather as an essential resource for improving our natural language understanding, particularly for mining the common and colloquial ways people refer to entities and concepts in various domains.

We’ve developed a lot of internal tools to help us analyze and extract information from Wikipedia over the years, but now we’ve also added a Wikipedia “integrated service” to the latest version of the Wolfram Language—making it incredibly easy for anyone to incorporate Wikipedia content into Wolfram Language workflows.

You can simply grab the text of an article, of course, and feed it into some of the Wolfram Language’s new functions for text processing and visualization:

text sentence WikipediaData

word cloud WikipediaData
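For instance, something along these lines (the article choice is arbitrary):

In[1]:= text = WikipediaData["Bicycle"];

In[2]:= First[TextSentences[text]]        (* split the article into sentences *)

In[3]:= WordCloud[DeleteStopwords[text]]  (* word cloud of the article text *)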

Or if you don’t have a specific article in mind, you can search by title or content:

WikipediaSearch by content or title
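A quick sketch of both forms, with arbitrary search terms (the "Content" request follows the search-by-content behavior described above):

In[1]:= WikipediaSearch["bicycle"]                      (* search article titles *)

In[2]:= WikipediaSearch["Content" -> "penny-farthing"]  (* search article content *)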

You can even use Wolfram Language entities directly in WikipediaData to, say, get equivalent page titles in any of the dozens of available Wikipedia language versions:

using entities in WikipediaData
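A hedged sketch of what such a call can look like; the "Title" property and the Language option here are assumptions based on the description above, not verified signatures:

In[1]:= WikipediaData[Entity["Country", "France"], "Title", Language -> "Spanish"]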

One of my favorite functions lets you explore the article links pointing out from (or in toward) any given article or category, either as a simple list of titles or as a list of rules that can be used with the Wolfram Language’s powerful functions for graph visualization. In fact, with just a few lines of code, you can create a beautiful and interesting visualization of the shared links between any set of Wikipedia articles:

WikisSharedLinks in a given article or category
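For example (the "LinksRules" property name is an assumption; it should yield article -> linked-article rules that Graph accepts directly as directed edges):

In[1]:= links = WikipediaData["Graph theory", "LinksRules"];

In[2]:= Graph[RandomSample[links, 100]]  (* a random sample, assuming at least 100 links *)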

There’s a lot of useful functionality here, and we’ve really only scratched the surface. Watch for many more integrated services to follow throughout the coming year.

Version 10.1 of the Wolfram Language is now available in Mathematica and is rolling out across all other Wolfram products.
