Wolfram Blog

The Wolfram Data Drop Is Live!

March 4, 2015 — Stephen Wolfram

Where should data from the Internet of Things go? We’ve got great technology in the Wolfram Language for interpreting, visualizing, analyzing, querying and otherwise doing interesting things with it. But the question is, how should the data from all those connected devices and everything else actually get to where good things can be done with it? Today we’re launching what I think is a great solution: the Wolfram Data Drop.

Wolfram Data Drop

When I first started thinking about the Data Drop, I viewed it mainly as a convenience—a means to get data from here to there. But now that we’ve built the Data Drop, I’ve realized it’s much more than that. And in fact, it’s a major step in our continuing efforts to integrate computation and the real world.

So what is the Wolfram Data Drop? At a functional level, it’s a universal accumulator of data, set up to get—and organize—data coming from sensors, devices, programs, or for that matter, humans or anything else. It then stores this data in the cloud in a way that makes it completely seamless to compute with.

Data Drop data can come from anywhere

Our goal is to make it incredibly straightforward to get data into the Wolfram Data Drop from anywhere. You can use things like a web API, email, Twitter, web form, Arduino, Raspberry Pi, etc. And we’re going to be progressively adding more and more ways to connect to other hardware and software data collection systems. But wherever the data comes from, the idea is that the Wolfram Data Drop stores it in a standardized way, in a “databin”, with a definite ID.

Here’s an example of how this works. On my desk right now I have this little device:

This device records the humidity, light, pressure, and temperature at my desk, and sends it to a Data Drop databin. The cable is power; the pen is there to show scale.

Every 30 seconds it gets data from the tiny sensors on the far right, and sends the data via wifi and a web API to a Wolfram Data Drop databin, whose unique ID happens to be “3pw3N73Q”. Like all databins, this databin has a homepage on the web: wolfr.am/3pw3N73Q.
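Although the device itself talks to the web API, the same kind of drop can be sketched directly in the Wolfram Language. Here the field names and values are my own illustrative assumptions about what the device sends:

```wolfram
(* A sketch of adding one reading to the databin; the field names
   and values here are illustrative, not the device's actual payload *)
bin = Databin["3pw3N73Q"];
DatabinAdd[bin, <|
  "humidity" -> Quantity[52., "Percent"],
  "light" -> Quantity[110., "Lux"],
  "pressure" -> Quantity[29.9, "InchesOfMercury"],
  "temperature" -> Quantity[22.5, "DegreesCelsius"]|>]
```

Because the values go in as Quantity expressions, they arrive in the databin already carrying their units.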

The homepage is an administrative point of presence that lets you do things like download raw data. But what’s much more interesting is that the databin is fundamentally integrated right into the Wolfram Language. A core concept of the Wolfram Language is that it’s knowledge based—and has lots of knowledge about computation and about the world built in.

For example, the Wolfram Language knows in real time about stock prices and earthquakes and lots more. But now it can also know about things like environmental conditions on my desk—courtesy of the Wolfram Data Drop, and in this case, of the little device shown above.

Here’s how this works. There’s a symbolic object in the Wolfram Language that represents the databin:

Databin representation in the Wolfram Language
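Reproducing that symbolic object in code is just a matter of giving the databin’s ID:

```wolfram
(* The symbolic object representing the databin *)
bin = Databin["3pw3N73Q"]
```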

And one can do operations on it. For instance, here are plots of the time series of data in the databin:

Time series from the databin of condition data from my desk: humidity, light, pressure, and temperature

And here are histograms of the values:

Histograms of the same humidity, light, pressure, and temperature data from my desk

And here’s the raw data presented as a dataset:

Raw data records for each of the four types of desktop atmospheric data I collected to the Data Drop
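The operations above can be sketched as follows. I’m assuming here that the databin exposes "TimeSeries" and "Values" properties keyed by field name; the details of the property interface may differ:

```wolfram
bin = Databin["3pw3N73Q"];

(* Time series plots, assuming an association of TimeSeries per field *)
DateListPlot /@ bin["TimeSeries"]

(* Histograms of the values for each field *)
Histogram /@ bin["Values"]

(* The raw entries as a structured Dataset *)
Dataset[bin]
```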

What’s really nice is that the databin—which could contain data from anywhere—is just part of the language. And we can compute with it just like we would compute with anything else.

So here for example are the minimum and maximum temperatures recorded at my desk:
(for aficionados: MinMax is a new Wolfram Language function)

Minimum and maximum temperatures collected by my desktop device

We can convert those to other units (% stands for the previous result):

Converting the minimum and maximum collected temperatures to Fahrenheit
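As a sketch, assuming the field is called "temperature" and that the databin’s "Values" property yields per-field lists of quantities:

```wolfram
bin = Databin["3pw3N73Q"];

(* Minimum and maximum recorded temperatures *)
MinMax[bin["Values"]["temperature"]]

(* % is the previous result; convert it to Fahrenheit *)
UnitConvert[%, "DegreesFahrenheit"]
```

Because the stored values are Quantity expressions, UnitConvert works on them directly, with no manual unit bookkeeping.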

Let’s pull out the pressure as a function of time. Here it is:

It's easy to examine any individual part of the data—here pressure as a function of time

Of course, the Wolfram Knowledgebase has historical weather data. So in the Wolfram Language we can just ask it the pressure at my current location for the time period covered by the databin—and the result is encouragingly similar:

The official weather data on pressure for my location nicely parallels the pressures recorded at my desk
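The comparison can be sketched like this, assuming the same "TimeSeries" property as above plus a "Timestamps" property for the entry times (an assumption about the databin interface):

```wolfram
bin = Databin["3pw3N73Q"];

(* Pressure at my desk, from the databin *)
DateListPlot[bin["TimeSeries"]["pressure"]]

(* Official station pressure for the same place and period,
   from the Wolfram Knowledgebase *)
{start, end} = MinMax[bin["Timestamps"]];
DateListPlot[WeatherData[Here, "Pressure", {start, end}]]
```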

If we wanted, we could do all sorts of fancy time series analysis, machine learning, modeling, or whatever, with the data. Or we could do elaborate visualizations of it. Or we could set up structured or natural language queries on it.

Here’s an important thing: notice that when we got data from the databin, it came with units attached. That’s an example of a crucial feature of the Wolfram Data Drop: it doesn’t just store raw data, it stores data that has real meaning attached to it, so it can be unambiguously understood wherever it’s going to be used.

We’re using a big piece of technology to do this: our Wolfram Data Framework (WDF). Developed originally in connection with Wolfram|Alpha, it’s our standardized symbolic representation of real-world data. And every databin in the Wolfram Data Drop can use WDF to define a “data semantics signature” that specifies how its data should be interpreted—and also how our automatic importing and natural language understanding system should process new raw data that comes in.
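As a heavily hedged sketch of what setting up such a signature might look like when creating a databin; the "Interpretation" option name and the field interpreters here are my assumptions, not confirmed API:

```wolfram
(* Sketch: create a databin whose incoming raw values are
   automatically interpreted as WDF quantities *)
CreateDatabin[<|
  "Name" -> "Desk conditions",
  "Interpretation" -> {
    "temperature" -> Restricted["Quantity", "DegreesCelsius"],
    "humidity" -> Restricted["Quantity", "Percent"]}|>]
```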

The beauty of all this is that once data is in the Wolfram Data Drop, it becomes both universally interpretable and universally accessible, to the Wolfram Language and to any system that uses the language. So, for example, any public databin in the Wolfram Data Drop can immediately be accessed by Wolfram|Alpha, as well as by the various intelligent assistants that use Wolfram|Alpha. Tell Wolfram|Alpha the name of a databin, and it’ll automatically generate an analysis and a report about the data that’s in it:

The Wolfram|Alpha results for "databin 3pw3N73Q"

Through WDF, the Wolfram Data Drop immediately handles more than 10,000 kinds of units and physical quantities. But the Data Drop isn’t limited to numbers or numerical quantities. You can put anything you want in it. And because the Wolfram Language is symbolic, it can handle it all in a unified way.

The Wolfram Data Drop automatically includes timestamps, and, when it can, geolocations. Both of these have precise canonical representations in WDF. As do chemicals, cities, species, networks, or thousands of other kinds of things. But you can also drop things like images into the Wolfram Data Drop.

Somewhere in our Quality Assurance department there’s a camera on a Raspberry Pi watching two recently acquired corporate fish—and dumping an image every 10 minutes into a databin in the Wolfram Data Drop:

Images are easy to store in Data Drop, and to retrieve

In the Wolfram Language, it’s easy to stack all the images up in a manipulable 3D “fish cube” image:

If this were a Wolfram CDF document, you could simply click and drag to rotate the cube and view it from any angle

Or to process the images to get a heat map of where the fish spend time:

Apparently the fish like the lower right area of the tank
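Both the fish cube and the heat map can be sketched in a few lines, assuming `fishBin` is the (hypothetical) databin holding the camera frames:

```wolfram
(* Hypothetical: fishBin = Databin[...] for the aquarium camera *)
frames = Values[fishBin];

(* Stack the frames into a rotatable 3D "fish cube" *)
Image3D[frames]

(* One crude occupancy heat map: average each frame's deviation
   from the mean image, then colorize the result *)
mean = Mean[frames];
Colorize[ImageAdjust[Mean[ImageDifference[#, mean] & /@ frames]]]
```

The heat-map step is just one way to do it; any background-subtraction scheme would serve.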

We can do all kinds of analysis in the Wolfram Language. But to me the most exciting thing here is how easy it is to get new real-world data into the language, through the Wolfram Data Drop.

Around our company, databins are rapidly proliferating. It’s so easy to create them, and to hook up existing monitoring systems to them. We’ve got databins now for server room HVAC, for weather sensors on the roof of our headquarters building, for breakroom refrigerators, for network ping data, and for the performance of the Data Drop itself. And there are new ones every day.

Lots of personal databins are being created, too. I myself have long been a personal data enthusiast. And in fact, I’ve been collecting personal analytics on myself for more than a quarter of a century. But I can already tell that March 2015 is going to show a historic shift. Because with the Data Drop, it’s become vastly easier to collect data, with the result that the number of streams I’m collecting is jumping up. I’ll be at least a 25-databin human soon… with more to come.

A really important thing is that because everything in the Wolfram Data Drop is stored in WDF, it’s all semantic and canonicalized, with the result that it’s immediately possible to compare or combine data from completely different databins—and do meaningful computations with it.

So long as you’re dealing with fairly modest amounts of data, the basic Wolfram Data Drop is set up to be completely free and open, so that anyone or any device can immediately drop data into it. Official users can enter much larger amounts of data—at a rate that we expect to be able to progressively increase.

Wolfram Data Drop databins can be either public or private. And they can either be open to add to, or require authentication. Anyone can get access to the Wolfram Data Drop in our main Wolfram Cloud. But organizations that get their own Wolfram Private Clouds will also soon be able to have their own private Data Drops, running inside their own infrastructure.

So what’s a typical workflow for using the Wolfram Data Drop? It depends on what you’re doing. And even with a single databin, it’s common in my experience to want more than one workflow.

It’s very convenient to be able to take any databin and immediately compute with it interactively in a Wolfram Language session, exploring the data in it, and building up a notebook about it.

But in many cases one also wants something to be done automatically with a databin. For example, one can set up a scheduled task to create a report from the databin, say to email out. One can also have the report live on the web, hosted in the Wolfram Cloud, perhaps using CloudCDF to let anyone interactively explore the data. One can make it so that a new report is automatically generated whenever someone visits a page, or one can create a dashboard where the report is continuously regenerated.
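A scheduled emailed report might be sketched like this; the recipient, subject, schedule, and the direct plotting of a databin are all illustrative assumptions:

```wolfram
(* Sketch: deploy a cloud task that emails a daily report *)
CloudDeploy[
 ScheduledTask[
  SendMail[<|
    "To" -> "me@example.com",
    "Subject" -> "Daily desk-conditions report",
    "Body" -> DateListPlot[Databin["3pw3N73Q"]]|>],
  "Daily"]]
```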

It’s not limited to the web. Once a report is in the Wolfram Cloud, it immediately becomes accessible on standard mobile or wearable devices. And it’s also accessible on desktop systems.

You don’t have to make a report. Instead, you can just have a Wolfram Language program that watches a databin, then for example sends out alerts—or takes some other action—if whatever combination of conditions you specify occur.
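A minimal watcher along these lines, with an illustrative threshold and address, might look like:

```wolfram
(* Sketch: every 10 minutes, check the newest entry and send an
   alert if the temperature exceeds an illustrative threshold *)
CloudDeploy[
 ScheduledTask[
  With[{latest = Last[Values[Databin["3pw3N73Q"]]]},
   If[latest["temperature"] > Quantity[30, "DegreesCelsius"],
    SendMail[<|
      "To" -> "me@example.com",
      "Subject" -> "Temperature alert",
      "Body" -> TextString[latest["temperature"]]|>]]],
  Quantity[10, "Minutes"]]]
```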

You can make a databin public, so you’re effectively publishing data through it. Or you can make it private, and available only to the originator of the data—or to some third party that you designate. You can make an API that accesses data from a databin in raw or processed form, and you can call it not only from the web, but also from any programming language or system.
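Such an API might be sketched with APIFunction; the parameter name and the "Values" property are again illustrative assumptions:

```wolfram
(* Sketch: a public web API returning the range of any field *)
CloudDeploy[
 APIFunction[{"field" -> "String"},
  MinMax[Databin["3pw3N73Q"]["Values"][#field]] &,
  "JSON"],
 Permissions -> "Public"]
```

Once deployed, the resulting URL can be called from the web or from any language that can make an HTTP request.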

A single databin can have data coming only from one source—or one device—or it can have data from many sources, and act as an aggregation point. There’s always detailed metadata included with each piece of data, so one can tell where it comes from.

For several years, we’ve been quite involved with companies who make connected devices, particularly through our Connected Devices Project. And many times I’ve had a similar conversation: The company will tell me about some wonderful new device they’re making, that measures something very interesting. Then I’ll ask them what’s going to happen with data from the device. And more often than not, they’ll say they’re quite concerned about this, and that they don’t really want to have to hire a team to build out cloud infrastructure and dashboards and apps and so on for them.

Well, part of the reason we created the Wolfram Data Drop is to give such companies a better solution. They deal with getting the data—then they just drop it into the Data Drop, and it goes into our cloud (or their own private version of it), where it’s easy to analyze, visualize, query, and distribute through web pages, apps, APIs, or whatever.

It looks as if a lot of device companies are going to make use of the Wolfram Data Drop. They’ll get their data to it in different ways. Sometimes through web APIs. Sometimes by direct connection to a Wolfram Language system, say on a Raspberry Pi. Sometimes through Arduino or Electric Imp or other hardware platforms compatible with the Data Drop. Sometimes gatewayed through phones or other mobile devices. And sometimes from other clouds where they’re already aggregating data.

We’re not at this point working specifically on the “first yard” problem of getting data out of the device through wires or wifi or Bluetooth or whatever. But we’re setting things up so that with any reasonable solution to that, it’s easy to get the data into the Wolfram Data Drop.

There are different models for people to access data from connected devices. Developers or researchers can come directly to the Wolfram Cloud, through either cloud or desktop versions of the Wolfram Language. Consumer-oriented device companies can choose to set up their own private portals, powered by the Wolfram Cloud, or perhaps by their own Wolfram Private Cloud. Or they can access the Data Drop from a Wolfram mobile app, or their own mobile app. Or from a wearable app.

Sometimes a company may want to aggregate data from many devices—say for a monitoring net, or for a research study. And again their users may want to work directly with the Wolfram Language, or through a portal or app.

When I first thought about the Wolfram Data Drop, I assumed that most of the data dropped into it would come from automated devices. But now that we have the Data Drop, I’ve realized that it’s very useful for dealing with data of human origin too. It’s a great way to aggregate answers—say in a class or a crowdsourcing project—collect feedback, keep diary-type information, do lifelogging, and so on. Once one’s defined a data semantics signature for a databin, the Wolfram Data Drop can automatically generate a form to supply data, which can be deployed on the web or on mobile.

The form can ask for text, or for images, or whatever. And when it’s text, our natural language understanding system can take the input and automatically interpret it as WDF, so it’s immediately standardized.
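A hand-built version of such a form can be sketched with FormFunction; the single "temperature" field is illustrative:

```wolfram
(* Sketch: a deployable web form that drops a human-entered
   reading into the databin *)
CloudDeploy[
 FormFunction[{"temperature" -> "Quantity"},
  DatabinAdd[Databin["3pw3N73Q"],
    <|"temperature" -> #temperature|>] &]]
```

The "Quantity" interpreter is what lets free-form input like “72F” arrive in the databin as a standardized WDF value.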

Now that we’ve got the Wolfram Data Drop, I keep on finding more uses for it—and I can’t believe I lived so long without it. As throughout the Wolfram Language, it’s really a story of automation: the Wolfram Data Drop automates away lots of messiness that’s been associated with collecting and processing actual data from real-world sources.

And the result for me is that it’s suddenly realistic for anyone to collect and analyze all sorts of data themselves, without getting any special systems built. For example, last weekend, I ended up using the Wolfram Data Drop to aggregate performance data on our cloud. Normally this would be a complex and messy task that I wouldn’t even consider doing myself. But with the Data Drop, it took me only minutes to set up—and, as it happens, gave me some really interesting results.

I’m excited about all the things I’m going to be able to do with the Wolfram Data Drop, and I’m looking forward to seeing what other people do with it. Do try out the beta that we launched today, and give us feedback (going into a Data Drop databin of course). I’m hoping it won’t be long before lots of databins are woven into the infrastructure of the world: another step forward in our long-term mission of making the world computable…


20 Comments


Daniel Reeves

So much potential here for the Quantified Self movement!

Posted by Daniel Reeves    March 4, 2015 at 5:12 pm
Sumit Chawla

This sounds interesting. What kinds of data formats are supported here? I tried JSON and CSV files, but cannot get much out of it.

Posted by Sumit Chawla    March 5, 2015 at 1:36 am
    The Wolfram Team

    Thanks for your comment. JSON support is coming soon. At the moment, Data Drop does not support file uploads of data, because it is designed to accept data as individual entries of key/value pairs. However, you can use DatabinUpload as a first step for adding entries in bulk.

    Posted by The Wolfram Team    March 10, 2015 at 11:16 am
Nils Rune Bodsberg

Wow — this is astonishing! So simple, and yet so powerful!

And the WDF data semantics signature, making the data universally interpretable, will work wonders for data sharing.

Posted by Nils Rune Bodsberg    March 5, 2015 at 4:11 am
Vladislav Kaganovskiy

This is very promising; one would probably want to apply data cleaning, statistical and probabilistic processing, and other data-mining tools.

Posted by Vladislav Kaganovskiy    March 5, 2015 at 2:25 pm
Ray Frigo

Very interesting. What is the security like on this? Are you encrypting the data and if so, how, both in storage and in transit? How do you manage access?

With the potential for sensitive information to be stored here, and with the propensity of bad actors (including other individuals, corporations, and governments) to use such information, it would seem that you would want to say a lot more about how this information is protected and controlled.

Posted by Ray Frigo    March 6, 2015 at 2:20 pm
    The Wolfram Team

    Thank you for your comment. We support SSL for transferring data securely with the API. Databins can have private permissions settings, which require authentication with our cloud OAuth system.

    Posted by The Wolfram Team    March 6, 2015 at 3:15 pm
      Ray Frigo

      Are the data bins encrypted and if so at what strength and using what technology?

      Thanks.

      Posted by Ray Frigo    March 6, 2015 at 5:20 pm
        The Wolfram Team

        Thanks for your question. The best security options are to use “https” and to get a private cloud. For more information about Wolfram cloud services, see our website.

        Posted by The Wolfram Team    March 10, 2015 at 1:21 pm
Mark Howard

The device looks to be an ElectricImp-based sensor tail of some sort – can you provide details on this device, or is it a secret?

Posted by Mark Howard    March 8, 2015 at 11:08 am
Lou

Interesting! What could this bring to enterprises, and how? I miss that overall picture. Most data ends up in databases or as semi-structured data in Hadoop clusters. If we adapt to this approach, how can we analyse all the data in all the different databins? Are Hadoop -> databin generators in the making? Or databin -> database connectors (SAP HANA or the Microsoft Analytics Platform)? This would be something for a nice blog post.

Posted by Lou    March 9, 2015 at 5:56 am
    The Wolfram Team

    Thank you for your comment. The big picture of Data Drop is that it lets you store, analyze, and act on your data all with the same tools: the Wolfram Language and the Wolfram Cloud. Data Drop’s first focus is making it easier to get data into the Wolfram Language on a per-entry basis. From there you can pull data from one or more databins into your session for analysis as desired.

    If data is already stored in a database, I’d suggest starting with the database connectivity already in the language: http://reference.wolfram.com/language/guide/DatabaseConnectivity.html

    If you’re interested in uploading larger amounts of existing data to a databin, we have a tool for that as well: http://reference.wolfram.com/language/ref/DatabinUpload.html

    Posted by The Wolfram Team    March 13, 2015 at 3:30 pm
Sebastian Orzel

Mathematics is everywhere, especially various types of data. I think that the Wolfram Data Drop will be extremely helpful for many people. Thank you.

Posted by Sebastian Orzel    March 9, 2015 at 6:32 am
Claudiu

This is simply awesome! I ran into this post last night and spent hours reading about your amazing work. Thank you so much for all you do!

Side question: what is the sensor board connected to the electric imp (if you can share)?

Posted by Claudiu    March 11, 2015 at 4:36 pm
Adrian

Are there plans to offer a private cloud version supporting this data drop priced at a level that is more affordable to small businesses? Starting at $25K is not going to work for the numerous smaller shops who would like to have an on-premise or privately hosted (by arbitrary providers) access to their IoT data.

Posted by Adrian    March 13, 2015 at 8:25 pm
    The Wolfram Team

    Thanks for your comment, this early beta version of Data Drop is a service of the Wolfram Cloud, and requires access to the full Cloud. As the technology matures, we will seek to offer it in a more modular fashion so that customers have more options at different price ranges. We’d love to hear more about your specific needs; please contact us via https://datadrop.wolframcloud.com/contact.html

    Posted by The Wolfram Team    March 19, 2015 at 3:04 pm
John

Nice work! This looks very useful. Please answer the following:

On average, how long before data is available for consumption once uploaded?

How frequently can data upload requests and get requests be made?

How much data can be uploaded and downloaded per second?

How much data can a databin hold?

Are there price tiers for different usage levels?

Posted by John    March 16, 2015 at 9:50 am
Donald Pellegrino, PhD

An alternative approach analogous to the Wolfram Data Drop might be the use of a graph database to store the data and to expose it via a SPARQL endpoint. Graph databases or triple stores, SPARQL, SPARQL Federated Query, and other standardized technologies from the W3C Data Activity (http://www.w3.org/2013/data/) seem to provide the ingredients for an interoperable back-end. Does the vision for Wolfram Data Drop include interoperability with SPARQL Federated Query? Will there be a SPARQL endpoint for the Wolfram Data Drop?

Posted by Donald Pellegrino, PhD    April 7, 2015 at 8:40 am

