Walking the Dog: Neural Nets, Image Identification and Geolocation
It’s National Pet Day on April 11, the day we celebrate furry, feathered or otherwise nonhuman companions. To commemorate the date, we thought we’d use some new features in the Wolfram Language to map a dog walk using pictures taken with a smartphone along the way. After that, we’ll use some neural net functions to identify the content in the photos. One of the great things about Wolfram Language 11.1 is its collection of pre-trained neural nets, including Inception V3 trained on ImageNet Competition data and Inception V1 trained on Places365 data, among others, which makes it super easy for a novice programmer to put them to work. These two pre-trained neural nets make it easy to: 1) identify objects in images; and 2) tell a user what sort of landscape an image represents.
The Wolfram Documentation Center also makes this especially easy.
First, we need to talk a little bit about the metadata stored in digital photographs. When you snap a photo on your smartphone or digital camera, all sorts of data are saved with the image, including the location where the picture was taken. The Exchangeable image file format (Exif), a standard first released in 1995, specifies how this metadata is organized. For our purposes, we’re interested in geolocation so we can make a map of our dog walk.
To demonstrate how image metadata works, let’s start with a picture of my cats, Chairman Meow and Detective Biscuits, and see how the Wolfram Language can extract where the picture was taken using GeoPosition.
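Here’s a minimal sketch of that extraction, with "cats.jpg" standing in as a hypothetical file name for the photo:

    (* "cats.jpg" is a stand-in file name; Import reads the Exif metadata
       and the embedded geo coordinates straight from the JPEG *)
    Import["cats.jpg", "Exif"]
    catlocation = Import["cats.jpg", "GeoPosition"]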
Fantastic—we have some coordinates. Now let’s see where on Earth this is on a map using GeoGraphics. We’ve defined those coordinates as "catlocation" above. Now, using coordinates from the image of my cats dutifully keeping the bed warm, we define our map as "catmap".
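Roughly along these lines, with the GeoRange zoom level chosen arbitrarily for illustration:

    (* catlocation is the GeoPosition extracted above; the GeoRange value
       is just an illustrative zoom level *)
    catmap = GeoGraphics[GeoMarker[catlocation], GeoRange -> Quantity[1, "Miles"]]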
Excellent. Now let’s use the zoom tool to show where these coordinates are.
So, yes, this picture was taken in my old neighborhood in Baton Rouge, Louisiana, where I was a grad student before starting work here at Wolfram Research (Geaux Tigers!). Very cool, and good to know that data is stored in my iPhone pictures.
Just for fun, let’s see if Wolfram’s built-in knowledge has any data on my old neighborhood, known as the Garden District, using Ctrl + = followed by input, which allows us to use the Wolfram Language’s free-form input capability.
Fantastic. Let’s get a quick map of Baton Rouge.
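Something like the following, writing out the city entity explicitly rather than through free-form input (the entity spec here stands in for what Ctrl + = returns):

    (* an explicit Entity spec in place of the free-form input result *)
    GeoGraphics[GeoMarker[Entity["City", {"BatonRouge", "Louisiana", "UnitedStates"}]],
     GeoRange -> Quantity[10, "Miles"]]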
And how about a map showing the rough outline of the Garden District?
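One way to sketch that, assuming gardenDistrict is the entity that free-form input handed back and that it exposes a "Polygon" property:

    (* gardenDistrict is assumed to be the entity returned by free-form input *)
    GeoGraphics[gardenDistrict["Polygon"]]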
This provides us with a rough outline of the Garden District in Baton Rouge. There is a ton of built-in socioeconomic data in the Wolfram Language we could look at, but we’ll save that for another blog post.
Since I’m not a dog owner, I asked a coworker if I could join her on a dog walk to snap some pictures to first map the walk using nothing but photos from a smartphone, then use a neural net to identify the content of the photos.
So I can just drag and drop the photos into a notebook and define them, and then their locations, from their metadata.
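A sketch of that step, with placeholder file names standing in for the dragged-in photos:

    (* the file names are placeholders for the dog-walk photos *)
    files = {"walk1.jpg", "walk2.jpg", "walk3.jpg", "walk4.jpg"};
    walkphotos = Import /@ files;
    walklocations = Import[#, "GeoPosition"] & /@ files;
    (* a first pass: just draw the path connecting the photo locations *)
    GeoGraphics[GeoPath[walklocations]]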
OK, that’s pretty good, but let’s add some points, change up some colors and add tooltips to show the images at each stop.
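Roughly like this, reusing walklocations and walkphotos from above (the colors and thumbnail size are arbitrary choices):

    (* Tooltip on each point shows the photo taken at that stop *)
    GeoGraphics[{
      {Blue, Thick, GeoPath[walklocations]},
      {Red, PointSize[Large],
       MapThread[Tooltip[Point[#1], Thumbnail[#2]] &, {walklocations, walkphotos}]}
     }]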
In the Wolfram Notebook (or if you download the CDF), when you hover the mouse over each point, it shows the image that was taken at that location. Very cool.
Next, let’s move on to a new feature in Wolfram Language 11.1, pre-trained neural nets. First, we need to define our net, and we’re going to use Inception V3 trained on ImageNet Competition data, implemented with one impressively simple line of code. Inception V3 is a convolutional neural network architecture for image recognition, trained here on the ImageNet dataset. Sounds complicated, but we can implement it easily and figure out what objects are in the images.
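That one line pulls the model from the Wolfram Neural Net Repository:

    (* fetch the pre-trained Inception V3 classifier *)
    net = NetModel["Inception V3 Trained on ImageNet Competition Data"]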
Now all we have to do is put in an image, and we’ll get an accurate result of what our image is.
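For instance, applied to the first photo from the walk:

    (* apply the net directly to a photo; it returns the most likely class label *)
    net[First[walkphotos]]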
Fantastic. Let’s try another picture and see how sure the neural net is of its determination by asking for the "TopProbabilities" property.
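In code, that looks something like this (goosephoto is a stand-in name for the picture in question):

    (* ask for class probabilities instead of a single label *)
    net[goosephoto, "TopProbabilities"]
    (* or restrict the output to, say, the top 5 classes *)
    net[goosephoto, {"TopProbabilities", 5}]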
So its best guess is a goose at a 0.901 probability. Pretty good. Let’s try another image.
The net is less sure in this case what kind of dog this is—which is exactly the right answer: this dog, Maxie, is a mixed border collie/corgi.
Just for fun, let’s see what it thinks my cats are.
Wow. That’s pretty impressive, since they are indeed tabby cats. And I guess it’s reasonable there’s a 0.0446 probability my cats look like platypuses.
Along our dog walk, we took a picture of a pond. Let’s use a different pre-trained neural net to see if it can tell us what kind of landscape it is. For this, we’ll use Inception V1 trained on Places365 data, again implemented with one line of amazingly simple code. This particular neural net identifies landscapes based on a training set of images taken at various locations.
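Again it’s one line to fetch the model, then one to apply it (pondphoto is a stand-in for the picture of the pond):

    (* the Places365-trained scene classifier from the Neural Net Repository *)
    placesnet = NetModel["Inception V1 Trained on Places365 Data"];
    placesnet[pondphoto, "TopProbabilities"]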
Very neat. Let’s try something else.
OK, sure, a pasture rather than a park. But you can see that it had other things in mind. This kind of reminds me of Stephen Wolfram’s blog post on overcoming artificial stupidity. Since it was written, neural nets (and Wolfram|Alpha) have certainly come a long way.
And let’s see if we can confuse it with a picture of a sculpture.
Not bad.
As you can see, the pre-trained neural nets work really well. If you want to train your own net, Wolfram Language 11.1 makes it painless to do so. So next time you’re out for a walk taking pictures of random objects and want to recreate your walk from images, you can use these new features in the Wolfram Cloud or Wolfram Desktop.
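As a rough sketch of what training your own net can look like (the layer choices, classes and trainingdata below are made up purely for illustration):

    (* a toy classifier: trainingdata is assumed to be a list of image -> label rules *)
    mynet = NetChain[{ConvolutionLayer[16, 3], Ramp, PoolingLayer[2],
        FlattenLayer[], LinearLayer[2], SoftmaxLayer[]},
       "Input" -> NetEncoder[{"Image", {64, 64}}],
       "Output" -> NetDecoder[{"Class", {"cat", "dog"}}]];
    trained = NetTrain[mynet, trainingdata]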
Happy coding, and happy National Pet Day!
Download this post as a Computable Document Format (CDF) file. New to CDF? Get your copy for free with this one-time download.