
Wolfram Neural Networks Boot Camp Recap: Dog vs. Butterfly Optical Illusion Showdown


Neural networks are increasingly a part of society and are used in many aspects of life, especially e-commerce and social media. I recently had the opportunity to attend the Wolfram Neural Networks Boot Camp with developers and researchers who design and utilize Wolfram Language neural net resources. During the boot camp, participants received a crash course on using neural nets in the Wolfram Language.

Wolfram developer and researcher Markus van Almsick introduced attendees to the structure and features of the Wolfram Language for computer vision and deep-learning image classification. During this session, he explained how to create an optical illusion for a computer. What does that mean? Would an optical illusion that fools a person also fool a computer? Or could it work the other way around?

This blog post is an exploration for beginning users of neural networks and computer vision, built around an example of a computer vision shortcoming. Fooling a neural network in this way is also an introduction to the idea behind generative adversarial networks (GANs). If you’re looking for a complete introduction to neural networks or a refresher, you should read Tuseeta Banerjee’s blog post “Neural Networks: An Introduction.”

How Do Computers See?

Computers see an image as a rank-3 tensor. Each pixel located in an image has three values associated with it that correspond to red, green and blue (RGB). The computer reads these three values and translates them into light we can see and understand. These values, for example, produce the colors on your screen right now.
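To make this concrete, here is a minimal sketch in the Wolfram Language (using a small hypothetical solid-color image, not one from the boot camp) showing that the underlying data of an RGB image is a height × width × 3 array of values:

(* a tiny 2x2 pure-red image, defined directly from its RGB pixel values *)
tinyImage = Image[{{{1., 0., 0.}, {1., 0., 0.}}, {{1., 0., 0.}, {1., 0., 0.}}}];

(* the underlying data is a rank-3 array: rows x columns x color channels *)
Dimensions[ImageData[tinyImage]]
(* -> {2, 2, 3} *)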

Humans excel at pattern recognition. We can pick out patterns everywhere: in faces, shapes in clouds and plenty of other places—including toast. Even though neural networks accomplish similar visual tasks as humans, substantial differences remain.

The center of the human retina provides much higher resolution than our peripheral vision. Computers, in contrast, weight the RGB values of every pixel in an image equally. Furthermore, humans recognize aligned, oriented features with ease, while neural networks are hard to beat at texture recognition. These differences become apparent when investigating optical illusions that humans and machines fall for.

Let’s utilize some example data from the Wolfram Data Repository to demonstrate this.

imgArray = {ExampleData[{"TestImage", "Airplane2"}], 
  ImagePerspectiveTransformation[
   ExampleData[{"TestImage", "Airplane2"}], 
   {{1, 1/3, 0}, {1/3, 1, 0}, {0, 1/2, 1}}, 
   Background -> Transparent, Masking -> All]}

The Wolfram Language’s neural network-powered function ImageIdentify will attempt to identify the original and transformed images.

ImageIdentify /@ imgArray

This might seem to settle the question, but deforming an image is easy, and a strong deformation can make the image difficult for even a human to identify. Why not try to create an image that reliably fools a neural network without distorting it so drastically?

Dog vs. Butterfly

During the Neural Networks Boot Camp, we were walked through a fun and relatively simple exploration of a trained neural network currently available to Wolfram Language users.

inceptNet = 
 NetModel["Inception V1 Trained on ImageNet Competition Data"]

Inception V1 is a neural network released by Google in 2014. Attendees learned how to access the Wolfram Neural Net Repository, a collection of premade and pretrained networks for various applications that can be found and programmatically loaded using NetModel. This NetChain is a complete model that can make predictions just as we did with ImageIdentify in the previous example, can be trained with new data and can even be restructured (also known as net surgery and transfer learning).

The first step is to define how we want our test image to be classified. We will pull sample images of a dog and a butterfly from the Wolfram Data Repository.

butterfly = 
 ResourceData["Sample Image: Orange Butterfly on a Purple Flower"]

dog = ResourceData["Sample Image: White Dog on a Beach"]

Inception V1 identifies that the first image has a butterfly in it.

inceptNet[butterfly]

The Inception V1 network assigns a probability to every class the image could represent, and we can pull out what it believes are the most probable identifications.

inceptNet[dog, {"TopProbabilities", 3}]

Following are the probabilities that the dog and butterfly images are either a dog (Samoyed) or a butterfly (Monarch). In both cases, the network is very confident that the image is not the mismatched option.

{Entity["Concept", "DanausPlexippus::bfk9c"], 
  Entity["Concept", "Samoyede::rq827"]} /. 
 inceptNet[dog, "Probabilities"]

{Entity["Concept", "DanausPlexippus::bfk9c"], 
  Entity["Concept", "Samoyede::rq827"]} /. 
 inceptNet[butterfly, "Probabilities"]

The goal is to make Inception V1 believe with high confidence that the image of the butterfly is a dog, specifically a Samoyed. This is done by increasing the probability the network assigns to the target concept. The key is to create, within the butterfly image, the features the neural net looks for when it identifies a dog.

For this purpose, we will implement a new neural network called foolNet. It will contain the butterfly image as an array of weights and modify these weights to change the image classification from “Monarch butterfly” to “Samoyed.”

Step one in building our network is to extract from the Inception V1 network the input image dimensions and number of color channels as well as the NetDecoder with its output classes. These parameters are needed later in the construction of foolNet.

imageDims = NetExtract[inceptNet, {"Input", "ImageSize"}];
imageChannels = NetExtract[inceptNet, {"Input", "ColorChannels"}];
decoder = Normal@NetExtract[inceptNet, "Output"];
decodeDims = decoder["Dimensions"];
foolNetEncoder = NetEncoder[{"Class", decoder["Labels"], "UnitVector"}];
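As a quick sanity check (a hypothetical illustration, not part of the original walkthrough), the encoder turns a class label into a one-hot target vector over all of the network’s output classes:

(* hypothetical check: encode the Samoyed class as a one-hot target vector *)
targetVector = Normal@foolNetEncoder[Entity["Concept", "Samoyede::rq827"]];
{Length[targetVector], Total[targetVector]}
(* one entry per output class; the total is 1 because only the Samoyed position is set *)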

To provoke a misclassification, we attach a CrossEntropyLossLayer to Inception V1. This layer does most of the legwork for foolNet by comparing the resulting 1,000 class probabilities to those of the target dog class.

featureLossLayer = CrossEntropyLossLayer["Probabilities"]
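To see what this layer computes, here is a small hypothetical two-class example (not from the boot camp material) comparing a predicted probability vector against a one-hot target; the result is the cross-entropy loss, i.e. minus the log of the probability assigned to the target class:

(* hypothetical example: the prediction is {0.9, 0.1} and the target is the first class *)
featureLossLayer[<|"Input" -> {0.9, 0.1}, "Target" -> {1., 0.}|>]
(* -> 0.105361, which is -Log[0.9] *)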

foolNet is inherently simple. The image of the butterfly is stored as a tensor of weights in a NetArrayLayer that we prepend as the input to inceptNet.

imgWeights = 
  ImageData[
   ImageResize[butterfly, imageDims, Resampling -> {"OMOMS", 3}], 
   Interleaving -> False];

The new network is made using NetGraph and NetPort.

foolNet[img_Image] := 
 With[{imgWeights = 
    ImageData[ImageResize[img, imageDims, Resampling -> {"OMOMS", 3}], 
     Interleaving -> False]}, 
  NetGraph[<|
    "image" -> NetArrayLayer["Array" -> imgWeights],
    "inceptnet" -> 
     NetReplacePart[
      inceptNet, {"Input" -> Prepend[imageDims, imageChannels], 
       "Output" -> {decodeDims}}],
    "loss" -> featureLossLayer
    |>, {"image" -> "inceptnet", {"inceptnet", NetPort["Target"]} -> 
     "loss" -> NetPort["Loss"]},
   "Target" -> foolNetEncoder]]

Note that foolNet requires no training data. We only provide the target tag for the misclassification, which amounts to a batch size of 1. Furthermore, almost all LearningRateMultipliers are set to None to avoid (re)training inceptNet itself; only the learning rate for the layer holding the butterfly image is set to 1.

training = NetTrain[
   foolNet[butterfly],
   <|"Target" -> {inceptNet[dog]}|>,
   All,
   LearningRateMultipliers -> {"image" -> 1, _ -> None},
   MaxTrainingRounds -> 300, BatchSize -> 1,
   TargetDevice -> "CPU"
   ];

Observe the structure of foolNet: it is still essentially the same network, but what gets updated during training has changed. foolNet minimizes the loss between the classification of the evolving image and the target class by modifying the image itself.
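As a quick check that this optimization actually happened (a hypothetical step, not from the boot camp walkthrough), the NetTrain results object can plot how the loss evolved; it should trend downward as the image weights adapt toward the Samoyed target:

training["LossEvolutionPlot"]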

fooledNet = training["TrainedNet"]

foolNet uses the structure of Inception V1 with the additional loss layer. The result of training is the modified weight tensor stored in the image layer, which can be converted back into an image.

maybedog = 
 ImageResize[
  Image[NetExtract[fooledNet, {"image", "Array"}], 
   Interleaving -> False], ImageDimensions[butterfly], 
  Resampling -> {"OMOMS", 3}]

This image, while blurry, looks like a butterfly, and most humans would identify it as one. Testing in Inception V1, however, produces peculiar results.

inceptNet[maybedog, {"TopProbabilities", 5}]

Now, comparing the probabilities of butterfly or Samoyed identification:

{Entity["Concept", "DanausPlexippus::bfk9c"], 
  Entity["Concept", "Samoyede::rq827"]} /. 
 inceptNet[maybedog, "Probabilities"]

ImageDifference reveals the perturbation that was added to the original butterfly image:

ImageDifference[butterfly, maybedog] // ImageAdjust

Inception V1 is now pretty sure it sees a Samoyed dog, or at least it’s sure the image isn’t a butterfly. None of the overall patterns of the butterfly’s wings have changed, so this is a certified computer optical illusion. Neural networks are fooled with seemingly slight adjustments because they are not human brains; they are constrained by methods that are very simple in comparison. This means a neural network can incorrectly classify a slightly modified image.

Before celebrating our supremacy over neural networks, it’s worth noting that simply blurring the resulting image causes Inception V1 to again correctly identify the butterfly.

Blur[maybedog, 3]

inceptNet[Blur[maybedog, 3]]

As explained at the Neural Networks Boot Camp, this method is network- and image-specific. Given a new network or image, the procedure must be repeated and potentially modified. Fooling or beating a neural network is expanded upon and applied within GANs.

Additional Resources

This article explores just one of many topics covered at the Wolfram Neural Networks Boot Camp. If you’re interested in learning more about the structure of neural networks and using them in the Wolfram Language, take a look at these resources:

Check out the rest of Wolfram U’s courses and tutorials to learn how to use Wolfram technologies in a wide range of fields and applications.

Get recognized for your computational achievements with Wolfram Certifications.

Comments


  1. Thanks for this interesting article. Nevertheless, there is some sort of implicit neural network failure in it. The butterfly is identified as a monarch, but I’m quite sure the image shows a painted lady (Vanessa cardui).
    Best regards, Jürg

    • Hello!

      Very good eye picking this up; this is another challenge when training neural networks. Inception V1 (the neural network used) is trained on ImageNet Competition data and only contains classes for six species of butterfly, which don’t include the Painted Lady. So, Inception V1 could never identify the butterfly correctly without further training. This is less of an error and more of an ever-present challenge to overcome. A neural network can only use classes and data it’s seen before; when given something new that it hasn’t seen before, it will struggle to identify images correctly. We could use Transfer Learning on Inception V1 with sets of Painted Lady images to train it to identify a new class.

      Information on Inception V1 and its training set:
      https://resources.wolframcloud.com/NeuralNetRepository/resources/Inception-V1-Trained-on-ImageNet-Competition-Data
