Computational Video Premieres in Wolfram Language 12.1

Version 12.1 of the Wolfram Language introduces the long-awaited Video object. The Video object is completely (and only) out-of-core: it links to a file rather than loading it into memory, and it supports an extensive list of video containers with almost any codec. Most importantly, it comes with the language's complete stacks for image and audio processing, machine learning and neural nets, statistics and visualization and many more capabilities. This already makes the Wolfram Language a powerful video computation platform, but there are still more features to explore.

The Video Object

A video file typically has a video and an audio track. Here is a Video object linked to a video file:

Video["ExampleData/Caminandes.mp4"]

In Version 12.1, by default, the Video object is displayed as a small thumbnail and can be played in an external player. There are other appearances to enable in-notebook players, like the Video object with a basic player:

Video["ExampleData/Caminandes.mp4", Appearance -> "Basic"]

Now you can inspect the Video object:

Duration[Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, 
  AudioOutputDevice -> Automatic, SoundVolume -> Automatic]]

Information[
 Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, 
  AudioOutputDevice -> Automatic, SoundVolume -> Automatic]]

Most video containers support multiple video, audio and subtitle tracks. Having multiple audio or subtitle tracks in a single file is more common than having more than one video track.

This is an example of a Video object linking to a file with multiple audio and subtitle tracks:

Information[Video["ExampleData/bullfinch.mkv"]]

Accessing Parts of a Video

There are several parts of a video you may be interested in extracting. Use VideoFrameList and VideoExtractFrames to extract specific video frames. You can also use VideoFrameList to sample frames from the video uniformly or randomly:

VideoFrameList[
 Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, 
  AudioOutputDevice -> Automatic, SoundVolume -> Automatic], 3]
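
The other two cases mentioned above, extracting frames at specific times and sampling at random, could look like the following sketch. The times are arbitrary values chosen for illustration, and the {"Random", n} sampling specification is an assumption that should be checked against the VideoFrameList documentation for your version:

```wolfram
(* Same example file as above *)
v = Video["ExampleData/Caminandes.mp4"];

(* Extract the frames at 5, 10 and 15 seconds (illustrative times) *)
VideoExtractFrames[v, {5, 10, 15}]

(* Sample 4 frames at random positions; the {"Random", n} form is an
   assumed specification, not confirmed by this post *)
VideoFrameList[v, {"Random", 4}]
```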

Use this function to create a thumbnail grid (a group of smaller images that summarizes the whole video):

VideoFrameList[
  Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, 
   AudioOutputDevice -> Automatic, SoundVolume -> Automatic], 
  12] // ImageCollage

You can also trim a segment of a video:

VideoTrim[
 Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, 
  AudioOutputDevice -> Automatic, SoundVolume -> Automatic], {30, 60}]

Or extract only the audio track from a video to analyze it:

Audio[Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, 
  AudioOutputDevice -> Automatic, SoundVolume -> Automatic]]

Spectrogram[%]

Performing Analysis

Version 12.1 introduces VideoTimeSeries, which applies a function to the frames of a video, either one frame at a time or a list of frames at once, and returns the results as a time series. The examples below show the kinds of analysis this enables.

Compute the mean color of each frame over time:

VideoTimeSeries[Mean, 
  Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, 
   AudioOutputDevice -> Automatic, SoundVolume -> Automatic]] // 
 ListLinePlot[#, PlotStyle -> {Red, Green, Blue}] &

Count the number of objects (cars, for example) detected in each frame of a video:

v = Video["http://exampledata.wolfram.com/cars.avi"];

ts = VideoTimeSeries[Point[ImagePosition[#, Entity["Word", "car"]]] &,
   v]

Plot the number of objects (again, using cars as an example) detected in each frame:

TimeSeriesMap[Length @@ # &, ts] // ListLinePlot

Highlight the position of all detected objects (cars) on a sample frame:

HighlightImage[
 VideoExtractFrames[v, 1], {AbsolutePointSize[3], Flatten@Values[ts]}]

We can also use the multiframe version of the function to perform any analysis that requires multiple frames.

By computing the difference between consecutive frames of a Pixabay video containing four views, we can find the transition times from one view to another and then use those times to extract one frame per scene:

v = Video["Musician.mp4"]

diffs = VideoTimeSeries[ImageDistance @@ # &, v, 
  Quantity[2, "Frames"], Quantity[1, "Frames"]]

ListLinePlot[diffs, PlotRange -> All]

times = FindPeaks[diffs, Automatic, Automatic, 150]["Times"]

VideoExtractFrames[v, Prepend[times, 0]]

Process a Video

The Wolfram Language already includes a variety of image and audio processing functions. VideoFrameMap applies a function to one frame or a list of frames at a time and writes the results to a new video file. Let’s use the bullfinch video:

v = Video["ExampleData/bullfinch.mkv"];
VideoFrameList[v,3]

We can start with a color negation as a simple “Hello, World!” example:

VideoFrameMap[ColorNegate, v] // VideoFrameList[#, 3] &

Or posterize frames to create a cartoonish effect:

f = With[{tmp = ColorQuantize[#, 16, Dithering -> False]}, 
    tmp - EdgeDetect[tmp]] &;

VideoFrameMap[f, v] // VideoFrameList[#, 3] &

Use a neural net to perform semantic segmentation on the previously used video of cars:

v = Video["http://exampledata.wolfram.com/cars.avi"];

segment[img_] := 
 Block[{net, encData, dec, mean, var, prob}, 
  net = NetModel["Dilated ResNet-38 Trained on Cityscapes Data"];
  encData = Normal@NetExtract[net, "input_0"];
  dec = NetExtract[net, "Output"];
  {mean, var} = Lookup[encData, {"MeanImage", "VarianceImage"}];
  Colorize@
   NetReplacePart[
     net, {"input_0" -> 
       NetEncoder[{"Image", ImageDimensions@img, "MeanImage" -> mean, 
         "VarianceImage" -> var}], "Output" -> dec}][img]]

VideoFrameList[VideoFrameMap[segment, v], 3]

Next is a video stabilization example, a vastly simplified version of a Version 12.0 product example. The input video is another pick from Pixabay:

v = Video["soap_bubble.mp4"]

Here is the mask over the ground to make sure the shaking soap bubble movement does not affect our stabilization algorithm:

mask = CloudGet["https://wolfr.am/Mt580rl0"];

Next is a routine to find correspondence and geometric transformation between every two consecutive frames, iteratively composed with the previous transformation to get a stabilization all the way to the initial frame:

f = Identity;
VideoFrameMap[
  Module[{tmp}, 
    tmp = Last@
        FindGeometricTransform[##, TransformationClass -> "Rigid"] & @@
       ImageCorrespondingPoints[Sequence @@ #, MaxFeatures -> 25, 
        Method -> "ORB", Masking -> mask];
    f = Composition[tmp, f];
    ImagePerspectiveTransformation[#[[2]], f, DataRange -> Full, 
     Padding -> "Fixed"]] &, v, 
  Quantity[2, "Frames"], Quantity[1, "Frames"]];

From Manipulate to Video

Let’s switch to video generation. Manipulate has been a core way of creating animations in the Wolfram Language for over a decade. In Version 12.1, Manipulate expressions can easily be converted to video.

This is a Manipulate from the Wolfram Demonstrations Project:

m = ResourceData["Demonstrations Project: Day and Night World Clock"]

And a video generated from it:

Video[m]

A video can also be generated from a Manipulate and a Sound or Audio object:

Export["file.mp4", {"Animation" -> m, 
   "Audio" -> ExampleData[{"Audio", "PianoScale"}]}, "Rules"] // Video

A Short Note about Supported Codecs

The Wolfram Language by default uses the operating system as well as a limited version of FFmpeg for decoding and encoding a large number of multimedia containers and codecs. $VideoEncoders, $VideoDecoders, $AudioEncoders, etc. list supported encoders and decoders.

Codec support can be expanded even further by installing FFmpeg (Version 4.0.0 or higher). Here are the number of video decoders for each container format and the list of MP4 video decoders on macOS with FFmpeg installed:

Length /@ $VideoDecoders

$VideoDecoders["MP4"][[All, 1]]

More to Come

Video computation in the Wolfram Language is only at its beginning stages. The new capabilities featured here are only part of an already powerful collection of video basics, and we are actively designing and developing updates to existing functions and additional capabilities for future versions, with machine learning and neural net integration at the top of the list. Let us know what you think in the comments—bugs, suggestions and feature requests are always welcome.

