Computational Video Premieres in Wolfram Language 12.1
Version 12.1 of the Wolfram Language introduces the long-awaited Video object. The Video object is completely (and only) out-of-core; it can link to files in an extensive list of video containers, with almost any codec. Most importantly, it comes bundled with the complete Wolfram Language stacks for image and audio processing, machine learning and neural networks, statistics, visualization and much more. This already makes the Wolfram Language a powerful video computation platform, but there are still more features to explore.
The Video Object
A video file typically has a video and an audio track. Here is a Video object linked to a video file:
✕ Video["ExampleData/Caminandes.mp4"] |
In Version 12.1, the Video object is displayed by default as a small thumbnail and can be played in an external player. Other appearance settings enable in-notebook players; here is the Video object with a basic player:
✕ Video["ExampleData/Caminandes.mp4", Appearance -> "Basic"] |
Now you can inspect the Video object:
Duration[Video["ExampleData/Caminandes.mp4"]]
Information[Video["ExampleData/Caminandes.mp4"]]
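You can also query individual properties directly. This is a minimal sketch using the general Information[expr, "property"] form; the property name "FrameRate" is an assumption here, not shown in the original post:
(* query a single property; "FrameRate" is an assumed property name *)
Information[Video["ExampleData/Caminandes.mp4"], "FrameRate"]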
Most video containers support multiple video, audio and subtitle tracks. Having multiple audio or subtitle tracks in a single file is more common than having more than one video track.
This is an example of a Video object linking to a file with multiple audio and subtitle tracks:
✕ Information[Video["ExampleData/bullfinch.mkv"]] |
Accessing Parts of a Video
There are several parts of a video you may be interested in extracting. Use VideoExtractFrames to extract frames at specific times, and VideoFrameList to sample frames uniformly or randomly across the video:
VideoFrameList[Video["ExampleData/Caminandes.mp4"], 3]
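VideoExtractFrames, mentioned above, pulls frames at explicit times rather than by sampling. A minimal sketch using the same example file (the times are arbitrary picks, not from the original post):
(* extract the frames at a few specific times, given in seconds *)
VideoExtractFrames[Video["ExampleData/Caminandes.mp4"], {1, 10, 20}]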
Use this function to create a thumbnail grid (a group of smaller images that summarizes the whole video):
VideoFrameList[Video["ExampleData/Caminandes.mp4"], 12] // ImageCollage
You can also trim a segment of a video:
VideoTrim[Video["ExampleData/Caminandes.mp4"], {30, 60}]
Or extract only the audio track from a video to analyze it:
Audio[Video["ExampleData/Caminandes.mp4"]]
Spectrogram[%]
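Other audio functions can be applied to the extracted track as well. For instance, here is a sketch that plots the short-time RMS amplitude; the choice of AudioLocalMeasurements and the "RMSAmplitude" property is ours, not from the original post:
(* short-time RMS amplitude of the extracted audio track, returned as a TimeSeries *)
a = Audio[Video["ExampleData/Caminandes.mp4"]];
ListLinePlot[AudioLocalMeasurements[a, "RMSAmplitude"], PlotRange -> All]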
Performing Analysis
In Version 12.1, we have introduced VideoTimeSeries, which applies an arbitrary computation to the frames of a video file, either one frame at a time or to a list of consecutive frames at once. This is a powerful tool, capable of analyses like the ones in the examples below.
Compute the mean color of each frame over time:
VideoTimeSeries[Mean, Video["ExampleData/Caminandes.mp4"]] // ListLinePlot[#, PlotStyle -> {Red, Green, Blue}] &
Count the number of objects (cars, for example) detected in each frame of a video:
✕ v = Video["http://exampledata.wolfram.com/cars.avi"]; |
ts = VideoTimeSeries[Point[ImagePosition[#, Entity["Word", "car"]]] &, v]
Plot the number of objects (again, using cars as an example) detected in each frame:
TimeSeriesMap[Length @@ # &, ts] // ListLinePlot
Highlight the position of all detected objects (cars) on a sample frame:
HighlightImage[VideoExtractFrames[v, 1], {AbsolutePointSize[3], Flatten@Values[ts]}]
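Building on the same time series, one could also pick out the frame with the most detections. The following is an illustrative sketch (the helper names are ours), reusing TimeSeriesMap and VideoExtractFrames from above:
(* count detections per frame, find the time with the most cars and extract that frame *)
counts = TimeSeriesMap[Length @@ # &, ts];
tBusiest = First[counts["Times"][[Ordering[counts["Values"], -1]]]];
VideoExtractFrames[v, tBusiest]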
We can also use the multiframe version of the function to perform any analysis that requires multiple frames.
By comparing consecutive frames of a Pixabay video that cuts between four views, we can find the transition times from one view to the next and then use those times to extract one frame per scene:
✕ v = Video["Musician.mp4"] |
✕ diffs = VideoTimeSeries[ImageDistance @@ # &, v, Quantity[2, "Frames"], Quantity[1, "Frames"]] |
✕ ListLinePlot[diffs, PlotRange -> All] |
✕ times = FindPeaks[diffs, Automatic, Automatic, 150]["Times"] |
✕ VideoExtractFrames[v, Prepend[times, 0]] |
Process a Video
The Wolfram Language already includes a wide variety of image and audio processing functions. VideoFrameMap applies a function to each frame (or to lists of consecutive frames) of a video and writes the resulting frames to a new video file. Let’s use the bullfinch video:
v = Video["ExampleData/bullfinch.mkv"]; VideoFrameList[v, 3]
We can start with a color negation as a simple “Hello, World!” example:
VideoFrameMap[ColorNegate, v] // VideoFrameList[#, 3] &
Or posterize frames to create a cartoonish effect:
f = With[{tmp = ColorQuantize[#, 16, Dithering -> False]}, tmp - EdgeDetect[tmp]] &;
VideoFrameMap[f, v] // VideoFrameList[#, 3] &
Use a neural net to perform semantic segmentation on the previously used video of cars:
v = Video["http://exampledata.wolfram.com/cars.avi"];
(* semantic segmentation of a single frame with a net resized to the input image dimensions *)
segment[img_] := Block[{net, encData, dec, mean, var},
  net = NetModel["Dilated ResNet-38 Trained on Cityscapes Data"];
  encData = Normal@NetExtract[net, "input_0"];
  dec = NetExtract[net, "Output"];
  {mean, var} = Lookup[encData, {"MeanImage", "VarianceImage"}];
  (* replace the net's encoder so it accepts images of this size, then colorize the class map *)
  Colorize@NetReplacePart[net,
     {"input_0" -> NetEncoder[{"Image", ImageDimensions@img,
         "MeanImage" -> mean, "VarianceImage" -> var}],
      "Output" -> dec}][img]]
VideoFrameList[VideoFrameMap[segment, v], 3]
Next is a video stabilization example, a vastly simplified version of a Version 12.0 product example. The input video is another pick from Pixabay:
v = Video["soap_bubble.mp4"]
Here is a mask over the ground, used to make sure that the motion of the soap bubble itself does not affect the stabilization algorithm:
mask = CloudGet["https://wolfr.am/Mt580rl0"];
Next is a routine that finds corresponding points and a rigid geometric transformation between every pair of consecutive frames; each new transformation is composed with the previous ones, so every frame is stabilized all the way back to the initial frame:
f = Identity;
VideoFrameMap[
  Module[{tmp},
    (* rigid transform aligning the current frame to the previous one *)
    tmp = Last@FindGeometricTransform[##, TransformationClass -> "Rigid"] & @@
      ImageCorrespondingPoints[Sequence @@ #,
        MaxFeatures -> 25, Method -> "ORB", Masking -> mask];
    (* compose with all previous transforms to stabilize back to the first frame *)
    f = Composition[tmp, f];
    ImagePerspectiveTransformation[#[[2]], f, DataRange -> Full, Padding -> "Fixed"]] &,
  v, Quantity[2, "Frames"], Quantity[1, "Frames"]];
From Manipulate to Video
Let’s switch topics to video generation. Manipulate has been a core way of creating animations in the Wolfram Language for over a decade. In Version 12.1, Manipulate expressions can easily be converted to video.
This is a Manipulate from the Wolfram Demonstrations Project:
m = ResourceData["Demonstrations Project: Day and Night World Clock"]
And a video generated from it:
Video[m]
A video can also be generated from a Manipulate and a Sound or Audio object:
✕ Export["file.mp4", {"Animation" -> m, "Audio" -> ExampleData[{"Audio", "PianoScale"}]}, "Rules"] // Video |
A Short Note about Supported Codecs
By default, the Wolfram Language uses the operating system's facilities as well as a limited bundled version of FFmpeg to decode and encode a large number of multimedia containers and codecs. $VideoEncoders, $VideoDecoders, $AudioEncoders and related symbols list the supported encoders and decoders.
Codec support can be expanded even further by installing FFmpeg (version 4.0.0 or higher). Here are the number of decoders available for each format and the list of MP4 video decoders, on macOS with FFmpeg installed:
Length /@ $VideoDecoders
$VideoDecoders["MP4"][[All, 1]]
More to Come
Video computation in the Wolfram Language is only in its early stages. The new capabilities featured here are just part of an already powerful collection of video basics, and we are actively designing and developing updates to existing functions and additional capabilities for future versions, with machine learning and neural net integration at the top of the list. Let us know what you think in the comments; bugs, suggestions and feature requests are always welcome.
Get full access to the latest Wolfram Language functionality with a Mathematica 12.1 or Wolfram|One trial.