New in the Wolfram Language: Audio
September 23, 2016 — Carlo Giacometti, Kernel Developer, Algorithms R&D
I have always liked listening to music. In high school, I started wondering why music seems to be so universally pleasing, and how it differs from other kinds of sounds and noises. I started learning to play guitar, and later, at the University of Trieste, I learned about acoustics and signal processing. Once I began learning to program, the idea of being able to create and process any sound using a computer was liberating. I didn’t need to buy expensive and esoteric gear; I just needed to write some (or a lot of!) code. There are many programming languages that focus on music and sound, but in them complex operations (such as sampling a number from a special distribution, or simulating a random process) often require a lot of effort. That’s why the audio capabilities in the Wolfram Language are special: the ability to deal with audio objects is combined with all the knowledge and computational power of the Wolfram Language!
First, we needed a brand-new atomic object in the language: the Audio object.
The Audio object is represented by a playable user interface and stores the signal as a collection of sample values, along with some properties such as sample rate.
As an alternative to importing and storing every sample value in memory, an Audio object can reference an external file, in which case all the processing is done by streaming the samples from that local or remote file. This lets us work with long recordings or large collections of audio files without any special handling.
For example, the file size of a two-minute Bach piece is almost 50 MB, uncompressed, while the out-of-core representation of the same file is only a few hundred bytes.
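The difference can be inspected with ByteCount. This is only a sketch: the Bach recording is not available here, so the documentation sample file "ExampleData/rule30.wav" is assumed as a stand-in, and the actual byte counts will differ from the figures quoted in the text.

```wolfram
(* Audio[file] links to the file instead of loading every sample into memory *)
linked = Audio["ExampleData/rule30.wav"];

(* Size of the linked object itself *)
ByteCount[linked]

(* Size of the raw sample values once they are pulled into memory *)
ByteCount[AudioData[linked]]
```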
Audio objects can be created using an explicit list of values.
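As a minimal sketch, one second of a 440 Hz sine tone can be built from a list of sample values. The frequency and the sample rate of 44100 Hz are illustrative choices, not values taken from the original post.

```wolfram
sr = 44100;  (* assumed sample rate in Hz *)

(* One second of a 440 Hz sine wave as machine-precision reals *)
samples = Table[Sin[2. Pi 440 n/sr], {n, 0, sr - 1}];

Audio[samples, SampleRate -> sr]
```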
Many common audio signals, ranging from basic waveforms and noise models to more complex signals, can be created easily and efficiently using the new AudioGenerator function.
The AudioGenerator function also supports pure functions, random processes and TimeSeries as input.
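A few illustrative calls are sketched below. The specific frequencies and durations are arbitrary, and the pure function is assumed to receive time in seconds, so treat the last line as a sketch of the idea.

```wolfram
(* A 440 Hz sine tone lasting one second *)
AudioGenerator[{"Sin", 440}, 1]

(* One second of white noise *)
AudioGenerator["White", 1]

(* A pure function of time: a sine whose frequency rises steadily over two seconds *)
AudioGenerator[Sin[2 Pi 220 #^2] &, 2]
```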
Now that we know what Audio objects are and how to create them, what can we do with them?
The Wolfram Language has a lot of native features for audio processing. For example, sophisticated filters are available with very little effort.
Use LowpassFilter to make a recording less harsh.
WienerFilter can be useful in removing background noise.
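Both filters can be applied directly to an Audio object. In this sketch the sample file, the 1000 Hz cutoff and the filter radius of 25 samples are assumptions chosen for illustration; suitable values depend on the recording.

```wolfram
a = Audio["ExampleData/rule30.wav"];  (* stand-in recording *)

(* Attenuate content above roughly 1 kHz to soften a harsh recording *)
LowpassFilter[a, Quantity[1000, "Hertz"]]

(* Adaptive noise reduction over a neighborhood of 25 samples *)
WienerFilter[a, 25]
```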
A lot of audio-specific functionality has been developed for working with Audio objects, from editing (AudioTrim, AudioPad, AudioNormalize, AudioResample) to visualization (AudioPlot, Spectrogram, Periodogram), special effects (AudioPitchShift, AudioTimeStretch, AudioReverb) and analysis (AudioLocalMeasurements, AudioMeasurements, AudioIntervals).
It is easy to manipulate sample values or perform basic edits, such as trimming.
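For instance, trimming keeps only a chosen time interval. The sketch below uses a generated three-second tone so that the interval is guaranteed to lie inside the signal; the times are arbitrary.

```wolfram
a = AudioGenerator[{"Sin", 440}, 3];

(* Keep only the portion between 1 and 2 seconds *)
trimmed = AudioTrim[a, {1, 2}];

(* Check the length of the result *)
Duration[trimmed]
```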
A fun special effect consists of increasing the pitch of a recording without changing the speed.
And maybe adding an echo to the result.
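These two effects might be chained as follows. The pitch ratio of 1.5, the delay of 0.25 seconds and the feedback of 0.5 are illustrative values, and the sample file is a stand-in for the recording used in the original post.

```wolfram
a = Audio["ExampleData/rule30.wav"];

(* Raise the pitch by a factor of 1.5 while keeping the duration *)
shifted = AudioPitchShift[a, 1.5];

(* Add an echo: 0.25 s delay, each repetition at half the previous level *)
AudioDelay[shifted, 0.25, 0.5]
```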
With a little effort, it is also possible to apply more refined processing. Let’s try to replicate what often happens at the end of commercials: speed up a normal recording without losing words.
We can start by locating the silent intervals.
Delete the silences from the recording.
Finally, speed up the result using AudioTimeStretch.
To make the result sound less dry, we can apply some reverberation using AudioReverb.
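The whole pipeline can be sketched on a synthetic signal made of tones separated by silence. The RMS threshold of 0.01, the stretch factor of 0.8 and the test signal itself are assumptions for illustration; a real voice recording would need its own threshold.

```wolfram
(* Synthetic "recording": three tones separated by half-second silences *)
tone = AudioGenerator[{"Sin", 440}, 1];
gap = AudioGenerator["Silence", 0.5];
a = AudioJoin[{tone, gap, tone, gap, tone}];

(* Locate intervals whose RMS amplitude falls below an assumed threshold *)
silences = AudioIntervals[a, #RMSAmplitude < 0.01 &];

(* Remove those intervals from the recording *)
compact = AudioDelete[a, silences];

(* Shorten the result to 80% of its duration without changing the pitch *)
fast = AudioTimeStretch[compact, 0.8];

(* Add some reverberation so the result sounds less dry *)
AudioReverb[fast]
```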
Much of the processing can be done by using the Wolfram Language’s arithmetic functions; all of them work seamlessly on Audio objects. This is all the code we need for amplitude modulation.
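A sketch of amplitude modulation written with plain arithmetic follows. The 440 Hz carrier and 3 Hz modulator are arbitrary choices; both signals use the same duration and default sample rate so that they can be combined sample by sample.

```wolfram
carrier = AudioGenerator[{"Sin", 440}, 2];
modulator = AudioGenerator[{"Sin", 3}, 2];

(* Rescale the modulator from [-1, 1] to [0, 1] and use it as a gain envelope *)
carrier (1 + modulator)/2
```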
Or you can do a weighted average of a list of recordings.
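A weighted average might look like the sketch below. The two generated signals and the weights are placeholders for a real list of recordings, which would need matching durations and sample rates.

```wolfram
recordings = {
   AudioGenerator[{"Sin", 440}, 2],
   AudioGenerator["Pink", 2]
   };
weights = {0.8, 0.2};  (* assumed weights that sum to 1 *)

(* Scale each recording by its weight, then add the results together *)
Total[weights recordings]
```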
Many analysis tasks can be made easier by AudioLocalMeasurements. This function can automatically compute a collection of features from a recording. Say you want to synthesize a sound with the same pitch and amplitude as a recording.
AudioLocalMeasurements makes the extraction of the fundamental frequency and the amplitude profile a one-liner.
Using these two measurements, one can reconstruct the pitch and amplitude of the original signal using AudioGenerator.
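The extraction and resynthesis could be sketched as follows. This is not necessarily the code from the original post: the sample file, the choice of filling undetected pitch values with 0, and the use of a TimeSeries as the frequency parameter of a sine oscillator are all assumptions.

```wolfram
a = Audio["ExampleData/rule30.wav"];  (* stand-in recording *)

(* Fundamental frequency over time; frames with no detected pitch are set to 0 *)
f0 = AudioLocalMeasurements[a, "FundamentalFrequency",
   MissingDataMethod -> {"Constant", 0}];

(* Amplitude profile over time *)
amp = AudioLocalMeasurements[a, "RMSAmplitude"];

(* Sine oscillator that follows the measured pitch, shaped by the measured amplitude *)
dur = Duration[a];
AudioGenerator[{"Sin", f0}, dur] * AudioGenerator[amp, dur]
```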
We get a huge bonus by using the results of AudioLocalMeasurements as an input to any of the advanced capabilities the Wolfram Language has in many different fields.
Potential applications include machine learning tasks like classifying a collection of recordings.
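As a toy sketch of such a task, the code below summarizes each recording by its mean MFCC vector and trains a classifier to tell tones from noise. The generated training data, the class names and the helper `features` are hypothetical and only illustrate the workflow; a real application would use labeled recordings.

```wolfram
(* Hypothetical training data: pure tones versus white noise *)
tones = Table[AudioGenerator[{"Sin", f}, 1], {f, {220, 330, 440, 550, 880}}];
noises = Table[AudioGenerator["White", 1], 5];

(* Hypothetical helper: average the per-frame MFCC vectors into one feature vector *)
features[a_] := Mean[AudioLocalMeasurements[a, "MFCC"]["Values"]];

(* Train on class -> examples, then classify an unseen tone *)
classifier = Classify[<|
    "tone" -> features /@ tones,
    "noise" -> features /@ noises
    |>];
classifier[features[AudioGenerator[{"Sin", 660}, 1]]]
```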
And then there’s 3D printing! Produce a 3D-printed version of the waveform of a recording.
Sounds are a big part of everyone’s life, and the Audio framework in the Wolfram Language can be a powerful tool to create and understand them.