New in the Wolfram Language: Audio

I have always liked listening to music. In high school, I started wondering how it is that music seems to be so universally pleasing, and how it differs from other kinds of sounds and noises. I started learning to play guitar, and later, at the University of Trieste, I learned about acoustics and signal processing. Once I began learning to program, the idea of being able to create and process any sound using a computer was liberating. I didn’t need to buy expensive and esoteric gear; I just needed to write some (or a lot of!) code. There are many programming languages that focus on music and sound, but in them complex operations (such as sampling a number from a special distribution, or simulating a random process) often require a lot of effort. That’s why the audio capabilities in the Wolfram Language are special: the ability to deal with audio objects is combined with all the knowledge and computational power of the Wolfram Language!

First, we needed a brand-new atomic object in the language: the Audio object.

a = Import["http://exampledata.wolfram.com/bach.mp3"]

The Audio object is represented by a playable user interface and stores the signal as a collection of sample values, along with some properties such as sample rate.

In addition to importing and storing every sample value in memory, an Audio object can reference an external object, which means that all the processing is done by streaming the samples from a local or remote file. This allows us to deal with big recordings or large collections of audio files without the need for any special attention.

Stored fully in memory, the two-minute Bach piece above takes almost 50 MB, uncompressed.

ByteCount[a]

47960528

The out-of-core representation of the same file is only a few hundred bytes.

afile = Audio["http://exampledata.wolfram.com/bach.mp3"]

ByteCount[afile]

576

Audio objects can be created using an explicit list of values.

f[t_] := Mod[
   t*BitAnd[BitOr[BitShiftRight[t, 12], BitShiftRight[t, 8]], 
     BitAnd[63, BitShiftRight[t, 4]]], 256, -128];
data = Table[f[t], {t, 0, 100000}];
data // Short

Audio[data, "SignedInteger8", SampleRate -> 8000]

Various commonly generated audio signals can be easily and efficiently created using the new AudioGenerator function, ranging from basic waveform and noise models to more complex signals.

Table[Labeled[
  AudioPlot[AudioGenerator[wave, .01], PlotTheme -> "Minimal"], 
  wave], {wave, {"Sin", "Sawtooth", "White"}}]



The AudioGenerator function also supports pure functions, random processes and TimeSeries as input.
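
As a rough sketch of the non-string inputs, the example below builds a tone from a pure function and shapes it with a TimeSeries. It assumes that the slot of the pure function is time in seconds; the names tone, envelope and shaped, and the envelope values, are made up for illustration. Random processes are not shown.

```wolfram
(* A 440 Hz sine written as a pure function; # is assumed to be time in seconds *)
tone = AudioGenerator[Sin[2 Pi 440 #] &, 1];

(* A hypothetical amplitude envelope given as {time, value} pairs *)
envelope = TimeSeries[{{0, 0}, {0.1, 1}, {1, 0}}];

(* Generate the envelope as audio and multiply it into the tone *)
shaped = tone*AudioGenerator[envelope, 1];

AudioPlot[shaped]
```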



Now that we know what Audio objects are and how to create them, what can we do with them?

The Wolfram Language has a lot of native features for audio processing. As an example, we have complex filters at our disposal with very little effort.

Use LowpassFilter to make a recording less harsh.
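
A minimal sketch of the idea, using generated white noise as a stand-in for a harsh recording. The 1000 Hz cutoff is an arbitrary choice, and giving the cutoff as a Quantity in hertz is assumed to be accepted for Audio input.

```wolfram
(* Stand-in for a harsh recording: two seconds of white noise *)
noise = AudioGenerator["White", 2];

(* Attenuate content above roughly 1000 Hz (arbitrary cutoff) *)
soft = LowpassFilter[noise, Quantity[1000, "Hertz"]];

(* Compare the spectrograms before and after filtering *)
Spectrogram /@ {noise, soft}
```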





WienerFilter can be useful in removing background noise.
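
As an illustration, the sketch below adds synthetic noise to a clean tone and then filters it. The noise level and the filter range of 20 samples are assumptions picked for the example, not tuned values.

```wolfram
(* A clean tone plus a small amount of white noise *)
clean = AudioGenerator[{"Sin", 440}, 1];
noisy = clean + 0.1 AudioGenerator["White", 1];

(* Range-20 Wiener filter; the range is an arbitrary choice here *)
denoised = WienerFilter[noisy, 20];

AudioPlot[{noisy, denoised}]
```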





A lot of audio-specific functionality has been developed for working with Audio objects: editing (AudioTrim, AudioPad, AudioNormalize, AudioResample), visualization (AudioPlot, Spectrogram, Periodogram), special effects (AudioPitchShift, AudioTimeStretch, AudioReverb) and analysis (AudioLocalMeasurements, AudioMeasurements, AudioIntervals).

It is easy to manipulate sample values or perform basic edits, such as trimming.
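
For instance, something along these lines trims a signal and then hard-clips its samples. The test tone and the clipping threshold are placeholders chosen for the sketch.

```wolfram
(* Three seconds of a test tone as a stand-in for a recording *)
audio = AudioGenerator[{"Sin", 440}, 3];

(* Keep only the part between 1 and 2 seconds *)
middle = AudioTrim[audio, {1, 2}];

(* Extract the raw samples (one list per channel) and clip them to +/- 0.5 *)
samples = AudioData[middle];
clipped = Audio[Clip[samples, {-0.5, 0.5}],
   SampleRate -> AudioSampleRate[middle]];

AudioPlot[{middle, clipped}]
```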

A fun special effect consists of increasing the pitch of a recording without changing the speed.
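
A small sketch of the effect on a synthetic tone. The ratio 1.5 (a perfect fifth up) is just an example value, and voice is a placeholder name for whatever recording you have at hand.

```wolfram
(* Stand-in for a recording: a 220 Hz sawtooth *)
voice = AudioGenerator[{"Sawtooth", 220}, 2];

(* Raise the pitch by a ratio of 1.5 while keeping the duration *)
higher = AudioPitchShift[voice, 1.5]
```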





And maybe adding an echo to the result.
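
One possible way to get an echo is AudioDelay, sketched here on a short synthetic blip. The delay of 0.3 seconds and the feedback of 0.5 are illustrative values only.

```wolfram
(* A short 880 Hz blip followed by two seconds of silence, so the echoes are audible *)
blip = AudioPad[AudioGenerator[{"Sin", 880}, 0.2], 2];

(* Repeat the signal every 0.3 seconds, keeping half the level at each repetition *)
echoed = AudioDelay[blip, 0.3, 0.5]
```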



With a little effort, it is also possible to apply more refined processing. Let’s try to replicate what often happens at the end of commercials: speed up a normal recording without losing words.

We can start by finding the silent intervals.



Delete the silences from the recording.



Finally, speed up the result using AudioTimeStretch.



To make the result sound less dry, we can apply some reverberation using AudioReverb.
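
The whole sequence can be sketched as follows on a synthetic signal with a silent gap in the middle. The signal, the stretch factor 0.7 and the variable names are assumptions for illustration, and the sketch assumes the "Inaudible" criterion of AudioIntervals is available in your version.

```wolfram
(* Stand-in for a spoken recording: two tones separated by one second of silence *)
speech = AudioJoin[{AudioGenerator[{"Sin", 440}, 1],
    AudioGenerator["Silence", 1],
    AudioGenerator[{"Sin", 660}, 1]}];

(* Find the silent intervals as a list of {start, end} times *)
silences = AudioIntervals[speech, "Inaudible"];

(* Remove those intervals from the recording *)
compact = AudioDelete[speech, silences];

(* A factor below 1 shortens the duration without changing the pitch *)
fast = AudioTimeStretch[compact, 0.7];

(* Add reverberation with the default settings *)
AudioReverb[fast]
```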



Much of the processing can be done by using the Wolfram Language’s arithmetic functions; all of them work seamlessly on Audio objects. This is all the code we need for amplitude modulation.
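
A sketch of what that can look like, with a generated tone standing in for the recording. The 5 Hz modulator frequency is an arbitrary choice.

```wolfram
(* Carrier: an audible 440 Hz tone; modulator: a slow 5 Hz sine *)
carrier = AudioGenerator[{"Sin", 440}, 2];
modulator = AudioGenerator[{"Sin", 5}, 2];

(* Multiplying the two Audio objects modulates the amplitude of the carrier *)
modulated = carrier*modulator;

AudioPlot[modulated]
```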







Or you can do a weighted average of a list of recordings.
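
For example, with three generated tones in place of recordings and made-up weights, a weighted average could be written like this.

```wolfram
(* Three stand-in recordings of equal length *)
audios = {AudioGenerator[{"Sin", 220}, 1],
   AudioGenerator[{"Sin", 330}, 1],
   AudioGenerator[{"Sin", 440}, 1]};

(* Hypothetical weights, one per recording *)
weights = {0.5, 0.3, 0.2};

(* Scale each recording by its weight, sum, and normalize by the total weight *)
mix = Total[MapThread[Times, {weights, audios}]]/Total[weights]
```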








A lot of the analysis tasks can be made easier by AudioLocalMeasurements. This function can automatically compute a collection of features from a recording. Say you want to synthesize a sound with the same pitch and amplitude as a recording.



AudioLocalMeasurements makes the extraction of the fundamental frequency and the amplitude profile a one-liner.

Using these two measurements, one can reconstruct pitch and amplitude of the original signal using AudioGenerator.
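
A rough sketch of that workflow, under several assumptions: a generated sawtooth stands in for the recording, missing pitch estimates are filled by interpolation through the MissingDataMethod option, and AudioGenerator is assumed to accept the measured TimeSeries both as the frequency of a "Sin" oscillator and as a signal of its own. The names rec, f0, amp and resynth are illustrative.

```wolfram
(* Stand-in for a recording *)
rec = AudioGenerator[{"Sawtooth", 220}, 2];

(* Fundamental frequency over time; interpolate frames where no pitch is found *)
f0 = AudioLocalMeasurements[rec, "FundamentalFrequency",
   MissingDataMethod -> {"Interpolation", InterpolationOrder -> 1}];

(* Amplitude profile over time *)
amp = AudioLocalMeasurements[rec, "RMSAmplitude"];

(* A sine that follows the measured pitch, scaled by the measured amplitude *)
resynth = AudioGenerator[{"Sin", f0}, 2]*AudioGenerator[amp, 2]
```

The result should follow the pitch and loudness contour of the input while sounding like a plain sine, since the timbre of the original is not modeled.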



We get a huge bonus by using the results of AudioLocalMeasurements as an input to any of the advanced capabilities the Wolfram Language has in many different fields.

Potential applications include machine learning tasks like classifying a collection of recordings.

And then there’s 3D printing! Produce a 3D-printed version of the waveform of a recording.

You can get an idea of the variety of applications at Wolfram’s Computational Audio page, or by looking at the audio documentation pages and tutorials.

Sounds are a big part of everyone’s life, and the Audio framework in the Wolfram Language can be a powerful tool to create and understand them.

Comments

5 comments

  1. Could you demo going through an audio file and pairing the distinct speakers with the locations of their blurbs?
  2. Carlo

    Thanks for an interesting post. Is it possible to get a copy of this notebook in either .nb or .cdf format? It would be interesting to play around with the examples you have provided.

    Michael
  3. Great article on audio objects! I have a quick question, you can take the weighted averages yes but, can you use AudioLocalMeasurements to weight each audio value differently? Or does it have to be an average of all the sounds.

    Thanks
  4. This is an impressive array of tools. Well done !
    One more would really complete this set – Mathematica needs a robust means of recording Audio. The only tool currently documented (SystemDialogInput["RecordSound"]) is not completely stable, and lacks the programmatic controls to enable precision measurements. With a robust Recording capability, your Audio tool kit could be widely used in physics and engineering.
  5. Nice package. An extra incentive to use Mathematica with numerical audio processing as well.