Moviegoer: Audio Features — Score

Tim Lee
Aug 31, 2020


This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).

A film’s score adds emotional impact to the action onscreen. King Kong (1933) is often credited as one of the first films with a full orchestral score, adding weight to Kong’s rampage. The famous shower scene in Psycho (1960) wouldn’t be nearly as frightening without the violin screeches. And the two notes of the shark’s theme in Jaws (1975) were a big source of suspense for the POV underwater stalking scenes. A film score is a directorial decision, made to influence our emotional response. We can use the librosa library to analyze a score’s tempo, pitch content, and frequency profile.

Tempo and Chromagram

First, we’ll take a look at what we can understand from “Alone in Kyoto”, a song in Lost in Translation (2003). Lost in Translation, a film about loneliness and isolation, benefits from a score with heavy use of shoegaze: slow, ethereal, and introspective. “Alone in Kyoto”, by the electronic group Air, is a slow, minimalist song that accompanies Scarlett Johansson’s character as she wanders through Japanese temples alone, taking in the foreign sights and customs.

We can estimate the tempo: a relatively slow, calm 89 BPM. The chroma features can be extracted to estimate which notes (pitch classes) are present in each time window. The chromagram visualizes this; here, D looks like the most prevalent note.
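To make the idea of a chroma feature concrete, here is a minimal, dependency-free sketch of how spectral energy can be folded into 12 pitch classes. It is not librosa's implementation (in practice chroma_stft() does this work on real audio); the function names, the C3 to B5 note range, and the synthetic D4 test tone are all assumptions made for illustration.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_hz(midi_note):
    """Convert a MIDI note number to frequency (A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def chroma_vector(samples, sample_rate, low_midi=48, high_midi=83):
    """Fold the energy at each semitone from C3 to B5 into 12 pitch classes."""
    chroma = [0.0] * 12
    for midi_note in range(low_midi, high_midi + 1):
        omega = 2.0 * math.pi * midi_to_hz(midi_note) / sample_rate
        # Correlate the signal with a sinusoid at this semitone's frequency.
        real = sum(x * math.cos(omega * n) for n, x in enumerate(samples))
        imag = sum(x * math.sin(omega * n) for n, x in enumerate(samples))
        # MIDI note numbers modulo 12 give the pitch class (0 = C, 2 = D, ...).
        chroma[midi_note % 12] += real * real + imag * imag
    peak = max(chroma) or 1.0
    # Normalize so the strongest pitch class has intensity 1.0.
    return [value / peak for value in chroma]


# Synthetic stand-in for a film clip: a quarter second of D4 (about 293.66 Hz).
SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * midi_to_hz(62) * n / SAMPLE_RATE) for n in range(2000)]

chroma = chroma_vector(tone, SAMPLE_RATE)
dominant = NOTE_NAMES[chroma.index(max(chroma))]
print(dominant)
```

A chromagram is this same 12-value vector computed for each short time window and stacked side by side, so a consistently bright row marks a prevalent note.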

Scales and Chords — Major and Minor

At its simplest, score music is happy or sad. As a broad rule of thumb, music composed in major scales sounds happy, and music composed in minor scales sounds sad. Major and minor scales each use seven of the 12 pitch classes. librosa’s chroma_stft() makes a best effort to group audio data into the 12 pitch classes, giving an intensity for each one in every time window; averaging over time gives the mean intensity of all 12. By looking at the top seven, we may be able to map them to major and minor scales.
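One way to sketch the top-seven mapping, assuming we already have the 12 mean intensities as a plain list: rank the pitch classes, keep the strongest seven, and score every major and natural minor scale by how many of its notes fall in that set. The mean_chroma values below are hypothetical, and the helper names are illustrative rather than part of librosa.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets from the root for each scale type.
SCALE_INTERVALS = {
    "major": (0, 2, 4, 5, 7, 9, 11),
    "minor": (0, 2, 3, 5, 7, 8, 10),  # natural minor
}


def top_pitch_classes(mean_chroma, count=7):
    """Return the indices of the `count` strongest pitch classes."""
    ranked = sorted(range(12), key=lambda i: mean_chroma[i], reverse=True)
    return set(ranked[:count])


def matching_scales(mean_chroma):
    """List every (root, scale type) whose notes best overlap the top seven."""
    top = top_pitch_classes(mean_chroma)
    scores = {}
    for root in range(12):
        for name, intervals in SCALE_INTERVALS.items():
            scale = {(root + step) % 12 for step in intervals}
            scores[(NOTE_NAMES[root], name)] = len(scale & top)
    best = max(scores.values())
    return sorted(key for key, score in scores.items() if score == best)


# Hypothetical mean intensities, strongest on C, D, E, F, G, A, and B.
mean_chroma = [0.9, 0.1, 0.8, 0.1, 0.7, 0.6, 0.1, 0.85, 0.1, 0.75, 0.1, 0.5]
candidates = matching_scales(mean_chroma)
print(candidates)
```

Note that this example returns two candidates, C major and A minor, because a major scale and its relative natural minor contain the same seven pitch classes. The top-seven set alone cannot separate happy from sad; an extra cue, such as which pitch class is strongest or where phrases come to rest, would be needed to pick the tonal center.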

We may also want to identify major and minor chords, with the help of the pychord library. Again, major chords sound happy, and minor chords sad. Chords are made of three or four individual notes that combine to create (what sounds like) a single tone. Here, we can find a diminished triad, a close relative of the minor chord that shares its minor third, for the root note C (Cdim) in a French-inspired scene transition in The Hustle (2019). For each time window, we can look for each pitch class above a certain intensity. In this example, we found three pitch classes in the window, index numbers 0, 3, and 6, corresponding to the notes C, E-flat, and G-flat.
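The thresholding step can be sketched as follows. This is a small hand-rolled stand-in for a chord lookup, not pychord's API: the triad table, the 0.5 threshold, and the chroma values for the window are all assumptions chosen to reproduce the 0, 3, 6 case described above.

```python
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

# Interval patterns (semitones above the root) for the four basic triads.
TRIAD_QUALITIES = {
    (0, 4, 7): "",     # major
    (0, 3, 7): "m",    # minor
    (0, 3, 6): "dim",  # diminished
    (0, 4, 8): "aug",  # augmented
}


def active_pitch_classes(chroma_frame, threshold=0.5):
    """Indices of pitch classes whose intensity exceeds the threshold."""
    return [i for i, value in enumerate(chroma_frame) if value > threshold]


def name_triads(pitch_classes):
    """Try each active pitch class as the root and return matching triad names."""
    names = []
    for root in pitch_classes:
        intervals = tuple(sorted((pc - root) % 12 for pc in pitch_classes))
        if intervals in TRIAD_QUALITIES:
            names.append(NOTE_NAMES[root] + TRIAD_QUALITIES[intervals])
    return names


# Hypothetical chroma values for one time window: C, E-flat, and G-flat are strong.
frame = [0.9, 0.1, 0.0, 0.8, 0.1, 0.0, 0.7, 0.2, 0.0, 0.1, 0.0, 0.1]
active = active_pitch_classes(frame)
chords = name_triads(active)
print(active, chords)
```

Trying every active pitch class as the root means the lookup does not depend on which note happens to be lowest in the mix, which chroma features discard anyway. The threshold would likely need tuning per film, since dialogue and sound effects also add energy to the pitch classes.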

Diegetic vs. Non-Diegetic Music

We’ll also want to differentiate between diegetic and non-diegetic music. Non-diegetic music has been overlaid on top of the film’s soundtrack, with the implication that it isn’t part of the in-movie story. Diegetic music is in-universe, and may be a song on a car radio, or a character singing karaoke.

We may be able to tell the difference by looking at the song’s frequencies, specifically the spectrograms and frequency roll-offs. Below are two spectrograms for the Alanis Morissette hit “You Oughta Know”. One is the audio from Booksmart (2019), where the song is performed at a karaoke party; the other is the album version.

Log spectrogram for the karaoke performance of “You Oughta Know” in “Booksmart”
Log spectrogram for the album version of “You Oughta Know”

We can also compare the frequency roll-offs: the frequency below which a given percentage of the overall spectral energy lies. Typically, when audio is shaped to sound like it’s coming out of speakers onscreen, the high and low frequencies are reduced or cut entirely with high-pass and low-pass filters, leaving a narrower band than the album version. Although further research is required to tell the two apart reliably, we can use these principles as a start.
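As a rough illustration of the roll-off comparison, the sketch below computes a roll-off frequency for a single spectrum using the same kind of definition as librosa's spectral_rolloff(), which defaults to 85 percent. The two spectra are made-up numbers, not measurements from Booksmart or the album track; they only show how cutting the highs and lows pulls the roll-off down.

```python
def spectral_rolloff(frequencies, energies, roll_percent=0.85):
    """Lowest bin frequency at or below which `roll_percent` of the energy lies."""
    total = sum(energies)
    if total == 0:
        return 0.0
    running = 0.0
    for frequency, energy in zip(frequencies, energies):
        running += energy
        if running >= roll_percent * total:
            return frequency
    return frequencies[-1]


# Center frequencies (Hz) of eight coarse bands; hypothetical values.
frequencies = [100, 200, 400, 800, 1600, 3200, 6400, 12800]

# Hypothetical energy per band: a full-range album mix versus the same song
# shaped to sound like it is playing from a speaker inside the scene.
album = [5, 8, 10, 10, 9, 8, 6, 4]
in_scene = [0.5, 3, 10, 10, 8, 2, 0.5, 0]

album_rolloff = spectral_rolloff(frequencies, album)
in_scene_rolloff = spectral_rolloff(frequencies, in_scene)
print(album_rolloff, in_scene_rolloff)
```

On real audio the roll-off is computed per time window, so one plausible starting point is to compare the typical roll-off across a whole scene rather than a single frame.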
