This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).
Image this prototypical film scene: two characters are driving down the highway, and they erupt into a fierce argument. The driver pulls over and demands the passenger get out. After some resistance, the passenger exits the car, and the driver speeds off. We see a wide shot of the character alone on the side of the road. To add to the loneliness, a truck speeds by, without slowing down.
What sound do you imagine when you hear the truck pass by? Is it this sound, formally known as “Truck By Short Dopple TE047603”? You’ve surely heard it in plenty of movies, TV shows, and video games. It’s instantly recognizable as soon as one hears the first part of sound effect, the truck’s approach.
Sound Effects in the Audio Mix
Audio engineers and sound mixers spend a great time adding sound effects to film. The general maxim is that “if it moves, it needs a sound”. Though this often leads to some improbable sounds (a passing bike always dings its bell), or objects that are normally silent suddenly being loud (a gun being moved suddenly makes all sorts of clicking sounds), the audience would definitely notice if these sounds weren’t in the mix.
This process of adding sound effects to the audio mix is called Foley. There are three main components to Foley, two of which, feet and moves, are related to character movement, while the third, specifics, focus on objects. Many of these sound effects can give valuable clues on the location and context of a scene. Sounds of nature (like running water or various animals) may indicate a scene takes place outdoors. The sounds of crickets usually accompany establishing shots of nighttime scenes. Church bells, school bells, and fire alarm bells are all related, but indicate three separate locations.
Continued development of Moviegoer will allow it to identify these sounds and use these as clues to identify scene location.
First, we would have to be able to identify discrete sound effects in the film’s audio mix, then vectorize them so they can be compared to other sounds. The librosa Python library easily allows us to do this, converting sounds to frequency data through Fourier transforms with the mfcc() function.
With sounds vectorized, we can compare them to known, labeled movie sound effects. This would require some manual “hard-coding” of a lookup dictionary with many different sounds. But think back to the truck sound effect example at the beginning of this post — this specific sound has been used in dozens of movies and TV shows. Often audio engineers will reuse sounds or draw from pre-made sound libraries, rather than create their own from scratch. Indeed, TE047603 was included in a sound effect pack released in the 90’s by The Hollywood Edge, a company that produces stock sound effects.
There are only so many common stock sound effects to process and label, with each of those contributing valuable clues to a scene’s context. Because of the substantial effort of cataloging these sound effects, Moviegoer development (for now) will continue without this implementation, and it will be reserved for future effort.
Audience Recognition and Ironic Usage of Sound Effects
One final word on “Truck By Short Dopple TE047603”. Because this sound is so recognizable, so familiar to audience, it tends to lighten the mood of the scene. In the example at the beginning of this post, the truck sound has the effect of capping off an argumentative scene with a joke, this cliched sound effect — it was deliberately inserted into the audio mix to lighten the mood. (The ubiquitous “Wilhelm scream”, often included as an in-joke in film mixes, has the same effect.) So not only does this sound effect offer substantial clues about the scene’s location and happenings, but its presence provides a hint about the scene’s intended emotional impact.
Wanna see more?