This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).
We can identify the beginnings and ends of scenes, which means we can isolate a scene’s dialogue and analyze it as a single, self-contained conversation. This is most easily done by looking at the subtitles, which are ground-truth transcriptions of the dialogue — no audio recognition required. We’ve previously designed a system to load a film’s .srt subtitle file into pandas dataframes using the Python library pysrt. …
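For reference, that loading step can be as simple as the sketch below; the file name and column layout are illustrative assumptions, not the project’s exact schema.

```python
# A minimal sketch of reading an .srt file into a pandas DataFrame with pysrt.
# The file name and columns are placeholders, not Moviegoer's actual schema.
import pandas as pd
import pysrt

subs = pysrt.open('lost_in_translation.srt')  # placeholder path

subtitle_df = pd.DataFrame([
    {
        'start': sub.start.to_time(),         # when the line appears on screen
        'end': sub.end.to_time(),             # when the line disappears
        'text': sub.text.replace('\n', ' '),  # flatten multi-line subtitles
    }
    for sub in subs
])

print(subtitle_df.head())
```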
This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).
Without any structure, a film is just a collection of a few thousand frames of images and a very long audio track. Conversations bleed into one another, characters appear and disappear without reason, and we teleport from one location to the next. We can begin to organize a film by dividing it into individual scenes. We’ll use Lost in Translation (2003) as a running example.
To start, we’ll just be identifying two-character dialogue scenes. These are the most basic building blocks of films: just two characters speaking together with no distractions, purely advancing the plot with their dialogue. In modern filmmaking, these scenes are usually shot in a particular manner, and we can take advantage of this by looking for specific patterns of shots. …
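To make that idea concrete, here is a simplified sketch of spotting an A/B/A/B shot pattern, assuming each shot has already been assigned a cluster label; the function, labels, and threshold are illustrative, not Moviegoer’s actual code.

```python
# A simplified sketch of finding the A/B/A/B shot pattern typical of
# two-character dialogue scenes, given a sequence of shot-cluster labels.
# The minimum run length is an illustrative assumption.

def find_abab_runs(shot_labels, min_alternations=4):
    """Return (start, end) index pairs where exactly two shot labels alternate."""
    runs = []
    start = 0
    while start < len(shot_labels) - 1:
        a, b = shot_labels[start], shot_labels[start + 1]
        if a == b:
            start += 1
            continue
        end = start + 1
        # extend the run while the sequence keeps alternating between a and b
        while end + 1 < len(shot_labels) and shot_labels[end + 1] == (a if shot_labels[end] == b else b):
            end += 1
        if end - start + 1 >= min_alternations:
            runs.append((start, end))
            start = end + 1
        else:
            start += 1
    return runs

# Shots 2 and 5 alternating suggests a two-character dialogue scene
print(find_abab_runs([1, 2, 5, 2, 5, 2, 5, 3, 3]))  # [(1, 6)]
```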
This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).
The Moviegoer prototype is complete, and its unveiling includes a technical demonstration as well as two background articles on why movies are key to training better emotional AI. With a functional prototype in hand, it’s time to think about what comes next in the project. Here are six areas of improvement, and their context within the larger impact of Moviegoer.
Subtitles are key for identifying dialogue, and they also provide descriptions of non-dialogue audio, like sound effects. However, there isn’t a set standard for subtitle formatting, and the Moviegoer prototype only covers a single format. Improving the subtitle processing will allow for the “watching” of many more movies. …
Smart devices, digital assistants, and service chatbots are becoming ubiquitous, but they don’t yet have the emotional capacity needed to fully understand how we communicate. Emotional AI models should be able to detect specific emotions and act with empathy, understand societal norms, and recognize nuances of communication, like irony and humor. Datasets of emotional data exist, but they’re inadequate in terms of size and realistic context. Fortunately, there’s a dataset which can satisfy all these requirements: movies.
In a previous post, I demonstrated the prototype of Moviegoer, a tool which breaks films into structured data. This essentially allows a machine to “watch” a movie: identifying individual scenes, parsing conversational dialogue, and recognizing characters. …
Film is the key to teaching emotion to AI. As I argued in a previous post, machines will learn emotions by “watching” movies. By turning cinema into structured data, we can unlock a trove of emotional data which can then be used to train AI models. Moviegoer is a tool that can do this automatically, turning movies into self-labeling data. The prototype is complete, and can automatically identify scenes, discover key dialogue, and track characters and their emotions throughout a film.
The Moviegoer prototype is freely available, so anyone can replicate the results. The examples below were taken straight from the prototype, with no pre-processing or cleaning of the input movie data required. …
In the near future, we’ll be surrounded by AI entities that act just like humans. They’ll be able to maintain conversations with the perfect amount of hesitation, slang, and cadence to be indistinguishable from a person. They’ll be able to interact with us and know exactly how we’re feeling, analyzing our facial expressions, body language, word choice, voice tone, and eye movements. All of this technology exists today, but it lacks the key component that creates a personal connection. Luckily, we already have the missing piece: we will teach robots emotion by having them watch movies.
The Moviegoer project has the goal of unlocking the enormous wealth of emotional data within cinema to advance affective computing (emotional AI). However, movies are incredibly difficult for machines to interpret. Whether we realize it or not, there are many filmmaking conventions which we take for granted (e.g. passage of time between scenes, dramatic music over a conversation, montages). We humans can understand how these can affect a film, but can a robot? …
This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).
With scenes and their details identified, we turn our attention to characters. Tracking characters (along with their screentime, dialogue, and emotional ups and downs) throughout the film is a big step toward understanding it.
One tough task is identifying character names. Though we have clues, like other characters addressing them by name (“here’s your ticket, Anna”), this isn’t quite foolproof enough to name characters with confidence. But to be honest, I often have a hard time knowing characters’ names when I watch films. …
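As one illustration of how such a clue could be mined, the sketch below counts names that appear in a vocative position (“…, Anna.”) using spaCy’s off-the-shelf named-entity recognizer; the pattern and the counting approach are assumptions for illustration, not Moviegoer’s actual naming logic.

```python
# A rough sketch of one naming clue: characters being addressed directly,
# as in "Here's your ticket, Anna." Uses spaCy NER plus a simple vocative
# regex; both are illustrative assumptions.
from collections import Counter
import re
import spacy

nlp = spacy.load('en_core_web_sm')  # assumes the small English model is installed

def candidate_names(dialogue_lines):
    """Count PERSON entities that appear in a vocative position (", Name." / ", Name?")."""
    vocative = re.compile(r',\s*([A-Z][a-z]+)\s*[.!?]')
    counts = Counter()
    for line in dialogue_lines:
        persons = {ent.text for ent in nlp(line).ents if ent.label_ == 'PERSON'}
        for name in vocative.findall(line):
            if name in persons:          # only count names NER also flags as a person
                counts[name] += 1
    return counts

print(candidate_names(["Here's your ticket, Anna.", "Thanks, I owe you one."]))
```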
This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).
We’ve improved our scene-boundary detection algorithm, and we’ve been able to detect two-character dialogue scenes throughout the film. With these scenes partitioned, we can apply some of the NLP-based and emotional analysis we’ve been developing over the past few weeks. Our goal is to extract plot and character information from the dialogue.
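As a small illustration of the emotional side of that analysis, the sketch below scores each line of an isolated scene’s dialogue with NLTK’s off-the-shelf VADER model; the example lines are invented, and Moviegoer’s own analysis may differ.

```python
# A minimal sketch of sentiment-scoring an isolated scene's dialogue with
# NLTK's VADER model. The dialogue lines are invented placeholders.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download('vader_lexicon', quiet=True)
sia = SentimentIntensityAnalyzer()

scene_dialogue = [
    "I just feel so stuck.",
    "You're not stuck. You're just figuring it out.",
]

for line in scene_dialogue:
    scores = sia.polarity_scores(line)           # neg/neu/pos/compound scores
    print(f"{scores['compound']:+.2f}  {line}")  # compound: overall sentiment, -1 to 1
```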
Our algorithm identifies scenes that alternate back-and-forth between two characters speaking to one another. We can identify these from shot clusters, or groups of similar frames. …
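Here is a simplified sketch of that clustering idea, using perceptual hashes to group visually similar frames; the frame files, hash choice, and distance threshold are assumptions for illustration rather than Moviegoer’s actual implementation.

```python
# A simplified sketch of grouping frames into "shot clusters" by visual
# similarity, using perceptual hashes. File names and the distance
# threshold are placeholders.
from PIL import Image
import imagehash

def label_shot_clusters(frame_paths, max_distance=8):
    """Assign each frame a cluster label; similar-looking frames share a label."""
    cluster_hashes = []   # representative hash for each cluster
    labels = []
    for path in frame_paths:
        h = imagehash.average_hash(Image.open(path))
        for idx, rep in enumerate(cluster_hashes):
            if h - rep <= max_distance:       # Hamming distance between hashes
                labels.append(idx)
                break
        else:
            cluster_hashes.append(h)          # new cluster for a new-looking shot
            labels.append(len(cluster_hashes) - 1)
    return labels

# Labels like [0, 1, 0, 1, 0, 1] suggest two alternating camera setups
print(label_shot_clusters(['frame_001.jpg', 'frame_002.jpg', 'frame_003.jpg']))
```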
This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).
We’ve just begun to combine all the clues we learn from a film’s visuals, audio, and subtitles to understand it as a whole. In the last post, we demonstrated how we could look for characters’ self-introductions to build composite facial encodings, and then identify those characters throughout the entire film. We did this somewhat manually, so now it’s time to be more efficient with how we look up and store data. …
This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).
Now that we’re starting to use features from the visual, audio, and subtitle tracks together, we can use them in tandem to gather some important information about movies. This is a small example of what we can accomplish.
We want to create a composite “average” face encoding based on the moments when a character introduces himself with a line like “I’m Ben” or “My name is Ben”. We’ll search for these self-introductions, gather all the frames where he appears onscreen as he’s introducing himself, calculate his facial encoding (a numerical representation of what he looks like) in each one, and then average these into a composite encoding of Ben. …
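A minimal sketch of that averaging step, using the face_recognition library, might look like the following; the frame filenames are placeholders, and it assumes each gathered frame contains exactly one clearly visible face.

```python
# A minimal sketch of building a composite "average" face encoding from the
# frames where a character introduces himself. Frame filenames are placeholders.
import numpy as np
import face_recognition

def composite_encoding(frame_paths):
    """Average the 128-d face encodings found across the given frames."""
    encodings = []
    for path in frame_paths:
        image = face_recognition.load_image_file(path)
        faces = face_recognition.face_encodings(image)
        if faces:                       # skip frames where no face was detected
            encodings.append(faces[0])
    return np.mean(encodings, axis=0)

ben_encoding = composite_encoding(['intro_frame_1.jpg', 'intro_frame_2.jpg'])

# Later, any face within a small distance of the composite is likely "Ben"
unknown = face_recognition.face_encodings(face_recognition.load_image_file('scene_frame.jpg'))[0]
print(face_recognition.face_distance([ben_encoding], unknown))  # small distance suggests a match
```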