This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).

In Rogue One (2016), Jyn declares, in the first person, her interpretation of events

We can identify the beginnings and ends of scenes, which means we can isolate a scene’s dialogue and analyze it as a single, self-contained conversation. This is most easily done by looking at the subtitles, which are ground-truth transcriptions of the dialogue, so no audio recognition is required. We’ve previously designed a system to load a film’s .srt subtitle file into pandas dataframes using the Python library pysrt…
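The preview cuts off there, but since the post names pysrt and pandas, a minimal sketch of that loading step might look like the following. The file path and column names are my own assumptions, not the project’s actual code:

```python
import pandas as pd
import pysrt

# Load the film's .srt subtitle file (path is hypothetical)
subs = pysrt.open("film.srt")

# One row per subtitle: start/end timestamps plus the spoken text
subtitle_df = pd.DataFrame(
    {
        "start_time": [sub.start.to_time() for sub in subs],
        "end_time": [sub.end.to_time() for sub in subs],
        "text": [sub.text.replace("\n", " ") for sub in subs],
    }
)

print(subtitle_df.head())
```

With the subtitles in a dataframe, each scene’s conversation can be sliced out by timestamp.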


This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).

Without any structure, a film is just a collection of a few thousand frames of images and a very long audio track. Conversations bleed into one another, characters appear and disappear without reason, and we teleport from one location to the next. We can begin to organize a film by dividing it into individual scenes. We’ll use Lost in Translation (2003) as an example.

To start…
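The post is truncated before the method is described, so purely as an illustration of one common starting point (not necessarily the project’s approach), here is a sketch that flags hard cuts by comparing the color histograms of consecutive frames with OpenCV:

```python
import cv2

def shot_boundaries(video_path, threshold=0.6):
    """Flag frames whose color histogram differs sharply from the
    previous frame's -- a rough proxy for hard cuts between shots."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, frame_idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # A hue/saturation histogram is less sensitive to lighting changes
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            similarity = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if similarity < threshold:
                boundaries.append(frame_idx)
        prev_hist, frame_idx = hist, frame_idx + 1
    cap.release()
    return boundaries
```

Runs of shots between cuts can then be grouped into scenes, which is where the harder work begins.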


This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).

The Moviegoer prototype is complete, and its unveiling includes a technical demonstration as well as two background articles on why movies are key to training better emotional AI. With a functional prototype finished, it’s time to think about what comes next in the project. Here are six areas of improvement, and their context within the larger impact of Moviegoer.

Improving Subtitle Processing

Subtitles are key for identifying dialogue, and they…


Moviegoer

Movies demonstrate emotional communication atop valuable societal context. They are the perfect dataset for emotional AI models.

What can movies teach AI models about how people communicate and express emotion? (image by author)

Smart devices, digital assistants, and service chatbots are becoming ubiquitous, but they don’t yet have the emotional capacity they need to fully understand how we communicate. Emotional AI models should be able to detect specific emotions and act with empathy, understand societal norms, and recognize nuances of communication, like irony and humor. Datasets of emotional data exist, but they are inadequate in terms of size and realistic context. Fortunately, there’s a dataset that can satisfy all these requirements: movies.

In a previous post, I demonstrated the prototype of Moviegoer, a tool which breaks films into structured data. This essentially allows a…


Moviegoer

Image by author

Film is the key to teaching emotion to AI. As I argued in a previous post, machines will learn emotions by “watching” movies. By turning cinema into structured data, we can unlock a trove of emotional data that can then be used to train AI models. Moviegoer is a tool that does this automatically, turning movies into self-labeling data. The prototype is complete, and it can automatically identify scenes, discover key dialogue, and track characters and their emotions throughout a film.

The Moviegoer prototype is freely available, so anyone can replicate the results. The examples below were taken straight from…


Moviegoer

there’s drama in every frame (image by author)

In the near future, we’ll be surrounded by AI entities that act just like humans. They’ll be able to maintain conversations with the perfect amount of hesitation, slang, and cadence to be indistinguishable from a person. They’ll be able to interact with us and know exactly how we’re feeling, analyzing our facial expressions, body language, word choice, voice tone, and eye movements. All of this technology exists today, but it lacks the key component that creates a personal connection. Luckily, we already have the missing piece: we will teach robots emotion by having them watch movies.

The Moviegoer project…


This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).

In Ford v Ferrari (2019), we infer from his face that Matt Damon’s character is white, male, and 39 years old. The actor is in fact older, so we may have been fooled by his movie-star looks.

With scenes and their details identified, we turn our attention to characters. Tracking characters (along with their screen time, dialogue, and emotional ups and downs) throughout the film is a big step toward understanding it.

One tough task is identifying character names. Though we have clues, like other characters addressing them by name (“here’s your ticket, Anna”), this isn’t quite foolproof enough to name characters with confidence. But…
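As a hedged illustration of that clue-gathering step (the regex and function here are mine, not the project’s actual code), a direct-address pattern can be harvested from subtitle lines like so. Note how easily it could be fooled by any capitalized word after a trailing comma, which is exactly why it isn’t foolproof on its own:

```python
import re

# A vocative (direct address) often appears after a comma at the
# end of a line, e.g. "here's your ticket, Anna"
VOCATIVE = re.compile(r",\s([A-Z][a-z]+)[.!?]?$")

def name_clues(subtitle_lines):
    """Tally candidate character names from direct-address patterns."""
    clues = {}
    for line in subtitle_lines:
        match = VOCATIVE.search(line)
        if match:
            name = match.group(1)
            clues[name] = clues.get(name, 0) + 1
    return clues

print(name_clues(["Here's your ticket, Anna.", "Good luck, Anna!"]))
# {'Anna': 2}
```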


This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).

In Ford v Ferrari (2019), the dialogue of a two-character conversation scene contains a response to the direct question “Who are you?”

We’ve improved our scene-boundary detection algorithm, and we’ve been able to detect two-character dialogue scenes throughout the film. With these scenes partitioned, we can apply some of the NLP-based and emotional analysis we’ve been developing over the past few weeks. Our goal is to extract plot and character information from the dialogue.

Our algorithm identifies scenes that alternate back-and-forth between two characters speaking to one another. We…
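The preview stops there, but the alternating-shot heuristic is easy to sketch. Assuming each shot has already been labeled with its on-screen speaker via face clustering (the labels below are hypothetical), a run of A/B/A/B shots marks a candidate two-character scene:

```python
def two_character_scenes(shot_speakers, min_shots=6):
    """Find runs of shots that alternate between exactly two speakers
    (A/B/A/B/...) -- a rough signature of a dialogue scene."""
    scenes, start = [], 0
    for i in range(1, len(shot_speakers)):
        alternating = (
            shot_speakers[i] != shot_speakers[i - 1]
            # After two shots, each new shot must match the one two back,
            # keeping the run limited to exactly two speakers
            and (i - start < 2 or shot_speakers[i] == shot_speakers[i - 2])
        )
        if not alternating:
            if i - start >= min_shots:
                scenes.append((start, i - 1))
            start = i
    if len(shot_speakers) - start >= min_shots:
        scenes.append((start, len(shot_speakers) - 1))
    return scenes

labels = ["A", "B", "A", "B", "A", "B", "C", "A", "C"]
print(two_character_scenes(labels))  # [(0, 5)]
```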


This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).

The majority of the action in Knives Out (2019) takes place at an estate. There are often establishing shots of the estate, so we can check whether these are reused throughout the film.

We’ve just begun to combine the clues we gather from a film’s visuals, audio, and subtitles to understand it as a whole. In the last post, we demonstrated how we could look for characters’ self-introductions to build composite facial encodings, and then identify those characters throughout the entire film. We did this somewhat manually, so now it’s time to be more efficient with how we look up and…
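The post is truncated, but the establishing-shot idea in the caption lends itself to a quick illustration. Here is a minimal sketch using perceptual hashing with the Pillow and imagehash libraries (my assumption; the project may measure frame similarity differently). Frames whose hashes sit within a small Hamming distance of each other are likely the same reused establishing shot:

```python
from PIL import Image
import imagehash

def find_reused_shots(frame_paths, max_distance=6):
    """Pair up frames whose perceptual hashes nearly match -- reused
    establishing shots should hash to almost identical values."""
    hashes = {path: imagehash.phash(Image.open(path)) for path in frame_paths}
    matches = []
    paths = list(frame_paths)
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            # imagehash overloads subtraction as Hamming distance
            if hashes[paths[i]] - hashes[paths[j]] <= max_distance:
                matches.append((paths[i], paths[j]))
    return matches
```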


This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).

In Plus One (2019), we look for character self-introductions, but this clearly isn’t Ben

Now that we’re starting to use features from the visual, audio, and subtitle tracks in tandem, we can gather some important information about movies. This is a small example of what we can accomplish.

We want to create a composite “average” face encoding based on the times a character introduces himself with a line like “I’m Ben” or “My name is Ben.” We’ll…
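The preview ends there, but a composite encoding of this sort might be built as follows, assuming the face_recognition library and hypothetical frame paths (the project’s actual tooling may differ):

```python
import numpy as np
import face_recognition

# Frames where the character introduces himself ("I'm Ben", etc.)
# -- these paths are hypothetical
intro_frames = ["intro_1.jpg", "intro_2.jpg", "intro_3.jpg"]

encodings = []
for path in intro_frames:
    image = face_recognition.load_image_file(path)
    faces = face_recognition.face_encodings(image)
    if len(faces) == 1:  # only trust frames containing a single face
        encodings.append(faces[0])

# Composite "average" 128-dimensional encoding for the character
composite = np.mean(encodings, axis=0)

# Later, any new face can be compared against the composite:
# face_recognition.face_distance([composite], new_encoding)
```

Averaging several encodings smooths over lighting and pose differences in any single frame, which should make the composite a more reliable reference than a single screenshot.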

Tim Lee

Unlocking the emotional knowledge hidden within the world of cinema.
