Moviegoer — Next Steps

Tim Lee
3 min read · Dec 7, 2020

This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).

The Moviegoer prototype is complete: the unveiling included a technical demonstration as well as two background articles on why movies are key to training better emotional AI. With a functional prototype in hand, it's time to think about what comes next in the project. Here are six areas of improvement, and how each fits into the larger goals of Moviegoer.

Improving Subtitle Processing

Subtitles are key for identifying dialogue, and they also describe non-dialogue audio such as sound effects. However, there is no single standard for subtitle formatting, and the Moviegoer prototype only handles one format. Improving subtitle processing will allow many more movies to be "watched."
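
As a minimal sketch, here is what a parser for the common .srt format might look like. The SubtitleCue structure and field names are illustrative, not the prototype's actual code; other formats (.vtt, .ass, and so on) could be handled by additional parsers that emit the same structure.

```python
import re
from dataclasses import dataclass

@dataclass
class SubtitleCue:
    start: float  # seconds from the start of the film
    end: float
    text: str

# Matches the .srt timing line, e.g. "00:01:02,500 --> 00:01:04,000"
TIMESTAMP = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3}) --> (\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)

def _to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(raw: str) -> list[SubtitleCue]:
    """Parse raw .srt text into timed cues, skipping malformed blocks."""
    cues = []
    for block in raw.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue
        match = TIMESTAMP.search(lines[1])
        if not match:
            continue
        g = match.groups()
        cues.append(SubtitleCue(
            start=_to_seconds(*g[:4]),
            end=_to_seconds(*g[4:]),
            text=" ".join(lines[2:]),
        ))
    return cues
```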

Identifying Significant Scenes

Currently we identify two-character dialogue scenes. We can look for the scenes containing the most emotional data by searching for markers of emotional charge: heavy profanity, a fast/argumentative conversation pace, or loud dialogue. Additionally, we can look for hallmarks of important scenes, such as long takes. These types of cinematography features can identify important two-character dialogues, as well as non-dialogue set pieces, which may still contain important character information.
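
One way to combine these markers is a simple weighted score, sketched below. The weights, the abbreviated profanity list, and the Scene fields (cues, start, end, mean_dialogue_db) are all placeholders for illustration, not tuned values from the project.

```python
PROFANITY = {"damn", "hell"}  # stand-in for a full profanity word list

def emotional_intensity(scene) -> float:
    """Score a scene on rough markers of emotional charge."""
    words = [w.lower().strip(".,!?")
             for cue in scene.cues for w in cue.text.split()]
    duration = max(scene.end - scene.start, 1.0)  # seconds
    profanity_rate = sum(w in PROFANITY for w in words) / duration
    speech_rate = len(words) / duration           # fast, argumentative pacing
    loudness = scene.mean_dialogue_db             # measured from the audio track
    return 2.0 * profanity_rate + 0.5 * speech_rate + 0.1 * loudness
```

Scenes could then be ranked by this score, with long takes (few cuts over a long span) flagged separately as a cinematography marker.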

Understanding Scene Context

There is arguably a finite number of possibilities for a scene's location and scenario. Characters sharing a meal at a restaurant. Characters saying goodbye at an airport. Characters walking down the street and talking. Specific pieces of dialogue, background sound effects, or cinematography aspects may indicate one of these scenarios or locations. We may want to hard-code checks for these features to help us understand what's happening in the scene.
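
A hard-coded check might look something like the sketch below. The scenario names and keyword sets are invented for this illustration; a real check would likely also weigh sound effects and cinematography features.

```python
# Scenario "signatures" keyed on dialogue keywords (illustrative only).
SCENARIO_SIGNATURES = {
    "restaurant_meal": {"menu", "waiter", "order", "check, please"},
    "airport_goodbye": {"flight", "boarding", "gate", "miss you"},
    "walk_and_talk": {"this way", "walk with me"},
}

def guess_scenario(dialogue: str):
    """Return the best-matching scenario, or None if evidence is weak."""
    text = dialogue.lower()
    scores = {name: sum(kw in text for kw in kws)
              for name, kws in SCENARIO_SIGNATURES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= 2 else None  # require two corroborating cues
```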

Dialogue Attribution

Attributing individual lines of dialogue to the characters who speak them would tremendously help with comprehending the plot and events of a film. However, this remains experimental and wasn't reliable enough to make it into the prototype. Currently, we look for onscreen characters with their mouths open and use this to attribute dialogue. We may be able to do better by diarizing the voice audio of two-person conversations, using that to attribute dialogue, and saving voice encodings for each character.
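
The timing-overlap idea could be sketched as follows, assuming an upstream face tracker already produces per-frame observations with a hypothetical character_id and mouth_open flag (neither is the prototype's actual interface).

```python
from collections import Counter

def attribute_cue(cue, face_observations):
    """Attribute one subtitle cue to whichever tracked character has an
    open mouth most often while the cue is onscreen."""
    votes = Counter(
        obs.character_id
        for obs in face_observations
        if cue.start <= obs.timestamp <= cue.end and obs.mouth_open
    )
    if not votes:
        return None  # unattributed; could fall back to voice diarization
    return votes.most_common(1)[0][0]
```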

Better NLP Implementation

Since we have individual scenes isolated, we also have full conversations isolated. A scene's dialogue represents an end-to-end conversation, so we can do a better job of identifying key words. We can use dictionaries for positive/negative sentiment, and make more effective use of Named Entity Recognition to identify conversation topics. NLP is great at parsing entire sentences and conversations, and further research will show what we can learn from it (even with imperfect dialogue attribution, as discussed in the previous section).
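
As a rough illustration, here is how a scene's dialogue might be run through spaCy for Named Entity Recognition alongside a dictionary-based sentiment count. The tiny POSITIVE/NEGATIVE lexicons are placeholders for full sentiment dictionaries.

```python
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")  # requires the model to be downloaded

# Abbreviated stand-ins for full positive/negative sentiment dictionaries.
POSITIVE = {"love", "great", "happy"}
NEGATIVE = {"hate", "terrible", "angry"}

def analyze_conversation(dialogue: str):
    """Extract likely conversation topics and a crude sentiment balance."""
    doc = nlp(dialogue)
    topics = Counter(
        ent.text for ent in doc.ents
        if ent.label_ in {"PERSON", "GPE", "ORG", "EVENT"}
    )
    tokens = [t.lower_ for t in doc if t.is_alpha]
    sentiment = (sum(t in POSITIVE for t in tokens)
                 - sum(t in NEGATIVE for t in tokens))
    return topics.most_common(5), sentiment
```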

Emotional Modeling POC

The point of Moviegoer, of course, is to break movies into structured data for training emotional AI models. Though we're still working on turning movies into data, a proof of concept would help demonstrate the eventual capabilities and provide perspective on the importance of emotional AI modeling. The most prominent idea for a POC involves studying facial reactions/changes resulting from specific lines of dialogue.
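
A minimal version of that measurement might look like the sketch below, assuming a hypothetical emotion_series (a list of timestamped emotion scores for the listening character, e.g. from a per-frame facial classifier) and the subtitle cue timing from earlier.

```python
def reaction_delta(cue, emotion_series, window=2.0):
    """Mean emotion score in the window after a line of dialogue,
    minus the mean in the window before it."""
    before = [s for t, s in emotion_series
              if cue.start - window <= t < cue.start]
    after = [s for t, s in emotion_series
             if cue.end < t <= cue.end + window]
    if not before or not after:
        return None  # not enough face observations around this line
    return sum(after) / len(after) - sum(before) / len(before)
```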
