Moviegoer: Character Identification with a Self-Introduction

Tim Lee
Oct 19, 2020

This is part of a series describing the development of Moviegoer, a multi-disciplinary data science project with the lofty goal of teaching machines how to “watch” movies and interpret emotion and antecedents (behavioral cause/effect).

In Plus One (2019), we look for character self-introductions, but this clearly isn’t Ben

Now that we’re starting to extract features from the visual, audio, and subtitle tracks, we can use them in tandem to gather some important information about movies. This is a small example of what we can accomplish.

We want to create a composite “average” face encoding based on the times a character introduces himself with a line like “I’m Ben” or “My name is Ben”. We’ll search for these self-introductions, gather every frame in which he appears onscreen while introducing himself, calculate a facial encoding (a numerical representation of what he looks like) for each appearance, and then average them into a composite encoding of Ben. With this, we can identify him throughout the film.

We can generate a list of potential characters based on the number of times their names are mentioned in the dialogue or listed as offscreen speakers in subtitle clarifications. Using this list, we can reasonably infer the main characters.
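Here’s a rough sketch of how that tally might look, assuming the subtitles have already been parsed into plain dialogue lines. The helper name and the offscreen-speaker format are illustrative, not the project’s actual code:

```python
from collections import Counter
import re

def candidate_characters(dialogue_lines, top_n=5):
    """Tally names used as offscreen-speaker labels (e.g. "BEN: Over here!")
    and capitalized words mentioned in dialogue. Noisy on its own; filtering
    against a cast list or NER output would tighten it up."""
    counts = Counter()
    for line in dialogue_lines:
        speaker = re.match(r"^([A-Z]{2,}):", line)  # offscreen-speaker clarification
        if speaker:
            counts[speaker.group(1).title()] += 1
        for word in re.findall(r"\b[A-Z][a-z]+\b", line):  # possible name mentions
            counts[word] += 1
    return counts.most_common(top_n)

lines = ["BEN: Hey, over here.", "I'm Ben, by the way.", "Alice, wait up!"]
print(candidate_characters(lines))  # e.g. [('Ben', 2), ('Hey', 1), ('Alice', 1)]
```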

We can infer the main characters in Plus One (2019)

With this list, we can search the dialogue for any time a character introduces themselves. Once we have the timestamps of these self-introductions, we can locate the (visual) frames in which these lines of dialogue are spoken. In this example, Ben introduces himself several times throughout the film, so we have a few frames to check.
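One way to find those lines and convert subtitle timestamps into frame numbers; the patterns, frame rate, and helper name here are assumptions for illustration:

```python
import re

# Phrasings that signal a self-introduction, e.g. "I'm Ben" or "My name is Ben"
INTRO_PATTERNS = [r"\bI'?m\s+{name}\b", r"\bmy name is\s+{name}\b"]

def self_intro_frames(subtitles, name, fps=24.0):
    """Return frame numbers at which `name` introduces themselves.
    `subtitles` is a list of (start_seconds, text) tuples."""
    patterns = [re.compile(p.format(name=name), re.IGNORECASE) for p in INTRO_PATTERNS]
    frames = []
    for start_seconds, text in subtitles:
        if any(p.search(text) for p in patterns):
            frames.append(int(start_seconds * fps))
    return frames

subs = [(512.3, "Hi, I'm Ben."), (2710.8, "My name is Ben, nice to meet you.")]
print(self_intro_frames(subs, "Ben"))  # frame numbers of the self-introductions
```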

In Plus One (2019), we have a clear image of Ben as he introduces himself

From each frame, we collect the facial encodings. However, a clear face isn’t guaranteed: sometimes Ben’s face will be obscured, or he won’t even be onscreen. So we compare each of these facial encodings to one another, and if a majority of them (roughly) match, we consider those to be accurate representations of Ben’s face.
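A minimal sketch of that check, assuming the frames have been exported as image files and using the open-source face_recognition library (which may differ from the project’s actual implementation):

```python
import face_recognition

def encodings_from_frames(frame_paths):
    """Collect facial encodings from each frame image; frames without a
    detectable face are simply skipped."""
    encodings = []
    for path in frame_paths:
        image = face_recognition.load_image_file(path)
        # A frame may contain zero, one, or several faces (e.g. Ben and Alice)
        encodings.extend(face_recognition.face_encodings(image))
    return encodings

def majority_cluster(encodings, tolerance=0.6):
    """Keep only the encodings that (roughly) match most of the others."""
    keep = []
    for i, enc in enumerate(encodings):
        others = encodings[:i] + encodings[i + 1:]
        matches = face_recognition.compare_faces(others, enc, tolerance=tolerance)
        if sum(matches) > len(others) / 2:
            keep.append(enc)
    return keep
```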

Ben is obscured, and we can’t get a clear image of his face

From there, we can take an average of those encodings to create a composite representation of what Ben looks like. We can then compare this composite to every face in the film, identifying whenever Ben appears onscreen.
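The averaging and film-wide comparison could look something like this; again a sketch, with the per-frame encoding dictionary and the tolerance value as assumptions:

```python
import numpy as np
import face_recognition

def composite_encoding(matched_encodings):
    """Element-wise mean of the matched self-introduction encodings."""
    return np.mean(matched_encodings, axis=0)

def frames_with_character(all_frame_encodings, composite, tolerance=0.6):
    """Flag frames whose closest face is within `tolerance` of the composite.
    `all_frame_encodings` maps frame number -> list of encodings in that frame."""
    hits = []
    for frame_number, encs in all_frame_encodings.items():
        if not encs:
            continue
        distances = face_recognition.face_distance(encs, composite)
        if distances.min() <= tolerance:
            hits.append(frame_number)
    return sorted(hits)
```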

Ben is onscreen, but so is Alice. Both of these faces are compared to all the other Ben self-introductions, but luckily a majority of them are clear images of Ben’s face

This exercise required analyzing many individual frames, looking for Ben’s face in each. It was somewhat computationally expensive, since we recalculated facial encodings every time we wanted to look for his face. It would be much easier to calculate each frame’s encodings once and save that data to be looked up later. Serializing this data will be the next focus of effort.
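A simple version of that caching step might just pickle a dictionary of per-frame encodings to disk; the filename and structure here are placeholders:

```python
import pickle

def save_frame_encodings(frame_encodings, path="frame_encodings.pkl"):
    """Cache every frame's encodings so later face searches become lookups."""
    with open(path, "wb") as f:
        pickle.dump(frame_encodings, f)

def load_frame_encodings(path="frame_encodings.pkl"):
    with open(path, "rb") as f:
        return pickle.load(f)
```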
