How the MOCAP Tools Behind Avengers’ Thanos Brought “The Quarry” to Life

Ted Raimi in “The Quarry.” (Images courtesy of 2K Games)

Performance capture for video games is often a sterile experience. An actor straps on a helmet camera and performs their lines in a sound booth, then that data is transferred to a game engine, where it is placed on a game skeleton for artists to review and clean up. With “The Quarry,” the process was more like filming a play, according to Aruna Inversin, creative director and visual effects supervisor at Digital Domain, the VFX studio that partnered with developer Supermassive Games on its spiritual successor to “Until Dawn.”

The Oscar-winning studio has produced visual effects for films like “Titanic,” “The Curious Case of Benjamin Button” and several Marvel movies. To create the photorealistic characters seen in “The Quarry,” it used Masquerade, an AI-driven facial capture system developed to map Josh Brolin’s likeness onto his character Thanos in “Avengers: Infinity War.” Masquerade was originally designed to do one thing: take the performance from a head-mounted camera and translate it onto a digital mesh that could then be rendered for a movie. For “The Quarry,” the VFX team needed something that could track actors’ movements and facial expressions and create digital characters that could be edited in real time. So they built Masquerade 2.0.

This new technology increased the pace of production tenfold. Actors would act out their scenes, and Digital Domain would upload the performances (body capture, head-camera capture and time-synced audio) to a computer, then send them to Supermassive for review in the game engine. The developer would provide feedback, and the next day “The Quarry” director Will Byles, who also directed “Until Dawn,” could watch the footage to determine whether a particular performance needed a reshoot. In total, Digital Domain said it rendered 250 million frames for “The Quarry” – far more than for a typical movie.

Like “Until Dawn,” “The Quarry” is an interactive take on classic slasher and horror movies. A group of camp counselors get stuck overnight after camp ends and are hunted by werewolves. Whether they survive the night is determined by the player, who makes decisions like whether to investigate a mysterious noise or take an ill-advised detour to go swimming.

Transposing the actors’ performances onto their in-game character models involved a multistep process that Inversin called the “DD pipeline.” First, the team performed facial scans of each cast member to create a library of face shapes that Supermassive could reference when building the in-game characters. Then they filmed the performances on Digital Domain’s performance capture stage in Los Angeles. Each day, the actors donned full motion capture suits and head-mounted facial rigs to record their expressions. Their faces were covered in dots, which cameras tracked along with the markers on their suits to triangulate their movements and map them onto a virtual skeleton using Masquerade 2.0.
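As a rough illustration of that triangulation step, the sketch below reconstructs a single marker’s 3D position from two calibrated camera views using the standard direct linear transform. The projection matrices and pixel coordinates here are hypothetical, and a production system like Masquerade uses many more cameras and far more robust solvers; this is only a minimal stand-in for the idea.

# Minimal sketch: triangulating one marker from two calibrated cameras.
# P1 and P2 are hypothetical 3x4 projection matrices; real mocap stages
# solve across dozens of cameras with outlier handling.
import numpy as np

def triangulate_marker(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one marker seen by two cameras.
    P1, P2: (3, 4) camera projection matrices.
    uv1, uv2: (u, v) pixel coordinates of the marker in each view.
    Returns the estimated 3D position as a length-3 array."""
    u1, v1 = uv1
    u2, v2 = uv2
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.stack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # Solve A @ X = 0 via SVD; the answer is the right singular vector
    # associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize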


To properly calibrate the equipment, the actors performed a series of motion tests, ensuring data was tracked consistently across shoots, said Paul Pianezza, senior producer at Digital Domain. Maintaining that consistency for the actors’ faces was a little trickier: the team built molds of each actor’s face, dotted each mold and drilled holes through it to create a physical stencil, which they then used to ensure the dots were placed in the same spots throughout filming. If some of the markers were blocked or missing during a shoot, Masquerade’s AI could automatically fill in the blanks.
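Digital Domain describes that gap-filling as machine-learned. As a deliberately simplified stand-in, the sketch below fills gaps in a single marker’s trajectory with plain linear interpolation, the classic low-tech fallback; the array shape and the NaN-for-missing convention are assumptions made for the example.

# Simplified illustration of filling gaps left by an occluded facial marker.
# Masquerade reportedly uses machine learning for this; linear interpolation
# is substituted here purely to show what "filling in the blanks" means.
import numpy as np

def fill_marker_gaps(trajectory):
    """trajectory: (num_frames, 3) marker positions, NaN where occluded.
    Returns a copy with each coordinate interpolated across the gaps."""
    filled = trajectory.copy()
    frames = np.arange(len(filled))
    for axis in range(filled.shape[1]):
        column = filled[:, axis]           # view into `filled`
        missing = np.isnan(column)
        if missing.any() and (~missing).sum() >= 2:
            column[missing] = np.interp(
                frames[missing], frames[~missing], column[~missing]
            )
    return filled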

There’s also a human element to the DD pipeline, Inversin said. Artists analyzed the footage and identified each actor’s expressions – a smile, a frown, a cry and so on – noting how the dots moved in each case, which Inversin compared to setting keyframes for each character.

The team used the data recorded across those shoots, facial scans and range-of-motion tests to create a library of each actor’s idiosyncrasies, which could then be used to train their AI so the computer could correctly read their movements and expressions. As Inversin explained: “Machines are only as good as what you can feed [them] in terms of information.”

The team also worked with Supermassive to align props on the stage with their in-game locations using the scene’s grid layout.

“So if someone opens the door and looks through, they move that door and we capture that data,” Pianezza said.

Filming began around the same time as the covid-19 pandemic, which limited the number of actors who could be in the studio at one time and added another layer of choreography to the process. Most group scenes were filmed in segments with two or three actors over multiple shoots. The campfire scene, for example, involved three groups of two counselors, each shot at a different time.


During filming, the actors didn’t have to worry about restraining their movements when running, jumping or being attacked by werewolves, Inversin said. All of the performances in “The Quarry” – except for some stunt sequences – were performed by the actors on the motion capture stage, exactly as they appear in-game.

“It’s Ted Raimi’s performance and you see it in the game and it’s his lip quivering and it’s him looking around, you know, it’s him,” Inversin said. “An animator didn’t go and fix that. You know, that’s what he did on stage.”

Digital Domain had to tackle two common issues in motion capture to make this process possible: headset stabilization and eye tracking. The team heavily modified open-source eye-tracking software, GazeML, over the course of three years to improve the accuracy and appearance of digital eyes. New machine learning algorithms added to Masquerade 2.0 allowed it to analyze capture footage and compensate for any jostling of the head-mounted cameras as the actors moved.
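For a sense of what “compensating for jostle” involves, here is a hedged sketch of the classical version of headset stabilization: estimate the rigid motion of the helmet camera from markers on relatively stable parts of the face (forehead, nose bridge) and remove it, so only true facial motion remains. Masquerade 2.0 reportedly does this with machine learning; the Kabsch-style rigid fit and the marker groupings below are assumptions for illustration.

# Sketch: remove rigid helmet-camera motion by aligning each frame's stable
# markers to a reference pose, then applying that transform to all markers.
import numpy as np

def stabilize_frame(frame_pts, ref_pts, stable_idx):
    """frame_pts, ref_pts: (N, 3) marker positions for this frame and a
    reference frame. stable_idx: indices of markers assumed rigid."""
    src = frame_pts[stable_idx]
    dst = ref_pts[stable_idx]
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # Kabsch: optimal rotation between the two centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return (R @ frame_pts.T).T + t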

As in film work, Masquerade uses these capture sequences to create photorealistic animations of an actor’s performance. Unlike movies, however, video games must render those animations in real time in response to player actions – a monumental computational task, since each actor’s face is made up of over a thousand unique blend shapes, or facial movements corresponding to different expressions.
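In practice, blend shapes work roughly like this: the displayed face is the neutral mesh plus a weighted sum of per-shape vertex offsets, recomputed every frame. The sketch below shows that evaluation; the mesh size and shape count are illustrative numbers, not figures from the game.

# Minimal sketch of blend shape evaluation for one frame.
import numpy as np

num_vertices, num_shapes = 5000, 1000
neutral = np.zeros((num_vertices, 3))                   # resting face mesh
deltas = np.random.randn(num_shapes, num_vertices, 3)   # per-shape vertex offsets
weights = np.random.rand(num_shapes)                    # animation-driven weights, 0..1

# Final vertex positions: neutral + sum_i weight_i * delta_i
face = neutral + np.tensordot(weights, deltas, axes=1)
print(face.shape)  # (5000, 3)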

“When you try to replicate this performance at runtime, there are so many different complex shapes that the real-time system doesn’t know what to pull in the appropriate amount of time it needs,” Inversin said.

The solution: Chatterbox, a tool developed by the company’s Digital Human Group to streamline the process of analyzing and rendering facial expressions in real time. Each actor’s facial expression library and the keyframes identified by Digital Domain artists are fed into Chatterbox, which then uses machine learning algorithms to automatically track the dots on the actors’ faces in each shot and calculate the best possible options for paring down the facial expressions without sacrificing quality.

“So to make it more optimized for a game engine, we’re taking those thousand different face shapes and decimating them,” Inversin said. “We reduce them to idealized forms based on the performance. And what that means is that if a character is just talking, they don’t need some form of blending to, you know, scream, right?”

In other words, Chatterbox does with facial expressions what video games already do with other in-game assets – rendering them as needed based on what the player is doing at the time.
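A hedged sketch of the pruning idea Inversin describes: for a given performance clip, keep only the blend shapes whose weights ever rise above a small threshold, so the runtime never evaluates shapes (like a scream) that the scene never uses. Chatterbox’s actual selection is machine-learned; the threshold rule below is a simplified stand-in with assumed array shapes.

# Sketch: prune blend shapes a clip never meaningfully activates.
import numpy as np

def prune_blend_shapes(weight_curves, deltas, threshold=0.01):
    """weight_curves: (num_frames, num_shapes) animation weights for one clip.
    deltas: (num_shapes, num_vertices, 3) per-shape vertex offsets.
    Returns the indices and offsets of the shapes the clip actually uses."""
    active = np.where(weight_curves.max(axis=0) > threshold)[0]
    return active, deltas[active]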

In total, after 42 days of shooting on the motion capture stage, Digital Domain had 32 hours of captured footage to put into the game, and only a small fraction of it had to be edited by animators in post-production. Traditionally, animators would have to manually touch up expressions throughout the footage, but between Chatterbox and advancements in Masquerade 2.0, the team only needed to “fix a few issues where the machine got angry,” Pianezza said. That was crucial for a project of this magnitude.

“You can’t brute force anything at 30,” Pianezza added. “The system has to work.”

The result was in-game performances that felt tailored to each actor. Inversin said he thought that being on stage, free to move and express their emotions, heightened the actors’ performances. And that has promising implications for motion capture across the entertainment industry.

“Having the ability for actors and directors to capture things on the mocap stage and to know that their performance is translated effectively into a video game with the nuance of their face and their performance, I think that’s a big selling point for anyone who wants to see these experiences and direct and consume this media,” Inversin said. “Because it’s like Ted Raimi’s walk as he moves through the prison cell. This is Lance Henriksen smacking his lips. That’s what he does. We didn’t add that; that’s his natural performance.”