Real or Avatar? The faces behind a video AI revolution

Artificial intelligence turns a single photo into a “real” presenter

Jenny, the presenter of a corporate video, explains the onboarding process to new employees. She is professional and engaging. But she’s not real.

It’s a photorealistic avatar, created from a five-minute video session with a real person. That session provided enough data for the artificial intelligence to mimic the person’s voice and movements, so that what you see looks exactly like them. But it isn’t them.

D-ID, an Israeli tech startup, specializes in “hyper-real AI presenters”. Type in as much text as you want and the avatar will present it like a real person, at a fraction of the cost of hiring one.

Choosing a face, voice and more for a video presenter is easy. Courtesy

Hiring a live actor, along with a studio, a camera operator, lighting and sound recording, would cost $1,000 per minute of video. Avatars are an attractive and affordable alternative.

Anyone who’s sat through a “Death by PowerPoint” presentation about security, compliance or a new company initiative will tell you that a human, or near-human, presenter is more likely to keep you awake than yet another graph, photo or org chart.

“With a human face, people are more likely to engage with the material, watch the video, follow the course and absorb the information,” Gil Perry, CEO and co-founder of D-ID, told NoCamels.

“Our technology takes the headache out of corporate video production to effortlessly create high-quality, cost-effective, professional videos in any language with the click of a button.”

D-ID has developed technology that enables its customers to create avatar-led training videos quickly, cost-effectively and efficiently.

Besides Jenny, it has a whole cast of ‘actors’: choose whichever one you want, then select from 270 voices, 119 languages and a range of accents.

There’s even a range of presentation styles – angry, happy, sad, excited, hopeful, customer service, newsletter.

The sophistication of the technology is remarkable. During a Zoom session, Perry photographed me, pasted in a paragraph of text describing the NoCamels website, and brought the still image to life in less than a minute.

The voice isn’t mine (it belongs to a guy named Eric), so it wouldn’t fool my friends or family, but the AI added a whole range of facial movements and even filled in the missing parts of the background that are exposed when my head moves. All of this came from a single low-quality still image, and you can try it yourself by uploading one of your own.

The company, founded in 2017 by three veterans of the IDF Intelligence Corps, started out in face de-identification, using algorithms to ‘anonymize’ photographs (hence the name D-ID) with small tweaks, so that the images remain recognizable to humans but would fool the biometric facial recognition systems used by Facebook and many others.

The Tel Aviv-based company has become a world leader in deep learning, computer vision, image processing and computational photography.

But Perry says they then realized the technology they had developed could be applied elsewhere – to create narrated content.

“We can drastically reduce the cost of video production, we can increase the value of customers’ existing, boring assets, and we can create personalized and targeted content at scale,” he told NoCamels.

“The biggest problem is the completion rate. People just don’t watch. They don’t read the material on onboarding, cyber compliance or sexual harassment. They just press next, next, next to get through the course.

“We mainly sell to the training and learning and development departments of large companies. We help them create content that is more engaging, better understood and better remembered.

“Often the managers who appear in these videos as presenters just aren’t good actors, and employees are uncomfortable watching them.”

One of the benefits of using avatars rather than people is that it’s much easier to update or add to the script. Giving Jenny a few more lines to read beats hiring the cast, crew and studio all over again.

“D-ID’s work has already generated more than 100 million videos,” says Perry. The company now offers a self-service version of its Creative Reality platform to small businesses and says the potential for growth is huge.

The company has strict policies in place to prevent misuse of its technology and guarantees that all of its videos display an AI symbol indicating that they are computer-generated.

The screen the user sees to create their video presenter. Courtesy

Beyond corporate training videos, D-ID, which has 45 employees and has raised $47 million from investors, is finding other uses for its technology, which can be scaled up without limit.

The CEO of one of its corporate clients was able to send a personalized video to every subscriber, well over 100 million of them, each one different and each addressing the subscriber by name, all generated from a single still photo.

The future offers even more opportunities. “We are now working on real-time streaming,” says Perry.

“So you’ll be able to make video calls without the camera. I could be in the kitchen washing dishes or on the beach, but choose a better photo of myself and present it to you with the camera off.”