
AI unicorn Synthesia is adding emotions to your digital double

British AI unicorn Synthesia has launched its new generation of AI avatars, promising truly human-like digital speakers.

Synthesia CEO Victor Riparbelli said traditional AI avatars, including those made by his own company, have always struggled to capture the nuance of human facial expressions and emotions.

The London-based startup, which raised £71.4m last June in a round including chip giant Nvidia’s VC arm, claims to have solved this with its new generation of computerised presenters.

The new Synthesia “Expressive Avatars” build on previous methods, which involve looping footage of a human speaker and switching out the lip movements.

The company said its latest line can match the entire facial structure of a person and accurately determine the intended emotions for each line.
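
As a purely illustrative sketch (Synthesia has not published how its system works, and none of these names come from its actual API), the idea of pairing each scripted line with an intended emotion could be represented roughly as follows; the `ScriptLine` class, `build_render_request` function and emotion labels are all hypothetical.

```python
# Hypothetical sketch: tagging each scripted line with an intended emotion
# before requesting an avatar render. These names are illustrative only and
# do not reflect Synthesia's real API or internals.

from dataclasses import dataclass

@dataclass
class ScriptLine:
    text: str      # what the avatar should say
    emotion: str   # intended delivery, e.g. "neutral", "happy", "sad"

def build_render_request(avatar_id: str, lines: list[ScriptLine]) -> dict:
    """Assemble a request payload that pairs every line with an emotion label."""
    return {
        "avatar": avatar_id,
        "script": [{"text": line.text, "emotion": line.emotion} for line in lines],
    }

if __name__ == "__main__":
    script = [
        ScriptLine("Welcome to the quarterly update.", "neutral"),
        ScriptLine("Revenue grew faster than expected!", "happy"),
        ScriptLine("Sadly, we had to delay the product launch.", "sad"),
    ]
    print(build_render_request("my_custom_avatar", script))
```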

The new method was developed in part by training the company’s technology on a massive backlog of videos, though the source of that footage was not shared.

And if using one of the many options in Synthesia’s library of stock presenters doesn’t work for you, the firm is now giving users the option to turn themselves into avatars.

Building an AI avatar

UKTN was given a demo of how the company makes bespoke AI avatars – and it’s more manual than one might expect.

The recent explosion in popularity of AI tools has, in part, been fuelled by the allure of instant results from a prompt.

Where seasoned experts have spent years developing the skills required to write, draw, film or edit, a user can now simply enter a short prompt and watch as their generative AI model of choice creates whatever they want in seconds.

In contrast, Synthesia’s new custom video avatars require a considerable amount of human effort.

In a recently built studio in east London, stocked with high-end recording and editing equipment, the Synthesia team films the input required to turn a user’s likeness into a performing digital double.

The space is not unlike a traditional film set, with green screens draped over the walls, expensive cameras with lengthy cables snaking across the floor and a director instructing a small crew, as well as the actor of the day.

The client is given a script that covers topics with a range of emotional connotations. The words themselves don’t matter too much, as long as the camera can see the movement of the subject’s lips, bones, muscles and skin across a variety of tones.

When the recording session is finished, Synthesia says it will have the new avatar ready in a “few weeks”.

There are alternatives from competitors that create an avatar quickly from uploaded pictures and videos. Synthesia’s message to those put off by the comparatively drawn-out procedure is that good things come to those who wait.

The company told UKTN that while the near-instant creation of an AI likeness is an option, the drop in quality outweighs the convenience.

The lip-syncing in the new avatars is a considerable improvement, and they are certainly able to express a handful of different emotions. However, there remains an uncanny valley quality, the slightly off, unnerving look that so often characterises AI depictions of humans.

It may be that the technology to generate a truly human-like speaker is still some years away; part of the issue, however, may lie with the avatars’ real-life counterparts.

The process requires someone to be recorded in a variety of emotional states. The problem is that the film crew captures the subject’s best attempts at conveying these emotions within seconds of one another, rather than genuine emotional responses.

A Synthesia staff member admitted occasional feedback from clients included comments that the final product was stiff and robotic. A look at the original footage, however, suggests that corporate executives might not make the best actors.

Tackling AI disinformation

Synthesia’s goal of a completely human-like AI avatar may not have been reached just yet. But as the technology continues to improve, questions around preventing disinformation will become even more important.

Riparbelli said ensuring the ethical use of the tech has always been foundational to the company. It has taken steps to ensure that avatars cannot be made without the subject’s permission.

All recordings for the avatar-creation process are required to begin with the subject giving explicit consent. The company also has strict terms of use, with all videos going through its content moderation process.
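
As a minimal sketch only, and not a description of Synthesia’s actual moderation system, a pre-render gate combining the two checks described above (recorded consent plus script screening) might look something like the following; the function names, consent registry and blocklist are all assumptions made for illustration.

```python
# Hypothetical sketch of a pre-render moderation gate: consent must be on file
# for the avatar's subject, and the script is screened before rendering.
# These checks are deliberately simplistic placeholders, not Synthesia's
# actual moderation pipeline.

BLOCKED_TERMS = {"example slur", "example scam phrase"}  # placeholder list

def has_recorded_consent(avatar_id: str, consent_registry: dict[str, bool]) -> bool:
    """Check that the subject gave explicit on-camera consent at creation time."""
    return consent_registry.get(avatar_id, False)

def script_passes_screen(script: str) -> bool:
    """Reject scripts containing obviously prohibited phrases."""
    lowered = script.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

def approve_render(avatar_id: str, script: str, consent_registry: dict[str, bool]) -> bool:
    """Allow rendering only if consent exists and the script passes screening."""
    return has_recorded_consent(avatar_id, consent_registry) and script_passes_screen(script)
```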

Riparbelli said that hate speech, clearly malicious scams and harmful misinformation are unable to get past this process. The Synthesia CEO conceded that moderation becomes difficult when content is less obviously harmful.

Synthesia’s tech hasn’t completely escaped controversy. In March 2023, it was reported that avatars generated using Synthesia to look like TV newsreaders were delivering propaganda in support of authoritarian Venezuelan President Nicolás Maduro.

The AI news reporters celebrated an increase in Venezuelan tourism, downplayed the poverty crisis and claimed: “Venezuelans do not actually feel there is any opposition to the government”.

Synthesia was quick to ban the responsible user, reiterating its strict guidelines for proper use.

This particular case falls into the category of obvious misinformation; however, more complex misuses of the technology could still pose a risk.

Riparbelli acknowledged this but said there will always be the potential to misuse communication tools, whether it’s a PowerPoint presentation, a hand-written letter or a video presentation built from code wearing a human mask.

