In the music video for their latest single “DTDTGMGN,” members of South Korean band Eternity dance in perfect time to a bubblegum pop beat in a neon pink room decorated with school uniforms and celebrity posters. The dizzying camera movements and quick-paced scene changes are typical of the K-pop genre that has become a global phenomenon in recent years.
“Hold me tight, lean on my shoulder,” one member sings.
But the group’s fans cannot do either of those things, physically speaking — because none of Eternity’s 11 bandmates are real.
Built using AI technology, Eternity is one of the latest South Korean acts pushing the boundary between real and virtual entertainment. And the group’s creator and management company, Pulse9, believes that computer-generated stars have a significant edge over their real-life counterparts.
As with most popular K-pop groups, Eternity’s members include a group leader, vocalists, rappers and dancers. But one member in particular, Zae-in, can do it all. “Zae-in has (a combination of) characteristics that most human artists would be hard-stretched to pull off,” said the firm’s CEO Park Ji-eun. “She can sing very well, rap very well and she can be creative as a (fashion) designer. And she’s a natural-born actress.”
With the help of real-time face-swapping technology and an AI-generated voice, Zae-in is played by 10 different people, including actors, singers and dancers, who lend their talents to her avatar as needed. In one video posted on Zae-in’s Instagram account, for example, she deftly rides a longboard beside Seoul’s Han River. For the video, her creators filmed the stunt with a stand-in whose face was later swapped out for Zae-in.
“There is the pop music Zae-in, actress Zae-in, reporter Zae-in,” the virtual star told CNN in a Zoom interview, during which she was played by an actor (though Pulse9 said that its AI software can recreate Zae-in’s voice and generate speech without human input). “Various Zae-ins exist, so our unique advantage is that we can do what one person cannot.”
“The only thing we cannot do is sign an autograph,” she added, her eyes smiling.
Real-time face-swapping and early forms of “deepfake” technology have existed for almost a decade. In that time, virtual influencers like Lil Miquela and South Korea’s own Rozy have gained large online followings in the fashion and music spaces.
But the latest advancements in AI-generated video and audio are making it easier for companies like Pulse9 to create convincing virtual characters, while advances in generative AI programs like ChatGPT are eliminating the need for stars like Zae-in to be “played” by anyone at all.
How they were made
The team at Pulse9 began the process of bringing Eternity to “life” by designing its members’ looks. Park’s team generated 101 unique digital faces, inspired by existing Korean celebrities, before dividing them into four categories (dubbed “sexy,” “cute,” “intelligent” and “innocent”) and narrowing the selection down to 11 via an online poll.
Using deepfake and motion capture technology, Pulse9 then produced the band’s first music video (to accompany the ironically titled track “I’m Real”) in 2021. Reaction to the video was mixed. Many social media users felt the band members’ facial expressions looked unnatural and divorced from their bodies, while others cited the “uncanny valley” effect, whereby viewers are discomforted by things that resemble humans but aren’t.
But five months later, when Pulse9 released Eternity’s next music video, “No Filter,” followers noted that the group was almost indistinguishable from the real thing. “I’m surprised how realistic they sound, and the visuals look more realistic than the debut song,” reads the most liked comment on YouTube.
“DTDTGMGN,” which came out last October, represented another huge leap in quality. “Until last year, the face was the only virtual part of the performer, but today we can make whole full body (images) based on AI technology,” Park said, adding that “No Filter” was also composed using AI (although humans then edited the song and wrote lyrics).
SM Entertainment, the agency behind K-pop megastars like Super Junior, Girls’ Generation and Exo, among others, has also been experimenting with AI. In 2020, the company debuted Aespa, a group comprising four human artists and their virtual counterparts. After the band’s launch, SM’s founder Lee Soo-man declared that the future of entertainment is “celebrity and robot.”
Elsewhere, BTS’ management company Hybe used AI to tweak one of its singers’ pronunciation and pitch before releasing one of his songs in six languages: Korean, English, Spanish, Chinese, Japanese and Vietnamese.
And earlier this year, Metaverse Entertainment, an agency formed by video game publisher Netmarble and SM Entertainment’s parent company Kakao Entertainment, debuted a virtual girl group, MAVE. Unlike Eternity, which relies on straightforward face swaps, MAVE’s four members were generated from scratch using CGI.
Metaverse Entertainment used AI technology to generate faces, merging the features and hairstyles deemed most desirable into four final characters. The company employed motion capture technology to record real-life K-pop performances that a deep learning program then used to animate music videos. The group’s vocals are part-human and part-AI-generated, said Metaverse Entertainment’s chief technical officer, Kang Sung-ku.
“Our goal is to create fully AI-driven virtual humans (that are) really believable,” he told CNN.
The music video for MAVE’s first single, “Pandora,” has already racked up 25 million views on YouTube since its release in January.
The future of K-pop?
Virtual popstars may enjoy an advantage over human ones when it comes to engaging with their fanbases. Advancements in AI language programs could make it possible for fans to “speak” with — and even build up their own personal relationship with — their idols, Kang said.
“They will remember what we talked about yesterday and can continue talking about things from (previous conversations),” he said, drawing a comparison between his firm’s AI creations and the iPhone’s virtual assistant, Siri. “They will remember you and know about you, and they will talk (with you) based on that information.”
Metaverse Entertainment is already experimenting with this idea via apps that allow users to directly interact with MAVE. The band’s members currently speak Korean, English, French and Indonesian — indicative of key fanbases and target markets — though they could theoretically speak any number of languages.
“There are some things I cannot tell (even my best friends), but maybe I could tell a personalized AI, which can comfort me and care about me,” Kang said. “I think it’ll be good for everybody.”
The development could certainly be good for companies like Kang’s. The K-pop industry is underpinned by a laborious trainee system, in which hopefuls are molded from a young age, often going through years of costly dance and music classes before debuting. (According to the Wall Street Journal, SM Entertainment spent approximately $3 million taking Girls’ Generation to market in 2007.) Agencies must invest in lessons, studio space, styling and sometimes even cosmetic procedures for potential band members, costs that virtual talents do not incur.
Furthermore, computer-generated pop stars will never fall ill, age or become embroiled in contract disputes or tabloid scandals. But there is little chance of major agencies replacing their real-life bands anytime soon, said Kang, acknowledging that some quarters of K-pop’s fandom have reacted badly to AI groups.
“Criticism was about replacing current idols with virtual ones — they thought current idols might lose their jobs,” Kang said, adding that some fans will always prefer human artists.
Eternity’s creator Park agreed, comparing the emergence of AI to the invention of photography in the 19th century: “Many painters were afraid of (the new) technology but today, we still have painters. I think BTS will never be afraid of virtual idols — they’re just new content.”