This is not an article about Daft Punk remixing or mashing up Jay-Z classics. It’s a photo illustration about machine learning models being applied to famous people’s voices. (But, hey, we’re ready for that Daft Punk + Jay-Z collab over here.) (credit: Getty Images / Sam Machkovech)
In late April, audio clips surfaced that appeared to capture Jay-Z rapping several unexpected texts. Did you ever imagine you’d hear Jay-Z do Shakespeare’s “To Be, Or Not to Be” soliloquy from Hamlet? How about Billy Joel’s “We Didn’t Start the Fire,” or a decade-old 4chan meme? All of these unlikely recitations were, of course, fake: “entirely computer-generated using a text-to-speech model trained on the speech patterns of Jay-Z,” according to a YouTube description. More specifically, they were deepfakes.
“Deepfakes” are super-realistic videos, photos, or audio falsified through sophisticated artificial intelligence. The better-known deepfakes are probably videos, which can be as silly as Green Day frontman Billie Joe Armstrong’s face superimposed on Will Ferrell’s, or as disturbing as non-consensual porn and political disinformation. But audio deepfakes, which are AI-generated imitations of human voices, are possible, too.
Two days after the Jay-Z videos were posted to YouTube, they were removed due to a copyright claim. But just as quickly, they returned. The takedowns may have been a first attempt to challenge audio deepfake makers, but musicians and fans could potentially be grappling with the weird consequences of machine-generated voice manipulations long into the future.
Here’s a breakdown of Jay-Z’s copyright dispute, the laws around audio deepfakes, and what all this could mean in the years to come.