Law and The Machine

"Quintessential Fair Use" or Wholesale Theft? Inside the AI Music Lawsuits

April 21, 2026 | 14:50 | Law and The Machine

This episode explores the heated legal disputes between AI music generators like Suno and Udio and the RIAA over alleged copyright infringement. It dissects the "fair use" defense, which frames AI training as learning, contrasting it with evidence suggesting AI models may be memorizing and replicating copyrighted material, including producers' watermarks. Listeners will learn about the significant financial stakes and the fundamental conflict between AI developers' data needs and creators' intellectual property rights.

Detailed Report

AI music generators, such as Suno and Udio, are currently embroiled in significant lawsuits initiated by the Recording Industry Association of America (RIAA). These lawsuits allege that the AI companies have engaged in mass infringement of copyrighted sound recordings by using them without permission to train their models. The AI developers' core defense hinges on the argument that their training process constitutes "quintessential fair use," likening their models to human students learning from vast amounts of information.

The "Fair Use" Defense Under Scrutiny

AI companies assert that the copying of copyrighted music occurs in a non-public, "back-end" process, where systems learn statistical patterns unseen by human eyes. Suno's CEO, Mikey Shulman, has framed this process as akin to a "kid writing their own rock songs after listening to the genre," attempting to anthropomorphize the AI and downplay the industrial scale of data ingestion. However, Shulman has also acknowledged that training on copyrighted music is "stock standard" practice across the AI industry.

Evidence of Direct Copying and Memorization

Forensic evidence increasingly challenges the narrative of invisible learning. Ed Newton-Rex, formerly of Stability AI and now leading the nonprofit Fairly Trained, has published detailed analyses demonstrating that these models are not just mimicking styles but are capable of "regurgitating" near-exact melodies, harmonies, and chord progressions from well-known songs. Examples include Suno-generated tracks bearing striking resemblances to works by artists like ABBA, Oasis, and Eminem, even when specific artist names are misspelled to bypass filters.
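
The misspelling workaround is easy to picture with a toy prompt filter. The Python sketch below assumes, purely for illustration, that a filter blocks prompts by exact word match against a list of artist names; how Suno's or Udio's filters actually work is not described in the source. The blocklist, function names, and the 0.8 similarity threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical blocklist; the names are the examples mentioned in this report.
BLOCKED_ARTISTS = ["abba", "oasis", "eminem"]


def exact_filter_blocks(prompt: str) -> bool:
    """Naive filter (assumed): block only if a listed name appears verbatim as a word."""
    words = prompt.lower().split()
    return any(name in words for name in BLOCKED_ARTISTS)


def fuzzy_filter_blocks(prompt: str, threshold: float = 0.8) -> bool:
    """Stricter filter: block if any word is a close misspelling of a listed name."""
    words = prompt.lower().split()
    return any(
        SequenceMatcher(None, word, name).ratio() >= threshold
        for word in words
        for name in BLOCKED_ARTISTS
    )


if __name__ == "__main__":
    for prompt in ["a song like abba", "pop song in the style of abbba"]:
        print(prompt, exact_filter_blocks(prompt), fuzzy_filter_blocks(prompt))
```

Under these assumptions, a prompt containing "abbba" passes the exact-match check but is caught by the fuzzy one. Either way, the filter only screens the prompt; it does not change what the model was trained on.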

The "Smoking Gun": Audio Watermarks

Perhaps the most compelling evidence against the "invisible back-end" defense is the documented presence of producer audio watermarks (e.g., "CashMoneyAP," "Jason Derulo") in AI-generated music outputs. These embedded audio clips are used by producers to protect their instrumental tracks. Their appearance in AI-generated content serves as direct proof that the models were trained on the original, watermarked audio files, indicating wholesale ingestion and reproduction rather than abstract pattern learning.

Redefining Copying: Memorization vs. Regurgitation

Academic research further reframes the legal discussion. A 2025 *Chicago-Kent Law Review* paper, "The Files are in the Computer: On Copyright, Memorization, and Generative AI," distinguishes between "memorization" and "regurgitation." "Memorization" refers to the process where a model encodes near-exact copies of its training data into its parameters, suggesting the model itself can be considered an infringing "copy." "Regurgitation" is the subsequent generation of a near-exact copy from that memorized data. This research argues that the act of copying happens during training, embedding potential infringement within the AI's architecture, thereby challenging the defense that infringement only occurs at the output stage.
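
One way to make "near-exact copy" concrete is to measure the longest stretch that an output shares with a training item. The Python sketch below is a toy, not the paper's method or a legal test: it assumes music can be reduced to a list of note names, and the melodies, function names, and 0.8 threshold are invented for illustration.

```python
from difflib import SequenceMatcher


def longest_shared_run(training_item, output):
    """Length of the longest contiguous run of tokens present in both sequences."""
    matcher = SequenceMatcher(None, training_item, output, autojunk=False)
    match = matcher.find_longest_match(0, len(training_item), 0, len(output))
    return match.size


def looks_regurgitated(training_item, output, min_fraction=0.8):
    """Flag an output whose longest shared run covers most of the training item.

    min_fraction is an arbitrary illustrative threshold, not a legal standard.
    """
    if not training_item:
        return False
    return longest_shared_run(training_item, output) / len(training_item) >= min_fraction


# Hypothetical eight-note melody standing in for a training recording.
TRAINING_MELODY = ["C4", "E4", "G4", "C5", "B4", "G4", "E4", "C4"]

# An output that reproduces seven of the eight notes in order.
NEAR_COPY = ["A3"] + TRAINING_MELODY[:7] + ["D4"]

# An output in a similar key and contour that shares only short fragments.
STYLE_ALIKE = ["C4", "D4", "E4", "G4", "A4", "G4", "E4", "D4"]

if __name__ == "__main__":
    print(looks_regurgitated(TRAINING_MELODY, NEAR_COPY))
    print(looks_regurgitated(TRAINING_MELODY, STYLE_ALIKE))
```

In this toy, the near copy is flagged and the style-alike is not. The check only observes outputs, so it speaks to regurgitation; whether the model itself holds a "copy" is the separate memorization question the paper raises.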

The "Build First, Clear Later" Business Model

The AI sector's prevailing ethos, often fueled by venture capital, has been to "build first, clear later." This approach prioritizes rapid development and scaling of technologies over securing upfront licensing deals for copyrighted material. An early investor in Suno reportedly stated they might not have funded the company if it had pursued licensing from the outset, believing it would stifle innovation. Despite ongoing legal battles, Suno is reportedly negotiating a new funding round that could value the company at over $2 billion, suggesting a financial bet that the cost of settlement will be less than the value generated by this approach. This business model has galvanized a broad coalition of creators under campaigns like "Stealing Isn't Innovation," advocating for responsible AI development through licensing and partnerships.

Government's Conflicting Role

A profound conflict emerges as the U.S. government plays a dual role. While federal courts weigh a lawsuit in which major music publishers accuse AI company Anthropic of massive copyright infringement, alleging the scraping and reproduction of copyrighted song lyrics, the Department of Defense and the Intelligence Community are rapidly adopting Anthropic's Claude model for mission-critical operations. This raises a critical question: if a federal judge finds Anthropic's foundational models liable for what the publishers call "systematic piracy," will national security interests provide a backdoor pardon for copyright infringement, or will the government delete its new intelligence tools?

Broader Implications for Intellectual Property

The legal battles in the music industry are not isolated; they serve as pivotal test cases for the future of all intellectual property and digital labor in the age of AI. A ruling in favor of AI companies could establish a sweeping legal precedent, potentially creating a loophole that allows any digital content—from software code and journalism to scientific research and visual art—to be freely ingested, copied, and commercialized by tech giants under the guise of an "invisible back-end process." This outcome would fundamentally devalue human creative labor, transferring immense value from individual creators to the companies owning the AI models. The core challenge lies in adapting analog-era legal doctrines, designed for human-scale actions, to the realities of automated systems capable of memorizing and reproducing vast amounts of human creative output.

Show Notes

Works Referenced

  • Recording Industry Association of America (RIAA): The trade organization that represents the U.S. recording industry, currently suing AI music generators like Suno and Udio for copyright infringement.
  • Suno: An AI music generation company currently facing a major copyright infringement lawsuit from the RIAA.
  • Udio: Another AI music generation company targeted by the RIAA in a significant copyright infringement lawsuit.
  • The Files are in the Computer: On Copyright, Memorization, and Generative AI: A 2025 *Chicago-Kent Law Review* paper by A. Feder Cooper and James Grimmelmann that distinguishes between AI "memorization" and "regurgitation" and argues that the AI model itself can be an infringing copy.
  • Fairly Trained: A non-profit organization founded by Ed Newton-Rex that advocates for ethical AI training practices and identifies AI models trained on licensed data.
  • Mishcon de Reya Generative AI IP Litigation Tracker: A law firm's resource tracking intellectual property lawsuits related to generative AI across various creative fields.
  • Anthropic: An AI company developing large language models like Claude, currently facing copyright lawsuits from music publishers while also being adopted by U.S. government entities.
  • Stealing Isn't Innovation: A campaign advocating for fair compensation and ethical data sourcing for creators whose work is used to train AI models.
  • Mikey Shulman: CEO of Suno, who has publicly commented on the company's training practices and legal challenges.
  • Ed Newton-Rex: Former VP of Audio at Stability AI and founder of Fairly Trained, known for his analyses debunking claims of AI "invisible learning."

Glossary

  • AI Models: Computer systems designed to learn from data and perform tasks, such as generating music, art, or text.
  • Fair Use: A legal principle in U.S. copyright law allowing limited use of copyrighted material without permission, often for purposes like criticism, comment, news reporting, teaching, scholarship, or research.
  • RIAA: Acronym for the Recording Industry Association of America, a trade group representing the U.S. recording industry.
  • Audio Watermark: A short, distinctive sound embedded by producers in music tracks, often to identify ownership or prevent unauthorized use.
  • Memorization (AI): The process where an AI model stores near-exact copies of its training data within its internal parameters.
  • Regurgitation (AI): When an AI model generates an output that is a near-exact copy of its memorized training data.
  • Large Language Model (LLM): An advanced AI program trained on vast amounts of text to understand, generate, and respond to human language.
  • Intellectual Property (IP): Legal rights that protect creations of the mind, such as artistic works, inventions, and designs.
  • Statutory Damages: Fixed monetary awards set by law for certain legal violations, like copyright infringement, without needing to prove actual financial loss.
  • Transformative Use: A type of fair use where a new work significantly changes the original material's purpose, character, or expression.
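
To see how the statutory damages figure cited in the episode (up to $150,000 per infringed work) reaches the billions, the arithmetic can be sketched in Python. The work counts used here are hypothetical; the source does not say how many recordings are at issue, and the function names are illustrative.

```python
# Ceiling cited in the episode, in USD per infringed work.
MAX_STATUTORY_PER_WORK = 150_000


def max_exposure(works_infringed: int) -> int:
    """Upper bound on statutory damages if every work drew the maximum award."""
    return works_infringed * MAX_STATUTORY_PER_WORK


def works_needed(target_usd: int) -> int:
    """Smallest number of works whose maximum awards reach target_usd."""
    # Integer ceiling division avoids floating-point rounding.
    return (target_usd + MAX_STATUTORY_PER_WORK - 1) // MAX_STATUTORY_PER_WORK


if __name__ == "__main__":
    # Hypothetical catalogue size, chosen only to show the scale.
    print(max_exposure(10_000))
    print(works_needed(1_000_000_000))
```

At the maximum award, a hypothetical 10,000 infringed works would mean $1.5 billion in exposure, and it takes only 6,667 works to cross the $1 billion mark. Actual awards, if any, would depend on the court and could be far lower per work.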

Full Transcript

Host: The pitch has been heard many times now: AI models are like human students, learning from the vast oceans of available information, absorbing patterns, and then creating something new and original. That's the narrative AI developers often use to explain how their systems create music, art, or text.
Expert: And it’s a narrative that underpins a very aggressive legal defense. Companies like Suno and Udio, which generate AI music, are facing massive lawsuits from the Recording Industry Association of America – the RIAA – for allegedly copying copyrighted songs without permission. Their argument? This training process is "quintessential fair use." They claim the copying happens in a non-public, "back-end" process, unseen by human eyes, just statistical patterns being learned.
Host: But what if that invisible, back-end process isn't just learning, but actually *memorizing*? And what if the output of these systems isn't just new music *inspired* by existing work, but sometimes, a near-exact replica, complete with a producer’s audio watermark embedded in the track?
Expert: That’s the crux of it. The forensic evidence increasingly suggests these models aren’t just mimicking styles. They are, in fact, swallowing and sometimes spitting out the originals, including those distinctive audio tags that producers use to protect their work. It's a "smoking gun" that completely undermines the "invisible learning" narrative.
Host: The legal landscape around AI and intellectual property is, to put it mildly, in turmoil. Looking at Mishcon de Reya's tracker of Generative AI IP cases, it's a flood. We're seeing lawsuits against nearly every major AI developer, across every creative field you can imagine – authors, visual artists, 3D modelers, software developers.
Expert: It is a fundamental conflict. On one side, you have AI developers who need truly vast amounts of data to train their models, an almost insatiable hunger for information. On the other side, you have creators who are fighting for their basic rights to control their work and be compensated for it. These RIAA lawsuits against Suno and Udio are at the epicenter of that battle.
Host: And the stakes are enormous. The RIAA, representing giants like Sony, UMG, and Warner, is alleging what they call "mass infringement of copyrighted sound recordings copied and exploited without permission." They’re seeking statutory damages that could reach up to $150,000 *per infringed work*. We’re talking potentially billions of dollars.
Expert: Exactly. And the defendants, Suno and Udio, are leaning heavily on this "fair use" doctrine. Their core argument is that the copying of copyrighted music is a necessary and transformative part of an "intermediate" or "back-end technological process." The claim is that because no human ever hears these intermediate copies during the training phase, no public-facing infringement has occurred. It’s a sophisticated legal maneuver, trying to cast the creation of the AI model itself as the primary purpose of the copying, fundamentally different from the original works it was trained on.
Host: So, it's not about the output, in their view, it's about the internal mechanics of the machine, which they argue should be protected as "transformative." It’s almost like saying a student copying notes from a textbook is fair use, even if they later publish those notes as their own original work.
Expert: Well, the analogy they prefer is more like a student listening to music to learn to write their own songs. Suno’s CEO, Mikey Shulman, has repeatedly tried to downplay the industrial scale of their data ingestion by framing it that way. He suggested on one podcast that their AI is just like a "kid writing their own rock songs after listening to the genre." It's a very intentional framing.
Host: The anthropomorphic machine. Giving it human qualities to justify the scraping.
Expert: Precisely. By portraying the AI as a "student" rather than, say, a sophisticated, automated copying machine, the industry hopes to sway both judicial and public opinion towards a more lenient interpretation of fair use. It reframes a process of brute-force data extraction as a creative act of learning and inspiration. But critics are quick to point out that this is a deliberate misrepresentation. One analysis called AI the "world's most efficient plagiarism machine."
Host: And Shulman himself has acknowledged the reality, stating that training on copyrighted music is "stock standard" practice that "every AI company does." So, the narrative might be one thing, but the internal understanding of the practice is quite another.
Expert: Indeed. And that brings the discussion to the actual technical reality, which really challenges this "invisible back-end" argument. Ed Newton-Rex, who used to be VP of Audio at Stability AI before he resigned over the company's use of copyrighted data, has been a leading voice in debunking this claim. Through his nonprofit, Fairly Trained, he's published detailed analyses of both Suno and Udio's platforms.
Host: What did he find? Is it really just "learning style," or is there more direct copying happening?
Expert: His research demonstrates that these models are not merely learning abstract concepts like "style" or "genre." Instead, they are quite capable of "regurgitating" near-exact melodies, harmonies, and chord progressions from well-known songs. He’s provided examples of Suno-generated tracks that bear striking resemblances to songs by artists like ABBA, Oasis, Ed Sheeran, and Eminem.
Host: So, even if the AI companies try to block prompts using specific artist names, you can still get something uncannily similar?
Expert: That's right. He found that simple misspellings can often bypass these filters, allowing users to generate music that is unmistakably similar to the original copyrighted work. It's not about abstract influence; it's about direct, recognizable imitation.
Host: And then there's the "smoking gun" mentioned earlier. The audio watermarks. How does that work?
Expert: This is perhaps the most damning evidence against the "invisible back-end" defense. There have been numerous documented instances of AI music outputs containing producer audio watermarks, like "CashMoneyAP" or even "Jason Derulo." These are short audio clips that producers embed in their instrumental tracks, often to prevent unauthorized use.
Host: And the AI is spitting them out?
Expert: Yes. Their presence in AI-generated music is direct proof that the models were trained on the original, watermarked audio files, and that the copying was not nearly as "invisible" or "transformative" as claimed. It shows that the model has ingested and can reproduce the audio file wholesale, not just an abstract pattern. It’s not just learning *about* a song; it's got the actual song, with its embedded tag, latent within its parameters.
Host: So, it's not just "learning statistics"; it's literally storing and reproducing specific, identifiable elements of copyrighted works. This seems to directly contradict the "fair use" argument.
Expert: It does. And there's a 2025 *Chicago-Kent Law Review* paper, "The Files are in the Computer: On Copyright, Memorization, and Generative AI," that provides the academic and technical framework for understanding this. It makes a crucial distinction between "memorization" and "regurgitation."
Host: Can you break that down?
Expert: "Memorization" occurs during the training phase. It's the process by which the model encodes near-exact copies of its training data into its parameters. The paper argues that when a model has memorized training data, the model itself can be considered a "copy" in the copyright sense.
Host: So, the *model* becomes the infringing copy, not just its output?
Expert: Exactly. And "regurgitation" is what happens at the output stage – the generation of a near-exact copy of that memorized training data, regardless of the user's intent. AI companies have tried to argue that infringement only occurs at the output stage, and that the model itself is a neutral tool. But this research challenges that position by asserting that the act of copying happens during training, when the copyrighted data becomes mathematically latent within the model. It argues, provocatively, that the "files are in the computer," not just in the outputs. The potential infringement is embedded in the very architecture of the AI.
Host: That's a significant reframing of where the legal line might be drawn. It shifts the focus from what the user does with the AI to how the AI itself was built.
Expert: It's a critical distinction, because it means the "fair use" defense, which often hinges on the *purpose* and *character* of the use, becomes much harder to sustain if the model itself is considered an infringing copy. It attacks the very foundation of the AI companies' argument.
Host: This "build first, clear later" strategy, as some call it, sounds like a very specific business model. It's not just a legal tactic.
Expert: It absolutely is. It's a core component of the venture capital-fueled business model that has dominated the AI sector. The prevailing ethos has been to "build first, clear later." An early investor in Suno admitted to *Rolling Stone* that they likely wouldn't have funded the company if it had pursued licensing deals from the outset, arguing that it would have stifled innovation.
Host: So, copyright law isn't seen as a property right to be respected, but more as a barrier to be circumvented in the race to develop and scale new technologies.
Expert: That's the implication. And it's been incredibly lucrative. Suno, despite these ongoing legal battles, is reportedly negotiating a new funding round that would value the company at over $2 billion. Even their CEO, Mikey Shulman, conceded that if the lawsuits are successful, "the company's not dead," though he admitted it wouldn't be "good for us." The financial bet is clearly that the legal system won't catch up fast enough, or that the cost of settlement will be less than the value created by ignoring IP in the first place.
Host: This "innovation at all costs" mentality has certainly created a lot of wealth for some, but it has also galvanized a broad coalition of creators in opposition.
Expert: Absolutely. The "Stealing Isn't Innovation" campaign has been launched. Its argument is straightforward: AI companies are using creators' work without authorization to build platforms that directly compete with them.
Host: And their central message is that a better way exists through licensing deals and partnerships, that responsible AI development *is* possible.
Expert: Exactly.
Host: Now, the discussion turns to the paradoxical role of the U.S. government as both a potential regulator of AI and a major customer. This involves situations where the government's procurement and national security interests can inadvertently undermine its role in upholding the rule of law, including intellectual property rights.
Expert: The focus this time is on the ongoing lawsuit by major music publishers against the AI company Anthropic. The suit alleges that Anthropic engaged in massive copyright infringement by scraping copyrighted song lyrics to train its large language model, Claude. The publishers claim Claude not only copies lyrics during training but also illegally reproduces them in its outputs.
Host: So, while the U.S. federal court system is actively adjudicating whether Anthropic's foundational models are built on a bedrock of mass copyright infringement, another branch of the U.S. government is moving in the exact opposite direction.
Expert: That's the profound conflict. The Department of Defense and the Intelligence Community are rapidly adopting and integrating Anthropic's Claude model for mission-critical operations. This means the government is functionally subsidizing and legitimizing a company that may soon be found liable for what the publishers call "systematic piracy."
Host: It raises a very pointed and unresolved question for consideration: If a federal judge rules that Anthropic’s foundational models are built on wholesale theft, will the Pentagon delete its new favorite intelligence tool, or does "national security" provide a backdoor pardon for copyright infringement?
Host: Zooming out from these specific cases, these legal battles in the music industry aren't just about music, are they?
Expert: Not at all. These are test cases for the future of all intellectual property and digital labor in the age of AI. If the "quintessential fair use" defense put forward by Suno and other AI companies were to succeed, it would establish a sweeping legal precedent with profound implications far beyond music.
Host: So, a ruling in favor of the AI companies could essentially create a massive legal loophole.
Expert: That's precisely right. It would effectively mean that any digital content – whether it's software code, investigative journalism, scientific research, medical analysis, or visual art – could be freely ingested, copied, and commercialized by tech giants, as long as the copying happens within this so-called "invisible back-end process."
Host: And that would fundamentally devalue the labor of human creators across the board.
Expert: It would. It transfers that value from individual human labor to the companies that own the AI models, effectively granting them a free pass to exploit existing copyrighted works for massive profit. The legal framework, as it stands, was designed for a different era.
Host: So, the core thesis here is that copyright law, and indeed much of the legal framework, was designed with human actors and human-scale actions in mind. It wasn't written for machines capable of memorizing and indexing the entire history of human creative output and regurgitating it on command.
Expert: That is the central challenge. The current wave of litigation forces a confrontation between analog-era legal doctrines and the realities of digital, automated systems. The outcome of these cases will determine whether the law adapts to protect human creativity and labor, or whether it will be reinterpreted to sanction a new form of industrial-scale, automated appropriation.
Host: Ultimately, it comes down to a fundamental question for society: who gets to own the informational and cultural commons in an age where machines can learn everything?