Policy

Courts Split on Whether AI Training Violates Copyright Law

Federal judges have reached opposite conclusions on fair use, leaving creators who trained generative models without compensation in legal limbo.

Omega Editorial· July 2, 2026· 3 min read

Key takeaways

Federal judges have issued contradictory rulings on whether training AI models on copyrighted work constitutes fair use, creating legal uncertainty for the entire generative AI industry.
Researchers traced how copyrighted books and images entered AI training through datasets like Books3 and LAION-5B, often sourced from pirate sites holding millions of titles.
Music creators face projected losses of €10 billion over five years, while freelancers in AI-exposed jobs saw earnings drop 5% after ChatGPT's release.
OpenAI's promised Media Manager tool missed its 2025 deadline, and current opt-out mechanisms require manual submission of individual files with descriptions.
A $1.5 billion settlement in Bartz v. Anthropic paid rightsholders roughly $3,000 per pirated title, but the core fair use question remains unresolved in ongoing litigation.

Competing rulings create uncertainty for creative professionals

Federal courts have issued contradictory rulings on whether artificial intelligence companies can legally train their models on copyrighted creative work, setting up a legal conflict that will determine whether writers, artists, and other creators receive compensation for powering generative AI systems.

In Thomson Reuters v. Ross Intelligence, a judge ruled that training a legal research tool on more than 2,000 copyrighted case summaries constituted copyright infringement and rejected the defendant's fair use defense. But in Bartz v. Anthropic, a different judge reached the opposite conclusion, calling the training of Claude models on copyrighted books "exceedingly transformative" and protected under fair use doctrine.

The split comes as multiple consolidated lawsuits work through federal courts. The Authors Guild and 17 writers including John Grisham and George R.R. Martin are suing OpenAI and Microsoft over GPT models trained on their fiction. Visual artists Sarah Andersen, Kelly McKernan, and Karla Ortiz filed suit against Stability AI, Midjourney, DeviantArt, and Runway AI over image generators. The New York Times is pursuing claims against OpenAI and Microsoft for training on millions of articles without licensing agreements.

Why it matters

The legal uncertainty arrives as creative professionals face measurable economic harm. The International Confederation of Societies of Authors and Composers projects music creators will lose €10 billion cumulatively over five years, with 24% of revenue at risk by 2028. Film and television creators face a projected €12 billion loss with 21% of revenue exposure. Freelancers in AI-exposed jobs on Upwork saw a 2% drop in contracts and 5% drop in earnings after ChatGPT's release, with experienced workers charging higher rates hit hardest.

How copyrighted work entered training pipelines

Researchers have documented the supply chain that fed creative work into AI systems. The Pile, an 886-gigabyte text collection assembled by EleutherAI in 2020, became one of the two most widely used training sets for large language models. One component, Books3, contained roughly 191,000 copyrighted titles pulled from pirate site Bibliotik. LAION-5B, a dataset of 5.58 billion image-text pairs scraped from the open web, trained Stable Diffusion and Midjourney. The Authors Guild documented that Meta copied books from Library Genesis, an illegal repository holding more than 7.5 million titles.

A Danish anti-piracy group forced Books3 offline in July 2023, but copies had already spread across the internet.

Opt-out tools remain inadequate

OpenAI announced Media Manager in May 2024, promising creators control over whether their work enters training by 2025. The company missed that deadline, and the tool remains in development with no public release date. OpenAI's current opt-out for images requires submitting each file individually with written descriptions, an impractical system for creators with extensive portfolios.

Website owners can block future AI crawlers using robots.txt files, but the mechanism does nothing for data already collected. The European Union's copyright rules include a formal opt-out for text-and-data mining, but a European Parliament study found the mechanism introduces substantial complexity and will likely prove unworkable.

The Bartz v. Anthropic case produced a $1.5 billion settlement in August 2025 covering roughly 500,000 pirated works, with affected rightsholders receiving approximately $3,000 per title before legal fees. The U.S. Copyright Office issued a report in May 2025 concluding that using vast troves of copyrighted work to build competing products, especially when obtained illegally, "goes beyond established fair use boundaries," though the report carries no binding legal authority.

These details were first reported by AI Watch.

#copyright law#generative ai#fair use#ai training data#creative economy#legal disputes

This is an original analysis by the Omega editorial team. Source reporting: AI Watch.

Want systems like this working for your business?

Book a Call

Courts Split on Whether AI Training Violates Copyright Law

Competing rulings create uncertainty for creative professionals

Why it matters

How copyrighted work entered training pipelines

Opt-out tools remain inadequate

More in Policy

Manufacturing's 40-Year Collapse Previews AI's White-Collar Threat

OpenAI Proposes 5% US Stake Amid Public AI Backlash

U.S. Needs Better Data Infrastructure to Track AI Adoption