
Copyright office criticizes AI ‘fair use’ before director’s dismissal 

The register of copyrights cast serious doubt on whether AI companies could legally train their models on copyrighted material. The White House fired her the next day. 
A man holds a flag that reads "Shame" outside the Library of Congress on May 12, 2025 in Washington, D.C. On May 8, President Donald Trump fired Carla Hayden, the head of the Library of Congress, and dismissed Shira Perlmutter, the head of the U.S. Copyright Office, just days later. (Photo by Kayla Bartkowski/Getty Images)

President Donald Trump’s firing of Shira Perlmutter, director of the U.S. Copyright Office, over the weekend has drawn strong criticism from Democrats and tech experts who believe her dismissal is related to a report on generative AI and copyright law that her office released a day earlier.

That report, overseen by Perlmutter, questioned whether AI companies can legally train their models on massive amounts of copyrighted data without compensating creators. It also disputed the common claim made by AI companies that this practice should fall under the “fair use” exemption of copyright law.

While the U.S. Copyright Office noted that data collection might sometimes be legal, it sees important differences between academic or nonprofit use and what commercial AI companies are doing.

“When a model is deployed for purposes such as analysis or research — the types of uses that are critical to international competitiveness — the outputs are unlikely to substitute for expressive works used in training,” the report stated. “But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries.”


The report identified two aspects of AI training that may conflict with existing copyright law: early training phases and memorization.

Large language models are trained in iterative steps, ingesting massive amounts of data while the training process adjusts the model’s weights to make its answers more responsive and useful.

This phase, called “pre-training,” is what AI companies argue should be permitted. But the Copyright Office noted that such pre-training “often requires orders of magnitude more data and computing power than other training” and is the stage “responsible for many of the sophisticated capabilities of generative AI models.”
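For readers unfamiliar with what adjusting a model’s weights involves, the toy Python sketch below is a hypothetical, heavily simplified illustration — not the pre-training pipeline the report describes. It shows the basic loop: the system ingests examples, measures how far off its predictions are, and nudges a weight to reduce that error, repeating over many iterations.

    # Toy illustration of iterative training: the "model" is a single weight,
    # and each pass over the data nudges it to reduce prediction error.
    # Real LLM pre-training repeats this across billions of examples and parameters.

    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # stand-in for a training corpus
    weight = 0.0           # the model's single adjustable parameter
    learning_rate = 0.05   # how large each adjustment step is

    for step in range(100):                     # many iterative passes over the data
        for x, target in data:                  # ingest one example at a time
            prediction = weight * x
            error = prediction - target         # how far off the model is
            gradient = 2 * error * x            # direction to nudge the weight
            weight -= learning_rate * gradient  # the "tweak" to the weight

    print(f"learned weight: {weight:.3f}")      # approaches 2.0, the underlying pattern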

“The steps required to produce a training dataset containing copyrighted works clearly implicate the right of reproduction,” the report concluded, while also adding that creating and deploying an AI system with copyrighted material “involves multiple acts that, absent a license or other defense, may infringe one or more rights” of copyright holders.

Businesses like OpenAI, Anthropic and Meta are the subject of numerous lawsuits from news organizations, entertainers, artists and creators accusing them of mass intellectual property theft by collecting data from the internet, social media, data brokers and other sources. At the same time, OpenAI has objected to Chinese companies like High-Flyer training their own AI models on data from ChatGPT and other U.S.-made AI tools.


The report also took issue with claims from AI companies that they cannot be infringing on copyright because, as lawyers for Google claimed in court, there “is no copy of the training data — whether text, images, or other formats — present in the model itself.” OpenAI has called the critique that their models contain copyrighted work — as opposed to being generally informed by such work in their outputs — “a common and unfortunate misperception of the technology.”

But again, the Copyright Office disputed those arguments, saying researchers and media outlets have identified numerous instances where an AI model generates answers that closely mirror or repeat verbatim protected work.

Those conclusions drew sharp reactions from some industry groups that are pushing the Trump administration for looser restrictions on the data that AI companies can legally collect and use.

“Copyright’s constitutional purpose is to promote progress, yet by this report’s logic legacy content industries could thwart development of generative AI and its revolutionary capabilities,” Adam Eisgrau, senior director of AI, creativity & copyright policy for the Chamber of Progress, said in a statement. “Thankfully, fair use is independently decided by judges on a case-by-case basis and not the Copyright Office.”

Rep. Joe Morelle, D-N.Y., ranking member on the House Administration Committee, called Perlmutter’s firing “a brazen, unprecedented power grab with no legal basis” and suggested that the move was at the behest of billionaire Elon Musk and the administration’s AI industry allies.


“It is surely no coincidence he acted less than a day after she refused to rubber-stamp Elon Musk’s efforts to mine troves of copyrighted works to train AI models,” Morelle said.

A request for comment sent to the White House press office seeking more information on the rationale for Perlmutter’s firing was not returned by the time of publication.

Written by Derek B. Johnson

Derek B. Johnson is a reporter at CyberScoop, where his beat includes cybersecurity, elections and the federal government. Prior to that, he has provided award-winning coverage of cybersecurity news across the public and private sectors for various publications since 2017. Derek has a bachelor’s degree in print journalism from Hofstra University in New York and a master’s degree in public policy from George Mason University in Virginia.
