OpenAI Seeks Dismissal of New York Times Copyright Infringement Lawsuit

OpenAI has filed a motion to dismiss parts of a December 2023 lawsuit brought by the New York Times. The suit claims OpenAI infringed The Times’ copyrights by using millions of its articles without permission and by generating outputs that let users access that content without a subscription.

Image: Man Using Laptop with ChatGPT by Matheus Bertelli on Pexels

OpenAI has filed a motion to dismiss a December 2023 lawsuit filed by the New York Times against OpenAI and Microsoft. The lawsuit alleges that OpenAI unlawfully used The Times’ copyrighted works to develop artificial intelligence products that compete directly with the newspaper’s ability to provide exclusive content to its subscribers. The complaint claims that “the defendants seek to free ride on The Times’ massive investment in its journalism by using it to build substitutive products,” and alleges that OpenAI’s large language models (LLMs) were trained on millions of copyrighted articles, investigations, reviews, and opinion pieces from The Times without permission or payment.

On February 27, 2024, OpenAI filed a motion in the U.S. District Court for the Southern District of New York, seeking to dismiss key claims made by The Times. OpenAI argues that the newspaper paid individuals to hack its chatbot, generating manipulated evidence to support the lawsuit. According to the motion, “It took tens of thousands of attempts to generate the highly anomalous results” presented in The Times’ complaint, suggesting that those results were artificially created. OpenAI also pointed to its earlier negotiations with The Times over a “high-value partnership around real-time display with attribution” in its AI products. The company disputes the claim that ChatGPT poses a threat to independent journalism, arguing that its AI tools summarize existing information rather than create original content and do not function as a substitute for a news subscription.

OpenAI acknowledges that its LLMs occasionally suffer from a bug called “regurgitation,” in which content is repeated verbatim from the sources the model was trained on, but it describes this as rare and something it is actively working to fix. In its motion, OpenAI insists that its tools provide contextual summaries rather than acting as a replacement for original reporting, and it contends that the lawsuit mischaracterizes both the capabilities and the intent of its AI models.

OpenAI contends that The Times manipulated its chatbot to reproduce articles in a way engineered to support the lawsuit’s claims. It argues that this tactic undermines the integrity of the lawsuit and paints a misleading picture of the technology’s real-world performance. The motion to dismiss emphasizes that The Times’ methods were flawed, relying on contrived prompting to produce results that would not occur in normal use.

To support its position, OpenAI’s legal team draws on a quote from Justice Brandeis: “The general rule of law is that the noblest of human productions—knowledge, truth ascertained, conceptions, and ideas—become, after voluntary communication to others, free as the air to common use.” Brandeis’s words underscore OpenAI’s broader argument that once information has been shared publicly, it should not be monopolized by a single entity, especially when AI technologies are involved in disseminating that information.

OpenAI is seeking to dismiss four specific claims made by The Times, one of which involves the assertion that ChatGPT can freely reproduce The Times’ articles. OpenAI contends that this allegation rests on manipulative tactics and that The Times obtained its evidence improperly, which undermines its case.

At the heart of this legal conflict is a question that will only grow more pressing as AI advances: when does information cross the line from public knowledge to something protected by intellectual property law? Both parties accuse each other of trying to monopolize information that should be freely available in the public domain. The Library Copyright Alliance (LCA) has argued that training LLMs is fair use based on precedent, citing Authors Guild v. Google, which held that the mass digitization of large volumes of copyrighted books to distill and reveal new information about them was fair use.

This case is about more than just a dispute between two major organizations—it has far-reaching implications for the future of information sharing in the digital age. As AI technologies become more widespread, the questions of copyright infringement, the use of synthetic data, and the ethical responsibilities around AI-generated content will continue to grow in importance. The outcome of this case could set significant precedents for how AI and journalism interact in the future.

Defining the boundaries of intellectual property ownership concerning AI outputs is becoming essential as the technology rapidly evolves. Will public knowledge remain freely accessible, or will companies fight to control how that information is used? The outcome of this litigation could shape the landscape for AI technologies and how they are integrated into fields like journalism, information sharing, and content creation.
