On December 27, 2023, the New York Times sued OpenAI for copyright infringement. OpenAI is the company behind the popular application ChatGPT. The New York Times’s lawsuit is not unique, but it could be hugely consequential for nascent generative AI applications.
ChatGPT is built on a “large language model,” meaning that its responses to user queries come from “training.” In essence, the model consumes vast amounts of data to “learn” how to respond to user queries. In the New York Times case, the paper alleges that ChatGPT was trained on its copyrighted content and that ChatGPT’s responses are verbatim “regurgitations” of all or part of that content. OpenAI responds by pointing to ChatGPT’s transformative use. ChatGPT writes in the style of New York Times writers, but direct regurgitations occur in less than one percent of queries.
The federal courts hearing this lawsuit must rely on precedent built around older technology. One such bedrock case is Sony Corp. of America v. Universal City Studios, Inc., also known as the “Betamax case.” The Supreme Court in the Sony Corp. case ultimately held that if a piece of technology is capable of substantial noninfringing uses, rather than being built purely for infringing purposes, its maker is not liable for copyright infringement committed by its users. The outcome largely rested on the output of the Betamax.
A generation later, the Ninth Circuit Court of Appeals was asked to decide whether Napster, a file-sharing service, was liable for copyright infringement. The court found that Napster’s service chiefly facilitated infringement and that sharing files through it was not a transformative use. Napster quickly became defunct. The outcome largely rested on the input and internal workings of Napster.
A few years after the Napster case, authors and publishers sued Google over Google Books. In contrast to Napster, the courts deemed Google Books transformative and held that it amounted to fair use. Google Books allowed users to preview parts of books for free.
The overall question for the courts is whether ChatGPT is transformative enough to avoid infringing on various copyrights. The courts will have to decide whether to look at ChatGPT’s input (copyrighted material taken from various websites) or its output (“original” content). This case is worth watching because it could shape how information is used for years to come.