The New York Times sues OpenAI and Microsoft for copyright infringement

New York CNN –

The New York Times has sued OpenAI and Microsoft for copyright infringement, alleging that the companies' artificial intelligence technology illegally copied millions of Times articles to train ChatGPT and other services to give people instant access to information – a technology that now competes with the Times.

The complaint is the latest in a series of lawsuits that seek to limit the alleged scraping of large amounts of content from across the internet – without compensation – to train so-called large language models. Actors, writers, journalists and other creatives who publish their work online fear that AI will learn from their material and power competing chatbots and other sources of information without adequate compensation.

But the Times' lawsuit is the first among major news publishers to take on OpenAI and Microsoft, the best-known AI brands. Microsoft (MSFT) has a seat on OpenAI's board and a billion-dollar investment in the company.

In a complaint filed Wednesday, the Times said it had a duty to inform its subscribers, but Microsoft and OpenAI's “unlawful use of the Times' work to develop artificial intelligence products that compete with it jeopardizes the Times' ability to do so.” The paper noted that OpenAI and Microsoft used other sources in their “large-scale copying” but “placed a particular emphasis on the Times' content” to “exploit the Times' massive investment in its journalism, by using them to develop replacement products without permission or payment.”

“We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models,” OpenAI said in a statement from spokeswoman Lindsey Held. “Our ongoing discussions with The New York Times have been productive and progressing constructively, so we are surprised and disappointed by this development. We hope to find a mutually beneficial way to work together, as we do with many other publishers.”

Microsoft did not respond to a request for comment on the lawsuit.

The Times said in its complaint that it objected when it discovered months ago that its work had been used to train the companies' large language models. The Times said it began negotiating with OpenAI and Microsoft in April to obtain fair compensation and set the terms of an agreement.

However, the Times says it has been unable to reach a resolution with the companies. Microsoft and OpenAI claim that their use of the Times' works qualifies as “fair use,” which gives them the ability to use copyrighted material for a “transformative purpose,” the complaint says.

The Times strongly disputed this claim, saying that ChatGPT and Microsoft's Bing chatbot (also known as “Copilot”) could provide a service similar to that of the New York Times.

“There is nothing 'transformative' about using the Times' content without payment to create products that replace the Times and take away its audience,” the Times' complaint states. “Because the outputs of Defendants’ GenAI models compete with and closely mimic the inputs used for training, copying Times works for this purpose is not fair use.”

The Times is among a number of leading newsrooms, including CNN, that added code to their websites earlier this year that blocks OpenAI's web crawler, GPTBot, from crawling their platforms for content.

In a separate but related case, comedian Sarah Silverman and two authors sued Meta and OpenAI in July, alleging that the companies' AI language models were trained on copyrighted material from their books without their knowledge or consent. Neither company has commented on the lawsuit. A judge dismissed most of the lawsuit's claims in November.

And a group of famous fiction authors, along with the Authors Guild, filed a separate class action lawsuit against OpenAI in September, alleging the company's technology illegally uses their copyrighted works.

In its lawsuit, the Times claims that the datasets used to train OpenAI's latest large language models, which power its AI tools, “likely leveraged millions of works owned by The Times.” In a 2019 English-language snapshot of one of these datasets – called Common Crawl and known as the “copy of the Internet” – the New York Times website is the third most represented source of information, after Wikipedia and a database of U.S. patent documents, the complaint says.

The Times claims that because the AI tools have been trained on its content, they can “generate output that recites the Times' content verbatim, summarizes it accurately, and mimics its style of expression, as numerous examples show… These tools also wrongly attribute false information to the Times,” the complaint states.

In one instance cited in the complaint, ChatGPT provided a user with the first three paragraphs of the 2012 Pulitzer Prize-winning article “Snow Fall: The Avalanche at Tunnel Creek” after the user complained in the chat about hitting the Times' paywall and being unable to read the article.

The news outlet also claims that Microsoft's Bing search engine, which was upgraded with OpenAI technology earlier this year, “copies and categorizes” Times content to provide longer, more detailed answers than traditional search engines.

“By providing Times content without the Times’ permission or authorization, Defendants’ tools undermine and harm the Times’ relationship with its readers and deprive the Times of subscription, licensing, advertising and affiliate revenues,” the complaint says.

But fighting AI is like sticking a finger in a dike. The technology is coming, and publishers like The New York Times realize they need to prepare for the future. They just want to make sure it's a future in which they're fairly compensated, the Times said.

New York Times Executive Vice President and General Counsel Diane Brayton told employees in a memo Wednesday morning: “We recognize the potential of [generative AI] for the public and for journalism.”

“But at the same time, we believe that the success of GenAI and the companies that develop it does not have to come at the expense of journalistic institutions,” said the memo, obtained by CNN. “Use of our work to create GenAI tools must be done with authorization and an agreement that reflects the fair value of that work, as required by law.”

The Times' lawsuit did not specify the exact damages it seeks for the alleged infringement of its copyrighted materials, but it claims billions of dollars in damages. It also seeks a permanent injunction that would prevent Microsoft and OpenAI from continuing the alleged infringement. The Times also seeks the “destruction” of GPT and any other AI models or training datasets that incorporate its journalism.

The Times lawsuit could ultimately set a precedent for the entire industry, because whether using copyrighted material to train AI models violates the law is an unresolved legal question, according to Dina Blikshteyn, a partner in the artificial intelligence and deep learning practice group at the law firm Haynes Boone.

“I think there will be a lot of these types of suits that will pop up, and I think at some point [the issue will] have to go to the Supreme Court, and then we will have clear case law,” Blikshteyn said, adding that there is currently “nothing specific to large language models and AI just because they are so new.”

This story has been updated with additional developments and context.