In a new class action lawsuit, ChatGPT creator OpenAI is accused of criminally stealing data from across the internet and then using the stolen data to develop its popular automated products. The lawsuit, filed by the law firm Clarkson in a Northern California court this week, is just the latest in a series of legal challenges aimed at the core of the influential startup’s business model.
Since evolving from a humble research organization into a for-profit company in 2019, OpenAI has been on a meteoric rise to the forefront of the technology industry. When the company launched ChatGPT last November, it became a household name.
But while OpenAI tries to sustain its business and lay the groundwork for future expansion, the controversial nature of the technology it sells could sabotage those ambitions. Given how new and radical the AI industry is, it only makes sense that legal and regulatory challenges would emerge. And if legal challenges like the ones filed this week succeed, they could undermine the existence of OpenAI’s most popular products and, in turn, threaten the burgeoning AI industry that revolves around them.
The Clarkson lawsuit’s allegations, explained
The central allegation in the Clarkson lawsuit is that OpenAI’s entire business model is based on theft. The lawsuit specifically alleges that the company used “stolen private information, including personally identifiable information, from hundreds of millions of Internet users, including children of all ages, without their informed consent or knowledge” in the development of its products.
It is well known that OpenAI’s large language models – which power platforms such as ChatGPT and DALL-E – are trained on vast amounts of data. Much of that data, the startup has openly admitted, has been scraped from the open internet. By and large, most web scraping is legal, though there are ambiguities in that basic formula. While OpenAI claims everything it does is above board, it has also been repeatedly criticized for a lack of transparency regarding the sources of some of its data. According to this week’s lawsuit, the startup’s data-vacuuming practices are patently illegal; specifically, the lawsuit alleges that the company violated the terms of service of multiple platforms while also breaching various state and federal regulations, including privacy laws.
Despite established protocols for acquiring and using personal information, the defendants took a different approach: theft. They systematically scraped 300 billion words from the internet – “books, articles, websites and posts – including personally identifiable information obtained without consent.” OpenAI did so in secret and without registering as a data broker, as required by applicable law.
The lawsuit also highlights the fact that OpenAI, after freely using everyone’s web content, then used that data to develop commercial products, which it is now attempting to sell back to the public for exorbitant sums of money:
Without this unprecedented theft of private and proprietary information that belongs to real people and is shared with unique communities for specific purposes and audiences, the [OpenAI] Products wouldn’t be the multi-billion dollar business they are today.
Whether the US judicial system will ultimately agree with the lawsuit’s definition of theft remains to be determined. Gizmodo contacted OpenAI for comment on the new lawsuit but received no response.
OpenAI’s legal problems are piling up
The Clarkson lawsuit isn’t the only one OpenAI is currently dealing with. In fact, OpenAI is facing an ever-growing list of legal attacks, many of which make similar arguments.
Just this week, another lawsuit was filed in California on behalf of several authors who claim OpenAI devoured their copyrighted works to train its algorithms. The lawsuit again alleges that the company essentially steals data to further its business – and says it built its products on copyrighted works used “without consent, without attribution and without compensation.” Further, platforms like ChatGPT are characterized as “infringing derivative works” — essentially meaning they would not exist without the copyrighted material — “that were created without the plaintiffs’ permission and in violation of their exclusive rights under copyright law.”
At the same time, both the Clarkson lawsuit and the authors’ lawsuit bear certain similarities to another lawsuit filed shortly after ChatGPT’s release last November. That lawsuit, filed as a class action by the San Francisco-based Joseph Saveri Law Firm, accuses OpenAI and its financier and partner Microsoft of ripping off programmers to train GitHub Copilot — an AI-powered coding assistant. The lawsuit specifically accuses the companies of failing to abide by the open source licensing agreements that underpin much of the development world, alleging that they instead copied and used the code without attribution while also failing to comply with other legal requirements. In May, a federal judge in California denied OpenAI’s motion to dismiss the case, allowing the legal challenge to proceed.
In Europe, meanwhile, OpenAI has faced similar pressure from government regulators over its lack of privacy protections for user data.
All of this legal turmoil is taking place against the backdrop of OpenAI’s meteoric rise to Silicon Valley stardom — a precarious new position the company is clearly struggling to maintain. Even as it fends off legal attacks, OpenAI CEO Sam Altman is trying to influence how new laws are written around its paradigm-shifting technology. Indeed, Altman has courted governments around the world to lay the groundwork for a friendly regulatory environment. The company is clearly poised to become the de facto leader in the AI industry — provided it can weather the ongoing challenges to its existence.