“Impossible” to build AI tools like ChatGPT without copyrighted material, says OpenAI


Pressure is growing on artificial intelligence companies regarding the content used to train their products

Developer OpenAI said it would be impossible to develop tools like its groundbreaking chatbot ChatGPT without access to copyrighted material, as pressure mounts on artificial intelligence companies over the content used to train their products.

Chatbots like ChatGPT and image generators like Stable Diffusion are “trained” on a vast pool of data from the Internet, much of which is protected by copyright – a legal protection against unauthorized use of other people’s work.

Last month, The New York Times sued OpenAI and Microsoft, which is a leading investor in OpenAI and uses its tools in its products, accusing them of “unlawfully using” its work to develop their products.

In a submission to the House of Lords Communications and Digital Select Committee, OpenAI said it could not train large language models such as its GPT-4 model – the technology behind ChatGPT – without access to copyrighted works.

“Because copyright law now covers virtually every form of human expression – including blog posts, photos, forum posts, pieces of software code and government documents – it would be impossible to train today’s leading AI models without using copyrighted material,” OpenAI said in its submission, first reported by the Telegraph.

George RR Martin and John Grisham are among a group of authors suing OpenAI

It added that limiting training materials to out-of-copyright books and drawings would lead to inadequate AI systems: “Limiting training data to public domain books and drawings created more than a century ago could lead to an interesting experiment, but the resulting AI systems would not be able to meet the needs of today’s citizens.”

In response to the NYT lawsuit last month, OpenAI said it “respects the rights of content creators and owners.” AI companies' defenses against the use of copyrighted material tend to rely on the legal doctrine of “fair use,” which allows the use of content in certain circumstances without seeking permission from the owner.

OpenAI said in its statement that it believes “copyright law does not legally prohibit training.”

The NYT lawsuit followed numerous other legal complaints against OpenAI. John Grisham, Jodi Picoult and George RR Martin were among 17 authors who sued OpenAI in September for “systematic theft on a large scale.”

Getty Images, which owns one of the largest photo libraries in the world, is suing Stability AI, the maker of Stable Diffusion, in the US and in England and Wales over alleged copyright infringement. In the US, a group of music publishers including Universal Music is suing Anthropic, the Amazon-backed company behind the Claude chatbot, accusing it of misusing “countless” copyrighted song lyrics to train its model.

Elsewhere in its submission to the House of Lords, OpenAI responded to a question about AI safety by saying it supported independent analysis of its security measures. The filing said it supports “red-teaming” of AI systems, in which outside researchers test the safety of a product by mimicking the behavior of malicious actors.

OpenAI is among companies that have agreed to work with governments on pre- and post-deployment security tests of their most powerful models, following an agreement reached at a global security summit in Britain last year.
