Things get even worse for generative AI: OpenAI's DALL-E and Midjourney generated proprietary images and trademarks without users asking for them – Developpez.com

In its fight against OpenAI and Microsoft, the New York Times has cited several examples of ChatGPT reciting excerpts from its journalists' articles almost verbatim. Internet users have discovered that the plagiarism goes far beyond text and also affects images. They demonstrated this with images generated by DALL-E, OpenAI's tool for generating images from simple text prompts.

The emergence of widely available image-generation models like Midjourney and Stable Diffusion has sparked a fierce online battle between artists who view AI-generated works as a form of theft and those who enthusiastically embrace these new creative tools. Established artistic communities are at a crossroads: they fear that human-made works will be drowned out by an unlimited supply of AI-generated ones, even as these tools have become very popular among some of their members.

Regarding the ban on AI-generated art on its art portal, Newgrounds wrote: “We want to keep the focus on human-made art and not flood the art portal with computer-generated art.” Fur Affinity expressed concerns about the ethics of how image-generation models learn from existing works of art, and wrote, “Our goal is to support artists and their content. We do not believe it is in the best interest of our community to allow AI-generated content on the site.” These are just the latest steps in a rapidly evolving debate about how artistic communities (and art professionals) can adapt to software that can potentially produce an unlimited number of works at a pace no human could match unaided.

Among these tools is DALL-E 3, OpenAI's AI system that can generate images from a few words or edit and refine existing images in the same way. For example, the prompt “Fox in a tree” would produce a photo of a fox sitting in a tree, and the prompt “Astronaut with a bagel in his hand” would produce... well, you see where this is going. The software is not limited to a single style: you can also request different artistic techniques, such as pencil drawing, oil painting, modeling clay, wool knitting, cave-wall drawing, or even a 1960s movie poster.

When AI reproduces copyrighted works

However:

  • Generative AI systems such as DALL-E and ChatGPT were trained on proprietary material.
  • OpenAI, despite its name, has not been transparent about the data its systems were trained on.
  • Generative AI systems are fully capable of producing material that infringes copyright. This point is also at the heart of the legal dispute between the New York Times, OpenAI and Microsoft. The complaint cites several examples in which ChatGPT recited excerpts from New York Times journalists' articles almost verbatim. OpenAI was not involved in creating this content, but with minimal prompting its tool will recite much of it verbatim, the complaint says.
    [Image: on the left, part of the response generated by ChatGPT; on the right, the New York Times article. The matching text is in red.]
  • They don't notify users when they do this.
  • They provide no information about the provenance of the images they produce.
  • Users may not know, when creating a particular image, whether they are infringing someone else's rights.

OpenAI DALL-E

Some began to notice that DALL-E was reproducing copyrighted works. For example, one Internet user explains: “It should be clear by now that even very vague requests systematically lead to copyright and/or trademark violations. How can the user be held responsible if the genAI model commits violations without being asked?”

Or A16Z's Justine Moore, who quips: “We're definitely winning the copyright battle, guys. These Italian brothers look nothing like Mario and Luigi.”

Midjourney

But DALL-E is not the only system that produces this kind of output. Reid Southen, a film concept artist and illustrator, said he found compelling evidence of blatant copyright infringement by Midjourney.

“In case you're curious, I have a lot more copyright violations to report from Midjourney, including this example from Dune, where the same image is reproduced over and over again. This is not an isolated case. I believe it is actually quite common, and I would like to demonstrate it.”

For an AI expert, none of this is easy to solve…

Gary Marcus wears many hats and presents himself as a leading expert in AI. He has testified before the US Senate AI Oversight Subcommittee, was founder and CEO of Geometric Intelligence (which was acquired by Uber), and is a TED speaker.

Given the situation, he explained:

I don't think any of this can be solved easily.

Systems like DALL-E and ChatGPT are essentially black boxes. GenAI systems do not attribute their output to source documents, because this is not possible, at least in their current form. (Some companies are researching how to do something like this, but I don't know of any convincing solutions yet.)

Unless someone succeeds in inventing a new architecture that can reliably trace the origin of generated text and/or images, infringement, often not at the user's request, will continue.

A good system should provide the user with a manifest of sources; current systems do not do this.

In all likelihood, the New York Times lawsuit is just the first in a long series. Today, in a multiple-choice poll on X, I asked people whether they thought the case would be settled (most did) and what the likely value of such a settlement might be. Most responses were for $100 million or more, with 20% expecting a $1 billion settlement. If you multiply such numbers by the number of film studios, video game companies, other newspapers, etc., you quickly arrive at astronomical sums.

And OpenAI faces other risks.

He added that Microsoft was also responsible.

…but an engineer says the NYT example doesn't even constitute copyright infringement

None of these examples constitutes infringement. Even a model that reproduces the exact work, token by token or pixel by pixel, does not constitute infringement. The pages of red text in the lawsuit are also unconvincing. First, it's possible that the verbatim text in the chat app actually comes from RAG and has nothing to do with the model itself. That would be funny… The New York Times won't like this surprise. Even if this isn't the case and the model recites the text/pixels verbatim… so what? The New York Times doesn't have a good argument here.

This is a misunderstanding of both fair use and the technology. Documents created and protected by copyright do not apply. It is also not illegal to scrape and resell content. Just ask hiQ, which the Ninth Circuit protected from LinkedIn: hiQ literally scraped and resold unsecured data, mostly from LinkedIn's commercial website.

In Authors Guild v. Google, even Google's literal digitization of books to create a searchable database was considered transformative and therefore fair use. OpenAI's use of NYT content is similar: it transforms the content for AI training by splitting it into tokens and then converting those into embeddings. Again, people misunderstand this. They believe that words are used to train the model. That's not the case. Numbers go in. To get numbers, you have to turn words into tokens and then tokens into numbers.
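The "words to tokens to numbers" step described here can be illustrated with a minimal sketch. It is a toy under stated assumptions, not OpenAI's actual pipeline: the whitespace tokenizer, the helper names (`tokenize`, `build_vocab`, `make_embeddings`) and the random 4-dimensional vectors are all hypothetical, whereas production systems use subword tokenizers and learned embeddings.

```python
import random

def tokenize(text):
    """Toy tokenizer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()

def build_vocab(tokens):
    """Give each distinct token an integer ID, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def make_embeddings(vocab_size, dim, seed=0):
    """Random stand-in for a learned embedding table: one vector per token ID."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(vocab_size)]

text = "the fox sat in the tree"
tokens = tokenize(text)                     # words -> tokens
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]            # tokens -> integer IDs
table = make_embeddings(len(vocab), dim=4)
vectors = [table[i] for i in ids]           # IDs -> embedding vectors

print(tokens)  # ['the', 'fox', 'sat', 'in', 'the', 'tree']
print(ids)     # [0, 1, 2, 3, 0, 4]
```

Both occurrences of "the" map to the same ID and therefore to the same vector. The model operates on these numbers rather than on the raw text.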

In Fox News Network, LLC v. TVEyes, Inc., a service that records all content broadcast by news organizations for indexing and clipping purposes was found to be fair use because of its transformative purpose.

Sony Corp. v. Universal City Studios, the Betamax case, sided with technological innovation, upholding new uses of technology such as making complete copies of television shows.

The NYT also has a huge hill to climb to prove that its alleged losses are due to AI and not its own failing business model (challenging to say the least).

Sources: Gary Marcus, Justine Moore

And you?

Are you surprised that generative AI reproduces protected works even when they are not mentioned in the prompt?

Do you think this constitutes copyright infringement? To what extent?

What do you think of Gary Marcus's argument that the problem will always exist unless the architecture is changed to allow source identification?

What do you think of Zack's argument that the case raised by the New York Times is not covered by copyright law?