ChatGPT Violates European Data Protection Laws, Italian Data Protection Authority Tells OpenAI

Photo credit: Didem Mente/Anadolu Agency/Getty Images

After a multi-month investigation into its AI chatbot ChatGPT by Italy's data protection authority, OpenAI has been told it is suspected of violating European Union privacy law.

Details of the Italian authority's draft findings were not disclosed. But the Garante said today that OpenAI has been notified and has 30 days to respond with a defense against the allegations.

Confirmed violations of the EU-wide regulation can be punished with fines of up to 20 million euros or up to 4% of annual global turnover. More uncomfortably for an AI giant like OpenAI, data protection authorities (DPAs) can issue orders requiring changes to how data is processed in order to end confirmed violations. So OpenAI could be forced to change the way it operates, or to withdraw its service from EU member states where data protection authorities seek to impose changes it doesn't like.

OpenAI has been contacted for a response to the Garante's notification of violation. We will update this story if it sends a statement.

Update: OpenAI said:

We believe our practices are consistent with GDPR and other data protection laws, and we are taking additional measures to protect people's data and privacy. We want our AI to learn about the world, not about individuals. We are actively working to reduce personal data in training our systems such as ChatGPT, which also rejects requests for private or sensitive information about people. We plan to continue to work constructively with the Garante.

Legality of AI model training under scrutiny

The Italian authority raised concerns about OpenAI's compliance with the General Data Protection Regulation (GDPR) last year – when it ordered a temporary ban on local data processing by ChatGPT, which resulted in the AI chatbot being temporarily suspended from the market.

The Garante's March 30 provision to OpenAI, also known as a “register of measures”, highlighted both the lack of an appropriate legal basis for the collection and processing of personal data for the purpose of training the algorithms underlying ChatGPT, and the AI tool's tendency to “hallucinate” (i.e., its potential to produce inaccurate information about individuals), as among its issues of concern at the time. Child safety was also identified as an issue.

Overall, the authority suspected that ChatGPT violated Articles 5, 6, 8, 13 and 25 of the GDPR.

Despite the authority identifying this long list of suspected violations, OpenAI was able to restore ChatGPT service in Italy relatively quickly last year after taking steps to address some of the issues raised. However, the Italian authority said it would continue investigating the suspected violations. It has now reached the preliminary conclusion that the tool violates EU law.

While the Italian authority has not said which of the previously suspected ChatGPT violations it has confirmed at this time, the legal basis that OpenAI cites for processing personal data to train its AI models appears to be a particularly sensitive issue.

This is because ChatGPT was developed using bulk data from the public internet – information that includes individuals' personal data. And the problem that OpenAI faces in the European Union is that processing the data of EU citizens requires a valid legal basis.

The GDPR lists six possible legal bases – most of which are simply not relevant in its context. Last April, OpenAI was ordered by Garante to remove references to “contract performance” for training the ChatGPT model – leaving only two options: consent or legitimate interests.

Given that the AI giant has never attempted to obtain consent from the untold millions (or even billions) of web users whose information it has ingested and processed to build AI models, any attempt to claim it had Europeans' permission for the processing would seem doomed to failure. And when OpenAI revised its documentation after the Garante's intervention last year, it appeared to want to rely on a claim of legitimate interests. However, this legal basis still requires that a data processor give data subjects the opportunity to object – and have the processing of their data stopped.

How OpenAI might do this in the context of its AI chatbot is an open question. (Theoretically, it might require retiring and destroying illegally trained models and retraining new models without the objecting individual's data in the training pool – but, assuming it could even identify all the illegally processed data on an individual basis, it would have to do this for the data of every single person in the EU who objected and told it to stop… Which, um, sounds expensive.)

Beyond this thorny issue, the broader question is whether Garante will ultimately conclude that legitimate interests even constitute a valid legal basis in this context.

Frankly, that looks unlikely, because legitimate interests (LI) is not a free-for-all. It requires data processors to balance their own interests against the rights and freedoms of the individuals whose data is being processed, and to consider both whether individuals would have expected their data to be used in this way and the potential for it to cause them unwarranted harm. (If they would not have expected it and there is a risk of such harm, LI will not be found to be a valid legal basis.)

The processing must also be necessary, with no other, less intrusive way for the data processor to achieve its objective.

Notably, the EU's top court has previously found that legitimate interests is an inappropriate basis for Meta to track and profile individuals in order to run its behavioral advertising business on its social networks. So there is a big question mark over the idea of another type of AI giant seeking to justify large-scale processing of people's data to build a commercial generative AI business – especially when the tools in question pose all sorts of novel risks to named individuals (from disinformation and defamation to identity theft and fraud, to name a few).

A spokesperson for the Garante confirmed that the legal basis for processing personal data for model training remains in the mix of what ChatGPT is suspected of violating. However, they did not confirm at this point exactly which article (or articles) OpenAI is suspected of breaching.

Today's announcement from the authority is not the final word either, as it will also wait for OpenAI's response before making a final decision.

Here is the Garante's statement (which we translated from Italian using AI):

[Italian Data Protection Authority] has notified OpenAI, the company that operates the artificial intelligence platform ChatGPT, of its notice of objection for violating data protection regulations.
Following the provisional processing restriction issued on March 30 by the Garante against the company and following the result of the preliminary investigation carried out, the Authority concluded that the acquired elements could constitute one or more unlawful acts with regard to the provisions of the EU regulation.
OpenAI has 30 days to submit its defense statements regarding the alleged violations.
When defining the procedure, the Garante takes into account the ongoing work of the special working group set up by the body bringing together the EU data protection authorities (EDPB).

OpenAI is also facing scrutiny over ChatGPT's GDPR compliance in Poland, following a complaint last summer that focuses on an instance of the tool producing inaccurate information about an individual and on OpenAI's response to that complainant. This separate GDPR investigation is ongoing.

OpenAI, meanwhile, has responded to rising regulatory risk across the EU by seeking to establish a physical base in Ireland; and announced in January that this Irish company would be the future service provider for EU users' data.

With these moves, OpenAI hopes to gain so-called “main establishment” status in Ireland and switch to having its GDPR compliance assessed by the Irish Data Protection Commission via the regulation's one-stop-shop mechanism – rather than (as now) having its business potentially subject to oversight by data protection authorities anywhere in the Union where its tools have local users.

However, OpenAI has yet to receive this status, so ChatGPT may still face further investigation by data protection authorities in other parts of the EU. And even if it receives the status, the Italian investigations and enforcement actions will continue, since the data processing in question occurred before the change in its processing structure.

The bloc's data protection authorities have sought to coordinate their oversight of ChatGPT by setting up a taskforce through the European Data Protection Board to examine how the GDPR applies to the chatbot, as the Garante's statement notes. These (ongoing) efforts may ultimately lead to more harmonized outcomes across the individual ChatGPT GDPR investigations, such as those in Italy and Poland.

However, authorities remain independent and empowered to make decisions in their own markets. Therefore, there is no guarantee that any of the current ChatGPT investigations will reach the same conclusions.