After telegraphing the move in media appearances, OpenAI launched a tool that attempts to distinguish between human-written and AI-generated text, such as the text produced by the company’s ChatGPT and GPT-3 models. The classifier isn’t particularly accurate: by OpenAI’s own account, it correctly identifies only about 26% of AI-written text as “likely AI-written.” But OpenAI argues that, used alongside other methods, it could help prevent AI text generators from being abused.
“The classifier is intended to help defuse false claims that AI-generated text was written by a human. However, it still has a number of limitations — so it should be used as a complement to other methods of determining text source, rather than being the primary decision-making tool,” an OpenAI spokesperson told TechCrunch via email. “We are providing this first classifier to get feedback on whether such tools are useful and hope to share improved methods in the future.”
As enthusiasm for generative AI – particularly text-generating AI – grows, critics have urged the developers of these tools to take steps to mitigate their potentially harmful effects. Some of the largest US school districts have banned ChatGPT from their networks and devices over concerns about the impact on student learning and the accuracy of content produced by the tool. And sites like Stack Overflow have banned users from sharing content generated by ChatGPT, saying the AI makes it too easy for users to flood discussion threads with dubious answers.
OpenAI’s classifier, aptly named the OpenAI AI Text Classifier, is architecturally intriguing. Like ChatGPT, it’s an AI language model trained on many, many examples of publicly available text from around the web. But unlike ChatGPT, it’s fine-tuned to predict how likely it is that a piece of text was generated by AI, whether by ChatGPT or by any other text-generating model.
More specifically, OpenAI trained the OpenAI AI Text Classifier on text from 34 text generation systems from five different organizations, including OpenAI itself. This text was paired with similar (but not identical) human-written text drawn from Wikipedia, from websites extracted from links shared on Reddit, and from a set of “human demonstrations” collected for a previous OpenAI text generation system. (However, OpenAI admits in a support document that it may have inadvertently labeled some AI-written text as human-written, “given the proliferation of AI-generated content on the internet.”)
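OpenAI hasn’t published the classifier’s training code, but the recipe it describes (pairing AI-generated passages with comparable human-written ones and fine-tuning a language model to tell them apart) is a standard sequence-classification setup. The sketch below is only an illustration of that general recipe, assuming the Hugging Face transformers library; the base model, the toy examples, and the training settings are placeholders, not anything OpenAI has disclosed.

```python
# Minimal sketch of fine-tuning an AI-text detector: label 0 = human-written,
# label 1 = AI-generated. Placeholder model and data, not OpenAI's.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "distilroberta-base"  # stand-in base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# A real dataset would pair many thousands of samples from dozens of
# generation systems with comparable human-written text.
texts = [
    "A paragraph written by a person about a topic of their choosing.",
    "A paragraph sampled from a large language model given a short prompt.",
]
labels = [0, 1]

class DetectorDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ai-text-detector", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=DetectorDataset(texts, labels),
)
trainer.train()
```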
Importantly, the classifier doesn’t work on just any text. It needs at least 1,000 characters, or roughly 150 to 250 words. It doesn’t detect plagiarism, either, a particularly unfortunate limitation considering that text-generating AI has been shown to regurgitate text it was trained on. And OpenAI says that, because of its English-language dataset, the classifier is more likely to get things wrong on text written by children or in a language other than English.
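For what it’s worth, the 1,000-character floor is easy to check before pasting anything into the tool. A trivial local pre-check might look like the following; the classifier itself is a web tool, so this is just a convenience, not part of OpenAI’s interface.

```python
MIN_CHARS = 1000  # minimum input the classifier will accept

def meets_minimum(text: str) -> bool:
    """Local pre-check: roughly 150 to 250 words usually clears 1,000 characters."""
    return len(text) >= MIN_CHARS

print(meets_minimum("Too short to classify."))  # False
```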
The detector hedges its answer a bit when evaluating whether a given piece of text was generated by AI. Depending on its confidence level, it labels text as “very unlikely” AI-generated (less than a 10% chance), “unlikely” AI-generated (between a 10% and 45% chance), “unclear if it is” AI-generated (a 45% to 90% chance), “possibly” AI-generated (a 90% to 98% chance), or “likely” AI-generated (a greater than 98% chance).
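Put another way, the tool buckets a single probability estimate into one of five labels. Here’s a minimal sketch of that mapping based on the ranges described above; the classifier only surfaces the label, so treating these numbers as hard cutoffs is an assumption.

```python
def confidence_label(p_ai: float) -> str:
    """Map an estimated probability that text is AI-generated to the
    five labels the classifier reports, per the ranges OpenAI describes."""
    if p_ai < 0.10:
        return "very unlikely AI-generated"
    if p_ai < 0.45:
        return "unlikely AI-generated"
    if p_ai < 0.90:
        return "unclear if it is AI-generated"
    if p_ai < 0.98:
        return "possibly AI-generated"
    return "likely AI-generated"

print(confidence_label(0.97))  # "possibly AI-generated"
```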
Out of curiosity, I fed some text through the classifier to see how it would handle it. While it confidently and correctly predicted that several paragraphs from a TechCrunch article about Meta’s Horizon Worlds and a snippet from an OpenAI support page were not generated by AI, it had a harder time with article-length text from ChatGPT, ultimately failing to classify it at all. It did, however, successfully detect the ChatGPT output in a Gizmodo article about (what else?) ChatGPT.
According to OpenAI, the classifier incorrectly flags human-written text as AI-written 9% of the time. That mistake didn’t come up in my tests, but I chalk that up to the small sample size.
On a practical level, I didn’t find the classifier particularly useful for evaluating shorter texts. In fact, 1,000 characters is a difficult threshold to reach for messages such as emails (at least the ones I receive regularly). And the limitations give pause: OpenAI itself emphasizes that the classifier can be evaded by modifying some words or clauses in the generated text.
This is not to say that the classifier is useless – far from it. But it certainly won’t stop dedicated scammers (or students for that matter) in its current state.
The question is, will other tools? Something of a cottage industry has sprung up to meet the demand for AI-generated text detectors. GPTZero, developed by a Princeton University student, uses criteria such as “perplexity” (how complex the text is) and “burstiness” (how much sentences vary) to determine whether text may have been written by AI. Plagiarism detector Turnitin is developing its own AI-generated text detector. And a Google search turns up at least half a dozen other apps claiming to be able to separate the AI-generated wheat from the human-generated chaff, to torture the metaphor.
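GPTZero hasn’t published its models, but the signals it names are easy to approximate: “perplexity” measures how predictable a passage is to a language model, and “burstiness” measures how much that predictability swings from sentence to sentence. The sketch below uses GPT-2 from the transformers library purely as a stand-in scorer; the naive sentence splitting and the use of standard deviation as a burstiness measure are my assumptions, not GPTZero’s actual method.

```python
# Rough approximations of "perplexity" and "burstiness" with GPT-2 as a
# stand-in scorer; GPTZero's real models and thresholds are not public.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Exponentiated mean cross-entropy of the text under GPT-2."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean per-token cross-entropy
    return math.exp(loss.item())

def burstiness(text: str) -> float:
    """Standard deviation of per-sentence perplexity; human prose tends to vary more."""
    sentences = [s.strip() for s in text.split(".") if len(s.strip()) > 10]
    scores = [perplexity(s) for s in sentences]
    if not scores:
        return 0.0
    mean = sum(scores) / len(scores)
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
```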
It will probably be a game of cat and mouse. As text-generating AI improves, so will the detectors, an endless back-and-forth akin to that between cybercriminals and security researchers. And as OpenAI writes, while classifiers like this can be helpful in certain circumstances, they will never be reliable sole evidence that text was generated by AI.
That’s all to say that there’s no magic bullet for the problems AI-generated text poses. Quite possibly, there never will be.