ChatGPT Can Easily Defeat Some Traditional Email Address Obfuscation Techniques, A Capability That Threat Actors Could Exploit

A developer has highlighted an unexpected capability of ChatGPT: OpenAI’s AI chatbot can easily bypass some email address obfuscation techniques. These techniques are widely used to hide email addresses on online platforms, where addresses could otherwise be harvested through web scraping and used to send unsolicited emails. The fact that ChatGPT, and probably its competitors as well, can now easily defeat some of them poses a new security risk for Internet users. However, exploiting this ability of AI models could be very costly.

There are several reasons why a person or company might hide their email address on certain platforms. For example, obfuscation techniques such as character substitution (e.g. replacing “@” with “at”) are used on social networks, online forums and similar sites to prevent automated web scraping tools from easily harvesting email addresses. People whose addresses are harvested may be targeted by large-scale phishing campaigns, which carry a significant risk of a data breach.
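To make the character substitution technique concrete, here is a minimal Python sketch. The function name `obfuscate` and the sample addresses are illustrative assumptions; real postings use many hand-written variations.

```python
def obfuscate(address: str) -> str:
    """Spell out "@" as " at " and each "." in the domain as " dot ".

    Hypothetical helper for illustration only; the local part is left as-is.
    """
    local, _, domain = address.partition("@")
    return f"{local} at {domain.replace('.', ' dot ')}"


print(obfuscate("jane@example.com"))  # jane at example dot com
```

A scraper that only searches for the literal “@” character would miss the result, which is the whole point of the technique.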

However, developer Arnaud Norman reported that ChatGPT effortlessly breaks through these barriers and recovers obscured email addresses with remarkable precision. Norman, who develops the AI tool BulkNinja, was working on a project that uses AI to organize the “Ask HN: Who’s Hiring?” discussion threads on the community platform Hacker News. In these threads, companies and startups publish job offers and, conversely, job seekers advertise themselves and offer their services. However, the inconsistent format makes it difficult to sort through the huge amount of information.

As part of his project, Norman turned to ChatGPT. Trying to compile this data into Google Sheets for easier access, he asked ChatGPT to include the contact information found in the job postings. The developer assumed that obfuscated contacts would be difficult to extract, but found that ChatGPT collected them without any problems, even when some characters in the email addresses were replaced with others. “I realized that if I used it, I could eliminate the need to obfuscate email addresses,” the developer notes in a blog post.

Intriguingly, Norman found that ChatGPT successfully deciphered email addresses even when several obfuscation methods were combined, although he ultimately chose not to use that data. He said: “Even when multiple obfuscation methods were used, the AI chatbot skillfully identified the intended email addresses and retrieved them with remarkable accuracy. Ultimately, I decided to exclude contact emails from the final Google Spreadsheet because people who hide their emails obviously don’t want them to be publicly available.”

This capability raises questions about how effective traditional obfuscation methods are against advanced AI systems like ChatGPT. In his blog post, Norman shared some interesting techniques he came across while reviewing the extracted data. In addition to the “character replacement method,” the developer found three other obfuscation techniques noteworthy:

Splitting the information within the message

According to Norman, this technique involves writing only part of the email address, as in “john@company name domain”, so that the full address is recognizable only when combined with the company name given in the message. Norman notes that this method was quite effective, but that ChatGPT would easily have defeated it had he added the prompt “Think one step at a time.”
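The following Python sketch illustrates why this split form resists simple pattern matching: the text contains no complete address, so the domain has to be supplied from context. The sample posting, the `resolve` helper and the domain `acme-robotics.example` are all hypothetical and do not come from Norman’s post.

```python
import re
from typing import Optional

# A conventional pattern for plain addresses such as "john@example.com".
PLAIN_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Hypothetical posting that uses the split form described above.
posting = "Acme Robotics is hiring. Contact john@company name domain"


def resolve(text: str, company_domain: str) -> Optional[str]:
    """Join the published local part with a domain known from context."""
    match = re.search(r"([A-Za-z0-9._%+-]+)@company name domain", text)
    return f"{match.group(1)}@{company_domain}" if match else None


print(PLAIN_EMAIL.findall(posting))               # []
print(resolve(posting, "acme-robotics.example"))  # john@acme-robotics.example
```

The pattern alone finds nothing. The address only appears once something, whether a human reader or a language model, connects the placeholder to the company named elsewhere in the message.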

Indirect publication of information

With this method, the author of the message does not publish their email address, but instead indicates where it can be found. The message may read: “For all inquiries, please use the email address on the employment information page.” Since Norman’s code did not include any navigation functionality, this method remained effective.

Another indirect publication method

This is a variant of the previous method. In this case, the author of the message writes “The email address is in my profile,” referring to their Hacker News profile. This method was also effective for the reasons mentioned above, all the more so because it would be expensive to use an AI to search the site for the profile and find the email address, Norman said.

He commented on this experience as follows: “In summary, traditional email obfuscation techniques such as character substitution are completely ineffective against advanced language models such as ChatGPT. The battle to protect email addresses from automated collection seems lost from the start, as these models are capable of deciphering various ‘obfuscation’ techniques.” While ChatGPT’s ability to decipher an obfuscated address is impressive, it is worth noting that simple scripts using a regular expression can achieve similar results.

However, the fundamental difference lies in the approach: a regular expression matches only the patterns its author anticipated, whereas ChatGPT relies on a language model to interpret the text. The impact of this decoding ability is significant. Organizations and individuals that rely on email communications now have reason to rethink the methods they use to protect their contact information. With the emergence of AI models like ChatGPT, researchers say it is important to remain vigilant and experiment with more robust measures to protect sensitive information.
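The regular-expression approach mentioned above can be sketched in Python as follows. This is an illustrative assumption of what such a script might look like, not code from Norman’s post; the patterns `AT` and `DOT` and the function `extract_emails` are hypothetical names, and only a few common spellings are handled.

```python
import re

# Separators written literally, in brackets ("[at]", "(dot)"), or as spaced words.
AT = r"(?:\s*@\s*|\s*[\[({]\s*at\s*[\])}]\s*|\s+at\s+)"
DOT = r"(?:\.|\s*[\[({]\s*dot\s*[\])}]\s*|\s+dot\s+)"

EMAIL = re.compile(
    rf"([A-Za-z0-9._%+-]+){AT}([A-Za-z0-9-]+(?:{DOT}[A-Za-z0-9-]+)+)",
    re.IGNORECASE,
)
DOT_RE = re.compile(DOT, re.IGNORECASE)


def extract_emails(text: str) -> list[str]:
    """Return addresses found in text, with spelled-out separators normalized."""
    return [
        f"{local}@{DOT_RE.sub('.', domain)}"
        for local, domain in EMAIL.findall(text)
    ]


sample = "Reach me: jane at example dot com or bob [at] corp (dot) io."
print(extract_emails(sample))  # ['jane@example.com', 'bob@corp.io']
```

A script like this handles only the spellings it was written for, and it can produce false positives on ordinary prose such as “look at example.com”. It also cannot recover an address that is split across the message or published elsewhere.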

“I used ChatGPT to decode proprietary binaries from industrial machines. It’s amazing how it can decipher things like this and find patterns. First it looked for ASCII characters, strings of bytes that serve as separators, then it started looking for which bytes could represent length, or which 4 bytes could be floating point numbers of coordinates, and which endianness made the most sense for contact data, etc. It’s actually completely crazy. I strongly believe that people need to start protecting their sensitive data like never before.”

In a world where “hidden” doesn’t always mean secure, ChatGPT’s deciphering capability paves the way for further investigation and development of advanced techniques to address potential email security vulnerabilities. But in the comments, some pointed out that while ChatGPT’s ability to decipher obfuscated addresses is striking, exploiting it would be very costly. According to one critic, the cost of extracting email addresses with ChatGPT exceeds the revenue generated by scraping them, so it won’t have much of an impact.

However, other commenters counter that operating costs can be kept low because there are open source models that can run on local machines.

Norman’s experience echoes a study published last month by researchers at ETH Zurich in Switzerland. The study shows that AI chatbots like ChatGPT can infer sensitive information about the people they chat with, even if the conversation is completely mundane. This information includes race, location, occupation and more, which represents a threat to user privacy.

The team says this worrying capability is “highly problematic” as fraudsters could exploit it to collect sensitive data from unsuspecting users. It could also be used for targeted advertising. For now, the researchers say they don’t know how to solve the problem.

Source: blog post

What about you?

What is your opinion on this topic?

What do you think of ChatGPT’s ability to decipher obfuscated email addresses?

What impact might this feature have on language models like ChatGPT?

Will exploiting this capability be too expensive, as some claim? Why?

What are the risks for internet users and companies? How can they mitigate these risks?

What do you think about the ability of AI models to extract sensitive information from innocuous conversations?
