GPT behind ChatGPT
Despite a $10 billion investment from Microsoft and an estimated valuation of $29 billion in 2023, the OpenAI LP company has only recently come to the fore. A case in point is the launch of ChatGPT, which was met by an onslaught of users whose number reached 100 million in January 2023.
ChatGPT is a conversational tool, i.e. able to hold a human-like conversation, like the now-widespread chatbots. Its distinctiveness lies in the AI on which it is based: version 3.5 of the “Generative Pre-trained Transformer” (GPT), a generative AI capable of a certain form of creation and prediction. Version after version, GPT has grown to 175 billion parameters today, tuned on knowledge gleaned from online sources such as Common Crawl, Wikipedia, and various literary corpora. These sources form the dataset on which the system is trained. Its Transformer architecture then makes it possible to answer complex questions by processing many elements simultaneously. Finally, Reinforcement Learning from Human Feedback (RLHF) provides GPT with a supplemental training signal based on human ratings of ChatGPT's outputs.
In terms of usage, the general-public version of ChatGPT can make it easier for a lay audience to search for information online and have it rendered in accessible language. There is also a paid version for professionals, which can notably be trained on the customer's own data and models. As for GPT, many applications (the Canva platform, the legaltech Lexion, etc.) already integrate its AI and rely on it to develop other tools. GPT could thus become the basis for multiple activities (website creation, customer relations, semantic search, content creation, etc.) across all sectors and professions, especially those of economic intelligence (IE).
A relevant tool for finding information in open sources
ChatGPT could be particularly useful for open-source research. First, it has a major advantage: the tool masters almost a hundred natural languages, easing access to foreign-language information. Thanks to its semantic analysis capacity, ChatGPT also makes it possible to identify the keywords and resources related to a specific monitoring topic, a fundamental step that until now could not be automated. It could likewise automate the tagging of content surfaced by monitoring. It is also able to handle very large amounts of data: provided with qualified and up-to-date information, ChatGPT could draft daily monitoring reports or even conduct due diligence.
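The keyword-identification step described above can be illustrated with a minimal sketch, here using plain term-frequency scoring rather than a language model (the stopword list, function names, and sample documents are all hypothetical):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "on", "for", "is", "are"}

def extract_keywords(documents, top_n=5):
    """Rank candidate monitoring keywords by frequency across collected documents."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z]+", doc.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 3)
    return [word for word, _ in counts.most_common(top_n)]

docs = [
    "Semiconductor supply chains face renewed export controls",
    "Export controls on semiconductor equipment tighten",
]
print(extract_keywords(docs, top_n=3))  # → ['semiconductor', 'export', 'controls']
```

A semantic tool like ChatGPT goes further than this frequency count, grouping synonyms and suggesting related resources, but the goal is the same: seeding a monitoring plan with relevant terms.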
ChatGPT can also draw trends from this mass of data. Trained on sensitive data such as intelligence reports, GPT has been able to detect weak signals and disinformation. IARPA is likewise developing a similar AI, the REASON project, to provide analysts with new ways of thinking.
However, there are limitations to be aware of. The GPT database currently stops at content from 2021, which affects the freshness of its answers. This obstacle could be overcome by connecting the tool to the internet to provide real-time information. The relevance of the information provided also raises questions: ChatGPT sometimes gives erroneous information, which stems from how it actually works, since the tool only predicts the next word, thus prioritizing probability over the accuracy and quality of information.
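The "next word" mechanism mentioned above can be sketched with a toy model (the probability table below is invented for illustration): the model greedily emits the most probable continuation, whether or not the resulting statement is factually accurate.

```python
# Toy next-word predictor: context -> candidate continuations with probabilities.
# A real model learns billions of such conditional distributions.
NEXT_WORD_PROBS = {
    "the capital of": {"france": 0.6, "the": 0.3, "culture": 0.1},
    "of france is": {"paris": 0.7, "lyon": 0.2, "beautiful": 0.1},
}

def predict_next(context):
    candidates = NEXT_WORD_PROBS.get(context)
    if not candidates:
        return None
    # Greedy decoding: probability, not accuracy, drives the choice.
    return max(candidates, key=candidates.get)

print(predict_next("the capital of"))  # → 'france'
```

If the training data happened to make a false continuation the most probable one, the same mechanism would emit it just as confidently, which is the source of the erroneous answers noted above.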
Another problem lies in the provenance of information, because the databases used by OpenAI are relatively opaque. Moreover, ChatGPT does not cite the sources of the information it provides, which necessitates lengthy verification.
While capable of performing a range of information-search tasks, this tool would benefit from being coupled with other intelligence methods (HUMINT, SIGINT, etc.) to become a genuine analysis-support tool and, ultimately, a decision-support tool.
The limits of analysis
In terms of analysis, ChatGPT struggles to offer reliable analytical and forecasting capabilities, due to the aforementioned limitations: the lack of timeliness of its database and the lack of sourcing. However, it is a very effective synthesis tool when trained on appropriate data, able to draw up lists or populate management analysis grids. For example, it can provide a SWOT analysis, a PESTEL matrix, or even an assessment of Porter's five forces in a market.
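A SWOT request of the kind described can be sketched as a simple prompt template plus a parser that turns the labelled response back into a grid (the prompt wording, function names, and sample response are all illustrative assumptions, not OpenAI API code):

```python
def swot_prompt(company, market):
    """Assemble a SWOT request of the kind a chatbot could answer."""
    return (
        f"Acting as a strategy analyst, produce a SWOT analysis of {company} "
        f"in the {market} market. Return four sections labelled Strengths, "
        "Weaknesses, Opportunities, Threats, each with bullet points."
    )

def parse_swot(text):
    """Split a labelled response into a {section: [points]} grid."""
    sections = ("Strengths", "Weaknesses", "Opportunities", "Threats")
    grid, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.rstrip(":") in sections:
            current = line.rstrip(":")
            grid[current] = []
        elif line.startswith("-") and current:
            grid[current].append(line.lstrip("- ").strip())
    return grid

sample = "Strengths:\n- strong brand\nWeaknesses:\n- high debt"
print(parse_swot(sample))  # → {'Strengths': ['strong brand'], 'Weaknesses': ['high debt']}
```

The value for the analyst lies in this round trip: a structured prompt going in, a reusable analysis grid coming out.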
It therefore seems a valuable tool for strategic analysis: studying a market, providing analytical indicators, and enabling fast, synthetic visualization of sometimes complex environments. As for sector analysis, ChatGPT can also provide relevant recommendations in terms of influence, for example by mapping entities, stakeholders, and their positions.
Rather than being a standalone tool capable of generating analysis independently, ChatGPT acts as a facilitator of the strategic analysis that underlies every decision.
A tool to support influence operations
In terms of influence, ChatGPT has a number of advantages, the most notable being its ability to generate realistic and compelling textual content, as DALL-E 2 does for images. Indeed, this tool should facilitate certain tasks of influence communication: its ability to generate realistic but more or less truthful content (fake headlines, articles, or social-media posts, etc.) holds potential for deception and propaganda. The belief in a necessarily “objective” machine reinforces this potential.
ChatGPT thus threatens to democratize influence and industrialize the manipulation of information: an activist group or a public affairs firm could use it to inundate a legislature with protest letters purportedly from constituents. In lobbying, one study showed that the tool can predict the relevance of a legislative bill to a given company from its annual 10-K report filed with the Securities and Exchange Commission (SEC).
ChatGPT is capable not only of generating realistic content quickly and en masse, but also of adapting it to the recipient. It can adopt a given point of view, a given register of language, or even embody a role. This makes it possible to personalize content and thereby increase its persuasive power, for example by tailoring press releases to targeted journalists so that they cannot ignore them. ChatGPT also offers more discretion for online influence operations: by going beyond the repetition of identical content characteristic of bot campaigns, it allows the platforms' detection tools to be bypassed.
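The per-recipient personalization described above amounts to varying tone and hook for each target while keeping one core message. A minimal sketch, with entirely hypothetical profiles and wording:

```python
from string import Template

# Hypothetical per-recipient profiles used to tailor the same core message.
PROFILES = {
    "journalist": {"tone": "factual", "hook": "exclusive data"},
    "legislator": {"tone": "constituent-focused", "hook": "local jobs"},
}

PITCH = Template(
    "Dear $name, given your interest in $hook, "
    "here is a $tone summary of our announcement."
)

def personalize(name, role):
    """Render the shared pitch through the target's profile."""
    profile = PROFILES[role]
    return PITCH.substitute(name=name, tone=profile["tone"], hook=profile["hook"])

print(personalize("Alex", "journalist"))
```

A generative model replaces the fixed template with freely rephrased text, which is precisely what makes each copy unique and harder for platform detection tools to flag.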
However, these opportunities come with risks that are at least as great. Although ChatGPT has been trained not to respond to requests deemed inappropriate, the issues of missing context and sourcing remain. The potential for misinformation is undeniable: one study found that the tool relayed false narratives in 80% of cases, lending legitimacy to a range of questionable or conspiratorial discourses. ChatGPT also creates a self-reinforcing potential for noise: it produces corrupted content that its users are likely to disseminate online, which the tool may then ingest as legitimate content. Researchers warn of the manipulation potential of a tool so vulnerable to the poisoning of its training data, which a hostile foreign actor could exploit to manipulate public opinion and destabilize a country.
Protection of Information Assets
When it comes to protecting information, ChatGPT presents as many risks as it does disruptive features. In the cyber domain, thanks to the persuasiveness mentioned above, the tool lends itself to social engineering, for instance in composing spam or phishing emails. Its multilingualism also extends to programming languages, enabling it to generate code, including malicious programs such as malware or ransomware, despite their questionable quality.
When it comes to corporate data, ChatGPT poses several risks. First, the intellectual property of its model's training data: Copilot, the product born of the partnership between OpenAI and GitHub, Microsoft's subsidiary, is the subject of lively controversy. A legal complaint has been filed against the three companies for failing to credit the creators of the code on which Copilot was trained. There is also the question of data leaks and the resulting risk of industrial espionage: as the range of uses of ChatGPT expands, so does the risk of sensitive data leaking. This was the case for internal data passed to ChatGPT by Amazon employees via their prompts and subsequently found in the tool's responses. Finally, private models imported into the paid version of ChatGPT are supposedly protected from all access except the owner's. While OpenAI claims an adequate and transparent privacy policy, the question of its application to the ChatGPT tool remains open.
From the point of view of personal data, ChatGPT also poses several risks: with regard to the General Data Protection Regulation (GDPR), for example, the tool does not allow users to know what information is stored about them, nor to request its deletion, and therefore to assert their right to be forgotten.
There is also a confidentiality risk around the prompts users submit to ChatGPT, beyond the information users disclose themselves. For example, a law firm could train the tool on legal documents containing stakeholders' personal data, which could then leak. Finally, one has to wonder what might happen to this data should OpenAI be acquired by a third party.
Some recommendations
The AI revolution is brewing and ChatGPT is just the visible part of it. Despite the many decision support opportunities it offers, significant limitations and biases remain. This leads to several recommendations.
From the analyst's or watcher's perspective, the limitations of ChatGPT can be mitigated by cross-checking its results against alternative methods or multiple AI tools.
From the broader point of view of IE players, it seems sensible to monitor the upcoming launch of GPT-4, since its number of parameters could be multiplied by 500. It is also advisable to build suitable internal capabilities to master these tools for the benefit of IE activities, especially in terms of human resources in “prompt engineering”.
Finally, it is wise to remain vigilant about the legal grey areas surrounding ChatGPT, which existing regulations (DSA, AI Act, etc.) struggle to cover, and to stay aware of the reputational risk run by a user unaware of its limits.
Chloe Bureau and John Morel for the AEGE Data Intelligence Club