GPT, PaLM, Llama: what is an AI language model?

Artificial intelligence is undergoing an unprecedented wave of democratization thanks to the arrival of conversational agents like ChatGPT. They put one of the most spectacular aspects of deep learning and artificial intelligence within reach of the general public: a machine that appears to learn and converse in people's own language without losing the thread.

You have probably wondered how, in such a short time, we have gone from interacting with machines essentially through code to this simple natural language that invites you to imagine you are talking to another human. This is where the language model comes in. While it remains a computer model built on vectors and functions, it acts as a bridge between two entities that seemed separated by everything: the human and the machine.

Introduction to the language model

Language models are computer systems whose task is to translate natural language for the machine, allowing it to understand, analyze and respond to requests, to translate, to summarize, but also to simulate imagination and reflection and to take into account what previously could not be captured or theorized in terms of functions: cultural nuances, feelings and emotions.

Since 2017, the field has undergone a revolution with the advent of LLMs (Large Language Models) built on Google's Transformer architecture, raising text comprehension and natural-language response to unprecedented levels. The mathematical models produced by machines now come strikingly close to the impression of real human intelligence, devouring astronomical amounts of data and basing their responses on billions of parameters.

Words are no longer analyzed or generated one after another: the machine manages to grasp a statement as a whole in very little time and to offer analyses, summaries, translations, corrections and more, far faster than any human being.

Glossary

To define the language model and ultimately explain how a conversational agent works, we will need some specific expressions and vocabulary. Before we start, here is a short glossary.

Conversational agent: commonly known as a "chatbot", it is an application that receives text requests and returns responses. ChatGPT and Google Bard are conversational agents based on AI and a language model. Until now, conversational agents existed in more limited forms, such as digital assistants.

Sequential data: All sentences, paragraphs and documents are examples of sequential data. In natural language processing, the order of words in a sentence or other unit of text is critical to understanding the overall meaning.

Inputs and outputs: the inputs are generally the sequential data sent by the user. The outputs are the sequential data produced by the machine, taking into account the inputs and other parameters such as previous sequences, the language model's data obtained through training, and so on.

Parameters: to understand and analyze an input and suggest answers to it, language models rely on a whole series of parameters obtained by training an AI. These parameters, called weights, are adjusted to match the examples in the training data. The more parameters there are, the better the language model should be able to analyze inputs and produce more complex outputs.

NLP (Natural Language Processing): all the areas of competence of the natural language processing discipline, whether translation, summarization, generation or text classification. Tools like ChatGPT combine all of these.

As with image-generating AIs, language models translate text input into computer language before generating natural-language text again. © Nvidia

Defining a language model for AI

The language model behind a conversational agent like ChatGPT or Bard is a system that allows a machine to understand and generate text in natural language, the language of humans. To speak a language, understand context, perceive tone and every other subtlety including cultural nuance, learn patterns and suggest relevant answers, the language model needs to draw on a large amount of data and know how to process it and apply it correctly to user input.

Some more constrained language models rely on a purely statistical model, while others, such as LLMs, work with machine learning. The most advanced language models are able to analyze the relationships between all words and phrases as well as the context, and to remember previous sequences of text in order to account for temporal context. They rely on many more parameters and introduce other techniques such as "tokens" and "masks".

Language models are the heart of conversational agents: they come into play when the user's text input is converted into sequences of numbers, called vectors, which are analyzed by various types of encoders and decoders. The methods have evolved from the N-gram model through recurrent networks to LLMs. The model's role ends when the text output is generated by the machine and displayed on the user's screen.

The role of vectors

In all cases, these models are mathematical models that predict the likelihood that a word or phrase will appear next in a sentence. The models therefore rely on real computation – it is not magic. They are implemented as algorithmic models: the system converts inputs to numbers before converting them back to text for the outputs.
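To make that statistical idea concrete, here is a minimal sketch of an N-gram-style (bigram) model on a tiny made-up corpus; nothing here comes from GPT, PaLM or Llama, it only illustrates how the probability of the next word can be estimated from counts:

```python
from collections import Counter, defaultdict

# Toy corpus; a real model would be trained on billions of words.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (bigram counts).
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_word_probabilities(prev):
    """Estimate P(next word | previous word) from the counts."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_probabilities("the"))
# {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

Modern models replace these raw counts with learned functions over vectors, but the goal is the same: assign a probability to what comes next.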

In the meantime, they become sequences of numbers called vectors. In NLP in particular, these vectors allow each word to be positioned relative to the others and thus to measure how close they are. The number of values in a vector determines the number of dimensions of the model. These vectors are crucial for defining the meaning of each word, turning natural language into mathematics and, ultimately, allowing human understanding and language to be mimicked.
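As a purely illustrative sketch, with made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions learned during training), cosine similarity shows how vectors let the machine measure how "close" two words are:

```python
import numpy as np

# Hypothetical 3-dimensional word vectors, invented for this example.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    """Close to 1.0 means similar direction (similar meaning), lower means less related."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["king"], vectors["queen"]))  # high
print(cosine_similarity(vectors["king"], vectors["apple"]))  # much lower
```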

Over time, vectors have also been used to accommodate more subtle aspects of natural language, such as innuendo, emotion, and humor.

The role of artificial intelligence

The language model is central to the artificial intelligence field of natural language processing (NLP). So what role does artificial intelligence itself play? Building a high-quality language model requires taking a huge amount of data into account, if only to classify words relative to one another, their similarities, their differences, and so on. Neural networks make it possible to do this gigantic work, which no human could do, effortlessly and in record time.

The importance of AI in language models therefore lies first in training on very large amounts of text data: in general, LLMs are pre-trained on corpora of more than 10,000 billion words, drawn in particular from Common Crawl, The Pile, MassiveText, Wikipedia and GitHub. But artificial intelligence is also there to support the model's ability to provide contextual and intelligent answers, in particular through continuous learning.

Today, the capabilities of language models have advanced with the advent of machine learning and even deep learning. Language models reach 70 billion parameters with Meta's Llama 2 and 175 billion with OpenAI's GPT-3. The most important models known (and used) by the general public are Large Language Models (LLMs). However, to arrive at what we know today, these models drew on other, more limited models, each of which played its part in the design of LLMs.

The role of a recurrent network

Before arriving at the LLMs we know today, language models were initially based on the concept of recurrent networks. These models process text data numerically, analyzing the vector of each word together with a "thought vector". The thought vector follows the same principle and is updated after each new word added to the sentence. Like a human brain that, reading word by word, gradually builds up the meaning of a sentence, this vector lets the recurrent network maintain a good understanding and provide more relevant results in context.
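A minimal sketch of that idea, with random weight matrices standing in for a trained network: the "thought vector" (the hidden state) is updated after each word vector is read, so the final state summarizes the whole sentence.

```python
import numpy as np

hidden_size, embed_size = 4, 3
rng = np.random.default_rng(0)

# Random weights stand in for parameters that would normally be learned in training.
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1
W_xh = rng.normal(size=(hidden_size, embed_size)) * 0.1

def rnn_step(hidden, word_vector):
    """Update the 'thought vector' with one more word of the sentence."""
    return np.tanh(W_hh @ hidden + W_xh @ word_vector)

sentence = [rng.normal(size=embed_size) for _ in range(5)]  # 5 toy word vectors
hidden = np.zeros(hidden_size)
for word_vector in sentence:        # the network reads the sentence word by word
    hidden = rnn_step(hidden, word_vector)

print(hidden)  # the final state summarizes everything read so far
```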

The role of Transformers

Large Language Models (LLMs) did not come to market overnight. Between them and recurrent networks came several improved variants that tried to correct the models' shortcomings in understanding, particularly their memory limitations and the weighting of word importance. Notable among them are Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). But the real revolution in language models dates back to 2017, when Google researchers presented the Transformer architecture, which gave birth to the principle behind the most popular LLMs.

A simplified representation of how the Transformer architecture works, developed by Google in 2017 and used by LLM-type language models such as GPT-3 (OpenAI), BERT (Google), XLNet and RoBERTa. © Wikimedia

Transformers differ from recurrent networks in their approach: instead of analyzing each word one by one, an entire sentence or series of sentences is analyzed at once. Each vector associated with a word is then weighted according to the principle of tokens and masks. Transformers are an architecture that enables a new way of modeling contextual and textual data, enough to get rid of the memory problem, handle the position of words in the sentence and establish non-local relationships between words.
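A sketch of the core computation behind this whole-sentence approach, scaled dot-product self-attention, with toy dimensions and random matrices standing in for learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                    # 5 tokens, 8-dimensional vectors (toy sizes)
x = rng.normal(size=(seq_len, d_model))    # one vector per token of the sentence

# Random projections stand in for the learned query/key/value weight matrices.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Every token attends to every other token in the sentence at once.
scores = Q @ K.T / np.sqrt(d_model)
scores -= scores.max(axis=-1, keepdims=True)                    # numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V

print(weights.shape)  # (5, 5): how much each word weighs every other word
print(output.shape)   # (5, 8): a context-aware vector for each word
```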

"Masks" come in two types: causal masks, which control which vectors may influence which others depending on their place in the sentence, and padding masks, which do not affect understanding or answers but simply allow sentences of different lengths to be normalized into sequences of the same size (we are still doing math, everything has to be rectangular) by adding filler tokens that are useless and should be ignored by the machine.
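A sketch of these two kinds of masks applied to attention scores (shapes and values are invented for illustration): blocked positions get a very large negative score so that they end up with essentially zero weight after the softmax.

```python
import numpy as np

seq_len = 5
pad_positions = np.array([False, False, False, True, True])  # last 2 tokens are padding

# Causal mask: each position may only look at itself and earlier positions.
causal_mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

# Padding mask: no position should attend to the filler (padding) tokens.
padding_mask = np.tile(pad_positions, (seq_len, 1))

scores = np.random.default_rng(0).normal(size=(seq_len, seq_len))
scores = np.where(causal_mask | padding_mask, -1e9, scores)   # block forbidden positions
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

print(np.round(weights, 2))  # upper triangle and padding columns come out as 0
```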

Tokens, for their part, enrich the vectors used by recurrent networks by accounting for far more information and giving each word a shared dimension of understanding. For example, there are feature vectors (called "embeddings") that are fed into attention layers, which weight the importance of each word in a sentence and connect every word to every other without muddling the overall meaning. The various language models continue to optimize these tokens and their processing.
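A sketch of that "embedding" step, with a made-up vocabulary and sizes: token IDs are looked up in a table of feature vectors before the attention layers weight them.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3}   # toy vocabulary
d_model = 8

# The embedding table is a matrix of learned feature vectors, one row per token;
# here it is random, standing in for values learned during training.
embedding_table = rng.normal(size=(len(vocab), d_model))

# Turn a sentence into token IDs, then look up each ID's feature vector.
token_ids = [vocab[w] for w in "the cat sat".split()]
embeddings = embedding_table[token_ids]

print(token_ids)          # [1, 2, 3]
print(embeddings.shape)   # (3, 8): one vector per token, ready for the attention layers
```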

Google, which first introduced BERT, has since presented LaMDA and finally PaLM (for broad language understanding and the integration of multiple information sources). OpenAI also relies on a transformer model with GPT-3, GPT-3.5 and GPT-4; the first version of its language model dates back to 2018. Today, GPT-4 stands out for accepting a wider range of inputs (it is not limited to text and also accepts images), and its parameter count is said to be significantly larger than GPT-3's 175 billion weights.

The evolution of large language models (LLMs). © ScribbleData

The limits of the language model

Two things have to be distinguished here: the limits of individual language models and the limits of the language model approach in general. Challenging one type of language model is different from challenging the scalability of the language model as a whole.

Are there, however, approaches that would work completely differently when interacting with the machine through natural language? So far, all language models have their limitations, but every path to improvement still runs through the same overall principle of the language model and algorithmic enrichment…far from any comparison with the human soul and human consciousness.

The principle of the language model thus remains particularly dependent on high-quality training with access to substantial data, but not indefinitely (the necessary computing resources are lacking). At the same time, strictly speaking, language models know nothing. They merely draw analogies and remember nothing. Hence the prevalence of invented answers, often described as "hallucinations".

Ultimately, looking at language models to understand conversational agents is akin to opening the doors of a data center to understand the workings (and limitations) of the internet. The magic that ChatGPT offers today can be explained, and behind it lie carefully built systems conceived and implemented by people.

At Meta, the development of artificial intelligence is partly a French story. After several years of collaboration with Jérôme Pesenti, researcher and Turing Award winner Yann LeCun discussed last June a new architecture called JEPA (Joint Embedding Predictive Architecture), with the great ambition of "having machines that are at least as intelligent as humans, if not more," explained the head of AI research at the Facebook parent company. With JEPA, the architecture of the model would take new factors into account to "understand the underlying world".

"These days, machine learning is really bad compared to what humans can do. […] So we're missing something big," added Yann LeCun, who did not mince his words, also declaring that "today's AI and machine learning really sucks." Humans have common sense; machines do not. For him, it all comes down to the cognitive aspect, the actual functioning of the human brain, whereas language models deal too much with mere theories about language and word weighting.

The evolution of the number of parameters used by language models over time. © Hugging Face

Conclusion: when will the magic trick truly surpass us?

Thanks to language models, artificial intelligence has learned to speak. From the N-gram model to Large Language Models (LLMs), they are at the heart of conversational agents and are ultimately the real surprise internet users discovered with the release of ChatGPT late last year. Today, as Google, Meta, OpenAI and so many others refine their own technology, they all rely on the logic of language models to connect human and machine in a near-perfect illusion of dialogue between two individuals.

In discussions, however, we prefer to attribute the wow effect to artificial intelligence in the broadest sense, without citing or explaining the workings of the system: the transcription of natural language into sequences of numbers, vectors, "tokens", "masks", inputs and outputs, which have little to do with cognitive learning and which are perfectly well known to their creators. But enough criticism: for now, the language model is and remains the only thing that gives conversational agents and other NLP tools the means to match their ambitions.

Cognitive logic and quantum computing

Inference, analysis and reflection are thus mimicked, but the results conversational agents deliver, however impressive, leave the mystery of their magic trick wide open. The language model will therefore have to keep growing in size and usability. In the near future, hardware offers the greatest opportunity for improvement: the advent of quantum computing could enable language models and AI to go beyond current standards.

Not all models to date are equal, and some deliberately fall back on a more limited logic: focusing on a single goal and setting aside general-purpose artificial intelligence, the conversational agent that has an answer for everything. Many companies will also find it more worthwhile to specialize in one area, especially in research (such as medical and biological research), and the community that collectively exchanges ideas and advances open-source models already meets on a platform well known in the AI world: Hugging Face.

The limitations of the language model need to be addressed more broadly. Questioning its existence means questioning the very essence of how NLP works to this day. From there, new systems could emerge, part of a larger whole that will this time attempt to simulate and mimic specifically human cognitive abilities. From there, the magic trick gets bigger: the illusion will no longer be the rabbit in the hat, but the magician himself.

Will we forever settle for magic?