Behind the scenes of ChatGPT – La Tribune

This content is produced by Laval University.

ChatGPT was launched in late November 2022 and quickly surprised the world with its amazing performance. The text generation application was able to deceive many readers, even the most attentive ones, because they were unable to distinguish texts created by artificial intelligence (AI) from those written by a human. But how could what many thought was impossible yesterday become reality so quickly?

"The explanation for this rapid rise of artificial intelligence and ChatGPT can be pictured as a triangle whose three vertices are equally important. First, the computing power of computers has increased dramatically. Second, the amount of quality data for training neural networks has exploded. Third, there have been several innovations in the architecture of neural networks," explains Professor Nicolas Doyon.

Nicolas Doyon, professor in the Department of Mathematics and Statistics and researcher at the CERVO Research Center, explained the rise of AI and ChatGPT in his talk.

Invited by the Continuing Education Department of the Faculty of Science and Engineering to give a conference for the general public on the topic, the professor from the Department of Mathematics and Statistics and researcher at the CERVO Research Center retraced some milestones in the history of AI and popularized the scientific and mathematical principles on which the success of the famous computer application rests.

A champion chess machine

One of the greatest achievements in artificial intelligence dates back to 1997, when the computer Deep Blue defeated world chess champion Garry Kasparov. Deep Blue was programmed to build a tree of possibilities, assign a value to the final positions at the ends of the tree's branches, and then determine the best possible move.
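That recipe is essentially the classic minimax algorithm. Here is a minimal Python sketch of the idea, using a deliberately trivial stand-in game; the helper functions are placeholders for illustration, not Deep Blue's actual chess routines:

```python
def minimax(position, depth, maximizing):
    """Explore the tree of possibilities `depth` moves deep, score the
    leaf positions, and propagate the best value back up the tree."""
    moves = legal_moves(position)
    if depth == 0 or not moves:
        return evaluate(position)  # value assigned to a final position
    values = (minimax(apply_move(position, m), depth - 1, not maximizing)
              for m in moves)
    return max(values) if maximizing else min(values)

# Deliberately trivial stand-in "game": a position is just a number,
# a move adds or subtracts 1, and a position's value is the number itself.
def legal_moves(position):
    return [+1, -1]

def apply_move(position, move):
    return position + move

def evaluate(position):
    return position

print(minimax(0, 4, maximizing=True))  # 0: the two players' gains cancel out
```

The cost of this search grows with the number of legal moves per position, which is why a wider board defeats it, as the next paragraph explains.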

However, this approach, which worked well for chess, was less suited to the game of Go, whose board is a 19 x 19 grid, offering far more possible moves than chess's 8 x 8 board. The tree of possibilities became too large even for a computer. "That's why," says Nicolas Doyon, "the researchers then said to themselves: 'This doesn't correspond at all to the way we think. How could we take inspiration from the way the human brain and its neurons work to improve artificial intelligence?'"

Mimic neurons

By studying how human neurons work, researchers found that they do not respond to every message they receive. A message must reach a minimum threshold for the neuron to emit a so-called action potential, which always has the same strength and shape regardless of the intensity of the original message. This action potential is passed on to the next neuron via a synapse. It is an all-or-nothing law.

However, synapses do not merely transfer information from one neuron to another; their plasticity plays a central role in learning. Researchers have found that the connection strength of synapses changes over time. "Simply put, the more frequently a synapse is used, that is, the more often it transmits an action potential to the next neuron, the stronger it becomes. Under the microscope, we can clearly see that the dendritic spine, an area of the neuron, grows larger when a person learns. In short, by getting bigger and stronger, the synapse gradually changes the way we think," explains the professor.
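One classic way to write this rule down, not the professor's own formulation, is Hebbian learning: a synaptic weight grows each time the presynaptic and postsynaptic neurons are active together. A toy sketch, with an invented learning rate and firing history:

```python
# Toy Hebbian update: the synapse strengthens whenever the presynaptic
# neuron's action potential coincides with postsynaptic activity.
weight = 0.5          # initial connection strength (arbitrary)
learning_rate = 0.1   # illustrative value, not a biological constant

# Invented firing history: (presynaptic fired, postsynaptic fired)
for pre, post in [(1, 1), (1, 1), (0, 1), (1, 0), (1, 1)]:
    weight += learning_rate * pre * post  # "fire together, wire together"
    print(f"pre={pre} post={post} -> weight={weight:.2f}")
```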

How can these biological facts be represented mathematically? "One way to translate the all-or-nothing law into mathematics," answers Nicolas Doyon, "is to use the Heaviside function." Most functions in mathematics pass continuously from 0 to 1. "The Heaviside function, on the other hand," he explains, "has the value 0 until its input reaches a certain threshold. Then it jumps abruptly to 1."

All or nothing can be represented mathematically by the Heaviside function.
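Written out, with a threshold symbol added here for clarity (the article itself gives no notation), the function is:

```latex
H(x) =
\begin{cases}
  0 & \text{if } x < \theta \\
  1 & \text{if } x \geq \theta
\end{cases}
```

where theta is the neuron's firing threshold.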

"To illustrate the role of the synapses," he adds, "we assign weights to the different inputs of the neuron." As the graph shows, once the numerical values of the inputs are determined, each value is multiplied by the weight of its synapse, the results of these multiplications are added to obtain a weighted sum, and finally that sum is compared with the required threshold, yielding 0 or 1.
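Putting the pieces together, here is a minimal sketch of that computation; the input values, weights, and threshold are invented for illustration:

```python
def heaviside(x, threshold):
    """All-or-nothing law: output 0 below the threshold, 1 at or above it."""
    return 1 if x >= threshold else 0

def artificial_neuron(inputs, weights, threshold):
    """Multiply each input by its synaptic weight, sum, then threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return heaviside(weighted_sum, threshold)

# Invented values: two input signals and two synaptic weights.
weights = [0.9, 0.5]
print(artificial_neuron([0.8, 0.3], weights, threshold=1.0))  # 0.87 -> 0
print(artificial_neuron([1.0, 0.6], weights, threshold=1.0))  # 1.20 -> 1
```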


Train the network

In recent years, artificial intelligence has made major breakthroughs thanks to the development of deep learning. “We now work with neural networks with several layers: an input layer, intermediate layers and an output layer. Between a neuron in one layer and a neuron in another layer there is a connection strength, also called synaptic weight, and as the network learns, each of these weights is adjusted,” notes Nicolas Doyon.
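A minimal sketch of such a layered network follows, assuming an invented architecture (two inputs, one intermediate layer of three neurons, one output) and a smooth sigmoid activation in place of the hard threshold, as modern deep networks typically use:

```python
import math
import random

def forward(inputs, layers):
    """Push the inputs through successive layers of (weights, biases)."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            # Sigmoid: a smooth stand-in for the all-or-nothing threshold.
            1 / (1 + math.exp(-(sum(a * w for a, w in zip(activations, row)) + b)))
            for row, b in zip(weights, biases)
        ]
    return activations

random.seed(0)
sizes = [2, 3, 1]  # input layer, one intermediate layer, output layer
layers = [
    ([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
     [0.0] * n_out)
    for n_in, n_out in zip(sizes, sizes[1:])
]
print(forward([0.5, -0.2], layers))  # untrained output, e.g. [0.46...]
```

Each number in the nested `layers` lists is a synaptic weight; training consists of adjusting all of them.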

And how does the network learn? Through training, the researcher explains. Consider a neural network tasked with determining whether a photo shows a cat or a dog. We assign the value 0 to cats and the value 1 to dogs. To train the network, we use thousands or even millions of images of these animals and track the percentage of correctly classified images. When the network gives a wrong answer, it failed to produce the correct output value because the synaptic weights were not well tuned. We therefore keep adjusting these weights until we achieve a very high success rate.
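As a minimal sketch of that bookkeeping, with invented labels and network outputs (0 for cat, 1 for dog, following the article's convention):

```python
# Toy success-rate check for a cat/dog classifier.
labels      = [0, 1, 1, 0, 1, 0, 0, 1]   # invented ground truth
predictions = [0, 1, 0, 0, 1, 1, 0, 1]   # invented network outputs

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
print(f"{accuracy:.0%} of images classified correctly")  # 75%
```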

But how are the weights adjusted? "One of the tools we use is the gradient descent method. To illustrate it, we can imagine a person trying to reach the base of a mountain as quickly as possible. This is easy to picture when there are only two inputs. On the x-axis, we plot the weight by which the first input is multiplied; on the y-axis, the weight by which the second input is multiplied; and on the z-axis, the error. It then becomes possible to visualize the point where the error is smallest and to adjust the weights so that they move in that direction," explains Professor Doyon, adding in the same breath that while the principle always stays the same, it is much harder to visualize in practice when the number of parameters to adjust runs into the millions or even billions.

We adjust the synaptic weights using the gradient descent method.
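A minimal sketch of gradient descent on an invented two-weight error surface; the bowl-shaped function and step size are illustrative choices, not anything from the talk:

```python
# Invented bowl-shaped error surface: E(w1, w2) = (w1 - 3)^2 + (w2 + 1)^2.
# Its lowest point, the "base of the mountain", is at (3, -1).

def gradient(w1, w2):
    """Partial derivatives of E: the slope of the surface at (w1, w2)."""
    return 2 * (w1 - 3), 2 * (w2 + 1)

w1, w2 = 0.0, 0.0        # arbitrary starting weights
learning_rate = 0.1      # size of each downhill step (illustrative)

for _ in range(50):
    g1, g2 = gradient(w1, w2)
    w1 -= learning_rate * g1   # step against the gradient: downhill
    w2 -= learning_rate * g2

print(round(w1, 3), round(w2, 3))  # close to (3, -1), the bottom of the bowl
```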

Math and reading at the heart of ChatGPT

The exact numbers have not been disclosed publicly, but ChatGPT is estimated to rely on a network of some 60 to 80 billion neurons, 96 layers and 175 billion weights. For comparison, the human brain contains around 85 billion neurons. "The comparison remains imperfect," concedes Nicolas Doyon, "because our neurons are not quite like artificial neurons, but we are roughly in the same order of magnitude."

When the computer application is asked to define itself, it responds: "ChatGPT uses a deep neural network structure. It is important to note that ChatGPT does not possess deep understanding or self-awareness. The answers are based solely on the statistical probabilities of the words or phrases." To generate text, ChatGPT computes the probability of each possible continuation of a word sequence and then proposes the most likely one.
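In miniature, with a made-up four-word vocabulary and invented scores (a real model scores tens of thousands of tokens using its billions of trained weights), that computation looks like this:

```python
import math

# Invented scores ("logits") a model might assign to candidate next words
# after the sequence "the cat sat on the".
logits = {"mat": 4.1, "roof": 2.3, "dog": 0.7, "equation": -1.5}

# Softmax turns the raw scores into probabilities that sum to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {word: math.exp(v) / total for word, v in logits.items()}

for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.3f}")

print("suggested next word:", max(probs, key=probs.get))  # "mat"
```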

To achieve this, ChatGPT had to be trained on billions of data points. The exact content of this reading material is, of course, confidential, but the network is believed to have been trained on over 300 billion words. "If you read 300 words per page and one page per minute, 24 hours a day, you would have to read for 1,900 years to absorb that much information," explains the mathematician, to give a sense of the scale of the library on which ChatGPT's learning rests.

“If you read 300 words per page and one page per minute 24 hours a day, you would have to read for 1,900 years to absorb that much information.”

– Nicolas Doyon, on the estimated 300 billion words that make up ChatGPT's training database
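The figure can be checked with simple arithmetic, taking the 300-billion-word estimate at face value:

```python
words = 300e9                      # assumed corpus: 300 billion words
words_per_page = 300
pages_per_minute = 1
minutes_per_year = 60 * 24 * 365   # reading around the clock

years = words / words_per_page / pages_per_minute / minutes_per_year
print(round(years))                # 1903, roughly the 1,900 years quoted
```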

Between amazement and fear

ChatGPT's sometimes breathtaking performance captures the imagination of some, who picture the future as a science-fiction movie in which artificial intelligences dominate the world. But that is not the scenario that worries the scientists who would like to see stronger regulation of AI development. Their aim, rather, is to prevent certain abuses associated with human use of the technology, and to ensure we take the time to better understand and analyze its negative impacts.

“What could possibly go wrong? Apparently students can use ChatGPT to cheat. Plus, people can lose their jobs. Recently, striking writers in Hollywood called for limiting the use of AI in screenwriting,” recalls Nicolas Doyon.

Other problems, the professor notes, are less obvious and more insidious. "For example," he says, "facial-recognition AI recognizes white men more easily than women or members of visible minorities. This is a little surprising, since we imagine artificial intelligence as neutral: it cannot be sexist or racist. But because the AI was likely trained on a database containing more male and white faces, it inherited our biases."

Another example the professor gives comes from DeepL, a translation application based on the same principles as ChatGPT. "If you ask DeepL to translate 'she reads' into Hungarian," he says, "you get 'ő olvassa.' If you then ask it to translate those same Hungarian words back into French, it answers 'il lit', that is, 'he reads.' Why? Because the database has a statistical bias: a masculine subject appears more often before the verb 'read'."

Nor should the often hidden environmental cost be taken lightly. "People think that AI is virtual and has no impact on the environment. Yet, according to one article, ChatGPT 'drinks' 500 ml of water every time you converse with it. That image was used to remind us that massive amounts of water are needed to cool supercomputers. Beyond water, ChatGPT also consumes a great deal of energy. Some say that AI will soon use as much electricity as an entire country," says Professor Doyon.

So what does the future hold for AI and ChatGPT? "I don't know," Professor Doyon answers humbly. "Are there things ChatGPT will never be able to do? I have no answer. Every month we hear that ChatGPT has done something new. It is impossible to know where all this will end," concludes the mathematician.

  • For an overview of Nicolas Doyon's work
  • Learn more about past and upcoming general public conferences organized by Continuing Education at the Faculty of Science and Engineering
  • Watch the conference “The Mathematical Secrets of ChatGPT”: