LLMs do acquire new skills as they grow, but not unexpectedly. You just need to measure their performance properly.
Researchers believed that large language models (LLMs) could show unpredictable jumps in ability, which posed a serious problem for the development of artificial intelligence. A recent study refutes this claim: with appropriate measurement, it is possible to anticipate when LLMs acquire skills.
Recall that large language models are crucial to the development of artificial intelligence. OpenAI's very popular generative chatbot, ChatGPT, is based on LLMs such as GPT-4 or GPT-4 Turbo.
BIG-bench, LLMs and their skills
It's 2022. More than 400 researchers launch a major project to test large language models. Named BIG-bench, it consists of giving LLMs a set of 204 tasks. For most tasks, performance improves steadily as the model grows. On some tasks, however, the researchers observe a sudden leap in performance after a period of stagnation.
The authors call this behavior a "breakthrough," comparable to a phase transition in physics, and they stress how unpredictable it is. This unpredictability raises safety concerns: a generative AI whose abilities appear unexpectedly could be dangerous.
BIG-bench's conclusions called into question
This week, three Stanford University researchers published a paper revisiting the BIG-bench findings. Their work shows that the sudden appearance of skills is merely a consequence of how researchers measured LLM performance.
The skills are neither unexpected nor sudden, the trio argues. "The transition is much more predictable," said Sanmi Koyejo, the Stanford computer scientist who is the paper's senior author.
For context, LLMs are trained by analyzing large amounts of text, from which they learn the connections between words. The more parameters a model has, the more connections it can find. GPT-2 has 1.5 billion parameters, while GPT-3.5 reportedly uses 350 billion. GPT-4, which powers Microsoft Copilot, reportedly uses 1.75 trillion.
Why the way LLM skills are assessed matters
The rapid growth of large language models has led to impressive improvements in their performance. The Stanford trio acknowledges that LLMs become more effective as they scale up. However, whether that improvement looks sudden or gradual depends on the choice of evaluation metric, not on the model's internal processes.
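A simplified way to see why the metric matters, assuming (as a rough model, not a claim about any specific LLM) that each of the $L$ tokens in an answer is correct independently with the same probability $p$, and that $p$ improves smoothly as the model grows:

```latex
\text{per-token accuracy} = p,
\qquad
\Pr[\text{exact match}] = p^{L}
% Example with L = 5:
%   p = 0.5  =>  p^5 \approx 0.03
%   p = 0.9  =>  p^5 \approx 0.59
```

Under this assumption, a steady rise in $p$ shows up as a nearly flat exact-match curve followed by a sharp climb, while the per-token score rises gradually.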
According to BIG-bench, OpenAI's GPT-3 and Google's LaMDA suddenly became able to solve addition problems once they had enough parameters. However, this "emergence" depends on the metric used, the new study says. Under an exact-match metric, an answer counts only if every digit is right; with a metric that gives partial credit, the improvement appears gradual and predictable.
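To illustrate the contrast on an addition problem, here is a minimal sketch of two scoring rules. The digit-matching partial-credit rule is a hypothetical stand-in, not the metric BIG-bench or the Stanford team actually used, and the model outputs are invented for illustration.

```python
# Sketch: all-or-nothing vs. partial-credit scoring of an addition answer.
# Assumption: answers are compared as digit strings; the digit-level rule
# is illustrative only, not BIG-bench's or the Stanford paper's own code.

def exact_match(prediction: str, target: str) -> float:
    """All-or-nothing score: 1.0 only if the whole answer matches."""
    return 1.0 if prediction == target else 0.0


def digit_credit(prediction: str, target: str) -> float:
    """Partial credit: fraction of target digits that match by position.

    Both strings are right-aligned so units digits line up, as in written
    column addition; extra leading digits in the prediction are neither
    rewarded nor penalized.
    """
    width = max(len(prediction), len(target))
    pred, targ = prediction.rjust(width), target.rjust(width)
    matches = sum(1 for p, t in zip(pred, targ) if t != " " and p == t)
    return matches / len(target)


# Hypothetical outputs for 12345 + 54321 from ever-larger models,
# each getting one more digit right (these are not real results).
target = "66666"
outputs = ["11111", "11116", "11166", "11666", "16666", "66666"]

for out in outputs:
    print(f"{out}: exact={exact_match(out, target):.1f}  "
          f"partial={digit_credit(out, target):.1f}")
```

The exact-match score stays at 0.0 until the last output and then jumps to 1.0, while the partial-credit score climbs in steps of 0.2: the same sequence of outputs looks "emergent" or gradual depending only on how it is scored.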
In short, this reassessment of emergence is significant. It should encourage researchers to develop a science of predicting the behavior of large language models.