“Student of Games”, The Algorithm That Wins At Chess And Poker

A new algorithm called Student of Games can beat opponents at a range of games, including chess, Go, Texas Hold’em poker and the strategy board game Scotland Yard. The artificial intelligence program combines guided search, machine learning and game-theoretic reasoning, as the researchers who developed it explain in a study published this Wednesday in the journal Science Advances. Earlier, the AlphaZero algorithm could master only games of perfect information, such as chess and Go, in which all players have access to the same information. It could not win at poker, however, because poker is a game of imperfect information in which the opponents’ cards are hidden.

The research was carried out while the team worked at Google DeepMind, Google’s artificial intelligence research division. However, several team members left Google in January 2022, and the company laid off most of the remaining team in January 2023.

The tool can win at games of both perfect and imperfect information with minimal prior knowledge. “Our algorithm is able to reason from the rules of the game. For example, it learns to play all of them (chess, poker, Go or Scotland Yard) using only the rules, without being given any further information,” explains Finbarr Timbers, a researcher at Midjourney and an author of the study. “The rules allow you to determine what actions you can take and whether you have won or lost,” he adds.
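A minimal sketch of what learning from “only the rules” can look like in practice, assuming a hypothetical interface: one function lists the legal actions in a state, one applies an action, and two report whether the game is over and who won. The toy take-away game and the names `TakeAwayGame`, `legal_actions`, `apply`, `is_terminal` and `returns` are illustrative inventions, not the study’s code. Given nothing but these functions, a program can already play complete games, here by choosing moves at random.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PileState:
    stones: int  # stones left on the pile
    player: int  # whose turn it is: 0 or 1


class TakeAwayGame:
    """Hypothetical toy game: players alternately remove 1 or 2 stones;
    whoever takes the last stone wins."""

    def initial_state(self, stones: int = 7) -> PileState:
        return PileState(stones, 0)

    def legal_actions(self, state: PileState) -> list[int]:
        # "What actions can I take?"
        return [n for n in (1, 2) if n <= state.stones]

    def apply(self, state: PileState, action: int) -> PileState:
        return PileState(state.stones - action, 1 - state.player)

    def is_terminal(self, state: PileState) -> bool:
        return state.stones == 0

    def returns(self, state: PileState) -> list[int]:
        # "Have I won or lost?" The player who just moved took the last stone.
        winner = 1 - state.player
        return [1 if p == winner else -1 for p in (0, 1)]


def random_playout(game: TakeAwayGame, rng: random.Random) -> list[int]:
    """Play one complete game using nothing but the rule functions."""
    state = game.initial_state()
    while not game.is_terminal(state):
        state = game.apply(state, rng.choice(game.legal_actions(state)))
    return game.returns(state)


print(random_playout(TakeAwayGame(), random.Random(0)))
```

A learning system would replace the random choice with learned policies and search; the point of the sketch is how little the game itself has to expose.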


To decide what to do at each point in the game, the algorithm relies on a technique known as “counterfactual regret minimization,” which analyzes all possible moves. According to Timbers, “regret” means “how well you could have done if you had played optimally, minus how well you actually played.” For example, if you won 200 chips over a series of poker hands but could have won 1,000 by playing differently, your regret is 800 chips. Student of Games therefore aims to reduce that regret as much as possible. It takes into account every possible scenario consistent with the public information, such as the cards that have been revealed, and averages over the possible hidden cards.
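The chip arithmetic above, and the rule that turns accumulated regret into a strategy, can be sketched in a few lines. This shows regret matching, the per-decision update used inside counterfactual regret minimization, not the full algorithm over a game tree and not the study’s implementation. The three actions and the payoffs for the two hands are hypothetical numbers chosen for illustration.

```python
def regret(best_possible: float, actual: float) -> float:
    """Timbers' definition: what better play would have earned minus what you earned."""
    return best_possible - actual


def regret_matching(cumulative_regrets: list[float]) -> list[float]:
    """Play each action in proportion to its positive cumulative regret
    (uniformly if no action has positive regret)."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(positive)] * len(positive)
    return [p / total for p in positive]


ACTIONS = ("fold", "call", "raise")

# Hypothetical hands: chips each action would have won, and the action actually played.
hands = [
    ({"fold": 0, "call": 200, "raise": 1_000}, "call"),
    ({"fold": 0, "call": -100, "raise": -400}, "raise"),
]

cumulative = [0.0, 0.0, 0.0]
for payoffs, played in hands:
    for i, action in enumerate(ACTIONS):
        # Regret for not having played `action` instead of `played`.
        cumulative[i] += regret(payoffs[action], payoffs[played])

print(regret(1_000, 200))          # 800, the example from the text
print(cumulative)                  # [200.0, 300.0, 800.0]
print(regret_matching(cumulative)) # roughly [0.154, 0.231, 0.615]
```

After these two hands, raising carries the most regret for not having been played, so it receives most of the probability; actions whose cumulative regret is negative would drop out entirely.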

The goal is for the players’ strategies to converge to a Nash equilibrium, the concept named after the American mathematician John Nash: a situation in which no player can gain by changing their own strategy while the others keep theirs. Players choose their strategies to maximize their payoff and adapt them to the other players’ moves as the game progresses. Timbers and his colleagues designed the algorithm to find an optimal strategy in most situations.
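Why minimizing regret leads toward equilibrium can be illustrated with rock-paper-scissors, whose only Nash equilibrium is to play each option one third of the time. This is a textbook self-play exercise, not Student of Games itself: two regret-matching players start from lopsided, hypothetical initial regrets and repeatedly update against each other. Their current strategies need not settle down, but their strategies averaged over all iterations approach the equilibrium, which is the guarantee regret minimization offers in two-player zero-sum games.

```python
# Payoff to the player choosing the row: +1 win, -1 loss, 0 tie.
# Rows and columns are ordered rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching(cum_regrets: list[float]) -> list[float]:
    positive = [max(r, 0.0) for r in cum_regrets]
    total = sum(positive)
    n = len(cum_regrets)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


def self_play(iterations: int) -> list[list[float]]:
    """Both players run regret matching against each other; returns each
    player's average strategy over all iterations."""
    # Hypothetical lopsided starting regrets: player 0 favours rock,
    # player 1 favours scissors.
    regrets = [[5.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Expected payoff of each pure action against the opponent's current mix.
            action_values = [
                sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)
            ]
            current_value = sum(p * v for p, v in zip(strategies[player], action_values))
            for a in range(3):
                regrets[player][a] += action_values[a] - current_value
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


for name, avg in zip(("player 0", "player 1"), self_play(50_000)):
    print(name, [round(p, 3) for p in avg])
```

Both printed average strategies end up close to one third for each option, even though neither player was ever told what the equilibrium is.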

Each game places the player in different kinds of situations. In chess, from a specific position on the board, you can search through the possible moves to find the best one. This does not work in poker. Timbers explains that you have to consider how a play affects other situations: “If you bet big every time you have a strong hand, you are telling your opponent that you have a good hand by betting aggressively. Likewise, you tell your opponent what your hand is if you stop betting whenever you have a weak hand.”
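Timbers’ point can be made precise with Bayes’ rule: an opponent who knows how you tend to bet can work out how likely your hand is to be strong. The sketch below uses made-up numbers (30% of hands are strong) and two hypothetical betting policies; it illustrates the information leak he describes, not how Student of Games models it.

```python
from fractions import Fraction

PRIOR_STRONG = Fraction(3, 10)  # hypothetical share of strong hands


def p_strong_given_bet(p_bet_if_strong, p_bet_if_weak, prior=PRIOR_STRONG):
    """Opponent's belief that you are strong after seeing you bet (Bayes' rule)."""
    strong_and_bet = prior * p_bet_if_strong
    weak_and_bet = (1 - prior) * p_bet_if_weak
    return strong_and_bet / (strong_and_bet + weak_and_bet)


def p_weak_given_check(p_bet_if_strong, p_bet_if_weak, prior=PRIOR_STRONG):
    """Opponent's belief that you are weak after seeing you stop betting."""
    strong_and_check = prior * (1 - p_bet_if_strong)
    weak_and_check = (1 - prior) * (1 - p_bet_if_weak)
    return weak_and_check / (strong_and_check + weak_and_check)


# Transparent policy: always bet with strong hands, never with weak ones.
print(p_strong_given_bet(1, 0), p_weak_given_check(1, 0))  # 1 1

# Mixed policy: bet with 80% of strong hands and bluff with 40% of weak ones.
mixed = (Fraction(4, 5), Fraction(2, 5))
print(p_strong_given_bet(*mixed), p_weak_given_check(*mixed))  # 6/13 7/8
```

Under the transparent policy a bet or a check reveals the hand with certainty. Mixing in bluffs and occasionally checking strong hands brings the opponent’s confidence after a bet down to 6/13, about 46%, which is why a poker move cannot be evaluated in isolation the way a chess move can.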

The British company DeepMind, owned by Google since 2014, has developed an algorithm called R-NaD that plays Stratego at the level of a skilled human. Stratego is a popular game in which each player has 40 pieces and wins by capturing the opponent’s flag or leaving them without pieces to move. R-NaD uses algorithmic tricks to achieve good performance, but it does not use search. For this reason, it is not as strong as Student of Games: “The literature has shown that algorithms that search over possible actions are usually better at games than algorithms that do not use search, but they are slower and more expensive to train,” Timbers explains.

Competitive artificial intelligence is used to measure how effective computer programs are and to provide a better gaming experience, but it can also have negative effects: “Cheating is very likely to occur on poker betting websites and in similar games. Many competitive video games try to be strict about the software allowed on each player’s computer to ensure that no artificial intelligence is playing, which Riot Games is already doing with Valorant (2020),” explains Diego Rodríguez-Ponga Albalá, founder and director of Pontica. He considers it foreseeable “that very sophisticated artificial intelligence will be developed to automatically recognize whether the player is human or not.”

Gema Ruiz, head of innovation at Softtek EMEA, also points out other limitations of the algorithm, such as its use of betting abstractions in poker and its “computational overhead.” Abstractions group similar situations so that they are treated the same way, which reduces the complexity of the game. When playing poker, Student of Games uses betting abstractions that cut the number of possible actions from 20,000 to 4 or 5. According to Ruiz, the study suggests that in the future these abstractions could be replaced by “a more comprehensive policy that can handle a variety of actions in game situations with a variety of possible decisions.” In addition, enumerating every possible move is computationally expensive, so the study proposes a “generative model.” This model generates sample states of the world and works with that subset of samples instead of listing every possible combination of hands.
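A betting abstraction of the kind Ruiz describes can be sketched as follows. The 20,000-chip stack and the choice of abstract actions (fold, call, half-pot, pot-sized and all-in bets) are assumptions for illustration, not the abstraction used in the study, and mapping a real bet to the nearest abstract size is the simplest possible translation rule.

```python
STACK = 20_000                       # hypothetical: any whole-chip bet from 1 to 20,000 is legal
ABSTRACT_POT_FRACTIONS = (0.5, 1.0)  # hypothetical: half-pot and pot-sized bets


def abstract_actions(pot: int, stack: int = STACK) -> list:
    """Collapse every legal bet size into a handful of representative actions."""
    actions: list = ["fold", "call"]
    for fraction in ABSTRACT_POT_FRACTIONS:
        size = int(pot * fraction)
        if 0 < size < stack:  # skip sizes that would be all-in anyway
            actions.append(("bet", size))
    actions.append(("bet", stack))  # all-in
    return actions


def to_abstract_bet(bet: int, pot: int, stack: int = STACK) -> int:
    """Treat a real bet as the closest abstract bet size (nearest-size mapping)."""
    sizes = [a[1] for a in abstract_actions(pot, stack) if isinstance(a, tuple)]
    return min(sizes, key=lambda size: abs(size - bet))


full = 2 + len(range(1, STACK + 1))  # fold, call and every whole-chip bet
print(full, len(abstract_actions(pot=1_000)))  # 20002 5
print(len(abstract_actions(pot=30_000)))       # 4 (a pot-sized bet would exceed the stack)
print(to_abstract_bet(650, pot=1_000))         # 500
print(to_abstract_bet(15_000, pot=1_000))      # 20000
```

The algorithm then only has to reason about 4 or 5 options per decision, at the cost of treating, say, a 650-chip bet exactly like a 500-chip one, which is the kind of approximation a more general policy would aim to remove.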

Nevertheless, for Ruiz, the tool is “a promising contender in the field of artificial intelligence-based gaming algorithms.” She highlights “its ability to improve performance with more computing resources, along with solid theoretical foundations.”

You can follow EL PAÍS technology on Facebook and X or sign up here to receive our weekly newsletter.