Researchers have trained DeepStack, a deep learning algorithm, to become the first artificial intelligence program to beat human players at heads-up no-limit Texas hold ’em poker.
Before DeepStack, game-playing programs had beaten human competitors in games of perfect information such as chess or Jeopardy!, where all players have the same information at the same time. Poker, however, is a game of imperfect information: information is distributed asymmetrically, because each player holds private cards.
Programming an AI to manage asymmetric information is a much more complex process, because the correct decision at each point depends on probability distributions over information known only to the opponents. That information is revealed only through their actions, which in turn depend on how the players interpret one another’s actions.
Earlier programs for games of imperfect information tried to map out a strategy for the entire game before play began. That approach is not practical for no-limit poker, where the freedom to vary bet sizes produces more than 10 to the power of 160 distinct decision points.
The research team addressed this problem by using counterfactual regret minimization (CFR) to teach the algorithm to adapt its strategy through recursive reasoning: it interprets the actions of each player and evaluates a limited set of probabilistic scenarios to give itself the best chance of winning the game.
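To give a feel for the regret-based reasoning behind CFR, here is a minimal sketch of regret matching, the update rule CFR builds on, applied to rock-paper-scissors in self-play. It is an illustration only and not DeepStack's code: the toy game, the function names, and the biased starting regrets are all assumptions chosen to keep the example small.

```python
# Illustrative sketch only: regret matching on rock-paper-scissors.
# This is the building block of CFR, not DeepStack's implementation.

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

# PAYOFF[a][b] is the payoff to a player choosing a against an opponent choosing b.
# The game is symmetric, so the same table serves both players.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action is regretted: fall back to a uniform strategy.
    return [1.0 / ACTIONS] * ACTIONS


def train(iterations, initial_regrets=(1.0, 0.0, 0.0)):
    """Run self-play and return each player's average strategy.

    Player 0 starts with a hypothetical bias toward rock so that there
    is something to adapt away from.
    """
    regrets = [list(initial_regrets), [0.0] * ACTIONS]
    strategy_sums = [[0.0] * ACTIONS, [0.0] * ACTIONS]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Expected payoff of each pure action against the opponent's mix.
            action_values = [
                sum(PAYOFF[a][b] * opponent[b] for b in range(ACTIONS))
                for a in range(ACTIONS)
            ]
            expected = sum(
                strategies[player][a] * action_values[a] for a in range(ACTIONS)
            )
            for a in range(ACTIONS):
                # Regret: how much better action a would have done than the mix.
                regrets[player][a] += action_values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]
```

Calling `train(20000)` returns average strategies for both players that drift toward playing each action about a third of the time, the equilibrium for this toy game. Full CFR applies the same kind of update at every decision point of a game tree, weighted by how likely each point is to be reached.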
To keep the AI from having to plan a complete strategy at every decision, the team trained DeepStack to produce a fast approximation of the value of the current state of the game rather than attempting to reason through all variants to the end.
DeepStack was trained on more than 10 million examples of random poker situations. The team then recruited 33 professional poker players from 17 countries to play against it.
Each player was asked to play a 3,000-game match over a four-week period. In total, DeepStack played 44,852 games against the professional players.
The team showed that DeepStack is theoretically sound and that it can beat professional poker players at heads-up no-limit Texas hold ’em, with an average win rate of over 450 milli-big-blinds per game (mbb/g), a measure that normalizes winnings across different numbers of games and varying stakes.
For comparison, professional poker players consider 50 mbb/g a large winning margin, and 750 mbb/g is the rate at which one would win against a player who folds every hand immediately.
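The mbb/g figures above are simple arithmetic, and a short sketch shows where the 750 figure comes from. It assumes the standard heads-up structure, in which the small blind is half a big blind and the two players alternate blinds every game. The function names are hypothetical.

```python
# Illustrative sketch: computing a win rate in milli-big-blinds per game.
# Assumes a small blind of 0.5 big blinds and blinds that alternate each game.

SMALL_BLIND = 0.5  # measured in big blinds
BIG_BLIND = 1.0


def mbb_per_game(net_big_blinds, games):
    """Convert net winnings in big blinds into milli-big-blinds per game."""
    if games <= 0:
        raise ValueError("games must be positive")
    return 1000.0 * net_big_blinds / games


def always_fold_net(games):
    """Net result, in big blinds, for a player who folds every hand at once."""
    net = 0.0
    for game in range(games):
        # The folding player forfeits whichever blind they posted this game.
        posted = SMALL_BLIND if game % 2 == 0 else BIG_BLIND
        net -= posted
    return net
```

Under these assumptions, `mbb_per_game(always_fold_net(1000), 1000)` gives -750.0: a player who always folds loses 0.75 big blinds per game on average, so the opponent wins at 750 mbb/g.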
As a practical demonstration of an artificially intelligent algorithm that performs successfully with imperfect information, DeepStack has potential applications outside the gaming arena, in settings where perfect information is unavailable. These could include medical treatment recommendations or defending strategic resources in a changing environment, among others.