Intelligent Poker Player

Orr Bernstein (ob29@cornell.edu), Jonathan Margulies (jm392@cornell.edu), Cliff Tsai (ct247@cornell.edu)
Cornell University, Department of Computer Science

Introduction

Problem Definition and Algorithm

Program Design

Experimental Evaluation

Conclusion

Future Work

References

Problem Definition and Algorithm

Problem Definition

We are interested in creating a program that plays Limit Texas Hold'em, the most common form of poker in American casinos. Our goal is to evaluate different poker-playing models and to determine which algorithms and heuristics are most effective.

Texas Hold'em

A hand of Texas Hold'em begins with the preflop, where each player is dealt two hole cards face down, followed by the first round of betting. Three community cards, called the flop, are then dealt face up on the table, and the second round of betting occurs. On the turn, a fourth community card is dealt face up and another round of betting ensues. Finally, on the river, a fifth community card is dealt face up and the final round of betting occurs. All players still in the game turn over their two hole cards for the showdown. The best five-card poker hand formed from the two hole cards and the five community cards wins the pot. If a tie occurs, the pot is split. Texas Hold'em is typically played with 8 to 10 players.

Limit Texas Hold'em uses a structured betting system, where the order and amount of betting is strictly controlled on each betting round. There are two denominations of bets, called the small bet and the big bet ($2 and $4 in this paper). In the first two betting rounds, all bets and raises are $2, while in the last two rounds, they are $4. In general, when it is a player's turn to act, one of three betting options is available: fold, call/check, or raise/bet. There is normally a maximum of three raises allowed per betting round. The betting option rotates clockwise until each player has matched the current bet or folded. If there is only one player remaining (all others having folded), that player is the winner and is awarded the pot without having to reveal their cards.
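The betting structure can be sketched in a few lines of Python. This is a minimal illustration, not the program's actual code: the function names are hypothetical, and the cap of 4 aggressive actions assumes that the opening bet counts alongside the three permitted raises.

```python
# Hypothetical sketch of the $2/$4 limit structure described in this paper.
SMALL_BET = 2
BIG_BET = 4
# Assumption: one opening bet plus three raises per betting round.
MAX_AGGRESSIVE_ACTIONS = 4

ROUNDS = ("preflop", "flop", "turn", "river")


def bet_size(round_name: str) -> int:
    """Return the fixed bet/raise amount for the given betting round."""
    if round_name not in ROUNDS:
        raise ValueError(f"unknown round: {round_name}")
    # The first two rounds use the small bet, the last two the big bet.
    return SMALL_BET if round_name in ("preflop", "flop") else BIG_BET


def can_bet_or_raise(aggressive_actions_so_far: int) -> bool:
    """True while the per-round cap on bets and raises has not been reached."""
    return aggressive_actions_so_far < MAX_AGGRESSIVE_ACTIONS
```

Keeping the cap as a single constant makes it easy to adjust if a different house rule is wanted.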

Texas Hold'em Terms

Board - The "community cards," the cards that are face-up on the table during a hand for everyone to share.

Hole Cards - A player's two face-down cards that belong only to that player and that only that player is allowed to see unless the player must show them in a showdown.

Showdown - In the event that multiple players are still playing when the betting is completed at the end of the river round, players show their cards one by one until a winner is determined. If a player elects not to show his/her cards, that player cannot win the hand. The last player to bet or raise in the hand shows first, and the other players follow in clockwise order from that player.

Preflop - The round of betting that takes place after the hole cards have been dealt, but before the flop has been dealt.

Flop - After the preflop, three community cards are dealt face-up on the board for everyone to share and use as part of their hands. After the three cards are dealt, a round of betting ensues.

Turn - After the flop, another face-up card is added to the board, and another round of betting ensues. This round is called the turn.

River - After the turn, a final face-up card is added to the board, and a final round of betting ensues. This round is called the river.

Outs - The number of cards left in the deck that might be added to the board in subsequent rounds that would help a given player advance to a winning position.

Pocket Pair - When a player's two hole cards both have the same value.

Pot Odds - The ratio of the amount of money a player must add to the pot to stay in the hand to the amount of money that is currently in the pot.
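As a worked example of pot odds, the sketch below computes the ratio as defined above, together with the win probability at which a call breaks even. The function names are hypothetical; the break-even formula is the standard expected-value calculation and is not taken from the program itself.

```python
def pot_odds(to_call: float, pot: float) -> float:
    """Ratio of the amount needed to stay in the hand to the current pot."""
    if pot <= 0:
        raise ValueError("pot must be positive")
    return to_call / pot


def break_even_win_probability(to_call: float, pot: float) -> float:
    """Win probability at which calling has zero expected value.

    Calling risks `to_call` to win `pot`, so the call breaks even when
    p * pot == (1 - p) * to_call, i.e. p == to_call / (pot + to_call).
    """
    if pot + to_call <= 0:
        raise ValueError("pot plus call must be positive")
    return to_call / (pot + to_call)
```

For instance, facing a $2 call into a $10 pot gives pot odds of 0.2, and the call breaks even when the player wins one time in six.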

Algorithm Definition

HumanStrategy - This strategy asks the user for input. The user is given the options Check/Fold, Check/Call, or Bet/Raise (Bet/Raise is only available until the limit of 4 aggressive actions per round is reached), and the user's choices are applied to the game.

RandomStrategy - Whenever this strategy is asked to act, it picks a random integer from 0-2 and returns an action based on the integer. A 0 causes it to return Check/Fold, a 1 returns Check/Call, and a 2 returns Bet/Raise.

AggressiveStrategy - This strategy returns Bet/Raise any time an action is requested.
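The two baseline strategies are simple enough to sketch directly. The class names follow the text, but the `Action` enum and the `act` method are assumptions made for illustration and may not match the program's real interface.

```python
import random
from enum import Enum


class Action(Enum):
    CHECK_FOLD = 0
    CHECK_CALL = 1
    BET_RAISE = 2


class RandomStrategy:
    """Picks a random integer from 0 to 2 and maps it to an action."""

    def __init__(self, rng=None):
        # An injectable generator keeps experiments reproducible.
        self._rng = rng or random.Random()

    def act(self) -> Action:
        # randint is inclusive on both ends, so this yields 0, 1, or 2.
        return Action(self._rng.randint(0, 2))


class AggressiveStrategy:
    """Returns Bet/Raise every time an action is requested."""

    def act(self) -> Action:
        return Action.BET_RAISE
```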

PreflopProbabilisticStrategy - This strategy uses probabilities to make decisions in the preflop round, but acts the same as AggressiveStrategy in subsequent rounds. Its preflop action is based on a breakdown of all possible two-card hands into 9 hierarchical groups. Each group has an expected performance against a single random hand; the strategy raises that figure to the power of the number of opponents, to reflect facing that many random hands, and takes the result as its odds of being in the lead. It then compares those odds to the pot odds (how much money must be contributed to the pot to stay in the hand compared to how much money would be won from the pot if the player were to win the hand at that moment). If the odds do not compare favorably, the strategy returns Check/Fold. If they do compare favorably, it returns Bet/Raise when it estimates a better than .5 probability that it is leading, and Check/Call otherwise.

This is, at its heart, a search strategy. It searches through all possible two-card hands and compares its hand to all of them, calculating how many of them it could beat. The more of these hands it can beat, the more money it is willing to put in before the flop.
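The preflop decision rule can be sketched as follows. This is an illustration under stated assumptions, not the program's code: the function names are hypothetical, the per-group win rate is passed in rather than looked up (the values for the 9 groups are not given here), and "comparing favorably" is assumed to mean that the lead probability is at least the break-even probability implied by the pot odds.

```python
def lead_probability(p_vs_one_random_hand: float, n_opponents: int) -> float:
    """Chance of leading against n independent random hands."""
    return p_vs_one_random_hand ** n_opponents


def preflop_action(p_vs_one_random_hand: float, n_opponents: int,
                   to_call: float, pot: float) -> str:
    """Return Check/Fold, Check/Call, or Bet/Raise for the preflop round."""
    p_lead = lead_probability(p_vs_one_random_hand, n_opponents)
    # Assumed threshold: the win probability at which a call breaks even.
    required = to_call / (pot + to_call) if to_call > 0 else 0.0
    if p_lead < required:
        return "Check/Fold"
    if p_lead > 0.5:
        return "Bet/Raise"
    return "Check/Call"
```

With a $2 call into a $10 pot, a hand that wins 60% of the time against one random hand leads three opponents only about 21.6% of the time, which is enough to call but not to raise.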

ProbabilisticStrategy - This is an extension of the PreflopProbabilisticStrategy, where probability is used to make decisions in every round instead of just the preflop round. This is also our slowest strategy, averaging over 1 minute per hand for each player that uses it, so it could not be used in any experiments. In this strategy, the preflop round is played as discussed in the PreflopProbabilisticStrategy. In subsequent rounds, this strategy searches through the space of all possible two-card random hands combined with all cards already on the board and all combinations of cards that could potentially be added to the board before the end of the hand, calculating how many such hands it will beat. It then makes its action decision in the same way as the PreflopProbabilisticStrategy does.

LookAheadProbabilisticStrategy - This is the same as the ProbabilisticStrategy, with one small difference. In the flop round, this strategy searches through all possible combinations of opponent hands combined with the cards already on the board and every possible single card that might be added to the board next. The ProbabilisticStrategy instead considers every possible combination of two cards that might be added to the board before the end of the hand. Looking only one card ahead slightly weakens the strategy's play, but speeds up the flop decision from about 1 minute to about 10 seconds.
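To give a sense of the search sizes involved on the flop, the sketch below counts the cases each approach enumerates against a single opponent hand. The function name and parameters are hypothetical, and the count covers only the number of (opponent hand, future board) combinations, not the cost of evaluating each one.

```python
from math import comb


def flop_cases(board_cards_to_come: int, deck: int = 52, known: int = 5) -> int:
    """Count (opponent hand, future board cards) combinations on the flop.

    `known` is the player's two hole cards plus the three flop cards.
    """
    unseen = deck - known                     # 47 cards the player cannot see
    opponent_hands = comb(unseen, 2)          # possible two-card opponent hands
    remaining = unseen - 2                    # cards left once that hand is fixed
    return opponent_hands * comb(remaining, board_cards_to_come)


full_search = flop_cases(2)   # turn and river both enumerated
one_card = flop_cases(1)      # only the next card enumerated
```

Under these assumptions the full search visits 1,070,190 cases and the one-card look-ahead visits 48,645, which is 22 times fewer.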

FastLookAheadProbabilisticStrategy - This strategy again uses the same preflop strategy as the PreflopProbabilisticStrategy, but uses a wholly different strategy for subsequent rounds. In the flop and turn rounds, it estimates its chances of winning using a very fast algorithm whose logic was taken from the University of Alberta's Poker Research Group. According to the research group, this algorithm estimates hand potential within 5% of the full state space search in 95% of situations, and it does so in three orders of magnitude less time. The idea of the algorithm is to estimate the probability that, if the player is not currently winning, one or two cards will come out that put the player in a winning position. This is known as counting "outs". The strategy then decides its action using the same logic as the previous strategies, based on its estimated chances of winning and its pot odds. As in previous strategies, it estimates its chances of winning in the river round by counting the number of possible two-card hands that could be beating it (a very similar search to that of the preflop round).
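Once the outs have been counted, converting them into a probability is a short calculation. The sketch below is a generic hypergeometric computation and is not the Alberta group's algorithm; the function name and parameters are hypothetical, and it assumes every unseen card is equally likely to appear.

```python
from math import comb


def hit_probability(outs: int, unseen: int, cards_to_come: int) -> float:
    """Probability that at least one out appears among the cards to come."""
    if not 0 <= outs <= unseen:
        raise ValueError("outs must be between 0 and the number of unseen cards")
    if not 0 <= cards_to_come <= unseen:
        raise ValueError("cards_to_come must be between 0 and unseen")
    # Complement of drawing only non-outs.
    misses = comb(unseen - outs, cards_to_come)
    return 1 - misses / comb(unseen, cards_to_come)


# Example: a flush draw with 9 outs.
on_flop = hit_probability(9, 47, 2)   # two cards to come, 47 unseen
on_turn = hit_probability(9, 46, 1)   # one card to come, 46 unseen
```

With 9 outs, the chance of improving is roughly 35% from the flop and roughly 20% from the turn.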

LessPredictableFastLookAheadStrategy - This strategy is identical to the FastLookAheadProbabilisticStrategy except that when the player determines its odds of winning to be greater than .5 it does not necessarily bet. This adds an element of unpredictability to the player, allowing it to disguise good hands by occasionally not acting aggressively with them. In this strategy, when the computer determines its hand has a better than .5 chance of winning, it returns Bet/Raise 80% of the time and Check/Call 20% of the time. The unpredictability is of doubtful value: randomness only helps against agents that base their moves on their opponents' moves. It is helpful against humans and the best computer agents because it makes the player less predictable, but it hurts play against simple agents, because the player extracts fewer bets from them (it does not Bet/Raise as often with good hands).
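The 80%/20% mixing step can be sketched as below. The function name and the injectable random generator are assumptions for illustration, and the sketch assumes the pot-odds comparison has already come out in favor of staying in the hand.

```python
import random


def less_predictable_action(p_win: float, rng: random.Random,
                            raise_rate: float = 0.8) -> str:
    """Mix Bet/Raise and Check/Call when the hand is likely ahead.

    Assumes the caller has already decided not to fold.
    """
    if p_win > 0.5:
        # rng.random() is uniform on [0, 1), so this branch fires 80% of the time.
        return "Bet/Raise" if rng.random() < raise_rate else "Check/Call"
    return "Check/Call"
```

Passing a seeded `random.Random` makes a series of simulated hands repeatable, which is useful when comparing this strategy against its predictable counterpart.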

PredictablePreflopModelingStrategy - It is widely accepted that the way an opponent plays the preflop round is the best determinant of the strength of the opponent's two "hole" cards. This model keeps statistics on how an opponent tends to play the preflop round and then, by comparing the opponent's play in a given preflop to the opponent's general preflop tendencies, tries to narrow down the opponent's possible hands. This model then takes that guess into account when making its preflop decisions. Essentially, this turns the normal preflop search done in the above strategies into a heuristic search, where the most weight is given to the hands that are most likely given the opponent's behavior.
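The idea of weighting the preflop search by the opponent's likely hands can be sketched as follows. This is only an illustration: the function name, the hand labels, and the weights are hypothetical, and the text does not specify how the program derives weights from its preflop statistics.

```python
def weighted_lead_probability(beats: dict, weights: dict) -> float:
    """Probability of leading when opponent hands are not equally likely.

    beats[hand] is True if our hand beats that opponent hand;
    weights[hand] is the relative likelihood of the opponent holding it.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w for hand, w in weights.items() if beats[hand]) / total


# Hypothetical example with three opponent hand classes.
beats = {"AA": False, "KQs": True, "72o": True}
uniform = {"AA": 1.0, "KQs": 1.0, "72o": 1.0}
# Made-up weights for an opponent who rarely raises but has just raised.
after_raise = {"AA": 3.0, "KQs": 1.0, "72o": 0.0}
```

With uniform weights this reduces to the plain search of the earlier strategies (2 of 3 hands beaten), while the skewed weights drop the estimate to 0.25.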

PreflopOpponentModelingStrategy - Same as PredictablePreflopModelingStrategy, only with the same 80% Bet/Raise, 20% Check/Call action when the computer estimates its hand to be very good as in LessPredictableFastLookAheadStrategy. The unpredictability is more effective against agents that do some opponent modeling, so this model is more effective against its more predictable counterpart than the previous unpredictable model was against its predictable counterpart.

PredictableSecondModelingStrategy - This is the same as PredictablePreflopModelingStrategy except that it generalizes the hand guesses by applying them to all rounds. The model still only makes estimates of its opponents' hands during the preflop round, but it bases all its decisions in all rounds on those estimates.

SecondModelingStrategy - This is PredictableSecondModelingStrategy with the same 80%/20% unpredictability added as before. Once again, this closes the gap between the unpredictable strategy and its predictable counterpart even further: because the PredictableSecondModelingStrategy does even more opponent modeling, it is even more susceptible to this model's unpredictability. It should be noted that both of these models will generally beat any other model we designed, and both previous opponent modeling models generally beat all opponents we designed before them. So, although opponent modeling makes these models more susceptible to unpredictability, it also improves their overall play.

Strategy Class Diagram:
