Reinforcement Learning has grown in popularity in recent years: since Google DeepMind's AlphaGo emerged victorious against Lee Sedol and other Go grandmasters, it has proven to be an effective training method for neural networks in both deterministic and non-deterministic games. Libratus, a poker-playing neural network developed at Carnegie Mellon University, applies Reinforcement Learning techniques along with standard backpropagation and temporal-difference methods to win against poker players across the world, including winners of past poker grand tournaments. However, Libratus does not use the current deep learning and reinforcement learning techniques outlined in the AlphaGo and DeepMind papers. We wanted to explore the possible benefits of using Q-Learning to create a poker bot that automatically learns the best possible policy through self-play over time. Q-Learning is the specific reinforcement learning technique we wanted to apply to our PokerBot. A complete explanation of Q-Learning can be found here; for our purposes, it suffices to know that Q-Learning penalizes actions that may end up badly in the future.
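Concretely, Q-Learning maintains a value Q(s, a) for every state-action pair and repeatedly nudges that value toward the observed reward plus the discounted value of the best action available in the next state. A minimal tabular sketch follows; the variable names, rewards, and hyperparameters are our illustrative assumptions, not code from the bot itself.

```python
# Minimal tabular Q-Learning sketch (illustrative only: names and
# hyperparameters are our assumptions, not the project's actual code).
from collections import defaultdict

ALPHA = 0.1    # learning rate
GAMMA = 0.95   # discount factor for future reward

Q = defaultdict(float)  # maps (state, action) -> estimated long-term value

def q_update(state, action, reward, next_state, legal_actions):
    """One Q-Learning step: move Q(s, a) toward
    reward + GAMMA * max over a' of Q(next_state, a')."""
    best_next = max((Q[(next_state, a)] for a in legal_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

At a terminal state (the hand is over) there are no legal actions left, so best_next is zero and the update simply moves Q(s, a) toward the realized reward; a lost pot pushes down the value of the moves that led there, which is the "penalty" described above.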
The very first thing we did was create a "dumb" AI. This was simply a series of nested if statements designed to play each game out to the best of its ability.
However, this "AI" wouldn't win any games or indeed be very effective against any opponent who knew how to counter it or knew the rules it worked on.
With the dumb AI in place, we needed two things. First, a quick and efficient method of generating millions of hands: since we couldn't find any reasonable datasets online, we manufactured them by using the dumb AI to simulate multiple players at the same table. Secondly, a starting point for our neural network to emulate. Since finding real datasets was impossible, we decided to first have our neural network emulate our dumb AI, and then improve from there through self-play and reinforcement learning.
The first thing we discovered when we started hunting for datasets: there were NO datasets. We couldn't find a freely available set of poker hands being played across, for example, a poker tournament, or any other play-by-play records of poker games. Why? Because professional poker players pay (try saying that fast!) a lot of money to find and analyze possible moves, which means that data on hands is extremely valuable, and therefore quite expensive.
Instead, we decided to try generating our own poker hands using the dumb AI we created. We simply had the DumbAI play against itself repeatedly, evaluating each hand position with the PokerStove hand evaluation library. It was a simple process, and we generated millions of hands pretty quickly.
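The loop itself can be very small. Below is a sketch of the shape of such a generation pipeline, reusing the toy helpers from the sketch above; the dealing logic, the output schema, and the scoring are our assumptions (in particular, the toy hand_strength merely stands in for the PokerStove evaluation we actually used).

```python
# Sketch of the self-play data-generation loop. Schema and helpers are
# illustrative; in the real pipeline each position was scored with the
# PokerStove hand evaluation library, not the toy heuristic below.
import csv
import random

def generate_hands(n_hands, n_players=6, out_path="hands.csv"):
    deck_template = [r + s for r in RANKS for s in "cdhs"]
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["hand_id", "seat", "hole", "board", "action", "score"])
        for hand_id in range(n_hands):
            deck = deck_template[:]
            random.shuffle(deck)
            board = [deck.pop() for _ in range(5)]
            for seat in range(n_players):
                hole = [deck.pop(), deck.pop()]
                action, _ = dumb_ai_action(hole, board[:3], pot=100, to_call=20)
                score = hand_strength(hole, board)  # PokerStove stand-in
                writer.writerow([hand_id, seat, " ".join(hole),
                                 " ".join(board), action, f"{score:.3f}"])

generate_hands(1000)  # scaling to millions of hands is just a bigger n_hands
```

Each row pairs a position with the dumb AI's chosen action and an evaluation score, which is exactly the supervised signal the neural network needs in order to emulate the dumb AI before self-play takes over.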