CMU’s Libratus Bluffs its way to Victory in #BrainsVsAI Poker Match 🤖
After three weeks and 120,000 hands of heads-up No Limit Hold’em poker, the computer showed us who’s boss.
It was a pretty serious test for the computer. The four professional poker players (Jason Les, Dong Kim, Jimmy Chou, and Daniel McAulay) are “in the top 10, maybe in the top 15” heads-up poker players in the world, according to fellow top-10 player and poker YouTube star Doug Polk.
The computer program was created by CMU graduate student Noam Brown, as well as Professor Tuomas Sandholm, a top-5 legend in the field of AI science. They named the program Libratus (perhaps that’s Latin for 🇺🇸), but the players called their tormentor “the bot,” and then eventually “Libby,” as the days wore on and it became clear that the humans could not win the match.
Hand History
The bot jumped out to a big lead over the first two days. It doesn’t show up in the graph above since there were fewer hands played on the first couple of days as the setup got ironed out, but Libratus was up about $25/hand after each player had played a couple thousand hands. After one day, it was clear that Libby was playing a more nuanced game than its predecessor Claudico — which narrowly lost to a similar group of players in 2015.
At that point, I wrote that the players would adapt their game to the bot — they are the best in the world at adjusting to heads-up opponents — and narrow the match. But then they would get tired, it would be difficult for them to play their best every day, and the bot would end up winning about $15/hand (including profits from its early lead).
That is exactly what happened. The players came close to net zero after six days and cut the bot’s lead to $2/hand.
But then it was all Libby for two straight weeks.
The final score: Libratus won $1.7M in chips over 120,000 hands, or $14.7 per hand.
That’s a significant win. I don’t know what official error bars were announced, but assuming the daily standard deviation was about ±$100k, two standard deviations over this sample is roughly $8/hand. The AI won by something like four standard deviations, if you’re into that sort of thing.
It’s also a solid win rate from a human perspective. The No Limit Hold’em game was played with $50/$100 blinds and 200 big blinds per player ($20,000 in chips total). A player would lose $75/hand by folding every time. Victories of $20-$40 per hand are normal between decent AIs (like my deep learning poker bot) and state-of-the-art AIs in the Annual Computer Poker Competition. Winning margins would be similar between top poker pros (like those who took on Libby) and normal good poker players. If a weak player took on one of the top pros in a long match, he might be better off folding every hand, or folding everything but his strongest hands, to lock in something like a $50/hand loss.
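If you want to check my arithmetic, here’s a quick back-of-the-envelope version in Python. The ~20 sessions and the ±$100k daily standard deviation are my assumptions, not official numbers; the chip total uses the widely reported $1,766,250 (the “$1.7M” above, rounded).

```python
import math

# Back-of-the-envelope significance check (my assumptions, not CMU's):
# ~20 playing sessions, daily standard deviation of roughly ±$100k.
total_hands = 120_000
total_win = 1_766_250   # widely reported chip total, ~$1.7M
sessions = 20           # assumed
daily_std = 100_000     # assumed, in chips

match_std = daily_std * math.sqrt(sessions)   # std dev of the whole match
print(f"win rate: ${total_win / total_hands:.2f}/hand")                # ~$14.72
print(f"2-sigma error bar: ±${2 * match_std / total_hands:.2f}/hand")  # ~$7.45
print(f"z-score: {total_win / match_std:.1f} standard deviations")     # ~3.9

# Baseline: folding every hand at $50/$100 blinds costs the average
# of the two blinds, i.e. $75/hand.
print(f"fold-every-hand baseline: -${(50 + 100) / 2:.0f}/hand")
```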
I won’t keep recapping the results — I’m sure most of you are more interested in how the AI works, and what all of this means — but let me make a quick note on Jason Les’s results, which add up to about half the human team’s losses.
It is easy to read a lot into the graph. Jason took the biggest loss, but he also booked some winning sessions, all while talking tech, Bitcoin, professional poker, and shoutouts on Twitch. Did Jason get tired? Did he get unlucky? You can’t be sure. 30,000 hands per player sounds like a lot, but it really isn’t. All of the players had about the same result halfway through the match, and all were down to the bot 🤖.
Maybe Jason got demoralized after Coach Doug sent him to the bench.
Like many coaches, Doug may not have understood small samples. But the coach’s favorite, Dong Kim, did take home the MVP 🏆 and the biggest slice of the $200k (real money) prize pool as the best-performing grinder. This 💸 is for humans only.
How does Libby do it?
Graduate student Noam Brown has been building poker bots at Carnegie Mellon since 2014, and possibly longer than that unofficially. However, he wrote the code for Libratus in a matter of months. Noam and the players did a Reddit AMA near the end of the match. It’s worth a full read, and my summary of how the bot plays is drawing largely from there.
At a high level, the AI tries to find a Nash equilibrium solution for heads-up Hold’em: a strategy over all possible game situations which cannot be exploited by an opponent in the long run. There are 10^165 possible game states, an incomprehensibly large number. A lot of that complexity comes from taking the same path with a lot of different bet sizes. The players start $20,000 chips deep, and technically they can bet any number of chips between the minimum bet ($100) and the maximum (all-in!).
In practice, it is not necessary to consider every bet size in order to solve for an approximate Nash equilibrium. Until recently, it was standard not to even consider every unique card combination. Instead, poker AIs would create a “card abstraction” — grouping similar hands into buckets, making it easier to solve for the Nash equilibrium in a smaller game space.
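To make “card abstraction” concrete, here’s a toy version in Python. The bucket count and the strength heuristic are made up for illustration; real abstractions bucket hands by their equity distributions over rollouts, not a hand-crafted score.

```python
from itertools import combinations

RANKS = "23456789TJQKA"

def preflop_score(hi, lo, suited, paired):
    """Crude hand-strength heuristic (made up for illustration)."""
    score = hi * 2 + lo            # favor high cards
    if paired:
        score += 20
    if suited:
        score += 4
    return score - (hi - lo)       # penalize gaps

# Enumerate the 169 distinct starting-hand classes:
# 13 pairs, 78 suited, 78 offsuit.
hands = []
for lo, hi in combinations(range(13), 2):
    hands.append((RANKS[hi] + RANKS[lo] + "s", preflop_score(hi, lo, True, False)))
    hands.append((RANKS[hi] + RANKS[lo] + "o", preflop_score(hi, lo, False, False)))
for r in range(13):
    hands.append((RANKS[r] * 2, preflop_score(r, r, False, True)))

# Sort by strength and split into 8 buckets: every hand in a bucket
# shares one strategy, shrinking the game the solver has to crack.
hands.sort(key=lambda h: h[1])
bucket_size = len(hands) / 8
abstraction = {name: int(k / bucket_size) for k, (name, _) in enumerate(hands)}

print(abstraction["AA"], abstraction["72o"])  # top bucket vs bottom bucket
```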
So why was Libratus stronger than previous approximate Nash equilibrium solutions? According to Noam on Reddit:
- Libratus uses no card abstraction: all legal card arrangements are considered as-is.
- Libratus “trains itself” through self-play over millions of hands.
- Noam and Professor Sandholm trained their AI — as well as the live “endgame solver” — on 200 nodes (linked computers) of the Pittsburgh Supercomputing Center.
- Libratus starts with a baseline strategy, but also uses an “endgame solver” to refine its strategy in real time against the players (only on fourth street and the river).
What follows is my take on how it works, simplified a bit for clarity. Please correct me if I make a mistake, and feel free to ask questions.
Considering every possible arrangement of cards (no card abstraction) is expensive, so when Libratus trains itself through self-play, it only allows bets to be made in specific sizes. These bet sizes might be {min bet, 20% pot, 50% pot, 1x pot, 2x pot, all-in}. That way, a very complicated game like No Limit Hold’em can be reduced to a decision tree, in this case with six branches from each node (more, once you count checking and calling). Here is an example game tree from Noam’s AAAI paper about endgame solving, a tree for a hidden-information game simpler than Hold’em.
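Here’s a minimal sketch of how such a bet abstraction collapses the tree. The pot fractions mirror the hypothetical set above; nothing here is Libratus’s actual action set.

```python
# Sketch of a bet abstraction: at any decision point, the self-play
# trainer only considers a handful of bet sizes, turning No Limit's
# continuum of bets into a small, tree-friendly action set.

def legal_actions(pot, stack, min_bet=100):
    """Enumerate abstracted actions at one node (illustrative sizes)."""
    fractions = [0.2, 0.5, 1.0, 2.0]     # fractions of the pot
    sizes = {min_bet, stack}             # min bet and all-in always allowed
    for f in fractions:
        bet = round(pot * f)
        if min_bet <= bet <= stack:
            sizes.add(bet)
    # Checking/calling is also allowed, so each node has at most
    # len(sizes) + 1 branches instead of thousands.
    return ["check/call"] + sorted(sizes)

print(legal_actions(pot=1_000, stack=20_000))
# ['check/call', 100, 200, 500, 1000, 2000, 20000]
```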
I’m glossing over how the computer learns from playing itself and settles on an approximate equilibrium solution for the poker game, a game in which you can only bet a few fixed amounts. In practice, this approximate equilibrium strategy is pretty hard to beat, as long as the players stick to making the same sized bets as the computer expects.
Since the players bet whatever amounts they feel like, the computer needs to re-evaluate its strategy whenever the players go “off book.” With endgame solving, this means that the computer solves for a new Nash equilibrium in a world where the player’s bet size is allowed, alongside all of the normal bet sizes that the computer considers.
This sounds insanely complicated — to solve the whole game again every time the human bets — but Libratus uses a few tricks to keep the response time reasonable.
- Libratus plays its pre-computed strategy pre-flop and on the flop, to save time — and because endgame solving earlier in the hand would be exponentially slower, while being less likely to change the baseline strategy.
- A solver should consider every possible response, and every possible path: if I raise $300 and my opponent re-raises to $900, you’d have to consider every possible hand that my opponent might re-raise to $900 with in this situation.
- But in practice, it is ok to consider just 50 or 100 (or however many) randomly sampled responses, to decide what you should do.
- Libratus uses “Monte Carlo CFR” to quickly compute a solution for the current game situation, one which also respects the non-exploitability of other game situations (see the regret-matching sketch after this list).
- The 200-node supercomputer cloud helps. Noam notes that his AI could play even faster if it were hooked up to Google’s datacenter.
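For the curious, here is the core update that CFR-style training is built on: regret matching. This single-decision toy leaves out everything that makes Monte Carlo CFR hard (sampling, the game tree, counterfactual weighting); it just shows how accumulated regrets turn into a strategy.

```python
import random

actions = ["fold", "call", "raise"]
regret_sum = {a: 0.0 for a in actions}

def current_strategy():
    """Play each action in proportion to its positive accumulated regret."""
    positive = {a: max(r, 0.0) for a, r in regret_sum.items()}
    total = sum(positive.values())
    if total == 0:
        return {a: 1.0 / len(actions) for a in actions}  # uniform fallback
    return {a: p / total for a, p in positive.items()}

def observe(action_values):
    """Accumulate regret: how much better each action would have done
    than the strategy we actually played."""
    strategy = current_strategy()
    expected = sum(strategy[a] * action_values[a] for a in actions)
    for a in actions:
        regret_sum[a] += action_values[a] - expected

# Toy environment: noisy payoffs where "call" is best on average.
for _ in range(10_000):
    observe({"fold": 0.0,
             "call": 1.0 + random.gauss(0, 1),
             "raise": 0.5 + random.gauss(0, 1)})

print(current_strategy())  # converges toward mostly "call"
```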
In practice, the solver went to work on the turn and took about 20 seconds to act. Occasionally it would respond instantly, if a player checked back, for example, since that didn’t change the previous game solution. But most of the time late in the hand, the bot started re-solving for the best response.
By solving each game state (approximately solving, but still), Libratus was able to consider the exact bet sizes made by the players, as well as individual card combinations, without any generalizations. As noted earlier, CMU poker AI alum and Claudico developer Sam Ganzfried noticed early on that the bot was much better than before at considering every card in its hand before deciding when to bluff.
Poker sharks know that it’s better to bluff at the end, when the cards in your hand are “blocking” some of your opponent’s possibilities of having a big hand. At the same time, it’s better to “catch a bluff” by calling with a hand whose cards “block” your opponent as well. In a previous post, I broke down a hand where Jason perhaps should have called with Ten-Ten on a King-9-4-5-Queen + two clubs board.
Libby shoved on the river for 3x the pot, polarizing its range between a huge hand and a pure bluff. Jason’s two tens block the Jack-Ten straight, while his lack of clubs makes it more likely that the AI was drawing to a flush. Sure enough, Libratus held the 7-3 of clubs for a busted flush.
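The blocker math is easy to check. In the snippet below, the card names come from the hand above; holding two tens removes half of the Jack-Ten straight combos the bot could be value-shoving with.

```python
SUITS = "cdhs"

def combos(rank1, rank2, dead_cards):
    """Count rank1-rank2 two-card combos not blocked by dead cards."""
    r1 = [rank1 + s for s in SUITS if rank1 + s not in dead_cards]
    r2 = [rank2 + s for s in SUITS if rank2 + s not in dead_cards]
    return len(r1) * len(r2)

print(combos("J", "T", dead_cards=set()))           # 16 JT combos normally
print(combos("J", "T", dead_cards={"Th", "Td"}))    # 8 with Jason's two tens dead
```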
Did the Bot call an audible?
Halfway through the match, the players noticed the AI playing noticeably differently. They asked whether the humans might be adjusting it at night; I asked whether they might just be hallucinating from the bad beats, the waits for Libratus’s endgame solver, and the casino sandwiches.
They told me they might be losing their minds, but it wasn’t that. The bot went from a “mixed strategy” preflop to a fixed-size raise before the flop. Play after the flop also changed, in less describable ways. Just as they were doing their best to adjust to the AI, the bot changed its strategy and improved.
Here is one story that fits those facts.
A big part of Noam’s job is to reduce the Hold’em game from a ludicrously big problem to something more manageable. He committed to not using a card abstraction; considering every card as-is was one of Libby’s non-negotiables. The other way to speed up the solver would be to use a smaller bet abstraction, in other words, to consider fewer bet sizes in the tree.
Deploying fewer distinct bets against the players and considering fewer bet responses obviously can’t help the AI play better. But it’s possible that it doesn’t make much of a difference in the endgame solver’s final answer, while making the tree much easier to manage. Remember, after the turn the AI solves for the best response in the specific game, with the specific cards and the specific bet sizes made so far. Looking at a smaller number of bet responses by the AI might not matter.
If my theory is correct — and I don’t know that it is — you would indeed see a simpler preflop strategy, as well as a strategy that’s going to be different on every future street. Even if the new strategy isn’t better in theory, now the players have to adjust from what they had already adjusted to. That might have been a bridge too far.
If we want to go deeper for a minute, there are effects from a simpler betting abstraction that the players could have taken advantage of:
- The AI doesn’t endgame solve the game pre-flop, so humans could find specific good bet sizes that the AI misinterprets. It needs to map every human bet to a computer bet from a fixed set of bets. A human might get the AI to fold too often if it maps his bets to bigger bets in its abstraction.
- Once the pot gets big, within range of an all-in bet, the human could use a similar trick to exploit the AI, which can only consider calling or going all-in in response. Again, the idea is to make something like a $3,000 bet, which the AI maps to a $5,000 bet (see the sketch after this list).
- As the computer uses a bigger and bigger betting abstraction, these holes become smaller variations from ideal play, and harder to find.
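Here’s what that mapping hole looks like in code. The bet sizes are hypothetical, and real action translation is smarter than this (published work from Sandholm’s group describes a randomized “pseudo-harmonic” mapping); the point is just that any crude mapping leaves seams a human can press.

```python
# Naive action translation: snap an off-tree human bet to the closest
# size in the bot's abstraction. Sizes below are hypothetical.
ABSTRACTION_BETS = [1_000, 2_000, 5_000, 20_000]

def translate(human_bet):
    """Map a human bet to the nearest abstracted bet size."""
    return min(ABSTRACTION_BETS, key=lambda b: abs(b - human_bet))

print(translate(3_000))   # -> 2000: read as smaller pressure than it is
print(translate(3_600))   # -> 5000: the bot over-reads the pressure
```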
I’ve found tricks like that against less complex equilibrium AIs such as Slumbot, which you can play online for free. Slumbot finished 2nd in last year’s Annual Computer Poker Competition, less than $2/hand behind the winner (also from CMU). Slumbot is a very strong bot, but it uses card abstractions, a betting abstraction, and no endgame solving. At least that was true of the 2016 Slumbot. Who knows what’s coming this year.
Correction
According to Professor Sandholm in The Verge, Libratus uses a mix of high-level strategies against the players, and optimizes hyper-parameters at night while the humans are sleeping, adjusting the mix for the next day. In other words, after several days it eliminated the strategies that were least effective against the players.
I find this surprising. If CMU was using something like a multi-armed bandit, I would think that it might reduce, but not eliminate, the mixed preflop strategy. Otherwise, what happens if the players found a good preflop betting counter-strategy and started winning? How would the mixed strategy get re-introduced?
In any case, it was not a case of overt human intervention. And I still wonder whether the players could have exploited the narrower bet abstraction that the AI ended up adopting. It’s hard for us humans to keep making adjustments.
Addendum
Former CMU graduate student, Claudico co-creator, and Miami-based AI professor Sam Ganzfried explains why Libratus won big.
Where does this lead?
As impressive as chess and Go AIs have been at beating human world champions, those are full-information games. In other words, you can play these games perfectly, approaching an unexploitable Nash equilibrium, by looking only at the board and choosing the best move, independent of the moves that came before. In math, we call this a Markov process. It doesn’t matter how you got here; what you need to know for this decision is not how you reached the current state, only what the current state is.
The Markov property does not hold in poker, just as it does not hold in life. In a game of hidden information, the path taken to get to the current decision point matters. It matters because the previous bets, combined with your own hidden cards, change the range of cards that your opponent is likely to have. As the betting gets heavy, it is more likely that your opponent is holding a strong hand. It is also more likely that he is bluffing.
The same is true in a negotiation, or any other type of real-life game.
Whether you are negotiating for a new job or selling a house, there is information that one side has that the other side does not, all in the context of randomness and uncertainty. You do not necessarily sell a house because you know that it has hidden problems, any more than the Yankees trade away a pitcher because they know that he is injured. And a poker player can’t simply fold every hand until he gets dealt Aces.
Sophisticated poker AIs can now beat the best players in the world, at least in heads-up big bet Hold’em. The bots (Libratus at CMU, DeepStack at the University of Alberta) are custom solutions, tailor-made for heads-up poker. However, they show that there is probably no game humans can master that a computer can’t learn to play even better, given a good problem setup, a clever graduate student, and a boatload of computing resources.
For one, I would like to see strong AIs for strategy computer games. The automated players for these games, be they Civ, Grand Theft Auto, or a first-person shooter, usually either follow a hand-tuned script or operate in “God mode,” where they see what a human can’t see and then are artificially limited in their ability to attack the humans. This makes for pretty unrealistic opposition. How great would it be if you could play against human-like, realistic computer opponents?
Google’s DeepMind as well as Facebook’s FAIR AI research lab are working on deep learning agents for StarCraft — which aim to play the game at a high level with the same information that’s given to a human player. It’s been slow going, but we thought the same thing about AlphaGo before it went from “pretty strong player” to beating the human world champion in about a year’s time.
Can these 🤖 help me play better poker?🤔
Soon, but probably not yet. AI-based poker training tools will come, I’m sure. AI will transform the process of learning poker, as it transformed learning chess. Remember that, all you bot haters who think the AIs will ruin the game. Chess is bigger than ever.
But what can you do for now? I know that my own heads-up game improved dramatically from the week I spent in Pittsburgh with the players. Professor Sandholm was vehement that I could not discuss strategy with the players, so it was more observation than anything. Much of the humans vs Libby match on the Twitch streams was recorded (I saw the big external HDs) and will be available on YouTube at some point. I’ll post a link once it’s available.
For now, there are a few ways that you can boost your game:
- Check out Doug Polk’s YouTube videos. I can’t tell you how rare it is for a top player to give away so much strategy for free and with good production value. It might not be obvious how sharp the analysis is, since Doug is having such a good time doing it.
- Play against Slumbot, the strongest approximate-equilibrium heads-up Hold’em AI on the internets, and it plays fast to boot. Slumbot author Eric “Action” Jackson, who was my colleague on Google’s search algorithms team a decade ago, will explain how Slumbot plays so well, so fast, in his talk during this week’s AAAI Poker AI workshop.
- Slumbot lets you practice new strategies in a way that you never could against a human. For example, I learned a heads-up strategy where most of my bets are either 1/3 of the pot or 1.5x the pot. I’d love to get the chance to practice it against a human (online, of course).
- You can also practice leveling: how do you induce Slumbot to bluff? When should you 4-bet bluff, and how?
- If you’re the bookish type, Ed Miller’s books are still the best! His “Poker’s 1%: The One Big Secret That Keeps Elite Players On Top” is the best practical explanation of GTO I’ve ever read — in poker or otherwise. #GTOforLife
I’ve got more thoughts on what I learned about poker from playing against equilibrium bots and watching Dong, Jason, Jimmy, and Daniel battle against Libratus. But that sounds like a topic for a separate post. If you have thoughts or questions, do comment below.
And yes, I think equilibrium-based bots could be great at heads-up PLO. Everybody thinks so. It’s a much bigger state space, so you’re back to card abstractions, and it’s a much longer loop just to traverse all possible card configurations. Other than that…
Press
The showdown in Pittsburgh received a lot of press. If you want a more succinct, less opinionated take on this match from professional writers, here are some options. Let me know if I missed a good one.
Great job by Noam, Professor Sandholm, CMU, the Rivers Casino, and the poor humans who had to sit there and rage against the machine.