[Poker #BrainsVsAI update] Does the bot ever bluff? 🤖🎲💸
Last week I watched Libratus, the poker AI designed by Carnegie Mellon graduate student Noam Brown and CS Professor Tuomas Sandholm, play against four high-stakes professional poker players. After two days I called the match for the computer. Libratus was up over $20/hand at 50/100 blinds in deep-stack play ($20k stacks, or 200 big blinds).
To put that into perspective, a player would lose $75/hand if he folded every time. Quite a few “good” poker players would probably lose almost that much in a long match against one of these professional heads-up specialists, especially once the pros started to notice patterns and pick apart their game.
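Where the $75 comes from: heads-up, you post the small blind one hand and the big blind the next, so folding everything costs the average of the two. A tiny sketch of that arithmetic (the function name is mine, just for illustration):

```python
SMALL_BLIND = 50
BIG_BLIND = 100

def fold_every_hand_loss(small_blind, big_blind):
    """Average loss per hand for a heads-up player who folds every hand.

    Heads-up, the blinds alternate, so the player posts the small blind
    on half the hands and the big blind on the other half.
    """
    return (small_blind + big_blind) / 2

loss = fold_every_hand_loss(SMALL_BLIND, BIG_BLIND)
print(f"Folding every hand costs ${loss:.0f}/hand")
```

Against that baseline, a $20/hand deficit is a bit over a quarter of what pure folding would cost.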
Don’t call it a comeback
I said that the humans would come back a bit as they adjusted to the AI’s style, but would then start to get tired, eventually losing something like $15/hand to Libratus. This would be statistically significant.
Three days later, my prediction was not looking good.
The players had booked a small loss (cutting the AI’s win rate in half), then they booked a small win, and finally a huge six figure victory. At night they had been studying the hands of their matches, comparing notes, and seemed to have concocted some new strategies to counter the computer.
I commented on the different preflop and value-bet sizes made by the players since their cold start, but they are maintaining Omertà about game adjustments. So is occasional Twitch stream visitor Doug Polk, a famous heads-up poker player who led the professional poker team against Claudico in a previous 👥 vs 🤖 match at Carnegie Mellon back in 2015.
The players were looking good. They had cut the AI’s lead to 50k, and some even noticed that humanity would be up, had a half-day of hands not been invalidated.
The four-player contest is set up as two pairs of duplicate matches: each day, the two players in a pair play opposite sides of the same set of hands. They initialize the deck shuffler with the same “random seed” (for you programmer folk), ensuring that the cards are dealt in the same order in both halves of the match. The players in a pair do not communicate, and half of the Twitch streams are delayed, with the players remaining silent. This creates a huge pain for tournament organizer (and Libratus developer) Noam, but the fans have responded well to this feature, which allows us to see both sides of all-ins and other interesting situations.
Primarily, the feature is there to reduce variance. Thus the hands accidentally played with un-matched random seeds had to be voided, once the delayed stream showed that the cards weren’t being replicated.
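For the programmer folk, here is a minimal sketch of how a shared seed can produce duplicate deals. This is not the match's actual dealing code, which I have not seen; the function names, the seat labels and the use of Python's `random` module are all my own assumptions.

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"

def shuffled_deck(seed):
    """Return a 52-card deck shuffled deterministically from `seed`."""
    deck = [rank + suit for rank in RANKS for suit in SUITS]
    random.Random(seed).shuffle(deck)  # private RNG, so the seed fully fixes the order
    return deck

def deal_heads_up(seed):
    """Deal two hole-card hands and a five-card board from the seeded deck."""
    deck = shuffled_deck(seed)
    return {"seat_a": deck[0:2], "seat_b": deck[2:4], "board": deck[4:9]}

def duplicate_pair(seed):
    """Deal the same cards at two tables, with the human and AI seats swapped."""
    first = deal_heads_up(seed)
    second = deal_heads_up(seed)  # same seed, so exactly the same cards
    table_1 = {"human": first["seat_a"], "ai": first["seat_b"], "board": first["board"]}
    table_2 = {"human": second["seat_b"], "ai": second["seat_a"], "board": second["board"]}
    return table_1, table_2

table_1, table_2 = duplicate_pair(seed=2017)
print(table_1)
print(table_2)
```

The human at the first table holds the cards the AI holds at the second table, and both tables see the same board, so a lucky deal for one side is mirrored by the same lucky deal for the other. If the two tables were seeded differently, that mirroring would be gone, which is why the mismatched hands had to be voided.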
People who have not played high-stakes poker can imagine the total wins and losses (they can relate these to other “jobs”), but they cannot comprehend the swings.
The hand-to-hand difference between big winners and break-even players is small, and any short-term results in a game between decent opponents are overwhelmingly due to variance. Good players win more tournaments than average players, but each individual winner was most likely the luckiest player on that day. Poker is specifically designed to ensure that while skill wins out in the long run, every decent player has a chance in the short and medium term.
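To get a rough feel for how much variance swamps a small edge, here is a back-of-the-envelope sketch. It treats hands as independent and ignores the duplicate format entirely. The per-hand standard deviation of $1,000 (10 big blinds) is a placeholder I picked for illustration, not a number measured from this match.

```python
import math

def standard_error(std_per_hand, n_hands):
    """Standard error of the average result per hand after n_hands hands."""
    return std_per_hand / math.sqrt(n_hands)

def hands_for_two_sigma(win_rate, std_per_hand):
    """Hands needed before a win rate sits two standard errors away from zero."""
    return math.ceil((2 * std_per_hand / win_rate) ** 2)

ASSUMED_STD_PER_HAND = 1000  # hypothetical: $1,000/hand, i.e. 10 big blinds

for win_rate in (10, 15, 20):
    needed = hands_for_two_sigma(win_rate, ASSUMED_STD_PER_HAND)
    print(f"${win_rate}/hand edge: about {needed:,} hands to reach two standard errors")
```

Under that assumed spread, even a $20/hand edge needs on the order of 10,000 hands before it clearly separates from luck, and a single day's result tells you very little.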
After steadily improving their results for four days, the pros lost 180k on Tuesday.
Dong Kim won 10k on the day, but Jason lost 100k, and Jimmy and Daniel both lost, while playing two sides of the same cards. Over the course of the day, the players lost $28/hand. That is as bad as their cold start on the first day, despite a week of practice against the AI and some secret adjustments that looked to be working well the day before.
Of course this is also a small sample.
But seriously, does the bot ever bluff?
It certainly does.
The players have been impressed by the AI’s ability to over-bet bluff the pot, even on the river with no chance of winning if called. On Monday during the players’ six figure winning session, I remember Jason tanking for a long time with a pocket pair on a draw-heavy board.
Jason raised preflop with Ten-Ten on the button. The flop came King-9–4 with two clubs, and there was more betting. The turn brought a non-club 5 and it went check-check. The river was a non-club Queen and Libratus shoved all-in for 3x the pot.
It’s easy to see the AI betting on a draw early in the hand, and then trying to take the pot away. It’s made moves like this before, with mixed results. I thought that Ten-Ten (no clubs) in Jason’s hand made it even more likely that the AI would be holding a flush draw — more clubs and fewer Tens left in the deck. Others thought it less likely that the AI would bluff here, as it is not likely to have a Ten “blocker” — preventing Jason from holding Jack-Ten for the nut straight.
Eventually, Jason folded. Half an hour later we saw Dong get dealt 7–3 of clubs on hand #175. The AI raised big and Jason threw his cards away. This is what variance looks like — even with duplicate poker.
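The blocker arguments in that hand come down to counting cards Jason cannot see. Here is a rough sketch of the raw counts, assuming only what is described above: Jason holds two non-club Tens, and the five-card board has exactly two clubs and no Ten. It ignores how likely the AI is to actually play each holding, so treat it as counting, not as a range analysis.

```python
from math import comb

def combos_with_at_least_one(unseen_cards, target_cards):
    """Two-card holdings, drawn from the unseen cards, that contain a target card."""
    return comb(unseen_cards, 2) - comb(unseen_cards - target_cards, 2)

UNSEEN = 52 - 2 - 5    # deck minus Jason's two cards and the five board cards
CLUBS_LEFT = 13 - 2    # two clubs on the board, none in Jason's hand
TENS_LEFT = 4 - 2      # Jason holds two of the four Tens

total_holdings = comb(UNSEEN, 2)
two_club_holdings = comb(CLUBS_LEFT, 2)
ten_holdings = combos_with_at_least_one(UNSEEN, TENS_LEFT)
ten_holdings_if_no_tens_held = combos_with_at_least_one(UNSEEN, 4)

print(f"Possible opponent holdings: {total_holdings}")
print(f"Two-club holdings (missed flush draws): {two_club_holdings}")
print(f"Holdings with a Ten blocker: {ten_holdings}")
print(f"Same count if Jason held no Tens: {ten_holdings_if_no_tens_held}")
```

Both sides of the argument show up in the counts: with no club in Jason's hand, every missed two-club flush draw is still available to the AI, while Jason's two Tens cut the number of holdings that include a Ten roughly in half.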
After another tough day, Libratus is back up over $10/✋. It’s possible that my original prediction will turn out close to correct.
I’m rooting for the humans to turn it around.
In other news, famous nosebleeds poker player Doug Polk, who led the Brains vs Claudico match at CMU in 2015, will be stopping by Pittsburgh and commenting on the stream. Be sure to tune in on Twitch, probably on Jason’s stream. Here are Doug’s comments on the match so far.
I have no insight into the AI making adjustments. Occam’s razor says that the players are just getting tired. I don’t think that self-adjusting heads-up poker AI is here yet, at least not one that’s better than playing a strong equilibrium strategy in every spot, including accounting for card replacement, as Sam Ganzfried pointed out on day one. Libratus seems to be very good at that.
Noam and Professor Sandholm did a great job building the AI and putting this match together. Let’s wait for them to explain how Libratus plays after the match, as they said they would.
Live on stream
Check out the Twitch links below for more beats, bluffs and variance. It’s rare that you get to watch high-end players compete and observe one player’s cards in real time, for free and unedited. Win or lose, Libratus has proven to be a worthy opponent for some of the best heads-up pros in the world.
Join also for the wit and trolling of the peanut gallery, daily shoutouts and Bitcoin updates, and a running thread on whether the AI adjusts between games. I have no idea, but this seems like a lot of work, so I imagine that it doesn’t. Just waiting for the variance to swallow us all.