I am the author/maintainer of rs-poker ( https://github.com/elliottneilclark/rs-poker ). I've been working on algorithmic poker for quite a while. This isn't the way to do it. LLMs would need to be able to do math, lie, and be random, none of which they are currently capable of.
We know how to compute the best moves in poker. It's computationally challenging; the more choices and players there are, the harder it gets, which is why most attempts only even try heads-up.
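For a sense of what that computation looks like: the standard approach is counterfactual regret minimization, and the regret-matching update at its heart is tiny. A toy Rust sketch (a single decision point with made-up payoffs, nothing to do with rs-poker's actual API):

    // Minimal regret-matching sketch (the core update inside CFR-style solvers).
    // Illustrative only: one decision point with fixed action payoffs,
    // not a full poker game tree.

    fn regret_matching(regrets: &[f64]) -> Vec<f64> {
        // Positive regrets become the (normalized) strategy; if none are
        // positive, fall back to uniform.
        let positives: Vec<f64> = regrets.iter().map(|r| r.max(0.0)).collect();
        let sum: f64 = positives.iter().sum();
        if sum > 0.0 {
            positives.iter().map(|r| r / sum).collect()
        } else {
            vec![1.0 / regrets.len() as f64; regrets.len()]
        }
    }

    fn main() {
        // Hypothetical per-action payoffs for one spot: fold, call, raise.
        let payoffs = [0.0, 0.4, 0.9];
        let mut regrets = vec![0.0; payoffs.len()];

        for _ in 0..1_000 {
            let strategy = regret_matching(&regrets);
            // Expected value under the current mixed strategy.
            let ev: f64 = strategy.iter().zip(&payoffs).map(|(p, u)| p * u).sum();
            // Accumulate regret for not having played each action outright.
            for (r, u) in regrets.iter_mut().zip(&payoffs) {
                *r += u - ev;
            }
        }

        println!("final strategy: {:?}", regret_matching(&regrets));
    }

The real work is running this kind of update across an enormous game tree of information sets, which is where the computational cost comes from.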
With all that said, I do think there's a way to use attention and BERT to solve poker (when trained on non-text sequences). We need a better corpus of games and some training time on unique models. If anyone is interested, my email is elliott.neil.clark @ gmail.com
Why wouldn't something like an RL environment allow them to specialize in poker playing, gaining those skills as necessary to increase score in that environment?
E.g. given a small code execution environment, it could use a secure random generator to pick between options and a calculator for whatever math it decides it can't do 'mentally'. LLMs are already quite capable of deception, even more so when the RL training target encourages it.
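A rough sketch of what those two tools amount to (Rust with the rand crate, version 0.8; the weights and pot sizes are just made-up numbers):

    // Sketch of the two "tools" described above: an RNG to sample an action
    // from a mixed strategy, and plain arithmetic for the pot-odds check.
    use rand::distributions::WeightedIndex;
    use rand::prelude::*;

    fn main() {
        // Mixed strategy over (fold, call, raise) from whatever policy the model settled on.
        let actions = ["fold", "call", "raise"];
        let weights = [0.10, 0.55, 0.35];

        let mut rng = thread_rng();
        let dist = WeightedIndex::new(&weights).expect("weights must be non-negative");
        let choice = actions[dist.sample(&mut rng)];

        // The "calculator" part: pot odds for calling 30 into a 90 pot.
        let pot = 90.0_f64;
        let to_call = 30.0_f64;
        let pot_odds = to_call / (pot + to_call); // need ~25% equity to call

        println!("sampled action: {choice}, required equity: {:.1}%", pot_odds * 100.0);
    }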
I'm not sure why you couldn't train an LLM to play poker quite well with a relatively simple training harness.
> Why wouldn't something like an RL environment allow them to specialize in poker playing, gaining those skills as necessary to increase score in that environment?
I think an RL environment is needed to solve poker with an ML model. I also think that, like chess, you need the model to do some approximate work. General-purpose LLMs trained on a text corpus are bad at math, bad at accuracy, and struggle to stay on task while exploring.
So a purpose-built model with a purpose-built exploring harness is likely needed. I've built the basis of an RL-like environment and the basis of learning agents in Rust for poker. Next steps to come.
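To be concrete about the shape I mean (this is a sketch only, not rs-poker's actual API): the environment hands an agent an observation, the agent picks an action, and reward only shows up when the hand resolves.

    #[allow(dead_code)]
    #[derive(Debug, Clone, Copy)]
    enum Action {
        Fold,
        Call,
        Raise(u32), // raise size in chips
    }

    #[derive(Debug, Clone)]
    struct Observation {
        hole_cards: [u8; 2], // card indices 0..52
        board: Vec<u8>,
        pot: u32,
        to_call: u32,
    }

    trait Agent {
        fn act(&mut self, obs: &Observation) -> Action;
        fn observe_reward(&mut self, reward: f64);
    }

    // An always-call baseline agent, useful as a sanity-check opponent.
    struct CallingStation;

    impl Agent for CallingStation {
        fn act(&mut self, _obs: &Observation) -> Action {
            Action::Call
        }
        fn observe_reward(&mut self, _reward: f64) {}
    }

    fn main() {
        let mut agent = CallingStation;
        let obs = Observation {
            hole_cards: [12, 25],
            board: vec![],
            pot: 3,
            to_call: 2,
        };
        println!("baseline action: {:?}", agent.act(&obs));
    }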
What makes you say this? Modern LLMs (the top players in this leaderboard) are typically equipped with the ability to execute arbitrary Python and regularly do math + random generation.
I agree it's not an efficient mechanism by any means, but I think a fine-tuned LLM could play near GTO for almost all hands in a small ring setting
To play GTO you currently need to play hand ranges. (For example, when looking at a hand I would think: I could have AKs-ATs or QQ-99, and he/she could have JTs-98s or 99-44, so my next move will act like I have strength and they don't, because the board doesn't contain any low cards.) We have to do this because you can't always bet 4x pot when you have aces; if you did, the opponents would always know your hand strength directly.
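As a toy illustration (the hand classes, weights, and strength labels below are invented), the object you reason about is a weighted set of combos, not the two cards you actually hold:

    // Sketch of what "playing a range" means: the bet has to be chosen for the
    // whole weighted set of hands you could hold here, not the hand you do hold.

    fn main() {
        // (combo class, weight in my range here, counts as "strong" on this board)
        let my_range = [
            ("AKs-ATs", 1.0, true),
            ("QQ-99", 0.8, true),
            ("A5s-A2s", 0.5, false), // bluff candidates
        ];

        let total: f64 = my_range.iter().map(|(_, w, _)| w).sum();
        let strong: f64 = my_range
            .iter()
            .filter(|(_, _, s)| *s)
            .map(|(_, w, _)| w)
            .sum();

        // A big bet is only credible (and balanced) if the range behind it has
        // roughly the right mix of value hands and bluffs.
        println!("value fraction of my range: {:.0}%", 100.0 * strong / total);
    }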
LLMs aren't capable of this deception. They can't be told that they have one thing, pretend they have something else, and then revert to ground truth. Their eager nature with a large context leads to them getting confused.
On top of that, there's a lot of precise math. In no-limit the bets are not capped, so you can bet 9.2 big blinds in a spot. That could be profitable because your opponents will call and lose (e.g. the players willing to pay that sometimes have hands that you can beat). However, betting 9.8 big blinds might be enough to scare off the good hands. So there's a lot of probability math with multiplication.
Deep math with multiplication and accuracy are not the forte of LLMs.
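To make the sizing arithmetic concrete, here's the basic EV comparison of those two bets in Rust. The fold frequencies and equities are invented numbers; the real ones come out of range-vs-range work:

    // The kind of arithmetic a sizing decision hides.

    fn bet_ev(bet: f64, pot: f64, fold_freq: f64, equity_when_called: f64) -> f64 {
        // EV = P(fold) * pot + P(call) * (equity * (pot + 2*bet) - bet)
        fold_freq * pot + (1.0 - fold_freq) * (equity_when_called * (pot + 2.0 * bet) - bet)
    }

    fn main() {
        let pot = 12.0; // big blinds

        // Hypothetically, the slightly bigger bet folds out a few more good hands
        // but gets called by a range we do worse against.
        let ev_small = bet_ev(9.2, pot, 0.38, 0.47);
        let ev_big = bet_ev(9.8, pot, 0.45, 0.41);

        println!("EV(9.2bb) = {:.2}bb, EV(9.8bb) = {:.2}bb", ev_small, ev_big);
    }

Getting those inputs right, and then multiplying them out accurately across many candidate sizes, is exactly the part a text-trained LLM tends to fumble.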
Agreed. I tried it on a simple game of exchanging colored tokens from a small set of recipes. I challenged it to start with two red and end up with four white, for instance. It failed. It would make one or two correct moves, then either hallucinate a recipe, hallucinate the resulting set of tiles after a move, or just declare itself done!
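For what it's worth, that kind of game is trivial to verify mechanically, which is what makes the drift so stark. A sketch of such a checker (the recipe here is invented for illustration):

    // Recipes map a multiset of tokens to another multiset, so every model
    // move can be checked against the rules and the tracked state.
    use std::collections::HashMap;

    type Tokens = HashMap<&'static str, i32>;

    fn apply(state: &Tokens, cost: &Tokens, output: &Tokens) -> Option<Tokens> {
        let mut next = state.clone();
        // Pay the recipe's cost; the move is illegal if the tokens aren't there.
        for (color, n) in cost {
            let have = next.entry(*color).or_insert(0);
            if *have < *n {
                return None;
            }
            *have -= *n;
        }
        // Add the recipe's output.
        for (color, n) in output {
            *next.entry(*color).or_insert(0) += *n;
        }
        Some(next)
    }

    fn main() {
        // Start with two red; an invented recipe turns one red into two white.
        let start: Tokens = HashMap::from([("red", 2)]);
        let cost: Tokens = HashMap::from([("red", 1)]);
        let gain: Tokens = HashMap::from([("white", 2)]);

        let after_one = apply(&start, &cost, &gain).expect("legal move");
        let after_two = apply(&after_one, &cost, &gain).expect("legal move");
        println!("{:?}", after_two); // red: 0, white: 4 (key order varies)
    }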