Sam Altman’s OpenAI o3 model—which was deprecated late last week with the release of GPT-5—demolished Elon Musk’s Grok 4 in four straight games Thursday to win Google’s Kaggle Game Arena AI Chess Exhibition.
You might expect a super complex spectacle of high-tech behemoths putting their reasoning to the ultimate test. Instead, as an appetizer, consider that world champion Magnus Carlsen compared both bots to “a talented kid who doesn’t know how the pieces move.”
The three-day tournament, which ran August 5-7, forced general-purpose chatbots—yes, the same ones that help you write email and claim to be approaching human-level intelligence—to play chess without any specialized training. No chess engines, no looking up moves, just whatever chess knowledge they’d randomly absorbed from the internet.
The results were about as elegant as you’d expect from forcing a language model to play a board game. Carlsen, who co-commentated the final, estimated both AIs were playing at the level of casual players who had recently learned the rules—around 800 Elo. For context, he’s arguably the best chess player who ever lived, with an Elo rating of 2839. These AIs were playing like they’d learned chess from a corrupted PDF.
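To put those numbers in perspective, the standard Elo formula predicts each player’s expected score from the rating gap. A minimal sketch (the function name is mine; the formula itself is the standard Elo expected-score equation):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (0..1) of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Carlsen (2839) vs. an 800-rated bot: he is expected to score
# essentially 100% of the points.
print(expected_score(2839, 800))  # ≈ 0.99999
```

In other words, a 2,000-point gap means the stronger player is expected to win all but a vanishing fraction of games.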
“They oscillate between really, really good play and incomprehensible sequences,” Carlsen said during the broadcast following the game. At one point, after watching Grok walk its king directly into danger, he joked that it might think they were playing King of the Hill instead of chess.
The games themselves were a masterclass in how not to play chess, obvious even to viewers who don’t know the game. In the first match, Grok gave away one of its important pieces for free, then compounded the error by trading off more pieces while already behind.
Game two got even weirder. Grok tried to execute what chess players call the “Poisoned Pawn”—a risky but legitimate strategy where you grab an enemy pawn that looks free but isn’t. Except Grok grabbed the wrong pawn entirely, one that was obviously defended. Its queen (the most powerful piece on the board) got trapped and captured immediately.
By game three, Grok had built what looked like a solid position—good positional control, no obvious dangers, the kind of setup that wins games. Then, in the middlegame, it fumbled the ball directly to its opponent, losing piece after piece in rapid succession.
The collapse was surprising: before the final against o3, Grok had been a strong contender, so strong that grandmaster Hikaru Nakamura praised it. “Grok is easily the best so far, just being objective, easily the best.”
The fourth (and last) game provided the only genuine suspense. OpenAI’s o3 made a massive blunder early on, the kind of mistake that loses games at any level. Nakamura, who was streaming the match, said there were still “a few tricks” left for o3 despite the disadvantage.
He was right: o3 clawed its way back, won its queen back, and slowly squeezed out a victory while Grok’s endgame play fell apart like wet cardboard.
“Grok made so many mistakes in these games, but OpenAI did not,” Nakamura said during his livestream. This was quite the reversal from earlier in the week.
The timing couldn’t have been worse for Elon Musk. After Grok’s strong early rounds, he’d posted on X that his AI’s chess abilities were just a “side effect” and that xAI had “spent almost no effort on chess.” That turned out to be an understatement.
Before this “official” chess tournament, International Master Levy Rozman hosted his own event earlier this year with less advanced models. He played out every move the chatbots recommended, and the whole thing devolved into a complete mess of illegal moves, pieces materializing out of nowhere, and botched calculations. Stockfish, a dedicated chess engine, ended up winning that tournament by beating ChatGPT in the final. Altman’s AI was matched against Musk’s in the semifinals there, and Grok lost. So it’s 2-0 for Sam.

However, this tournament was different. Each bot got four chances to make a legal move—if they failed four times, they automatically lost. This wasn’t hypothetical. In early rounds, AIs tried to teleport pieces across the board, bring dead pieces back to life, and move pawns sideways like they were playing some fever-dream version of chess they’d invented themselves.
They got disqualified.
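The forfeit rule is simple enough to state in code. A minimal sketch of the retry loop as described above, where `model` and `board` (and their `suggest_move` and `is_legal` methods) are hypothetical stand-ins for whatever the real tournament harness uses:

```python
MAX_ATTEMPTS = 4  # tournament rule: four tries to produce a legal move

def get_move(model, board):
    """Ask a model for a move, tolerating up to MAX_ATTEMPTS illegal suggestions.

    `model` and `board` are hypothetical stand-ins for the real harness.
    Returns the first legal move, or None to signal an automatic loss.
    """
    for _ in range(MAX_ATTEMPTS):
        move = model.suggest_move(board)  # hypothetical API
        if board.is_legal(move):          # hypothetical API
            return move
    return None  # four illegal attempts in a row: the model forfeits
```

Under a rule like this, teleporting a bishop or resurrecting a captured rook just burns one of the four attempts; burn all four and the game is over.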
Google’s Gemini grabbed third place by beating another OpenAI model, salvaging some dignity for the tournament organizers. That bronze medal match featured a particularly absurd drawn game where both AIs had completely winning positions at different points but couldn’t figure out how to finish.
Carlsen pointed out that the AIs were better at counting captured pieces than actually delivering checkmate—they understood material advantage but not how to win. It’s like being great at collecting ingredients but unable to cook a meal.
These are the same AI models that tech executives claim are approaching human intelligence, threatening white-collar jobs, and revolutionizing how we work. Yet they can’t play a board game that has existed for 1,500 years without trying to cheat or forgetting the rules.
So it’s probably safe to say we’re safe: AI won’t be taking control of humanity, at least for now.