A few hours of playing around and I'm suitably impressed.
Claude 4.5 Sonnet definitely struggles with Swift 6.2 Concurrency semantics and has several times gotten itself stuck rather badly. Additionally Claude Code has developed a number of bugs, including rapidly re-scrolling the terminal buffer, pegging local CPU to 100%, and consuming vast amounts of RAM. Codex CLI was woefully behind a few months ago and, despite overly conservative out-of-the-box sandbox settings, has quite caught up to Claude Code. (Gemini CLI is an altogether embarrassing experience, but Google did just put a solid PM behind it and 3.0 Pro should be out this month if we're lucky.)
Codex with 5.1 high managed to thoughtfully paw through the documentation and source code and - with a little help pulling down parts of the Swift Book - managed to correctly resolve the issue.
I remember getting the thread manager right being one of the harder parts of my operating systems course doing an undergrad in computer science; testing threaded programs has always been a challenge. It's a strange circle-of-life moment to realize that what was hard for undergrads also serves as a benchmark for coding agents!
"including rapidly re-scrolling the terminal buffer"
Yes this bug is brutal.
"consuming vast amounts of RAM"
Also this. Claude will leave hanging instances all the time. If you check your task manager after a few days of using it without doing a full reset you'll see a number of hanging Claude processes using up 400 mb of RAM.
Claude actually has a huge number of very painful bugs. I'm aware of at least a dozen.
Claude 4.5 Sonnet definitely struggles with Swift 6.2 Concurrency semantics and has several times gotten itself stuck rather badly. Additionally Claude Code has developed a number of bugs, including rapidly re-scrolling the terminal buffer, pegging local CPU to 100%, and consuming vast amounts of RAM. Codex CLI was woefully behind a few months ago and, despite overly conservative out-of-the-box sandbox settings, has quite caught up to Claude Code. (Gemini CLI is an altogether embarrassing experience, but Google did just put a solid PM behind it and 3.0 Pro should be out this month if we're lucky.)
Codex with 5.1 high managed to thoughtfully paw through the documentation and source code and - with a little help pulling down parts of the Swift Book - managed to correctly resolve the issue.
I remember getting the thread manager right being one of the harder parts of my operating systems course doing an undergrad in computer science; testing threaded programs has always been a challenge. It's a strange circle-of-life moment to realize that what was hard for undergrads also serves as a benchmark for coding agents!