This has been saving me a lot of time as well in a decade-old code base. I can paste a stack trace, provide additional relevant context, and ask the LLM to do a first-pass debug.
From that I usually get a list of file+lines to manually review, along with some initial leads to chase.
Another use case is when fixing performance issues. I can feature-flag my fix and ask the model to confirm the new code path will produce the same result for a given set of inputs. We also have test coverage for this kind of thing, but the LLM can do a once-over and point out some flaws before I ever run those tests.
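To make the feature-flag comparison concrete, here's a rough sketch of the pattern in Python. The flag, the function names, and the shadow check are all made up for illustration, not our actual code:

    import logging

    USE_FAST_PATH = False  # hypothetical flag; in practice read from config/env

    def compute_report_legacy(rows):
        # existing (slow) code path
        return sorted({r["user_id"] for r in rows if r["active"]})

    def compute_report_fast(rows):
        # optimized rewrite hidden behind the flag
        seen = set()
        for r in rows:
            if r["active"]:
                seen.add(r["user_id"])
        return sorted(seen)

    def compute_report(rows):
        if not USE_FAST_PATH:
            return compute_report_legacy(rows)
        result = compute_report_fast(rows)
        # shadow check during rollout: run both paths and log any divergence
        legacy = compute_report_legacy(rows)
        if result != legacy:
            logging.warning("fast path diverged from legacy on %d rows", len(rows))
        return result

The LLM gets both implementations plus sample inputs and is asked whether the two paths can diverge; the shadow check is the belt-and-suspenders version of the same question at runtime.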
I haven’t gotten to the point where it writes much code for me beyond the auto-complete, which has been a modest boost in efficiency.
Yeah. As a debugging aid, I think it's fairly solid at surfacing things to look at and fix manually. And when you do that, you actually want more false positives than false negatives, which plays to the strengths of an LLM: a false positive costs a minute of review, while a false negative is a missed bug.

When it comes to asking for rewrite suggestions, though, I have to go over its logic with a fine-tooth comb, because there are usually edge cases you can spot if you really think it through. I abhor its tendency to reach for try/catch. I've seen it write weird SQL joins that slow queries down by 30x. And I'd never trust it to debug a race condition, or to consider side effects outside the 30 LoC it's currently looking at.
I guess I wouldn't trust it to confirm that new code gives the same result, but it can't hurt to ask: if it told me the code wouldn't, that's a signal to look more closely.
I think as long as you treat it as part of a distillation process, aim for false positives, and never actually trust it, it's good at helping surface issues you may have missed.