I'm actually in awe. I wish I had lists like these for other "hot-button" issues where the common narrative is that things are constantly on the brink of some kind of catastrophe or resolution. Really puts things into perspective.
I recommend writing prompts for Gemini / ChatGPT along the lines of "as a history professor, put X in perspective. Compare the evidence for / against. Be sure to include grounding on each fact ..."
I think that mostly depends on how good a writer you are. A lot of people aren't, and the AI legitimately writes better. As in, the prose is easier to understand, free of obvious errors or ambiguities.
But then, the writing is also never great. I've tried a couple of times to get it to write in the style of a famous author, sometimes pasting in some example text to model the output on, but it never sounds right.
It depends how you define "good writing", which is too often associated with "proper language", and by extension with proper breeding. It is a class marker.
People have a distinct voice when they write, including (perhaps even especially) those without formal training in writing. That this voice is grating to the eyes of a well-educated reader is a feature that says as much about the reader as it does about the writer.
Funnily enough, professional writers have long recognised this, as is shown by the never-ending list of authors who tried to capture certain linguistic styles in their work, particularly in American literature.
There are situations where you may want this class marker to be erased, because being associated with a certain social class can have a negative impact on your social prospects. But it remains that something is being lost in the process, and that something is the personality and identity of the writer.
You may be in a bubble of smart, educated people. Either way, one of the key ways to "put in the effort" is practice. People who haven't practiced often don't write well even if they're trying hard in the moment. Not even in terms of beautiful writing, just pure comprehensibility.
I may be in a bubble of smart people, but IMO AI is consistently far worse than many high school works I've read in terms of actual substance and coherent structure.
Of course, I've had arguments where people praise AI output, then I've literally pointed out dozens of mistakes and they just kind of shrug, saying it's not important. So I acknowledge people judge writing very differently than I do. It just feels weird when I'd give something a 15% and someone else would happily slap on a B+.
With the gap between 1 and 2 being driven by the underlying quality of the writer and how well they use AI. A really good writer sees marginal improvements and a really poor one can see vast improvements.
I am really conflicted about this because yes, I think that an LLM can be an OK writing aid in utilitarian settings. It's probably not going to teach you to write better, but if the goal is just to communicate an idea, an LLM can usually help the average person express it more clearly.
But the critical point is that you need to stay in control. And a lot of people just delegate the entire process to an LLM: "here's a thought I had, write a blog post about it", "write a design doc for a system that does X", "write a book about how AI changed my life". Then they ship it, outsourcing to others the work of making sense of the output and catching errors.
It also results in the creation of content that, frankly, shouldn't exist because it has no reason to exist. The amount of online content that doesn't say anything at all has absolutely exploded in the past 2-3 years. Including a lot of LLM-generated think pieces about LLMs that grace the hallways of HN.
> A lot of people aren't, and the AI legitimately writes better.
It may write “objectively better”, but the very distinct feel of all AI generated prose makes it immediately recognizable as artificial and unbearable as a result.
Yes, it's been odd to observe the parallels with the web3 craze.
You asked people what their project was for and you'd get a response that made sense to no one outside of that bubble, and if you pressed on people would get mad.
The bizarre thing is that this time around, these tools do have a bunch of real utility, but it's become almost impossible online to discuss how to use the tech properly, because that would require acknowledging some limitations.
Very similar to web3! On paper the web3 craze sounded very exciting: yes, I absolutely would love an alternate web of truly decentralized services.
I've been pretty consistently skeptical of the crypto world, but with web3 I was really hoping to be wrong. What's wild is there was not a single, truly distributed, interesting/useful service at all to come out of all that hype. I spent a fair bit of time diving into the details of Ethereum and very quickly realized the "world computer" there (again, wonderful idea) wasn't really feasible for anything practical (I mean other than creating clever ways to scam people).
Right now in the LLM space I see a lot of people focused on building old things in new ways. I've realized that not only do very few people work with local models (where they can hack around and customize more), a surprisingly small number of people write code that even calls an LLM through an API for some specific task that previously wasn't possible (regular ol' software built using calls to an LLM has loads of potential). It's still largely "can some variation on a chat bot do this thing I used to do for me".
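To make that concrete, here's a minimal sketch of the kind of thing I mean: ordinary software that calls an LLM for one narrow task (tagging free-text support tickets) rather than acting as a chat bot. This assumes the OpenAI Python SDK and an API key in the environment; the model name, categories, and function name are just placeholders.

```python
# Minimal sketch: ordinary code that uses an LLM call for one specific task.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment;
# model name and categories are placeholders, not a recommendation.
from openai import OpenAI

client = OpenAI()

CATEGORIES = ["billing", "bug report", "feature request", "other"]

def categorize_ticket(ticket_text: str) -> str:
    """Ask the model to pick exactly one category for a support ticket."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": (
                    f"Classify the ticket into one of: {', '.join(CATEGORIES)}. "
                    "Reply with the category name only."
                ),
            },
            {"role": "user", "content": ticket_text},
        ],
        temperature=0,
    )
    answer = response.choices[0].message.content.strip().lower()
    # Fall back to "other" if the model replies with something unexpected.
    return answer if answer in CATEGORIES else "other"

if __name__ == "__main__":
    print(categorize_ticket("I was charged twice for my subscription last month."))
```

Nothing fancy, and that's the point: the LLM is just a function inside a regular program, doing a job that used to require hand-written heuristics or a labeled training set.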
As a contrast, in the early web, plenty of people were hosting their own website and messing around with all the basic tools available to see what novel thing they could create. I mean, "Hamster Dance" was its own sort of slop, but the first time you saw it you engaged with it. Snarg.net still stands out as novel in its experiments with "what is an interface".
>As a contrast, in the early web, plenty of people were hosting their own website, and messing around with all the basic tools available to see what novel thing they could create
I'm hoping that the centralized platforms, already full of slop and now imploding under LLM-fueled content, will overflow and lead to a renaissance of sorts for the small and open web, niche communities, and decoupling from big tech.
It's already gaining traction among the young, as far as I can see.
Walk! At 50 meters, you'll get there in under a minute on foot. Driving such a short distance wastes fuel, and you'd spend more time starting the car and parking than actually traveling. Plus, you'll need to be at the car wash anyway to pick up your car once it's done.
Am I the only one who thinks these people are monkey-patching embarrassments as they go? I remember the "r in strawberry" thing they suddenly were able to solve, only to then fail on raspberry.
Nah. It's just non-deterministic. I'm here 4 hours later and here's the Opus 4.6 (extended thinking) response I just got:
"At 50 meters, just walk. By the time you start the car, back out, and park again, you'd already be there on foot. Plus you'll need to leave the car with them anyway."
Sure, there are many very, very naive people who are so ignorant of the IT industry that they don't know about the decades of vendors caught monkey-patching and rigging benchmarks and tests for their systems, but even so, the parent is hardly the only one.
That's my thought too. The chatbot bros probably feel the need to be responsive, and there's probably an express lane to update a trivia file or something lol
No doubt about it, and there's no reason to suspect this can only ever apply to embarrassing minor queries, either.
Even beyond model alignment, it's not difficult to envision such capabilities being used for censorship, information operations, etc.
Every major inference provider more or less explicitly states in their consumer ToS that they comply with government orders and even share information with intelligence agencies.
Claude, Gemini, ChatGPT, etc are all one national security letter and gag order away from telling you that no, the president is not in the Epstein files.
Remember, the NSA already engaged in an unconstitutional criminal conspiracy (as ruled by a federal judge) to illegally conduct mass surveillance on the entire country, lie about it to the American people, and lie about it to congress. The same organization that used your tax money to bribe RSA Security to standardize usage of a backdoored CSPRNG in what at the time was a widely used cryptographic library. What's the harm in a little bit of minor political censorship compared to the unconstitutional treason these predators are usually up to?
That's who these inference providers contractually disclose their absolute fealty to.
I want to get my car's transmission fixed, the mechanic shop is 50 meters away. Would it be better to walk or to drive to a location 50 meters away from my front door?
-
Opus 4.6:
<thinking>
The user is asking whether to walk or drive 50 meters to a mechanic shop, but there's a key nuance here - their transmission needs fixing, so driving might not be wise.
Done
</thinking>
Walk, absolutely. Driving a car with a compromised transmission even 50 meters risks making the problem worse and turning a repair into a more expensive one. Plus, 50 meters is barely a minute on foot.
There's probably a comedy film with an AGI attempting to take over the world with its advanced grasp of strategy, persuasion and SAT tests whilst a bunch of kids confuse it by asking it fiendish brainteasers about carwashes and the number of rs in blackberry.
(The final scene involves our plucky escapees swimming across a river to escape. The AIbot conjures up a speedboat through sheer powers of deduction, but then just when all seems lost it heads back to find a goat to pick up)
This would work if it wasn’t for that lovely little human trait where we tend to find bumbling characters endearing. People would be sad when the AI lost.
In the excellent and underrated The Mitchells vs the Machines there's a running joke with a pug dog that sends the evil robots into a loop because they can't decide if it's a dog, a pig or a loaf of bread.
There is a Star Trek episode where a fiendish brainteaser was actually considered as a way to genocide an entire (cybernetic, not AI) race. In the end, Captain Picard chose not to deploy it.
One thing that my use of the latest and greatest models (Opus, etc.) has made clear: no matter how advanced the model, it is not beyond making very silly mistakes regularly. Opus was even working worse with tool calls than Sonnet and Haiku for a while for me.
At this point I am convinced that the only proper use of LLMs for development is to assist coding (not take it over), using pair development with them on a tight leash and approving most edits manually. At this point there is probably nothing anyone can say to convince me otherwise.
Any attempt to automate beyond that has never worked for me and is very unlikely to be productive any time soon. I have a lot of experience with them, and various approaches to using them.
I think this lack of 'G' (generality, or modality) is the problem. A human visualizes this kind of problem (a little video plays in my head of taking a car to a car wash). LLMs don't do this; they 'think' only in text, not visually.
A proper AGI would have to have knowledge in the video, image, audio and text domains to work properly.
4.6 Opus with extended thinking just now:
"At 50 meters, just walk. By the time you start the car, back out, and park again, you'd already be there on foot. Plus you'll need to leave the car with them anyway."
You're right, and I enjoy using coding agents too. I've built some things with them I wouldn't have otherwise.
However, it's been a full quarter now since November 2025.
Based on facts on the ground, i.e. the rate and quality of new software and features we observe, change has been nowhere as dramatic as your comment would suggest.
It seems to me that a possible explanation is that people get very excited about massive speedups in specific tasks, but the bottleneck of the system shifts somewhere else immediately (e.g, human capacity for learning, team coordination costs, communication delays).
That "full quarter" included the Christmas holidays for many people, during which not a lot of work gets done.
I think it's a bit early to expect to see huge visible output from these new tools. A lot of people are still spinning up on them - learning to use a coding agent effectively takes months.
And for people who are spun up, there's a lot more to shipping new features and products than writing the code. I expect we'll start to see companies ship features to customers that benefited from Opus 4.5/4.6 and Codex 5.2/5.3 over the next few months, but I'm not surprised there hasn't been a huge swell in stuff-that-shipped in just the ~10 weeks since those models became available.
There is one notable example that's captured the zeitgeist: https://github.com/openclaw/openclaw had its first commit on November 25th 2025, 3 months later it's had more than 10,000 commits from 600 contributors, attracted 196,000 stars and (kind-of) been featured in a Superbowl commercial (apparently that's what the AI.com thing was, if anyone could get the page to load - https://x.com/kris/status/2020663711015514399 )
One of the things about this story that doesn't sit right with me is how Scott and others in the GitHub comments seem to assign agency to the bot and engage with it.
It's a bot! The person running it is responsible. They did that, no matter how little or how much manual prompting went into this.
As long as you don't know who that is, ban it and get on with your day.
If past patterns are anything to go by, the complexity moves up to a different level of abstraction.
Don't take this as a concrete prediction - I don't know what will happen - but rather an example of the type of thing that might happen:
We might get much better tooling around rigorously proving program properties, and the best jobs in the industry will be around using them to design, specify and test critical systems, while the actual code that's executing is auto-generated. These will continue to be great jobs that require deep expertise and command excellent salaries.
At the same time, a huge population of technically-interested-but-not-that-technical workers builds casual no-code apps, and the stereotypical CRUD developer just goes extinct.
None of this works if the testers are collaborating with the trainers. The tests ostensibly need to be arms-length from the training. If the trainers ever start over-fitting to the test, the tester would come up with some new test secretly.