I'm actually in awe. I wish I had lists like these for other "hot-button" issues where the common narrative is that things are constantly on the brink of some kind of catastrophe or resolution. Really puts things into perspective.
I recommend writing prompts for Gemini / ChatGPT along the lines of "as a history professor, put X in perspective. Compare the evidence for / against. Be sure to include grounding on each fact ..."
I think that mostly depends on how good a writer you are. A lot of people aren't, and the AI legitimately writes better. As in, the prose is easier to understand, free of obvious errors or ambiguities.
But then, the writing is also never great. I've tried a couple of times to get it to write in the style of a famous author, sometimes pasting in some example text to model the output on, but it never sounds right.
It depends how you define "good writing", which is too often associated with "proper language", and by extension with proper breeding. It is a class marker.
People have a distinct voice when they write, including (perhaps even especially) those without formal training in writing. That this voice is grating to the eyes of a well-educated reader is a feature that says as much about the reader as it does about the writer.
Funnily enough, professional writers have long recognised this, as is shown by the never-ending list of authors who tried to capture certain linguistic styles in their work, particularly in American literature.
There are situations where you may want this class marker to be erased, because being associated with a certain social class can have a negative impact on your social prospects. But it remains that something is being lost in the process, and that something is the personality and identity of the writer.
You may be in a bubble of smart, educated people. Either way, one of the key ways to "put in the effort" is practice. People who haven't practiced often don't write well even if they're trying hard in the moment. Not even in terms of beautiful writing, just pure comprehensibility.
I may be in a bubble of smart people, but IMO AI is consistently far worse than many high school works I've read in terms of actual substance and coherent structure.
Of course, I've had arguments where people praise AI output, then I've literally pointed out dozens of mistakes and they just kind of shrug, saying it's not important. So I acknowledge people judge writing very differently than I do. It just feels weird when I'd give something a 15% and someone else would happily slap on a B+.
With the gap between 1 and 2 being driven by the underlying quality of the writer and how well they use AI. A really good writer sees marginal improvements and a really poor one can see vast improvements.
I am really conflicted about this because yes, I think that an LLM can be an OK writing aid in utilitarian settings. It's probably not going to teach you to write better, but if the goal is just to communicate an idea, an LLM can usually help the average person express it more clearly.
But the critical point is that you need to stay in control. And a lot of people just delegate the entire process to an LLM: "here's a thought I had, write a blog post about it", "write a design doc for a system that does X", "write a book about how AI changed my life". Then they ship it, outsourcing to others the work of making sense of the output and catching errors.
It also results in the creation of content that, frankly, shouldn't exist because it has no reason to exist. The amount of online content that doesn't say anything at all has absolutely exploded in the past 2-3 years. Including a lot of LLM-generated think pieces about LLMs that grace the hallways of HN.
> A lot of people aren't, and the AI legitimately writes better.
It may write “objectively better”, but the very distinct feel of all AI generated prose makes it immediately recognizable as artificial and unbearable as a result.
Yes, it's been odd to observe the parallels with the web3 craze.
You asked people what their project was for and you'd get a response that made sense to no one outside of that bubble, and if you pressed on people would get mad.
The bizarre thing is that this time around, these tools do have a bunch of real utility, but it's become almost impossible online to discuss how to use the tech properly, because that would require acknowledging some limitations.
Very similar to web3! On paper the web3 craze sounded very exciting: yes, I absolutely would love an alternate web of truly decentralized services.
I've been pretty consistently skeptical of the crypto world, but with web3 I was really hoping to be wrong. What's wild is there was not a single, truly distributed, interesting/useful service at all to come out of all that hype. I spent a fair bit of time diving into the details of Ethereum and very quickly realized the "world computer" there (again, wonderful idea) wasn't really feasible for anything practical (I mean other than creating clever ways to scam people).
Right now in the LLM space I see a lot of people focused on building old things in new ways. I've realized that not only do very few people work with local models (where they can hack around and customize more), a surprisingly small number of people write code that even calls an LLM through an API for some specific task that previously wasn't possible (regular ol' software built using calls to an LLM has loads of potential). It's still largely "can some variation on a chat bot do this thing I used to do for me".
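To make that concrete, here's a minimal sketch of the kind of thing I mean: ordinary software that calls an LLM for one narrow task (tagging free-text support tickets) rather than acting as a chat bot. This assumes the OpenAI Python SDK and an API key in the environment; the model name, categories, and function name are just placeholders.

```python
# Minimal sketch: ordinary code that uses an LLM call for one specific task.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment;
# model name and categories are placeholders, not a recommendation.
from openai import OpenAI

client = OpenAI()

CATEGORIES = ["billing", "bug report", "feature request", "other"]

def categorize_ticket(ticket_text: str) -> str:
    """Ask the model to pick exactly one category for a support ticket."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": (
                    f"Classify the ticket into one of: {', '.join(CATEGORIES)}. "
                    "Reply with the category name only."
                ),
            },
            {"role": "user", "content": ticket_text},
        ],
        temperature=0,
    )
    answer = response.choices[0].message.content.strip().lower()
    # Fall back to "other" if the model replies with something unexpected.
    return answer if answer in CATEGORIES else "other"

if __name__ == "__main__":
    print(categorize_ticket("I was charged twice for my subscription last month."))
```

Nothing fancy, and that's the point: the LLM is just a function inside a regular program, doing a job that used to require hand-written heuristics or a labeled training set.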
As a contrast, in the early web, plenty of people were hosting their own website and messing around with all the basic tools available to see what novel thing they could create. I mean, "Hamster Dance" was its own sort of slop, but the first time you saw it you engaged with it. Snarg.net still stands out as novel in its experiments with "what is an interface".
>As a contrast, in the early web, plenty of people were hosting their own website, and messing around with all the basic tools available to see what novel thing they could create
I'm hoping that the centralized platforms, already full of slop and now imploding under LLM-fueled content, will overflow and lead to a renaissance of sorts for the small and open web, niche communities, and decoupling from big tech.
It's already gaining traction among the young, as far as I can see.
Walk! At 50 meters, you'll get there in under a minute on foot. Driving such a short distance wastes fuel, and you'd spend more time starting the car and parking than actually traveling. Plus, you'll need to be at the car wash anyway to pick up your car once it's done.
Am I the only one who thinks these people are monkey-patching embarrassments as they go? I remember the "r in strawberry" thing they suddenly were able to solve, only to then fail on raspberry.
Nah. It's just non-deterministic. I'm here 4 hours later and here's the Opus 4.6 (extended thinking) response I just got:
"At 50 meters, just walk. By the time you start the car, back out, and park again, you'd already be there on foot. Plus you'll need to leave the car with them anyway."
Sure, there are many very, very naive people who are so ignorant of the IT industry that they don't know about the decades of vendors caught monkey-patching and rigging benchmarks and tests for their systems, but even so, the parent is hardly the only one.
That's my thought too. The chatbot bros probably feel the need to be responsive, and there's probably an express lane to update a trivia file or something lol
No doubt about it, and there's no reason to suspect this can only ever apply to embarrassing minor queries, either.
Even beyond model alignment, it's not difficult to envision such capabilities being used for censorship, information operations, etc.
Every major inference provider more or less explicitly states in their consumer ToS that they comply with government orders and even share information with intelligence agencies.
Claude, Gemini, ChatGPT, etc are all one national security letter and gag order away from telling you that no, the president is not in the Epstein files.
Remember, the NSA already engaged in an unconstitutional criminal conspiracy (as ruled by a federal judge) to illegally conduct mass surveillance on the entire country, lie about it to the American people, and lie about it to congress. The same organization that used your tax money to bribe RSA Security to standardize usage of a backdoored CSPRNG in what at the time was a widely used cryptographic library. What's the harm in a little bit of minor political censorship compared to the unconstitutional treason these predators are usually up to?
That's who these inference providers contractually disclose their absolute fealty to.
I want to get my car's transmission fixed, the mechanic shop is 50 meters away. Would it be better to walk or to drive to a location 50 meters away from my front door?
-
Opus 4.6:
<thinking>
The user is asking whether to walk or drive 50 meters to a mechanic shop, but there's a key nuance here - their transmission needs fixing, so driving might not be wise.
Done
</thinking>
Walk, absolutely. Driving a car with a compromised transmission even 50 meters risks making the problem worse and turning a repair into a more expensive one. Plus, 50 meters is barely a minute on foot.
There's probably a comedy film with an AGI attempting to take over the world with its advanced grasp of strategy, persuasion and SAT tests whilst a bunch of kids confuse it by asking it fiendish brainteasers about carwashes and the number of rs in blackberry.
(The final scene involves our plucky escapees swimming across a river to escape. The AIbot conjures up a speedboat through sheer powers of deduction, but then just when all seems lost it heads back to find a goat to pick up)
This would work if it wasn’t for that lovely little human trait where we tend to find bumbling characters endearing. People would be sad when the AI lost.
In the excellent and underrated The Mitchells vs the Machines there's a running joke with a pug dog that sends the evil robots into a loop because they can't decide if it's a dog, a pig or a loaf of bread.
There is a Star Trek episode where a fiendish brainteaser was actually considered as a way to genocide an entire (cybernetic, not AI) race. In the end, Captain Picard chose not to deploy it.
One thing that my use of the latest and greatest models (Opus, etc.) has made clear: no matter how advanced the model, it is not beyond making very silly mistakes regularly. Opus was even working worse with tool calls than Sonnet and Haiku for a while for me.
At this point I am convinced that the only proper use of LLMs for development is to assist coding (not take it over), using pair development with them on a tight leash and approving most edits manually. At this point there is probably nothing anyone can say to convince me otherwise.
Any attempt to automate beyond that has never worked for me and is very unlikely to be productive any time soon. I have a lot of experience with them, and various approaches to using them.
I think this lack of 'G' (generality, or modality) is the problem. A human visualizes this kind of problem (a little video plays in my head of taking a car to a car wash). LLMs don't do this; they 'think' only in text, not visually.
A proper AGI would have to have knowledge in the video, image, audio and text domains to work properly.
4.6 Opus with extended thinking just now:
"At 50 meters, just walk. By the time you start the car, back out, and park again, you'd already be there on foot. Plus you'll need to leave the car with them anyway."
You're right, and I enjoy using coding agents too. I've built some things with them I wouldn't have otherwise.
However, it's been a full quarter now since November 2025.
Based on facts on the ground, i.e. the rate and quality of new software and features we observe, change has been nowhere as dramatic as your comment would suggest.
It seems to me that a possible explanation is that people get very excited about massive speedups in specific tasks, but the bottleneck of the system shifts somewhere else immediately (e.g, human capacity for learning, team coordination costs, communication delays).
That "full quarter" included the Christmas holidays for many people, during which not a lot of work gets done.
I think it's a bit early to expect to see huge visible output from these new tools. A lot of people are still spinning up on them - learning to use a coding agent effectively takes months.
And for people who are spun up, there's a lot more to shipping new features and products than writing the code. I expect we'll start to see companies ship features to customers that benefited from Opus 4.5/4.6 and Codex 5.2/5.3 over the next few months, but I'm not surprised there hasn't been a huge swell in stuff-that-shipped in just the ~10 weeks since those models became available.
There is one notable example that's captured the zeitgeist: https://github.com/openclaw/openclaw had its first commit on November 25th 2025, 3 months later it's had more than 10,000 commits from 600 contributors, attracted 196,000 stars and (kind-of) been featured in a Superbowl commercial (apparently that's what the AI.com thing was, if anyone could get the page to load - https://x.com/kris/status/2020663711015514399 )
One of the things about this story that doesn't sit right with me is how Scott and others in the GitHub comments seem to assign agency to the bot and engage with it.
It's a bot! The person running it is responsible. They did that, no matter how little or how much manual prompting went into this.
As long as you don't know who that is, ban it and get on with your day.
If past patterns are anything to go by, the complexity moves up to a different level of abstraction.
Don't take this as a concrete prediction - I don't know what will happen - but rather an example of the type of thing that might happen:
We might get much better tooling around rigorously proving program properties, and the best jobs in the industry will be around using them to design, specify and test critical systems, while the actual code that's executing is auto-generated. These will continue to be great jobs that require deep expertise and command excellent salaries.
At the same time, a huge population of technically-interested-but-not-that-technical workers builds casual no-code apps, and the stereotypical CRUD developer just goes extinct.
None of this works if the testers are collaborating with the trainers. The tests ostensibly need to be arms-length from the training. If the trainers ever start over-fitting to the test, the tester would come up with some new test secretly.