I responded with a mix of mostly B and C answers and got “advanced.” Yet, as pointed out by another commenter, selecting all D answers (which would make you an expert!) gets you called a beginner.
I can only assume the quiz itself was vibe-coded and not tested. What an incredible time we live in.
Or it's accounting for the Dunning-Kruger effect: if you think you're an expert in all cases, you're really a beginner in everything.
I'm a beginner with agentic coding. I vibe code something most days, from a few lines up to refactors over a few files. I don't knowingly use skills, rarely _choose_ to call out to tools, haven't written any skills and only one or two ad hoc scripts, and have barely touched MCPs (because the few I've used seem flaky and erratic). I answered as such and got... intermediate.
> For tasks that would take a human under four minutes—small bug fixes, boilerplate, simple implementations—AI can now do these with near-100% success. For tasks that would take a human around one hour, AI has a roughly 50% success rate. For tasks over four hours, it comes in below a 10% success rate.
Opus 4.6 now does 12hr tasks with 50% success. The METR time horizon chart is insane… exponential progression.
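The "exponential progression" claim is easy to sanity-check with arithmetic. A minimal sketch of the trend — note the 7-month doubling period is an illustrative assumption here, not a figure stated in this thread:

```python
import math

# Back-of-envelope for an exponential time-horizon trend.
# ASSUMPTION: the 50%-success task horizon doubles roughly every
# 7 months; the exact doubling period is illustrative, not a
# number taken from this discussion.
def projected_horizon(h0_minutes: float, months_elapsed: float,
                      doubling_months: float = 7.0) -> float:
    """Horizon after `months_elapsed` months, starting from `h0_minutes`."""
    return h0_minutes * 2 ** (months_elapsed / doubling_months)

# How far is it from a 1-hour horizon to a 12-hour horizon?
doublings = math.log2(720 / 60)   # ~3.58 doublings
months = doublings * 7.0          # ~25 months at the assumed rate
print(f"{doublings:.2f} doublings, ~{months:.0f} months")
```

Under that assumed rate, going from one-hour tasks to twelve-hour tasks takes only a couple of years, which is why the chart looks so steep.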
Really depends on what you're working in. I work with a lot of data frameworks that are maybe underrepresented in these models' training sets, and they still tend to get things wrong. The other issue is that business logic is complex to describe in a prompt, to the point where giving the model all the context and business logic it needs to succeed is almost as much work as doing it myself. As a data engineer, I still only find models useful for small chunks of code or for filling in tedious boilerplate to get things moving.
Agreed. For common use cases, like creating a simple LMS, Opus is shockingly good, saving hours upon hours of reinventing the wheel. For other things, like simple queries to and interactions with our ERP system, it is still quite poor, and it increases development time rather than shortening it.
Just anecdotal but I work on some fairly left field service architectures; today it was a highly parallelized state machine processor operating on an in-house binary protocol.
Opus 4.6 had no issue correctly identifying and mitigating a hairy out-of-order state corruption issue involving a non-trivial sequence of runtime conditions from thrown errors and failed recoveries. This was simply from having access to the code repository and a brief description of the observed behavior that I provided. Naturally I verified it wasn't bullshitting me, and sure enough it was correct. Impressive really, given none of the specifics could have been in its training set, but I guess we're finding that nothing really is "new", just a remix of what's come before in various recombinations.
How is success defined in those metrics? Is success "perfect - can deploy to prod immediately" or "saved some arbitrary amount of engineering time"?
Anecdotal experience from my team of 15 engineers: we rarely get "perfect," but we do get enough to realize massive time savings across several common problem domains.
Being an effective market doesn’t mean you get everything you want.
You’re actually saying: “I want Apple’s software, and I want certain chips, and I want a certain form factor. And if Apple won’t build what I want, I will pass a law to make them build it for me!”
Come on man. You will make tradeoffs either way. The answer isn’t: force a company to build what I want them to build.
Well, another version of it is: I want to be able to talk to my family, but I don't want to buy an iPhone. The EU rightly regulated that any chat network big enough must open its doors to different platforms. Or: I don't want to buy Microsoft Office for my employees, but I want to be able to do business with those who do, and thankfully we have relatively open document formats now.
The chips argument is contrived, the OS argument less so, but it's all just network effects at some level, and it's important for competition and effective markets that we prevent the largest networks from locking people in and forcing them to make a lot of other unrelated decisions.
Take iMessage being a closed ecosystem: Apple finally added RCS support, but only after regulatory pressure.
To not recognise this as a limitation is to be wilfully blind to network effects. The "green bubbles" issue was a huge issue in the US. Similarly, WhatsApp not being open is a huge problem in forcing people onto Meta's platforms.
Do you believe that states are the laboratories of democracy, and have rights, or do you believe that reducing the cost of regulatory compliance is a more important goal?
I take no position on this currently, but it's an important question that deserves a serious answer. Trading off the costs of "state experimentation" and "enforced regulatory conformity" is non-trivial to do.
To be clear, I wasn't exclusively referring to government. I was actually only thinking of the use of git-like version control across a number of different domains: law, design, book writing, architecture, etc.
For example, there are thousands of divisions of government out there provisioning largely the same systems in duplicate. E.g. the local government here has a web portal for sports venue bookings like pools and tennis courts, a waste collection portal, and a local tax portal.
Only recently has this been slightly standardized but even those efforts are purely regional. You might get 5 local councils in the city using one SaaS platform, another 5 using another SaaS platform, and another 5 rolling their own. For each function of local government.
Never mind the fact that a local government like this in France probably has very similar needs to one in Belgium or even the US.
And the worst part is they are terrible at procurement so even when they do consolidate, they're basically getting scammed.
I often think about starting a cost-plus-priced open core project to deal with these issues. Like we build common government functions, and sell it for cost plus 20% markup, with a licence that lets the gov run it themselves if we ever go bust. But then I think procurement is largely a grift game and it might not do well for that reason.
Wouldn’t consolidation lead to monopoly? If 50 local governments use the same SaaS/vendor, the 51st local government would likely go for the same vendor just because 50 others used that vendor before them, no? What prevents the vendor from jacking up prices or general enshittification at that stage?
> What prevents the vendor from jacking up prices or general enshittification at that stage?
Well, what I'm proposing building would be source-available and licensed such that the gov can run it themselves if it ever gets too expensive. The sub-gov entities should really band together for the negotiation, though; then they can ask for whatever they want: non-profit vendors, liberal licensing, price agreements. A collective of government buyers forms basically a monopsony larger than any individual vendor could ever be.
How is it not? It reads to me as them saying that all these devs have deskilled from "barely competent" to "completely helpless". Or is your claim that they were actually really good devs, and the deskilling has been even more intense than I'm picturing?
My personal experience is we're seeing a magnification of results. The slog is reading hundreds of files, updating active code to remove some old function from 100k lines of code. Last week, a modification that, while trivial, would have taken weeks by hand was done by AI agents, which corrected the code with 100% verified accuracy in 20 minutes.
Yes, and they work really well for small side projects that an exec probably used to try out the LLM.
But writing code in one clean discrete repo is (esp. at a large org) only a part of shipping something.
Over time, I think tooling will get better at the pieces surrounding writing the code though. But the human coordination / dependency pieces are still tricky to automate.
“S&P had its worst X since Y”
Worst quarter in four years. Worst week since 2018. Worst 3 days since 2008.
It’s all kind of silly.