crustycoder · 2026-04-04T16:41:17 1775320877

A timely link - I've just spent the last week failing to get a ChatGPT Skill to produce a reproducible management reporting workflow. I've figured out why, and this article pretty much confirms my conclusions about the strengths & weaknesses of "pure" LLMs, and how to work around them. This article is for a slightly different problem domain, but the general problems and the architecture needed to address them seem very similar.

crustycoder · 2026-04-04T16:33:51 1775320431

"SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6"

I know virtually nothing about this area but my naive take is that something that means it still only passes tests around half the time doesn't seem like a particularly big jump forwards.

What am I missing?

SEMW · 2026-04-04T17:50:13 1775325013

There's no shortage of benchmarks (coding or otherwise) that any competent coding model will now pass with ~100%.

But no-one quotes those any more because if everyone passes them, they don't serve any useful purpose in discriminating between different models or identifying advancements.

So people switch to new benchmarks which either have more difficult tasks or some other artificial constraints that make them in some way harder to pass, until the scores are low enough that they're actually discriminating between models. And a 50% score is in some sense ideal for that - there's lots of room for variance around 50%.

(whether the thing they're measuring is something that well correlates to real coding performance is another question)

So you can't infer anything in isolation from a given benchmark score being only 50%, other than that benchmarks are calibrated to make such scores the likely outcome.

crustycoder · 2026-04-04T20:33:13 1775334793

So it's the relative and not the absolute diff that matters - thanks.

martinrolph · 2026-04-05T08:09:41 1775376581

Think of it less like a test suite and more like an exam. If you're trying to differentiate between the performance of different people/systems/models, you need to calibrate the difficulty accordingly.

When designing a benchmark, a pass rate of roughly 50% is useful because it gives you the most information about the relative performance of different models. If the pass rate is 90%+ too often, that means the test is too easy: you're wasting questions asking the model to do things we already know it can do, and getting no extra information. And if it's too low then you're wasting questions at the other end, trying to make it do impossible tasks.
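A rough way to see this, as a sketch only: assume each question is an independent pass/fail trial with pass probability p (my simplification, real benchmarks are messier). The information you get per question is then the binary entropy of p, which peaks at p = 0.5 and drops towards zero as the pass rate approaches 0% or 100%.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a pass/fail outcome with pass probability p."""
    if p in (0.0, 1.0):
        return 0.0  # outcome is certain, so the question tells us nothing
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Compare a hard benchmark, a well-calibrated one, and two easy ones.
for p in (0.1, 0.5, 0.9, 0.99):
    print(f"pass rate {p:.2f}: {binary_entropy(p):.3f} bits per question")
```

The curve is symmetric, so a 10% pass rate is as uninformative as a 90% one, which matches the "wasting questions at the other end" point.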

crustycoder · 2026-02-24T22:37:03 1771972623

Things have moved on since 1994. Not only can you still embed it in C and a load of other languages, you can even run it directly in your browser as there's a WASM port.

https://www.swi-prolog.org/pldoc/man?section=wasm-version

crustycoder · 2026-02-21T22:42:15 1771713735

Dead? What, again?

Yet another rehash of the smoke and mirrors bullshit I've been hearing every 5 years or so for the last 40+ years.

rapnie · 2026-02-21T22:47:15 1771714035

I would formulate it as: The software development lifecycle is inevitable, or you will not have any software. The lifecycle is just not acknowledged and thus implicit to many people. If you hack in Notepad, FTP it to your webserver, then your lifecycle lasts till you switch it all off. A simple lifecycle, but unavoidable to have one.

crustycoder · 2026-02-02T09:58:01 1770026281

You are using mutexes: they are on the Actor message queues, amongst other places. "Just use mutexes" suggests a lack of experience of using them, as they are very difficult to get both correct and scalable. By keeping them inside the Actor system, a lot of complexity is removed from the layers above. Actors are not always the right choice, but when they are, they are a very useful and simplifying abstraction.

Horses for courses, as they say.

b33j0r · 2026-02-02T14:08:11 1770041291

Lock-free queues and 16-core processors exist though. I use actors for the abstraction primarily anyway.

koakuma-chan · 2026-02-04T00:53:32 1770166412

Can you share some insights why mutexes are difficult to get correct and scalable?

crustycoder · 2026-02-02T09:51:39 1770025899

Eh?

I've written a non-distributed app that uses the Actor model and it's been very successful. It concurrently collects data from hundreds of REST endpoints, a typical run may make 500,000 REST requests, with 250 actors making simultaneous requests - I've tested with 1,000 but that tends to pound the REST servers into the ground. Any failed requests are re-queued. The requests aren't independent, request type C may depend on request types A & B being completed first as it requires data from them, so there's a declarative dependency graph mechanism that does the scheduling.
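To give a flavour of the dependency-driven scheduling with re-queueing, here is a minimal Python sketch. It is not the app's actual code: the function name `run_in_dependency_order`, the `fetch` callback and the `max_attempts` limit are all made up for illustration, and it runs requests one at a time rather than across 250 actors so the ordering logic stays visible. It leans on the standard library's `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter


def run_in_dependency_order(deps, fetch, max_attempts=3):
    """Run each request type once all the types it depends on have completed.

    deps maps a request type to the set of types it needs first.
    fetch(name, results) is a hypothetical callback that returns the data
    for `name` or raises an exception on failure.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results, attempts = {}, {}
    pending = list(sorter.get_ready())  # request types with no unmet dependencies
    while pending:
        name = pending.pop(0)
        attempts[name] = attempts.get(name, 0) + 1
        try:
            results[name] = fetch(name, results)
        except Exception:
            if attempts[name] >= max_attempts:
                raise  # give up after repeated failures
            pending.append(name)  # re-queue the failed request
            continue
        sorter.done(name)  # may unblock dependents
        pending.extend(sorter.get_ready())
    return results


if __name__ == "__main__":
    # C needs data from A and B, as in the description above.
    deps = {"A": set(), "B": set(), "C": {"A", "B"}}
    print(run_in_dependency_order(deps, lambda name, results: f"data for {name}"))
```

In a concurrent version the `pending` list would be a shared work queue that the actor pool pulls from, with `done` and `get_ready` called as results come back.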

I started off using Akka but then the license changed and Pekko wasn't a thing yet, so I wrote my own single-process minimalist Actor framework - I only needed message queues, actor pools & supervision to handle scheduling and request failures, so that's all I wrote. It can easily handle 1m messages a second.
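For anyone wondering how small "minimalist" can be, this is roughly the shape of the core idea in Python. It is a sketch under my own assumptions, not the framework described above: the `Actor` class, its `send`/`stop` methods and the log-and-continue error handling are illustrative names and choices, with one thread per actor and no pools or real supervision.

```python
import queue
import threading


class Actor:
    """One thread draining one mailbox; the handler only ever runs on that thread."""

    _STOP = object()  # sentinel message that shuts the actor down

    def __init__(self, handler):
        self._mailbox = queue.Queue()  # queue.Queue does its own locking internally
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Enqueue a message; safe to call from any thread."""
        self._mailbox.put(message)

    def stop(self):
        """Process everything already queued, then stop and wait for the thread."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                return
            try:
                self._handler(message)
            except Exception as exc:
                # Crude stand-in for supervision: report the failure and carry on.
                print(f"handler failed on {message!r}: {exc}")
```

The only lock in sight is the one hidden inside the mailbox, so a handler can update its own state without any explicit synchronisation, however many threads are calling `send`.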

I have no idea why that's a "huge dead end", Actors are a model that's a very close fit to my use case, why on earth wouldn't I use it? That "nurseries" link is way TL;DR but it appears to be rubbishing other options in order to promote its particular model. The level of concurrency it provides seems to be very limited and some of it is just plain wrong - "in most concurrency systems, unhandled errors in background tasks are simply discarded". Err, no.

Big Rule 0: No Dogmas: Use The Right Tool For The Job.

kibwen · 2026-02-02T15:24:40 1770045880

> That "nurseries" link is way TL;DR

Please read and understand that blog post, I promise it's worth your time.

crustycoder · 2026-02-02T22:28:00 1770071280

Um, no I won't and no it won't. I have no time for tub-thumping.

crustycoder · 2026-01-11T15:46:22 1768146382

Or perhaps just use a language that's designed to solve those sorts of problems? In 14 lines of code.

https://www.swi-prolog.org/pldoc/man?section=clpfd-sudoku

cenamus · 2026-01-11T17:30:27 1768152627

Is there a similarly short/simple solution not using all of the built-ins? Haven't worked with Prolog in a while, but it should be easy enough with primitives (albeit with more duplication)?

crustycoder · 2026-01-11T22:31:52 1768170712

Well no, not really. The whole point is to use the appropriate tool for the task at hand. In this case it's the CLP(FD) library, https://www.swi-prolog.org/pldoc/man?section=clpfd

nurettin · 2026-01-11T18:00:05 1768154405

Why not just

    blocks(Rows, Blocks), maplist(all_distinct, Blocks), maplist(label, Rows)

crustycoder · 2025-11-23T23:18:27 1763939907

He hates on C++ pretty much the same as he does on Rust. Your argument seems to be that Rust is better than C++, which is akin to trying to make the case that Cholera is better than Smallpox.

Language wars are boring and pointless, they all have areas of suckage. The right approach is to pick whichever one is the least worst for the job at hand.

crustycoder · 2025-09-20T23:16:03 1758410163

The other reason jump changes are not a revolution and have remained just a curiosity is physics, something that's ignored by the article.

For non-ringers, in change ringing the bells rotate 360 degrees each time they strike, from mouth up to mouth up. The clapper hits the bell when it has rotated roughly 270 degrees from mouth up and is more or less horizontal, approximately 2 seconds after it starts moving. The bells are usually in the 100kg to 1000kg range (for US folks, that's 220lb to 2200lb), although they can be up to 4000kg. The only point when the ringer can exert control on the bell via the rope is when it is near the balance and mouth upwards, and speeding it up or slowing it down by any more than one "beat" is physically very difficult on heavier bells, particularly if you are doing it for a full peal, which usually takes 3+ hours.

About the least important thing in the 2022 rules changes (https://framework.cccbr.org.uk/version2) was the allowing of jump changes.

p.s. there's a split-screen video showing the ringer and the bell he's ringing here: https://youtu.be/qrdLP15Xsuk?t=67

crustycoder · 2025-09-20T22:52:40 1758408760

Here you go: https://www.whitingsociety.org.uk/articles/basic-tuition/ita...

There are a fair few videos on YouTube as well.

dcminter · 2025-09-20T23:01:07 1758409267

That link (like the Wikipedia article) is talking about the mechanism by which the bell is rung, not what is rung out on them i.e. they are not ringing the changes.

crustycoder · 2025-09-20T23:19:23 1758410363

It's got compositions on the page, a link to a PDF with compositions in it and a link to the Veronese ringing association which has many more examples - if you can read Italian.

dcminter · 2025-09-21T09:51:51 1758448311

Ah, I stand corrected. I couldn't listen to the examples last night so I have egg on my face this morning :)

ajb · 2025-09-21T11:26:45 1758454005

I hadn't completely checked it either FWIW so worth asking the question.

It does sound slightly different as they use chords.