A timely link - I've just spent the last week failing to get a ChatGPT Skill to produce a reproducible management reporting workflow. I've figured out why, and this article pretty much confirms my conclusions about the strengths & weaknesses of "pure" LLMs and how to work around them. The article covers a slightly different problem domain, but the general problems and the architecture needed to address them seem very similar.
"SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6"
I know virtually nothing about this area but my naive take is that something that means it still only passes tests around half the time doesn't seem like a particularly big jump forwards.
There's no shortage of benchmarks (coding or otherwise) that any competent coding model will now pass with ~100%.
But no-one quotes those any more because if everyone passes them, they don't serve any useful purpose in discriminating between different models or identifying advancements.
So people switch to new benchmarks which either have more difficult tasks or some other artificial constraints that make them in some way harder to pass, until the scores are low enough that they're actually discriminating between models. And a 50% score is in some sense ideal for that - there's lots of room for variance around 50%.
(Whether the thing they're measuring correlates well with real coding performance is another question.)
So you can't infer anything in isolation from a given benchmark score being only 50%, other than that benchmarks are calibrated to make such scores the likely outcome.
Think of it less like a test suite and more like an exam. If you're trying to differentiate between the performance of different people/systems/models, you need to calibrate the difficulty accordingly.
When designing a benchmark, a pass rate of roughly 50% is useful because it gives you the most information about the relative performance of different models. If the pass rate is 90%+ too often, that means the test is too easy: you're wasting questions asking the model to do things we already know it can do, and getting no extra information. And if it's too low then you're wasting questions at the other end, trying to make it do impossible tasks.
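To put a rough number on that, here is a minimal sketch that treats each benchmark question as an independent pass/fail coin flip and computes its Shannon entropy. That is my simplification, not how any particular benchmark is actually designed, and `bits_per_question` is just a name I made up for illustration.

```python
import math

def bits_per_question(p: float) -> float:
    """Shannon entropy, in bits, of a pass/fail outcome with pass probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome tells you nothing new
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Compare a too-hard, a well-calibrated and a too-easy question.
for p in (0.05, 0.5, 0.95):
    print(f"pass rate {p:.2f}: {bits_per_question(p):.3f} bits per question")
```

Under that assumption a question passed half the time carries a full bit of information, while one passed 95% of the time (or 5% of the time) carries well under a third of a bit.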
Things have moved on since 1994. Not only can you still embed it in C and a load of other languages, you can even run it directly in your browser, as there's a WASM port.
I would formulate it as: the software development lifecycle is inevitable, or you will not have any software. The lifecycle is just not acknowledged, and is thus implicit, for many people. If you hack in Notepad and FTP it to your webserver, then your lifecycle lasts until you switch it all off. It's a simple lifecycle, but you can't avoid having one.
You are using mutexes: they are on the Actor message queues, amongst other places. "Just use mutexes" suggests a lack of experience of using them; they are very difficult to get both correct and scalable. By keeping them inside the Actor system, a lot of complexity is removed from the layers above. Actors are not always the right choice, but when they are, they are a very useful and simplifying abstraction.
I've written a non-distributed app that uses the Actor model and it's been very successful. It concurrently collects data from hundreds of REST endpoints, a typical run may make 500,000 REST requests, with 250 actors making simultaneous requests - I've tested with 1,000 but that tends to pound the REST servers into the ground. Any failed requests are re-queued. The requests aren't independent, request type C may depend on request types A & B being completed first as it requires data from them, so there's a declarative dependency graph mechanism that does the scheduling.
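For anyone wondering what "a declarative dependency graph that does the scheduling" can look like, here is a rough Python sketch. It is not my actual code: it uses a plain thread pool rather than actors, and `run_graph`, `fake_fetch` and the `flaky` set are all hypothetical names invented for the example. The standard library's `graphlib.TopologicalSorter` (Python 3.9+) does the dependency bookkeeping.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

def run_graph(deps, fetch, workers=4, max_attempts=3):
    """Call fetch(name, results) for each request type once its dependencies are done.

    deps maps a request type to the set of request types it depends on.
    A failed request is re-queued until it has been tried max_attempts times.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results, attempts, pending = {}, {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        def submit(name):
            attempts[name] = attempts.get(name, 0) + 1
            pending[pool.submit(fetch, name, results)] = name

        while sorter.is_active():
            for name in sorter.get_ready():  # everything whose dependencies are met
                submit(name)
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                name = pending.pop(future)
                try:
                    results[name] = future.result()
                    sorter.done(name)  # unblocks anything that depends on it
                except Exception:
                    if attempts[name] >= max_attempts:
                        raise
                    submit(name)  # re-queue the failed request
    return results

# Toy usage: C needs A and B, and B fails on its first attempt.
deps = {"A": set(), "B": set(), "C": {"A", "B"}}
flaky = {"B"}

def fake_fetch(name, results):
    if name in flaky:
        flaky.discard(name)
        raise ConnectionError(f"{name} failed, will be retried")
    return f"{name}({', '.join(results[d] for d in sorted(deps[name]))})"

results = run_graph(deps, fake_fetch)
print(results["C"])
```

The point of the declarative form is that the caller only states "C depends on A and B"; ordering, concurrency and retries are the scheduler's problem.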
I started off using Akka but then the license changed and Pekko wasn't a thing yet, so I wrote my own single-process minimalist Actor framework - I only needed message queues, actor pools & supervision to handle scheduling and request failures, so that's all I wrote. It can easily handle 1m messages a second.
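To give a flavour of how little is needed, here is a minimal sketch of the mailbox-plus-supervision idea in Python. It is an illustration only, not my framework: the `Actor` class, its `tell`/`stop` methods and the restart policy are all assumptions made for the example, and it uses one thread per actor, which would not scale the way a pooled design does.

```python
import queue
import threading

class Actor:
    """One thread draining one mailbox; the queue's internal lock is the only mutex."""
    _STOP = object()  # sentinel message that ends the actor

    def __init__(self, handler, max_restarts=3):
        self._handler = handler
        self._mailbox = queue.Queue()
        self._restarts_left = max_restarts
        self.failures = []  # exceptions seen by the supervisor
        self._thread = threading.Thread(target=self._supervise, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Fire-and-forget send; callers never touch a lock directly."""
        self._mailbox.put(message)

    def stop(self):
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _supervise(self):
        # Supervision: record the failure and restart the message loop,
        # giving up once the restart budget is exhausted.
        while True:
            try:
                self._run()
                return
            except Exception as exc:
                self.failures.append(exc)
                if self._restarts_left == 0:
                    return
                self._restarts_left -= 1

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                return
            self._handler(message)

seen = []

def handler(message):
    if message == "boom":
        raise ValueError("bad message")
    seen.append(message)

actor = Actor(handler)
for m in ("a", "boom", "b"):
    actor.tell(m)
actor.stop()
print(seen, len(actor.failures))  # ['a', 'b'] 1
```

Note that the failure is recorded rather than silently discarded, and the handler code never sees a mutex.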
I have no idea why that's a "huge dead end", Actors are a model that's a very close fit to my use case, why on earth wouldn't I use it? That "nurseries" link is way TL;DR but it appears to be rubbishing other options in order to promote its particular model. The level of concurrency it provides seems to be very limited and some of it is just plain wrong - "in most concurrency systems, unhandled errors in background tasks are simply discarded". Err, no.
Big Rule 0: No Dogmas: Use The Right Tool For The Job.
Is there a similarly short/simple solution that doesn't use all of the built-ins? I haven't worked with Prolog in a while, but it should be easy enough with primitives (albeit with more duplication)?
He hates on C++ pretty much the same as he does on Rust. Your argument seems to be that Rust is better than C++, which is akin to trying to make the case that Cholera is better than Smallpox.
Language wars are boring and pointless, they all have areas of suckage. The right approach is to pick whichever one is the least worst for the job at hand.
The other reason jump changes are not a revolution and have remained just a curiosity is physics, something that's ignored by the article.
For non-ringers, in change ringing the bells rotate 360 degrees each time they strike, from mouth up to mouth up. The clapper hits the bell when it has rotated roughly 270 degrees from mouth up and is more or less horizontal, approximately 2 seconds after it starts moving. The bells are usually in the 100kg to 1000kg range (for US folks, that's 220lb to 2200lb), although they can be up to 4000kg. The only point when the ringer can exert control on the bell via the rope is when it is near the balance and mouth upwards, and speeding it up or slowing it down by any more than one "beat" is physically very difficult on heavier bells, particularly if you are doing it for a full peal, which usually takes 3+ hours.
That link (like the Wikipedia article) is talking about the mechanism by which the bell is rung, not what is rung out on them i.e. they are not ringing the changes.
It's got compositions on the page, a link to a PDF with compositions in it and a link to the Veronese ringing association which has many more examples - if you can read Italian.