But doesn't the program have to be written explicitly anyway? Like what happens if I open a file or network connection, yield, and then resume on another system?
It's probably better to let the user/application decide what to do in these cases, and for this reason we allow them to register type-specific (de)serialization routines.
In the case of network connections, the user could instead serialize connection details and then recreate the connection when deserializing the coroutine state. Same thing for files: instead of serializing unstable low-level details like the file descriptor, the user can serialize higher-level information (path, open flags, etc.) and recreate the open file when deserializing the coroutine state.
The library provides a way to serialize coroutine state, and to later deserialize that state and resume a coroutine from its last yield point. Where you store this state (in a DB, in a queue, etc) is up to you!
A source-to-source compiler paired with a library that bundles runtime implementation details was the path of least resistance. We'd love to integrate this with the Go compiler (`go build -durable` vs. `coroc && go build -tags durable`), but the compiler is closed to extension, and maintaining a fork of Go is not feasible for us at this time.
Vector-length-agnostic (VLA) programming has its own share of problems. I'm not familiar with the RISC-V V extension, but I assume it's similar to ARM's SVE. There's a good critical look at SVE and VLA here: https://gist.github.com/zingaburga/805669eb891c820bd220418ee...
I'm curious why you say they are very different? From where I sit, RVV also supports mask-like predication, and adds two concepts: LMUL (in-HW unrolling of each instruction) plus the ability to limit operations to a given number of elements.
The former is nifty, though intended for single-issue machines, and the latter seems redundant because masks can also do that.
We took a similar approach in our JSON decoder. We needed to support sets (JSON object keys) that aren't necessarily known until runtime, and strings that are up to 16 bytes in length.
We got better performance with a linear scan and SIMD matching than with a hash table or a perfect hashing scheme.
This is immediately what I thought of. If your set is small enough, you should be able to store the keys in a contiguous block of memory that fits in the L1 cache.
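A minimal sketch of that layout, assuming strings up to 16 bytes as in the JSON-decoder comment above: all keys live zero-padded in one contiguous slice, and lookup is a single linear pass. The scalar `bytes.Equal` stands in for what a production version would do with a 16-byte SIMD lane comparison:

```go
package main

import (
	"bytes"
	"fmt"
)

const keyWidth = 16 // keys up to 16 bytes, zero-padded to fixed width

// set stores all keys contiguously, so a lookup is one linear scan
// over a block of memory that easily fits in L1 cache.
type set struct {
	keys []byte // len(keys) == keyWidth * number of entries
}

func (s *set) add(k string) {
	var padded [keyWidth]byte
	copy(padded[:], k)
	s.keys = append(s.keys, padded[:]...)
}

// lookup returns the index of k in the set, or -1 if absent.
func (s *set) lookup(k string) int {
	var padded [keyWidth]byte
	copy(padded[:], k)
	for i := 0; i < len(s.keys); i += keyWidth {
		// A SIMD version compares all 16 bytes in one instruction.
		if bytes.Equal(s.keys[i:i+keyWidth], padded[:]) {
			return i / keyWidth
		}
	}
	return -1
}

func main() {
	s := &set{}
	for _, k := range []string{"id", "timestamp", "payload"} {
		s.add(k)
	}
	fmt.Println(s.lookup("timestamp"), s.lookup("missing")) // prints: 1 -1
}
```

For a handful of short keys, this whole structure is a few cache lines, with no hashing and no pointer chasing, which is why it can beat a hash table.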
I read an interesting blog post about doing a similar thing at AWS scale a while back:
“No parent will name a favorite among their children. But I do have one among my brainchildren, my software contributions over the decades: The event-streaming code I helped build at AWS. After rage-quitting I missed it so much that over the last few months, I wrote a library (in Go) called Quamina (GitHub) that does some of the same things. This is about that.
Quamina offers an API to a construct called a “Matcher”. You add one or a hundred or a million “Patterns” to a Matcher then feed “Events” (data objects with possibly-nested fields and values) to it, and it will return you an array (possibly empty) of the Patterns each Event matches. Both Patterns and Events are represented by JSON objects (but it should be easy to support other Event encodings).
Quamina (and here I beg pardon for a bit of chest-pounding) is really freaking fast. But what’s more interesting is that its speed doesn’t depend much on the number of Patterns that have been added. (Not strictly speaking O(1), but pretty close.)”
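To show the API shape the quote describes, here's a toy matcher with the same contract: add named Patterns, feed JSON Events, get back the names of matching Patterns. This is only exact field/value matching over flattened nested objects; Quamina itself compiles Patterns into automata, which is how it stays near-O(1) in the pattern count:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// matcher maps each pattern name to the flattened field paths and
// values an event must contain for that pattern to match.
type matcher struct {
	patterns map[string]map[string]any
}

func newMatcher() *matcher {
	return &matcher{patterns: map[string]map[string]any{}}
}

// flatten turns {"a": {"b": 1}} into {"a.b": 1}.
func flatten(prefix string, v map[string]any, out map[string]any) {
	for k, val := range v {
		path := k
		if prefix != "" {
			path = prefix + "." + k
		}
		if nested, ok := val.(map[string]any); ok {
			flatten(path, nested, out)
		} else {
			out[path] = val
		}
	}
}

func (m *matcher) addPattern(name, pattern string) error {
	var p map[string]any
	if err := json.Unmarshal([]byte(pattern), &p); err != nil {
		return err
	}
	flat := map[string]any{}
	flatten("", p, flat)
	m.patterns[name] = flat
	return nil
}

func (m *matcher) matchesForEvent(event string) ([]string, error) {
	var e map[string]any
	if err := json.Unmarshal([]byte(event), &e); err != nil {
		return nil, err
	}
	flat := map[string]any{}
	flatten("", e, flat)
	var names []string
	for name, p := range m.patterns {
		matched := true
		for path, want := range p {
			if flat[path] != want {
				matched = false
				break
			}
		}
		if matched {
			names = append(names, name)
		}
	}
	return names, nil
}

func main() {
	m := newMatcher()
	m.addPattern("eu-orders", `{"region": "eu", "type": "order"}`)
	names, _ := m.matchesForEvent(`{"region": "eu", "type": "order", "detail": {"id": 7}}`)
	fmt.Println(names) // prints: [eu-orders]
}
```

Note this toy scans every pattern per event, so it's O(patterns); the interesting part of Quamina is precisely that it avoids that.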
What if there were tools to inspect and debug the coroutine state? That's an area we're exploring now.