The benchmark doesn't accurately represent real-world database performance because the dataset is too small (roughly half a gigabyte, based on [1]?), so it fits in the page cache and disk I/O is bypassed.
I made a local change so that all Put benchmarks ran O(n) transactions of O(1) size each, and the results were quite different: Void was the slowest, followed by LMDB, Bolt, LevelDB, then Badger.
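For anyone unsure what "O(n) transactions of O(1) size" means in practice, here is a minimal sketch contrasting it with one big transaction. It is written against a hypothetical `KV`/`Txn` interface (not voidDB's, LMDB's, or any other library's real API) and a toy in-memory store that only counts commits. On a real store, the per-put variant pays the full commit cost (usually including a sync to disk) on every write, which a single large transaction amortizes away.

```go
package main

import "fmt"

// KV and Txn are HYPOTHETICAL interfaces for illustration only; they are not
// the API of voidDB, LMDB, Bolt, LevelDB, or Badger.
type KV interface {
	Begin() Txn
}

type Txn interface {
	Put(key, value []byte)
	Commit() error
}

// memKV is a toy in-memory store that counts commits so the two loading
// shapes can be compared without a real database.
type memKV struct {
	data    map[string][]byte
	commits int
}

type memTxn struct {
	kv      *memKV
	pending map[string][]byte
}

func newMemKV() *memKV { return &memKV{data: make(map[string][]byte)} }

func (m *memKV) Begin() Txn { return &memTxn{kv: m, pending: make(map[string][]byte)} }

func (t *memTxn) Put(key, value []byte) { t.pending[string(key)] = value }

// Commit applies pending writes; a real store would also make them durable here.
func (t *memTxn) Commit() error {
	for k, v := range t.pending {
		t.kv.data[k] = v
	}
	t.kv.commits++
	return nil
}

// loadBatched writes all n keys in a single O(n)-sized transaction.
func loadBatched(db KV, n int) error {
	txn := db.Begin()
	for i := 0; i < n; i++ {
		txn.Put(key(i), value(i))
	}
	return txn.Commit()
}

// loadPerPut writes n keys as O(n) transactions of O(1) size:
// one Put and one Commit per key.
func loadPerPut(db KV, n int) error {
	for i := 0; i < n; i++ {
		txn := db.Begin()
		txn.Put(key(i), value(i))
		if err := txn.Commit(); err != nil {
			return err
		}
	}
	return nil
}

func key(i int) []byte   { return []byte(fmt.Sprintf("key-%08d", i)) }
func value(i int) []byte { return []byte(fmt.Sprintf("value-%d", i)) }

func main() {
	const n = 1000
	batched, perPut := newMemKV(), newMemKV()
	if err := loadBatched(batched, n); err != nil {
		panic(err)
	}
	if err := loadPerPut(perPut, n); err != nil {
		panic(err)
	}
	fmt.Println(len(batched.data), batched.commits) // 1000 1
	fmt.Println(len(perPut.data), perPut.commits)   // 1000 1000
}
```

Both variants end up with the same data; only the number (and therefore total cost) of commits differs.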
I'd also wager that the LMDB author would (lovingly!) tell us we're holding it wrong.
Go doesn't use the C calling convention; it has its own growable stacks and a goroutine scheduler that maps goroutines onto OS threads. So a goroutine can't just call a C function directly.
To interface with C code safely, Go's runtime has to switch to the system stack, do some additional setup, make the call, and then switch back. (On top of that, if the call takes too long, it ties up the OS thread and keeps the other goroutines waiting on it from running, so the scheduler has to step in and move those goroutines to a different thread.)
All of this is relatively expensive, though we're talking about nanoseconds, not milliseconds. It mostly becomes a performance problem when making lots of very quick calls (e.g. a game engine calling into something like OpenGL) or lots of slow calls (causing scheduler thrashing).
No, my understanding is that Rust uses normal stacks and a classic threading model, so aside from async, calling C doesn't need any runtime machinery.
[1]: https://github.com/voidDB/voidDB/blob/master/test/bench_test...