I was always disappointed by the performance of fork()/clone().
CompSci class told me it was a very cheap operation, because all the actual memory is copy-on-write, so it's a great way to do all kinds of things.
But the reality is that duplicating huge page tables and hundreds of file handles is very slow: tens of milliseconds for a big process.
And then the process runs slowly for a long time afterwards, because every write to a shared page faults and triggers a page copy.
I think my CompSci class lied to me... it might seem cheap and a neat thing to do, but in reality there are very few use cases where it makes sense.
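For anyone who wants to measure this themselves, here's a rough sketch of the kind of benchmark I mean (the 8 GiB heap size and the timing approach are just my choices, adjust to taste):

```c
/* Time fork() from a process with a large, fully-dirtied heap.
 * Sketch only: sizes are illustrative assumptions, not a standard benchmark. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    size_t bytes = (size_t)8 << 30;      /* 8 GiB; shrink if your box is small */
    char *heap = malloc(bytes);
    if (!heap) { perror("malloc"); return 1; }
    memset(heap, 1, bytes);              /* dirty every page */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid_t pid = fork();                  /* the operation under test */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (pid == 0) _exit(0);              /* child exits immediately */
    waitpid(pid, NULL, 0);

    double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("fork() took %.2f ms with %zu GiB dirtied\n", ms, bytes >> 30);
    return 0;
}
```

Dirtying every page first matters: it forces the kernel to actually populate the page tables that fork() then has to walk and duplicate.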
CS classes (and, far too often, professional programmers) talk about computers like they're just faster PDP-11s with fundamentally the same performance characteristics.
Agreed that these costs can be larger than is perhaps implied in compsci classes (though it's possible that they've changed their message since I took them!)
I suppose it is still essentially free for some common uses - e.g. if a shell uses `fork()` rather than one of the alternatives it's unlikely to have a very big address space, so it'll still be fast.
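For reference, the alternative I have in mind is posix_spawn(), which skips duplicating the parent's address space entirely. A minimal sketch (error handling abbreviated; the run() helper is just for illustration):

```c
#include <spawn.h>
#include <stdio.h>
#include <sys/wait.h>

extern char **environ;

/* Launch argv[0] via PATH and wait for it; NULL file_actions/attrp
 * means the child inherits our stdio and signal dispositions. */
int run(char *const argv[]) {
    pid_t pid;
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) { fprintf(stderr, "spawn failed: %d\n", err); return -1; }
    int status;
    waitpid(pid, &status, 0);
    return status;
}

int main(void) {
    char *argv[] = { "echo", "hello from a spawned child", NULL };
    return run(argv);
}
```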
My experience has been that big processes - 100+ GB, which is pretty reasonable these days - really do show human-perceptible latency when forking. At least tens of milliseconds matches my experience (I wouldn't be surprised to see higher). This is really jarring when you're used to thinking of it as cost-free.
The slowdown afterwards, resulting from copy-on-write, is especially noticeable if (for instance) your process has a high memory dirtying rate. Simulators that rapidly write to a large array in memory are a good example here.
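Here's a sketch of that effect, assuming a made-up 4 GiB "simulator state" array: time a write sweep over it before and after fork(), while the child keeps the shared pages alive. The second sweep pays a fault and a page copy per 4 KiB page touched.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

/* Touch one byte per 4 KiB page and report how long it took. */
static double sweep_ms(char *buf, size_t n) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i += 4096)
        buf[i]++;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}

int main(void) {
    size_t n = (size_t)4 << 30;              /* illustrative 4 GiB of state */
    char *state = malloc(n);
    if (!state) return 1;
    memset(state, 0, n);                     /* fault everything in once */

    printf("sweep before fork: %.1f ms\n", sweep_ms(state, n));

    pid_t pid = fork();
    if (pid == 0) { sleep(5); _exit(0); }    /* child keeps pages shared */

    /* every write in this sweep now triggers a COW copy */
    printf("sweep after fork:  %.1f ms\n", sweep_ms(state, n));
    waitpid(pid, NULL, 0);
    return 0;
}
```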
When you really need `fork()` semantics this could all still be acceptable - but I think some projects do ban the use of `fork()` within a program to avoid unexpected costs. If you really have a big process that needs to start workers I guess it might be worth having a small daemon specifically for doing that.
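Roughly this shape: fork the helper once at startup, while the address space is still small, then ask it over a pipe to launch workers later. The one-command-per-line protocol here is made up purely for illustration:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int req[2];
    if (pipe(req) != 0) return 1;       /* parent -> helper command pipe */

    if (fork() == 0) {                  /* helper: forked while we're tiny */
        close(req[1]);
        FILE *in = fdopen(req[0], "r");
        char line[1024];
        while (fgets(line, sizeof line, in)) {
            line[strcspn(line, "\n")] = '\0';
            if (fork() == 0) {          /* cheap: the helper stayed small */
                execlp("/bin/sh", "sh", "-c", line, (char *)NULL);
                _exit(127);
            }
            wait(NULL);
        }
        _exit(0);
    }
    close(req[0]);

    /* ... the main process grows to many GB here ... */

    /* starting a worker no longer forks the big process: */
    dprintf(req[1], "echo worker started\n");
    close(req[1]);
    wait(NULL);
    return 0;
}
```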
Right, shells are not threaded and they tend to have small resident set sizes. Even in shells, though, there's no reason not to use vfork(), and if you have a tight loop starting a bunch of child processes you might as well use it. Though, in a shell, you do need fork() in order to trivially implement sub-shells.
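For reference, the safe vfork() pattern: the child borrows the parent's address space (the parent is suspended meanwhile), so the child may only exec or _exit. A minimal sketch:

```c
#include <unistd.h>
#include <sys/wait.h>

int spawn(char *const argv[]) {
    pid_t pid = vfork();
    if (pid == 0) {
        execvp(argv[0], argv);   /* child: exec immediately... */
        _exit(127);              /* ...and _exit() (never exit()) on failure */
    }
    int status = -1;
    if (pid > 0)
        waitpid(pid, &status, 0);
    return status;
}
```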
Also, mandating copy-on-write as an implementation strategy is a huge burden to place on the host. Now you’ve made the amount of memory a process is using unquantifiable.
It's not necessarily unquantifiable -- the kernel can count the not-yet-copied pages pessimistically as allocated memory, triggering OOM allocation failures if the amount of potential memory usage is greater than RAM. IIUC, this is how Linux vm.overcommit_memory[1] mode 2 works, if overcommit_ratio = 100.
However, if an application is written to assume that it can fork a ton and rely on COW to not trigger OOM, it obviously won't work under mode 2.
> 2 - Don't overcommit. The total address space commit for the system is not permitted to exceed swap + a configurable amount (default is 50%) of physical RAM.
> Depending on the amount you use, in most situations this means a process will not be killed while accessing pages but will receive errors on memory allocation as appropriate.
> Useful for applications that want to guarantee their memory allocations will be available in the future without having to initialize every page.
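The practical upside of mode 2 is that the failure shows up at allocation time, where a program can actually react, rather than as an OOM kill later. A sketch (the 1 TiB request is deliberately absurd; enabling strict accounting would be roughly `sysctl vm.overcommit_memory=2 vm.overcommit_ratio=100` as root):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t want = (size_t)1 << 40;       /* 1 TiB, far past any commit limit */
    void *p = malloc(want);
    if (p == NULL) {
        /* with strict accounting this branch is genuinely reachable */
        fprintf(stderr, "commit limit hit, degrading gracefully\n");
        return 1;
    }
    free(p);
    return 0;
}
```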
You're right, "unquantifiable" was the wrong word here. I meant, a program has no real way of predicting/reacting to OOM. I didn't realize mode 2 with overcommit_ratio = 100 behaved that way, thanks for sharing.
Yeah, I think in a practical sense you're right, since AFAIK using mode 2 is fairly rare because most software assumes overcommit, and even if a program is written with an understanding that malloc can return NULL, it's in the sense of one huge allocation failing up front, not of any small allocation being able to fail at any moment.
POSIX doesn't require that fork() be implemented using copy-on-write techniques. An implementation is free to copy all of the parent's writable address space.
If the parent is a JVM, for sure. But a copy-on-write fork() still doesn't perform well. The point isn't to just copy the whole parent. The point is to stop copying at all.
Copy-on-write is supposed to be cheap, but in fact it's not. MMU/TLB manipulations are very slow. Page faults are slow. So the common thing now is to just copy the entire resident set (well, the writable pages in it), and if that is large, that too is slow.
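If you're on Linux and still need fork() itself, one way to literally stop copying a given region is madvise(MADV_DONTFORK), which leaves it out of the child entirely - only safe if the child never touches that memory. A sketch:

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

int main(void) {
    size_t n = (size_t)4 << 30;       /* the big region we don't want forked */
    char *big = mmap(NULL, n, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (big == MAP_FAILED) return 1;

    madvise(big, n, MADV_DONTFORK);   /* child's mappings will skip this */

    pid_t pid = fork();               /* `big` is neither copied nor COW-tracked */
    if (pid == 0) {
        /* touching `big` here would fault: it doesn't exist in the child */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return 0;
}
```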