- vfork() is O(1)
- copying fork() is O(N) where N is the
amount of writable memory in the parent's
address space
- copy-on-write fork() is O(N) where N is
the resident set size (RSS) of the parent
O(1) beats O(N).
And O(N) is just the complexity of fork() for a single-threaded parent process. Now imagine a very busy, threaded, large-RSS process that forks a lot. You get threads and child processes stepping all over each other's CoW mappings, causing lots of page faults and copies. Ok, that is still O(N), but users will feel the added pain of all those page faults and TLB shootdowns.
Ok but you're just repeating "It's inefficient" and not saying in any way for what use is its inefficiency even noticeable. I want to reason about when I would care. You see?
The first link didn't even have units on its numbers(!) I assume they're milliseconds. When does that scale become something one would care about at all? Not launching a gui process. Not a shell pipeline. So when is this issue arising at all? What is being done that makes fork inefficiency anything other than academic interest. Must be something, right? Forking webserver?
> When does that scale become something one would care about at all? Not launching a gui process. Not a shell pipeline.
Indeed, in those cases one just does not care about performance.
Yet there are cases where one does. Imagine an orchestration system written in Java -- with lots of threads (perhaps because it might be a thread-per-client affair, or maybe just NCPU threads), with a large heap (because Java), and launching lots of small tasks as external programs. Maybe those tasks are ssh commands (ok, sure, today you could use an SSH library in Java) or build jobs (maybe your app is a CI/CD orchestrator). Now launching external jobs is the core of what this does, and now the cost of fork() bites.
So for software archtiectures that separate concerns by spawning many short-lived processes and using message passing, (which seems like a great idea, just can't think of anything that does that, would love examples if they exist) it /could/ be a factor but we have no numbers. Do you see it?
Let's just say I want to design a solution involving spawning a buttload of procesesses and pass messages back and forward. Roughly when does fork efficiency become something other than of academic concern? 10 processes per second, 1000, 100000? What does the inefficiency look like? Nothing? A stutter you might not notice? Through to everything grinds to a halt and you can't login to the box and neither will the oom killer help you.
That's a fair question. Basically, don't call fork() in Java (JNI or alike), or Java classes that do, and you might be fine, and if ever you're not, you'll know where to start looking.
Don't ever call fork from java? Not even once? And what are the consequences of calling fork? A minor stutter? Halt and catch fire? I don't java but it's hardly new tech. Surely someone has done some numbers on competing operating systems in the past couple of decades?
Until you quantify on some level, even very roughly, what the observed issue is, when you see it and how it degrades, that you're trying to optimize it's just urinating into the breeze. We might get lucky is the best outcome. The chances of it being a really good outcome are pretty limited. decrying something as "inefficient" based on big O or whatever is just meaningless until we actually do it. [1]
[1] selection sort is O(n^2) and can totally dominate O(n log n) algorithms in actual time and cycles spent depending on circumstance. We have to specify, it's not something that can be shortcut because it will likely get a terrible result.
I have had to debug slow forking cases with Java. No I can't point you at data from those. I can point you to the Microsoft paper and @famzah's posts if you want data. For Microsoft this is an important topic: they don't want to have to implement a real fork(), and I fully understand why they don't want to. My guess is they will eventually buckle and do it. fork() is not easy to implement.
> Compared to what?
vfork()
> Any numbers on that?
I added links to the gist, some of which discuss performance in detail. E.g., https://blog.famzah.net/tag/fork-vfork-popen-clone-performan... and https://bugzilla.redhat.com/show_bug.cgi?id=682922
But you can just reason about this:
O(1) beats O(N).And O(N) is just the complexity of fork() for a single-threaded parent process. Now imagine a very busy, threaded, large-RSS process that forks a lot. You get threads and child processes stepping all over each other's CoW mappings, causing lots of page faults and copies. Ok, that is still O(N), but users will feel the added pain of all those page faults and TLB shootdowns.