Hacker News

There is something about fork which I have never understood. Maybe someone here can explain it to me.

Why would anyone ever want fork as a primitive? It seems to me that what you really want is a combination of fork and exec because 99% of the time you immediately call exec after fork (at least that's what I do 99% of the time when I use fork). If you know that you're going to call exec immediately after fork, then all the issues of dealing with the (potentially large) address space of the parent just evaporate because the child process is just going to immediately discard it all.

So why is there not a fork-exec combo? And why has it not replaced fork for 99% of use cases?

And as long as I'm asking stupid questions, why would anyone ever use vfork? If the child shares the parent's address space and uses the same stack as the parent, and the parent has to block, how is that different from a function call (other than being more expensive)?

None of this makes sense to me.



Because there are many, many use cases where you don't want to call exec() immediately after fork().

Want to constrain memory usage or CPU time of an arbitrary child process? You have to call setrlimit() before exec(). Privilege separation? Call setuid() before exec(). Sandbox an untrusted child process in some way? Call seccomp() (or your OS equivalent) before exec(). And so on and so forth. Any time you want to change what OS resources the child process will have access to, you'll need to do some set-up work before invoking exec().
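A minimal Python sketch of that fork, modify, exec sequence, using a CPU-time limit as the "set-up work". The helper name `run_with_cpu_limit` is made up for illustration; only the soft limit is lowered so the call does not depend on the current hard limit.

```python
import os
import resource


def run_with_cpu_limit(argv, cpu_seconds):
    """Fork, cap the child's CPU time, then exec argv; return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: still running the parent's code, so it can adjust its own
        # limits with the ordinary "current process" call before exec.
        try:
            _soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
            resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, hard))
            os.execvp(argv[0], argv)
        except BaseException:
            os._exit(127)  # set-up or exec failed
    # Parent: wait for the child and translate the wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

The limit survives exec, so the new program starts with it already in place; the same shape works for setuid() or a sandboxing call in place of setrlimit().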


Windows solves this by adding a bunch of optional parameters to CreateProcess, as well as having two more variants (CreateProcessAsUser and CreateProcessWithLogon). Some of the arguments are complicated enough that they have helper functions to construct them.

I like the more composable fork()->modify->exec() approach of unix, but I wouldn't call either of them really elegant.


That's one option, yes.

The one I've favored while reading these arguments has been the "suspended process" model. The primitives are CREATE(), which takes an executable as a parameter and returns the PID of a paused process, and START(), which allows the process to actually run.

Unix already has the concept of a paused executable, after all.

This model also requires all the process-mutation syscalls, like setrlimit(), to accept a PID as a parameter, but prlimit() wound up being created anyway, because the ability to mutate an already-running process is useful.
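As a rough illustration of a limit call that takes a PID, here is a Linux-only Python sketch built on `resource.prlimit()`. The helper name `limit_open_files` is hypothetical, and it assumes the target process belongs to the same user and that its hard limit is currently at least `max_files`.

```python
import resource


def limit_open_files(pid, max_files):
    """Lower RLIMIT_NOFILE of an already-running process; return the new limits."""
    # Same resource constants as setrlimit(), but aimed at another process.
    resource.prlimit(pid, resource.RLIMIT_NOFILE, (max_files, max_files))
    # Calling prlimit() without new limits just reads the current ones back.
    return resource.prlimit(pid, resource.RLIMIT_NOFILE)
```

This only covers the "mutate by PID" half of the model; there is no CREATE()/START() pair here, so the target is already running when the limit lands.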


A third way is to grant the parent process access to the child such that they can use the child process handle to "remotely" set restrictions, write memory, start a thread, etc.


Practically, syscall overhead has gotten in the way of that being ubiquitous in the past. Here's to hoping that newer models of syscalls that reduce kernel/user overhead make such a thing possible.


To me this feels like a call for more powerful language primitives, i.e. a way to specify some action to take to "set up" the child process that's more explicit and readable than one special function behaving in a particularly odd way. I'm imagining closures with some kind of Rust-like move semantics, but I'm not entirely sure.

(if we're speaking in terms of greenfield implementation of OS features)


Yeah, this. Why not mkprocess/exec instead of fork/exec?


Builder patterns for primitives? I think that seems super cool but then aren't you just building a new language?


But my child processes are not arbitrary or untrusted, they're hard-coded and written by me!

I'm not writing a shell, I'm writing an application!


Dennis Ritchie addresses this in a history of early Unix: https://www.bell-labs.com/usr/dmr/www/hist.html

"Process control in its modern form was designed and implemented within a couple of days. It is astonishing how easily it fitted into the existing system; at the same time it is easy to see how some of the slightly unusual features of the design are present precisely because they represented small, easily-coded changes to what existed. A good example is the separation of the fork and exec functions. The most common model for the creation of new processes involves specifying a program for the process to execute; in Unix, a forked process continues to run the same program as its parent until it performs an explicit exec. The separation of the functions is certainly not unique to Unix, and in fact it was present in the Berkeley time-sharing system [2], which was well-known to Thompson. Still, it seems reasonable to suppose that it exists in Unix mainly because of the ease with which fork could be implemented without changing much else."


OK, but why has it not been replaced with something better in the intervening 50 years? There have been a lot of improvements to Unix since 1970. Why not this?


It was; ~20 years ago we got posix_spawn(3).


It was! vfork() was added to BSD because fork() sucks.

But then someone very opinionated wrote "vfork() Considered Dangerous" and too many people accepted that incorrect conclusion.


I mean, I've personally had to fix CVEs from vfork(2) being such a footgun, so I wouldn't consider it an "incorrect conclusion".


Which CVEs?


There is exactly a fork-exec combo like that: it's called posix_spawn(): https://man7.org/linux/man-pages/man3/posix_spawn.3.html

I think the reason for fork() and exec() as primitives goes back to the early days of Unix design philosophy. Unix tends to favour "easy and simple for the OS to implement" rather than "convenient for user processes to use". (For another example of that, see the mess around EINTR.) fork() in early Unix was not a lot of code, and splitting into fork/exec means two simple syscalls rather than needing a lot of extra fiddly parameters to set up things like file descriptors for the child.

There's a bit on this in "The Evolution of the UNIX Time-Sharing System" at https://www.bell-labs.com/usr/dmr/www/hist.html -- "The separation of the functions is certainly not unique to Unix, and in fact it was present in the Berkeley time-sharing system [2], which was well-known to Thompson. Still, it seems reasonable to suppose that it exists in Unix mainly because of the ease with which fork could be implemented without changing much else." It says the initial fork syscall only needed 27 lines of assembly code...

(Edit: I see while I was typing that other commenters also noted both the existence of posix_spawn and that quote...)


> Unix tends to favour "easy and simple for the OS to implement"

Well, yeah, but the whole problem here, it seems to me, is that fork is not simple to implement precisely because it combines the creation of the kernel data structures required for a process with the actual initiation of the process. Why not mkprocess, which creates a suspended process that has to be started with a separate call to exec? That way you never have to worry about all the hairy issues that arise from having to copy the parent's process memory state.


It was simple specifically for the people writing it at the time. We know this, because they've helpfully told us so :-) It might or might not have been harder than a different approach for some other programmers writing some other OS running on different hardware, but the accidents of history mean we got the APIs designed by Thompson, Ritchie, et al, and so we get what they personally found easy for their PDP7/PDP11 OS...


fork() was trivial to implement back then. It became non-trivial later, when RAM sizes and resident set sizes increased.


> why would anyone ever want fork as a primitive

Long ago in the far away land of UNIX, fork was a primitive because the primary use of fork was to do more work on the system. You were likely one of three or four other people vying for CPU time at any given moment, and it wasn't uncommon to see loads of 11 on a typical university UNIX system.

> so why is there not a fork-exec combo

You're looking for system(3). Turns out, most people waitpid(fork()). Windows explicitly handles this situation with CreateProcess[0], which does a way better job of it than POSIX does (which, IMO, is the standard for most of the win32 API, but that's a whole can of worms I won't get into).

> why would anyone ever use vfork?

Small shells, tools that need the scheduling weight of "another process" but not for long, etc. See also, waitpid(fork()).

When you have something with MASSIVE page tables, you don't want to spend the time copying the whole thing over. There's a huge overhead to that.

[0] https://docs.microsoft.com/en-us/windows/win32/api/processth...


system(3) is not a good alternative because it indirects through the shell, which adds the overhead of launching the shell as well as the danger of misinterpreting shell metacharacters in the command if you aren’t meticulous about escaping them correctly.
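A small Python sketch of the metacharacter problem. Both helper names are made up for illustration; `shell=True` stands in for system(3) here, since both hand the whole string to `/bin/sh`.

```python
import subprocess


def echo_via_shell(text):
    # Like system(3): the command string is parsed by /bin/sh, so any
    # metacharacters inside `text` are interpreted by the shell.
    result = subprocess.run("echo " + text, shell=True,
                            capture_output=True, text=True)
    return result.stdout


def echo_via_argv(text):
    # No shell involved: `text` reaches echo as one argv entry, untouched.
    result = subprocess.run(["echo", text], capture_output=True, text=True)
    return result.stdout
```

With the input `hi; echo injected`, the shell version runs two commands, while the argv version prints the string literally.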


`fork` is a classic example, as others have mentioned, of something that was implemented because it was [at the time] easy rather than because it was a good design. In the decades since, we've found there are issues that are caused by the semantics of fork, especially if the most common subsequent system call is `exec`.

If you're designing an OS from scratch, support for `fork` and `exec` as separate system calls is not what you want. Instead, you'd be likely to describe something in terms of a process creation system call, which will have eleventy billion parameters governing all of the attributes of the spawned process.

POSIX specifies a fork+exec combo called posix_spawn. This is actually used a fair amount, but the reason it isn't used more is that it doesn't support all of the eleventy-billion parameters governing all of the attributes of the spawned process. Instead, these parameters are usually set by calling system calls that change these parameters between fork and exec. These system calls might, for example, change the root directory of a process or attach a debugger. Neither of these is supported by posix_spawn, which only allows the common operations of changing the file descriptors or resetting the signal mask in the list of actions to do.

And this suggests why you might want vfork: vfork allows you to write something that looks like posix_spawn: you get to fork, do your new-process-attribute-setting-flags, and then exec to the new process image, all while being able to report errors in the same memory space.


> If you're designing an OS from scratch, support for `fork` and `exec` as separate system calls is not what you want. Instead, you'd be likely to describe something in terms of a process creation system call, which will have eleventy billion parameters governing all of the attributes of the spawned process.

Or if you happen to be sane you'll have a single, simple system call to create a blank, suspended child process, and all the regular system calls which operate on process state will take a handle or process "file descriptor" to indicate which process to modify rather than assuming the current process as the target.

This was the ultimate flaw of posix_spawn(). As you point out it doesn't support all the things you might want to tweak in the child process—a consequence of trying to capture every aspect of the initial process state in a single process-creation API rather than distributing the work through the normal system calls so that each new interface or state can be adjusted for child processes in the same way that it's adjusted for the current process.

Whatever you do, though, make sure it's possible to emulate fork() reliably with your "better" replacement. Consider the case of Cygwin where emulated fork() calls can (and frequently do) fail in bizarre ways because the "blank" child process was pre-loaded with some unexpected virtual memory mapping by AV software or other system tasks, with the result that a required DLL or private memory space can't be set up at the same address used in the parent.


To be fair, posix_spawn() is extensible. New attributes, etc. can be added. And there are a number of extensions for it, too. Illumos has some.


Most APIs can be extended. The problem is that when someone adds a new tunable parameter or resource that one might want to modify for a child process it doesn't automatically get added to posix_spawn()—that takes extra effort. Which is why I emphasized using the same APIs for the current process and child processes, rather than duplicating the work in two places.


> Why would anyone ever want fork as a primitive?

fork() without exec() can make sense in the context of a process-per-connection application server (like SSH). I've also used it quite effectively as a threading alternative in some scripting languages.

> So why is there not a fork-exec combo?

There is; it's called posix_spawn(). Like a lot of POSIX APIs, it's kind of overcomplicated, but it does solve a lot of the problems with fork/exec.

> And as long as I'm asking stupid questions, why would anyone ever use vfork?

For processes with a very large address space, fork() can be an expensive operation. vfork() avoids that, so long as you can guarantee that it'll immediately be followed by an exec().


fork with copy-on-write semantics avoids copying the whole address space. It does have to copy some data structures that manage virtual memory, and maybe the first level of the paging structure (page directory or whatever).


copy-on-write == slow when called from threaded processes with large resident set sizes.


Can you elaborate on this? I understand why copying a large address space might be slow, but how or why does the number of threads in a process affect this? Is it scheduling?


Copy-on-write means twiddling with the MMU, and TLB updates across cores ("TLB shootdowns") can be very expensive. If the process is not threaded, then the OS could make sure to schedule the child and parent on the same CPU to avoid needing TLB shootdowns, but if it's threaded, forget about it.


From "Operating Systems: Three Easy Pieces" chapter on "Process API" (section 5.4 "Why? Motivating The API") [1]:

    ... the separation of fork() and exec() is essential in building a UNIX shell,
    because it lets the shell run code after the call to fork() but before the call
    to exec(); this code can alter the environment of the about-to-be-run program,
    and thus enables a variety of interesting features to be readily built.

    ...

    The separation of fork() and exec() allows the shell to do a whole bunch of
    useful things rather easily. For example:

      prompt> wc p3.c > newfile.txt
    
    In the example above, the output of the program wc is redirected into the output
    file newfile.txt (the greater-than sign is how said redirection is indicated).
    The way the shell accomplishes this task is quite simple: when the child is
    created, before calling exec(), the shell closes standard output and opens the
    file newfile.txt. By doing so, any output from the soon-to-be-running program wc
    are sent to the file instead of the screen.
[1] https://pages.cs.wisc.edu/~remzi/OSTEP/cpu-api.pdf
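
The redirection described in that excerpt can be sketched in Python roughly as follows. The helper name `run_redirected` is hypothetical, and it uses dup2() onto fd 1 rather than the close-then-open trick the book describes, which is a swapped-in but equivalent way to repoint standard output.

```python
import os


def run_redirected(argv, out_path):
    """Rough equivalent of `argv > out_path`; returns the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: runs shell-side code after fork() but before exec().
        try:
            fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            if fd != 1:
                os.dup2(fd, 1)  # standard output now refers to the file
                os.close(fd)
            os.execvp(argv[0], argv)
        except BaseException:
            os._exit(127)
    # Parent: its own standard output is untouched.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -os.WTERMSIG(status)
```

The program being run needs no knowledge of the redirection; it simply writes to fd 1 as usual.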


As an explanation it doesn't make much sense, because there are other ways to alter the environment of the about-to-be-run program (see any non-Unix OS for examples).


Because "fork" was easy to implement in UNIX on the PDP-11.

The original implementation was for a machine with very limited memory. So fork worked by swapping out the process. But then, instead of releasing the in-memory copy, the kernel duplicated the process table entry. So there were now two copies of the process, one in memory and one swapped out. Both were runnable, even if there wasn't enough memory for both to fit at once. Both executed onward from there.

And that's why "fork" exists. It was a cram job to fit in a machine with a small address space.


> So why is there not a fork-exec combo?

posix_spawn

> Why would anyone ever want fork as a primitive?

With fork you can very easily write a server like mini_httpd:

https://acme.com/software/mini_httpd/

Or, in Unix shells:

  # function1 and function2 are shell functions

  $ function1 | grep foo | function2

Here, the shell must fork a process (without exec) to run one of these functions.

For instance function1 might run in a fork, the grep is a fork and exec of course, and function2 could be in the shell's primary process.

In the POSIX shell language, fork is so tightly integrated that you can access it just by parenthesizing commands:

  $ (cd /path/to/whatever; command) && other command

Everything in the parentheses is a sub-process; the effect of the cd, and any variable assignments, are lost (whether exported to the environment or not).
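
The subshell behaviour can be imitated in Python with a bare fork() and no exec. This is only a sketch; `in_subshell` is a made-up helper name, not something a shell actually exposes.

```python
import os


def in_subshell(action):
    """Run action() in a forked child, like ( ... ) in a shell; return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: any chdir or variable change made here dies with the child.
        try:
            action()
            os._exit(0)
        except BaseException:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Calling `in_subshell(lambda: os.chdir("/"))` changes directory only in the child; `os.getcwd()` in the parent returns the same path before and after.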

In Lisp terms, fork makes everything dynamically scoped, and rebinds it in the child's context: except for inherited resources like signal handlers and file descriptors.

Imagine every memory location having *earmuffs* like a defvar, and being bound to its current value by a giant let, and imagine that being blindingly efficient to do thanks to VM hardware.


I use fork a lot in my Python science programs. It's really great - you can stick it in a loop and get immediate parallelism. It's much better than multiprocessing, etc, as you keep the state from just before the fork happened, so you can share huge data structures between the processes, without having to process the same data again or duplicate them. I've even written a module for processing things in forked processes: https://pypi.org/project/forkqueue/
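
A sketch of that fork-in-a-loop pattern, written without the forkqueue module (whose API is not shown here). The function name `parallel_sum_of_squares` and the pipe-plus-pickle result channel are assumptions made for illustration.

```python
import os
import pickle


def parallel_sum_of_squares(data, workers):
    """Fork `workers` children; each sums squares over its own slice of `data`."""
    children = []
    for i in range(workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: `data` is already in memory via fork, nothing is re-sent.
            os.close(r)
            try:
                part = sum(x * x for x in data[i::workers])
                with os.fdopen(w, "wb") as out:
                    pickle.dump(part, out)  # only the small result crosses the pipe
                os._exit(0)
            except BaseException:
                os._exit(1)
        os.close(w)  # parent keeps just the read end
        children.append((pid, r))

    total = 0
    for pid, r in children:
        with os.fdopen(r, "rb") as inp:
            total += pickle.load(inp)
        os.waitpid(pid, 0)
    return total
```

The children see the parent's state as it was at the moment of the fork; anything they modify afterwards stays private to them.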


Splitting fork and exec allows you to do stuff before calling exec, for example redirecting file descriptors (like stdin/out/err), creating a pipe, modifying the child's environment, and so on.


(This is particularly useful for shells.)


These can all be made a part of the combined fork+exec API.


That would be the fugliest, most unwieldy API in history. In addition to the two most basic things I mentioned, there are namespaces, control groups, setuid/setgid, and probably a billion other things I can't think of.


Sure, just look at Win32 CreateProcessW.

But that's the price you pay for an API that doesn't have footguns.


> Why would anyone ever want fork as a primitive?

> So why is there not a fork-exec combo?

There are so many variations to what you can do with fork+exec that designing a suitable "fork-exec combo" API is really difficult, so any attempts tend to yield a fairly limited API or a very difficult-to-use API, and that ends up being very limiting to its consumers.

On the flip side, fork()+exec() made early Unix development very easy by... avoiding the need to design and implement a complex spawn API in kernel-land.

Nowadays there are spawn APIs. On Unix that would be posix_spawn().

> And as long as I'm asking stupid questions, why would anyone ever use vfork? If the child shares the parent's address space and uses the same stack as the parent, and the parent has to block, how is that different from a function call (other than being more expensive)?

(Not a stupid question.)

You'd use vfork() only to finish setting up the child side before it execs, and the reason you'd use vfork() instead of fork() is that vfork()'s semantics permit a very high performance implementation while fork()'s semantics necessarily preclude a high performance implementation altogether.


Well, fork() is simple. No args, simple semantics.

Flexibility; you can set up pipes.

> why is there not a fork-exec combo

There is, the spawn calls mentioned.


I think it's actually a pretty useful primitive for doing multiprocessing. Unlike threading, you have a completely separate memory space both for avoiding data races and performance (memory allocators still aren't perfect and weird stuff can happen with cache lines). Unlike exec after fork or anything equivalent, you still get to share things like file descriptors and read only memory for convenience.


> Why would anyone ever want fork as a primitive? It seems to me that what you really want is a combination of fork and exec because 99% of the time you immediately call exec after fork (at least that's what I do 99% of the time when I use fork).

If you eliminate fork, then what do you do for those 1% of cases where you actually do need it? I agree that it's uncommon, but I have written code before that calls fork() but then does not exec().

> So why is there not a fork-exec combo?

There is; it's called posix_spawn(3).

> And why has it not replaced fork for 99% of use cases?

Even though it's been around for about 20 years, it's still newer than fork+exec, so I assume a) many people just don't know about it, or b) people still want to go for maximum compatibility with old systems that may not have it, even if that's a little silly.


Lacking fork(), if you want to multi-process a service, you have to spawn (vfork()+exec() or posix_spawn(), or whatever) the processes and arrange for them to get whatever state and resources they need to start up. It's a pain, but I've done it.


You might want to move around some file descriptors if you don't want the child process to inherit your stdin/stdout/stderr (e.g. if you want to read the stdout of the process you launched, or give it some stdin).

And there does exist such a fork-exec combo - posix_spawn. It allows adding some "commands" of what file descriptor operations to do between the fork & exec before they're ever done, among some other things. But, as the article mentions, using it is annoying - you have to invoke various posix_spawn_file_actions_* functions, instead of the regular C functions you'd use.
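
For comparison, a Python sketch of the file-actions style, using `os.posix_spawnp()` (available since Python 3.8). The helper name `spawn_capture` is hypothetical; the point is that the dup2/close steps are described as data up front instead of being executed as ordinary calls in the child.

```python
import os


def spawn_capture(argv):
    """Spawn argv with stdout wired to a pipe; return (exit_code, output_bytes)."""
    r, w = os.pipe()
    # A list of "commands" to run in the child between the fork and the exec.
    file_actions = [
        (os.POSIX_SPAWN_DUP2, w, 1),  # make fd 1 a copy of the pipe's write end
        (os.POSIX_SPAWN_CLOSE, r),    # the child does not need the read end
        (os.POSIX_SPAWN_CLOSE, w),    # nor the original write end
    ]
    pid = os.posix_spawnp(argv[0], argv, os.environ, file_actions=file_actions)
    os.close(w)  # parent: close our write end so the read below can see EOF
    with os.fdopen(r, "rb") as pipe:
        output = pipe.read()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status), output
```

Anything that cannot be expressed as one of these actions or attributes has no place to go, which is the limitation discussed elsewhere in this thread.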


> 99% of the time you immediately call exec after fork

What about forking servers? listen() and then immediately fork() to handle the inbound connection? Those don't need exec.
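
A toy Python sketch of such a forking server, using the common accept-then-fork variant. The function name `serve_forking` and the uppercase-echo "protocol" are invented for illustration, and a real server would reap children asynchronously rather than waiting on each one.

```python
import os
import socket


def serve_forking(listener, connections):
    """Accept `connections` clients, forking one child per connection."""
    for _ in range(connections):
        conn, _addr = listener.accept()
        pid = os.fork()
        if pid == 0:
            # Child: inherits the connected socket; no exec needed.
            listener.close()
            try:
                data = conn.recv(1024)
                conn.sendall(data.upper())
                conn.close()
                os._exit(0)
            except BaseException:
                os._exit(1)
        conn.close()        # parent keeps only the listening socket
        os.waitpid(pid, 0)  # simplification: blocks until this child is done
```

A listener can be created with `socket.create_server(("127.0.0.1", 0))`; each client is then handled in its own process with its own copy of the server's state.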

Also daemons. It's a common pattern to ditch permissions and then fork(), as per the old "Linux Daemon Writing HOWTO".


You can vfork()+exec(), why not? Exec too expensive? You can prefork[0].

  [0] https://github.com/elric1/prefork


Do people really do that? It sounds like a huge DoS vulnerability to me.


>So why is there not a fork-exec combo?

There is, posix_spawn.



