
fork() is a dumb interface, and non-portable anyway. I've yet to see a use case that couldn't be handled with either threads, or spawning another process - after all, those are the only APIs you get elsewhere.

If you need to use a language with a runtime (not just Go by any means, the likes of Python also suffer from this issue) from two processes that need to be separate but communicate with each other, do the fork first, then start the language runtime (i.e. embed the language in your parent program).



"Spawning another process" is a very complicated task. There's tons of process state that you need to set before the child starts: process group, file descriptors, signal mask, foreground status, etc.

You can try to bundle it all up into a single API like posix_spawn, but the API becomes large and it's hard to cover everything. posix_spawn sure doesn't.

fork is an elegant solution to this problem: you can run code as the child, before the child gets to start. I am not aware of any alternative that's as flexible.
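To make "run code as the child, before the child gets to start" concrete, here is a minimal sketch in Python on a POSIX system. The helper name `spawn_in_new_group` is made up for illustration; the two pieces of state it touches (process group and signal mask) are just examples taken from the list above.

```python
import os
import signal

def spawn_in_new_group(argv):
    """Fork, adjust process state in the child, then exec argv (hypothetical helper)."""
    pid = os.fork()
    if pid == 0:
        # Child: everything here runs in the new process before the
        # target program starts.
        try:
            os.setpgid(0, 0)  # put the child in its own process group
            signal.pthread_sigmask(signal.SIG_SETMASK, set())  # clear the inherited signal mask
            os.execvp(argv[0], argv)  # only returns (by raising) on failure
        finally:
            os._exit(127)  # reached only if setup or exec failed
    return pid  # parent: the child's pid
```

The parent would typically follow up with `os.waitpid(pid, 0)`. Any other per-process setup (closing descriptors, changing directory, and so on) slots into the same place in the child branch.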


> I've yet to see a use case that couldn't be handled with either threads, or spawning another process - after all, those are the only APIs you get elsewhere.

Android uses this to pre-load framework resources and code in a way that lets all applications share the backing memory. And when applications crash, they don't bring down the initial process that preloaded everything (so it can continue spawning new apps).

How would you handle that with only threads or spawning another process?


It's possible to share memory between processes that weren't originally forks - consider e.g. X clients using XShm to communicate with the server, or jk for fast communication between apache and tomcat. I guess forking lets you do "share everything, COW", which is kind of handy, but it's also a very lazy way of programming; you get access to the whole address space, so it relies on the other processes to not reuse data that doesn't make sense when shared. Better to only share memory that processes explicitly want to share, and make it clear which one owns any given region of memory.
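For comparison, a rough sketch of sharing one explicitly chosen region between two processes that do not inherit each other's address space, assuming a POSIX system. It uses a file-backed `mmap` rather than XShm; the function name `share_region_explicitly`, the 4096-byte size, and the `WRITER` script are all illustrative choices.

```python
import mmap
import os
import subprocess
import sys
import tempfile

# Code for the second process: it maps the same file and writes into it.
WRITER = """
import mmap, sys
with open(sys.argv[1], "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as region:
        region[:5] = b"hello"
"""

def share_region_explicitly():
    """Map a file, let a separately started process write to it, read the result."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 4096)  # give the region a size
        with mmap.mmap(fd, 4096) as region:  # shared mapping by default on Unix
            subprocess.run([sys.executable, "-c", WRITER, path], check=True)
            return bytes(region[:5])  # the other process's write is visible here
    finally:
        os.close(fd)
        os.unlink(path)
```

Only the mapped region is shared; everything else in each process stays private, which is the ownership model argued for above.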


Shared memory also means that any modifications are shared too, which is really not good in this case. We could mark the sections read-only, I suppose, but then processes would occasionally copy things out of the read-only pages to modify them, which just bloats the address space (though it doesn't change the number of backing pages). It's also slightly more brittle, because you have to be careful to mark everything shared as read-only.

> Better to only share memory that processes explicitly want to share, and make it clear which one owns any given region of memory.

We are only sharing memory that we explicitly want to share: we load only what we care about, then fork.
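A toy sketch of the "load what we care about, then fork" pattern, assuming a POSIX system. The `preloaded` dict is a hypothetical stand-in for framework resources, and `preload_and_fork` is an invented name. Note that CPython's reference counting writes to object headers, so real page sharing in Python is weaker than in the Android case; the sketch only shows the semantics (the child gets the data for free, its changes stay private, and its failure does not affect the parent).

```python
import os

def preload_and_fork():
    """Preload data, fork a child that modifies it and fails, report what the parent sees."""
    # Hypothetical stand-in for expensive-to-load framework resources.
    preloaded = {"theme": "default", "strings": ["ok", "cancel"]}

    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the preloaded data is already here, nothing is reloaded.
        try:
            os.close(r)
            preloaded["theme"] = "dark"  # copy-on-write: only the child sees this
            os.write(w, preloaded["theme"].encode())
        finally:
            os._exit(1)  # simulate the application dying with an error

    os.close(w)
    child_view = os.read(r, 100).decode()
    os.close(r)
    _, status = os.waitpid(pid, 0)
    # The parent's copy is untouched and the parent can keep forking new children.
    return preloaded["theme"], child_view, os.WEXITSTATUS(status)
```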


Uh, dynamic linking?


That requires that any shared data be included as static data inside the compiled object file, right? Very often, you want to load "code" that is really data in an application-specific format, e.g. DEX files on Android, .class files for Java, script text for Python, or templates for a webserver.


It's not dumb, it's just low level, just like threads. Applications should be programmed in application-level programming languages that provide higher-level notions such as monitors or actors, not threads and fork(2). A forking operation, at the application level, is a semantic operation that must be processed explicitly and specifically by the application (by each object having underlying threads).

The proof that the problem with forking is not threads is that the worst example given in the referenced article, that of file I/O, also occurs in processes without threads (or "single-threaded" processes, if you want): if you write to a file from both the parent and the child, you must take precautions at the application level. This has nothing to do with threads.


It's much less of a problem in a single-threaded program because no other threads can be running at the point where you call fork(). So there's no worry about a mutex being held (assuming you don't fork with a mutex held, but don't do that), and you can know exactly which files are open at that point in time.
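A small single-threaded sketch of the file I/O point, assuming a POSIX system: a descriptor inherited across fork() shares its file offset between parent and child, so the two writers have to be ordered by the application. The function name `shared_offset_demo` is invented for illustration, and waiting for the child is just one possible precaution.

```python
import os
import tempfile

def shared_offset_demo():
    """Write to one inherited descriptor from both child and parent."""
    fd, path = tempfile.mkstemp()
    try:
        pid = os.fork()
        if pid == 0:
            try:
                os.write(fd, b"child\n")
            finally:
                os._exit(0)  # never fall through into the parent's cleanup
        os.waitpid(pid, 0)  # application-level precaution: order the two writers
        os.write(fd, b"parent\n")  # lands after the child's bytes: the offset is shared
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 100)
    finally:
        os.close(fd)
        os.unlink(path)
```

Without the `waitpid` call the two writes could land in either order, with or without threads in the picture.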


> I've yet to see a use case that couldn't be handled with either threads, or spawning another process

fork() allows you to set up file descriptors such as stdin and stdout before execing another process. This is essential for pipelines.
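Roughly what a shell does for a two-stage pipeline, as a sketch in Python on a POSIX system. The name `run_pipeline` is invented here, and the second pipe that captures the consumer's output exists only so the result can be returned to the caller.

```python
import os

def run_pipeline(producer, consumer):
    """Run `producer | consumer` and return the consumer's stdout as bytes."""
    mid_r, mid_w = os.pipe()  # producer stdout -> consumer stdin
    out_r, out_w = os.pipe()  # consumer stdout -> this process
    all_fds = (mid_r, mid_w, out_r, out_w)

    producer_pid = os.fork()
    if producer_pid == 0:
        try:
            os.dup2(mid_w, 1)  # stdout becomes the pipe's write end
            for fd in all_fds:
                os.close(fd)
            os.execvp(producer[0], producer)
        finally:
            os._exit(127)  # reached only if setup or exec failed

    consumer_pid = os.fork()
    if consumer_pid == 0:
        try:
            os.dup2(mid_r, 0)  # stdin becomes the pipe's read end
            os.dup2(out_w, 1)  # stdout goes back to the parent
            for fd in all_fds:
                os.close(fd)
            os.execvp(consumer[0], consumer)
        finally:
            os._exit(127)

    # Parent: close every end it does not read from, or EOF never arrives.
    for fd in (mid_r, mid_w, out_w):
        os.close(fd)
    chunks = []
    while True:
        chunk = os.read(out_r, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(out_r)
    for pid in (producer_pid, consumer_pid):
        os.waitpid(pid, 0)
    return b"".join(chunks)
```

For example, `run_pipeline(["echo", "hello"], ["tr", "a-z", "A-Z"])` behaves like `echo hello | tr a-z A-Z`. All of the descriptor plumbing happens in the child between fork() and exec, which is the point being made above.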



