
Async web servers are more complex than thread per connection ones.

I think that's the big thing the author is communicating.

It's true.

And most types of development don't need async io. It's worth talking about.

The author makes the mistake of calling this a language problem and not a web server problem.

With Python, Java, C#, and hell, even JS, you can have a thread-per-connection web server if you choose to. This avoids many async woes.
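
To make that concrete, here is a minimal thread-per-connection sketch in Python's standard library (an illustrative echo server; the names `serve` and `handle` are made up for this example, not from any framework):

```python
import socket
import threading

def handle(conn: socket.socket) -> None:
    # One blocking read/write loop per connection; the OS scheduler
    # interleaves the threads for us.
    with conn:
        while data := conn.recv(4096):
            conn.sendall(data)  # echo; a real server would parse HTTP here

def serve(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    srv = socket.create_server((host, port))  # port 0 = pick a free port

    def accept_loop():
        while True:
            conn, _ = srv.accept()
            # One OS thread per connection -- the "untrendy" model.
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv
```

The appeal is that each handler is straight-line blocking code with no callbacks or state machines.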

It's just not trendy.



Except I remember my time as a sysadmin, putting out fires on Tomcats brought to their knees by one poorly performing query and a few impatient users hitting F5.

Yes, async is indeed more complex but it is a measurable improvement over previous - thread based - designs, and it is trivial to hit scenarios where it will make a difference.

Just like anything, though, it must be applied sensibly: not every single call must be decomposed into a choreography of async collaborators (yes, that's "architecture talk", like that dark language of Mordor that even Gandalf is scared to pronounce. Ever worked on a DDD/CQRS/ES project? ;)


> "It's just not trendy."

The "thread-per-connection" model doesn't scale. It's simply not resource-efficient in any way.


It does scale pretty well on modern hardware... and I say that as someone who has written tons of epoll networking layers (with a single buffer, no copies, parsing straight off the network buffer, etc.)

In the mid-2000s there were tons of benchmarks on Java NIO. Until you get into the 10k-connection range, thread overhead and context switching are nothing to worry about. OS kernels have also gotten better at handling thousands of threads.

Writing a server for thousands of connections turns into writing a proper scheduler. The only downsides of thread-per-connection are the lack of latency guarantees on short packets and long-lived connections (plus the extra cost of context switches), since you are at the mercy of the OS scheduler to do the right thing.
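
For contrast, the epoll-style event loop being discussed can be sketched with Python's `selectors` module (which uses epoll on Linux and kqueue on BSD/macOS). This is illustrative only; a production server would also handle partial writes and errors:

```python
import selectors
import socket
import threading

def serve_events(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    sel = selectors.DefaultSelector()  # epoll on Linux
    srv = socket.create_server((host, port))
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ, data=None)

    def loop():
        # A single thread multiplexes every connection: this loop is,
        # in effect, the hand-written scheduler.
        while True:
            for key, _ in sel.select():
                if key.data is None:                 # listening socket: accept
                    conn, _ = key.fileobj.accept()
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ, data="conn")
                else:                                # connection socket: echo
                    data = key.fileobj.recv(4096)
                    if data:
                        key.fileobj.sendall(data)    # small writes only; real
                                                     # code must buffer EAGAIN
                    else:
                        sel.unregister(key.fileobj)
                        key.fileobj.close()

    threading.Thread(target=loop, daemon=True).start()
    return srv
```

Note how the per-connection state that a thread's stack would hold implicitly now has to be tracked explicitly in the selector's registrations.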


> The "thread-per-connection" model doesn't scale. It's simply not resource-efficient in any way.

That's a myth. It can be shown that async (event based) and thread based models are semantically equivalent [1]. The details about performance entirely come down to the implementation of either model.

Generally if you implement green threads, such as in Erlang, Ruby, or Go, you can do "thread per request" very efficiently and in a cache friendly way - such that you should achieve the same performance as an event based model.

In languages that rely on the OS to schedule threads, the situation is much more dire, even with the great performance of NPTL on linux. Context switches are more expensive since they have to go to the kernel. And so on.

As a programmer, I think the thread per request model is substantially easier to reason about. I think the move to async at the top layer is trend driven. Instead, languages like Java should be looking at how to make their runtimes handle threads with far less overhead.

1. http://web.cecs.pdx.edu/~walpole/class/cs533/papers/duality7...


> "In languages that rely on the OS to schedule threads, the situation is much more dire, even with the great performance of NPTL on linux. Context switches are more expensive since they have to go to the kernel. And so on."

Which is a supporting argument against using a thread-per-connection. Use a reactor instead (IOCP, epoll, etc).

> "Generally if you implement green threads, such as in Erlang, Ruby, or Go, you can do "thread per request" very efficiently and in a cache friendly way - such that you should achieve the same performance as an event based model."

"Green threads" are not operating system threads, and they shouldn't be confused. The runtime still needs some way to handle blocking IO, so under the hood you will often find some sort of reactor/thread-pool/event-driven system making it all work (it's simply abstracted for you by the runtime). Bottom line, "green-thread-per-connection" is not "OS-thread-per-connection".


> "Green threads" are not operating system threads, and they shouldn't be confused. The runtime still needs some way to handle blocking IO, so under the hood you will often find some sort of reactor/thread-pool/event-driven system making it all work (it's simply abstracted for you by the runtime). Bottom line, "green-thread-per-connection" is not "OS-thread-per-connection".

Threads are threads - they let you abstract the async details into what looks like a synchronous context. Whether they are mapped to a kernel thread or not is an implementation detail that varies across OS/threading library/language runtime. My point is simply that at the language level the thread model (regardless of implementation) is the correct abstraction. The rise of event/async frameworks at the language level is a workaround to a poor implementation, rather than fixing the implementation.


I suspect it scales plenty for what most people are actually trying to do. Especially if the cost of throwing a few more web servers at the problem is less than the extra developer time that it costs to live in the async world. You can have quite a lot of threads on a fairly cheap machine these days.


99%+ of all use cases will never need the kind of scaling that async is more ideal for.

A small fraction of 1% of all sites/services will ever push more than 10k or 20k uniques per day. With today's infrastructure you can trivially run a large number of the Quantcast US top 5,000 sites on boring old PHP 7.x (which is very fast) with MySQL or Postgres and a cost effective VPS cluster from Digital Ocean.

All of these use cases that talk about even a thousand simultaneous connections are an elite 1/10th of 1/10th of 1% of all use-case scenarios.

Async is comical overkill for nearly everyone, unless you've got a very resource hungry service or a vast number of users. Exceptionally few developers ever run into that scenario. Optimizing that 5% or 10% resource savings when you're pushing 15% of the resources of a single machine, is a waste of time that should be focused on product instead.

The first site I ever got into that bracket of traffic with, into the Quantcast top 3k, cost me $550 per month for two machines with Nehalem 5500 processors and 24 GB of RAM. That was 2011-2012, at 1.x million uniques per month, using PHP 5.x and MySQL. And it was overkill by about 50%, while serving entirely dynamic pages with zero caching and no CDN.

US traffic & usage hasn't increased much since 2012, unless you're one of the few hyper-scale services. Mostly the audience growth has been stagnant, while the hardware has kept getting better. Today you could run what I was running back then for $80 on two VPS machines.

Do some use cases need async? Absolutely. Do most? Not even remotely close.


This is the article I thought OP was going to be.


As far as I know Golang uses the thread-per-connection model.

Furthermore, in languages such as Erlang, with its actor-based model, there are thousands and thousands of microthreads.

I think if you use lightweight threads (green threads, goroutines, etc.), the model does scale.


Lightweight threads don't count as real threads - that's a different concurrency model that may or may not map to actual OS threads.

In Go, for example, goroutines are multiplexed onto a small pool of OS threads by the runtime. You can set GOMAXPROCS=1, for instance, and have all your goroutines running on a single thread, effectively making it a single-threaded system like Node. In that scenario, you're back to cooperative scheduling, because control will only transfer to another goroutine when you make a blocking runtime call or call runtime.Gosched(), which is an explicit yield.


This is actually a viable deployment strategy for Go programs, with a few people recommending it. The advantage is that your code cannot possibly have race conditions, even as it works with any number of goroutines. To better utilize multicore hardware, you'd use something like https://github.com/stripe/einhorn to share the incoming socket with as many Go processes as you have processor cores. Since Go's memory footprint is so low, this is actually a pretty decent way to do things.


The better example would be Ruby + Puma. If you tell puma to set WORKERS=4 and MAX_THREADS=10, for example, you'll run 4 processes with 10 threads on each, giving you an explicit 40 simultaneous requests on that machine. No more under any circumstances.


>>I think if you use lightweight threads (green threads, goroutines, etc.), the model does scale.

That's not a real thread, unless there is a real context switch (and cold caches) to pay for.


Ultimately the problem is "the system needs to retain enough state per connection to resume executing the 'next' step of the communication".

This state is currently split into different pieces and spread around the system. The operating system maintains the TCP state, for example. If you use threads to partition connections, the operating system keeps track of the instruction pointer at which to resume a non-running thread. The program (whatever the language) will usually have some kind of stream state and maybe buffers.

It's all a question of where you want to put the abstraction boundaries. Heavy threads? Light threads? Async? Continuations? Or go the other way and have process-per-core with userland drivers and TCP state, where each process is just listening for interrupts and handles everything from there?


> Or go the other way and have process-per-core with userland drivers and TCP state, where each process is just listening for interrupts and handles everything from there?

A kernel. You are describing a kernel, minus workers, plus a web server.



Every time I have to wade into Node, I feel like they are reinventing the early days of programming with cooperative multitasking.


With green threads, I think you can have async IO but still program with threads. When the current thread makes an async call, suspend the green thread (cheap) and yield back to the event loop; threads are woken up when their async call finishes. This avoids the overhead of a real OS thread per connection.
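
That suspend-and-resume mechanism can be sketched with plain Python generators — a toy stand-in for a green-thread scheduler, with all names made up for illustration (real runtimes like Go's or Erlang's are far more involved):

```python
from collections import deque

def scheduler(tasks):
    """Toy event loop: each 'green thread' is a generator that yields
    whenever it would block; yielding hands control back to this loop."""
    ready = deque(tasks)
    results = []
    while ready:
        task = ready.popleft()
        try:
            next(task)            # run until the task "blocks" (yields)
            ready.append(task)    # suspended but not finished: requeue it
        except StopIteration as done:
            results.append(done.value)  # the generator's return value
    return results

def worker(name, steps):
    for _ in range(steps):
        yield                     # stand-in for an async IO call
    return f"{name} finished after {steps} steps"
```

Suspending here is just saving the generator's frame — no kernel context switch, which is why green threads are so cheap relative to OS threads.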


This seems like a good faith comment. I'm not really sure why it's getting downvoted. Someone care to shed some light?


Nowadays people downvote because they disagree. It's sad because it pointlessly frustrates the poster and kills discussions.


It's not just 'nowadays'. pg a decade ago: https://news.ycombinator.com/item?id=117171

> I think it's ok to use the up and down arrows to express agreement.


From my personal point of view, it's happening more in the last year than it did before.



