What Color is Your Function? (2015) (stuffwithstuff.com)
109 points by thinkloop on April 2, 2018 | hide | past | favorite | 45 comments


See also this reply: The Function Colour Myth (https://lukasa.co.uk/2016/07/The_Function_Colour_Myth/).


I don't see how this reply addresses the core point; it just argues that async/await is useful (which is fine, but not novel by itself).

It is possible to work around the "color" by using generators [0]; that way, one algorithm can be used in both a synchronous and an asynchronous way. Async/await, by contrast, is tightly bound to the underlying Promises.

[0]: https://curiosity-driven.org/promises-and-generators#decoupl...
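To illustrate the decoupling idea (this is a sketch in the spirit of the linked article, not code from it; all names here are made up): the algorithm is a generator that yields "requests", and a separate runner decides whether to answer them synchronously or via promises.

```javascript
// One algorithm written as a generator; it yields keys it wants resolved,
// and the runner decides how to fulfil them.
function* addFromSources() {
  const a = yield "sourceA";
  const b = yield "sourceB";
  return a + b;
}

// Synchronous runner: answers each yielded key from a plain lookup table.
function runSync(gen, lookup) {
  const it = gen();
  let step = it.next();
  while (!step.done) {
    step = it.next(lookup[step.value]);
  }
  return step.value;
}

// Asynchronous runner: answers each yielded key via a promise-returning fetcher.
async function runAsync(gen, fetch) {
  const it = gen();
  let step = it.next();
  while (!step.done) {
    step = it.next(await fetch(step.value));
  }
  return step.value;
}
```

The generator itself never mentions promises, so the same algorithm runs under either runner; only the runner has a color.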


I agree with the article but not with the editorialization in the title.

In the last 2 years I've mostly used Elixir and JavaScript. In Elixir, generally all functions are synchronous. In JavaScript, there's the typical two-color functions that the author describes: synchronous and asynchronous.

Elixir is nicer by far for a lot of parallel programming. Our backend can handle large loads on a single VM simply because of this.

However, the world is asynchronous. It's very nice that Elixir abstracts it away, but sometimes that's not what you want and then Elixir makes things significantly harder than JavaScript.

For example, imagine that my backend gets many parallel requests for mildly different resources. Each resource needs to query some other system (some 3rd party API, a DB, whatever), but it turns out that many of them will do exactly the same request to that other system even though the user requests they're handling are all pretty much unique. A simple solution is to cache the requests every second/minute/whatever and combine them all. This is nontrivial because I want to cache the running requests, not the responses. If there's one request running, I don't want to launch an identical request.

In Elixir, I have to carefully design a GenServer to make this work properly. Or maybe use a library such as the fantastic `memoize`[0], which is great for this and can even do it without spawning any additional processes at all. But either way, I have to add complexity: either complexity that I write or complexity that I import. `memoize`'s core caching code is amazing, but not trivial[1].

In JavaScript, I can just cache a bunch of promises in a Map or an object. If the promise exists and hasn't expired, don't do a new request. Else, do the request and cache the promise. Then, in either case, await the promise and use the result. It's 10 lines of boring code.
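A minimal sketch of that promise-caching idea (names and the TTL are illustrative, not from the comment): the key point is caching the *promise*, i.e. the running request, not the response, so identical in-flight requests are shared.

```javascript
const TTL_MS = 60_000; // illustrative expiry window
const cache = new Map(); // key -> { promise, expires }

function cachedRequest(key, doRequest) {
  const entry = cache.get(key);
  if (entry && entry.expires > Date.now()) {
    return entry.promise; // an identical request is running or still fresh
  }
  const promise = doRequest(key); // launch the request and cache it immediately
  cache.set(key, { promise, expires: Date.now() + TTL_MS });
  return promise;
}
```

Every caller then does `await cachedRequest(key, doRequest)`; concurrent callers for the same key all await the same in-flight promise.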

I still like Elixir better for backends and I'm happy that our backend isn't Node. But I'm writing this to underline that explicit support for asynchronicity can be a feature too.

[0] https://github.com/melpon/memoize [1] https://github.com/melpon/memoize/blob/master/lib/memoize/ca...


I'm very much a beginner at this, so apologies if I'm missing something obvious, but I find this an interesting scenario to think about (so even just to learn how you would do it in Elixir would be nice).

Why not track the fetching processes in another managing process (or ETS)? Whenever a new request comes in, you check if there's a process already busy fetching that particular resource. If there is, ask it to send the results back to this request process too, or use whatever storage keeps this data to satisfy both requests.

In practice I'd probably be using some kind of managing process anyway, to deal with queuing multiple requests to the 3rd party provider (because of rate limiting and whatnot), and I'd perhaps also have some storage and 'inform requesting party of result' logic/process in place too, so I'd not be adding too much extra complexity.

Is this a valid approach to begin with? Or does this expose me as very much an Elixir/OTP beginner :)? And if the approach works, what complexity am I missing compared to keeping track of promises in a map? Or am I overestimating how much complexity you're talking about?

I had my first taste of doing some 'real' stuff exactly with caching responses and dealing with rate limiting and whatnot, and I found that to be quite nice to do in Elixir, but this particular 'problem' strikes me as one I'll probably run into fairly soon.


> However, the world is asynchronous.

Can you give me examples of sources that explain this concept? I really don’t think the world is asynchronous, because if it were, the Big Bang could happen now. Aren’t you mixing up the word ‘asynchronously’ with ‘independently’?


Have you been staring ahead mindlessly for the past hour, blocked while waiting for an answer, or did you do something else?

I’m not the one making the “the world is asynchronous” claim, but I think this is what (s)he meant: in the real world, there is no “make a request and wait for an answer, doing nothing at all until you get an answer, even if one never comes”.

That need not be true, though. One could model the human brain as a blackboard with N cores, each running a single process (one for breathing, one for your heart rate, etc.), and one of those cores blocking forever on such a question, with an “OOM killer” killing a low-priority thread when needed. I wonder how one would go about falsifying such a model…


Waiting for your answer pretty much was like that, and yes, I did a lot of completely unrelated stuff. I don’t think computers model the “real world” very well, though they can model a lot of other things very well. The real world isn’t asynchronous; it’s continuous, with things that merely seem unrelated. If this world were asynchronous, there would be a possibility that I would have given this answer before you gave me the question. Where are the synchronizing primitives in the world making sure that this never happens?


Could you define synchronous and asynchronous as you understand them? Because I don't think your definitions match the rest of the people in this thread.


”The primary focus of this article is asynchronous control in digital electronic systems.[1][2] In a synchronous system, operations (instructions, calculations, logic, etc.) are coordinated by one, or more, centralized clock signals. An asynchronous digital system, in contrast, has no global clock. Asynchronous systems do not depend on strict arrival times of signals or messages for reliable operation. Coordination is achieved via events such as: packet arrival, changes (transitions) of signals, handshake protocols, and other methods.”

https://en.m.wikipedia.org/wiki/Asynchronous_system


The thing being discussed is https://en.wikipedia.org/wiki/Asynchronous_method_invocation:

”asynchronous method invocation (AMI), also known as asynchronous method calls or the asynchronous pattern is a design pattern in which the call site is not blocked while waiting for the called code to finish. Instead, the calling thread is notified when the reply arrives.”


Uh, no. It was about whether the world is asynchronous or not. I don’t have much to say about OP, but the “world is asynchronous” line usually comes up in these discussions. I think that’s really weird and wanted more info about it, since I haven’t yet seen a convincing explanation of it.


You're just misunderstanding the claim. People mean something like "most things in the world operate in a non-blocking manner". It's not some deep claim about the universe.


In this case perhaps, but it usually becomes some sort of deep claim about how the universe works. And I do think asynchronous computer execution is not even close to how the world operates, so it’s not even a good metaphor.


Facebook built Haxl[0], a library in Haskell, precisely for this use case: batching, caching and paralyzing requests against external sources.

[0] https://github.com/facebook/Haxl


I think you meant "parallelizing" but "paralyzing" often seems to be the result. ;)


This article has already been posted 3 times on HN, and was extensively discussed the first time here: https://news.ycombinator.com/item?id=8984648


Async web servers are more complex than thread per connection ones.

I think that's the big thing the author is communicating.

It's true.

And most types of development don't need async io. It's worth talking about.

The author makes the mistake of calling this a language problem and not a web server problem.

With Python, Java, C#, and hell, even JS, you can have a thread-per-connection web server if you choose to. This will avoid many async woes.

It's just not trendy.


Except I remember my time as a sysadmin, putting out fires on Tomcats brought to their knees by 1 poorly performing query and a few impatient users hitting F5.

Yes, async is indeed more complex but it is a measurable improvement over previous - thread based - designs, and it is trivial to hit scenarios where it will make a difference.

Just like anything though, it must be applied sensibly: not every single call must be decomposed into a choreography of async collaborators (yes, listen to that, it’s “architecture talk”, it’s like that dark language used in Mordor that even Gandalf is scared to pronounce. Ever worked on a DDD/CQRS/ES project? ;)


> "It's just not trendy."

The "thread-per-connection" model doesn't scale. It's simply not resource-efficient in any way.


It does scale pretty fine on modern hardware... and I say that as someone who has written tons of epoll networking layers (with a single buffer, no copy, parsing off the network buffer, etc.)

Mid-2000s there were tons of benchmarks on Java NIO. Until you get into 10k connections, thread overhead and context switching are nothing to worry about. Also, OS kernels got better at handling thousands of threads.

Writing a thousands-of-connections server turns into writing a proper scheduler. The only downsides of thread-per-conn are the latency non-guarantees on short packets and long-lasting connections (plus the extra cost of context switches), as you rely on the OS scheduler's mercy to do the right thing.


> The "thread-per-connection" model doesn't scale. It's simply not resource-efficient in any way.

That's a myth. It can be shown that async (event based) and thread based models are semantically equivalent [1]. The details about performance entirely come down to the implementation of either model.

Generally if you implement green threads, such as in Erlang, Ruby, or Go, you can do "thread per request" very efficiently and in a cache friendly way - such that you should achieve the same performance as an event based model.

In languages that rely on the OS to schedule threads, the situation is much more dire, even with the great performance of NPTL on linux. Context switches are more expensive since they have to go to the kernel. And so on.

As a programmer, I think the thread per request model is substantially easier to reason about. I think the move to async at the top layer is trend driven. Instead, languages like Java should be looking at how to make their runtimes handle threads with far less overhead.

1. http://web.cecs.pdx.edu/~walpole/class/cs533/papers/duality7...


> "In languages that rely on the OS to schedule threads, the situation is much more dire, even with the great performance of NPTL on linux. Context switches are more expensive since they have to go to the kernel. And so on."

Which is a supporting argument against using a thread-per-connection. Use a reactor instead (IOCP, epoll, etc).

> "Generally if you implement green threads, such as in Erlang, Ruby, or Go, you can do "thread per request" very efficiently and in a cache friendly way - such that you should achieve the same performance as an event based model."

"Green threads" are not operating system threads, they shouldn't be confused. The runtime will still need some way to handle blocking IO, so under the hood, you will often find some sort of reactor/thread-pool/event-driven-system making it all work (it's simply abstracted for you by the runtime). Bottom line, a "Green-thread-per-connection" is not "OS-thread-per-connection.".


> "Green threads" are not operating system threads, they shouldn't be confused. The runtime will still need some way to handle blocking IO, so under the hood, you will often find some sort of reactor/thread-pool/event-driven-system making it all work (it's simply abstracted for you by the runtime). Bottom line, a "Green-thread-per-connection" is not "OS-thread-per-connection.".

Threads are threads - they let you abstract the async details into what looks like a synchronous context. Whether they are mapped to a kernel thread or not is an implementation detail that varies across OS/threading library/language runtime. My point is simply that at the language level the thread model (regardless of implementation) is the correct abstraction. The rise of event/async frameworks at the language level is a workaround to a poor implementation, rather than fixing the implementation.


I suspect it scales plenty for what most people are actually trying to do. Especially if the cost of throwing a few more web servers at the problem is less than the extra developer time that it costs to live in the async world. You can have quite a lot of threads on a fairly cheap machine these days.


99%+ of all use cases will never need the kind of scaling that async is more ideal for.

A small fraction of 1% of all sites/services will ever push more than 10k or 20k uniques per day. With today's infrastructure you can trivially run a large number of the Quantcast US top 5,000 sites on boring old PHP 7.x (which is very fast) with MySQL or Postgres and a cost effective VPS cluster from Digital Ocean.

All of these use cases that talk about even a thousand simultaneous connections, are elite 1/10th of 1/10th of 1% of all use case scenarios.

Async is comical overkill for nearly everyone, unless you've got a very resource hungry service or a vast number of users. Exceptionally few developers ever run into that scenario. Optimizing that 5% or 10% resource savings when you're pushing 15% of the resources of a single machine, is a waste of time that should be focused on product instead.

The first site I ever got into that bracket of traffic, into the Quantcast top 3k, cost me $550 per month for two machines, using Nehalem 5500 processors and 24 GB of RAM. That was 2011-2012, at 1.x million uniques per month, using PHP 5.x and MySQL. And it was overkill by about 50%, while serving up entirely dynamic pages with zero caching and no CDN.

US traffic & usage hasn't increased much since 2012, unless you're one of the few hyper scale services. Mostly the audience growth has been stagnant, while the hardware has kept getting better. Today you can run what I was back then for $80 on two VPS machines.

Do some use cases need async? Absolutely. Do most? Not even remotely close.


This is the article I thought OP was going to be.


As far as I know Golang uses the thread-per-connection model.

Furthermore, in languages such as Erlang, which have an actor-based model, there are thousands and thousands of microthreads.

I think if you use lightweight threads (green threads, goroutines, etc.), the model does scale.


Lightweight threads don't count as real threads - that's a different concurrency model that may or may not map to actual OS threads.

In Go, for example, I don't think OS threads are used at all, just processes. You can set GOMAXPROCS=1, for instance, and have all your goroutines running on a single process, effectively making it a single-threaded system like Node. In that scenario, you're back at cooperative scheduling, because control will only transfer to another goroutine if you delegate to a runtime call or call runtime.Gosched(), which is an explicit sleep + control transfer.


This is actually a viable deployment strategy for Go programs, with a few people recommending it. The advantage is that your code cannot possibly have race conditions, even as it works with any number of goroutines. To better utilize multicore hardware, you'd use something like https://github.com/stripe/einhorn to share the incoming socket with as many Go processes as you have processor cores. Since Go's memory footprint is so low, this is actually a pretty decent way to do things.


The better example would be Ruby + Puma. If you tell puma to set WORKERS=4 and MAX_THREADS=10, for example, you'll run 4 processes with 10 threads on each, giving you an explicit 40 simultaneous requests on that machine. No more under any circumstances.


>>I think if you use lightweight threads (green threads, goroutines, etc.), the model does scale.

Those aren't real threads unless there is a real context switch (and cold caches) to pay for.


Ultimately the problem is "the system needs to retain enough state per connection to resume executing the 'next' step of the communication".

This state is currently split into different pieces and spread around the system. The operating system maintains the TCP state, for example. If you use threads to partition connections, the operating system looks after the instruction pointer to resume a non-running thread at. The program (whatever language) will usually have some kind of stream state and maybe buffers.

It's all a question of where you want to put the abstraction boundaries. Heavy threads? Light threads? Async? Continuations? Or go the other way and have process-per-core with userland drivers and TCP state, where each process is just listening for interrupts and handles everything from there?


> Or go the other way and have process-per-core with userland drivers and TCP state, where each process is just listening for interrupts and handles everything from there?

A kernel. You are describing a kernel, minus workers, plus a web server.



Every time I have to wade into Node, I feel like they are reinventing the early days of programming with cooperative multitasking.


With green threads, I think you can have async IO but still program with threads: when the current thread makes an async call, suspend the green thread (cheap) and yield back to the event loop. Threads are woken up when their async call finishes. This avoids the overhead of one real OS thread per connection.
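A hypothetical sketch of that idea in JavaScript terms (all names invented here): a generator plays the role of a green thread, suspending whenever it yields a promise and being resumed by the event loop when the promise settles.

```javascript
// Run a generator as a "green thread": each yielded promise parks the
// generator until the promise settles, then resumes it with the result.
function spawn(greenThread) {
  const it = greenThread();
  function resume(value) {
    const step = it.next(value);
    if (!step.done) {
      step.value.then(resume); // park until the async call finishes
    }
  }
  resume(undefined);
}
```

Usage would look like `spawn(function* () { const data = yield someAsyncCall(); /* ... */ })`, where `someAsyncCall` is any promise-returning function; the event loop does the waking, with no OS thread held per connection.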


This seems like a good faith comment. I'm not really sure why it's getting downvoted. Someone care to shed some light?


Nowadays people downvote because they disagree. It's sad because it pointlessly frustrates the poster and kills discussions.


It's not just 'nowadays'. pg a decade ago: https://news.ycombinator.com/item?id=117171

> I think it's ok to use the up and down arrows to express agreement.


From my personal point of view, it's happening more in the last year than it did before.


Just the other day I was looking over my notes from (horror of horrors) 3 years ago on core problems to avoid when designing a language, and this article came up. Still just as good a read as I remembered, made all the more pertinent by the intervening time in which I have repeatedly cursed python's implementation for the massive friction burns it has caused me. Probably needs a (2015) on it for context, but I consider it timeless!


  > Wanna know one that doesn’t? Java.
Unless you use Vert.x, of course.

A year ago I helped a friend on a big project in Java for which he was using Vert.x. Suddenly I found myself doing JavaScript-style programming in Java. Convenient for calls to other services (a payment provider, for example), but indeed a bit cumbersome if you're used to Java being synchronous.


Please add a 2015 tag, this is old and has previously been extensively discussed.


Fibers (coroutines) solve most of the criticism for JS. Although they're not part of the language and you need to extend V8, they're a viable option for web servers. I believe that's how the Meteor framework has been doing it for years, before async/await or even promises were a thing.


And a great sage would say that it all boils down to category theory.



