Announcing GenStage

hosh · on July 14, 2016

Yep, I wish I had had access to this back when I was working on something on the Rails side. Ingesting large amounts of data from Shopify across multiple OAuth access points while being limited to the PostgreSQL backend and Shopify rate limit restrictions was a pain to set up with Redis and Sidekiq. I ended up forking Sidekiq to accomplish that -- essentially using Sidekiq as a concurrency framework to build in a different set of behavior. Reading through the GenStage document, I could have implemented something better.

The way I implemented it (with Ruby) left gaps of idle time, and huge amounts of data got staged through the queues. I couldn't figure out how to make it better. Since I only had a week to come up with something while the infrastructure was melting down, it was acceptable at the time. Looking at this though, the demand-based backpressure would work very well.

There are a couple of problems in my current work that I think I can use this for -- this is great work!

danvayn · on July 14, 2016

Install the Bullet gem and see if your queries look sharp!

hosh · on July 14, 2016

Already did. That wasn't the issue. I get that you're trying to help, but you don't really understand the problem and why GenStage would help with it. The issue isn't N+1 queries but concurrency and reasoning about concurrency, something that Rails and many Rails developers seem to avoid.

See also: https://news.ycombinator.com/item?id=12074425

lucidstack · on July 14, 2016

This looks incredible, congratulations to the Elixir team!

Perhaps more exciting than the first part of the post is the second bit about the future. It's fantastic to see such a clear path forward for concurrency in Elixir. Definitely looking forward to GenStage.Flow.

lpgauth · on July 14, 2016

Curious, how is the back pressure actually implemented? ACK message or ETS public table?

strmpnk · on July 14, 2016

It's demand based. The consumer will request a fixed size of input and then the producer will send up to but no more than that until more demand is made.

josevalim · on July 14, 2016

In addition to that, demand is asynchronous: the consumer can request more events, in any amount, any time it wants to.

The default implementation requests more when half of the current demand is processed. So if you set :max_demand to 100, when the consumer has processed 50 events, we request 50 more.
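To make the arithmetic concrete, here is a rough Python sketch of that refill rule. It is only a model of the idea, not GenStage's API: the `Producer` class, `run_consumer`, and the `outstanding` counter are hypothetical names, and the assumption is simply that the consumer tops its demand back up to `max_demand` once it drops to half.

```python
from collections import deque


class Producer:
    """Hypothetical producer: emits integers, never more than was asked for."""

    def __init__(self, total):
        self.remaining = deque(range(total))

    def ask(self, demand):
        # Send up to `demand` events; fewer if the source is running dry.
        batch = []
        while demand > 0 and self.remaining:
            batch.append(self.remaining.popleft())
            demand -= 1
        return batch


def run_consumer(producer, max_demand=100):
    min_demand = max_demand // 2   # assumed refill threshold: half of max_demand
    ask_sizes = []                 # size of every request sent to the producer
    processed = 0

    # Initial request for the full window.
    ask_sizes.append(max_demand)
    inbox = deque(producer.ask(max_demand))
    outstanding = max_demand       # events asked for but not yet processed

    while inbox:
        inbox.popleft()            # "process" one event
        processed += 1
        outstanding -= 1
        if outstanding <= min_demand:
            # Top the demand back up to max_demand.
            refill = max_demand - outstanding
            ask_sizes.append(refill)
            inbox.extend(producer.ask(refill))
            outstanding += refill
    return processed, ask_sizes
```

With `run_consumer(Producer(300), max_demand=100)` the first request is for 100 events and every later request is for 50, so the consumer never holds more than 100 unprocessed events.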

lpgauth · on July 14, 2016

Sure it's demand based, but how is it implemented? How is the demand communicated between the processes?

josevalim · on July 14, 2016

I am on my phone (sorry) but the documentation for the GenStage module has a section on the message protocol between consumers and producers. GenStage is simply one implementation of this message protocol.

querulous · on July 14, 2016

I believe it's similar to window updates in HTTP/2. Consumers send an async message to producers with a count of messages they are willing to accept, and it's up to producers not to exceed the count.
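A minimal sketch of that exchange, assuming two threads with a mailbox each (Python's `queue.Queue` stands in for process mailboxes). The message tags `"ask"`, `"event"`, and `"done"` are made up for illustration and are not the actual GenStage message protocol.

```python
import queue
import threading

_EXHAUSTED = object()  # sentinel: the event source has nothing left


def producer(mailbox, consumer_mailbox, events):
    """Waits for ("ask", n) messages and sends at most n events per ask."""
    source = iter(events)
    while True:
        _tag, count = mailbox.get()          # block until the consumer asks
        for _ in range(count):
            event = next(source, _EXHAUSTED)
            if event is _EXHAUSTED:
                consumer_mailbox.put(("done", None))
                return
            consumer_mailbox.put(("event", event))


def consumer(mailbox, producer_mailbox, window):
    """Grants `window` credits, then grants more once they are used up."""
    received = []
    credit = window
    producer_mailbox.put(("ask", window))    # asynchronous: no reply awaited
    while True:
        tag, payload = mailbox.get()
        if tag == "done":
            return received
        received.append(payload)
        credit -= 1
        if credit == 0:
            credit = window
            producer_mailbox.put(("ask", window))


def run(events, window):
    producer_mailbox = queue.Queue()
    consumer_mailbox = queue.Queue()
    worker = threading.Thread(
        target=producer, args=(producer_mailbox, consumer_mailbox, events)
    )
    worker.start()
    received = consumer(consumer_mailbox, producer_mailbox, window)
    worker.join()
    return received
```

The producer only ever sends in response to an ask, so the consumer's mailbox cannot grow beyond the window it granted, however fast the producer is.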

thibaut_barrere · on July 14, 2016

I was already diving into Elixir for ETL-based work before discovering GenStage, but now even more. Thanks José et al. for the hard work on this!

rpazyaquian · on July 14, 2016

I'm not quite at the point where I can appreciate how useful this is, because I don't really understand concurrency itself. What is a good resource from which to learn about concurrency, and especially Elixir/Erlang's approach to it?

OWaz · on July 14, 2016

If you want to watch a video, check out Joe Armstrong (one of the creators of Erlang) talk about concurrency and Erlang[0]. There are a bunch of Erlang/Elixir videos [1] posted by Erlang Solutions on YouTube.

[0]: https://www.youtube.com/watch?v=YaUPdgtUYko

[1]: https://www.youtube.com/c/erlangsolutions/videos

macintux · on July 14, 2016

Fred Hebert's Learn You Some Erlang is quite approachable.

http://learnyousomeerlang.com/the-hitchhikers-guide-to-concu...

iamd3vil · on July 14, 2016

If you are interested in books, "Elixir in Action" is quite a nice introduction to OTP.

fgasparini · on July 14, 2016

I recommend this book about concurrency: https://pragprog.com/book/pb7con/seven-concurrency-models-in...

vemv · on July 14, 2016

Before diving into Erlang/Elixir, I'd make sure to learn the basics of concurrency/parallelism with more basic languages such as Java, Python or Ruby. i.e. start with the easy stuff, then with that strong foundation choose as you prefer!

dragonwriter · on July 14, 2016

Honestly, learning actor-model concurrency with Erlang (and probably Elixir) is a lot easier than dealing with concurrency in the "basic" languages you list. Concurrency in Java, Python, or Ruby isn't "the easy stuff" compared to concurrency in Erlang/Elixir.

vemv · on July 14, 2016

Conceptually, threads are probably easier than actors for a beginner. I'm not saying that it's easier to implement correct code with threads (in fact it's more complex), but most devs go from OO to FP, not the other way around.

These days there is such an explosion of languages, frameworks, techniques etc. that it must be scary for a beginner. Given that, I'd advise going by the safe (average) path.

rubiquity · on July 14, 2016

But if you learn about concurrency via threads then you have to learn about mutexes... if you learn about mutexes then you have to learn about condition variables... and if you learn about condition variables then you have t...

Well you see where this is going. I think it's important to learn about Threads, Mutexes, Condition Variables, and why sharing isn't caring, but I don't think those lessons necessarily need to come before learning about the Actor model.

nugator · on July 14, 2016

On the other hand, if you want to sell the Actor model it might be good to first go through the complex techniques needed for trying to get concurrency right with threads.

dragonwriter · on July 14, 2016

> most devs go from OO to FP,

Most devs go from OO to FP because of dominance of OO in industry (with knock-on effects on pedagogy) from sometime around the late-1980s/early-1990s until recently. Shortly before that, the same would be true of procedural to either OO or FP. And probably at one point the same would have been true of unstructured imperative to procedural.

But that doesn't mean that concurrency in popular OO languages of today is easier than concurrency in Erlang/Elixir (which may be examples of functional-ish languages, which I assume is the relevance of your OO to FP statement), nor does it mean that concurrency in formerly popular procedural languages is easier than in any particular OO language, or that concurrency in unstructured imperative languages is easier than in any particular procedural language.

macintux · on July 14, 2016

I think the only argument left for "why I should learn OO before FP" is the lingering suspicion no one would tolerate the pain involved with OO once they reached that point.

peruvian · on July 14, 2016

Concurrency in those languages is anything but easy, and that is the reason Erlang/Elixir exist.

losvedir · on July 14, 2016

Congrats to the Elixir team.

As I understand it, GenServer is kind of a wrapper over Erlang's underlying OTP gen_server abstraction. (Is that correct?)

What's GenStage's relationship to Erlang? Does Erlang have an equivalent?

toolz · on July 14, 2016

All OTP abstractions are based on process creation/supervision/messaging primitives. So while Erlang doesn't have this abstraction, it is built on the same primitives that both Erlang and Elixir use.

ssijak · on July 14, 2016

It does not have it; this is new in Elixir.

poorman · on July 14, 2016

Super excited about GenStage.Flow. Might have to start another Elixir project just to try it out.

Dangeranger · on July 14, 2016

Could this be used to allow Elixir to load balance a third party server?

For example: You have an Elixir load balancer that manages requests for three other servers and distributes load based on the 'demand' that the consumer communicates back to the balancer?

felixgallo · on July 14, 2016

yep. The neat thing about this is that it's composable primitives, so you could imagine having an arbitrarily shaped tree of demand-creating producer-consumers and consumers. And with distribution, it'd be possible to scale this out pretty far.
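As a rough illustration of the load-balancer idea, here is a Python sketch in which each backend announces how many more requests it will take and the balancer routes to whichever has the most unused demand. `DemandBalancer` and its methods are hypothetical names for this sketch, not part of GenStage.

```python
class DemandBalancer:
    """Routes each request to the backend with the most unused demand."""

    def __init__(self):
        # backend name -> number of requests it is still willing to take
        self.demand = {}

    def ask(self, backend, count):
        # A backend announces it can take `count` more requests.
        self.demand[backend] = self.demand.get(backend, 0) + count

    def dispatch(self, request):
        # Pick the backend with the largest outstanding demand, if any.
        backend = max(self.demand, key=self.demand.get, default=None)
        if backend is None or self.demand[backend] == 0:
            return None  # nobody has capacity: the caller must queue or reject
        self.demand[backend] -= 1
        return backend
```

For example, after `ask("a", 2)` and `ask("b", 1)`, three dispatches go to "a" twice and "b" once, and a fourth returns `None` until some backend asks for more.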

Dangeranger · on July 14, 2016

That's great.

Are there any plans to use these primitives to communicate with servers not written in Elixir? For example can you foresee Elixir being used to manage and coordinate heterogeneous architectures?

felixgallo · on July 14, 2016

I'm on the Erlang rather than the Elixir side of the BEAM, but it's pretty easy and normal in the BEAM to communicate with heterogeneous processes, either by implementing a dirty or regular NIF, opening a port, building a C Node or using JInterface with Java, using CORBA haha, or just opening a socket.

robbles · on July 14, 2016

If the back-pressure is abstracted away by this interface, how do you monitor it?

felixgallo · on July 14, 2016

You'd still have the regular BEAM reduction-counting backpressure in place, and it would be trivial to have your flow components report their work rates and/or demand requests to whatever monitoring service (statsd, etc.) that you might care to set up. Or to respond to state query messages like sys:get_state/1.

windor · on July 14, 2016

It's definitely worth trying! I have used akka-stream once, and have always hoped for something like that in the Erlang world. Finally, here it is. Thanks to the Elixir team.