
Out of curiosity: what do the 5 string types do differently?


String: a linked list of Char. Nice for teaching, horrible in every other respect. Text and lazy Text: modern strings, with Unicode handling and so on. ByteString and lazy ByteString: these are actually arrays of bytes, used to represent binary data. Because Haskell is lazy by default, and sometimes you want strictness (mostly for performance), there are strict and lazy variants of both Text and ByteString, and going from one flavor to the other requires manual conversion.


At the risk of going a bit off-topic: I think the lazy versions of Text and ByteString wouldn't have been needed if we had a nice abstraction for streams (lists are not one — they cause allocation we cannot get rid of), so that you don't need to implement a concrete stream type (e.g. lazy Text and lazy ByteString) for every data type.

Rust does this well with iterators, for example.


The problem is that streams actually have very complicated semantics when they interact with the real world. What does it mean to traverse an effectful stream multiple times? Can you even do that?

Data.Vector provides a very efficient stream implementation for vector operation fusion, but it's unsuitable for iterators/streams that interact with the real world. Pipes, on the other hand, combined with FreeT, provides good, reasonable semantics for effectful streams.

As with many other things, Haskell forces you to be honest with what your code is actually doing (e.g. streaming things from a network) and this means that there's no one-size-fits-all implementation we can stuff everything into.


Just sticking with the pure types there's currently no generic stream model that works well. No stream fusion system fuses all cases (even in theory) and they also fail to fuse the cases they're supposed to handle too often in practice.

I haven't looked at pipes, but I'm guessing it doesn't all fuse away either.


You're right, I believe Haskell's fusion framework could be greatly improved (although it is the best production solution I'm aware of). However, how would you go about solving this? I don't think there's any generalized solution to the problem of creating no-overhead iteration from higher-level iterative combinators.


> Haskell's fusion framework could be greatly improved (although it is the best production solution I'm aware of). However, how would you go about solving this?

Given that we're in a Rust thread... are you familiar with Rust's iterator fusion [0]? Basically there are three components: iterators (something like a source), iterator adapters (where all the manipulations happen), and consumers (something like a sink). LLVM will compile all the iterator adapters into a single manipulation, such that the underlying stream/vector/whatever only goes through it once.
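A minimal sketch of that shape (the closures and values here are just illustrative, not from any real API):

```rust
// Source -> adapters -> consumer. The adapters build nested lazy types;
// the actual single pass over the data only happens at `collect`.
fn main() {
    let result: Vec<i32> = (1..=6)      // source
        .map(|x| x * 2)                 // adapter: lazy, allocates nothing
        .filter(|x| x % 3 == 0)         // adapter: still lazy
        .map(|x| x + 1)                 // adapter
        .collect();                     // consumer: the one traversal happens here
    println!("{:?}", result);           // [7, 13]
}
```

In practice LLVM compiles that adapter chain into one loop with no intermediate vectors allocated.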

I personally like it much better than Haskell's. With Rust, the fusion is guaranteed to happen, although it makes the types a little verbose and tricky to work with. With, e.g., Haskell Text's stream fusion, I was never really sure that it was working, or whether I could do something to prevent it. It seems like in Haskell it's more of a behind-the-scenes optimization that you hope kicks in, rather than something designed into the types. Or do I misunderstand? I've only dabbled in Haskell.

[0] https://doc.rust-lang.org/book/iterators.html


Yes, I have used Rust a bit. Basically the primary difference is (and correct me if I'm wrong) that you can't re-use an iterator in Rust without cloning it, because "draining" an iterator is stateful. On the other hand, you can use a pure Haskell stream object as many times as you want without explicit cloning, so fusion becomes a more complicated problem.

If I had some Haskell code that was like

    map f . map g . filter x . map y $ stream

It would almost certainly get fused into a single low-level loop without extraneous allocations. However, I can also do something like

    foo = map y $ stream

    bar = map f . map g . filter x $ foo

    baz = map z $ foo

And now what do you do?

Haskell's fusion is also more general, because it allows you to do pretty much arbitrary syntactic transformations.

Unfortunately, this means it's somewhat fragile and is easy to prevent from functioning. Rust can guarantee fusion because you're restricted in the kinds of things you can do with iterators.

On the other hand, Haskell's Pipes restrict you from doing things like re-using an iterator, and I'm not sure what the optimization story is there.


That's one of the ideas behind transducers:

https://github.com/matthiasn/talk-transcripts/blob/master/Hi...

And they work! It's not stream fusion, but the composed functions are applied to whatever container or stream of values per-value, so (map (comp xf1 xf2)) applied to [1 2 3] applies (xf2 (xf1 1)), (xf2 (xf1 2)), and so on, with allocation savings similar to stream fusion.
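The per-value idea translates to this thread's Rust terms as well: compose the transformations into one function first, then traverse once. A sketch (xf1 and xf2 are made-up stand-ins for the transcript's transformation functions):

```rust
fn main() {
    // Two per-value transformations, composed into one function up front.
    let xf1 = |x: i32| x + 1;
    let xf2 = |x: i32| x * 2;
    let composed = move |x: i32| xf2(xf1(x));
    // One pass: each element goes through the composed function once,
    // with no intermediate collection between the two steps.
    let out: Vec<i32> = [1, 2, 3].iter().map(|&x| composed(x)).collect();
    println!("{:?}", out); // [4, 6, 8]
}
```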



