In C#, this is significantly more elegant: public IList<String> getDistinctTags(...

djb_hackernews · on Jan 12, 2015

"Significantly" is a bit dramatic when we are talking about the difference between a single extra method call.

One of the benefits of the Java version is it is easier to understand if you don't have a Java background but do have an FP background. With your C# example, you'd need to look up the documentation to find out what SelectMany does (it is probably just a helper method that abstracts a map and a flatMap call).

iwwr · on Jan 12, 2015

>One of the benefits of the Java version is it is easier to understand if you don't have a Java background but do have an FP background.

Are there professional programmers with only FP and no imperative language experience?

anton_gogolev · on Jan 14, 2015

> "Significantly" is a bit dramatic when we are talking about the difference between a single extra method call.

It's not just a single method call; it's the entire approach. This entire method chain is something that cannot be done in Java as cleanly as it is in C#. Grouping is a prime example. Compare

    articles.stream().collect(Collectors.groupingBy(Article::getAuthor))

with

    articles.GroupBy(a => a.Author)
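For reference, here is a minimal runnable sketch of the Java side of that comparison. The `Article` class, its fields, and the `byAuthor` helper are hypothetical; the thread only shows `getAuthor()` being called.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingSketch {
    // Hypothetical Article type; only getAuthor() appears in the thread.
    static final class Article {
        private final String title;
        private final String author;

        Article(String title, String author) {
            this.title = title;
            this.author = author;
        }

        String getTitle() { return title; }
        String getAuthor() { return author; }
    }

    // Groups articles by author using the collector quoted above.
    static Map<String, List<Article>> byAuthor(List<Article> articles) {
        return articles.stream().collect(Collectors.groupingBy(Article::getAuthor));
    }

    public static void main(String[] args) {
        List<Article> articles = Arrays.asList(
            new Article("Streams", "alice"),
            new Article("Lambdas", "bob"),
            new Article("Collectors", "alice"));

        Map<String, List<Article>> grouped = byAuthor(articles);
        System.out.println(grouped.get("alice").size()); // prints 2
    }
}
```

The extra ceremony relative to `GroupBy` is the `stream()` call plus the explicit `collect(...)` terminal operation.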

hueyp · on Jan 12, 2015

SelectMany is flatMap.

Could the Java equivalent be the following?

    articles.stream().flatMap(article -> article.getTags().stream())

Or is the previous map required?
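A small sketch suggesting the preceding map is not needed: `flatMap` accepts a function from an element to a `Stream`, so it can call `getTags().stream()` directly. The `Article` class and the `distinctTags` helper are hypothetical stand-ins, since the original method is truncated in the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctTagsSketch {
    // Hypothetical Article type; only getTags() appears in the thread.
    static final class Article {
        private final List<String> tags;

        Article(String... tags) {
            this.tags = Arrays.asList(tags);
        }

        List<String> getTags() { return tags; }
    }

    // flatMap alone flattens each article's tag list into one stream of tags.
    static List<String> distinctTags(List<Article> articles) {
        return articles.stream()
            .flatMap(article -> article.getTags().stream())
            .distinct()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Article> articles = Arrays.asList(
            new Article("java", "streams"),
            new Article("java", "lambdas"));

        System.out.println(distinctTags(articles)); // prints [java, streams, lambdas]
    }
}
```

Writing `.map(Article::getTags).flatMap(List::stream)` instead should give the same result; it just splits the step in two.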

TranquilMarmot · on Jan 12, 2015

If you're writing a few hundred queries in a large application, then that single extra method call will start to get awfully tedious to write over and over and over again. And talk about hurting readability, too... stream()! stream() everywhere! I'd say it's pretty significant.

jfager · on Jan 12, 2015

The main reason they build these on Stream rather than Iterable is b/c they wanted to include the `parallel()` method, which works via "spliterators" rather than plain old iterators.

In other words, in order to support a gimmick you can actually use in production in maybe a handful of use cases, they complicated the api for the use cases you hit 99% of the time. Awesome.

hnriot · on Jan 12, 2015

Actually, this is not true. What you're not noticing is that the use of parallel() is akin to Spark: these streams are basically just map functions, and if you can put the closure onto multiple cores/machines, you get much better performance without any additional programmer intelligence.

If you think that api is complicated, then I don't think programming is for you; this is a very ordinary and usual construct in programming.

jfager · on Jan 12, 2015

In the cases where your application can benefit from parallelizing simple operations over a large data set stored in a collection, `parallel()` is fine.

It's even fine in the case where you're pulling data from a file or other low-latency sequential data source, assuming that the cost of filling a spliterator buffer is less than your cost of processing.

But there's a list of gotchas that make it more dangerous than the "magic make it faster" button that .parallel() implies:

- For the sequential data source case, if the cost of filling the spliterator buffers is higher than the cost of processing, you're just wasting a ton of overhead trying to use parallel.

- You have to be aware that by default all uses of parallel() run on the same threadpool, which makes it a potential timebomb if someone uses it in the context of, say, a webserver where multiple requests might all individually process streams. This also means blocking operations during stream processing are very dangerous.

- Mutating an external variable goes from being fine for a sequential stream to a race condition for a parallel one.

- You can't hand out Streams that you intend to be executed sequentially, b/c your callers can just call parallel() whenever they want.

And, yes, all of these considerations make the api more complicated than one operating over plain old iterators.
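To make the external-mutation gotcha concrete, here is a rough sketch (class and method names are made up for illustration). The side-effect version is fine sequentially but races when the stream is parallel, because `ArrayList` is not thread-safe; letting the stream build the result via `collect` works in both modes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelGotchaSketch {
    // Mutates an external list from inside the pipeline.
    // OK when sequential; a data race when parallel (lost or corrupted
    // elements, or an exception), so never call this with parallel = true.
    static List<Integer> squaresViaSideEffect(int n, boolean parallel) {
        List<Integer> out = new ArrayList<>();
        IntStream range = IntStream.range(0, n);
        if (parallel) {
            range = range.parallel();
        }
        range.forEach(i -> out.add(i * i));
        return out;
    }

    // Lets the stream accumulate the result itself; safe in both modes,
    // and the result keeps encounter order.
    static List<Integer> squaresViaCollect(int n, boolean parallel) {
        IntStream range = IntStream.range(0, n);
        if (parallel) {
            range = range.parallel();
        }
        return range.map(i -> i * i).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squaresViaSideEffect(5, false)); // prints [0, 1, 4, 9, 16]
        System.out.println(squaresViaCollect(5, true));     // prints [0, 1, 4, 9, 16]
    }
}
```

Nothing in the type system distinguishes the two versions, which is the point: the caller's choice of `parallel()` decides whether the first one is a bug.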

azurelogic · on Jan 12, 2015

Yeah, I read this and immediately thought it felt like a knockoff of LINQ.

peterashford · on Jan 13, 2015

Am I the only one bowled over by the irony of this statement given the nature of C#'s genesis?

peterashford · on Jan 13, 2015

I find the Java version more readable.

olavgg · on Jan 12, 2015

I've done this in Java for almost ten years now:

    articles.findAll{ it.tags.contains("Java") }

All I've done is add groovy.jar.

vorg · on Jan 13, 2015

Your example code is not valid Java syntax, so you haven't been able to do it in Java for almost 10 years. Your example is written in Groovy, one of many alternative languages for the JVM, along with Scala, JRuby, Clojure, Beanshell (which Groovy was "inspired" by), Jython (which has been around for 18 years now, and was originally called "JPython"), Gosu, and more recently Kotlin and Ceylon. Java didn't have such anonymous functions (called "lambdas" in Java, and "closures" in Groovy) until Java 8.
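For comparison, a rough Java 8 counterpart of that Groovy one-liner might look like the sketch below. The `Article` class and the `withTag` helper are hypothetical; the thread does not show the real types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FindAllSketch {
    // Hypothetical Article type with a title and a list of tags.
    static final class Article {
        private final String title;
        private final List<String> tags;

        Article(String title, String... tags) {
            this.title = title;
            this.tags = Arrays.asList(tags);
        }

        String getTitle() { return title; }
        List<String> getTags() { return tags; }
    }

    // Keeps only the articles whose tag list contains the given tag.
    static List<Article> withTag(List<Article> articles, String tag) {
        return articles.stream()
            .filter(article -> article.getTags().contains(tag))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Article> articles = Arrays.asList(
            new Article("Streams in Java 8", "Java", "Streams"),
            new Article("Groovy closures", "Groovy"),
            new Article("Lambdas", "Java"));

        System.out.println(withTag(articles, "Java").size()); // prints 2
    }
}
```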