> Dex authors have a background of being frustrated with NumPy-style programming
It seems to happen that up-and-coming people look at the convenience that is NumPy and decide they can do better. This is cool, since without such an attitude, NumPy wouldn’t exist. I still think it’s hard to beat NumPy, for all its faults, if you marginalize over the broad spectrum of scientific computing and data science.
The Dex authors are also core Autograd & JAX authors, so they're not unaware of the benefits of the dense Numpy-style APIs. The challenge is that many numerical kernels and algorithms are just not easily or naturally expressed in this vectorized language. (Think sparsity, irregular access patterns, control flow, etc.)
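To make the mismatch concrete, here is a minimal sketch (not from the Dex authors, just an illustration): a dense matrix-vector product is one bulk NumPy expression, but the same product over a sparse, CSR-style layout has a data-dependent access pattern, so the natural code falls back to explicit loops over index arrays.

```python
import numpy as np

# Dense case: naturally vectorized, a single bulk operation.
A = np.array([[1.0, 0.0],
              [0.0, 2.0]])
x = np.array([3.0, 4.0])
dense_result = A @ x  # -> [3.0, 8.0]

# Irregular case: the same matrix in CSR (compressed sparse row) form.
# The column indices are data-dependent, so the straightforward code is
# a loop over per-row index ranges rather than one array expression.
data = np.array([1.0, 2.0])    # nonzero values
indices = np.array([0, 1])     # column index of each value
indptr = np.array([0, 1, 2])   # row i occupies data[indptr[i]:indptr[i+1]]

sparse_result = np.zeros(2)
for i in range(2):
    start, end = indptr[i], indptr[i + 1]
    sparse_result[i] = np.dot(data[start:end], x[indices[start:end]])
```

You can vectorize tricks like this with gather/scatter-style indexing, but the point stands: the code stops looking like the math, which is the expressiveness gap Dex is exploring.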
Current languages, runtimes, and hardware force most work into this dense, rectangular, vectorized style to achieve reasonable performance, but we would like our ideas not to remain stuck in this "rut" [1]. Dex is one exploration of how to improve expressiveness and generality while still producing performant compiled artifacts.
Ha, I’m well aware, and I completely agree actually, but autograd is written as if the authors wanted to use Haskell. The author’s thesis mentions functional purity, etc., and debugging or instrumenting autograd is predictably challenging given that Python is not Haskell. I think Futhark, Dex, JAX, etc. are great. NumPy is still the baseline to beat (for general usage).
NumPy is a really well isolated local maximum, because it gives you just enough of a performance edge in a generally comfortable programming environment (Python). Most things that are faster or more flexible than NumPy are much more alien and difficult to work with.
I think NumPy is worth praising for demonstrating that parallel programming does not have to be difficult. Sure, NumPy itself usually does not run in parallel, but its vectorised bulk operations are potentially parallel, yet have none of the race conditions, deadlocks, and other complexities we usually associate with "parallel programming". The same style of programming could be implemented in a library or language where those operations really do take advantage of parallel hardware, still without risk of race conditions.

NumPy is just one of many programming languages and libraries with this property (even Fortran array expressions have it), but NumPy is demonstrably accessible to data scientists, students, and others who are inexperienced or poorly trained in programming. Hell, if you add up NumPy, R, MATLAB, and all the other bulk-parallel programming models used for quick data analysis scripts, it may be that most of the world's code is actually written in a data-parallel style...
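A minimal sketch of what "potentially parallel without races" means here: the bulk expression below imposes no evaluation order at all, since each output element depends only on the corresponding input element, so a runtime could split it across cores with no synchronization. The equivalent explicit loop computes the same thing but *states* an order that a compiler must first prove irrelevant.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1_000)

# Bulk data-parallel style: no loop, no shared mutable state, no
# ordering between elements -- safe to evaluate in any order.
y_bulk = 3.0 * x**2 + 1.0

# Equivalent sequential loop: same values, but now the code spells out
# an iteration order and element-by-element mutation of y_loop.
y_loop = np.empty_like(x)
for i in range(x.size):
    y_loop[i] = 3.0 * x[i] ** 2 + 1.0
```

Fortran array expressions, APL, and the map/zipWith combinators in functional languages all share this shape; NumPy's contribution was making it comfortable for non-specialists.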
A comparison most people likely won’t make in the near future. I think the challenge with a new array programming language is striking the right balance between added complexity and capability. If someone is already using Numba or Julia, what’s going to incentivize them to switch to Futhark or Dex?
I've just been 'going deep' into type theory + FP + ML from a data science perspective, so it's definitely interesting to hear about Dex here. A lot of the data science found in the wild is built around NumPy ndarrays and associated tools, and easy, quick ways to do multidimensional numerics are a pretty 'core' requirement for heavy-duty number crunching. Having that as a built-in, using ML types for indexing, would be pretty ideal, I think.
Anyone have any perspective to share about ML-family languages in the domain of what we're calling 'data science' these days?