
That would be preposterous if it weren't so hilariously false:

> These days, having a GPU is practically a prerequisite to doing science, and visualization is no exception.

It becomes really funny when they go on to this, as if it were a big deal:

> Depicted below is an example of plotting 3 million points

Anybody who has ever used C or Fortran knows that a modern CPU can easily churn through "3 million points" at more than 30 frames per second, using just one thread. It's not a particularly impressive feat: three million points is the size of a mid-resolution picture, and you can zoom in and out of those trivially in real time on a CPU (and you could do that 20 years ago as well). Maybe the stated slowness of fastplotlib comes from the unholy mix of Rust and Python?

Now, besides this rant, I think that fastplotlib is fantastic and, as an (unwilling) user of Python for data science, a godsend. It's just that the hype on that website sits wrong with me. All the demos show things that could be done more easily, and just as fast, when I was a teenager. The big feat, and it is a really big one, is that you can access this sort of performance from Python. I love it, in a way, because it makes my life easier now; but it feels like a self-inflicted problem was solved in a very roundabout way.



>> Depicted below is an example of plotting 3 million points

> Anybody who has ever used C or Fortran knows that a modern CPU can easily churn through "3 million points" at more than 30 frames per second, using just one thread. It's not a particularly impressive feat: three million points is the size of a mid-resolution picture, and you can zoom in and out of those trivially in real time on a CPU (and you could do that 20 years ago as well). Maybe the stated slowness of fastplotlib comes from the unholy mix of Rust and Python?

That's a misrepresentation, though: it's 3 million points in sine waves, e.g. something like 1000 sine waves with 3000 points each. If you look at the zoomed-in image, the sine waves are spaced significantly, so representing this as an image would make it at least a factor of 10 larger. And that is likely still an underestimate, since you also need to connect the points within each sine wave.

The comparable case would be to take a vector graphic (e.g. an SVG) with 1000 sine-wave lines, open it in a viewer (written in C or Fortran if you want), and try zooming in and out quickly.


Thanks, and the purpose was to show what's possible on modest hardware that most people have. We have created gigabytes of graphics that live on the GPU for more complex use cases and they remain performant, but you need a gaming GPU.


But why do you want to fit the whole dataset in memory? If the dataset is stored in a tiled, multi-scale representation, you only need to grab the part of it that fits your screen (a constant, small amount of data, even if the dataset is arbitrarily large).

If you insist on fitting the entire thing in memory, it may be better to do so in plain RAM, which nowadays is of humongous size even on "modest" systems.
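To make the tiled idea concrete, here is a minimal sketch (names like `tiles_for_viewport` and `tile_size` are my own illustration, not from any particular library) of how a viewer picks which tiles of a multi-scale pyramid to fetch for a given viewport. The amount of data touched depends only on the screen size, not on the dataset size:

    #include <math.h>
    #include <stdio.h>

    // which tiles of a multi-scale (pyramid) dataset cover a viewport?
    typedef struct { int level, tx0, ty0, tx1, ty1; } tile_range;

    tile_range tiles_for_viewport(double x0, double y0, double x1, double y1,
                                  int screen_w, int tile_size)
    {
            // choose the pyramid level whose resolution roughly matches
            // the screen (level 0 = full resolution; each level halves it)
            double units_per_pixel = (x1 - x0) / screen_w;
            int level = (int)fmax(0, floor(log2(units_per_pixel)));
            double scale = ldexp(1.0, level);      // data units per sample
            double tile_units = tile_size * scale; // data units per tile
            tile_range r = { level,
                    (int)floor(x0 / tile_units), (int)floor(y0 / tile_units),
                    (int)floor(x1 / tile_units), (int)floor(y1 / tile_units) };
            return r;
    }

    int main(void)
    {
            // a 4096-unit-wide window on a 1024-pixel screen, 256px tiles
            tile_range r = tiles_for_viewport(0, 0, 4096, 4096, 1024, 256);
            int nt = (r.tx1 - r.tx0 + 1) * (r.ty1 - r.ty0 + 1);
            printf("level=%d tiles=%d\n", r.level, nt);
            return 0;
    }

Zooming out by 2x bumps the level by one, so the tile count stays roughly constant however large the underlying dataset is.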


Maybe it's an instance of Parkinson's law [1]: if it all fits in GPU memory, just put it all in and plot it. That is much simpler to implement than any out-of-core technique, and easier for the user: `scatter(x, y)` works effortlessly with, say, 10 million points.

But with 10 billion points, you need to consider more sophisticated approaches.

[1] https://en.wikipedia.org/wiki/Parkinson%27s_law


You can't draw a proper plot of 3 million points at 30 fps unless you cut corners, like not showing the distribution of the data (drawing a black rectangle where there's internal structure) or skipping peaks, as many plotting tools do, e.g. Grafana.


Of course you can! My laptop's screen has nearly 3 million pixels (2160x1350), and I can do a fair amount of processing on each of them, with one CPU thread, and still stay above 30 fps. A naive plotting method that loops over all the points and bins them into a grid will work without problem. Try it yourself!


Setting the value of a pixel in an image is very different from drawing objects like lines; this is a good introduction: https://graphicscompendium.com/intro/01-graphics-pipeline


Ultimately, objects are always drawn on the screen by setting pixels. Plotting a point by setting a pixel is entirely reasonable, and it can indeed be done directly, in real time, for several million points. I just tested the C program below, compiled with gcc without optimizations, and it gives about 80 fps for three million points (on my 6-year-old ThinkPad). My point: CPUs are ridiculously fast, and you can do a lot of large-scale data visualization without needing to meddle with the GPU.

    #define FPS 80

    void plot_points(
                    float *o,  // output raster array (w*h)
                    int w,     // width of raster
                    int h,     // height of raster
                    float *x,  // input point coordinates (2*n)
                    int n      // number of input points
                    )
    {
            // initialize the output raster
            for (int i = 0; i < w*h; i++)
                    o[i] = 0;
    
            // accumulate the points that fall inside the raster
            for (int i = 0; i < n; i++)
            {
                    int p = x[2*i+0];
                    int q = x[2*i+1];
                    if (p >= 0 && p < w && q >= 0 && q < h)
                            o[w*q+p] += 1;
            }
    }
    
    #include <stdlib.h>
    int main(void)
    {
            int w = 1000;
            int h = 1000;
            int n = 3000000;
            float *x = malloc(2*n*sizeof*x);
            float *o = malloc(w*h*sizeof*o);
            for (int i = 0; i < 2*n; i++)
                    x[i] = 1000*(rand()/(1.0+RAND_MAX));
            for (int i = 0; i < FPS ; i++)
                    plot_points(o, w, h, x, n);
            return 0;
    }
    // NOTE: if this program runs in less than 1 second, it means that it
    // is faster than "FPS"


You're plotting individual points here, not a proper data graph. Even if all you need is a cloud of points, this isn't enough: points may have different sizes and shapes, possibly driven by another data column, and they definitely need to be drawn with antialiasing, even if they're simple squares.

Then, to draw something like this imgur.com/a/mXvEBzl (ADS-B data, ~10 million points iirc), you need to connect points with (antialiased) lines, where each pixel should be blended into the plot with respect to the line's opacity. Also, lines can have different thicknesses, which multiplies your `o[w*q+p] += 1` yet again.

I'm not even talking about multiple layers that are quite standard.

I use my own plotting app, and it takes a lot more than just slapping a bunch of points into "float *o". Try to write your own; you will figure it out pretty quickly, unless you're OK with black blobs that vaguely resemble the input data.
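For comparison, here is roughly what the simplest antialiased line looks like, a minimal sketch of Wu-style coverage splitting into the same kind of float raster as the program above (my own illustration, not anyone's production code): each step along the major axis splits unit coverage between the two pixels straddling the ideal line.

    #include <math.h>
    #include <stdio.h>

    static void put(float *o, int w, int h, int x, int y, float c)
    {
            if (x >= 0 && x < w && y >= 0 && y < h)
                    o[w*y + x] += c;
    }

    // draw an antialiased 1px line by splitting coverage between the
    // two pixels adjacent to the ideal line at each major-axis step
    void draw_line_aa(float *o, int w, int h,
                      float x0, float y0, float x1, float y1)
    {
            int steep = fabsf(y1 - y0) > fabsf(x1 - x0);
            if (steep) { float t; t=x0; x0=y0; y0=t; t=x1; x1=y1; y1=t; }
            if (x0 > x1) { float t; t=x0; x0=x1; x1=t; t=y0; y0=y1; y1=t; }
            float dx = x1 - x0, dy = y1 - y0;
            float slope = dx ? dy/dx : 0;
            for (int x = (int)ceilf(x0); x <= (int)floorf(x1); x++) {
                    float y = y0 + slope * (x - x0);
                    int yi = (int)floorf(y);
                    float f = y - yi;       // fractional offset = split
                    if (steep) {            // undo axis swap when writing
                            put(o, w, h, yi,   x, 1 - f);
                            put(o, w, h, yi+1, x, f);
                    } else {
                            put(o, w, h, x, yi,   1 - f);
                            put(o, w, h, x, yi+1, f);
                    }
            }
    }

    int main(void)
    {
            float o[100] = {0};
            draw_line_aa(o, 10, 10, 0, 2, 9, 2);  // horizontal line
            float s = 0;
            for (int i = 0; i < 100; i++) s += o[i];
            printf("total coverage: %g\n", s);    // 10 pixels covered
            return 0;
    }

And this still ignores thickness, opacity, caps and joins, which is exactly the parent's point: proper line rendering is much more work than binning points.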


OK, now try to do this in 3D with arbitrary projections and interactivity! And guess what: you'd end up creating a rendering engine :)

My earlier reply has a link to how GPUs actually push pixels to the screen.

There are also some excellent blog posts on how line rendering is done:

https://almarklein.org/triangletricks.html

https://almarklein.org/line_rendering.html



