
I'm conflicted; it's a nice write-up, and probably generally true right now, for most stuff.

However, I still live with databases big enough to need cubes, although these cubes can afford to be less refined these days. Saying 'bigtable can do a regex on 30M rows per second' doesn't mean the same thing can't be done cheaper and quicker, without paying Google etc., if you just have some cubes.

And I think it's going to track the normal sine wave: over time, data sets get bigger, and we keep oscillating between needing to cube and being able to have the reporting tool 'cube on the fly' behind the scenes.

I think there's a similar general cycle not mentioned in the article: data lakes become faster, then data outstrips them, and so on.

The strength will be tooling that transparently cubes on demand. I wish there were efficient statistics and CDC that tracked metadata so tools could say 'this MySQL table has been written to since I last snapshotted something', and, even better, 'this materialized view that I have in this database is now out of date because of writes that affect the expression it is derived from on that other database over there' etc. Basic classic data sources could do a lot of new things to help downstream tools cache better.
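
A minimal sketch of that staleness check, assuming each source table exposes a write counter that a CDC feed keeps up to date. `SourceCatalog`, `CachedView`, and the table names are hypothetical, made up for illustration; no real database exposes exactly this API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SourceCatalog:
    """Hypothetical registry of per-table write counters, fed by CDC."""
    versions: dict[str, int] = field(default_factory=dict)

    def record_write(self, table: str) -> None:
        # Bump the counter every time a write to `table` is observed.
        self.versions[table] = self.versions.get(table, 0) + 1

    def version(self, table: str) -> int:
        return self.versions.get(table, 0)


@dataclass
class CachedView:
    """A cached cube / materialized view and the source versions it was built from."""
    name: str
    depends_on: tuple[str, ...]
    snapshot: dict[str, int] = field(default_factory=dict)

    def refresh(self, catalog: SourceCatalog) -> None:
        # A real tool would recompute the view here; this sketch only
        # remembers which source versions the view reflects.
        self.snapshot = {t: catalog.version(t) for t in self.depends_on}

    def is_stale(self, catalog: SourceCatalog) -> bool:
        # Stale if never refreshed, or if any dependency has been written since.
        return any(catalog.version(t) != self.snapshot.get(t) for t in self.depends_on)


catalog = SourceCatalog()
view = CachedView("sales_by_day", depends_on=("mysql.orders", "pg.customers"))
view.refresh(catalog)
catalog.record_write("mysql.orders")
print(view.is_stale(catalog))
```

The point of the design is that the cache never has to diff data: it only compares small counters, so a reporting tool could check staleness cheaply before deciding whether to re-cube.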

I have a slight problem with the terminology in the middle of the article, as I'm so far down the rabbit-hole that I think of cubes _as_ databases; I suffer cognitive dissonance when I read about shifts from cubes to databases etc. To me, a cube is just a fancy term for a table/view for a particular use-case.
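
To make the 'a cube is just a table' point concrete, here is a rough sketch using Python's built-in `sqlite3`. The `sales` table, its columns, and `sales_cube` are made-up names for illustration, and a real cube would usually carry more dimensions and measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (day TEXT, region TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('2020-01-01', 'EU', 10),
  ('2020-01-01', 'US', 20),
  ('2020-01-02', 'EU', 5),
  ('2020-01-02', 'EU', 7);

-- The "cube": an ordinary table holding pre-aggregated rows,
-- one per (day, region) combination.
CREATE TABLE sales_cube AS
  SELECT day, region, SUM(amount) AS total, COUNT(*) AS n
  FROM sales
  GROUP BY day, region;
""")


def total_for_region(conn, region):
    """Answer a report query from the cube instead of scanning raw rows."""
    row = conn.execute(
        "SELECT SUM(total) FROM sales_cube WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


print(total_for_region(conn, "EU"))
```

The report query only touches the small pre-aggregated table, which is the whole trade: less flexibility in exchange for not rescanning the raw data.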

One tool that I'm terribly excited about these days is Presto. https://prestosql.io/ allows you to take a constellation of different normal databases and query them as though they were one big database. And you can just keep on adding data sources. Awesome!



Thanks for your comment. I'm not familiar with Presto at all, but I did do a bit of reading of an older article: https://www.slideshare.net/frsyuki/prestogres-internals

Would you view Presto in its current state as a replacement for vanilla Postgres with FDW for standard data analysis queries? I don't fully understand the Postgres/Presto relationship.


Hmm, Presto is not Postgres.

In a way, Presto is like a bunch of FDWs on steroids, plus a query planner that has an above-average cost model for Hive etc.

There are plenty of things that Presto isn’t, such as a good replacement for Postgres in classic OLTP workloads.



