> Also, using raw LLVM, it seems like you could do all kinds of things to shoot yourself in the foot, such as breaking ACID guarantees or failing to properly update indexes.
I was picturing using these only for read-only queries (i.e. the kind you can run against a hot standby.) That could be enforced by the extension, if need be, and I don't think it would make much difference in the extension's usefulness. Nobody needs this level of optimization for OLTP queries, because OLTP queries just ain't complicated enough to confuse the query planner. Only OLAP queries are.
And OLAP queries are usually either "generate this report and return it" or "generate this report into this table." The precompiled query-plan could just be for the "generate this report" part, and recuse itself from the "into this table" part, leaving that to just-in-time planning. (Remember, LLVM IR is — before the LLVM optimizer runs — still abstract enough to have stuff like functions in it. So it's possible to send a "module" of LLVM IR that will "plug into" a larger query plan at a defined attach-point, and then get Whole-Program-Optimized down to straight-line code by the optimizer.)
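To make the attach-point idea concrete, here's a rough sketch of what such a client-shipped module might look like as textual LLVM IR. Every name here is invented for illustration: `@consume_tuple` is the hypothetical attach point whose definition the server would supply ("into this table", or "stream to client") before running the whole-program optimizer across the seam.

```llvm
; Hypothetical module for the "generate this report" half of a query.
; @consume_tuple is the attach point: the server links in its own
; definition, and LLVM can then inline across the boundary.

declare void @consume_tuple(ptr %tuple)

define void @run_report(ptr %tuples, i64 %ntuples) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %next, %body ]
  %done = icmp eq i64 %i, %ntuples
  br i1 %done, label %exit, label %body

body:
  %slot = getelementptr ptr, ptr %tuples, i64 %i
  %tuple = load ptr, ptr %slot
  call void @consume_tuple(ptr %tuple)   ; hand each tuple to the attach point
  %next = add i64 %i, 1
  br label %loop

exit:
  ret void
}
```

Because `@consume_tuple` is only *declared* here, the module stays abstract until link time; once the server drops in a concrete definition, LLVM's inliner can flatten the call away, which is the "straight-line code" payoff described above.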
> If you want low-level control, then maybe a NoSQL database would be better suited to your needs
The thing is, mostly we don't need that low-level control. 99.9% of queries can be planned by the DB just fine. But there's always that one query in the hot path of an app that the DB just has no idea how to plan correctly. The query that makes you want to just grab the steering wheel away from the DBMS and swerve.
I don't want to have to denormalize my data into an entire secondary representation (either in the same DBMS, or worse, in a separate NoSQL DBMS) just for the sake of that one query. I want to just do the DB's homework for it. Sometimes it drives me to the point of considering forking the DBMS and manually patching in a query-plan literal data structure for that one case.
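For context, the bluntest instrument Postgres gives you today for "grabbing the steering wheel" is scoping the real `enable_*` planner settings to a single transaction. The table and column names below are made up; the settings themselves are genuine Postgres GUCs (and note the planner treats them as strong discouragement, not hard rules):

```sql
BEGIN;
-- Real per-transaction planner overrides (SET LOCAL reverts at COMMIT):
SET LOCAL enable_nestloop = off;  -- steer away from a misestimated nested loop
SET LOCAL enable_seqscan  = off;  -- push the planner toward the index

-- the one hypothetical problem query:
SELECT customer_id, sum(total)
FROM   orders
WHERE  created_at >= now() - interval '1 day'
GROUP  BY customer_id;
COMMIT;
```

The frustration in the comment above is precisely that this is all-or-nothing per plan-node *type*; you can't say "use a hash join *here* but not *there*," which is what shipping an actual plan would buy you.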
IMHO, client-side pre-compiled query plans are in the same class of solutions as a DBA manually going in and partitioning large tables invisibly to the application that relies on them—or, heck, as `CREATE INDEX` existing at all as a command†: sometimes fully-general algorithms for solving an entire class of problem are hard, but coming up with the answer to your specific problem is very simple.
† (Did you know that CREATE INDEX is actually non-standard SQL, precisely because a theoretical sufficiently-advanced RDBMS could come up with the right indices itself, and so manually-specified indices have no theoretical place in the abstract expression of relational algebra that is standard Structured Query Language? But of course, every RDBMS has CREATE INDEX, because there is — as of yet! — no fully-general algorithm for indices smart enough that it can't be out-planned by your average DBA. Well, and also because of the bootstrapping problem, where without indices some queries would take an eternity to resolve, and so the statistics collector would never observe enough queries to know what sort of indices it needs.)
I get what you’re saying, and it makes sense why you’d want that. I wonder if you’ve posted on Postgres’s forums or asked one of the devs whether they’d have some other ideas about improving performance.
Just as an initial take, LLVM seems too low-level for this kind of usage to me, and it doesn’t seem very robust or elegant. From that perspective, it doesn’t seem like something that would get much adoption or support from the Postgres community. I’m no expert in Postgres however, so maybe it would be better than I’m picturing.
Maybe creating a more efficient alternative query language would work, or a system for adding hints to your query about how it should execute. Like letting you specify, in low-level terms, how you think the query should run, with the database free to ignore those hints or override any part of them if necessary because some internal data structure changed.
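For what it's worth, something close to this hint system already exists as a third-party extension, `pg_hint_plan`: hints ride along in a special comment, and the planner honors them when it can but plans normally where they don't apply. A sketch (the table and index names are invented):

```sql
/*+ HashJoin(o c) IndexScan(o orders_created_at_idx) */
SELECT c.name, sum(o.total)
FROM   orders o
JOIN   customers c ON c.id = o.customer_id
WHERE  o.created_at >= now() - interval '30 days'
GROUP  BY c.name;
```

This has roughly the "advisory, not binding" character described above: the hints nudge specific join methods and scan types without freezing the whole plan, so the database can still adapt if the underlying structures change.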