Show HN: PreQL/Trilogy – A Higher-Level, Composable SQL (github.com/preqldata)
3 points by efromvt on June 19, 2024 | 5 comments
An open-source language, implemented in Python, that replaces the FROM/JOIN clauses of SQL with a lightweight semantic layer and compiles to SQL against supported backends.

Key features:

- Reusability - the semantic layer supports full reuse of calculations, CTEs, and even entire models through a lightweight, Python-inspired import syntax (see the sketch below).
- Simplicity - writing queries that combine data at different granularities is straightforward and avoids common error patterns.
- Easy refactoring - changing the way the semantic layer binds to the database doesn't require rewriting your queries. You don't need to codemod 100 ETL scripts to deprecate a table!
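A rough sketch of the import-based reuse (hypothetical - the words model name and its concepts are invented here, and the select syntax is inferred from the datasource example later in the thread):

  # hypothetical model reuse via a python-style import
  import words as words;

  # concepts and calculations defined once in the model
  # can be selected anywhere the model is imported
  select
      words.sentence_id,
      words.word_one
  ;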

Related packages (basic; links available from https://preqldata.dev/):

- pypreql-etl -> basic DBT integration
- pypreql-nlp -> obligatory GenAI integration
- trilogy-studio -> Electron IDE for running queries directly
- trilogy-public-models -> common repository of public models (mostly BQ public datasets right now) that can be imported/queried

This differs from the other approaches to a modern SQL that I'm aware of (links below) in that it attempts to embrace current SQL syntax, not replace it. Each of the linked projects has a distinct value proposition and is worth checking out on its own!

Note: as you can see from the names in the space, it turns out I was not nearly as creative as I hoped with 'PreQL' as a 'prequel/sequel' play; PreQL/Trilogy is the intermediate Star Wars pun, and it may migrate to just Trilogy (after the sequel!) to avoid confusion.

Other SQL Replacements:

Malloy (full rewrite, semantic focus) https://news.ycombinator.com/item?id=30053860

PRQL (pipelined SQL alternative, all new syntax) https://news.ycombinator.com/item?id=36866861

preql (much more ambitious, all new syntax) https://news.ycombinator.com/item?id=26447070

Things to do:

- Define/implement better null handling
- Rewrite some parsing/CLI in Rust
- More functions/backends, more window flexibility
- Performance hooks/optimization



General comment: the README opens with

    that replaces tables/joins with a lightweight semantic binding layer
But then the included example doesn't do any joins. :(

Similarly, the "Concepts" section of the documentation has a diagram about joins, but no code examples.

On a more technical note, the generated queries seem to use a LOT of CTEs. While they aren't a strict optimization barrier in PG anymore, there are definitely situations where they'll fall back to materialization - and the generated SQL seems to commonly build an "all of the rows" CTE and then apply conditions in a subsequent one.
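For illustration, the shape being described might look like the sketch below (not actual Trilogy output; names are borrowed from the hello world example later in the thread). Since PostgreSQL 12, a CTE referenced exactly once is inlined unless marked MATERIALIZED, but multiply-referenced or volatile CTEs can still materialize:

  -- illustrative only, not actual generated SQL:
  -- a broad "all of the rows" CTE, then a filtering pass over it
  WITH all_rows AS (
      SELECT sentence_id, word
      FROM word_one
  ),
  filtered AS (
      SELECT word
      FROM all_rows
      WHERE sentence_id = 1
  )
  SELECT word
  FROM filtered;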


Ah - so you don't ever express a join in the language - joins are completely abstracted away from the user and resolved at runtime based on the bound tables in the semantic model. The 'hello world' and multi-table demo examples demonstrate how this resolution happens.
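A hedged sketch of what that looks like from the user's side (hypothetical - assumes sentence_id is bound in two datasources, word_one and word_two, as in the hello world example discussed further down):

  # no join is written; both word concepts are simply selected
  select
      sentence_id,
      word_one,
      word_two
  ;
  # the compiler would infer something like:
  #   ... FROM word_one JOIN word_two
  #       ON word_one.sentence = word_two.sentence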

Re: the structure - yeah, it's a lot of CTEs right now, as that's the easiest way to guarantee correctness. I want to implement an optimization pass that will do some consolidation + predicate pushdown, but it's a balance with readability.


So one never specifies the tables/data sources, but refers to columns directly, and that triggers the data source to be pulled in somewhere?

Is this somewhere well defined? Are there docs explaining how it works?

This part of the language seems the most magic to me, and I've found a few times now that things that feel like magic work great, but only 80% of the time.


Datasource bindings are explicit; the resolution to datasources at query time is then automatic/consistent for a given query set and semantic model. If you have only one [non-partial] datasource with a column bound for a concept, that table will be used every time. If there are multiple, selection will optimize for the fewest joins and then the closest match to the target output grain.

The "datasources and joins" section attempts to cover this but I think I need to clean that up a bit!

As long as the semantic bindings are accurate, this can be a performance optimization - e.g. if it's a reporting query and there's an aggregate dataset available that can be safely used (sketched below).
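Sketching that case (hypothetical names throughout; this assumes a datasource can also be bound to a physical table via an address clause, mirroring the query binding shown further down):

  # hypothetical: a pre-aggregated rollup bound at date grain;
  # a report that only needs date-level concepts can resolve
  # here instead of scanning the raw order table
  datasource daily_orders(
    order_date: order_date,
    orders: order_count
  )
  grain(order_date)
  address analytics.daily_orders;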

That's where the challenges will come up - there needs to be easy tooling to vet consistency across datasources (do all these tables report the same # of orders?) to validate the model, and I'm planning some form of query-level time/refresh hint (as of at least <x> date?) to help avoid resolving against stale caches. These challenges should be pretty similar to what you'd get with plain SQL, and the semantic layer hopefully makes the quality checks easier, so I'm optimistic that tooling can help out here as well.
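The kind of check that tooling might generate (hypothetical table names, sketched in plain SQL):

  -- hypothetical: two sources bound to the same order concept
  -- should agree before the rollup is trusted for reporting
  SELECT
      (SELECT count(*) FROM orders_raw)       AS raw_orders,
      (SELECT sum(orders) FROM daily_orders)  AS rollup_orders;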

Ex: this binding from the hello world example says "this is a valid source for sentence_id", and technically any of the 3 datasources could provide it. (The datasource here is a query rather than a table, for portability.)

  datasource word_one(
    sentence: sentence_id,
    word: word_one
  )
  grain(sentence_id)
  query '''
  select 1 as sentence, 'Hello' as word
  union all
  select 2, 'Bonjour'
  ''';
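When more than one binding can provide a concept, the resolution rules above kick in. A hedged sketch of a second binding at the same grain (modeled on word_one; the actual hello world sources may differ):

  datasource word_two(
    sentence: sentence_id,
    word: word_two
  )
  grain(sentence_id)
  query '''
  select 1 as sentence, 'World' as word
  union all
  select 2, 'le monde'
  ''';

A query selecting only sentence_id could resolve from either source; one selecting both word_one and word_two would join the two on the shared sentence_id grain.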


Updated the section on query resolution to be more explicit about datasources + join inference - thank you for the feedback, and let me know if that helps! I'll try to add a more detailed section further on to really dig into it.



