An open-source language implemented in Python that replaces the FROM/JOINs in SQL with a lightweight semantic layer and compiles to SQL against supported backends.
Key features are: Reusability - semantic layer supports full reuse of calculations, CTEs, and even entire models through a lightweight python inspired import syntax.
Simple - writing queries that combines data at different granularity is straightforward and avoids common error patterns.
Easy to refactor - changing the way the semantic layer binds to the database doesn’t require you to rewrite your queries. You don’t need to codemod 100 ETL scripts to deprecate a table!
Related packages [basic, links available from https://preqldata.dev/]:
pypreql-etl -> basic DBT integration. pypreql-nlp -> obligatory GenAI integration. trilogy-studio -> electron IDE for running queries directly. trilogy-public-models -> common repository of public models (mostly BQ public datasets right now) that can be imported/queried.
Differs from other approaches to make a modern SQL I’m aware of (links below) in that it attempts to embrace current SQL syntax, not replace it. Each of these linked projects has a distinct value proposition and is worth checking out on their own!
Note: as you can see by names in the space, it turns out that I was not nearly as creative as I hoped with ‘PreQL’ as a ‘prequel/sequel’ play; PreQL/Trilogy is the intermediate Star Wars pun and it may migrate to just Trilogy (after the sequel!) to avoid confusion.
Other SQL Replacements:
Malloy (full rewrite, semantic focus)
https://news.ycombinator.com/item?id=30053860
PRQL (pipelined SQL alternative, all new syntax)
https://news.ycombinator.com/item?id=36866861
preql (much more ambitious, all new syntax)
https://news.ycombinator.com/item?id=26447070
Things to do:
- Define/implement better null handling
- Rewrite some parsing/CLI in rust
- More functions/backends, more window flexibility
- Performance hooks/optimization
Similarly, the "Concepts" section of the documentation has a diagram about joins, but no code examples.
On a more-technical note, the generated queries seem to use a LOT of CTEs. While they aren't a strict optimization barrier in PG anymore, there are definitely situations where they'll fall back to materialization - and the generated SQL seems to commonly build a "all of the rows" CTE and then apply conditions in a subsequent one.