It looks very important, popular and well established, but what is it? I looked ...

igorlukanin · on May 1, 2023

As part of the Cube team, I have to admit that all the descriptions in the sibling comments make a lot of sense. Of course, the "semantic layer" thing is well known to data engineers/analysts and other data folks in general (they also know things like "metrics store", "headless BI", etc.) but not that well known outside of the data space. It would probably be best to describe the major use cases Cube was created for.

1. Embedded analytics: you have your data somewhere (data warehouse, database, etc.) and you'd like to embed it into a data app. Cube would provide connectivity to data sources, data modeling to define the metrics, caching to make your analytics fast, and APIs and SDKs to deliver them to the data app. E.g., if you decide to add a chart to your front-end app, fetching the data from the API would be as easy as sending a JSON query to Cube.

2. Semantic layer for internal BI: you have your data somewhere and you'd like to provide access to insights based on that data to business users. Cube would provide connectivity to data sources, data modeling to define the metrics, access control to make sure only those who need access to metrics have it, caching to make sure every dashboard loads instantly, and APIs to deliver the data to BI tools, notebooks, etc. E.g., if you want to create some dashboards in Superset, Metabase, Tableau, or Power BI, you'd just need to connect to Cube's SQL API as if it were a regular database and start creating charts/dashboards.
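As a rough sketch of the "JSON query" from the first use case: the snippet below builds a query in Cube's JSON query format and the URL for the REST API's `load` endpoint. The member names (`orders.count`, `orders.status`, `orders.created_at`) and the host are placeholders, not anything taken from this thread, so check the Cube docs for the exact format your version expects.

```python
import json
from urllib.parse import urlencode

# Hypothetical deployment URL; adjust host and port to your own Cube instance.
CUBE_API = "http://localhost:4000/cubejs-api/v1"


def build_load_url(query: dict) -> str:
    """Return the GET URL that asks Cube's REST API to run `query`."""
    return f"{CUBE_API}/load?{urlencode({'query': json.dumps(query)})}"


# Placeholder member names: order count per status, by month, for last year.
query = {
    "measures": ["orders.count"],
    "dimensions": ["orders.status"],
    "timeDimensions": [
        {
            "dimension": "orders.created_at",
            "dateRange": "last year",
            "granularity": "month",
        }
    ],
}

url = build_load_url(query)
print(url)
```

Sending it is then a plain HTTP GET with the API token in the `Authorization` header; the front end never writes SQL.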

igorlukanin · on May 1, 2023

Bunch of links from the website:
- https://cube.dev/use-cases/embedded-analytics
- https://cube.dev/use-cases/semantic-layer
- https://cube.dev/use-cases/real-time-analytics

rendall · on May 2, 2023

That makes a lot of sense to me, and I see why it would be hard to coalesce all of that functionality into one or two sentences that would make sense to a more general, non-data, tech audience.

mbesto · on May 1, 2023

So how does this compare to an embedded analytics service like Sisense or Looker? Is this sort of in between?

ShaunK · on May 1, 2023

My understanding is that it's essentially Looker minus the dashboarding. What you would define via LookML is essentially the "semantic layer" that this is addressing. dbt is attempting to do similar work: https://www.getdbt.com/product/semantic-layer/

igorlukanin · on May 1, 2023

Looker minus dashboarding, plus APIs (SQL/REST/GraphQL) and, subjectively, better aggregate awareness (AKA "pre-aggregations" in Cube).

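To make "aggregate awareness" concrete, here is a toy sketch. It is not Cube's actual matching logic: a query is answered from a pre-aggregated rollup when that rollup contains every measure and dimension the query needs, and falls back to the raw table otherwise. All table and member names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollup:
    table: str
    measures: frozenset
    dimensions: frozenset


# Hypothetical rollups, listed from smallest to largest.
ROLLUPS = [
    Rollup("orders_by_status", frozenset({"count"}), frozenset({"status"})),
    Rollup(
        "orders_by_status_city",
        frozenset({"count", "revenue"}),
        frozenset({"status", "city"}),
    ),
]


def pick_table(measures, dimensions, raw_table="orders"):
    """Return the first rollup that covers the query, else the raw table."""
    for rollup in ROLLUPS:
        if set(measures) <= rollup.measures and set(dimensions) <= rollup.dimensions:
            return rollup.table
    return raw_table


print(pick_table(["count"], ["status"]))       # orders_by_status
print(pick_table(["revenue"], ["city"]))       # orders_by_status_city
print(pick_table(["count"], ["customer_id"]))  # orders
```

This simple coverage check is only safe for additive measures such as counts and sums, which can be re-aggregated from a finer-grained rollup; something like a distinct count cannot.
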
danieka · on May 1, 2023

Cube has saved me hundreds of hours. I use it as the backend for reporting and dashboards inside our SaaS. In our frontend I've built a light version of Power BI, and I use Cube as the backend. Instead of manipulating SQL directly I use Cube's JSON query format. Kind of difficult to explain, but Cube might be the best piece of software I have ever used.

Maybe a good tagline would be "self-hostable Backend as a Service for data analysis"?

ironchef · on May 1, 2023

Let’s say you work for a SaaS doing analytics. Your boss says “hey! We need to start reporting on new logos. Can you snag those from the DB?”

But what counts as a new logo? Does a pro serve engagement that doesn’t use the product count? What about a business using the SaaS but still in a trial period? Etc.

A semantic layer helps provide common, agreed-upon definitions to the business. So anyone looking for common data entities can just look those things up… and can come to published definitions (which are backed by queries to databases, data lakes, etc).

Does that help? Another example of this would be dbt.
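A sketch of what "published definitions backed by queries" could look like in practice. The rules for `new_logos` below (trials and services-only engagements excluded) are just one made-up answer to the questions above, and the table and column names are hypothetical.

```python
# One shared place where the business definition lives (hypothetical names).
METRICS = {
    "new_logos": {
        "table": "accounts",
        "expression": "COUNT(DISTINCT account_id)",
        "filters": [
            "is_trial = FALSE",     # trials do not count
            "uses_product = TRUE",  # services-only engagements do not count
        ],
        "time_column": "first_paid_at",
    }
}


def metric_sql(name: str, start: str, end: str) -> str:
    """Render the agreed definition of `name` for a date range as SQL text.

    Dates are inlined for readability only; real code should use bound
    parameters instead of string formatting.
    """
    m = METRICS[name]
    conditions = m["filters"] + [
        f"{m['time_column']} >= '{start}'",
        f"{m['time_column']} < '{end}'",
    ]
    return (
        f"SELECT {m['expression']} AS {name} "
        f"FROM {m['table']} WHERE " + " AND ".join(conditions)
    )


sql = metric_sql("new_logos", "2023-01-01", "2023-04-01")
print(sql)
```

Everyone who asks for `new_logos` gets the same filters, so the argument about what counts happens once, in the definition, rather than in every report.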

jayatid · on May 1, 2023

It is kind of like an ORM. I find ORMs and semantic layers to be similar in many ways, except that semantic layers are meant for defining metrics too. These metrics describe how to aggregate data, like summing order amounts to get revenue, or counting order_ids to get sales.

I wrote a series on semantic layers on my substack, hopefully it helps: https://davidsj.substack.com/p/semantic-superiority-part-1
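A minimal illustration of the ORM comparison, using only Python's built-in `sqlite3`: measures are named aggregations, and the caller asks for them by name instead of writing SQL. The `orders` table and its columns are invented for the example, and this is not how any particular semantic layer is implemented.

```python
import sqlite3

# Measures are named aggregations; a dimension is a column to group by.
MEASURES = {"revenue": "SUM(amount)", "sales": "COUNT(order_id)"}


def run(conn, measures, dimension):
    """Build and run a GROUP BY query from measure names and one dimension."""
    select = ", ".join(f"{MEASURES[m]} AS {m}" for m in measures)
    sql = (
        f"SELECT {dimension}, {select} FROM orders "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "DE", 10.0), (2, "DE", 15.0), (3, "US", 20.0)],
)

rows = run(conn, ["revenue", "sales"], "country")
print(rows)  # [('DE', 25.0, 2), ('US', 20.0, 1)]
```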

igorlukanin · on May 1, 2023

I think ORMs have got some bad press because they were intended to be used bi-directionally: map data from the data source to business objects and back. With semantic layers, data is only mapped to metrics and rarely back - which makes things much simpler, IMO.

totalhack · on May 1, 2023

I can't vouch for cube itself as I haven't used it but can confidently say such tools are highly valuable. I built one for use in my own business and have operated other businesses on similar tools.

It brings all data together, provides a consistent interface, and is way faster than writing SQL (though there will still be use cases for that). There is some up-front cost to getting it configured, but it pays off, in my case at least.

https://github.com/totalhack/zillion

pastacacioepepe · on May 1, 2023

Say you want to build a dashboard with charts and custom timerange selection using data you already have in Postgres/other DB, without killing your DB under the pressure of queries AND without having to write an additional API?

Cube.js is the tool for that. It handles data modeling (you can define a schema on top of your SQL schema), caching, access control, and the API for you.
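The "without killing your DB" part is essentially caching. As a deliberately simplified sketch (Cube's real caching and refresh logic is more involved), results can be memoized by a canonical form of the query so that repeated dashboard loads do not reach the database. `fake_database` and the query shape are stand-ins invented for the example.

```python
import json


class QueryCache:
    """Memoize query results so repeated dashboard loads skip the database."""

    def __init__(self, run_query):
        self._run_query = run_query  # function that actually hits the database
        self._results = {}
        self.db_hits = 0

    def load(self, query: dict):
        key = json.dumps(query, sort_keys=True)  # same query -> same key
        if key not in self._results:
            self.db_hits += 1
            self._results[key] = self._run_query(query)
        return self._results[key]


def fake_database(query):
    # Stand-in for an expensive Postgres query.
    return [{"day": "2023-05-01", "count": 42}]


cache = QueryCache(fake_database)
q = {"measures": ["orders.count"], "dateRange": ["2023-05-01", "2023-05-01"]}
first = cache.load(q)
second = cache.load(dict(reversed(list(q.items()))))  # same query, other key order
print(cache.db_hits)  # 1
```

A real setup also needs an invalidation or refresh policy, which this sketch leaves out.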

sails · on May 1, 2023

Data modelling is important to highlight, and if OP is not familiar with the concept and the need for it, they likely won't see the obvious value of Cube.

jrvarela56 · on May 1, 2023

From what I read it’s a way to expose SQL via APIs (along with stuff needed to do it like auth, perf, query reuse, etc)

Instead of starting from a general purpose web framework+orm you have your data/schema and can query it over http conveniently to build BI/dashboards.

pacofvf · on May 1, 2023

> It looks very important, popular and well established, but what is it?

It's easier to explain what Cube is if we first define what the Semantic Layer (SL) is. In a few words, the SL is the abstract representation of business objects, for example: sales, users, conversion rates, etc. Cube provides the language to define the SL, an API to access it, access control mechanisms, and a caching layer. It's important to emphasize that Cube is a stand-alone SL, decoupled from any BI visualization tool. That's the "headless" part, and I would also add that it is "feetless", since it supports multiple source DBs. Looker, the other big name in the space, has the incentive of selling you more usage of BigQuery and of locking you in with their UI; it only recently started to open up to the idea of APIs. The idea is that you have a central place where you define the SL, so you don't need to duplicate the definition in every downstream application, which may lead to errors or inconsistencies.

> Is it that it can perform a single query across multiple databases?

Cube allows you to join data from multiple databases at the caching layer, which is fundamentally different from a federated query engine. From the downstream application's perspective, though, the outcome is the same. Because the join is done at the caching layer, it has inherent advantages and limitations compared with federated queries.
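A toy picture of "joining at the caching layer", not Cube's implementation: each source database is queried on its own, the results are kept as cached rollups, and the join runs over those cached rows rather than being pushed down to either database. The row sets and column names are invented.

```python
# Stand-ins for rollups already fetched from two different databases.
cached_orders = [  # e.g. from Postgres: revenue per customer
    {"customer_id": 1, "revenue": 120.0},
    {"customer_id": 2, "revenue": 80.0},
]
cached_customers = [  # e.g. from a warehouse: customer attributes
    {"customer_id": 1, "segment": "enterprise"},
    {"customer_id": 2, "segment": "smb"},
    {"customer_id": 3, "segment": "smb"},
]


def join_cached(left, right, key):
    """Inner-join two cached row sets on `key` (assumed unique in `right`)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


joined = join_cached(cached_orders, cached_customers, "customer_id")

revenue_by_segment = {}
for row in joined:
    segment = row["segment"]
    revenue_by_segment[segment] = revenue_by_segment.get(segment, 0.0) + row["revenue"]

print(revenue_by_segment)  # {'enterprise': 120.0, 'smb': 80.0}
```

The trade-off mentioned above shows up even here: the join is cheap and works across any sources, but it can only see what has already been materialized into the cache.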

I really like this series of articles by David Jayatillake, which goes into deeper detail:

1. https://davidsj.substack.com/p/semantic-superiority-part-1
2. https://davidsj.substack.com/p/semantic-superiority-part-2
3. https://davidsj.substack.com/p/semantic-superiority-part-3
4. https://davidsj.substack.com/p/semantic-superiority-part-4
5. https://davidsj.substack.com/p/semantic-superiority-part-5