Not Embracing a Semantic Layer is Embracing Failure
As much as LLMs have improved in quality over the years, getting them to consistently write accurate SQL remains an unsolved problem. The BIRD text-to-SQL benchmark, the most popular in the space, has shown steady improvement in the ability of LLMs to write accurate SQL queries over the past few years. However, accuracy has not reached the point where a nontechnical data user can ask a question and reliably get a correct answer without intervention. If the maintainer of a dataset has provided enough context to reliably calculate the metrics users ask about (say, with a metrics layer), the likelihood of a correct answer increases.
Tools like dbt have reshaped data analytics over the past ten years, formalizing the concept of transformations and adding rigor. However, leaning too heavily on dbt, especially for things like rollup tables, can make your analytics inflexible and less valuable. Maybe your monthly rollups are consistent with your weekly and daily rollups, but you lose the ability to aggregate on demand. There's a good chance you now have three files with distinct calculations of the same aggregate metrics or, best case, a complicated, brittle Jinja setup that calculates variations of the same table at different time grains. Don't even get me started if you want to join against another record set during exploratory data analysis, or if your analysis is even less structured and every query is ad hoc.
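The duplication problem above can be sketched in a few lines: instead of three near-identical rollup models, a single canonical metric definition can generate the query for any grain. This is a toy illustration, not dbt code; the table and column names (orders, amount, ordered_at) are invented.

```python
# One canonical metric definition, reused at every time grain.
# With three separate rollup files, this expression would live in
# three places and could silently drift apart.
REVENUE_METRIC = "SUM(amount)"

def rollup_sql(grain):
    """Generate the rollup query for a given time grain."""
    assert grain in {"day", "week", "month"}
    return (
        f"SELECT DATE_TRUNC('{grain}', ordered_at) AS period, "
        f"{REVENUE_METRIC} AS revenue "
        "FROM orders GROUP BY 1"
    )

print(rollup_sql("month"))
```

A semantic layer is, in essence, this pattern generalized: one definition, many query shapes derived from it.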
While they might initially seem overly complicated, semantic layers are just a map of the relationships between the tables in your warehouse, paired with canonical definitions of the metrics your organization relies on. It's even possible to programmatically generate a usable semantic model definition from warehouse query history by parsing the queries and applying statistics.
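To make "just a mapping" concrete, here is a deliberately minimal semantic model: a dict of tables, join relationships, and metric definitions, plus a compiler that turns a metric request into SQL. Every name here (orders, customers, revenue, and so on) is invented for illustration; real semantic layers add join-path resolution, fan-out checks, and much more.

```python
# A toy semantic model: tables, join conditions, and canonical metrics.
SEMANTIC_MODEL = {
    "joins": {
        ("orders", "customers"): "orders.customer_id = customers.id",
    },
    "metrics": {
        "revenue": {"table": "orders", "expr": "SUM(orders.amount)"},
        "order_count": {"table": "orders", "expr": "COUNT(*)"},
    },
}

def compile_metric(name, dimension=None):
    """Compile a metric, optionally grouped by a dimension such as
    'customers.region', into SQL using the model's join graph."""
    metric = SEMANTIC_MODEL["metrics"][name]
    base = metric["table"]
    cols = [f"{metric['expr']} AS {name}"]
    from_clause = base
    group = ""
    if dimension:
        dim_table = dimension.split(".")[0]
        if dim_table != base:
            on = SEMANTIC_MODEL["joins"][(base, dim_table)]
            from_clause += f" JOIN {dim_table} ON {on}"
        cols = [dimension] + cols
        group = " GROUP BY 1"
    return f"SELECT {', '.join(cols)} FROM {from_clause}{group}"

print(compile_metric("revenue", "customers.region"))
```

The point is that the definition of revenue lives in exactly one place; every grouping, filter, or join variation is derived from it rather than re-typed.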
Each BI vendor seems to have their own semantic model format these days, each with its own special twist. Some let you maintain ultra-flexible graphs where you can combine metrics across tables without ever checking for join fan-out. Others are tightly coupled to their visualization layers and force One Big Table style data modeling instead of handling the nuances of joins. Whatever the flavor, one core benefit is shared: never having to redefine a calculation again.
Accessing the data from semantic layers has historically been a mixed bag. Obviously anything from a BI tool vendor is generally well supported in that tool. Recent semantic layer efforts have also exposed data in other forms: through REST or GraphQL APIs, through a server that speaks the Postgres wire protocol, or through SDKs. Another approach is to convert the semantic model definition to a format that can be read directly by each tool. This translation is possible with a number of BI tools, but becomes increasingly challenging when proprietary query languages are used to represent metric definitions.
Sidemantic is an ambitious metrics layer from Sidequery that supports metric definitions in more than 15 formats. It supports incremental adoption, so you can build on whatever semantic definitions you already have. You can mix and match definition formats and join between them: for example, if you are mid-migration from LookML to Malloy, those models remain interoperable in Sidemantic. Sidemantic includes an experimental Postgres wire protocol server for using metrics in other tools, so you can more or less query your metrics layer as if it were plain SQL. Also included are a Python SDK and a Jupyter widget that make exploring metrics a piece of cake.
That said, semantic layers are not always the perfect solution for analysis. It's challenging to capture all context in a premeditated way. You will inevitably end up writing queries outside the bounds of the semantic layer, and that's part of the workflow: a feature, not a bug. You may eventually formalize that ad hoc analysis into something that becomes an important metric. In fact, query history can be monitored automatically and new metrics suggested based on query patterns.
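The last idea, mining query history for metric candidates, can be sketched simply: count the aggregate expressions that recur across queries and surface the frequent ones. This is a hypothetical illustration with invented queries; the regex is deliberately naive, and a real tool would use a proper SQL parser.

```python
import re
from collections import Counter

# Invented sample of warehouse query history.
QUERY_HISTORY = [
    "SELECT SUM(amount) FROM orders WHERE region = 'EU'",
    "SELECT DATE_TRUNC('week', ordered_at), SUM(amount) FROM orders GROUP BY 1",
    "SELECT SUM(amount) FROM orders",
    "SELECT COUNT(*) FROM customers",
]

# Naive pattern for common aggregate calls; a real implementation
# would parse the SQL instead of pattern matching.
AGG_PATTERN = re.compile(r"\b(SUM|AVG|COUNT|MIN|MAX)\s*\([^)]*\)", re.IGNORECASE)

def suggest_metrics(history, min_count=2):
    """Return normalized aggregate expressions seen at least min_count times."""
    counts = Counter(
        match.group(0).upper().replace(" ", "")
        for sql in history
        for match in AGG_PATTERN.finditer(sql)
    )
    return [expr for expr, n in counts.items() if n >= min_count]

print(suggest_metrics(QUERY_HISTORY))  # SUM(amount) recurs, so it is suggested
```

An expression that shows up in query after query is a strong hint that it deserves a name and a home in the semantic layer.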
You might push back: this sounds like a lot of work. But I'm not convinced humans are even going to be the ones maintaining semantic layers so much as AI will. The best engineers are already letting AI write all their code; it's coming for analysis as well, so why not semantic layers too?