#98 What on earth is a semantic layer?! (with David Jayatillake @ Delphi)

Written by Will Hayes | February 16, 2024

The Measure Pod


In this week’s episode of The Measure Pod we spoke with David Jayatillake, co-founder of Delphi and organiser of the London Analytics Meetup. David spoke to us about the concept of semantic layers, offering a deep understanding of their applications and implications in various technologies. Despite some questionable jokes, the episode is packed with nuggets of wisdom.

Show note links:

Find David on LinkedIn
London Analytics Meet-up
Find out about Delphi or reach out to founders@delphihq.com
Content to read from David’s Substack:

🎥 The podcast is now available in vodcast (video) format! Watch the episode below, or over on YouTube.

Let us know what you think and fill out the Feedback Form, or email podcast@measurelab.co.uk to drop Dan, Dara and Bhav a message directly.

Follow Measurelab on LinkedIn and on Twitter/X, and join the CRAP Talks Slack community.

Find out when the next CRAP Talks event is happening on LinkedIn.

Music composed by Confidential – check out their lo-fi beats on Spotify.

Master Google Analytics 4 with Daniel Perry-Reed on the next GA4 Immersion 6-week cohort training course. Charity and group discounts available!

Quotes of the Episode:

“…I think whilst we’re still maybe in the very early stages of semantic layers and things like that, I think it’s definitely something to watch out for in the future.” – David
“…You need something that’s going to give it context, but you also need something that’s going to give it constraint, and that’s kind of like what’s really good about a semantic layer.” – David

Transcript


Intro

[00:00:00] Dan: What an incredible episode with David. I have had my mind blown slightly. I feel like it’s something that David is so easily capable of doing is explaining what the hell a semantic layer is. I feel like he does it all the time, especially in his line of work. But rather than me taking all the credit, Dara, why don’t you just quickly recap what a semantic layer is for our audience?

[00:00:33] Dara: So a semantic layer is, it’s a layer of semantics.

[00:00:38] Bhav: Brilliant.

[00:00:39] Dan: All right. Well, there you go. I mean, it’s obvious. If you don’t have time to listen to the full interview, then there you go. You’ve got your answer. You can live a happy life. No, but stick it through to the end, hear David talk about semantic layers and every application aspect. He’s got a background of knowledge that extends far beyond anything that, I don’t know, it’s just very impressive in terms of all the tools and the technology and the experience with it all.

[00:00:58] Dan: So there’s lots of information there, especially about where to get started, thoughts around GCP and the Google stack, other tools that he’s working with, and of course all of his contact details and the company he works for, or founded actually, are all in the show notes. Is there anything else to add before we head into the show?

[00:01:13] Bhav: I think the only thing I’ll add is to listen out for a couple of poorly timed jokes in there. They weren’t very funny. It’s always nice to throw some jokes in there. No, it was a really good episode, I really enjoyed talking to David actually. I learned a lot. I think he’s, you know, he’s clearly passionate about this as a topic and as a field. And it’s always nice to have people who are passionate about what they do come and talk to us and share their insights and knowledge. And I think whilst we’re still maybe in the very early stages of semantic layers and things like that, I think it’s, it’s definitely something to watch out for in the future. And, you know, I’m keen to see how this progresses.

[00:01:45] Dan: Amazing yeah, exactly. Regular plugs, head to the show notes page to find links to the CRAP Talks Slack community. Find out when the next events are on, but also just to join a bunch of like minded people to chat about analytics, optimization, experimentation, and all the like. Check us out on YouTube, the video of this or on Spotify. If you’re on there, you can watch us if you fancy doing that for all you weirdos out there. And lastly, there is some news. There is change a brewing, so Dara why don’t you tell us what’s going to change?

[00:02:14] Dara: Yeah, so I am going to be stepping back as co-host of The Measure Pod. So we’re getting, we’re very fastly approaching our 100th episode and in the spirit of liking to keep things fresh and, and changing things up I’ve decided that the hundredth episode will be my last one as a, as a full time co-host. I reserve the right to ask to be invited on again, maybe from time to time. Whether that invitation is accepted or not, I don’t know, or that request. But yeah, I think to keep things fresh, especially now that we have a fabulous new host on the, on The Measure Pod, I think it’s going to be good for the show to just mix things up a little bit.

[00:02:53] Dan: If you weren’t around or listening to our 50th episode, we had a really special episode back then. I highly recommend listening to it. It’s an episode where we put Dara on the hot seat and it was a, who wants to be a millionaire themed episode where I challenged Dara to raise money for charity, which we ended up doing actually, spoiler alert. So that was really good fun. So another special episode coming. Yeah, the town is definitely big enough for the two of you. That’s not the reason why Dara is stepping back. Without further ado, enjoy the chat with David. There’s so much information in there, so much good stuff and reach out to him if you have any questions, enjoy the show.

[00:03:27] Dan: All right, in this episode, we’re welcoming David, who is someone who runs an analytics event called London Analytics. I had the privilege of attending a couple of weeks ago, at the time of recording at least, and seeing some really great speakers. So first of all, David, just welcome to the show, good to have you here.

[00:03:43] David: Thanks very much for having me.

[00:03:45] Dan: Yeah, no, it’s absolutely our pleasure. So David, what we do on this podcast is I think actually I’m taking a leaf out of Dara’s book here, rather than us really badly introducing someone and trying to dig into your history and understand your experience and where you work now, we give that over to you to do so please tell us a little bit about your background, your experience, where you are now, and ultimately why you’re here talking to us about analytics.

Topic

[00:04:08] David: Yeah, sure. So I think my data journey started out at a company called Ocado. I always ask people, have you heard of Ocado? Because back when I was there in 2010, not everyone had heard of them, but now everyone says, of course I’ve heard of Ocado. And if you haven’t, it’s an online grocer, but I started out there when it was pretty small and I was a strategy and trading analyst.

[00:04:30] David: And that’s where I learned, you know, how to use a database and how to do a load of stuff in Excel because at the time there wasn’t really BI tools available. And that’s kind of like where I started. And then I spent a long time in payments doing things, what you might call analytics engineering or data engineering today, as well as some analytics work as well.

[00:04:53] David: And more recently I’ve moved back into mainstream data roles. Companies like Elevate Credit and Lyst, and at Lyst in particular, I was looking a lot at e-commerce and web analytics data about how customers went through our funnel, how customers interacted with things on our website and, you know, leading to preferences and recommendations and ranking and things like that.

[00:05:17] David: From Lyst I’ve ended up entering startup land. So I’ve worked at a couple of startups before founding Delphi. I founded Delphi earlier this year with my co-founder, Michael Irvine. We’ve both been like long time members of the dbt community, which is kind of like how we met. So we’ve both been part of that since 2018, 19. And so originally I’d reached out to him about writing a blog series because he’s moved from being an analytics engineer to being like a full stack software engineer. We didn’t do that, but he reached out to me about the prototype for Delphi he was building. And then that’s kind of like how we ended up hacking on it together and building the company together as well.

[00:06:01] Dan: Amazing. That’s pretty incredible, actually, that’s a really varied and diverse experience across all sorts of different landscapes. So that’s really interesting. I’m really interested in digging into Delphi a little bit, but I suppose let’s talk about the subject at hand. And I think the thing that Delphi is there to kind of service and to kind of help with, which is, and I have titled this episode and who knows by the time that people are listening to this, if this is still the title I have a habit of changing my mind last minute.

[00:06:25] Dan: But I’ve titled this one, at least tentatively, “What the hell is a semantic layer?” If you can help us understand a little bit around, we might’ve heard this term semantic layer. I know it’s had previous incarnations in the past. It’s had many different names, many different labels. Maybe start at the very beginning. What is a semantic layer? What is it kind of formally known as and what does it do?

[00:06:44] David: Yeah so a semantic layer has, there’s lots of people who describe it in different ways. The way I would describe it is as a mapping of real world terms and objects to data structures. So you might have things in your business like customers, orders, revenue, and how they relate to actual data is what I would define as the semantic layer. And that’s really it, there’s not much more to it. And so what that means is, let’s say you have a relational data model where you have to join tables together.

[00:07:18] David: The semantic layer will tell you how to join those tables together and aggregate certain columns in order to derive metrics and information from your data model that relate to things that people understand in the real world. So I’ve been, like, a human version of this in the past. So when I was a junior analyst starting out, people would ask me questions, and I would translate those questions through, you know, asking them follow up questions as well to start with, but then to SQL, and then the data structures that I knew about, and then I’d generate results for them, and so I was acting like a semantic layer in that instance.
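Editor's illustration: the mapping David describes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how any particular product works. The table names (`customers`, `orders`), the column names, and the `SEMANTIC_LAYER` dictionary are all hypothetical, chosen only to show business terms (customer, order, revenue) pointing at tables, a join and an aggregation.

```python
import sqlite3

# Hypothetical mapping from business terms to data structures.
# None of these names come from the episode; they are illustrative only.
SEMANTIC_LAYER = {
    "entities": {
        "customer": {"table": "customers", "key": "id"},
        "order": {"table": "orders", "key": "id"},
    },
    "joins": {
        ("order", "customer"): "orders.customer_id = customers.id",
    },
    "metrics": {
        "revenue": {"entity": "order", "sql": "SUM(orders.amount)"},
    },
    "dimensions": {
        "country": {"entity": "customer", "sql": "customers.country"},
    },
}


def revenue_by_country(conn):
    """Answer 'revenue by country' using only the definitions above."""
    metric = SEMANTIC_LAYER["metrics"]["revenue"]
    dimension = SEMANTIC_LAYER["dimensions"]["country"]
    join_on = SEMANTIC_LAYER["joins"][("order", "customer")]
    sql = (
        f"SELECT {dimension['sql']} AS country, {metric['sql']} AS revenue "
        f"FROM orders JOIN customers ON {join_on} "
        f"GROUP BY {dimension['sql']} ORDER BY country"
    )
    return conn.execute(sql).fetchall()


# Tiny in-memory dataset so the sketch runs end to end.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'UK'), (2, 'FR');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
    """
)
rows = revenue_by_country(conn)
print(rows)
```

The person asking for "revenue by country" never needs to know which tables are joined or which column is summed; that knowledge lives in the definitions rather than in an analyst's head.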

[00:07:57] Bhav: So David, I mean, what’s the difference between, like, a semantic layer and a presentation layer and then a reporting layer, or are we just dealing with semantics here?

[00:08:06] David: I think you are probably dealing with semantics. I think a reporting layer is probably like the most narrow focused use, like use case, I’d say a semantic layer probably incorporates a reporting layer although a reporting layer may have some additional features like being able to store dashboards as code and things like that, which a semantic layer may not necessarily have but otherwise they’re quite similar, and maybe some are subsets of the other, really.

[00:08:38] Bhav: Because I think when I’ve worked with platforms like Tableau, I had the underlying data tables that I could access and query and join and do all the things I wanted to within Tableau. But they weren’t the raw data tables, these were the ones that were built by our data engineering team, surfaced to me, and then I could use them as I need to. And similarly with Looker when I’ve worked with Looker in the past, we’ve had like LookML within Looker, but they weren’t the underlying raw data, right? I know I made a really poor joke about semantics there, but are they the same things? Am I talking about the same things there?

[00:09:12] David: Yeah, I think they’re very similar. So if you think about the data model exposed in Tableau, which you are then, like, using yourself to join different parts of it together but knowing that I have to join these two tables together and sum this column, right? Even though the tables have been cleaned up into views to be exposed into Tableau by your data engineering team, that’s still not a true semantic layer. That’s almost just like an exposed data model. Whereas what would be, would have been the case in Looker and LookML is LookML would have defined how do you join these tables together? How do you sum this column and filter these rows to get to the exact definition of a specific metric. And that’s actually a semantic layer.

[00:09:55] Dara: What format does it take? I’m going to go right back to the basic end again. So you defined what it is. But what kind of format does it take? Where does it sit and can you interact with it? Or is it just like a, like a set of guidelines or a kind of set of definitions?

[00:10:12] David: So I think it’s like, yes, the core of the semantic layer is definitions, but it’s obviously useless unless you can interact with it. And so typically the way you’d interact with it is that there’s two ways. Like, if you’re developing, you’re probably writing some kind of LookML or Cube modelling language, or dbt’s new semantic modelling language, which is based on MetricFlow. So it’s typically YAML. Some people even use Terraform for this. And that’s like how you define oh, this is how the data model fits together, this is how the tables join, this is how you sum columns to make metrics. And this is how you parse this timestamp to make a time dimension.

[00:10:55] David: And so that stuff will be defined in YAML, and that’s how the analytics engineer or analyst or whoever’s looking after the semantic layer will interact with it. Then other people can interact with the semantic layer, so it’s typical for these semantic layers to have APIs, which could be REST or GraphQL or even SQL where engineers or analysts can then consume from the semantic layer and so they’ll make a request and they’ll say something like, I want this metric with these dimensions filtered to these dates.

[00:11:24] David: And that will be compiled by the semantic layer compiler into SQL, which will run against the data warehouse and then the results will come back to the user. So that’s like the two different ways of interacting with the semantic layer, and you can see uses ranging from front end engineers working with the semantic layer all the way to analysts writing something that looks like SQL on it.
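Editor's illustration: the request-then-compile flow David describes ("I want this metric with these dimensions filtered to these dates") might look roughly like the sketch below. It is a toy compiler under stated assumptions, not the behaviour of Cube, dbt, or Looker. The `Request` class, `compile_request` function, and the `orders.created_at` column are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Hypothetical definitions a developer would normally keep in YAML or similar.
METRICS = {"revenue": "SUM(orders.amount)"}
DIMENSIONS = {
    "country": "customers.country",
    "order_date": "DATE(orders.created_at)",
}
FROM_CLAUSE = "orders JOIN customers ON orders.customer_id = customers.id"


@dataclass
class Request:
    """What a consumer asks for: a metric, some dimensions, a date range."""
    metric: str
    dimensions: tuple = ()
    start_date: str = ""
    end_date: str = ""


def compile_request(req):
    """Compile a request into a (sql, params) pair for the warehouse."""
    # Unknown metric or dimension names raise KeyError instead of producing SQL.
    dims = [DIMENSIONS[d] for d in req.dimensions]
    select = [f"{DIMENSIONS[d]} AS {d}" for d in req.dimensions]
    select.append(f"{METRICS[req.metric]} AS {req.metric}")
    sql = f"SELECT {', '.join(select)} FROM {FROM_CLAUSE}"
    params = []
    if req.start_date and req.end_date:
        # Dates are passed as bound parameters rather than pasted into the SQL.
        sql += " WHERE DATE(orders.created_at) BETWEEN ? AND ?"
        params = [req.start_date, req.end_date]
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql, params


sql, params = compile_request(
    Request("revenue", ("country",), "2024-01-01", "2024-01-31")
)
print(sql)
print(params)
```

The consumer only names a metric, dimensions and dates; the join and the aggregation come from the stored definitions, which is what keeps every consumer on the same calculation.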

[00:11:50] Bhav: I’m really angry, David. I need to like share my anger right now, and it’s listening to you talk about semantic layers and describe what they are. It makes me think like I didn’t have to spend so much time writing code and SQL and like parsing dates and cleaning it and doing all the stuff I’ve had to waste my time doing, and I, and I genuinely feel like I, it’s now considered wasted time knowing that someone should have done this for me.

[00:12:15] David: Yeah, because you were basically being the semantic layer, you were being a human one, and I’ve done that for years, right? It is kind of wasted, although the ability to use them is relatively new you know, without being part of some monolithic stack like IBM or Microsoft.

[00:12:32] Bhav: Yeah it’s really funny, it’s just, honestly I’ve had to write, you know, you mentioned, like, you write the definitions for what the metrics would be and how it’s aggregated and things like that, like I recall pages and pages of Google Docs that I’ve written about metric definitions, and I never considered myself a human semantic layer until this, until literally this, this moment in time.

[00:12:54] Dan: Oh God, I can just see that the number of times the joke happens of like, well, that’s just semantics though, Bhav. Like it’s just, you know, we can fall into that trap over and over again. David, when it comes to semantic layers, what you’re defining there, and it doesn’t, it doesn’t feel new, this idea of whether you’ve got a human being or whatever, or some kind of system, whether it’s a reporting layer, you know, building a dashboard and singing from the same hymn sheet and all the other terminology we have for kind of talking about the same thing. So it might just be me or my perception of it, but I’ve just been hearing this term semantic layer so much, especially over the last 12 months. So what has brought this subject back into fashion or what’s brought it to the forefront of conversations so that you know, it’s being talked about in a lot of analytic circles, meetups and conversations like we’re having today. What do you think has brought that attention or refocus back on it?

[00:13:41] David: Yeah, so it’s not new at all right? So it’s at least as old as the early nineties where BusinessObjects released their own one. It was much more rudimentary than what we have today but it still did most of the things the ones today do and to be honest given that was actually like the first commercial product I would think that semantic layers are as old as using databases for OLAP use cases. I think there’s probably been older ones that we never even saw.

[00:14:10] David: So yeah, it’s not new. I think why they’ve come into the conversation again is the modern data stack has kind of blown apart, like the idea of these monolithic data stacks, which all had something like a semantic layer. So Microsoft had SSAS, so that’s what I was used to playing around with when I was you know, early in my data career and, you know, IBM had Cognos, BusinessObjects existed, Teradata would have had something, so most of these stacks had something which did this, but when the modern data stack came along, all of these ancillary pieces of software that were sold with these monolithic data stacks weren’t then available.

[00:14:53] David: You got all the power of, you know, Snowflake and things like that, but you didn’t have all of the, you know, trimmings that you used to have. So what’s brought it back is people have realised, no, we do need that, it doesn’t matter how powerful this data warehouse is. You know, if we want single source of truth, if we want consistency, we need this semantic layer. And so then, you know, you’ve had startups like Transform, which got acquired by dbt. You’ve got Cube, and Looker, obviously it’s the big, probably the most famous one from the modern data stack, which is a BI tool, but actually, it’s quite a thin BI tool on a thick semantic layer is the way I’d put it.

[00:15:31] David: And so that’s why it’s come back into the conversation. The other thing that’s making it come back into conversation, especially in recent weeks, is how people want to use AI and LLMs to access data. And one of the things that Michael and I realised, you know, late last year was text to SQL is never going to work, you know, if you’re going to get an LLM to write SQL based on giving it a schema, even with a lot of context, it’s never going to be good enough, because it’s just too, there’s just too much room for it to go wrong and too much room for it to hallucinate.

[00:16:05] David: You need something that’s going to give it context, but you also need something that’s going to give it constraint. And that’s kind of like what’s really good about a semantic layer. Like if you think about a SQL query, you can write it in many different ways and get the same results or correct results. If you think about a semantic layer API request, that’s not true. Usually you can write it one way and maybe reorder the fields and that’s it. And really, an LLM needs that to consistently give people good answers with data. So I think that’s why right now it’s becoming even more important than it was say a year ago.
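Editor's illustration: the "constraint" David mentions can be sketched as a validator that sits between an LLM and the semantic layer. This is a minimal sketch under stated assumptions, not Delphi's implementation. The allowed metric and dimension names, the JSON request shape, and the `parse_llm_request` function are hypothetical.

```python
import json

# Hypothetical catalogue exposed by the semantic layer.
ALLOWED_METRICS = {"revenue", "order_count"}
ALLOWED_DIMENSIONS = {"country", "order_month"}


def parse_llm_request(raw):
    """Validate a JSON request (for example one produced by an LLM).

    Returns a canonical request, or raises ValueError if the request
    refers to anything the semantic layer does not define.
    """
    data = json.loads(raw)
    unknown_keys = set(data) - {"metric", "dimensions"}
    if unknown_keys:
        raise ValueError(f"unexpected keys: {sorted(unknown_keys)}")
    metric = data.get("metric")
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    dimensions = data.get("dimensions", [])
    bad = [d for d in dimensions if d not in ALLOWED_DIMENSIONS]
    if bad:
        raise ValueError(f"unknown dimensions: {bad}")
    # Sorting means reordered fields collapse to one canonical request.
    return {"metric": metric, "dimensions": sorted(set(dimensions))}


first = parse_llm_request(
    '{"metric": "revenue", "dimensions": ["country", "order_month"]}'
)
second = parse_llm_request(
    '{"dimensions": ["order_month", "country"], "metric": "revenue"}'
)
print(first == second)
```

A hallucinated metric such as `"profit_margin"` is rejected with a `ValueError` before anything reaches the warehouse, whereas free-form SQL containing the same mistake could still execute and return a plausible-looking number.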

[00:16:42] Dara: Where does it sit? I’m just, because you mentioned there that it can be part of a, it can be part of a BI tool or BI technology. Can it sit on the data warehouse side? Should it sit between them? Does it matter? I mean, where should it ideally sit?

[00:16:58] David: Yeah, so there’s a project called Apache Calcite out there, which is made by someone at Google. And that could easily sit inside a data warehouse. But the problem with it is you’re basically adding a lot to a data warehouse, and it’s not going to be SQL, it’s going to be something else, really. I think it’s going to be hard to write some of the abstractions that you’d want to, like, for example, I’m going to define an entity, and the way I’m going to define this entity is this subset of this data table, and that’s how I’m going to define it, but it means this is an entity.

[00:17:37] David: That kind of abstraction is difficult to write in SQL. You end up just creating other tables, and then a table is not necessarily an entity, and it just gets very messy. So it could live in the data warehouse, actually. But we’re seeing it’s most commonly in a BI tool. We’re seeing standalone ones. I think I prefer the standalone approach because what we’ve, from what we’ve seen there, it usually ends up with better outcomes. So certainly better than the ones that we’ve seen attached to BI tools and we haven’t even seen one attached to a data warehouse yet.
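Editor's illustration: the entity abstraction David describes, where an entity is a named subset of a table rather than a table in its own right, could be sketched like this. It is a hedged sketch only. The `Entity` class, the `customers` table, the `status` column, and the `count_sql` helper are hypothetical and not taken from any tool mentioned in the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A business entity defined as a filtered subset of a physical table."""
    name: str
    table: str
    where: str  # condition that picks the entity's rows out of the table


# Hypothetical entity: an "active customer" is a subset of the customers table.
ACTIVE_CUSTOMER = Entity(
    name="active_customer",
    table="customers",
    where="status = 'active'",
)


def count_sql(entity):
    """Build a count query for the entity without creating another table."""
    return (
        f"SELECT COUNT(*) AS {entity.name}_count "
        f"FROM {entity.table} WHERE {entity.where}"
    )


query = count_sql(ACTIVE_CUSTOMER)
print(query)
```

The definition of "active customer" lives in one place and every metric built on the entity inherits its filter, instead of the filter being copied into a growing set of derived tables.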

[00:18:10] Dan: How relatively modular are these? Because it sounds like there’s a whole ecosystem regardless of where it lives that are being kind of funnelled through this one kind of contextual layer. And a lot of the ecosystems, the digital estates, the stuff that we’re looking at the moment is this, the composable ecosystem, and it’s a kind of plug and play, we can swap it out if we need to, but it feels like this is more of a cemented product. Once you’ve kind of gone in, it feels like swapping out a semantic layer is going to be a real fucking pain to put it lightly. So how composable is this? How modular are semantic layers in the, in the scheme of the modern data stack?

[00:18:46] David: So they can be composable and modular, that doesn’t mean that they’re easy to replace. So one of the things that’s going on at the moment is people are considering whether they want to move off the original Looker product because of what GCP are doing in the space. And that’s difficult, when I was at Lyst we had Looker, we had, you know, tens of thousands of lines of code in LookML, that’s kind of like a monolithic piece of software, really, there, and it’s really hard to move.

[00:19:15] David: And especially when you use all of the corners of the product, like, that LookML has, like, you can have, like, dynamically created views that change grain in Liquid based on parameters and crazy things, they’re really hard to move, like, once you’ve built those. So, in that sense, they’re not very easy to just replace quickly, that’s going to require some kind of migration project. And what’s true is when they’re part of a BI tool, they’re not very composable or modular, really. They’re really to be used with the BI tool, which isn’t that helpful for anything other than BI. But what you see when you have tools like Cube, or if you have like the dbt semantic layer, they have a REST API interface.

[00:20:00] David: And so suddenly they are quite composable because lots of people with different use cases can consume the data from one place, which is very good. And it, you know, it’s very much like a similar concept to how, you know, people were saying Snowflake was where the gravity of data was going, where everything was going to go to Snowflake and everything would consume from it. The problem was the lack of definition, the lack of consistency. So this kind of provides like a, an abstraction on top, which then people can not worry about how to define or understand the data, just take what they want, and it’s consistent. So I think in that way they are composable, they are modular, they aren’t very easily replaced because of how much logic you have to put into them.

[00:20:42] David: And the logic has to go somewhere, I see this as unavoidable, either it ends up in very horrible, complicated SQL queries if you don’t have a semantic layer, or it ends up in the semantic layer, there’s just not much of a choice.

[00:20:54] Bhav: David, I was reading your post, the golden leap frog one and you did a survey in there about technology stacks being used. Now, I know you mentioned on your post actually that the responders are maybe somewhat skewed towards a modern data stack because of the fact that they’re part of your meetup community.

[00:21:12] Bhav: But the second survey question on there was do you use a semantic layer? The answer was no. I know you kind of touch on it on your blog posts for users, for our listeners who haven’t read your blog posts, you know, like, why is there such a disconnect between companies having these modern data technology stacks, but they’re not implementing something like a semantic layer to make it easier for users to, you know, to like to query and understand their data and match it with business logic.

[00:21:41] David: Yeah, so I think there’s a few reasons for that. Part of it is skill, so if you think about like how Power BI was the most popular BI tool, even for that skewed audience, let’s call it roughly 50 percent of the BI market uses Power BI. But a very small fraction of those people use Power BI’s like semantic layer, which is MDX. And I think that’s partly because not many people know MDX, you know, whereas if they were to just write SQL and push that into a workbook and then make a dashboard with it, anyone could do that. Well, any data person could do that.

[00:22:14] David: And so I think there’s a skill, a skills problem. There’s also just a knowledge problem. People don’t really even really know about what a semantic layer is. So I recently spoke to a director of BI working for a pretty big company, they had never heard of the term semantic layer.

[00:22:29] David: So there’s just a lack of knowledge that exists. There’s also then time so, you know, investing in a semantic layer does require time, probably requires money, you know, if those resources aren’t available, it’s not going to happen really. It’s the sort of thing that sometimes younger companies are a bit luckier in that, oh, we’re going to set this up for the first time. And you’ve seen that there’s a load of startups that chose Looker early on in their journey, and then they invested in the semantic layer up front and got the benefits of it. Whereas older companies who already had Tableau or Power BI or something like that maybe wouldn’t have done that.

[00:23:05] David: So yeah, those, those are probably the three main reasons why people haven’t adopted them as much as they could. I think the move to the modern data stack hasn’t helped either where semantic layers aren’t bundled with your data stack, because, you know, people are moving off Microsoft or Oracle or whatever, which had those things, and they’re moving into these modern data stacks where they’re picking the things they need to do the job, and then gradually building up over time.

[00:23:32] David: Like, that doesn’t help. And then also the level of turnover of people leaving teams and joining new teams, like, they’re not getting to the point of maturity where they’re building things like that either.

[00:23:43] Bhav: There’s a couple of things to pick apart there. First of all, on the skills gap, with implementing something like a semantic layer, is that not, is the skills there not just SQL based skills? Or is it another technology that data engineers would have to learn to be able to build that semantic layer? Because I feel like the LookML that I’m used to having worked with, that was built by obviously, but built by someone else, was all written in like, a form of SQL.

[00:24:08] David: Yeah, I think that is mostly true. It does look a lot like SQL. I think some of the abstractions, like building macros and things like that, those are a bit potentially off putting to people who haven’t seen them before. So, like, I remember picking up Looker quite quickly the first time I saw it, but I was someone who was used to that, I was happy to read docs and learn something new. You know, not everyone thinks like that.

[00:24:35] Bhav: And the second thing you mentioned, so, is that moving from Power BI to something else, you know, you do it bit by bit. Why don’t the platforms you move to contain the opportunity to build a semantic layer? So obviously Looker does it, but not everyone would have that. Would it not be in the interest of the companies who build these you know, data platforms to have the option, you know, the ability to build a semantic layer into it, much like Looker has?

[00:25:08] David: It certainly would be of benefit. I think there’s a couple of problems there. So I think part of it is that when the modern data stack kind of came out, everyone moved to it off the legacy monolithic stacks, you know, they just picked the things they needed. So oh, I need a data warehouse, I need Fivetran, an ELT tool, to get data into the data warehouse. I need a BI tool. What does it need to do? I need it to consume data from my data warehouse and make graphs.

[00:25:34] David: Some people they’re in a rush, they don’t have much time or resource. So they just choose something. And a lot of people may have chosen tools as that BI tool, which is still why Tableau and Power BI are so popular, that they’re comfortable with and so they chose those tools to put on top of Snowflake or BigQuery or whatever else. And the problem with that is when you’ve taken those individual tools and Tableau doesn’t have a semantic layer, even though they’ve bought a few in the past and not deployed them. You then don’t have it and that’s why those teams end up without one in the stack unless they then go and choose to buy one and then refactor everything they’ve built in Tableau or Power BI already and then reconnect it back to that semantic layer.

[00:26:18] David: So that’s part of the difficulty there. But then the reason why those tools haven’t necessarily offered something, it’s like Power BI has MDX. It’s not an ideal semantic layer by any means but it’s there and you could use it. But I think that’s part of the problem is that because it’s awkward to use and then involves you being very locked into Power BI. I think people are reticent to use it, and then also they have to learn how to use it, which isn’t that straightforward either. But the other reason why some of the other providers haven’t built semantic layers is because they’re actually hard to build.

[00:26:49] David: So like, we work really closely with Cube, which is a really good partner of ours. Cube have spent, you know, four years building what is now probably technically the most, the best semantic layer out there. And it’s taken a long time, a lot of engineering effort to do that. So do those big BI tools or existing BI tools want to, you know, rock the boat and build something like that, which not all of their customers may want? I don’t know, maybe not.

[00:27:16] Dan: So when you were giving the example about how Power BI has kind of like a pseudo semantic layer, that’s okay for Power BI. I think this does touch back on something we spoke about right at the beginning of this episode, where it’s like, that’s more of a reporting layer than a semantic layer and actually having kind of unified kind of definitions to be used elsewhere, especially in the world of large language models and generative AI.

[00:27:36] Dan: And actually using the Power BI semantic layer is great for visualisation in Power BI, but not an awful lot else. And the power really comes from having a true semantic layer distinct from that, right, as you’ve been saying. So I think where my head’s going is like, is it even fair to call that a semantic layer? Like, is it even classed in the same area? Like, it’s almost like you’re going to make reporting slightly easier for yourself, but you’re not doing the job of a semantic layer, right? Like it is purely a means to an end for whoever the analyst is to build faster dashboards, right?

[00:28:08] David: Yeah, I probably agree with that. It’s close to a semantic layer, but it’s not because yes, it’s locked to Power BI.

[00:28:14] Dan: It defeats the point of it being a semantic layer being open, right?

[00:28:17] Bhav: I’m starting to think that the use of the semantic layer and the need for it is really to future proof, to take away the need to write code, SQL code, and to make large language models compatible with your database in a way that offers data that is accurate. And actually, without the semantic layer, I guess from your point of view, these large language models just wouldn’t be as effective.

[00:28:43] David: Yeah, definitely and it also kind of touches on another point, which is I think companies are productizing their own data more than they’ve ever done before. So whereas in the past, the purpose of a semantic layer may have just been for a BI tool to use, now companies want to use them for so many more things. Like they want to do embedded analytics for customers, they want to do internal as well. They want to use it for some other engineering purposes where they want APIs. So I think that’s part of, again, why semantic layers are becoming more important at the moment as well.
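To make the "define once, use in many places" idea concrete, here is a minimal sketch in Python. It is not how Cube, Looker or Delphi work internally. The table name `analytics.orders`, the metric and dimension names, and the `compile_query` helper are all hypothetical, chosen only to show one set of definitions being compiled into SQL for any consumer (a BI tool, an embedded chart, an API).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    sql: str          # aggregate expression over the base table
    description: str  # human-readable meaning, written once


# Hypothetical definitions: the single place where "revenue" is given a meaning.
TABLE = "analytics.orders"
METRICS = {
    "revenue": Metric("revenue", "SUM(amount)", "Total order value"),
    "orders": Metric("orders", "COUNT(DISTINCT order_id)", "Distinct orders"),
}
DIMENSIONS = {
    "country": "country",
    "order_month": "DATE_TRUNC('month', ordered_at)",
}


def compile_query(metric: str, dimensions: list[str]) -> str:
    """Turn a (metric, dimensions) request into SQL using the shared definitions."""
    if metric not in METRICS:
        raise ValueError(f"Unknown metric: {metric}")
    unknown = [d for d in dimensions if d not in DIMENSIONS]
    if unknown:
        raise ValueError(f"Unknown dimensions: {unknown}")

    select = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    select.append(f"{METRICS[metric].sql} AS {metric}")
    sql = f"SELECT {', '.join(select)} FROM {TABLE}"
    if dimensions:
        # Group by the positional index of each dimension column.
        sql += " GROUP BY " + ", ".join(str(i + 1) for i in range(len(dimensions)))
    return sql


# Every consumer asks for "revenue by country" the same way and gets the same SQL.
print(compile_query("revenue", ["country"]))
```

The design point is that consumers never write the aggregate themselves, so a dashboard, an embedded report and an API endpoint cannot drift apart on what "revenue" means.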

[00:29:18] Bhav: I feel like the only reason that companies want to probably have that semantic layer to productize it is so that everyone can rip off Spotify’s end of year wrapped up thing, right, and have it speak consistently. I think Monzo have started doing a Monzo wrapped, and I feel so bad, it just feels like without the semantic layer, you can’t create a wrapped for all your users.

[00:29:44] Dan: Well, I actually watched my Monzo wrapped this morning and god, it was just such a waste of my time. Like, it was just showing me that I do grocery shopping and spend it and Jesus Christ, I don’t need to be told that. I also don’t want to know that stuff, especially when like my entertainment stuff is just local bars and pubs.

[00:30:00] Dan: Anyway, more holistically around semantic layers, David, I just wanted to get your perspective on the future of them. So when it comes to large language models specifically, at the moment they’re quite costly to run. And so what you need is to kind of feed them the right data to get the right output.

[00:30:15] Dan: So in a sense, a semantic layer is perfectly positioned to do that. That’s why it’s become kind of super popular again recently with the rise of generative AI, OpenAI models, and the kind of Google and Microsoft versions as well. My question to you though, is as we’ve gone from a world where it was ETL, then it was ELT because storage became way cheaper, just throw it in, we’ll fix it afterwards.

[00:30:34] Dan: Then we go into the world of things like data meshes and just accessing data ad hoc as and when you need it, and kind of combining data in lots of different ways because it becomes easier and we kind of get better at doing it. Do you see a point in the future whereby we may not need a semantic layer because the processing of things like large language models can be so cheap and so quick and so effective that it will kind of just get it, maybe with some simple pre-prompts through, I suppose, what we currently think of as a custom GPT in the OpenAI world?

[00:31:02] Dan: But do you think we’ll get to a point where we don’t need to do this manual setup of definitions of stuff? Or do you think there’s always going to be a role for this middle layer to kind of abstract everything?

[00:31:13] David: I think the problem with the case where we don’t do that is essentially we start handing off defining the meaning of our data, and defining how to use it, to the LLM or the AI system. At this point in time, and having had my experience in business, I can’t see that happening because business always wants to know the provenance of the charts and data that they’re looking at, because they want to be able to question it, they want to challenge it. And so as soon as that’s handed away from humans, suddenly people are saying, well, I don’t know, it’s decided to do that for itself.

[00:31:52] David: And it might always be right and I’ve thought about this myself. Imagine if you did have a GPT that was 100 percent correct when you asked it a data question, and it didn’t use a semantic layer. And part of me realised that’s not actually good enough, because the transparency won’t be there. And you know, I think people will need that, you know, for a very long time until you get to a point where the way businesses work with AI is very much progressed from where we are today.

[00:32:25] Dan: So is it a trust element then more than anything? It’s about having confidence and trust understanding that in a sense there’s been human intervention and we’re using this tool in a kind of monitored way. And then until the kind of points of view change and societal change happens, it’s going to stay that way.

[00:32:42] David: Trust is, as it always has been, the biggest problem or biggest issue in data, right? So that’s exactly why we need it.
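The provenance point can be illustrated with a small Python sketch. It assumes, purely for illustration, that the language model proposes a structured request as JSON rather than writing SQL itself. The `DEFINITIONS` dictionary, the metric and dimension names, and the `check_request` helper are hypothetical and are not taken from Delphi or any other product. No model is called here; the sketch only shows the layer rejecting anything humans have not defined and returning the human-written definitions alongside the request.

```python
import json

# Hypothetical governed definitions, written and reviewed by people.
DEFINITIONS = {
    "metrics": {
        "active_users": "COUNT(DISTINCT user_id) over events in the period",
    },
    "dimensions": {
        "plan": "Subscription plan at the time of the event",
    },
}


def check_request(raw_request: str) -> dict:
    """Validate a model-proposed request and attach the definitions it relies on."""
    request = json.loads(raw_request)
    metric = request.get("metric")
    dimensions = request.get("dimensions", [])

    # Constraint: the model may only use what the semantic layer defines.
    if metric not in DEFINITIONS["metrics"]:
        raise ValueError(f"Metric not defined in the semantic layer: {metric!r}")
    unknown = [d for d in dimensions if d not in DEFINITIONS["dimensions"]]
    if unknown:
        raise ValueError(f"Dimensions not defined in the semantic layer: {unknown}")

    # Provenance: the answer can always be traced back to a human-written definition.
    provenance = {metric: DEFINITIONS["metrics"][metric]}
    provenance.update({d: DEFINITIONS["dimensions"][d] for d in dimensions})
    return {"metric": metric, "dimensions": dimensions, "provenance": provenance}


result = check_request('{"metric": "active_users", "dimensions": ["plan"]}')
print(result["provenance"])
```

A stakeholder who wants to challenge a chart can then be shown exactly which definitions were used, which is the transparency a free-form model answer would lack.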

[00:32:54] Bhav: David, I have to ask, do you think this is a problem that we’re building for ourselves and off our own back? Is this something that the rest of the organisation will even care about? Because if you think about the challenges already facing data teams right now, we have challenges around resources, around prioritisation, around building our own destiny. It’s almost largely dictated by what marketing needs, what product needs, what finance needs. We find ourselves in a situation where, you know, now we talk about semantic layers, five years ago we may have been talking about reporting layers and presentation layers, and then we’re trying to carve out time to do this work that we consider value adding.

[00:33:38] Bhav: Do you think this is just something that we’re, you know, it’s always going to be in the data ecosphere, you know, ecosystem, and no one else is really going to care about it outside of the data team?

[00:33:49] David: What I hope is that, and this is sort of along the lines of what we might try and build in the near future, is how can we democratise this responsibility so that other people in the business can also define, because lots of people in the business who aren’t in the data team kind of know what the data means, know how it should be used. And so if you could guide them through almost automatically or conversationally generating a semantic layer, however small that may be, to help them self-serve, like that could be something good, because it’s still supervised, you’re still saying to a human, you’re getting the meaning of the data and how it should be used from them and it’s governed, it’s still governed, right? If that happens.

[00:34:34] David: So how can we make that process streamlined? How can we make that easy? So data teams aren’t spending all the time. Like what if we could, at onboarding, sit a data team member down who’s been there for a while and say, here, spend a couple of hours with us, answer like 120 yes/no questions based on metadata that we’ve gleaned from your data stack. And then we’ll have a working semantic layer for you at the end of it, right? That is like a much lower commitment than some multi-month project to go and build a semantic layer, which probably won’t happen or get approved, right? That is doable, you know, in my mind.
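As a rough illustration of that yes/no onboarding idea, the Python sketch below turns column metadata into questions and turns the answers into a tiny semantic layer. It is only a guess at the general shape under stated assumptions and does not describe anything Delphi has built. The `Column` type, the question wording, the rule that numeric columns become `SUM` metrics, and the `build_layer` helper are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    table: str
    name: str
    dtype: str  # simplified assumption: "numeric", "text" or "timestamp"


def questions_for(column: Column) -> list[tuple[str, str]]:
    """Return (question, kind) pairs derived from one column's metadata."""
    ref = f"{column.table}.{column.name}"
    if column.dtype == "numeric":
        return [(f"Should {ref} be summed to produce a metric?", "metric")]
    return [(f"Should people be able to group results by {ref}?", "dimension")]


def build_layer(columns: list[Column], answers: dict[str, bool]) -> dict:
    """Keep only what a human said yes to; unanswered questions count as no."""
    layer: dict = {"metrics": {}, "dimensions": {}}
    for column in columns:
        for question, kind in questions_for(column):
            if not answers.get(question, False):
                continue
            if kind == "metric":
                layer["metrics"][column.name] = f"SUM({column.name})"
            else:
                layer["dimensions"][column.name] = column.name
    return layer


columns = [
    Column("orders", "amount", "numeric"),
    Column("orders", "country", "text"),
    Column("orders", "internal_note", "text"),
]
answers = {
    "Should orders.amount be summed to produce a metric?": True,
    "Should people be able to group results by orders.country?": True,
    "Should people be able to group results by orders.internal_note?": False,
}
print(build_layer(columns, answers))
```

Because every entry in the result exists only after a person answered yes, the layer stays supervised and governed even though nobody wrote it by hand.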

[00:35:11] Bhav: Okay, and then do you think this is going to be something that will add the incremental value over not having it, like how much additional value will people realise by implementing a semantic layer versus just having views in Tableau or something along those lines, you know what I mean?

[00:35:28] Bhav: It’s like, is the return on the investment going to be so great that people will, I mean, it’s probably a loaded question, you’re probably going to say yes. But I ask you to wear an objective hat and answer that question and say like, is the additional value for building a semantic layer and, you know, having it fully deployed going to add that much value to the organisation.

[00:35:53] David: So what I would say to that is like Looker already proved that there was value to it, right? That’s why Looker was successful: that was like an empirical proof that a semantic layer in between the consumer and the data added value. It made a good chunk, though not the majority, of business users able to self serve with data.

[00:36:15] David: And so like what we’re then going to see with LLM interfaces like Delphi is that more than just that initial group of quite tech savvy users can then go and self serve because suddenly it’s anyone who can kind of construct a reasonable you know, question. And even if they don’t, they can be guided towards constructing one. And that’s almost everyone in the business who needs to use data. So I think we’re going to see that uplift in the value of having one.

[00:36:46] Bhav: But I would argue, and Dara and Dan will agree on this, I like to ask the difficult questions to our guests. I would argue that I don’t think Looker has proved the use case on it. And the reason I say this is, having used both Looker and Tableau, where Looker does have the semantic layer and you know, all of the bells and whistles, and Tableau doesn’t, they operate purely with views and data models that have been exposed.

[00:37:09] Bhav: And having managed analytics teams in organisations which had two different, you know, for both of those tools, I found the adoption of data and self serve was actually significantly higher at the company that used Tableau because Tableau is still way more user friendly than Looker and I feel like Looker still, you still need to be an analyst of sorts.

[00:37:34] Bhav: So when you say Looker proved, you know, proved the use of the semantic layer, did they prove it at a very global level or was it really just they proved it for analysts or people who have a base knowledge on, on how to work with data?

[00:37:49] David: And I think that’s part of the problem, right? The people who could self serve with Looker well were typically analysts, product managers, engineers, maybe marketers who are somewhat tech or data savvy. Admittedly, the organisation where I found that had a lot of those people, and they probably made up at least a quarter of the organisation, so that’s why I saw that uplift. Whereas in the previous organisation where it was Tableau, we as the data team basically had to do everything for people. Other than filtering a dashboard, they couldn’t really do anything.

[00:38:20] David: And, you know, we felt that was you know, the number of times we were just doing a small piece of work just to edit a dashboard. Whereas when I was looking after the data team that had Looker, you know, someone in finance or marketing would just happily add dimensions for themselves to a look to get the change they wanted.

[00:38:38] David: You know, that was like chalk and cheese. Like that was a real, a real step change for me. And I think like what you mean by self serve is really interesting. So if you just talk about consuming an existing dashboard and think of that as self serve then yeah, maybe it’s not. But for me self serve means actually constructing like your own graphs or, you know, pulling the metrics and dimensions that you want out, and that’s why, for me, Tableau doesn’t even really offer that to the non technical users. They don’t have that way of exploration. Maybe it’s a different definition of self serve.

[00:39:12] Dan: Is this where, and because I don’t deal in Looker as much really ever, I mean, the kind of people we work with, I mean, Looker Studio, they lazily call it Looker even thinking it’s the same thing. So I know that Looker released their Looker Studio connector, and I’m just wondering if Google, since their acquisition of Looker and kind of pivoting it to be more semantic layer like and using Looker Studio and thus the rebrand and bringing that into the kind of the brand loop, I’m wondering if that’s going to help that self serve stuff that you saw Bhav and David in terms of like the better adoption in something like Tableau, because now if you can just say to anyone, oh, look, you can build a Looker Studio dashboard, use Looker as the data source, as in use the semantic layer, it kind of opens that up to Jesus, everyone I speak to has built a Looker Studio dashboard or done something with it.

[00:39:54] Dan: There’s a kind of classic Google thing of making it super easy. I’m just wondering if Google have got this nailed, like are Google the ones to watch in this space? You mentioned things like dbt and Cube and other tools out there. I mean, I’ve worked with Google for my entire career and they have a habit of just coming along second or third place and then just fucking annihilating competition and just do something and just continue to invest and lose money at it until they become the dominant force.

[00:40:19] Dan: I’m just wondering from your perspective is Looker, BigQuery, Dataform, the GCP, is that the stack? If you were to, let’s say I’m bought into this right now. I’m a company that doesn’t have a semantic layer, I get the value, I’m a data team. Is that the place to start or would you suggest exploring options, potentially going down different routes? Where would you suggest someone start if they get it and they want to get started?

[00:40:42] David: I agree that they have the potential to have like an amazing stack for this. The problem, I feel, is execution, because it’s very difficult to understand what their strategy is, like ripping parts of Looker out of the original Looker and then making Looker Core and Looker Modeler. Like, we’re on the GCP AI startup program, that’s our cloud provider, and we have access to most of the things early, but even for us we’re looking at it, and it’s hard for us to recommend that someone would choose Looker Core, Looker Modeler over something like Cube, which is a full featured, well understood product with a clear roadmap and understanding of what they’re going to do.

[00:41:24] David: And I think that’s part of the problem. And I think there’s even like lack of consistency around strategy inside GCP as to what they want to do with it. Like when we’ve talked to different teams in GCP, they say different things. So, I think that’s part of it. You’ve seen their Duet product, which is supposed to be somewhat similar to ours, right? It works on Looker Data Studio, it connects to their semantic layer, but it’s very constrained. It kind of forces you to write a sentence it likes, and it’s just nothing like the experience that we give people. And I just wonder like, they’ve taken so long, they’ve got so much resource, like, I just don’t know, it’s hard for me to recommend them. So that’s why I don’t often think about mentioning Looker Core or Looker Modeler in the same breath as I would Cube or dbt.

[00:42:11] Bhav: We spoke to a guest recently who was convinced Google are just shooting from the hip. Everything is, nothing is as defined and they’re literally making up the story as they go and making decisions on the fly as opposed to it being a very strategic well thought through timeline which is insane to think about.

[00:42:33] Bhav: If I give you kind of like my definition of what self serve meant at the organisation where Tableau, I felt created more of a self serve environment. We started off with some basic reporting like everyone does, but I think where people felt the flexibility to be able to explore the data themselves, it was around taking kind of some core dashboards and then duplicating, copying, and then playing around with the fields and joining and then creating the graphs as opposed to just purely just segmenting and filtering on prebuilt dashboards.

[00:43:03] Bhav: So I think, from that perspective, what I meant was it’s self service in the more exploratory sense. When we were working with Looker at another company, I think people were just, all they could do was go in, look at the report, filter it and that was it. And that’s where they really struggled because, like you mentioned for yourself, the company you were working at were probably 25, 30 percent more tech savvy than the previous company where you struggled.

[00:43:28] Bhav: So this is where I’m coming from. You know, I don’t mean to be challenging. Look, I like the product, but I like it with the lens of an analyst. I prefer Tableau from the lens, you know, when I’m wearing my stakeholder non technical hat.

[00:43:45] David: I can definitely see that happening yeah. And I think it’s, it shows a lot about, well, what’s the culture of that organisation, right? Are they the kind of organisation that’s curious and wanting to learn how to use things for themselves, or are they more traditional or hierarchical and they just expect things to go to them. So Looker’s a bad fit for that latter type of organisation.

[00:44:08] Bhav: I think you’re right, the organisation which was more self serving, I think there was a more inquisitive nature and less of a fear of data than there was at the company that was using Looker. So maybe you’re right, I think some of these concepts and these discussions aren’t as black and white as which platform is better. They’re sprinkled with things like people’s personal motivation, how data driven the company is, and all of those types of things. And I think you need all of those things to be really successful so.

Rapid Fire\Outro

[00:44:41] Dara: Okay, David, to wrap up our conversation, I’m going to hit you with our five rapid fire questions. We’re going to put you in the hot seat briefly. So question number one is what, and you can go as broad or as narrow as you like with this, it’s up to you really. So what’s the biggest challenge today that you think will be gone in five years time?

[00:45:00] David: Maybe this is a hope, I hope that like some of the stuff that we do around infrastructure and integrating our stacks together, like that is mostly gone in five years time and that just becomes a lot easier.

[00:45:15] Dara: So there’s always another problem to solve. So once that’s taken care of, what’s going to be the problem to replace it? What’s the big problem in five years time?

[00:45:22] David: So if you think about the way AI is progressing, as we’ve already seen it progress this year, and then what that looks like in five years time. I think the big problem there is how do we use AI, like, and okay, sure, from a data perspective, there’s lots of different use cases for AI, like copilot type use cases, self serve type use cases, using unstructured data, using AI, right? There’s all these different things that we could do but in a broader sense, it’s like, how do we get organisations, like, using AI effectively to increase their productivity, right? That’s the thing that needs to happen.

[00:46:05] Dan: Okay, and if you could bust one myth, what would that be?

[00:46:10] David: Oh that AI, and especially in its current form of LLMs, will take over the world. That’s not going to happen.

[00:46:17] Dara: We’ll come back in a few years time and we’ll see if you were right about that one. Next one’s similar so if you could wave a magic wand and make everybody know one thing, what would that one thing be?

[00:46:33] David: Yeah, maybe what a semantic layer is. That would be nice if I didn’t have to explain that every week.

[00:46:39] Bhav: I was going to say, if that wasn’t your answer, I would have been well disappointed. I’m glad that was your answer.

[00:46:44] Dan: Now you can point them to this podcast episode, so you’ve got that.

[00:46:47] Dara: Yeah, it might not tick the everyone box, but at least all of our listeners will know after this. So the final one, which some people think is the easiest, but others think is the hardest. What’s your favourite way to wind down when you’re not working?

[00:47:00] David: So I, I’ve recently joined this like parents of kids who go to my school who have like a board games club, like that’s great. Like no screens, you just play a board game and chill out with some friends. It’s really nice, that’s definitely my favourite. Play some very complicated board games so your brain is well, well occupied.

[00:47:23] Bhav: I feel like that, when you’re playing a new game you know, reading the rules is actually more stressful than anything else in the world. I don’t see that as downtime frankly.

[00:47:35] Dan: I did exactly that yesterday. I had some friends around, we played a new board game I got, and it was a slog. I’m looking forward to playing it again, but Jesus, that was hard. It’s just called Fallout, it’s a Fallout themed board game. And it’s so detailed, it’s amazing if you’ve played the games, then you’d find it really enjoyable, but just learning all of the different aspects. Like it just, it’s like every time you start doing something, there’s another feature, there’s another kind of system, there’s another aspect to it and it was just like, we got around like three times before we realised half the rules.

[00:48:04] Dan: And it’s not one to jump into. You’ve got to do some reading and some prep. But it is good fun. David, before we let you go, the last thing is that if people have questions about semantic layers, they want to find out about Delphi or yourself or anything to do with that, is there any way that they can get in touch with you? Are there any websites or links that you can point to or plug? Anything you share, we’ll put in the show notes, of course, but how can they reach you?

[00:48:26] David: So if you want to know directly about Delphi and just want to chat, just reach out to founders@delphihq.com. So Michael, my co founder, or I will respond very quickly and set up a call with you. If you want to know more about like some of the things I write about, some of the things I think about, I write a blog at davidsj.substack.com so you can follow that. I write every week, I’ve recently crossed a hundred posts. I’m on LinkedIn as well, so I regularly post on there so you can find me very easily on LinkedIn as well.

[00:49:00] Dan: Amazing. Well, okay thanks for joining us. Thanks for taking us through stuff and humouring us with our terrible jokes and stupid questions. Thank you.

Tags: AI Analytics digital analytics Generative AI Google Looker Studio Podcast Semantic Layer The Measure Pod

Written by

Will Hayes

Further reading

Easy ways to prepare your BigQuery warehouse for AI

Data pipeline optimisation with Google Cloud and Dataform

Dataform for BigQuery: A basic end-to-end guide
