About this video
What You'll Learn
- Why high-cardinality workloads made the original InfluxDB index increasingly expensive
- How Rust, Arrow, Parquet, object storage, and DataFusion shaped InfluxDB 3
- Why SQL and InfluxQL are native in InfluxDB 3 while Flux remains unresolved
InfluxDB, a time series database, underwent a major rewrite to create InfluxDB 3.0, also known as IOx. The decision to rewrite the database was driven by the need for strict control over memory management and high performance.
Jump to a chapter
- 0:00 Introduction
- 0:57 Guest Introduction: Paul Dix, InfluxDB Co-founder & CTO
- 1:51 The Decision to Rewrite InfluxDB (IOX)
- 2:00 Rewriting InfluxDB in Rust
- 2:26 Challenges with the Original InfluxDB Architecture (Cardinality Problem)
- 7:30 External Factors Driving the Rewrite (Cloud, Object Storage, Kubernetes)
- 10:13 Why Rust?
- 11:38 The New Architecture: IOX, Apache Arrow, Parquet, DataFusion
- 16:16 The Decision to Build on Apache Projects
- 18:10 The Journey to Naming it InfluxDB 3.0
- 20:34 Observability Data (Logs, Metrics, Traces) and Infinite Cardinality
- 20:45 The Observability Database
- 22:48 Storing vs. Querying Different Observability Data Types
- 28:05 Current Use Cases and Challenges for InfluxDB 3.0
- 33:45 What the Flux?
- 33:47 Query Languages: SQL, InfluxQL, and Flux
- 34:35 InfluxQL and SQL Support in InfluxDB 3.0
- 35:52 The Future of Flux
- 42:14 Flux Community Fork and Language Polarization
- 44:45 OpenSource & Licensing
- 44:54 Open Source Licensing and Vendor Motivations
- 48:25 Industry Distrust from License Changes
- 51:54 InfluxDB 3.0 Licensing Future and Foundations
- 54:57 Conclusion and Wrap-up
- 55:00 Shameless Plugs
Full transcript
Generated from the English captions.
0:00 Introduction
0:00 Welcome to Cloud Native Compass, a podcast to help you navigate the vast landscape of the cloud native ecosystem. We're your hosts. I'm David Flanagan, a technology magpie that can't stop playing with new shiny things. I'm Laura Santamaria, a forever learner who is constantly breaking production. Do you want a single database to store high precision, multidimensional time series data that supports infinite cardinality? Well, we're not there yet, but Paul Dix does share his vision and roadmap for InfluxDB 3. Do you wanna hear David not have to talk about Rust to get a guest to actually talk about Rust? Now is your chance.
0:40 In all seriousness, we not only get to talk about a move from Go to Rust, but also about observability and how it's changed over time, as well as a little bit about open source licensing changes. So let's Rust. I mean, Go. No. But, really, let's all Rust. Alright. Thank you for joining us, Paul. And for anyone who's not aware, can you please tell us a little bit more about you and what you're up to? Yeah. So I'm Paul Dix. I'm the cofounder and current CTO of InfluxData. We are the company that makes InfluxDB, which is an open source time series database.
0:57 Guest Introduction: Paul Dix, InfluxDB Co-founder & CTO
1:16 It's useful in use cases like tracking system metrics and application performance metrics, sensor data, that kind of stuff. My background is as a programmer. I started the company in 2012, and we went through Y Combinator actually under a different idea, and then we pivoted to this idea in the fall of 2013. Awesome. Yes. Okay. There's more. I can go on, like, at length, but I'm trying to be concise. That's fair enough. Yeah. Alright. Well, let's talk about your work on InfluxDB. So over the last, I'm not sure how many
1:51 The Decision to Rewrite InfluxDB (IOX)
1:56 years, you started work on a rewrite of Influx, moving towards using Rust on a project called IOX, which I believe this year, you have now pushed out into production as part of your InfluxDB cloud offering. So I really wanted to kinda drill into that and just understand more about, one, why a rewrite, a change in programming language, rearchitecting the system, using Apache Arrow. It's a huge task for a company to take on, and I'd love to know more. Yeah. So I'll just start with the obvious, which is a rewrite is basically, like, the worst possible thing you can do. Like, unless
2:26 Challenges with the Original InfluxDB Architecture (Cardinality Problem)
2:34 you're just, like, a sadist or a glutton for pain, you should avoid a rewrite at all costs. Okay. That's out of the way. So InfluxDB, the original version, is written in Go. And the core architecture of the database is built around what I would call, like, a clever hack. Right? So essentially, it's kind of like two databases in one thing. One is basically a time series database that organizes data on disk as individual time series, which are value time stamp pairs in time ascending order. Right? So as all the data comes in, it tries
3:10 to organize that data on disk in that format, which is basically like a very, like, indexed format. The other piece of this is an inverted index that maps metadata to individual time series. So the metadata is like a measurement name, a tag key value pair, a field name. So, you know, usually, where people are familiar with inverted indexes is from document search where you have a document. You have an inverted index of the words that appear in the document. And then when you search, you find the words and you find the documents that have those words. Right? In this
3:45 case, we say, oh, tag key value pair. Like, host equals, you know, server a or region equals US West. And then you find the time series that match that piece of metadata. Right? So what that means is as you're ingesting all the data, there's a lot of indexing work that happens. Right? So it chews through, like, a lot of CPUs and stuff like that, and it reorganizes the data. Now when a query comes in, if the query is for one individual series, generally, queries are very, very fast. Right? This is a system that's optimized for that kind
4:20 of query workload and for that kind of, like, needle in a haystack query, specifically on, you know, the individual series and the time range that you're looking for. But, again, like, the problem comes as your index, the set of metadata that you're tracking, grows. That's commonly referred to as, like, the cardinality problem. Right? As the number of unique tag values that occur, or actually just the number of individual series that occur, grows, you spend more and more time indexing the data, and it just becomes more and more expensive to ingest it. Right? You spend more CPUs.
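A minimal sketch of the inverted index and the cardinality problem described here, in Python. This is a toy model, not InfluxDB's actual storage engine: the `SeriesIndex` class, its method names, and the `request_id` tag are hypothetical, and a series is simplified to a measurement name plus its full tag set (field names are left out).

```python
from collections import defaultdict


class SeriesIndex:
    """Toy inverted index: (tag key, tag value) -> set of series keys."""

    def __init__(self):
        self.postings = defaultdict(set)  # one posting set per tag key/value pair
        self.series = set()               # every distinct series seen so far

    def add_point(self, measurement, tags):
        # Simplification: a series is the measurement plus its sorted tag set.
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
        series_key = f"{measurement},{tag_part}"
        if series_key not in self.series:
            # Indexing work happens once per new series, so ingest cost
            # grows with the number of distinct series (the cardinality).
            self.series.add(series_key)
            for pair in tags.items():
                self.postings[pair].add(series_key)
        return series_key

    def find(self, **tags):
        # Intersect the posting sets of every requested tag pair.
        matches = [self.postings.get(pair, set()) for pair in tags.items()]
        return set.intersection(*matches) if matches else set(self.series)


index = SeriesIndex()

# Low cardinality: 2 hosts x 2 regions.
for host in ("a", "b"):
    for region in ("us-west", "us-east"):
        index.add_point("cpu", {"host": host, "region": region})
low_cardinality = len(index.series)

# High cardinality: one unbounded tag creates a new series per value.
for request_id in range(1000):
    index.add_point(
        "cpu", {"host": "a", "region": "us-west", "request_id": str(request_id)}
    )
high_cardinality = len(index.series)

print(low_cardinality, high_cardinality, len(index.postings))
```

In this sketch, a lookup such as `index.find(host="a", region="us-east")` stays cheap because it only intersects two posting sets. Adding a single unbounded tag, though, takes the index from 4 series to 1004, and every new series costs indexing work and index storage at ingest time.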
4:57 You spend more disk space storing the index and, like, all this other stuff. It becomes really painful, particularly as you try to add, like, more and more granularity and visibility, like, more precision into the measurements that you're taking. Right? Like, again, more kind of metadata that you're capturing around those things. And then the other side of this is when you want to do a query, if you have a query that's gonna touch, you know, tens of thousands, hundreds of thousands, or even millions of individual time series, right, you wanna do an aggregation across, you know, all these time series in a
5:33 region or whatever and compute something. Those queries become prohibitively expensive. And oftentimes, you're like, you can't even do them because the engine will just, like, fall over. Right? The way to do it and to map it onto that index structure is just way, way too expensive. So, basically, that's the version of the database written in Go, version one, version two, same storage engine. We wrote a storage engine from scratch that kind of has this architecture. And, you know, we created the initial version of that storage engine in,
6:09 oh, I lost my screen for a second. I flew over there. We created the initial version of that storage engine in, I mean, I prototyped it in the fall of 2015, and then we had the first release in early 2016 that had that. And we've iterated on it and added to it over time. And, essentially, what we found over, you know, the next whatever number of years, four years, is that people wanted to have higher cardinality data. They wanted to feed in data where they actually didn't have to worry about the cardinality or
6:43 the ephemerality of the values that they were feeding in, and they wanted to do more analytical queries across it. Right? And all of this stuff was built around our query language, InfluxQL, which is a query language that looks kind of like SQL. So it's kind of familiar and friendly, but for people who really know SQL, it can be frustrating in unique ways because it doesn't actually work exactly like SQL. But for some things, it's super easy to create, like, a time series query or whatever. So, again, like, there are multiple
7:18 problems we were trying to solve for. How do we solve this cardinality problem? How do we give people a query engine that can be useful for analytical style queries on larger chunks of data? And then this other piece, which was we needed to figure out how to store a massive amount of data in a much more cost-effective way. Right? InfluxDB one and two have just a base level of assumption that you have a, you know, a locally attached SSD or, you know, whatever, like an EBS volume, a high performance network volume with provisioned IOPS
7:30 External Factors Driving the Rewrite (Cloud, Object Storage, Kubernetes)
7:54 and whatever. And all of your data is stored on that, and it's super expensive. And for a lot of our use cases, right, people could have a year's worth of data, but 99% of their queries hit basically the trailing, like, few hours or a few days worth of data. Right? But they want this stuff available and accessible, but they don't need it available and accessible, you know, at the same response times, and they definitely don't need it, like, stored on expensive NVMe drives or whatever. Right? So we needed to figure out a way to
8:29 decouple the compute of ingestion and query from the actual storage of the data. Right? And, obviously, like, all this stuff is, like, building up over the course of, like, you know, 2017, 2018 is when I'm noticing this. 2019, it's just becoming more apparent. And the thing is, also, during this time, there are interesting things happening out there in the infrastructure world. Right? The rise of Kubernetes, for example, wasn't there when we first created InfluxDB. So this idea of, like, containerized applications and, like, this ephemeral application stack or ephemeral compute stack that you layer on, and then the other thing, the
9:11 rise of object storage as basically, like, a common storage layer. Those things happened over the course of, you know, that decade of, you know, the tens, the 2010s. And I think one company that really took advantage of that, at least the rise of, like, object storage and decoupled compute from storage, was Snowflake. Right? They're the first company, I think, that really commercialized this idea of we can create this big data system that stores data super cheaply and just layers on compute on demand to execute queries on it. Now Snowflake is obviously designed for a
9:48 completely different use case than InfluxDB. Right? Snowflake is a data warehouse, big data at scale, whatever. InfluxDB is about operational data and real time. Right? You need to be able to query it within, you know, milliseconds of writing it into the database, and you need those queries to return generally sub second so that you can build monitoring and alerting systems on it. You can build real time dashboarding. So in 2019, actually, like, in the fall of 2018 is when I started picking up Rust. And I thought, like, again, like, my first
10:13 Why Rust?
10:23 commit on InfluxDB was in 2013, but the base technology that we built for it was actually something we had done and started in the fall of 2012. And Rust was not at a place where I would use it then. I used Go because I thought we will be able to move faster creating the database if we use Go as a language. But in 2018, I started picking up Rust and thinking, okay. This is actually interesting. And then in the fall of 2019 is when the async await stuff landed in Rust. And when that landed, I thought, okay.
10:56 This is probably gonna be, like, a very serious language for building server side software where you have to handle network requests and, like, all this other kind of stuff. Right? I think for me, that was, like, the final, like, piece that I was looking for, that Rust had actually arrived at a point where you could use it to build a complicated piece of server software and you wouldn't have to build everything yourself. Right? There are certainly successful projects in Rust that started before that. Right? Like, did that. But at that point, I was like, okay. That's interesting.
11:31 So coming into the beginning of 2020, which is when I kicked off this project, you know, I just said, okay. We need a different database architecture. This combination of this inverted index plus this time series storage and the way the entire database engine works is not gonna work for what we wanna build, for the requirements we wanna meet. Right? Like, we're using, like, memory mapped files, which is like, again, you're not gonna get that in object storage, and it's not great to use in containers and, like, all this other
11:38 The New Architecture: IOX, Apache Arrow, Parquet, DataFusion
12:02 stuff. So I was like, okay. If we're gonna rearchitect the entire database, that's basically a rewrite of the database. We could do it in Go and, you know, there's a bunch of stuff that exists in Go that we would reuse that obviously wouldn't be rewritten, the language parser and, like, all this other stuff. But, basically, like, I was looking at the project, and I thought, this is a rewrite. Like, if we actually try to make these big changes. And, again, in, like, late 2019, early 2020, one of the other things I noticed out
12:40 there in the world was Apache Arrow. Like, I'd known about the project for a little while. Apache Arrow is like an in memory columnar format specification. And I was looking at that, and I was looking at Apache Parquet, which is, you know, a file format for this kind of structured analytical data. And I thought, well, I think there's really something interesting here. So, again, like, I wrote, like, some blog post in, like, early 2020 where I said, like, I thought that the different pieces of the Apache Arrow project would become, like, a way for companies that are building
13:21 data warehousing systems, big data systems, streaming data systems, basically, like, all these, like, analytical systems that are working on observational data of any kind, right, whether it's server monitoring data or sensor data or whatever, those standards would become a way for people to collaborate and build, you know, common infrastructure, but also proprietary solutions. And those will be, like, the touch points in terms of, like, how you exchange data between these systems. So that was kind of the thesis in early 2020. Rust as a language is gonna be better because, one, like, the multithreading support in Rust,
14:00 I think, is just, the way it handles it is way, way better, right, because it's kind of enforced by the compiler. We wanted strict control over how memory is managed and that kind of stuff, which obviously, like, basically, we didn't want a garbage collected language. Performance was gonna be a critical thing. Like, Go's super fast. Don't get me wrong. I love it as a language. It's way easier to learn and work with than Rust, I think, in my personal opinion. But I just thought for this kind of software, for, you know, a database system that has to
14:30 perform at scale and with high performance, like, Rust just seems like the logical choice. And if it wasn't Rust, it would be, like, C++. But I think in this day and age, it's just a better choice to use Rust. Yeah. And then, yeah, and then over the course of the next, you know, three and a half years, like, initially, it was basically, like, me and one other guy within Influx for a couple of months, and then we hired somebody else, Andrew, who's still with us. And the three of us kind
15:03 of treated it as a research project almost for the first, like, six months. Within the company, there's no way I was gonna get, like, people to buy off on, oh, Paul wants to rewrite the database. Like, yeah, let's put a bunch of effort into that. No. It was basically like, I'm gonna do this as a research project because I think it's interesting, and I'm gonna see where it leads. And then by November of 2020, like, we had enough of the pieces figured out. Right? We're gonna use Apache Arrow.
15:36 We're gonna use Parquet as the persistence format. Object storage is where all the data is gonna live. Apache Arrow Flight was gonna be the RPC mechanism that has since evolved into Flight SQL, which is a new standard they have for essentially doing RPC and SQL queries in these kind of data systems, and a project called Apache DataFusion. Right now, it's a subproject of Apache Arrow. DataFusion is a SQL parser, planner, optimizer, and execution engine written in Rust. And we're like, at that point in, you know, the summer of 2020 when we decided to build
16:16 The Decision to Build on Apache Projects
16:20 around these things, you know, DataFusion wasn't even close to as far along as it is now. And we knew that we'd be, like, investing significant effort. Like, we actually looked at using other SQL engines as the core of the database. We looked at some C++ engines just to think, like, you know, maybe, we didn't wanna write our own. But, right, we needed something that was, like, optimized for our use case for time series. And we saw that regardless of what we picked up, we would end up having to do a lot of work, and we'd
16:53 almost, you know, have to take partial ownership of the project. And we thought this umbrella of projects under Apache, under the Apache Foundation, under Arrow, you know, they were early, but they were promising. And if we really put our effort behind it, it would cause more people to also start programming against it and whatever. So in November of 2020, I announced, hey, working on a new core of the database. It's called IOX because nobody was comfortable with me calling it InfluxDB 3.0. Because, again, they're like, you're not gonna rewrite the database. So there's
17:33 no way we're doing this. So I was like, well, let's just see how it goes. So but, yeah, at that point, I announced it, and still it was, like, three of us working on it. It was still very, very early stage, but it got, you know, a number of people out there in the world interested in the project, and we hired some great people that joined us so that by March of 2021, you know, we had a team of, I think, nine people. And we spent years
18:05 writing a database, which we launched into production earlier this year. So how long did it take before you were allowed to call it 3.0, before people stopped telling you, no. You can't rewrite this database. I mean, we announced publicly that it was InfluxDB 3.0 on April 26th of this year. So inside the company, it took you, like, two or three years to get everybody on board with the idea of, no. Really, we did, and we're about to finish it. I would say by probably, I mean, it's basically by, I'd say,
18:10 The Journey to Naming it InfluxDB 3.0
18:48 2021, like, summer, fall of 2021, people within the company are like, okay. We need this, like, new core database engine because, like, at that stage, it was obvious what the limitations of the previous engine were. Right? In the beginning, people are like, Paul, what are you talking about? Like, you know, there were some people who got it intrinsically, and other people are like, we don't need to do this right now. And then by, again, like I said, like, the fall of 2021, everybody in the company is like, okay. We definitely need this new database
19:18 engine. When's it gonna be ready? And I'm like, guys, you know, we're not baking a quiche. Like, it's gonna take some time. Yeah. So yeah. And then by, I'd say, like, the spring of 2022, it was like, okay. This is obviously what we need to be doing. And then definitely by the fall of 2022, it's like, okay. We're getting everybody focused on this new database engine, and we're gonna call it 3.0. Then it just became a question of when we were gonna be more public about
19:52 the fact that it was 3.0. But in my mind, it was always InfluxDB 3.0. Even though it is a total rewrite and the database architecture is drastically different than the underlying database architecture of one and two. Alright. Nice. I love that we just ask a question and then set you loose, and then you just go for it. Sorry. Yeah. Go for it. No. Don't be sorry. That's the good part. It's fine. Well, I don't know if there should be more back and forth. No. No. No. That was absolutely perfect. And, you know, you kind of answered my second
20:27 question, which is good as well because we're thinking through the problems faced as well here. I think there's a lot of context about what happened in the industry, what happened with the early versions of Influx, why this rewrite was required, why Rust, why Arrow, all these things now make a lot more sense. Right? But Mhmm. You know, let's reemphasize one of those things you talked about with the 2010s. It's like this was the decade where containers and cloud took off. Right? People were using ephemeral compute, spinning up VMs on AWS, GCP, Azure, launching dozens to hundreds to thousands of
20:45 The Observability Database
21:02 containers orchestrated with Kubernetes, and all of these have their own signals. They have their own logs. They have their own metrics. They have their own traces. Traces are now important because at the same time of this wild cloud container orchestration evolution, people decided to start doing microservices because the technology enabled all these new architectures too. So I spend a lot of my time I'm just gonna, like, set context and not ask questions and then just let you infer the questions. But still, I spent a lot of my time working with companies that are trying to build up platforms. You know? They want
21:34 to make it easier for their developers to deploy to production. And I think one of the challenges I've seen is that people really struggle with, I need a database for logs. I need a database for tracing. I need a database for my metrics. I then need to aggregate and query all this stuff. And they make it really complicated for what is in essence all the same data structure. I don't think there's that much difference between a metric, a trace, and a log. It's all a collection of events. The difference between a trace and a metric is
22:00 that a metric is an aggregation of some raw level event. And the challenge has always been the cost of storing it at the super high dimensionality, the super high dimensions, versus the easiness of querying that, which is why we probably do metrics and terrible versions of histograms and all this other stuff that we now accept as the norm. Right? Yeah. And I'm gonna quote something on your website that you may hate me for. Right? But when we talk about IOX and InfluxDB 3, you specifically say infinite cardinality. Yeah. So the question is, does IOX give
22:32 us, or can it, or InfluxDB 3, sorry, give us a single store for all of these signals, all of this observability and monitoring data? And can you give us a bit more insight into what that infinite cardinality actually means in practical terms? Yeah. So to store the data, yes, 3.0 can store all of that kind of data, right, because of the fact that cardinality doesn't matter. The problem happens when you try to query that data, pulling it out. Right? And that's basically, like, the patterns for how you query the data
22:48 Storing vs. Querying Different Observability Data Types
23:12 and what people expect are why you end up with, you know, I think, why you end up with three different systems for storing each of those kinds of data. Right? Traces, metrics, and logs. In my mind, they're all just, like, different views of the same thing. Ultimately, like, if you wanted to, you could just have traces and skip the logs and metrics. Right? You could derive everything else from raw traces. Because traces, again, like, you could have just a blob string field in the trace that has log info. Right? So but the problem
23:50 is, the problems are, like, what happens at scale. Right? And when you start generating a ton of this data, do you end up having to sample it? And then how do you actually access the data? What are the access patterns? Right? So for right now, you know, the metric access patterns are like, I have this metric, and I wanna look at it. And the idea behind metrics is, metrics are a summary view of some underlying distribution or some underlying thing that you're tracking. Generally, metrics are not the raw high precision view. Right? For example,
24:25 if you want the average response time in one minute intervals to an API call, right, the specific, like, API endpoint. Right? Now the raw view is individual requests. Right? And you log every bit of detail you would want on that request. Right? What host received the request, what user submitted it, what token they were using, what endpoint, the actual data included in the request, the response time, the response itself. Right? You can get down to just an insane level of precision. But the problem is to do that at scale is completely unreasonable. Right? It generates, like, more
25:07 data than you could ever even store and all this other stuff. So you end up creating these systems to summarize things. And the problem that people frequently run into is like, well, if you didn't think ahead of time of what you needed to summarize, when you go to look at the summaries, your metrics, the answer you need isn't there, and the requests already happened. Mhmm. Right? So logs are a way to capture more detail and then kind of, like, try to figure it out after the fact. Right? So the idea with logs is it's something where, you know, you're doing an
25:40 ad hoc investigation where you're not continuously, like, looking for some signal that triggers, like, a problem in the system. But so, ultimately, like, storing all that data, you can use the same format. Parquet as a storage format, for example, could store all that kind of data, but querying it effectively and efficiently is difficult. And that usually requires either secondary indexing structures or other ways to organize the data so that you can actually query it effectively. Now, aspirationally, 3.0 wants to be the home for all that kind of data. Basically, for any kind of observational data you
26:20 can think of. Right? And, like, for us, it's not just, like, you know, the server infrastructure monitoring use cases, but also more and more sensor data use cases. And, again, with sensors, you find the same kind of thing. Right? People can deploy more sensors in their environment for the machines they're tracking or the environments they're tracking, but they can also increase the precision of the measurements. Right? They can increase the sampling interval. They can increase the precision in terms of what gets tracked with each measurement that gets created. Right? And that's all the metadata that you
26:56 could potentially track about something. For sensors, it could be, again, like, all the stuff around the customers or the users. It could be around the location, the, you know, the lat long, like, all that other kind of stuff. So getting there, like, we're there in terms of being able to store it. We're not there in terms of being able to query it. We organize data into large chunks, and then the query engine just kind of brute forces those chunks. So I think, yeah, that is gonna take some time to get to the stage where
27:29 we can actually do all those things. I think there's some other stuff around, like, with logs and tracing use cases where the schemas are very, very dynamic, and they're not always consistent. Right? If you try to, like, pull out structured fields from these things, a lot of times people won't have the same field types for something that's named the same thing because they're in different services or whatever. So those are all just, like, kind of, like, weird... They're fun. Yeah. ...infrastructure, like, horrible, like, problems that you just end up having to deal with. So whereas, like, right now, I would say
28:05 Current Use Cases and Challenges for InfluxDB 3.0
28:06 with 3. O, it's better for, like, structured events, right, where you have events that you're tracking and you wanna get high high precision data, right, where you can slice and dice it or any way you choose. Right? So systems where you can think of, like, use cases where you think that'd be good is, like, if you're doing usage tracking and an API, audit logging, any type of individual events. Metrics is also a use case, obviously, that InfluxDB is used for, and this engine is useful for as well. But logs and traces are a bit trickier
28:40 because of the again, like, kinda how flexible the schemas are as people deploy them in different systems. And the thing with tracing that's weird is all the like, I think most of the tracing front ends look at, like, you're looking at a metric view or a log view, and it's like, oh, go look at the trace. So, basically, what you're doing is you're jumping off to look for a trace by a trace ID, right, which is a essentially, that implies what you want is an index which maps an ID to an an individual trace. And, of course,
29:15 time series database isn't really designed to do that, or, like, the way our database is structured is not really designed to do that. Now there are ways ideas we have for being able to layer that in without having to create super expensive secondary indexing structures, but, all of that stuff is gonna take time. So I think with with tracing, it may would would make it easier is when you have a trace ID, you basically always have a a time range as well. So for a system that stores traces at scale, it would be easier to say, like,
29:47 oh, give me this give me the this trace ID, but this is, like, the time envelope that it appears in. At least that's what I'm imagining make it easier. But yeah. I don't know. Yeah. Metrics, logs, and traces is basically, like, the gold standard for how you do observability right now, but I don't think I don't think that's the end state. I think it's not ideal. Like, the usability isn't ideal. It can be painful. Tracing is super like, tracing is expensive from a development perspective in terms of putting it into your code, but it's also expensive in terms of, like, an
30:23 infrastructure and, like, being able collect all this data. And then you get into, like, figuring out, okay. Do we need to do sampling because there's too much? And I don't know. It's all still too difficult to use, which tells me there's a lot of room for for innovation. But the thing is there are a lot of really smart people trying to fix these problems. It's just really hard because of the, like, volume of data keeps increasing, and the demands of the user base also keep increasing. Right? Yeah. I used to work at a logging company and back
30:56 around the same time when you were discussing how do we move this off to other things. And I remember that being part of the discussion, that we were handling petabytes of data and how do you handle just that much. But I find it interesting. Do you end up with a lot of legacy systems as well? Because that was always the issue with logging: to go back and change it from the string that, who knows what the developer decided to put in there, if they've decided to put anything in there. And then there's all these
31:27 different levels of logs, and they're all strings that you have to figure out how to parse. Then eventually, people move to structured logging, but that wasn't a thing. So that's a whole other can of worms to open, I imagine. Yeah. I mean, then you have to, like, parse the legacy logs into some sort of structure. Right? And, again, like, the structure is really about query performance. Right? Because you could create the structure on the fly. Right? You could just store all the logs and whatever and basically, like,
31:59 at query time, parse each log line into the structure. And, again, like, you could do this on the fly to be like, oh, well, that query didn't work, but I got this error back. So I'll change the way I'm trying to parse that particular line or whatever. But the problem is that those queries are super expensive to run. Right? They cost a lot of CPU. They cost a lot of network bandwidth. So, really, like, the thing about structured logs and even metrics is about essentially introducing structure and summarizations that make the queries
32:33 efficient enough to run for whatever the use case is. And I think that's also one of the things to keep in mind: is this a case where you're building for a system that does automation, where it needs, like, the query to return in, you know, tens of milliseconds, hundreds of milliseconds? And with, you know, the automated systems, generally, those queries run all the time, right, many, many, many times. Right? Or is it an ad hoc query from a user who's doing some investigation, in which case they're more than willing to
33:07 wait tens of seconds for a return. And those queries are few and far between. Right? You're not doing that many of them. And I think the ad hoc thing is probably easier to solve with that pattern of, let's take these blobs of data, put them into object storage, and spin up compute on the fly to handle those. It's the real time systems where you have to really think about layering in structure and optimizations so that you can answer the queries fast enough. Alright. So you mentioned at the start how
33:47 Query Languages: SQL, InfluxQL, and Flux
33:52 you had this InfluxQL in version one. This SQL-like language that frustrated people because it wasn't the real SQL that they were familiar with from working with other databases. With DataFusion, do people get access to more traditional SQL? That's part one of the question. Part two is with InfluxDB two, there was a lot of investment into the Flux language with the messaging around how Flux was purpose built for time series and SQL wasn't. I'm curious if that has changed, and do we feel now that SQL is the right language for time series, or is there still
34:29 a future for Flux within InfluxDB three? Yeah. So, first, I'll mention about InfluxQL. So some people found it frustrating because it was unlike SQL, but a surprising number of people have told us they actually prefer InfluxQL to SQL for writing some basic time series queries. Right? Because it's just, like, super easy to write a thing where you're, you know, getting a summarization by these, you know, different things. Whereas, like, in a SQL engine, you might have to deal with windowing functions and partitioning and, like, all this other stuff. So I think InfluxQL has a place
34:35 InfluxQL and SQL Support in InfluxDB 3.0
35:07 that it will continue to serve just because we've gotten that feedback that a number of people actually prefer it to SQL. So with DataFusion, we have a fully featured SQL engine that supports all of these things, joins, window functions, partitioning, like, all this complex SQL stuff. We've been adding in more and more functions to make it more capable of doing, like, time series specific style queries, and we'll continue to do that. We're certainly not done with that. So the thing with 3.0 is, again, it's built around
35:48 this DataFusion query engine, which is a SQL engine. Now all that's written in Rust. Obviously, 2.0 is written in Go, and Flux as a language we developed for 2.0. And we developed it for a couple of reasons. One, for the users of InfluxDB one, they frequently had requests that they just wanted to express more arbitrary and complex logic in their queries than could be expressed in InfluxQL or indeed in SQL. So we were like, we need, essentially, like, a scripting language paired with a query language
35:52 The Future of Flux
36:30 so that people could do more complex things. And it became like, oh, you can use this also to connect to third party systems, to join with your time series data on the fly, or to send data out to third party systems, right, be it another database or a third party API or whatever. Right? So that was part of it. Another part of it was I had a thesis. Like, I mean, originally, in the fall of 2014, I was talking about changing the language from InfluxQL to potentially something that looked
37:03 more like a functional language, which is what Flux is. And I decided at that time to stay with InfluxQL. But I had a theory that, like, oh, I think for the time series use case, a functional language would be better, would be more, like, expressive and more powerful for working with time series data. And Flux was our attempt at that. Right? And so we built, obviously, the Flux language, the scripting language, the query engine, the planner, the optimizer, everything from the ground up, which is a very large, like, that's basically, like, two very large separate
37:42 projects. And all of that is written in Go. Now coming to 3.0, we thought, okay, we need a way to bridge Flux, and we also wanna see if we can bring over InfluxQL. And with InfluxQL, we had the idea, well, it looks like SQL. So what we can probably do is write a parser in Rust to parse the language, which isn't too hard to create, that will then translate an InfluxQL query into a DataFusion query plan. Right? So we had one person start on that in the summer of last year.
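Editor's sketch: the translation idea described here (parse InfluxQL, then hand an equivalent query to a SQL engine) can be illustrated with a toy example. This is not the InfluxDB implementation, which per the transcript is written in Rust and produces a DataFusion query plan rather than SQL text. The function name `influxql_to_sql`, the regex grammar, the `value` alias, and the use of a two-argument `date_bin` for time bucketing are all assumptions made for illustration, and only one narrow query shape is handled.

```python
import re

# Hypothetical, deliberately tiny subset of InfluxQL:
#   SELECT <agg>(<field>) FROM <measurement> GROUP BY time(<n><unit>)
_PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<agg>\w+)\((?P<field>\w+)\)\s+"
    r"FROM\s+(?P<measurement>\w+)\s+"
    r"GROUP\s+BY\s+time\((?P<n>\d+)(?P<unit>[smhd])\)\s*$",
    re.IGNORECASE,
)

# InfluxQL aggregate names mapped to SQL equivalents (illustrative, not exhaustive).
_AGGREGATES = {"mean": "avg", "sum": "sum", "min": "min", "max": "max", "count": "count"}

# InfluxQL duration suffixes mapped to SQL interval units.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def influxql_to_sql(query: str) -> str:
    """Translate one narrow InfluxQL query shape into SQL text."""
    match = _PATTERN.match(query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    agg = _AGGREGATES.get(match["agg"].lower())
    if agg is None:
        raise ValueError(f"unsupported aggregate: {match['agg']!r}")
    interval = f"{match['n']} {_UNITS[match['unit'].lower()]}"
    # Assumption: the target SQL dialect offers date_bin(interval, column)
    # for bucketing timestamps, and the timestamp column is named "time".
    return (
        f"SELECT date_bin(INTERVAL '{interval}', time) AS time, "
        f"{agg}({match['field']}) AS value "
        f"FROM {match['measurement']} GROUP BY 1 ORDER BY 1"
    )


print(influxql_to_sql("SELECT mean(usage) FROM cpu GROUP BY time(5m)"))
# prints: SELECT date_bin(INTERVAL '5 minutes', time) AS time, avg(usage) AS value FROM cpu GROUP BY 1 ORDER BY 1
```

The point of the sketch is the shape of the approach: because the InfluxQL front end only rewrites the query, any optimization in the underlying SQL engine applies to InfluxQL queries without extra work.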
38:22 And, basically, now, you know, this year, we have something that actually works, and it works really, really well. Right? And it was basically just one person who did a lot of that effort. Last fall, we added additional people to the team to build the last, like, bit, which is, like, the API bridge, which is, you know, to represent all of that with the InfluxDB one query API. So we're really happy we were able to bring that over, and it gets all the benefits of that DataFusion query engine. Right? So when there are performance optimizations
38:53 or other things like that, when we pull that in, InfluxQL gets all of that for free. Now with Flux, it was, like, the surface area of it is way, way too large to try and, like, create it again in Rust, although I would love to do that. It's just, like, way too big of a project to do, and we don't have the time or resources to do it. So what we did was, in our cloud two platform, we had, you know, Flux
39:30 processes that communicated with the old TSM storage engine via this gRPC API. So what we did was, in 3.0, we created that gRPC API, and we connected Flux up to it. And what we found through, like, some production mirroring and actually letting customers test it was, one, that gRPC API, like, kinda had some edge cases that were poorly specified. So there were weird bugs that would pop up, just unforeseen bugs that would surface in the Flux query, things that worked on the previous one that don't work on this.
40:09 But more importantly, the performance of that bridge was not good. Right? Basically, there were queries that would work in the old Flux to TSM version in a decent amount of time, and queries on the Flux to 3.0 bridge that just, like, timed out. And, again, like, one of the things we're trying to do with 3.0, one of the reasons we adopted this new query engine, is because we wanted queries to be super, super fast. Right? Query performance was a super important thing, and a lot of Flux queries, like, we
40:43 saw there were queries that just wouldn't work in that engine. So when we created this bridge, because of the way that API works, it's literally built around how data is organized in that TSM storage engine. And the 3.0 engine does not have the same organization. So in order to present the data in that organization, the 3.0 engine has to do a lot of, like, post hoc sorting and filtering. And that sorting basically chews up CPU time. And, basically, it wasn't performant. So, like, right now, we have a theory
41:22 that there would be a way to update the Flux engine so that it uses the 3.0 native API, which is basically Flight SQL, and that the Flux engine could do the work itself to reorder the data in the way it expects it to be ordered. And maybe that would be performant enough. But for the time being, we're not focused on that. We're still focused on the core query engine, which means InfluxQL and SQL, and adding capabilities and performance to it. Like, right now, it's already faster than 1.0 at a number of queries, but there's
42:00 still some queries that it's definitely not faster with. And we want to improve those situations and spend our time on that for now and then see, you know, later, what we can do with Flux. People in the community have expressed interest in actually self organizing to do that work. So we've actually created a separate community fork of Flux that we're gonna be pointing people to. And that fork will be a place where people can collaborate on this idea. The thing is, we can't do it in our primary branch because we run this in
42:14 Flux Community Fork and Language Polarization
42:38 production, in our cloud environment. It's just too difficult to try and, like, pull these changes in. Like, we need to give people the ability to, like, iterate on their own without having to go through our production pipeline. So the idea is, in 3.0, InfluxQL and SQL are native and supported. Flux, we're still trying to figure out. I mean, I will say a separate thing about Flux, which is maybe not obvious to people. Or, for some people, maybe it is. But the language is highly polarizing. It's a new language, and a lot of
43:14 people are like, I don't wanna learn your stupid language. And I get that. I did not really get that six years ago, but I get it now. And so, basically, what we found is that a lot of people just didn't wanna pick up the language. They wanted to work with something they already know. And, again, with InfluxQL, it's a different language, but the thing is it looks like SQL. It feels like an old friend. Like, you know it. So you can pick it up without having to do too many things. But with Flux, it was a serious
43:53 adoption blocker for a lot of people. But then on the other side of this, there is a slice of people who took the time to learn Flux, and they absolutely love it because they can do things in that language that they could not do in SQL or in InfluxQL. And I think that is kind of a testament to, like, the reason why we built the language, because we wanted something that enabled arbitrary processing inside the core of the database. So, again, it's like one of those things where depending on who you are and how you
44:26 look at it, Flux is either, like, a great thing, and it's like, we need to keep pushing this forward, or it's, why did these guys build this language? It just doesn't make any sense. It's tricky. Alright. Awesome. Thank you. Did you have a question, Laura? No. No. Just random commentary. Yeah. We're running out of time very fast here, so let's finish up with something a bit different. Recently, HashiCorp announced their license change to the BUSL license. And you posted some thoughts on Twitter saying that it kinda gave you a bit of pause for thought on what source available and open
44:54 Open Source Licensing and Vendor Motivations
45:05 source is, or the future. And I'd love for you just to talk about that and maybe even bring it into the context of what's the future for the license on InfluxDB three. Yeah. So, basically, in my mind, the BUSL, the community licenses, all those things, those are not open source. Those are basically the new version of shareware or commercial freemium software. Right? It's commercial software. And frequently, they will offer that software to you to use for free under certain conditions. And if you are one of the people
45:46 who meet those conditions, maybe you're happy and you'll be able to use it. Right? Tons of people continue to use MongoDB, the SSPL version. Tons of people use Elastic still after they changed their license, Redis, Confluent. Like, you know, literally every single infrastructure open source creator has changed their license over the last six years. Like, there isn't one I can think of who hasn't, except for us, maybe. So I totally get the motivations to do that. But in my mind, like, open source, I actually don't like copyleft licenses, like AGPL, GPL. I don't consider them
46:27 to be real open source. To me, open source is really about freedom, freedom to do what I want with that code. And if you put any sort of restriction on it, which a copyleft license does, then that's, you know, restricting my freedom. To me, open source is about freedom. And, again, like, I think for a company that's producing open source code, you have to be okay with the fact that people are going to do anything with that code, up to and including competing with you. And if you are not okay with that,
47:02 you should not be putting that code out as open source because that is what's going on. Right? And, generally, the best thing for a company to do that's building products and stuff like that is to only put open source code into something that they wish to have become a commodity. Right? So the server operating system that, you know, your servers run on, you want that to be a commodity. Many times, you want the database to be a commodity. You want these core infrastructure components that are essentially not part of the value
47:36 that you deliver to your customers. Essentially, you want them to be commodity so you don't have to pay a lot extra for those things. Right? You want the price of those things to be driven down as low as possible. But if you are a vendor, the thing you sell, you don't want that to be a commodity. You want to be able to sell it for the highest possible price. And so the problem for vendors that create a project where their primary monetization path is essentially that project plus something or whatever is, it becomes like, well, we're putting
48:13 all this effort into the open source thing, and a bunch of people are using it for free. And there are a bunch of freeloaders, and there are a bunch of, like, competitors who are taking our stuff. And then they decide to change the license. And so there are multiple pieces to this. One is, as the creator of a project, if I want it to get the broadest possible adoption, I want more people to use it. I'm incentivized to have that project be permissively licensed. Right? All
48:25 Industry Distrust from License Changes
48:44 else being equal, for a developer or user looking at a piece of software, literally, if everything else is the same and the difference is a commercial license or a permissive Apache or MIT license, they're going to pick the permissive one. Right? There's no reason not to because you get a bunch of other stuff. So they choose that. But the problem now that I think the HashiCorp thing kind of highlights is it's just yet another vendor in a long list of vendors over the last six years who've changed their licenses. And now, basically, it's causing a lot
49:22 of distrust in the developer community because they see, like, oh, here's a new open source project, and they look, like, is that project by a VC backed company, right, or VC backed startup? If it is, they're gonna be like, well, it's an open source project. Yeah. But I don't believe that it's gonna continue to be open source, which is a totally valid thing to think given how things have been going over the last six years. So, you know, previously, again, my thesis: if you want broadest possible adoption, put it under a permissive license. That
49:57 idea is kind of getting damaged by the fact that people keep changing their license from permissive to something commercial. Right? And I don't know if there's a solution to that. Separately, I think that HashiCorp probably made an error here. Like, the thing is the license change only protects forward commits. Right? You can't retroactively change a license. So they could have just as easily put all their development effort into a closed source private fork and then made the open source piece a downstream, you know, dependency of that closed source fork, and then not told anybody about it. Right?
50:43 Just be like, yeah. We're just gonna do this or whatever. Or conversely, they could have just said, we're gonna donate this thing into a foundation, but all of our developers are gonna be working on this closed source thing. Right? The effects would have been the same in terms of the end outcome, right, because you have this open Terraform fork. But the difference in everybody's perception of HashiCorp as a company would have been dramatically different. Right? The business and commercial effect would have been exactly the same, which is all their R&D, all their development
51:18 tokens are going only into the commercial software, which is fine. That is their right. If they want to do that, they should totally do that. And it's also fine if they wanna change their license. That is totally permissible. Right? Just because you create something open source one time doesn't mean you owe it to the world to continue for the rest of your life to put more and more into it. Right? But I think there were more graceful ways to handle it that could have delivered the same business outcome for them at the very least.
51:51 Alright. Let's extend that question by one sentence and just say, like, does the fear that people have now, and rightly so, about single vendor VC backed open source projects not always being open source, does that mean that InfluxDB three could be an Apache foundation project in the future, or are you not worried about the perception or the fear of people adopting it because it's a VC backed single vendor project? So I do not think we will put InfluxDB three into a foundation. My goal is to have a permissively licensed project. But the truth is, like, the
51:54 InfluxDB 3.0 Licensing Future and Foundations
52:31 problem with foundation projects is the bar is usually too high. Like, InfluxDB doesn't meet the bar for most foundations. Right? There aren't multiple companies contributing to InfluxDB's, you know, version one, two, or three. Right? We're the ones developing it alone. So it doesn't really hit the level of a foundation. Terraform, you certainly could have put it into a foundation. I'm certain a number of foundations would have taken it gladly as a project, as a top level project. But InfluxDB doesn't hit that. It's not on the same level in terms of
53:07 the contribution and, like, all this other stuff. There's that. I mean, I don't know. Like, broadly, for the community and building trust with people and stuff like that, I don't know if there's a solution. Right? I think the license itself matters because if it's Apache two or MIT, then, you know, people can do whatever they want with that software for all time, as of that point. But maybe there's a model where you can commit to, you know, transitioning into open governance over some period of time. I think early on
53:48 in a project, open governance is actually more of a hindrance than, you know, a benefit because you want, like, a small tight knit group of people that are driving the project forward. But yeah. I don't know. It's tricky. There are so many things I wanna say and argue about here, but I know we're out of time. I think the one thing, I guess, I will say is the one little challenge, and I'll just kinda leave it here, I guess. Red Hat tried the we're gonna switch the order of things and
54:22 make the downstream the open source part. And I would argue that the community really doesn't like that. So I don't know. There's a lot of different ways. I admit that I am a huge fan of the AGPLv3, but maybe that's why I will argue it. But we unfortunately don't have time, and I want to really badly. To be continued. To be continued. That's correct. Yeah. We could have an entire session on open source licensing. I could talk about that for, like, hours. Dude, this should be so much fun. Anyway. Yeah. Well, we don't wanna keep you any
54:57 Conclusion and Wrap-up
54:59 longer. So feel free to give us thirty seconds if you wish just to tell people where they can learn more about Influx, follow you on Twitter, anything like that. Feel free to shamelessly plug if you wish. On Twitter, I'm @pauldix. Influx, at InfluxData.com or InfluxDB.com, you can find us. 3.0, we have available as a multi-tenant cloud product and as a dedicated cloud product. And soon, open source builds, or builds of 3.0, will be available, but, again, we're focused on our commercial offerings at the moment for obvious reasons. But, yeah. Alright. Thank you very much. Thanks for
55:00 Shameless Plugs
55:38 joining us. If you wanna keep up with us, consider subscribing to the podcast on your favorite podcasting app or even go to cloudnativecompass.fm. And if you want us to talk with someone specific or cover a specific topic, reach out to us on any social media platform. Until next time, when exploring the cloud native landscape, on three. On three. One, two, three. Don't forget your compass. Don't forget your compass.