Overview

About this video

What You'll Learn

  1. Create Quickwit indexes and tailor mappings for logs and traces
  2. Ingest OpenTelemetry logs and traces through Kubernetes collectors and indexers
  3. Query data in the UI, CLI, Grafana plugin, and Jaeger integration

Francois Massot walks through Quickwit hands-on: spinning up an index, ingesting logs and OpenTelemetry traces on Kubernetes, querying via the UI and CLI, and exploring the Grafana plugin plus Jaeger integration over 500M+ spans.

Chapters

  1. 1:14 Introduction
  2. 1:50 What is Quickwit and Why it was Built
  3. 4:45 Architecture and Use Cases
  4. 8:00 Developing Quickwit in Rust
  5. 11:17 Getting Started: Hands-on Introduction
  6. 12:20 Downloading and Running Quickwit
  7. 14:45 Exploring the Quickwit UI
  8. 16:26 Quickwit Concepts: Indices, Mapping, Retention
  9. 23:51 Creating a Quickwit Index
  10. 29:16 Ingesting Data
  11. 30:46 Searching Data (CLI & UI)
  12. 31:31 Viewer Question: Cost & Observability
  13. 34:46 Exploring the API via the UI
  14. 37:34 Deleting an Index
  15. 38:10 Quickwit Cluster Architecture
  16. 42:45 Data Storage and Caching Explained
  17. 44:23 Beginning the Quickwit Demo
  18. 44:41 Demo Environment: Kubernetes Cluster
  19. 46:26 Working with Large Datasets (500M+ Spans)
  20. 47:29 Searching the Demo Data
  21. 49:05 Monitoring Indexing Throughput (Grafana)
  22. 49:59 Ingesting High-Volume Data
  23. 52:17 Scaling the Indexers
  24. 57:44 Exploring the Quickwit Grafana Plugin
  25. 1:03:00 Visualizing Traces (Jaeger Integration)
  26. 1:04:51 Aggregations and Analytics
  27. 1:06:56 Query Language Capabilities
  28. 1:09:05 Querying Nested Data
  29. 1:11:46 Future Development and Roadmap
  30. 1:12:00 Preview: Application Monitoring Dashboard
  31. 1:16:12 Viewer Question: Query-Time Parsing
  32. 1:18:28 Conclusion and Wrap-up
Transcript

Generated from the English captions. Timestamps jump the player to that moment.

1:14 Introduction

1:14 Hello, and welcome back to the Rawkode Academy. I'm your host, David Flanagan, also known across the Internet as Rawkode. Today is another episode of Rawkode Live, where we explore really cool cloud native and Kubernetes-adjacent technologies that will hopefully help make your developer and operator lives a little bit easier. Today, we're taking a look at a search and analytics engine that promises (maybe that's a strong word, we'll see what Francois says) sub-second latency, and it's obviously written in Rust, which makes it even more exciting for me to take a look at today.

1:47 So let's jump over and meet today's guest, Francois. Hello. Thank you for joining me. Hello, David. Please just take a moment and share a little bit more information about you with the audience. Okay. Hello, David. Hello, everyone. Thank you very much for inviting me. I'm very happy to talk about Quickwit, obviously. So what is Quickwit? It's a company and a search engine. We started it three years ago. I'm one of the cofounders, and we started it as three, with Paul and Adrian. Paul lives in Tokyo, Adrian is in New York, and I'm living in Paris.

1:50 What is Quickwit and Why it was Built

2:26 And we are all engineers, and we heard a lot of complaints about Elasticsearch speed, especially for big datasets and especially in the observability space. And it happens that one of my cofounders, Paul, created many years ago a library called Tantivy, written in Rust. It's like an equivalent of Lucene in the Java world. So it's a search library, not a fully distributed search engine, but at least you have the data structures in it to make search fast. And this library was really fast three years ago already. And when discussing with Adrian and Paul, we were

3:19 looking at the space, and we said to ourselves, okay, we can do something better, maybe. I mean, a distributed search engine that can be way, way faster or way more efficient than Elasticsearch. And when you start something like this, you want to be at least 10 times cheaper or 10 times faster, maybe 100 times if possible. And that was our goal. So three years after, I'm here to talk about it. We just released version 0.7 two weeks ago, and we are starting

4:09 to have big users that are ingesting hundreds of terabytes of logs per day, and it's really, really efficient. They divided their compute cost by five, and their storage cost by the same number. So we're very happy for them, and we think that we can bring a lot of value in this space. I'm here to talk about it. Awesome. Thank you so much. Let's cover a couple of things that you've mentioned then. I've been a developer for a long time, over twenty years now. And

4:45 Architecture and Use Cases

4:53 everybody's always said Elasticsearch is amazing, but everybody hates it. And by everybody I'm gonna say I and hopefully others. We hated the fact that it was on the JVM. It's notoriously difficult to operate. It consumes a lot of memory and CPU. It doesn't run well in containers and on Kubernetes. But people always said it would be impossible to build it in any other language because of Lucene. And what you're saying now is that it's not impossible. We have a Lucene equivalent available in another language, Rust. I think there was even a Go one as well at one point

5:23 and but I'm not sure what happened to it. So that's that's awesome right off the bat. Right? Just the fact that this library now exists and allows people to be free of the JVM, Elasticsearch, and other things and still get that power that we get from that type of search is awesome. Wonderful. You then spoke a little bit about you and your team building on Quickwit and sub seconds is something we've said three times now. You know, why or like what are the use cases people could be doing? You've you've mentioned observability. Right? But Elasticsearch is obviously very broad in

5:55 its application. Is that true for Quickwit as well? What do you see people using it for? Yeah. So that's a very good point. We chose a subset of the whole search space. The search space is huge, especially today with all the AI stuff. So we chose a subset, like the observability space, but it's not only that. It's append-only datasets. It comes from a choice of architecture that we made. Very early on, we chose to totally change the architecture of the search engine. Elasticsearch is based on a shared-nothing

6:37 architecture. So all the data is on the local disk. It's very fast. As you said, it's a powerful engine. It has a large community. It's a great tool, really. But when you want to scale, when you want to separate concerns, like indexing versus search, it's really nice to start separating compute and storage. And that's the starting point of Quickwit. We wanted to decouple compute and storage, to decouple indexing and search, and seamlessly scale from one to 50 nodes for indexing and for search. Like,

7:25 in a couple of seconds, you should be able to do it without impacting anything, without struggling with a yellow or red cluster. I used Elasticsearch a lot before, and it was horrible to operate. I'm sure they improved that today, but at that time, it was really horrible. And you don't have this kind of mess when you decouple compute and storage. Your data is safe on something like S3. You don't have any issue with that. Nice. So I'm curious then. What was your background before you started working on Quickwit? Were you already a Rust developer, or

8:00 Developing Quickwit in Rust

8:05 is this your first exploration of writing Rust code? So for me, it was my first experience. I was mainly a Python programmer and Java programmer before. Luckily for us, my cofounder, Paul, who created Tantivy, was a very good Rust developer. So he led us the right way to code in Rust, and it was awesome because we improved very fast. The team was able to improve their skills a lot. So that was really precious to have him in the team, of course. But it was my first time in Rust,

8:43 and it was really painful at the beginning. The first three months were very hard, and then it was a revelation. Like, no bug in production. At the very beginning of Quickwit, we made a proof of concept of the idea of separating compute and storage for search. And it was based on the Common Crawl demo. Common Crawl is a dataset with billions of web pages, and we indexed everything and put the index data on S3. And we have a common-crawl.quickwit.io website where you can just search a few things,

9:26 and we will return a word cloud of the most probable combinations of words. So if you start typing "Obama is", we will give you all the most common adjectives for "Obama is" something. And yeah, there was no bug. Yes, there was one bug, but it was in the Python code, because we had to use Python for some kind of language analysis, since we were analyzing whether we were retrieving nouns or adjectives. Mhmm. And the bug was there, but not in our code. So it was really a revelation, really. So I'm very happy to

10:11 code in Rust today. Yeah. Me too. I pretty much write everything in Rust by default now, or TypeScript if I have to do any sort of web stuff. But even though I have been writing Rust code for a while now, there's always that weird lifetime bug that I just don't understand. Or when we get nested generics, I end up just pulling my hair out. And I really should spend more time going deeper with these things, but I've not had to deal with them often enough to force myself to just learn it

10:40 properly. I'm in the same situation. These kinds of lifetime issues or trait issues are really horrible. I think it's more a problem of it being too verbose. You have too many error messages that are too complex. So maybe the Rust team can improve that. I hope one day they will. Yeah. I don't wanna go into the weeds of Rust for everyone watching, but, you know, once you start having, like, dynamic traits passed around with lifetimes, it's so painful. But that's a lesson for another day.

11:15 Alright. Well, thank you for sharing all of that. I'm really excited about this project. I can't wait to actually see it and experiment with it and play with it because I think that everyone who has anything running in production has a use case that Quickwit can come in and help them with, and I wanna try and show that to people today. So the format for today's episode is we're gonna run through the getting started guide like we always do. And then Francois has a demo where he's gonna show us a bit more of a deeper use case for Quickwit.

11:17 Getting Started: Hands-on Introduction

11:41 Please remember, if you have any questions whatsoever, you can drop them into the comments. And that doesn't just mean now as we are live. If you're watching this after the fact, keep those comments coming, and I will always pass them on to our guest to hopefully get you the answer. No pressure, Francois, but that means for the next twenty four months, you always have to answer my emails and a timely response. That's alright. Alright. Here is my terminal. I also have a browser where we have the Quickwit homepage. So if anyone wants to find this in their own

12:13 time, you can find us at quickwit.io. We're gonna go straight to the quick start guide. Nice. So we're just gonna curl bash which I'm always happy to do. Don't you worry. Doesn't scare me whatsoever. And this is gonna get us which I assume is just a static binary because you've written this thing in Rust and we should be able to get started right away. That's one of the other nice things. Like, when you get a JVM artifact, a jar file, it's just always like, well, what do I do with this? Awesome. I'm gonna move this to here.

12:20 Downloading and Running Quickwit

12:56 chmod +x quickwit and then run. Okay. So there's also a Docker image, which is nice. Is there a Helm chart? Just out of curiosity. Like, if I wanna ship this into production, is a Helm chart the best way for people to, you know, if anyone watches this and goes, wow, that's amazing, I want it. Helm chart. Is that what they do first? Yes. So there is a Helm chart. And I will make a demo on it. It's available on the GitHub repository. Alright. Awesome. So in order to get this powerhouse of

13:41 a search engine running, we do quickwit run. I don't need to talk. So I gotta work this out easy. Uh-oh. Yeah. I was expecting that. As you moved the binary outside the directory, you need to specify the config file. It needs to find a config file. That's it. Oh, that's alright. I'll take the ownership of that one. So okay. I didn't realize we had a Quickwit data directory, and I don't have my alias configured. Let's just do this. We have a config file. So I'm gonna

14:21 take a really look at that. I'm always curious about configuration. We got YAML. And Vexors and that's it. Okay. We'll maybe take a look at that in a bit more detail later when we start asking some of the questions about what's actually happening with the system here. But if we run this as it is, we get a whole bunch of log output. This output looks like the tracing library. Would that be Exactly. Yes. We are always using it a lot. Yeah. I think everybody uses the tracing library for for output these days. It's it's it's wonderful.

14:45 Exploring the Quickwit UI

15:01 Alright. Do I have a search engine? So what you can open yes. You can open your web browser. Like, you can you can click on Oh, yeah. Just above there is a link if you want. Alright. So we just got a little bit of There is a solution. There is also a small user interface, so it can be nice to just hit the root URL. Yeah. That's it. Nice. Wasn't expecting that, but I'm happy. So you can you can get the list of indexes on the left and here. So out the box then, Quickwit is sending

15:43 its own logs to itself. Is that correct? Not by default. What you can do very simply is to send the traces. Traces are more powerful, so if you use, like, two environment variables, you can send Quickwit traces into Quickwit. That's a very common setup. I can give you the environment variables if you want, so that you can have something already in your Oh, so it just creates the indexes up front, but there's not actually any data at the moment. Exactly. Or indices, I guess, I should say. Correct myself.

16:23 Although it says indexes here, so I'll just take it. We'll go with that. Alright. So let's cover some of the terminology, and I'm sure this is covered in the docs. But, you know, let's go off script a little bit. We pulled up this UI. We've got the query editor, but we don't have any data yet. We have cluster information and config here, and we can see nodes. So this thing is obviously horizontally scalable. We'll talk about that later. We've got cluster, node info. We've got the API documentation, which is always nice to have embedded, and we

16:26 Quickwit Concepts: Indices, Mapping, Retention

16:51 have indices and indexes. So what is the vocabulary that people need to understand when they're working with Quickwit? What is an index in this specific implementation? Yeah. So like in Elasticsearch, you don't have tables like in SQL. You create your index and you put your JSON documents in it. That's it. And for each index, you have to define a mapping. So if you click, for example, on the trace index here, you have a small summary, and then you can go to the doc mapping tab where you will see that we

17:33 predefine all the fields needed for a trace model. We based this mapping on the OpenTelemetry model because it's commonly used. We did the same for logs. And here, you have a bunch of fields, the same way as in Elasticsearch, though it's a bit different. There is one big difference on the mapping part, because one thing where Quickwit is quite powerful is the schemaless part. You don't have to define your schema. You can just define one timestamp field, and everything else can be dynamic. And you won't have issues like conflicting types, for example, because we distinguish

18:21 them in our database. We distinguish, like, integers and strings. If there are two different types, that's fine. We can even do aggregations on different types, so there's no problem, no conflicts. So it can be very flexible, and especially in the log space, where you don't know your schema, it's pretty efficient. So here for OpenTelemetry, you have the full mapping, which can be useful. And for some fields, for example, if you look at attributes, like resource attributes, you just have to go down in the

18:57 mapping. If you look for resource attributes, you will see that it's a JSON field. And the fact that it's JSON means you can put everything in it. Like, for example, links, it's an array of JSON. You can put every field in it, and you will be able to search through them. Oh, very nice. Alright. Yeah. I was gonna ask whether I ever had to have a schema. So I like the flexibility that it can just pull in stuff. Can I mark an index as strict so that it won't accept anything that doesn't directly

19:31 Exactly? Okay. Yeah. Exactly. You have this strict mode to avoid a big mess. Alright. Awesome. Nice. Alright. We have a comment saying hello. First time catching one of these live. Well, thank you for joining us. If there's something you want to see with Quickwit or you have a question, please feel free to follow up in the comments. Alright. So we've got index settings, search settings, retention, splits, and sources. So with retention settings, I'm assuming we can configure the time to live on the records so we can auto expire them

20:07 after twenty four hours, four days, whatever. Would that be correct? Yes. That's more or less correct. I need to go a bit into how Quickwit indexes things to explain this. When you push documents into Quickwit, Quickwit will create what we call splits. A split is a small piece of index, basically. It's independent, and Quickwit will just create a bunch of splits, maybe a few hundred or thousands. It's a bit the equivalent of a segment in Elasticsearch. And Quickwit will also do some merging to make search more efficient. So if you

20:53 have 10 small splits, it will merge them into one. And what people generally do is just remove those splits once they are older than thirty days. So it's not a per-document retention setting. It's applied on the split, so it's really efficient. We don't have to rebuild the entire split if we want to remove only 10 documents, for example. And generally, it's sufficient because you would have a few splits per day. And once they are older than thirty days, they will be deleted.
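
The split-level retention described above can be illustrated with a short Python sketch. This is not Quickwit's implementation: the `Split` class, its fields, and the `apply_retention` helper are hypothetical names, and comparing against the newest timestamp in each split is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for a split: an independent piece of index."""
    split_id: str
    num_docs: int
    max_timestamp: datetime  # timestamp of the newest document in the split


def apply_retention(splits, now, max_age=timedelta(days=30)):
    """Keep only the splits whose newest document is within max_age.

    Whole splits are dropped; no split is rewritten to remove single documents.
    """
    cutoff = now - max_age
    return [s for s in splits if s.max_timestamp >= cutoff]


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
splits = [
    Split("old", 10_000, now - timedelta(days=45)),
    Split("recent", 12_000, now - timedelta(days=3)),
]
kept = apply_retention(splits, now)
print([s.split_id for s in kept])  # ['recent']
```

Because the unit of deletion is the whole split, expiring data is a metadata and file-deletion operation rather than an index rewrite, which is the efficiency point made in the answer.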

21:40 Okay. Are there, you know, any cardinality limits or concerns when we start to index things within Quickwit? Like, if I have a million or 10,000,000 or a billion series, have you stress tested it to that kind of level? A billion what? Documents? Yeah. So typically, when you index stuff, right, you're indexing on certain properties within the document. And then if I write one document where there's a value of country and then I insert a thousand countries, we have a cardinality of a Okay. Okay. And then if that explodes, then the query

22:24 performance can essentially be crippled because the cardinality is too high for the indexes to optimize for. Is that something that's been challenged or fixed with Quickwit? So, generally, with a search engine, the nice thing is that you have those inverted indexes. So searching to find the needle in the haystack works pretty well. What happens is that if you have high-cardinality fields, indexing is a bit slower. You will see that it gets slower because the index will be bigger, so you need to allocate more memory for that.

23:06 But it will work really well. I tested with fields with a cardinality in the millions, more than millions, tens of millions, and there is no issue doing that. A search engine can handle this pretty efficiently. Yeah. And I think the reason I asked is that, obviously, we have got the index settings, and then we've got the retention. And if I've got a low retention threshold where the index has to be purged regularly with a high cardinality, I assume those could be conflicting. But I don't wanna create problems that don't

23:46 exist, so we'll come back to that later on or another time. I think we should create an index and write some data, which I'll assume is where this is going to go. Yeah. So in order to create an index, we need to send an index con... oh, we have to download an index config first. Alright. Okay. Which I'm assuming is just this one here. So let's copy this and get a new tab. I'm gonna call this index config dot YAML. And all this is doing is configuring some fields, so title, body, creation date. It's RFC 3339

23:51 Creating a Quickwit Index

24:29 for the dates. And then we've got the precision set to seconds. I'm assuming we can change that to milliseconds, nanoseconds. What's supported here under precision? Yeah. So you have different storage here, because it's written fast precision, so you need to understand that you have different storage parts in Quickwit. There is the document storage, where we store the data as it is given, or there is an option to ignore a field and not store it in the documents. So it's a kind of row store. So

25:11 if you have a document ID, we will be able to fetch the whole document from the database. And then we also have columnar storage, which we call fast fields. Mhmm. And in this case, depending on your data, it's not necessary to keep, like, nanosecond precision, for example. Being at the seconds level can be sufficient if you just want to filter by seconds. So it depends on your use case. Generally, for observability use cases, we use millisecond precision. Okay. Nice. I see that we're using the default tokenizer

25:58 for the title and the body. What other tokenizers exist within Quickwit? Yeah. So the default tokenizer is rather simple, and it's mainly for log stuff. Mhmm. What it does is split the text every time it encounters a space or a comma or, like, a punctuation mark. You also have an English tokenizer where you have some stemming. We have a multilanguage tokenizer too. You have an [inaudible] tokenizer. You have a bunch of tokenizers. You can also build your own tokenizer if you want.
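
The behavior described for the default tokenizer, splitting on spaces and punctuation, can be approximated in a few lines of Python. This is only a sketch of the idea, not Quickwit's tokenizer: the function name `simple_tokenize` is hypothetical, the lowercasing step is an assumption, and the pattern handles ASCII letters and digits only.

```python
import re


def simple_tokenize(text):
    """Split on any run of characters that is not an ASCII letter or digit,
    then lowercase each token (lowercasing is an assumption for this sketch)."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


tokens = simple_tokenize("ERROR: disk full, retrying in 5s")
print(tokens)  # ['error', 'disk', 'full', 'retrying', 'in', '5s']
```

A tokenizer this simple does no stemming or language detection, which matches the point made next in the conversation: for logs, keeping tokenization cheap keeps indexing fast.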

26:47 And for logs, generally, you don't want to make things complex. You want to keep things very, very simple, so as to be fast at indexing. Alright. Let's get us an index then. So let's copy this. So it's just using the quickwit command, but there is a curl POST version too, which is cool. So let's just run. Oh, I'm not in my directory. Alright. Let's do this split. That keeps me in my directory. Now where did I put that file though? That's a good question. Cool. Alright. So we're going to ask Quickwit to create our new index using this config.

27:38 Let's jump back to the user interface, And now we have our stack overflow index. Nice. So this this now has like, I did notice on our YAML that we applied, we didn't set the mode. So the default mode is dynamic, but you can you can set it to strict. So I'm I'm I'm very curious now because now that we have an index, if I come in and change the index, can I reapply it over the top? Currently, we don't support updates. Okay. So it's it's on the road map, but you have to to delete it right now.

28:24 Well, I don't need to. I was just curious because I like to ask questions. But I'm assuming if we remove the title and the body, because the default mode is dynamic, I could apply this, and it would essentially be the same. Is that a fair assumption? So you need to understand the default settings of the dynamic mode. In the dynamic mode, we won't tokenize the text by default. Right. So that's why, usually, what you do is identify the few fields where you need to tokenize the text, and then everything else is

29:06 dynamic. Okay. Yeah. I guess that definitely makes sense. So alright. Now we're gonna add some documents. So let's download our Stack Overflow posts. And I won't bother looking at this JSON. Let's just get it ingested right away. What's the force for here? Out of curiosity. Yeah. We have the same mechanism as in Elasticsearch: generally, if you want to be fast at indexing, you have to batch things. So with batching, you can wait twenty seconds or thirty seconds before really committing and writing the data structures on the disk. For this index, by default, it's ten seconds.

29:16 Ingesting Data

29:59 Like, we commit every ten seconds. It can be five seconds, but below that, you will see a performance hit. It will be less performant, basically. And so here, force is just saying to Quickwit: okay, just commit immediately so you can search it immediately. Nice. You don't have to wait ten seconds. Alright. Let's try running this query here. And we get documents back. Very cool. I'm assuming I could just replicate that search on our user interface. So let's pull this up. Search and engine. Run. Nice. I like it. So things are just simple. It just works. It's just nice.
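
The batching and forced-commit behavior discussed above can be modeled with a small Python sketch. It is a toy under stated assumptions, not how Quickwit works internally: the `CommitBuffer` class and its method names are hypothetical, time is passed in explicitly so the example is deterministic, and the timeout is only checked when a document arrives, whereas a real indexer would also commit on a timer.

```python
class CommitBuffer:
    """Toy model: documents become searchable only after a commit."""

    def __init__(self, commit_timeout_secs=10.0):
        self.commit_timeout_secs = commit_timeout_secs
        self.pending = []             # ingested but not yet searchable
        self.searchable = []          # committed documents
        self.first_pending_at = None  # arrival time of the oldest pending doc

    def ingest(self, doc, now, force=False):
        if not self.pending:
            self.first_pending_at = now
        self.pending.append(doc)
        # Commit when forced, or when the oldest pending doc has waited long enough.
        if force or now - self.first_pending_at >= self.commit_timeout_secs:
            self.commit()

    def commit(self):
        self.searchable.extend(self.pending)
        self.pending.clear()
        self.first_pending_at = None


buf = CommitBuffer()
buf.ingest({"id": 1}, now=0.0)
buf.ingest({"id": 2}, now=4.0)
visible_before = len(buf.searchable)    # nothing committed yet
buf.ingest({"id": 3}, now=11.0)         # timeout elapsed: the batch is committed
visible_after_timeout = len(buf.searchable)
buf.ingest({"id": 4}, now=12.0, force=True)  # forced: committed immediately
print(visible_before, visible_after_timeout, len(buf.searchable))  # 0 3 4
```

The trade-off is the one stated in the answer: larger batches mean fewer, bigger writes and better indexing throughput, at the price of a delay before documents are searchable.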

30:46 Searching Data (CLI & UI)

31:08 It's nice that that's all I have to do. Very cool. Alright. Let's see. So okay, this is now getting into how to use field-type searches. This is explicitly looking on the title. So let's look at this one here. Yes. I saw there is an interesting question from Blackmob. Sorry? I saw that there is an interesting question from Blackmob. Right. I was too busy playing with Quickwit. Alright. Let's see. Okay. From a cost perspective, it seems like traces and logs play a

31:31 Viewer Question: Cost & Observability

31:56 special role. Is this actually the case? Yeah. Sure. He's right. The basic example is that Quickwit has some OpenTelemetry compatibility. Quickwit is compatible with the OpenTelemetry gRPC API, and the HTTP API too. So it's really easy to send your logs directly into it. And I will talk about it in my demo, so I don't want to talk too much about it right now. Yeah. So I think, you know, based on some of the things we've covered so far, and

32:42 I don't wanna put words in your mouth for anything like that. Right? But we did talk about how the use cases at the start. I don't know. And I and I said, are there more beyond the observability use case? Like, from what I've seen so far, I mean, if I really wanted to, I could throw the entire IMDB database into this and search for movies and actors, and it would work just fine based on what I wanna do. So it's just one of those yeah. It's a search engine where you're just you're focusing on a particular use case, but

33:08 it does adapt very well, I think, to other use cases. Would you agree with that? Yep. Alright. Of course. Now there's a challenge here already, and I don't know if this is part of the docs or not. Right? But we have title, body, creation date, and there's no title or question in these imports. That's true. That's fair. So I mean, the next stage was to do a title and that this should return. Oh, it did get one. One question had it. Yeah. Oh, because I guess some documents don't have titles. That's it. Okay. Oh, okay. Right.

33:52 So now I don't want to find this, but I wanna find that question another way. So I'm gonna switch to body, and this time we will do antlr, just because I see that on the thing. Oh, we got three questions. K. Cool. With this being the first one and then this one. Would this also work for integers? So if I do one six nine three. Yeah. Cool. Very nice. Let's see if user one zero one has answered more than one question. Yeah. Alright. I'm having fun here. This is nice. So okay. So we executed a query and

34:40 all of this could be done with curl, with a JSON payload. That makes sense. I'm curious. If I come here, let's say I wanna create an index. Can I just submit from here? Is it, like, one of those? No. Okay. It just shows me. K. You can Oh, yeah. I can. Yeah. I'm not using it, but people use that. I'm only using this to discover the endpoints and then that's it. Yeah. Alright. Let's create an index this way. I don't know. I just like pushing buttons and see what happens.
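
The remark that the same search could be sent with curl and a JSON payload can be sketched in Python with only the standard library. Several details here are assumptions rather than things shown in the episode: the base URL `http://127.0.0.1:7280`, the `/api/v1/<index>/search` path, the `query` and `max_hits` payload keys, and the index ID `stackoverflow`. The helper `build_search_request` is a hypothetical name. The request is only built, not sent, so the example runs without a Quickwit node.

```python
import json
from urllib.request import Request


def build_search_request(index_id, query, max_hits=10,
                         base_url="http://127.0.0.1:7280"):
    """Build (but do not send) a JSON search request.

    The URL layout and payload keys are assumptions for illustration.
    """
    payload = json.dumps({"query": query, "max_hits": max_hits}).encode("utf-8")
    return Request(
        url=f"{base_url}/api/v1/{index_id}/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("stackoverflow", "body:antlr")
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect exactly what would go over the wire, much like copying the curl command that the embedded API page displays.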

34:46 Exploring the API via the UI

35:20 And this does have an overwrite flag, actually. Or maybe I can update or maybe it's not an update. Does it do a delete and a create? That's a good question. So here, overwrite, let me think about it. So it creates an index. Yeah. Exactly. So it would just delete the previous one before creating. So that's it. Okay. So let's create an index where I can see the mapping. We've got a mode, lenient. I'm not sure how that's different from dynamic, but maybe we could talk about that. This one is using the multiline tokenizer

35:56 and index. Okay. So I have to fill in some of these. Let's just call this Rawkode. I mean, I guess this isn't actually a valid document. Right? I'd have to change quite a bit here. Yeah. What what you can do is just drop almost everything. Keep the if you really want to to do it, the best would be dropping, like, most of the part. I think what I'll I'll just copy this one and make a few tweaks. Like, I don't actually need to Oh, wait. This is no YAML. Maybe you can change, like, the request body

36:34 type. Maybe you can change application to this is there is a Ah, yeah. Yeah. Yeah. Yeah. Okay. So it's it's it's not available. Right? So let's let's K. Good good luck with that. There isn't an application YAML, though, is there? Either maybe there is. Maybe, like, text YAML or something. Let's just try and break it. What's the worst that can happen? Right? So we already have Stack Overflow, so we'll call this Rawkode. And Will it work? Okay. It is generating, yes, for bad stuff. Right? Yeah. Okay. Yeah. But it has given me the curl command,

37:24 so I could adapt the content type that way. But it's not important. I just see something, and I need to click it or play with it. So alright. So now we could delete our index like so. Run this. Let's see. Yes. And then we get a TL;DR with the commands and stuff like that. So this is really cool. It's super easy to work with. We got an issue added to the search. All the API documentation is embedded into the UI, which is awesome. I mean, really, for anyone watching this, it's

37:34 Deleting an Index

38:02 like, that's interesting. I wanna play with it. You've made it super simple, and that's fantastic. So good job. Quickwit team. Thank you. Now there's obviously a whole bunch of configuration. There's clustering that's available. I think you're gonna be doing the data with doing a demo with slightly more data, which is really cool. We already talked about deployments. You're gonna show the Kubernetes one. So I guess the last question from my side before we kinda talk about your demo and, actually, I'll just move it back to to back base mode. What what is the how does the clustering work with Quickwit? Are

38:10 Quickwit Cluster Architecture

38:42 you using, like, you know, Raft to do consensus? Are you doing something completely different? Maybe you could share a bit more detail. Yeah. Okay. Well, that's an interesting question. Raft, it's always painful to do these kinds of things. So we avoided that for now. We managed to scale a Quickwit cluster to between 50 and 100 nodes, ingesting more than several gigabytes per second, and it's working well. So right now, we don't need that. We may need it later for replicating the metastore.

39:30 So currently, we don't use Raft at all. The cluster formation is ensured by a gossip algorithm that we implemented called Chitchat. Mhmm. So once the cluster is formed, you have different roles. You have the metastore, which is the central piece where we store the index config and the splits metadata, so that for an index we are aware of how many splits we have, how many documents are in those splits, and a bunch of other metadata. This is very small because, generally, for 10,000,000 documents, you have one split. So it's

40:19 pretty, pretty small. And you can choose to back this metastore component by a file on an object store or by PostgreSQL. So once you have this metastore, then you have the searcher role and the indexer role. And thanks to the decoupled architecture, you know that your data is on your object storage, for example. And you can start one or 10 searchers, and they all fetch data from the object storage. That's for the read path. And on the write path, you will have those indexers that will receive documents through the ingest API or from

41:06 a Kafka source or Kinesis or Google Pub/Sub, whatever. We have native integrations with several distributed queues. And those indexers will just pull the data from Kafka or take the data that you send through the ingest API, write index files, split files, on the disk, and upload them to the object storage. And that's it. Currently, it's working. It scales in parallel like this, and you have totally different write and read paths, which makes it very simple to manage. You have a last component, called the janitor, that is doing only a few things.
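The write path described above starts with documents arriving at the ingest API. As a rough sketch, assuming the REST ingest endpoint takes newline-delimited JSON at /api/v1/<index-id>/ingest on the default port 7280, the request could be assembled as below. The index ID, the documents, and the helper name build_ingest_request are hypothetical and only illustrate the shape of the payload; no request is actually sent.

```python
import json


def build_ingest_request(base_url, index_id, docs):
    """Return (url, body) for an ingest call.

    Assumption: the ingest endpoint accepts newline-delimited JSON,
    one document per line. Nothing is sent over the network here.
    """
    url = f"{base_url.rstrip('/')}/api/v1/{index_id}/ingest"
    body = "\n".join(json.dumps(doc, separators=(",", ":")) for doc in docs) + "\n"
    return url, body


# Hypothetical index ID and documents, for illustration only.
docs = [
    {"timestamp": 1700000000, "body": "merge operation finished"},
    {"timestamp": 1700000001, "body": "split uploaded"},
]
url, body = build_ingest_request("http://localhost:7280", "my-logs", docs)
print(url)
print(body, end="")
```

The body could then be posted with curl or any HTTP client. The indexer, not the client, is what turns these documents into split files and uploads them to object storage.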

41:56 Typically, it is handling the retention stuff. So every hour, the janitor will check whether it can delete some split files on the object storage. And it can also execute delete requests, because we have users who, for GDPR reasons, want to delete some data in their index. So it's possible to delete data in Quickwit, but not at a frequent rate. Quickwit is typically not made for updates. But for infrequent deletes, it's working well, and the janitor is ensuring

42:38 this part. And that's it for the architecture. Yeah. Okay. So that delete constraint is why append-only data, like logs and observability, is such a good use case for that, and that's why you're pushing on that. So that makes sense. Now the blobs, the object store stuff, right, for your metastore and for all the data itself: do you support, like, tiered storage? Like, can I keep the last six hours of logs and traces on a fast NVMe disk and then push everything older to object store, or is that something that may come later?

42:45 Data Storage and Caching Explained

43:16 So, yeah, object storage is our primary store. Right. Okay. So that's the first thing. Now if you have a lot of queries, you may want to use some caching. So we added this feature in the last version of Quickwit. What will happen is on the searcher side, only on the read path, because that's where you have your queries coming in and you have to respond super fast, for example, or you have a lot of QPS. In this case, you want to keep on your local disk

43:50 the data that you already downloaded, for example. So those split files, you want to keep them there. So we added this possibility to reduce the cost of the object storage because, as you know, with a storage like S3, you pay for the GET requests. And this can become pricey if you have a lot of them. K. Awesome. Alright. Are you ready for your big skiddy demo? Yeah. Sure. Alright. So just preparing my screen. Yeah. Take your time, man. All good. Okay. I will share my entire screen.
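To make the caching point concrete, here is a minimal sketch of a searcher-side split cache that counts how many GET requests reach the object store. It is not Quickwit's implementation: the SplitCache class, the LRU eviction policy, and the in-memory fake store are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class SplitCache:
    """Toy local cache for split files with LRU eviction (an assumed policy)."""

    def __init__(self, capacity_bytes, fetch):
        self.capacity_bytes = capacity_bytes
        self.fetch = fetch              # callable: split_id -> bytes, one GET per call
        self.entries = OrderedDict()    # split_id -> bytes, least recently used first
        self.used_bytes = 0
        self.get_requests = 0           # billable GETs sent to the object store

    def read(self, split_id):
        if split_id in self.entries:
            # Cache hit: no GET request, just refresh the recency.
            self.entries.move_to_end(split_id)
            return self.entries[split_id]
        data = self.fetch(split_id)
        self.get_requests += 1
        self.entries[split_id] = data
        self.used_bytes += len(data)
        # Evict the least recently used splits until we fit again.
        while self.used_bytes > self.capacity_bytes and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        return data


# Hypothetical object store holding three 40-byte splits.
store = {"split-a": b"a" * 40, "split-b": b"b" * 40, "split-c": b"c" * 40}
cache = SplitCache(capacity_bytes=100, fetch=store.__getitem__)
for split_id in ["split-a", "split-b", "split-a", "split-c", "split-a"]:
    cache.read(split_id)
print(cache.get_requests)  # 3 GETs instead of 5 without the cache
```

The saving grows with query volume: repeated queries over the same splits hit the local disk instead of paying for another GET.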

44:41 Demo Environment: Kubernetes Cluster

44:41 Let's go. I'm just checking if I can connect. So here, I'm showing you the Quickwit interface on a cluster that is running on Google Kubernetes Engine. So it is hosted on Google Cloud. What can I show you? So I have a bunch of nodes here. Contrary to the quickstart, here we have a cluster deployed with a Helm chart, and we have several nodes with different roles. And I forgot to talk about the control plane, which is quite important, actually. So the control plane here, I will explain

45:34 it in a few seconds. It is handling how we distribute the indexing tasks to indexers. So it will look at all the indexes you have and say, okay, this indexer will handle those 100 indexes, and this second indexer will handle those 10 remaining indexes. You have one searcher node, one indexer here, the janitor, and the. You also have dead nodes because, before this demo, I started 10 nodes on this cluster, and they are all dead right now because I shut them down. So, yeah, also on this cluster, I have

46:26 Working with Large Datasets (500M+ Spans)

46:26 a bunch of indexes. I will show you during this demo the trace index because I will ingest a lot of traces into it. I already inserted almost 300 million traces. Actually, it's not traces, it's spans. When you instrument your applications, your application will send spans, so units of operation in your program. There are, like, 65 splits, and the uncompressed size is 200 gigabytes, and Quickwit managed to compress it to 35 gigabytes. And it's not on S3, but it's on

47:18 Google Cloud Storage. So we are compatible with different object storages. Nice. So one interesting thing I can show you is the query detail. So here, okay, I also have logs here. So I will show you in. But I have only one searcher, and I'm able to search inside millions and millions of spans. So it takes two seconds here because I'm searching through all the documents, so it's not very efficient. But I can just, for example, let me choose, probably, yes, searching through only the service name Quickwit because on this cluster, I'm

47:29 Searching the Demo Data

48:16 indexing Quickwit's own traces. So in fact, here I'm just looking into 1,700,000 spans, and it's way faster. Mhmm. And everything is on object storage. But for Quickwit, it's not a big dataset here. If you have 200 million, that's okay. That's really easy to handle. So what we will do is index more data into it. And I will also show you our Grafana plugin, because we implemented a Grafana plugin to search through logs and traces. So what I will do now is that

49:05 Monitoring Indexing Throughput (Grafana)

49:05 I will start indexing data into Quickwit, a bit more than currently. So here, what you see is a dashboard to monitor Quickwit with metrics. You have the classic /metrics endpoint to gather metrics generated by Quickwit. So here, I'm looking at the indexing throughput of the otel traces index. And you can see that you already have a bunch of documents that are indexed. Those spikes are due to the fact that I ran some queries. When you run queries on Quickwit, it will generate more logs, and that's why there are more

49:50 spans here. But it's almost nothing. So what I will do is use a tool that allows you to generate a lot of spans and send them to a sink that is compatible with the OpenTelemetry protocol. So for example, here, I will just create several jobs that are sending spans to Quickwit. So here, I'm using the OpenTelemetry API of Quickwit. You can see that Quickwit has an indexer service, and we will just push traces to the indexer service. And then the

49:59 Ingesting High-Volume Data

50:51 indexer service, if you have several indexers, will just forward those spans to whatever indexers are present. So right now, we have only one indexer, and I will just create this first job. Fraction. I'm gonna check if everything has started. So okay. We have those different jobs that are created, and they will send thousands of spans to Quickwit. Let's see how Quickwit responds to this. K. So you see this spike. So, normally, you should expect Quickwit to index at between 20 megabytes per second and 40 megabytes per second.
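The 20 to 40 megabytes per second figure quoted for a single indexer gives a quick way to size the indexer pool. The sketch below is back-of-the-envelope arithmetic only; the 200 MB/s target is a hypothetical workload, and it assumes throughput scales linearly with the number of indexers.

```python
import math


def indexers_needed(target_mb_per_s, per_indexer_mb_per_s):
    """Smallest number of indexers whose combined throughput covers the target.

    Assumes linear scaling, which is an idealization.
    """
    if per_indexer_mb_per_s <= 0:
        raise ValueError("per-indexer throughput must be positive")
    return math.ceil(target_mb_per_s / per_indexer_mb_per_s)


# Hypothetical target of 200 MB/s, at both ends of the quoted per-indexer range.
for rate in (20, 40):
    print(rate, indexers_needed(200, rate))
```

At 20 MB/s per indexer the target needs 10 indexers, and at 40 MB/s it needs 5, so the document shape (simple versus complex) can halve or double the pool size.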

52:00 For very simple documents, you can go up to 40 megabytes. So here, we should expect something around 30 megabytes per second, and we will see which kind of spans are sent. So yeah. We have this first indexer that is running, so we're happy with it. But if you want to index more data, what we can do is scale the indexers. So in Quickwit, in the deployment, the Helm chart that we provide, we added two stateful sets, for indexer and searcher. And I'm using here the command line tool

52:17 Scaling the Indexers

52:44 k9s, which is very, very cool to play with Kubernetes. And I will just add nine more indexers to index more data. So, normally... Yeah. So I added nine indexers, and they're already ready. So normally, I should be able to start another job that is sending even more traces, and I have a second job for that. This one has even more jobs, and it will send more traces to Quickwit. So let's create it and see what's happening on the Quickwit side. But here, we have those two indexes

53:53 from the metric. I will just refresh to make sure that we are taking into account all the pods that are running. And we will wait a bit to see how indexing is scaling. Okay. So here, you can see that we are already at 162 megabytes per second. I will jump to the Grafana plugin interface of Quickwit so that we can have a look at the volume. Can I ask a question? Yep. Sure. Can you go back to that indexing dashboard? Yep. So something I noticed that was quite

54:38 interesting there is the resident set size of indexer zero dropped. Is it rebalancing across the other nine indexers? So probably here, I don't know why it did that. Normally, when you see such a drop, it's just that it is indexing fewer documents. So probably, some documents went to another indexer. That's the most probable thing. So even though the data is stored in object store, do the indexers store everything in RAM, or are they paging it in and out? Yeah. So an indexer needs

55:27 some RAM to build the data structure. And we have what we call a WAL that is saved on the disk, so that if you have an issue on your pod, when you restart, it will start again at the same position. So it will keep a buffer here. I think it's twenty seconds for this index before it is uploaded to the object storage. So this time is used for the WAL and for building the index data structure. Yeah. So that drop

56:09 then may just have been a flush of the WAL to object store, potentially. Yeah. And, also, what I would expect is that this particular node is indexing fewer documents than others. We can check. For example, if we look at the first one, because this is the first one, we can see that it's indexing at only 20 megabytes. And let's have a look at another one. This one is indexing more. So what I suspect is that the configuration is a little bit different for the first job. I don't know. I'm not

56:53 sure here. Let me check. Let's see. The configuration seems... I don't know. No. I don't think it's coming... oh, that's interesting. Here, we are using gRPC, and here we are using, uh-huh, the HTTP endpoint. So there is this small difference. Okay. Okay. Maybe I should look into that to see if it's coming from that, or maybe it's the node that is already running the jobs from. I'm not sure. Yeah. Okay. Thanks. Okay. So the nice thing is that you can see that we are indexing a lot of traces, and you

57:44 Exploring the Quickwit Grafana Plugin

57:51 know, now we want to search and have a look at this volume too, and that's the nice thing that we have. So I'm using here the Quickwit plugin. And you can use it like any other plugin. Actually, I'm not on the right one. I think I should be there. Yes. Because I have two instances. So one on port 3000. So I must stay on this one. So for this demo, I created two data sources, one for logs and one for traces. To create a Quickwit data source, you have to specify

58:49 the HTTP endpoint. So here, we are just pointing to the Quickwit searcher service. And then you have to specify an index ID. The message field name and log level field name are optional. You can also link to a trace ID, and I will show you what the purpose of that is. It is for when you want to go from a log line or a document and see the whole trace with all the spans, and show it with a user interface. So let's go back to the explore view and have a look at traces.

59:36 So here, we have some logs because I'm using the OpenTelemetry collector to send logs into Quickwit. So it's a classic setup for a Kubernetes cluster. But let's have a look at traces. So here we have one searcher, but we have a lot of logs. So that's why it is taking some time. And, actually, I will probably reduce the number. I will stop this because the index would be a bit too large. We can support this, but I think it's useless now. So okay. It is taking a bit longer

1:00:30 than I would expect. So sometimes I have some issues with my instance, so I have some packets that are lost. So I will just restart this one to make sure that it's working. So we already have several hundred million traces. We have only one searcher that is fetching data directly from the object storage, and we are doing two queries here. One query is for the date histogram. So it is taking several seconds, but you have to see that here we

1:01:17 are making an aggregation over, probably, now let's have a look at this. Yes. We have almost 500 million documents. So it's normal with one searcher to take a few seconds to just do this date histogram. In one minute here, you can see that we have 27,000,000 spans, so that's a lot. But Quickwit can handle it with only one searcher. And, of course, if you look at only fifty minutes, it should be faster. But you still have a lot. Yeah. So here we are looking at traces, spans to be

1:02:08 accurate. So if we look at a span, this one is a Quickwit span, like service name Quickwit. So I can play with this and search for spans like this. So if I'm looking only at the quickwit service, you have a lot fewer spans because most of them are generated by trash gen. Something like that. So, yeah, here we have all of our traces on traces job. And if we look at Quickwit, what can be interesting is that if you look. So here, the interface is not very nice,

1:03:00 Visualizing Traces (Jaeger Integration)

1:03:05 but we can have a look at what's inside the span and also link it to the trace. So here, for example, it's a search trace from Quickwit. So you can really see what's happening inside Quickwit. So here we have a leaf search. Leaf, because it's happening on one node, and we are searching through one single split. And then you have those different steps. So you can dig into that. Can I ask a question? Yep. On the right hand side, right, the split to show the trace, is

1:03:51 that actually querying Quickwit, or is it querying Jaeger? Okay. Here, I'm using a Jaeger data source. Mhmm. And this Jaeger data source is directly hitting the REST API of Quickwit to gather the spans here. Okay. So you support the Jaeger API for the benefit of this Grafana plugin and other UIs, I guess. Yeah. Exactly. So I'm just showing you how I configured the Jaeger data source. So data source, like that. You just have to give the endpoint. And because the plugin is using the REST API, which

1:04:36 is querying spans in a certain way, we implemented this endpoint. And that's it. What else? So one interesting thing that you can do with Quickwit is also aggregations. What else can I show you? For example, here, if we look at traces, one thing that is nice to do with traces, so I will look at Quickwit traces because they are more relevant. The problem with the generated ones is that they are a bit dumb. I will show you some of them a bit later. But one thing that you can do with

1:04:51 Aggregations and Analytics

1:05:23 Quickwit is do some analytics on your traces. So for example, if you want to do some kind of terms aggregation on span name, that's typically what you want to do. And what you want to know is which span is, for example, taking longer than it should. In this case, you can have a look at the average, for example, of the span duration. So here, we have the output of an aggregation that is giving us all the spans that are available on the Quickwit service and the average

1:06:08 duration of each span. So it can give you a good idea. Generally, we want to get the percentiles. When you monitor a service, it's nice. So you can get the percentiles to see whether there is a degradation in your service. And I will show you a real-world application monitoring example just after that. So yeah. You can do this on really big datasets. So that's one of the advantages of using Quickwit. And... So how much of Lucene's search and operators do you support?
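The terms aggregation on span name with an average and percentiles of span duration, as described a moment ago, could be expressed roughly as the request body below. This is a sketch under assumptions: it assumes an Elasticsearch-style aggs syntax in the search request, and the field names service_name, span_name and span_duration_millis are guesses at the trace index mapping that should be checked against the actual index. The helper name is hypothetical.

```python
import json


def span_latency_aggregation(service_name,
                             span_field="span_name",
                             duration_field="span_duration_millis",
                             size=20):
    """Build a search body: bucket spans by name, then compute latency stats per bucket.

    Field names are assumptions about the index mapping, not confirmed values.
    """
    return {
        "query": f"service_name:{service_name}",
        "max_hits": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "by_span": {
                "terms": {"field": span_field, "size": size},
                "aggs": {
                    "avg_duration": {"avg": {"field": duration_field}},
                    "duration_percentiles": {
                        "percentiles": {"field": duration_field, "percents": [50, 95, 99]}
                    },
                },
            }
        },
    }


body = span_latency_aggregation("quickwit")
print(json.dumps(body, indent=2))
```

Each bucket in the response would correspond to one span name, with its average duration and its p50, p95 and p99, which is the shape of the table shown in the demo.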

1:06:56 Query Language Capabilities

1:07:00 Because right now, we've been doing very basic searches. Right? We haven't done any term boosting, fuzzy matching. Yeah. And does it support any of that stuff? Yeah. So what we support currently, okay. We don't support the fuzzy part. We support prefix queries. So here, if I'm starting with quick and ending this with a star, it will return the response. We don't support suffix queries, where you would start with the star. So fuzzy queries do not work. We don't support them currently, even if we could, because it's fuzzy

1:07:50 and queries are supported in. So we could add it. The issue we have with this kind of query is that we would need to cache more things. So for now, we did not add it to Quickwit. So, yeah, currently, what you can do is this kind of stuff. For example, if you want to specify a span name, you can do these Boolean queries. Yeah. That's mostly the main part of what you can do. You can do some phrase queries

1:08:40 on the full text search. I can probably show you this on the Quickwit log index. So here we have logs, so you have more text. So it's nice to have a look at this. Let me find, like, this. Would you be able to do a, sorry. I'm gonna sort of span out of work. Yeah. We have some nested JSON there with the resource attributes. So would you be able to find logs where the namespace or node name is that JSON value? Yeah. Yeah. Exactly. So that's one, like, everything that is not

1:09:05 Querying Nested Data

1:09:25 declared in the schema or declared, you can do this kind of fancy stuff. So you can write this. Oh, great. Okay. Oh, so it just expands that out for you when that JSON artifact exists on the payload. Yes. So the Quickwit plugin is expanding this. The original is there, so you can have it. That's a real document. And then it is exploded for UI reasons. Okay. Cool. So what you can do, but not on this index because it's not possible, as I don't

1:10:10 index positions for logs, but you can do phrase queries also. So for example, imagine if we take this message, like, merge. If you want to match exactly those two consecutive words, like, it's in body message. But you won't be happy because I did not index the positions. But for some use cases, it's nice to be able to do phrase queries. When I say phrase queries, it means that we expect to match only documents with those two consecutive words. Nothing between them. And we have phrase queries with slop. Slop means that we tolerate, like,

1:11:09 a change of positions of one element, and that's it. So we support those kinds of queries too. Generally those are called proximity queries, right? Where you want two words within four words of each other? In the Elasticsearch world, it's phrase queries with slop, I think. Oh, alright. Okay. Great. So currently it's not working here because, as I said, I did not index the positions. For logs, generally, it's not necessary. Okay. Cool. Awesome. Yeah. So I think that's it for the demo that I can show you today. I have, like, a preview of what you

1:11:46 Future Development and Roadmap

1:11:53 can expect in the coming weeks or months. Because one drawback here, as I think you noticed, is that it's nice to be able to search through this amount of logs, and it's nice to be able to build this kind of statistics. But you can do more. And I prepared an application monitoring dashboard built with Quickwit. So here, you can imagine that you have an application. Here, it's a FastAPI application, and I'm running, on my local instance,

1:12:00 Preview: Application Monitoring Dashboard

1:12:43 some script that is making requests on this API. And this FastAPI application is also instrumented with OpenTelemetry, and it is sending traces to Quickwit. And so what I did here is show a bunch of aggregations with what's available in Quickwit currently. Just as I showed you the latency by span name before, here I'm showing you the average latency by service name. And here, I'm using a span attribute, not the HTTP content length, but here we are looking at the route. So if you look at the route,

1:13:38 you can build your application monitoring dashboard, because OpenTelemetry is giving you nice attributes that you can use to understand what's happening in your application. Nice. So here it's a preview because currently, I'm still working on this. I'm using the 0.7 version, so you can already do that with Quickwit. I'm just not totally satisfied when you need to compute some rates. Quickwit is nice for computing percentiles, average, min, max, these classic statistics. But we are not good at computing

1:14:36 rates, for example, which is more classic when you are doing time series analysis. So we hope to provide those kinds of aggregations in the coming months, and so to be able to build an even better dashboard than this one. And that's our roadmap currently, so we want to provide this kind of stuff. What else? Yeah. One thing that we want to work on is also the plugin, as you saw in the explore view. Here, when you need to build your query, you need to

1:15:29 know your mapping. So it's not always easy to know which field to query, so we plan to improve this experience. And we have a bunch of developments that we are currently working on to improve the plugin experience globally. So we hope to deliver more things on the user interface side. And on the back end side, we also plan to work on metrics, which are not supported currently in Quickwit, and also to provide a pipe query language. So we have a lot of work for this year.

1:16:10 Yeah. There's a question that we have from Blackboard Ben asking, is it, oh, I'll pop this back over here, there we go. Is it possible to parse log bodies at query time to support cases where log records weren't properly parsed at ingestion time? So, you know, where we have that JSON inside of the source of the body that hasn't been expanded or exploded, is there the ability to do that? Like, I don't know if you're familiar with Loki and LogQL, but it has the ability to pipe into a JSON parser and then start to query and

1:16:12 Viewer Question: Query-Time Parsing

1:16:44 work with that JSON. Is that something Quickwit would have on its roadmap? So I'm just trying to understand what you want to do. Do you want to do some count on it, some aggregation on it? I'm not sure I understand the use case. Yeah. Maybe there's a log line that has a JSON string where they want to parse that and filter based on one of the properties within the JSON and then aggregate that. Yeah. Okay. So we do not support that currently. K. Is that something that you're looking to

1:17:18 support in the future, or are you just focusing purely on metrics at the moment? That's a good question. Currently, I don't know what we will support on this part. Mhmm. What we want to provide is a pipeline language so that you can indeed filter on whatever field you want, then pipe it and count and then pipe it again and do some stuff. I don't know yet exactly what we will do. That's all. That's the plan, but I don't have details. Do you think you'd be likely to support LogQL rather than inventing your own?

1:17:58 Yeah. That's one of the questions we have. I think we want to be well integrated in the ecosystem. So we plan to support Kibana, and we also support, obviously, Grafana. So actually being compatible with an existing language could be really nice. Alright. Awesome. Well, thank you so much for joining us today, for sharing all the information about Quickwit, and for the wonderful demo. I was getting a little bit nervous with how much data you were putting through it, but Quickwit handled it just fine. So

1:18:28 Conclusion and Wrap-up

1:18:42 that was awesome. Yeah. And usually, I start several searchers, but I only need to start several searchers when you have billions of documents. Like, 500 million, that's okay. Child's play. Alright. Awesome. Well, thank you again. Thank you very much. Anyone watching after the fact, please feel free to add any comments that you have or questions to the comment section, and we'll do our best to get back to you as soon as we can. And Blackboard Ben says thank you, folks. Yeah. Thank you for watching. Thank you, Francois, and everyone have a wonderful day. Bye. Thank
