Overview

About this video

What You'll Learn

  1. Build Cortex from source, run a single binary, and wire Prometheus remote_write for first ingest verification.
  2. Scale a local Cortex deployment with Docker Compose and MinIO, then observe how distributors route metrics to storage.
  3. Compare single-binary and microservices Cortex modes, including ring-based placement, component responsibilities, and scaling tradeoffs.

Ganesh Vernekar joins to walk through Cortex: building from source, wiring Prometheus remote_write, scaling it with Docker Compose and MinIO, the single-binary vs microservices architecture, and how it compares to Thanos.

Chapters

Jump to a chapter

  1. 0:00 Holding screen
  2. 1:25 Introductions
  3. 1:26 Introduction and Welcome
  4. 2:44 Guest Introduction: Ganesh Vernekar
  5. 3:45 What problem is Cortex solving?
  6. 7:18 Getting Started: Basic Single Node Setup (Hands-on)
  7. 7:20 What was prepared upfront?
  8. 8:15 Building Cortex from source
  9. 8:44 Building and Running Cortex Binary
  10. 14:30 Running Prometheus with Cortex
  11. 15:35 Configuring Prometheus Remote Write
  12. 18:41 Checking Basic Setup (UI & Metrics)
  13. 23:41 Verifying Metrics Ingestion
  14. 25:14 Horizontally Scaled Demo Setup (Docker Compose)
  15. 25:20 Slides and demo - Cortex architecture and scaling
  16. 30:56 Scaling Cortex Instances in Demo
  17. 32:00 Querying Scaled Cortex via Grafana
  18. 45:00 Walking through the demo ourselves
  19. 54:00 Cortex architecture
  20. 54:04 Cortex Architecture Overview & Components
  21. 57:01 Scaling Strategies: Single Binary vs. Microservices
  22. 1:01:32 Storage Backends (Object Storage)
  23. 1:02:20 Caching Strategy
  24. 1:05:00 When to Adopt Cortex: Pragmatic Advice
  25. 1:06:26 Cortex vs. Thanos (Brief Comparison)
  26. 1:07:11 Final Thoughts and Conclusion
Transcript

Full transcript

Generated from the English captions. Timestamps jump the player to that moment.

1:26 Introduction and Welcome

1:26 Hello, and welcome to today's episode of Rawkode Live. I'm your host Rawkode, and I am very pleased today to be covering the Cortex project, a horizontally scalable time series database, which is part of the Cloud Native Computing Foundation as an incubating project. I wanna first thank my employer, Equinix Metal. They give me the time to invest in making this show and producing content that we can all learn together with. So thank you very much to Equinix Metal. They have given me a code, Rawkode dash live. You can use this to sign up for the bare metal cloud and get $50

2:04 of compute time. Now you can use that $50 you know wisely and use smaller boxes and have lots of compute time Or you can splurge it really really quickly in a few hours with some massive bare metal machines. So you know, pick your poison. We have a Discord community where we allow people to chat about these episodes afterwards. So if you're not watching live and you have questions, feel free to drop in and chat to us there. Of course, if you are watching live, please get involved. Leave us comments in the YouTube section, the YouTube comment section

2:35 and we'll be happy to tackle them as we go. As I said, we're gonna cover Cortex today and I have a fantastic guest with me. So let's bring in Ganesh Vernekar. How are you, Ganesh? I'm good. How are you, David? Yeah. I'm very well. Thank you very much. So you are an engineer at Grafana, a maintainer on the Prometheus project and a contributor to Cortex. Can you maybe add a little bit of flavor to that? Tell us a little bit more about Grafana, please. Yeah. So I'm a software engineer at Grafana Labs, as David said, and I have been

2:44 Guest Introduction: Ganesh Vernekar

3:09 a maintainer of the Prometheus project for one and a half years. And I maintain the TSDB part of Prometheus and sometimes work on the PromQL part. And since I joined Grafana, because we have Cortex as our back end for the hosted solutions, I also contribute to Cortex regularly. So that's about the background that I could say. Nice. I mean, not just a maintainer of Prometheus, but of the actual TSDB section as well. I mean, it takes a very special kind of person to build a database and work on a database. So let's say, I'm glad it's you and not me. That's for

3:42 sure. So can you tell me, when it comes to Cortex, there are many flavors of Prometheus kind of, I don't know, evolutions, I guess. You know, other projects that are using Prometheus as a core and API and adopting some of those best practices that Prometheus has kinda brought forward as we all adopted cloud native and Kubernetes. So can you just tell us why Cortex exists? What space is it trying to fill inside of here? Yeah. Definitely. So, Cortex or any other project like Thanos, M3, VictoriaMetrics, etcetera, they don't aim to

3:45 What problem is Cortex solving?

4:19 replace Prometheus as is, but they extend what we do with the Prometheus data. Before we can talk about why we need Cortex, let's talk about what's lacking in Prometheus and what this gap is. Actually, I made some notes so that I don't forget to mention something. Yeah. So first, talking about the global view. Let's say you have Prometheus in multiple Kubernetes clusters, and you want to have a global view of all the Prometheus that you have installed in those clusters. That's actually doable in Prometheus, because there is something called Prometheus Federation

4:58 where you run another Prometheus which scrapes the existing Prometheus. But still, you are limited by what Prometheus can do on a single machine, which is super high, but still there's something lacking. So let's say you have two Prometheus scraping the same stuff, and you have a federation on those two Prometheus to have a global view, and multiple other Prometheus. But, again, if you have a single Prometheus for federation, that's a single point of failure, and you don't have HA. So we'll have four Prometheus. So you can see

5:33 where it's going. It's getting kind of messy handling multiple Prometheus. So that's one of them. Cortex acts as a centralized database where you have all the data of all the Prometheus of your clusters, and you have a centralized view. Another thing, multitenancy. So when you are a large organization, you have different teams, and sometimes you might want to give a team access to some part of the data, not everything. And Prometheus is not multi-tenant natively. Cortex is multi-tenant natively, where you can say that some

6:13 group of Prometheus servers belong to one tenant, some to other tenants, and everything is again in a centralized store. And, yeah, Prometheus scales very well vertically. But there is a limit to how much memory or how much CPU you can have. So Cortex fills that gap by being a horizontally distributed database that can scale horizontally to unlimited metrics. So those are a few things that I can see right now. Yeah. Awesome. Those are things that I think, you know, as so many people are adopting Kubernetes, horizontal scalability is now almost just something that

6:53 we take for granted. Yeah. And well, at least in some applications. But when it comes to databases, this is a completely different domain, a completely different problem, and a really good space where Cortex is trying to fill a void and solve some challenges. So I'm really excited to see how we can get started with Cortex and how we can use it for all of our metrics. Yeah. Okay. So I think today's a little bit different from normal. We're gonna split this into multiple sections. So we're gonna cover the installation of

7:20 What was prepared upfront?

7:26 Cortex on some hardware that I preconfigured. So let me get my screen on. So this is the Equinix Metal console. I have spun up two bare metal machines, Cortex one and Cortex two. I don't know if we need them, but that's alright. I'd like to have them available. Now I believe we're gonna run through getting started on one of these machines. Yes. And then you wanna cover a little bit of the architecture, which I'm super excited for. And I think you've got a demo, and then we're gonna try and rebuild that demo ourselves on this computer. That's

7:57 an array. Okay. Awesome. So should we... Or maybe we could be bold and just do all the demo on your machine itself. It should be simple. I mean, I do like to be bold. Yeah. Definitely. I've broken a few things on this show before. I'm not scared of doing that again. So what do you wanna do first? Shall we get Cortex installed on one of these machines and go through the... Yeah. Let's, yeah, let's get started with the docs. So in the docs, you can click on documentation and

8:15 Building Cortex from source

8:29 click on getting started. We have all the steps that we can do. And click on blocks storage, which is going to be the next default storage, which does not involve an index store. Okay. So first thing, yeah, first thing is to clone the repo. The Prometheus repository? No. Cortex. Cortex. Got you. Got you. Yeah. Yeah. You can clone both of them. Like, Cortex is a long-term storage solution for Prometheus. So Prometheus has to push data to Cortex. So we will have both Prometheus and Cortex running. Alright. Perfect. So let's get this clone done. Good. It's fast.

8:44 Building and Running Cortex Binary

9:22 There's always that doubt in your head. You're like, oh crap. How big is this? What's the connection gonna be like? But when things just work, I'm a happy man. So, yeah. Alright. So this is Cortex cloned, based on what we've just seen here. I'm assuming you just want me to go ahead and try and build this. Yes. That's all. It's simple. It's just a single binary. Whether it's scaled horizontally or you want to run it on a single node, it's always a single binary. Nice. And how long does the build take? It should be pretty quick. Like, the first

9:57 build is going to be slow. Yeah. It's done. It's alright. I think I've thrown a fair bit of, blah blah blah. Words. I think I've thrown a fair amount of hardware at this. Yeah. We've got 64 cores and, you know, 400 gig of RAM. So it should be snappy. So I'm assuming now I have a Cortex binary, which I do. Yes. And you need to pass some flags, like you need the config that's there in the getting started. Ah. Yeah. So that was... You have a... That's years ago. You

10:27 have a config already given to you to get started. Alright. Do you mind if I just pop that open and we'll take a quick look? Yeah. Sure. Okay. This has a few things which are not required right now, but still, it's there in the getting started. Yeah. I'm just always nosy, curious. I like to see. So, yeah. It's alright. Okay. Yeah. Cool. So once we are done with the hands on, I could explain what's in the config and the architecture, what's running inside. That should be useful. Yeah. I expected this error. So

11:06 if you can see what network interfaces you have with ifconfig, I think we need to change something in the config, or we can use lo, the localhost interface. If you open the configuration file, we've got to change a small thing. Okay. So if you go to the top, yep, there is, yeah, at the bottom of this section, there is the ingester lifecycler. Just below the commented out address, you can add, no, you can comment it out. You can add a section called interface and assign it to lo. Just assign it to lo, and it should

11:51 be all good, I guess. Like, by default, it is in. Yeah. Okay. Yeah. I hope it works right now. Yeah. I probably need to know. It's going make it special. Let's just make sure I have have a workaround for this. Hang on. So what is the why is it what's the problem? I don't think I understand exactly what's going on. It needs a network interface to talk through. Give me a second. Why do I not have that handy? Oh, it's just technically. Yeah. It it I I've often I have I had a comment and oof,

12:47 big machine. Yeah. I always seem to over provision, but I think that's just one of the joys of working for, like, a cloud company that focuses on bare metal. Like, their smallest machine is still quite a large machine, and I just love that I can show that off a little bit. Yeah. So I'm just sharing one more small flag that you can add. Yeah. Okay. Yeah. Let's see. Along with the config file flag, you can add the following flag. Okay. So we're just gonna add this -ingester.lifecycler.interface.

13:20 Yeah. So is that just overriding what's inside the YAML with... Yes. Yep. It's just overriding the defaults. So I'm gonna remove that line from the config then because it seems that that's causing... I'm guessing. Feel free to correct me, but it seems like it just doesn't like that. Yeah. Where is that lo? I think it's a different file. There are multiple files. Yeah. Sorry. Config blocks. Alright. This time, it's gonna work. Right? Fingers crossed. Ta da. Yay. So I guess if you could verify the first log, I guess, it's exposing port 9009, if I'm right.
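The fix discussed here, passing the config file plus a flag that overrides the ingester lifecycler's network interface, can be sketched as a small Python helper that assembles the command line. This is only an illustration: the binary location and the config file path are assumptions (the conversation only mentions a "config blocks" file), and the flag name is written as it was read out on the stream.

```python
import shlex


def cortex_command(config_path, interface="lo", binary="./cortex"):
    """Assemble the argv for a single-binary Cortex run.

    The interface flag overrides whatever the YAML config says,
    which is the workaround used in the episode.
    """
    return [
        binary,
        f"-config.file={config_path}",
        f"-ingester.lifecycler.interface={interface}",
    ]


# Hypothetical path: adjust to wherever the getting-started config lives.
argv = cortex_command("./docs/configuration/single-process-config-blocks.yaml")
print(shlex.join(argv))
```

Printing the command rather than running it keeps the sketch safe to try on a machine without Cortex built.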

14:13 If you go a little bit up, it will be useful to... Yeah. 9009. There we go. Yeah. Yeah. So in the next step, we can start running Prometheus. Okay. No worries. Let me just... So that we can push some data. Grab this IP address again so that we don't have to... I guess I could just background that, but we'll leave it running so the logs are visible if needed. So Prometheus now, are we building that from source? Or... Yeah. We could do it from source or we could just get the binary. It should be, so it should be

14:30 Running Prometheus with Cortex

14:47 faster. Yeah. Probably. I mean, I don't even know if Prometheus is available in the Ubuntu repository. We just... It is. Yeah. Then that means I can just start it with systemd. Okay. Let's do that. Oh, okay. That might be too outdated. Yeah. It should be fine. But it would be simpler if we do it this way because we could change some config and other stuff. Alright. Okay. We'll go with your judgment here. Mine is always wrong. So, yeah. Because I'm familiar with the repository stuff, so let's go with that. Alright. So let's just quickly build that. Hopefully,

15:30 it won't take too long either. Yes. Until then, we can see the config that we need to change. Like, there is a default config where we need to add a few things. Okay. So right away, I'm looking at this bit of YAML, and I see something that I'm vaguely familiar with, but it'd be really cool if we could take the opportunity just to make sure my knowledge here isn't wrong. Now I see remote write, and this is something that a lot of people in the Prometheus community and users always talk about: Prometheus's ability for remote reads and writes. Can

15:35 Configuring Prometheus Remote Write

16:04 we just cover how Cortex is integrating, or is that part of your architecture thing, and I'm just jumping the gun a little bit? Yeah. So when you talk about all these long-term storage systems, they have to get data somehow from Prometheus. So Cortex uses remote write, where as and when Prometheus scrapes the samples and stores them in its own TSDB, it pushes them to this URL. This URL turns out to be Cortex here, so it pushes the samples to Cortex. That's how Cortex gets all the samples that it stores.

16:35 So when you have multiple Prometheus instances, all of them point to the same Cortex and all of them just push to the same Cortex. Okay. So does that mean that Cortex is just a backing store? So Prometheus is still doing the scraping. Yeah. To build Prometheus. Fetching the metrics. Okay. Yeah. Prometheus is fetching them. Cortex is the storage, and you can also query Cortex. Right. Excellent. Okay. I think I actually understood that. So prometheus.yaml. And... Yes. You can just add the thing. Just add it. Right. Yeah. Just add it like that.
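To make the remote write step concrete, here is a minimal sketch that adds a `remote_write` entry to a Prometheus configuration held as a Python dict and prints it as JSON (which is also valid YAML). The push URL is an assumption: port 9009 comes from the episode, but the path shown is only one form Cortex has used, so check it against the getting-started guide for your version. The base config is a made-up minimal example.

```python
import json


def add_remote_write(config, push_url):
    """Return a copy of a Prometheus config dict with one remote_write target appended."""
    updated = dict(config)
    updated["remote_write"] = list(config.get("remote_write", [])) + [{"url": push_url}]
    return updated


# Hypothetical minimal Prometheus config, scraping only itself.
base = {
    "global": {"scrape_interval": "15s"},
    "scrape_configs": [
        {"job_name": "prometheus", "static_configs": [{"targets": ["localhost:9090"]}]}
    ],
}

# Assumed push endpoint; the path differs between Cortex versions.
CORTEX_PUSH_URL = "http://localhost:9009/api/prom/push"

config = add_remote_write(base, CORTEX_PUSH_URL)
print(json.dumps(config, indent=2))
```

Every Prometheus that should feed the same Cortex would carry the same `remote_write` URL, which is the "all of them point to the same Cortex" idea from the conversation.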

17:17 You could make one more addition. Like, in the scrape config, you can also scrape Cortex. Like, just there, you can see the scrape config for Prometheus. Yeah. We could do that under the name of Prometheus. So you want me to add a new job called Cortex rather than adding it as a new endpoint or target to the Prometheus job? Yeah. Yeah. That would be clearer. Yeah. So you gotta keep me right because I'm a bit of a cowboy when it comes to this stuff. I just wing it. Alright. Cortex.

17:52 9009. Copy to one to many. And if you go to the top of the file, just so that everything is quicker, you can change the scrape interval to two seconds maybe, because then you don't have to wait fifteen seconds for every sample. Yeah. And everything should be fine. Alright. I trust you. So we have cloned Prometheus. We built it from source. We should have the Prometheus binary. Yes. Which we do. And then we're gonna run it with the configuration which is modified. Yes. I like it when things work. So, okay. Now it should be pushing the data to

18:36 yeah. It's remote writing to, what is it, Cortex right now. So if you could open the Prometheus UI, or even if you want to spin up Grafana and go through that, we can see if it's actually scraping Cortex and see the metrics that are ingested in Cortex. Yep. So by default, does this Prometheus bind to all interfaces? Or do I have to do some asset compilation? No. It should have been all fine. But maybe we have to do asset compilation. We can do that. Yeah. I guess, because all we did

18:41 Checking Basic Setup (UI & Metrics)

19:15 was build the Go binary. We haven't actually done any of the React stuff. Yeah. You can do a make assets, and then rebuild Prometheus. Hold on. This is taking a second. Yep. I think it's trying to exit nicely. Yeah. It's probably trying to end the remote writing and then come back. Oh, I broke Cortex with that. Alright. We'll get it back up in a second. I'm not sure what exactly happened there. Prometheus is ignoring me, so I am gonna just pkill. Yeah. Something has to happen in my file. There

20:06 we go. Right. So make assets. Assets. apt install. Yeah. Yarn won't be available because it needs a special command. It's weird on the Ubuntu and Debian packages. Yarn is like this random thing. So I just need to grab the command over here. I guess I should actually probably just do that here so people can see. That's unfortunate. It's alright. It's a nice quick fix. So, oh, there is an Ubuntu version. Don't give me your nonsense. If you could install npm, you could install it with that. Yeah. Yeah. Let's do that. So

20:48 install npm. Typing is not my strong suit, ever. That was quick. Yeah. Yeah. The network connections are pretty solid on these machines. So it shouldn't take us too long. Once npm is installed, yeah, we'll just do the Yarn way of doing this then, which was the... There is an Ubuntu package though. But I'm not gonna fight with that today. We'll do that. Is that available in my... No. No. Let's do export PATH equals PATH colon and user local lib modules. Damn it. I got bin. No. That is there. Right? What did I get wrong?

21:57 ls /usr. Oh, yarn is a directory with a bin. Alright. Let's just really fudge this for now. Yeah. Good. It doesn't have to be pretty. Right? We just need to get it working. So, yeah. There definitely is an Ubuntu apt repository, you know, people shouldn't be expected to follow these horrible steps. But, yeah. Alright. It's not really people after this. Okay. Now one thing I can't throw hardware at is npm install, unfortunately. I think this might take a few seconds. Yeah. Not too bad, actually. Yeah. We don't need much of a UI for

22:47 Prometheus. It's pretty simple. Ironically, it may have been faster just to apt-get install Grafana, now that I think about it. Oh, well. Right. And that's, I mean, yeah. Sorry. On you go. Yeah. The getting started has a step for Grafana. Oh, it does. Yeah. Oh, well. We live and we learn. Even though we have all these cores available, I don't even think that Yarn particularly or npm or Node particularly uses them. So do I need to do a rebuild? Yeah. Yeah. This is okay. And run the commands again. I'll start Cortex first

23:34 and then start Prometheus. And then by magic today. Yay. We got there. Okay. Yeah. We can maybe run a query. Like, you can say rate of, yeah, cortex underscore, cortex first. Ingester underscore ingested, yeah, ingested samples total. Ingested samples. Alright. And we want to rate that. Rate over maybe ten seconds. Ten seconds? Yeah. The ten seconds should be for the metric and not the rate. Oh, I'm failing at PromQL. What did I get wrong? Like, the ten seconds and the square brackets should be inside the round brackets. That's for the metric, and you can see

23:41 Verifying Metrics Ingestion

24:27 the graph. There's a graph tab. Yeah. You can change this time to five minutes, like, the range of the query to five minutes. And they're not there. Yeah. Yeah. Below that. Yeah. One hour is too much. There we go. You can see some samples ingested. And, yeah, it's going to Cortex, and Cortex is storing everything. That's sweet. Excellent. Good. Yeah. It's as simple as running a binary, like, both Prometheus and Cortex, and you have a Cortex running. Okay. So Cortex runs. Prometheus handles all the scraping. It's remote writing into Cortex.
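The bracket placement that tripped things up (the range selector belongs on the metric, inside the call to `rate`) can be shown with a short sketch that builds the query string and a URL for the Prometheus instant-query HTTP API. The metric name is a best reading of what was spelled out on the stream, and the base URL assumes a local Prometheus on its default port; nothing here contacts a server.

```python
from urllib.parse import urlencode


def rate_query(metric, window):
    """Build rate(<metric>[<window>]); the range selector sits on the metric."""
    return f"rate({metric}[{window}])"


def instant_query_url(base_url, promql):
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


# Metric name as heard in the episode; treat it as an assumption.
query = rate_query("cortex_ingester_ingested_samples_total", "10s")
url = instant_query_url("http://localhost:9090", query)
print(query)
print(url)
```

With a two-second scrape interval, as set earlier in the episode, a ten-second window covers several samples, which is why such a short range still produces a graph.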

25:07 Yes. That's it done. It worked. Right? Yes. It worked. Okay. What's next? Yep. Maybe I can explain a bit of the, not exactly the entire architecture. We could go back to the previous plan that we had. I can show a demo of horizontally scaling Cortex with other components involved and then talk about the architecture, and we could replicate the thing on your machine. Let's see... That's good. Yeah. How that works. That's good. That's good. Yep. Now I'm gonna share my screen. If anyone watching has any questions, feel free to leave them in the comment section.

25:20 Slides and demo - Cortex architecture and scaling

25:47 Yeah. There we go. So I'm reusing the slides by Marco, who is also an engineer at Grafana Labs and also a Cortex and Thanos maintainer. So I just took his permission to use his slides, so I don't repeat the work. So this is the bare bones Prometheus, which is scraping from services, and you can query it. And I will jump to what we just did. By the way, here's Marco. Yep. What we just did right now is this thing. We have a Cortex, and Prometheus is just pushing samples to Cortex. And when we want to query, like,

26:26 right now, we queried using Prometheus. In Grafana, where you have a Prometheus data source, you could point it to Cortex and query that instead. We skipped that step, but I will include that in the horizontal scaling that we're doing. So right now, we just had one Cortex. What we're going to try now is this thing. We will have multiple Cortex and Prometheus pushing to them. And for the object store, we'll be using MinIO because we don't want to use AWS or GCS or anything right now. So I will jump to my terminal for horizontally

27:05 scaling. Could you just zoom in a little bit on that? Sure. Perfect. Thank you. So, like, under my account, I'm using the demo. In the development branch, we already have, like, pre-specified Docker configs, or Docker Compose configs, where you can just run a single command to run the single binary with horizontal scaling. I just created another demo with a little more simplified stuff, without all the things that we don't need. So I'll first run the demo, and we'll go through what's there inside. You have to ignore the

27:59 all this part thing because it's just a mess that I ended up with in my environment. The only thing that's here is compose up. Before that, let's see what all we have in this directory. We have a compose up and a compose down, where compose up just brings up the Docker Compose and spins up multiple things. So I'm just gonna run the compose up. Okay. While this is happening, we can go through the config that we have here. Can you just zoom in a little bit more on that? Yeah. So let's see what all we are running

28:45 in this particular thing. So this is the Docker Compose config. We are running MinIO to mimic S3 or GCS, and then we are running Cortex itself. And I said we need Consul here, but we will get to why in just a second. We have a load balancer, so that will be the ingress, and this will pass on all the things to Cortex. And we are running two Prometheus just to mimic the setup, which will be pushing data to Cortex. If I'm going fast, just stop me and tell me to re-explain stuff.

29:25 We have node exporter so that we can get some more traffic. Like, Prometheus will be scraping Prometheus itself, like meta monitoring. It will be scraping Cortex and node exporter just for some additional data. And we are running Grafana so that we can query Cortex. And we are running Consul. Like, we don't need Consul right now, but we are running it anyway. Yeah. And one small detail to know right now is we are running with a replication factor of three, which means in this horizontally scaled setup, whenever we get a single sample,

30:05 it doesn't just go to one single Cortex. It goes to all three. So when you ran Cortex on your machine, it was running with replication factor one. And when we are running with replication factor three, we need to have a quorum of two, like, two Cortex out of the three should say that they have written the sample successfully. Only then can the request succeed. So if I move back to my terminal, you'll see lots of errors which say at least two replicas are required, but it could only find one. So what we will

30:47 we'll see what's running right now. Yeah. We have a single Cortex running. So let's scale it to three so that these errors go away. Yeah. We have three Cortex running, and the logs, it's getting to a sane state right now, I guess. So, basically, what's happening right now is, let me go to the beginning where Prometheus is scraping and how it's pushing data. Okay. Alright. Before doing anything, let's see if it's working properly. So this is the bare bones Cortex UI that we have. There is nothing special. If you go to the ingester ring

30:56 Scaling Cortex Instances in Demo

31:42 status, it tells you that there are three Cortex running here, and all of them have an almost equal share of the incoming data. So that's all we need to know right now. Let's go to a fresh Grafana that's running inside. And we'll add a data source. So Cortex is 100% Prometheus API compatible, even on the read path. Like, the read path and the push path, everything is Prometheus compatible. Oh, nice. So we will choose Prometheus as the data source. And because this Grafana is running inside the Docker containers and we have an NGINX load balancer, we can point to that.
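The errors seen before scaling to three instances follow from the quorum rule described in this chapter: with a replication factor of three, two successful writes are needed. A minimal sketch of that arithmetic, assuming the usual majority formula (more than half of the replicas), looks like this:

```python
def quorum(replication_factor):
    """Smallest number of successful replica writes that forms a majority."""
    return replication_factor // 2 + 1


def write_succeeds(acks, replication_factor=3):
    """A push succeeds once at least a quorum of replicas acknowledge it."""
    return acks >= quorum(replication_factor)


# One instance running with replication factor 3: writes are rejected.
print(write_succeeds(acks=1))
# After scaling to three instances, two acknowledgements are enough.
print(write_succeeds(acks=2))
```

With replication factor one, as in the single-binary run earlier, the quorum is also one, which is why that setup worked with a single process.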

32:00 Querying Scaled Cortex via Grafana

32:41 And the Cortex Prometheus API starts with this prefix, so we mention that here. Let me just see if the data source is working. So I'll run the same query that you ran just now, which is the Cortex ingested samples one. And we see the samples being ingested here. If I just go to last five minutes, something broke. Where did that... Okay. We have some samples being ingested, like, 3,000-something samples per Cortex. So this is one Cortex. This is the second Cortex. This is the third Cortex. So we have some samples incoming. Let's see. Let's try to scale up, and,

33:40 it'll redistribute the allocations in the ring. Maybe I'm going too fast, but I'm just going to verify Prometheus is scraping everything. It is. No. You're okay. It's all good. Let's scale Cortex to five. We add two more. We've got to wait for a few seconds, and we can check this ring status, which is available here. And we have two more incoming, all of them having an almost equal share of 20%. Let's do last... Can we explain some of the vocabulary here then? Like, what is a Cortex ring? Yeah. Yeah. We will go deep into that.

34:39 Oh, okay. There are lots of things to, yeah, as our time permits, we will go deep into that. So we were writing from two Prometheus replicas, and we are actually running two Prometheus replicas, which means it's... But Cortex also has support for that, where it deduplicates stuff. So if I just quickly go through the config, we are saying enable that, and we use Consul only for this, where it elects one of the Prometheus as the leader and only samples from that one are accepted. And if you go to the Prometheus config, it has a label called replica

35:28 and a cluster. Replica and cluster. So for a common cluster, it selects one of the replicas. That's how it works. And if you go to the tracker status, you can see which Prometheus was selected among the two. So we are actually demoing, like, these are two Prometheus going to the load balancer, getting deduplicated here, and we're just spreading out over the Cortex. Now you said we will talk about the ring. Why not? Yeah. So you need some kind of logic so that it can spread the writes equally to all the Cortex. We actually have that explained

36:15 in one of the slides here. Why don't I reuse it? So let's say we have some random ring like this, and instead of 10, since we have five Cortex right now, we will have five entries in the ring, saying Cortex one to five. And in the config we mentioned that we want to have the number of tokens as 512. So each one will randomly select that many numbers between zero and the int max 64. And whenever a series is incoming, we hash the series, and the hash is between zero and int max 64. And based on which ingester holds a token for

37:15 this particular hash, that sample goes to that particular ingester. Did that make sense? Yes. Can I try and repeat that back to you in my really naive understanding and we'll see if... Yeah? Sure. Can we go back to the diagram with the load balancer and the Cortex in it? Yeah. I'll go with this one. That one. Yeah. Right. Okay. So I just wanna make sure I got this right. We run multiple Prometheus, each scraping the same targets. Let's assume it's on a Kubernetes cluster so that we have higher... Yeah. High availability. If

37:51 one of those Prometheus goes away, it's still being scraped by at least one. Yes. Those Prometheus are being consumed by Cortex. And Cortex, if I understand it correctly, it's not actually doing deduplication explicitly, but it's actually doing deduplication by hashing the metrics, and then it's sharding them across the many Cortex. Is that right? I think I got you confused here. So when Prometheus is pushing samples to Cortex, it has some additional labels which say I am from replica one, I'm from replica two from the same cluster,

38:26 and we have configured Cortex to enable HA deduplication. So even before we do that hashing thing, Cortex elects one of the Prometheus as the leader among those two, which we see in the tracker status. So it will blindly discard all the samples coming from one of the Prometheus and only accept samples from the second Prometheus, and remove the replica labels that we have. So after that step, we do the hashing. Ah, okay. Okay. I gotcha. Okay. Yeah. Now we can go ahead with what you're explaining. Okay. So there is the deduplication based on the labels that have

39:08 been added by the Prometheus, it's then hashed and then assigned and then replicated across the Cortex based on the replication rules that are in the configuration. Yes. Alright. I'm with you now. Nice. So let's say among these five ingesters, the hash went to, let's say, Cortex number three. Let's say we have only five right now because we ran five. Let's say the sample went to Cortex three, so it will be given to Cortex three, Cortex four, and Cortex five. And in case one of the Cortex has crashed, let's say four has crashed,

39:48 it will still try to write to three of them. But once it writes to two of the Cortex, it will return the request saying that, hey, I'm done with the quorum of two out of three, I can return. But it will still try to write to another Cortex, which will be the next Cortex, which is one. So no matter what, it'll try to write to three Cortex, which is the replication factor, while satisfying at least the quorum. So that's how the replication works. And when you are querying,

40:22 there is something called the querier component. It'll query all the Cortex for the samples because you don't know where you might have gaps, maybe because of Cortex rolling out or some Cortex being down. So when you hit a query, it goes to all five Cortex. Like, you hit a query to one of the Cortex. It will send a query to all five Cortex, and the samples come back to the Cortex where the query originated. It merges the duplicate samples, and you finally get back the data to your Grafana.
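
The write path described above (hash the series, pick the owning instance plus the next ones on the ring, skip crashed instances, succeed on a quorum of two out of three) can be sketched in Python. This is only a toy model under stated assumptions: each instance holds a single token, the hash is SHA-256 rather than whatever Cortex uses, and the names `Ring`, `replicas_for`, and `write` are hypothetical and not part of Cortex.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a string to a 32-bit ring position (toy hash, not Cortex's)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Toy hash ring: one token per instance (assumption for brevity)."""

    def __init__(self, instances, replication_factor=3):
        self.rf = replication_factor
        self.tokens = sorted((token_for(name), name) for name in instances)

    def replicas_for(self, series_key, healthy):
        """Walk clockwise from the series hash, skipping unhealthy instances,
        until `rf` healthy instances are collected or the ring is exhausted."""
        start = bisect.bisect_left(self.tokens, (token_for(series_key), ""))
        picked = []
        for i in range(len(self.tokens)):
            _, name = self.tokens[(start + i) % len(self.tokens)]
            if name in healthy:
                picked.append(name)
            if len(picked) == self.rf:
                break
        return picked

    def write(self, series_key, healthy):
        """Return (replicas, ok); ok means a majority quorum is reachable."""
        replicas = self.replicas_for(series_key, healthy)
        quorum = self.rf // 2 + 1  # 2 out of 3 for the default factor
        return replicas, len(replicas) >= quorum
```

With five instances and one of the chosen replicas marked unhealthy, `write` still returns three replicas: the crashed one is skipped and the next instance on the ring takes its place, which matches the "three, five, and then one" example in the conversation.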

40:58 Awesome. Very cool. So I see the usage of Consul here. Is that because you're running this in Docker Compose? Like, I'm wondering, if Cortex is running on Kubernetes, does it use the lease capabilities there for the leader election without Consul? That's a good question. So yeah, Cortex either needs Consul or etcd for this. Like, this tracker status, whatever you are seeing here right now, is from Consul. So to elect a leader among the HA Prometheus, you need either Consul or etcd right now. And okay, I'm just hitting wrong tabs.
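
The HA deduplication step discussed above (elect one Prometheus replica per cluster, drop samples from the other, strip the replica label before hashing) might look roughly like this in Python. It is a sketch, not Cortex's implementation: the class `HATracker` is hypothetical, the label names `cluster` and `__replica__` and the 30-second failover timeout are assumptions for illustration, and the failover rule itself is not covered in the conversation. The in-memory dictionary stands in for the state that the conversation says is kept in Consul or etcd.

```python
class HATracker:
    """Tracks the elected Prometheus replica for each cluster (in-memory toy)."""

    def __init__(self, failover_timeout=30.0):
        self.failover_timeout = failover_timeout  # assumed value, in seconds
        self.elected = {}  # cluster -> (replica, last_seen_timestamp)

    def accept(self, labels, now):
        """Return the labels without the replica label if the sample comes
        from the elected replica, otherwise None (sample is discarded)."""
        cluster, replica = labels["cluster"], labels["__replica__"]
        leader = self.elected.get(cluster)
        timed_out = leader is not None and now - leader[1] > self.failover_timeout
        if leader is None or (leader[0] != replica and timed_out):
            leader = (replica, now)  # first writer wins, or fail over
        if leader[0] != replica:
            return None
        self.elected[cluster] = (replica, now)
        return {k: v for k, v in labels.items() if k != "__replica__"}
```

Only samples that survive `accept` would go on to the hashing and replication step, so both replicas of a pair end up as a single series in storage.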

41:43 And this ring, initially, we used to store even this ring in Consul, but now you can use gossip. The way we ran the Cortex right now, it uses gossip to communicate among the Cortex, and the ring data is just propagated lazily to all the Cortex so that you can do the replication properly. Yeah. Consul is only required for HA. If you are not using HA, you don't need Consul. Ah, okay. Perfect. Nice. Yeah. Do you want to dive deeper, or do you want to get hands on first

42:23 on your machine? Okay. I mean, I think the problem with diving deeper is that we're limited by my knowledge. So why don't we get hands on? And if there's anything that people have in the comments, we can try and tackle that. Now we have had one which has been tackled by, I'm assuming, one of your colleagues. Rick asked, how does the memberlist ring work with Cortex processes split out? Where should the join_members value be set? Context: using the Helm chart, is it supported there? I mean, I don't understand that question, but if you do, maybe

42:56 you could try and explain that. So should I point join_members to all Cortex or I think there's a config somewhere here. Yes. So in the memberlist, we are saying get the DNS records of Cortex. So it asks the DNS for all the entries of Cortex. And on port 7946, like, it is the default port that we use for memberlist. You can change it in the Cortex config. Okay. I don't think I was sharing my screen. My bad. Let me go through that again. Yeah. I was just about to ask.
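
The DNS-based peer discovery described here (ask DNS for every record behind the `cortex` name, then gossip with each address on port 7946) can be illustrated with a short Python sketch. The function name `discover_peers` and the injectable `resolver` parameter are hypothetical; Cortex performs this lookup internally, and this only shows the idea of resolving one service name to many peers.

```python
import socket


def discover_peers(hostname, port=7946, resolver=socket.getaddrinfo):
    """Resolve every IPv4 address behind `hostname` and return a sorted,
    de-duplicated list of 'ip:port' strings to gossip with."""
    infos = resolver(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (ip, port) tuple.
    addresses = {info[4][0] for info in infos}
    return sorted(f"{ip}:{port}" for ip in addresses)
```

Inside a Docker Compose network where the service is named `cortex`, calling `discover_peers("cortex")` would be expected to return one entry per running container, which is why scaling the service up or down needs no config change.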

43:36 Yeah. I was just explaining. There you go. Okay. Yeah. So in the config, we have this memberlist stuff where we are saying, take the records of this Cortex, because in Docker Compose, we are running it as Cortex. So the DNS entry will be Cortex. And in the config, we say, take the DNS records of Cortex and do all the gossip things that you want on port 7946. If you are wondering why 7946, let me go to the configuration. Let's go to the configuration file. I can just search 46, and it just happens to be the default

44:21 that we used. You can change this memberlist config to bind to any port and then use that port here. So that's how you specify. So that DNS plus Cortex, is that doing, like, a headless lookup, fetching all the IP addresses that resolve to Cortex and then discovering them that way? Right. So okay. And doing gossip on this port. Okay. Got it. I hope that answers. Yeah. I understand that. I don't know if it answers the question specifically, but, Rick, feel free to get back in touch. And I know that has

44:57 also replied, so hopefully that covers that. Why don't we get hands on again, and then we'll take it from there. So I should be able to replicate this. Right? Yes. That's what we're gonna do. So Yeah. You might want to now clone my demo because the one in the upstream is a little different. Alright. Let me SSH onto the first node again. And we wanna clone https://github.com/codesome/cortex. Cortex. Right. And a Cortex codesome. Alright. Yeah. You go to Cortex. Yeah. It's just always my typing that lets me down.

45:00 Walking through the demo ourselves

45:51 Okay. So You can go into development and then oh, yeah. You should fetch the demo branch. My bad. The master is in sync with the upstream. Demo. Right? Yeah. Okay. And you can just do dot slash compose up, and everything should be up and running. But we have to scale up Cortex as we saw before. Okay. And that's just gonna run what we've already gone over. So it's gonna run MinIO, which gives us S3 compatible blob storage on this. We run Cortex with the configuration. We're running a load balancer for the Cortex.

46:39 Yes. And then we have our two Prometheus, which is our HA scraping. We've got a node exporter, which I'm assuming we're just grabbing some metrics from. And then we have Grafana and we have Consul. So we covered all this. We can run compose up. And it's just gonna pull down some Docker images. But again Yeah. It should be pretty quick, I hope. Yep. And I'll just put that repository back on the screen. So if people wanna play with this in their own time, you can clone that repository, check out the demo branch, and run this

47:09 yourself. Yeah. Also, if you want to just run the upstream Docker Compose demos, you could also do that. It's straightforward. It explicitly mentions Cortex replicas, but in this demo, you can dynamically scale up and scale down. That's the only difference. Okay. Cool. So there we go. So the ports are being used. So I should probably shut down the other stuff. Yeah. Yeah. Yeah. It's using the Prometheus port 9090. Cortex. Yeah. Cortex was already down. This is. Yeah. Okay. Cool. Yeah. So let's just run that again. Alright. And it's up. Yes. So we want

48:06 should be getting some error message. Yes. We want to scale up the Cortex. The error message here is just saying that it could only find one and we need at least two. And that's just so that we have a quorum among the Cortex cluster. Right? Yes. Yeah. So if I go to the Cortex codesome development demo, and then we could do a docker compose scale Cortex five. Right? Yeah. We can do five. It should be cortex equals to five, I guess. Ah, cortex=5. And then we should see I guess it might just take a few seconds, but the

48:43 error message has gone away. We can see now other things and stuff happening. So Yeah. Yeah. And to see everything in action, you can open this IP on port 8080 in a browser and see all the Cortex running and the status. Oh, why is it bad gateway? Can I see the terminal if it said something? We're getting a connection refused upstream. No. That's wrong. I just ran that. You could try again. I'm not sure. Let's take a look. Right? So we could do docker compose ps. Let's just make sure everything is running.

49:35 Yes. Everything is fine. Just see. Load balancer. Yeah. Yeah. It's definite let's see. I'm usually good at fixing broken stuff, so let me see. So is there something to worry about? Should we just is this a red herring? But we have No. That's fine. That's totally fine. K. So the only problem we have right now is NGINX. So let's take a look at what's going on with this load balancer setup. So if I do a docker compose exec load balancer bash. We'll take a look at the NGINX config. And let's see. Cortex dot YAML.

50:23 Of course. Let's just. Yep. NGINX config. Yeah. NGINX config should be. Yeah. It's pointing to Cortex. Okay. So it's just a proxy pass to Cortex as a DNS name. So let's curl Cortex. And that looks okay. Right? Yes. If you do Cortex eight eighty. No. Cortex should be on port 80, but Yeah. So from the host, let's try localhost 8080, bad gateway. That's interesting. Because we are pointing the load balancer's 80 to localhost 8080. Yeah. Okay. So why is NGINX unhappy even though it looks like to me it should Yeah. Be working

51:26 upstream. That is really weird. I'm gonna restart NGINX. I think we want to compose down once because it failed initially, or did it not? I don't remember. Yeah. I guess it yeah. Let's do that then before I start just kicking things. Okay. Yeah. You can do dot slash compose down. Yeah. Dot slash compose down. Yeah. I'm trying my best. My brain says one thing, my fingers type another. It's a common problem. Alright. Let's run this back up again. And we have to scale up again. Hopefully, everything is fine this time. Yeah. Let's scale this up again.

52:20 K. Cortex. Codesome development demo. Oh, let's just scale. Cortex equals five. Alright. So we should see those errors disappear again. This looks good. There. Yay. Transient error. Yeah. Alright. So what we're seeing, the ingester ring status, will show you that we are running five of them. Alright. I mean, that's pretty simple then to scale horizontally. Like, I mean, can I take any number there? Like, if I'd scale this let's say I do 50. That's probably a really stupid number. I'm now instantly regretting that. Yeah. Let's not scale down now because scaling down

53:13 requires you to scale down one at a time. So that's going to take a lot of time, so let's play with 50 right now. Okay. Sorry. I should've run that past you first, but, I mean, that's pretty cool. We've just scaled up our Cortex to 50 nodes. And they're slowly becoming active. So Yes. Nice. Okay. So yeah. If we submit a little bit. Alright. Good. So I think still a couple pending, but it's getting there. So Yeah. We have 50 Cortex running. So that's pretty cool. Like Yeah. I could just scale that to the and what are

53:52 the scaling constraints on Cortex then? Like, when do I scale up Cortex? Is it dependent on how much data I have coming in? Is it ingest volume, flow rate, like, cardinality? What are the constraints there? So there are a few things. Like, one is the write, which the like, I can actually go through the architecture right now and talk about when would you split it into microservices mode and what component is hit by what. Yeah. Go for it. That would be great. Yeah. I'll actually share my screen. It will be easier to show the architecture with that.

54:04 Cortex Architecture Overview & Components

54:29 Yeah. Is my screen visible now? Yep. It is. We can see the hash ring. Yeah. So where is the overall architecture? Sweet. Yeah. So when you ran a single Cortex binary, it is actually running all these components inside a single process. So these are all the components that you have. So in the write path, we encounter distributor and ingester. The ingester is the final place where the data goes. It stores it in memory. Ingesters have the tokens, etcetera. And the distributor reads the ring, determines which ingesters it has to go to, and then it just sends the data to

55:26 the ingester. So if we take the ring page when we have five Cortex, or let's say now we have 10 Cortex, when you hit a Cortex for ingesting the sample, you hit the distributor component, and it selects three of the Cortex, and it goes to the ingester component of the Cortex. So that's the write path. And when we go to the read path, the query frontend does some query optimization, like splitting queries into multiple queries to parallelize the queries and all other stuff. And then sends those parallel queries to a component called querier.
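
The query splitting mentioned above can be illustrated with a small Python sketch that cuts one long time range into interval-aligned sub-ranges, each of which could then be sent to a querier in parallel. The function name `split_by_interval` and the one-day default interval are assumptions for illustration; the real query frontend also has to account for the query step, which this sketch ignores.

```python
def split_by_interval(start, end, interval=24 * 3600):
    """Split the inclusive range [start, end] (unix seconds) into
    sub-ranges aligned to multiples of `interval`."""
    if end < start:
        raise ValueError("end must not be before start")
    ranges = []
    cursor = start
    while cursor <= end:
        # Last second of the interval-aligned bucket containing `cursor`.
        bucket_end = (cursor // interval + 1) * interval - 1
        ranges.append((cursor, min(bucket_end, end)))
        cursor = bucket_end + 1
    return ranges
```

Aligning the pieces to fixed boundaries, rather than cutting from the request's start time, means two dashboards asking for overlapping ranges produce identical sub-queries for the days they share.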

56:05 So querier actually directly uses the PromQL engine that we have in Prometheus, so it's 100% PromQL compliant. So querier asks for yeah. When the ingester has data which is old, it pushes it to the object store that you have. So querier gets the recent data from the ingester and old data from the store gateway. Store gateway is the component designed to interact with the object store. Querier gets the data, runs the queries, sends back the data to query frontend. And because query frontend gets the data back from multiple queries, it, again, merges the data and sends back to the

56:46 querying tool of your choice. Compactor is just another service which merges all the data here. So now talking about where would you like, all that I explained right now, does that make sense? Yeah. I mean, I have one question looming in my head, and it's that we're scaling Cortex as a single binary. Is that how I scale it? Or can I build those components in isolation? Do I scale them all together? Yes. I was going to get to that thing. So let's say what are the bottlenecks when you scale, like, run Cortex at huge scale, and how

57:01 Scaling Strategies: Single Binary vs. Microservices

57:21 do we split this into multiple things. So if you have only a write-heavy Cortex, then the distributor and ingester will be heavily loaded. And if you query anything, because it's a single Cortex running, your queries might suffer because of lots of writes. Now let's see the other way around. You have lots of heavy queries running, spanning multiple days, touching all the Cortex instances. And then because of that, it takes lots of CPU, and your writes are suffering here. So you could scale out your Cortex horizontally with the single binary as much as you

58:01 can as long as both of these are working fine. But when one of the read path or write path is affecting the other, there you would want to split this into multiple components. So what's beautiful is you use this single binary itself to run multiple components. For example, I have Cortex. By default, it has something called target equals to all, which is the single binary that we ran right now, which runs all the components inside the single process. So you could just say just run the ingester. Ah. And you don't want other components. Nice. That way you can

58:51 yeah. I think we have yeah. That way, this is, again, a Cortex binary, Cortex binary, Cortex binary, but you are running distributor and ingester separately. And you go to one of the distributors. You have a load balancer in front. Like, this is a multitenant system, but Cortex does not itself implement any kind of authentication, so you need some kind of authentication in front. You go to one of the distributors, and everything is like before. You can consider this as one of the Cortex, which takes the sample, flushes the blocks, and it goes to

59:28 MinIO. And in the query path, again, this is a Cortex binary, but target as query frontend. Another Cortex binary, target as querier. So once you have this architecture up, if you are getting more and more writes and not many reads, you could just scale up these distributors and these ingesters. And one rule of thumb that you could follow, which is what we follow, is you can limit the amount of memory each ingester is taking or the number of series each ingester is handling based on your machine size. So whenever you are hitting that limit, you scale out the

1:00:07 ingesters. And if distributors are not able to send samples to ingesters, you scale out the distributors. And querier, yeah, the more queries you are getting, you just scale out the queriers. Query frontend, you can just do it to write on. We cannot scale out query frontend right now, but we would be able to do it soon, I guess. But I guess that's pretty standard practice. Like, if I think about all the production systems I've operated over the years, like, the write path is where the traffic is. You know, the queries are the dashboards

1:00:38 that we open when things go wrong and not necessarily something we'd have to, you know, scale independently. Yeah. Yeah. But what happens is when you have this long term storage, a few folks like to have a dashboard open all the time, which might span multiple days. So it's constantly hitting the server with heavy queries. If you have one of such users, then you could see yourself splitting. Like, you could even have a binary running both distributor and ingester. It's not like you have to run a single one. If you want to split, you don't need to

1:01:13 run a single component in a binary. You could have distributor and ingester as a single binary and only split out querier outside. You can also do that. Okay. So there's one thing that we've not really looked at yet, but we are running MinIO. Right? So all of our data right now has been stored on an S3 compatible store. Right? Yeah. As I said, it stays in the ingester, in memory, and the write-ahead log. After some time, yes, it goes to the S3 compatible storage or GCS or anything else. So it stays on

1:01:32 Storage Backends (Object Storage)

1:01:48 the write-ahead log until it has to be flushed to disk, and that flush to disk is then sent to the S3 compatible store? Yes. Okay. Cool. I like that. It's very cool. Yeah. And what's beautiful is we are reusing the Prometheus data storage. Like, Prometheus has blocks of data. So we are recreating the exact same blocks in these ingesters, and we are just flushing those blocks to the S3 compatible storage or GCS. Okay. And has Cortex, like, on the query side then, is that doing any sort of cache to avoid lookups

1:02:20 Caching Strategy

1:02:26 on the store? Correct? Yeah. So the query frontend has a cache, which caches based on the time range and the query that you have. So it can also do partial querying. Like, in Grafana dashboards, when you refresh after every five or ten seconds, you don't need to query for the entire range, but you just need to query for the last few data points. So this cache even takes care of that. So we just query the only thing that we require. Nice. I like it.
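
The partial-query behaviour described above (on a dashboard refresh, only fetch the newest data points that are not already cached) comes down to computing which parts of the requested time range are missing from the cache. A minimal Python sketch, assuming the cached extents for one query are kept as sorted, non-overlapping `(start, end)` pairs; the function name `missing_ranges` is hypothetical.

```python
def missing_ranges(request, cached):
    """Return the parts of `request` (start, end) that are not covered by
    the sorted, non-overlapping `cached` extents."""
    start, end = request
    gaps = []
    cursor = start
    for c_start, c_end in cached:
        if c_end < cursor or c_start > end:
            continue  # extent lies entirely outside what is still needed
        if c_start > cursor:
            gaps.append((cursor, c_start))
        cursor = max(cursor, c_end)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps
```

For a refresh ten seconds after the previous one, the only gap returned is the last ten seconds, so only that small slice has to be sent down to the queriers.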

1:02:59 Yeah. And you also cache, like, queriers need to fetch the data from store gateway and ingester. So querier also caches all the data that it can fetch from these. Query frontend caches the final result. Queriers cache the raw data, so we just cache everything that we can. Cool. Excellent. Is there anything you wanna cover then before we Yeah. For users who want to try it out, like, I will just switch to master just to show what's there in upstream. If you want to, like, run this Docker Compose stuff on

1:03:46 your own machine, in the development, we have three things already set up for you. I'm not really sure about this thing. But the single binary mode will run the individual Cortex horizontally scaled, where you will have, I guess, two or three Cortex running, but in single binary mode. And if you want to run this setup where you want all the individual components separately, you choose this. There, if I want to quickly have a peek, Yep. Here, you will be running the distributor, ingester, everything else separately. So if you want to play around with

1:04:36 single binary or even the everything split out version, you have it ready. Yeah. Alright. I guess that's it. Nice. So let's finish with, like, I'm trying to think, a pragmatic approach for people with their metrics. Right? The vibe I'm getting from the conversation and what we've seen here is that it's probably realistic that people are gonna start with Prometheus in their cluster, and then there's gonna be a certain point where they want to then bring in Cortex for the Yes. The durability, the high availability, the multi tenancy, which is really cool Yes.

1:05:00 When to Adopt Cortex: Pragmatic Advice

1:05:15 And all these things. Like, so is that the best approach for people to take, to, you know, run Prometheus, and then when x y z happens, that's when you go, right, you need to start bringing in Cortex now. Yeah. If you have small Prometheus installations and, like, if you care about a centralized view and all the things that we discussed right now, only at that point you may want to start looking at Cortex. Until then, if the scale is small, you could be all good with Prometheus because Prometheus scales really well. But vertically. Right? That's when we

1:05:48 talk about Yeah. We're talking about Yeah. You know, vertical. So if you can throw loads of really strong hardware at it, then cool. Right? Exactly. This is gonna get you a long way. Exactly. Nice. Okay. And does it make sense to even adopt Cortex purely for the S3 compatible back end as well? Like, you know, I'm assuming that might just be enough of a selling point where people wanna be able to rotate the hardware and still have that back end data. So S3 could be a really good stop gap for that too. Yes. You

1:06:19 could just have Cortex as, you can say, a backup which takes the data. But you could also have a look at Thanos. Like, if you just want the data to be backed up somewhere, Thanos has a more simplified thing where it just ships the block to your S3 compatible storage. And if you want to query later, you can just add one more component. Wow. That's completely serendipitous. Yeah. I have Thanos one week today on the show. We're taking a look at that there. So, you know Yeah. People can come and take a look

1:06:26 Cortex vs. Thanos (Brief Comparison)

1:06:53 at that. Yeah. Both have slightly different approaches and different goals, and if you have huge scale and want a proper centralized view with multitenancy, Cortex makes proper sense there. Awesome. Any final thoughts before we finish up for today? Just try it out. It's pretty cool. Like, initially, when Cortex came, everyone saw the architecture diagram, they got scared. It's pretty complex. But if you saw it here, you could just do dot slash Cortex and you have Cortex up and running. Yes. I mean, personal experience here. There is an intimidation factor when so many of the talks

1:07:11 Final Thoughts and Conclusion

1:07:32 are really you know, they go deep into Cortex and all the different components, and it can be a little bit intimidating, especially when you see people talking about the Cassandra components and all this other stuff that can come in. But I think what we've seen today from your demos and from what we did hands on is that that's all optional. Like, you can get started by running the single binary and running Prometheus Yeah. Something that we're all comfortable with. And Yeah. You can then layer on those extra components. You can bring in Consul when you need HA

1:07:58 and deduplication. You can bring in S3 when you want extra storage options. You can bring in Cassandra when that is available to you. So I think Yeah. You don't even need Cassandra. You don't at all need Cassandra with us. Right? Yeah. I think I'll avoid that whenever possible, to be fair. So Yeah. Alright. That was awesome. I'm really impressed. I think it's a fantastic project, and I'm really looking forward to seeing what comes next. Thank you for joining me today, Ganesh. It's been an absolute pleasure. I hope people take a lot of value from this and have

1:08:30 a great day. Yep. Thanks for having me. Cheers. Adios.
