Hands-on Introduction to KRR (Kubernetes Resource Recommendations)

Overview

About this video

What You'll Learn

Use historical Prometheus data to recommend CPU requests, limits, and memory settings.
The simple strategy uses peak memory and the 99th percentile of CPU usage.
KRR flags missing requests and suggests concrete values for each workload.

Natan Yellin from Robusta walks David through KRR, a CLI that queries existing Prometheus data to recommend CPU and memory requests and limits for Kubernetes workloads. They compare it to the VPA, demo the simple strategy, and discuss future directions.

Transcript

Generated from the English captions. Timestamps jump the player to that moment.

2:29 Introduction and Guest Welcome

2:29 Hello, and welcome back to the Rawkode Academy. Today is an episode of Rawkode Live where we are taking a look at a new open source tool from the team at Robusta to help you improve your Kubernetes resource optimization. To guide us through our demo today, I am joined by Natan. Hello. How are you? Good. It's a pleasure to be here. Yeah. What is this? Stream number three together, I think. So, you know, five months ago, we looked at, oh, what was it? Kubernetes monitoring. And a year ago, we took a look at Robusta. So it's a pleasure to have you back.

3:06 Pleasure is mine. Yeah, a lot has happened since then, both in terms of Robusta and also my personal life. I have a newborn baby at home now, so it just feels like it's been so much longer. Well, congratulations. It's a wonderful thing. A hard thing, but a wonderful thing. Yes. Thank you. For anyone who hasn't seen any of our previous episodes and isn't familiar with you and your work, which, you know, shame on them. You're doing some great stuff right now in the Kubernetes space, and people should definitely be checking out your

3:30 Introduction to Robusta and its Mission

3:39 blog and stuff. But do you wanna give people the TLDR and share whatever you'd like to share? Yeah. So what we do at Robusta, or rather, I'll start with the problem. The problem is that people have all this observability data. I mean, sometimes terabytes of observability data that's sitting there. But when there's an incident and something's going on in production, finding the right thing to look at, and finding it fast, is often very difficult. So what we do at Robusta is to try and help you take your observability

4:12 data there, your existing observability data in Prometheus and other places, and extract from that meaningful and actionable insights. And there are several ways that comes about. So we have an open source project that takes your Prometheus alerts and then adds on extra data and ties it to logs and so on. Then there's KRR, which we're gonna look at today, which is about taking your existing Prometheus data and using that to extract insights about efficiency and about CPU and memory requests. And, of course, we have a commercial platform as well. Awesome. Well, I think we have Pavan and Elbe

4:49 Challenges with Kubernetes

4:53 in the chat. Elbe says congrats, and Pavan says hello, who I believe is on the Robusta team. Right? Awesome. So today, as you said, we're taking a look at KRR, a tool for Kubernetes resource optimization, trying to help people with scaling workloads within their cluster. But maybe before we dive into the demo and talk about what KRR does, we can kind of understand what the challenges are right now with workloads on Kubernetes and, you know, resource requests and vertical pod autoscaling. There's a whole bunch of stuff that isn't really there by default. Right? It's

5:13 The Challenge of Kubernetes Resource Optimization

5:28 not like you just create your deployment, you get your pods, and things magically work. I mean, they could, but as your cluster grows and more workloads come on, and the older your cluster is and the more teams you onboard, there's a plethora of challenges that are inevitably gonna come your way. So, you know, what's your experience of working with Kubernetes, onboarding new workloads, and where do things start to creak and fall apart? So there are a few stories here. The first story is this: you'd expect that as we move to the

5:59 cloud, right, and we adopt all this great technology, and we adopt Kubernetes, and we adopt all these things, then you'd actually see that, as an industry, we're becoming more efficient, and we're becoming more reliable. But in fact, if you look at the past year, we've only seen the opposite happen, with incident after incident after incident. Just this week, I think, Instagram was down for a couple of hours. And what you see is that, like, we'd expect that we're adopting all this great technology. You're going to Kubernetes. You're

6:28 getting all these great things, and then your system is becoming better. But instead, there's all this added complexity. And I think a lot of that complexity is justified. It's good. You're getting these benefits. You're getting all this scaling. You're getting all this stuff. But to set it up and to configure it can sometimes be a little bit challenging. And then if you double click on that and you zoom really in, specifically on Kubernetes and scaling, then there are two things that you're always trying to balance. So on the one hand, you wanna have something

6:54 that's really efficient, so you don't wanna allocate all these resources that you don't need. On the other hand, though, you wanna guarantee reliability, and those always come hand in hand. And there's a trade-off between them. And what I mean by that is, let's say you have one application that requires one CPU, and occasionally, it requires two CPU. Right? Then if you give it two CPU, you're gonna satisfy the reliability. Right? And maybe sometimes it can spike to three CPU. So give it 10 CPU. Give it a hundred CPU. Right? You're never

7:24 gonna have a reliability issue, but you're gonna have a cost and efficiency issue. On the other hand, if you take that same workload that requires one CPU most of the time and you just give it one CPU, now you're gonna run into reliability issues when it needs two. And all of the different technologies involved, like the HPA and the VPA and the cluster autoscaler and all that, are about trying to balance those constraints, and you balance it each time in different ways. But there are always, always trade-offs, and I think that's important

7:52 to understand, that you will never have 100% efficiency and 100% reliability, because by definition, there's a conflict between the two of those. Because with reliability, you're looking at the 1% edge case. And with efficiency, you're looking at the 99% case. Yeah. I mean, I guess to try and summarize that in a single line, right, to understand that trade-off, is that you could get really good resiliency and redundancy and all of that stuff on the happy path, but it's the expensive path. Like, you over provision 300, 400, 600 percent,

8:10 Tradeoffs

8:25 whatever you want. If you've got the money and you wanna throw it away or set it on fire, sure. Go for it. Like, yeah. You're gonna have a better time. Maybe not an easy time, but a better time than most. However, most organizations can't just over provision to that kind of level. They wanna reduce costs when they migrate to the cloud. In fact, cost is usually a portion of why they're doing this. You know, they're making the swap to operational expenditure from capital expenditure, or whatever that term is. I'm not a business person. So then

8:55 what they wanna do is, well, they set all these resource requests so that they say, okay. I expect these workloads to do this. We have a buffer of 20%, 30%, whatever that is for your team. And then you try to scale within that and scale your cluster and your pods at the same time. But that's the tricky path. That's the hard path, because your workloads constantly move. They constantly change. Nodes are ephemeral and are not always gonna be there. All of these things really start to compound and provide a pretty strong challenge

9:22 for your platform or SRE team to keep the cluster healthy, the workloads healthy, and your customers happy. Yes. So you can choose two out of three. Right? It's always choose two out of three. So you have reliability, efficiency, and simplicity. And I say you can choose two out of three because you can get close with efficiency and with reliability, kind of trying to have both of them, but then you go up in terms of complexity. Think of your cluster as a pizza party. You're throwing a party and you don't know how

9:49 many people are gonna show up. So either you order more pizzas and you end up throwing some of them away, that's the trade-off with efficiency, or you order the lower bound of pizzas, and then people end up hungry, so that's your reliability. Or you do something complex where you're counting how many people come in the door and you're looking at the pizzas, and you have a delivery guy coming back and forth every hour, and that's the trade-off for the simplicity. But let's take that now into Kubernetes, and, like, let's zoom in on Kubernetes.

10:10 Understanding Kubernetes Resource Requests and Limits

10:14 So I wanna take an example. Actually, I wanna take some statistics from a recent Sysdig study, and they looked at all the people running Kubernetes in production that they have access to, based on data sent to Sysdig. They looked at all these giant environments and lots of clusters, and they categorized it by, like, big business and small business and enterprise. And across all these companies, what they found is that 69% of all the CPU that people are paying for is actually wasted, and that's an incredible number. And then to translate this, though, people don't see that because,

10:48 like, you open up Lens or you look at your cluster, and you're not seeing 69% waste anywhere. What you see instead is you go and you look at a node, and you see your node is full. There's no room on your node, and maybe you have pending pods. But then you look into the node, you see the CPU is at 10%, and you have low utilization. So to understand all that, I think you need to start at the basics, which is setting just two numbers on Kubernetes. Everything comes down to ultimately setting

11:19 two numbers, arguably four numbers. But we'll start with the two basics. Right? And that's your CPU request and your memory request. And the way to think about this, for people who aren't familiar, is you're telling Kubernetes in advance how much CPU you will need for this pod and how much memory you will need for this pod. And that's just the request. It's just used by the scheduler to decide which node it should put your pod on. Nice. So the reason why we're here today and what we're doing with KRR is we're just trying to help you set those

11:51 two numbers. Oh, it's a noble mission. I'm curious, though. Right? Like, where were you sitting? What were you doing when you were like, alright. Let's write a new open source tool that goes and does what KRR does. Like, what was the motivation? How did you decide this is my next thing that I'm gonna build? So it came up because we heard again and again from people who were using Robusta and were using other stuff, or other platforms. And they came to us again and again. They said, okay. How can I set CPU requests?
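The "node is full but the CPU is at 10%" situation described above can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which the scheduler only compares summed CPU requests against node capacity; the pod names, the 4000m node size, and the usage figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    cpu_request_m: int  # millicores reserved through the CPU request
    cpu_usage_m: int    # millicores actually used (hypothetical sample)


def fits(node_capacity_m: int, scheduled: list[Pod], new_pod: Pod) -> bool:
    """Simplified placement check: summed requests decide, usage is ignored."""
    reserved = sum(p.cpu_request_m for p in scheduled)
    return reserved + new_pod.cpu_request_m <= node_capacity_m


NODE_CAPACITY_M = 4000  # hypothetical 4-core node

# Eight pods that each request 500m but only use about 50m.
scheduled = [Pod(f"app-{i}", cpu_request_m=500, cpu_usage_m=50) for i in range(8)]
pending = Pod("app-8", cpu_request_m=500, cpu_usage_m=50)

reserved = sum(p.cpu_request_m for p in scheduled)
used = sum(p.cpu_usage_m for p in scheduled)
utilization = used / NODE_CAPACITY_M

print(f"reserved by requests: {reserved}m of {NODE_CAPACITY_M}m")
print(f"actual utilization:   {utilization:.0%}")
print(f"next pod fits:        {fits(NODE_CAPACITY_M, scheduled, pending)}")
```

In this toy model the ninth pod stays pending even though the node is mostly idle, which is why lowering over-sized requests frees up room without adding nodes.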

11:58 Motivation for Building KRR

12:26 How can I set memory requests? And what should they be? And the reason it came up is because we have alerts in Robusta about reliability issues. We didn't do anything related to efficiency, but we have these reliability alerts. Like, you're having CPU throttling because your request is wrong or because you set a limit that's blocking that. So we started getting reliability questions, and very often, the answer to that reliability question was you need to set your allocations differently. You need to allocate more CPU up front for this, or you need to allocate more

12:55 memory up front for this in order to make sure it runs smoothly. And we tell people that, then they come back and say, okay. Well, what should I put in for the CPU request? What should I put in for the memory request? And our answer was always like, okay. Well, look at the vertical pod autoscaler. Look at these other systems. And we kept on hearing from people, no. That's not working for me because of x, y, and z. That's not working for me because I can't actually use the VPA because I'm using an HPA at the same time. Or

13:20 I wanna use the VPA, but how do I know, like, when there's a new recommendation I should apply that's off by 20%? Or I wanna use the VPA, but I wanna get recommendations that are stable over time, that aren't constantly changing. So we started hearing about this, and the first thought we had in our mind was, okay. We're gonna take the VPA. For people who don't know, the VPA, or the vertical pod autoscaler, was, I think, until KRR, probably the only real way at scale to determine

13:51 requests and limits in your cluster. I'm probably forgetting something. I shouldn't say that. I'm probably forgetting other tools out there. But it was the number one way. Yeah. I mean, there are other tools out there that don't actually integrate natively with Kubernetes, like Parca, which can profile your application and tell you what the values are over time. But you still have to do the analysis. You still have to kinda look at that and work it out and configure it on the VPA and stuff like that. So definitely, there's a lot of room for tooling to improve and make this easier for

14:01 Proprietary Tools

14:21 teams. And there are proprietary solutions out there as well, I should say. There are proprietary solutions, I think, specifically for Kubernetes that also can do it. But other than the VPA, I think there was no open source way that we liked to do it. And then the first thought that went through our head, and I said to the team, like, we're not developing this from scratch. I was pretty stubborn for a very long time with the team. The team said we need to develop an alternative to the VPA. I said,

14:47 we're not developing an alternative to the VPA. Take the output of the VPA and add on reporting features, give a better experience, but we are not developing an alternative to the VPA. It doesn't make sense. And, eventually, I caved because we looked at it, and we really looked at it from the product requirements and, like, from the product spec and what we wanted to accomplish with this, and we discovered we couldn't do it on top of the VPA. So I was kinda dragged kicking and screaming and, like, objecting the entire way and said, okay. You know what? We will do our

15:17 own recommendations engine. Okay. So I'm not that familiar with KRR. Is it a replacement for the VPA? We do. That's... Sorry. No, go on. Yeah. Sorry. So I saw the announcement. I went on the GitHub page. I kinda gave it a scan. You know how I like to do these things? Like, I like to come in with fresh eyes and really learn from you, right, rather than trying to make my own assumptions up front. But from what I've seen, it's a command I run locally, and there's nothing in the cluster. So I

15:20 KRR vs VPA

15:50 didn't come into this expecting it to replace the VPA. I saw it as something that gave me, like, a VPA starting point or something. I wasn't actually that sure. So maybe you can correct some of my assumptions then. So the VPA has two modes. One mode is it runs in cluster, and it's constantly monitoring your pods. It's very heavily weighting its recommendations towards the last twenty-four hours. You can tune that, of course, but that's the default. And then it's constantly updating your pods. And that we're not doing. We decided,

16:22 at least in phase one, we are not modifying anything in production. We are read only. We will give you a recommendation, but it's up to you to review that recommendation and apply it. This is a human-in-the-loop process. So we're not competing with the VPA that runs in cluster. But we actually did some studies. Like, I did some surveys on LinkedIn, and we did user interviews. And we discovered that most people aren't running the VPA in, like, the automatic mode. Most people are running the VPA in recommendation mode, where the VPA gives

16:55 recommendations, and then the human looks at it. And then you go and you apply that. So for that mode, we are competing with it. We just don't make you run something constantly in your cluster. Ah, got it. Okay. Nice. So that comes back to also, like, why we had to build this ourselves even though I was very much against it at the beginning. To use the VPA, you have to install something in cluster, and you start getting recommendations in the future. Whereas most people already have a cluster. Like, I have a cluster here set up locally

17:28 now. It's had Prometheus forever. When I show KRR in a few minutes, I'm just gonna run KRR on the historical data, and I'm gonna get the recommendations immediately. And that ties back to what we wanna do with Robusta, which is you have all this observability data. I wanna unlock, like, the gold that's already hidden there. So we wanted to make it as easy as possible. We wanted to give you a command you run locally. We didn't wanna add on any friction from running something in your cluster, because maybe you don't have permissions

17:56 to do so. Okay. Well, I guess we can dive straight into the demo, or I can guess what we're gonna see. It's entirely up to you. But you've mentioned Prometheus. You've mentioned KRR is a CLI tool. I'm assuming it's a nice way of running queries against workloads in Prometheus to get top, middle, and bottom values across a certain amount of time and then make recommendations on top of that. Yep. That's right. So let me clear my screen, and I'm gonna share my screen. Let's see. Yeah. Go for it. And

18:00 KRR Command Line Demo

18:37 let me blow this up. Can you see my terminal window? I am just going to make that happen. Alright. Now we see your terminal. But can you make the font a little bit bigger, please? Yes. I will. There we go. Perfect. And can you see I wanna get over here. Let's see if it follows. So this is just the GitHub. So first thing I wanna do, just to give you a teaser, this is what we're gonna get. What we're gonna get in just a minute is we're gonna get these recommendations. So for everything that's running in my cluster,

19:16 I'm gonna find out what should be the CPU request, the CPU limits, the memory request, and the memory limits. And this is based on the historical Prometheus data. So let's go back, though. So the first thing I'm gonna do is I'm just gonna install this. And I can install this a few different ways. I can either install via Brew, or I can install by just cloning the code. So I'm gonna just clone the code. And there we go. And I'm gonna go back now, and I'm gonna install the requirements. And this is with Python. Of course, you

19:53 don't have to have Python. You could just use the prebuilt binaries, so you don't actually have to have Python to run this. Whoops. Clear. And I'm gonna go back. Finally, I'm just gonna copy this command. And I believe in my case, it has to be Python 3.9. And I'm not gonna get anything by default. It's gonna ask me to choose a strategy. And right now, we have one strategy that we call simple, and this is the strategy for how we calculate what the CPU and the memory should be. So the default strategy and the first strategy that

20:30 we implemented, it will find the maximum amount of memory, and it's gonna use that as the baseline for how much memory your pod should get, the maximum amount that it has used in the past. And for CPU, it's gonna go, and I believe I will have to zoom out to get something formatted nicely. Yep. So let's just run that again. And for CPU, it's gonna go and it's gonna take the 99th percentile. So what we're trying to do here is we're trying to understand, based on historical data, how much CPU and memory your pods need. And then by

21:00 setting that number right, we're going to be able to reduce costs while still maintaining reliability. And that is really all there is to it. So I just ran that, and it connected to Prometheus in my cluster, found it automatically. So I didn't need to point it at Prometheus. I didn't need to do anything. And it went and it looked at each of the workloads running in cluster, and then it queried them in Prometheus. And it found here the historical data, and it's telling me what I should set based on that historical data. Nice. So if I take an example here,
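The simple strategy as described in the demo (peak memory, 99th percentile CPU) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not KRR's actual implementation: the function names and the sample values are hypothetical, and the nearest-rank percentile is one of several percentile definitions.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def simple_recommendation(cpu_samples_m: list[float], memory_samples_mb: list[float]) -> dict:
    """Hypothetical helper: p99 of CPU usage and the maximum of memory usage."""
    return {
        "cpu_request_m": percentile(cpu_samples_m, 99),
        "memory_request_mb": max(memory_samples_mb),
    }


# Hypothetical history: CPU sits at 5m with one brief spike; memory peaks at 23 MB.
cpu_history_m = [5] * 99 + [400]
memory_history_mb = [18, 20, 23, 21]

rec = simple_recommendation(cpu_history_m, memory_history_mb)
print(rec)
```

The design choice mirrors the trade-off discussed earlier: memory uses the maximum because running out of memory kills the pod, while CPU can tolerate ignoring the rarest spikes.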

21:33 Analyzing KRR Recommendations

21:37 then here's an example, the Argo CD application set controller. And right now, that has no CPU request, so it's telling me, one, I should set a CPU request, and two, based on historical data, the right amount for this would be five milli CPU. The memory request is unset, but based on the historical data, what this needs is 23 megabytes of memory. And then for limits, for CPU and memory limits, there are different approaches to how you set that. And there is some controversy around that, with different opinions. But we have some options also that we're going to

22:12 expose for how you can control that. Nice. So it scans that cluster. It gives me recommendations. I can pump them straight into my GitOps pipeline, and off I go. I've got a happier cluster. Right? Yep. Yeah. I think what's interesting here is, like, you know, I guess I'm in a unique position, right, as a consultant these days. Right? I don't work on my own cluster. I spend a lot of my time working with everyone else's clusters, which is a blessing and a curse. But what I've seen constantly is that people really overestimate their CPU requests on all of their workloads.

22:17 Overprovisioning

22:50 Like, you know, usually, they're straight in there with the 500 millicores, you know, half a core. And you're like, why? That's a lot of CPU to be thrown at your, you know, three-meg Go application that runs once every fifteen seconds or whatever. And I think just having tools like this that can give people visibility into what the average application really needs from a CPU request point of view, which is much less. Like, I don't think I've ever seen someone with a cluster that applies 5m as their CPU request.

23:20 And I think that's very eye opening. I hope it's eye opening for people that you don't need to go nuts with this. And even where it's not 500, they set it to 200 or they set it to, like, 150. It's always very high numbers. And I'll call out GKE here for being notoriously bad. Sorry, GKE. But if you run an Autopilot cluster, it actually does, like, automatic VPA for you. And that over provisions by a factor of 10, based on what I've seen as well. Like, it's hugely over provisioning.

23:50 Yep. We've gotten some feedback from people that we should over provision a little bit more, to give a little more buffer. So we're actually considering changing it from, like, 5m to 50m. But if you look at the big scale of it, it's not even the 50m versus the 5m. Right? Because how many applications do you really have with 5m or 50m? It's those 500 millicore or those one-core workloads. And if you go back to that Sysdig study, sixty-nine percent is over provisioned on CPU. So take a typical company that has a

24:23 cloud bill of, let's say, I don't know, let's say 5k a month on GKE, right, or on AWS, on Kubernetes. So if you're paying 5k a month, and let me just do the math now, and you're 69% over provisioned, then you're paying almost 3.5k extra a month. That's more than half of your cloud bill. Yep. Yeah. I don't know. I don't think I would go from 5m to 50m even if people are calling for it to be higher. Like, I would encourage people, you know,

25:03 run KRR, but then back it up. Run kubectl top pods and see the low numbers for the current utilization of CPU. Like, it's never that high unless you've got a Java workload in there, in which case you've got other problems. I'm really sorry. But, you know, you don't need high levels of CPU requests for most of them. So this is what I'm actually the most excited about in terms of what we're working on. You get a recommendation like this. The first thing that you're gonna go and do afterwards is you wanna validate this recommendation.
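The cloud-bill arithmetic from a moment earlier (a 5k monthly bill at 69% over-provisioning) works out as follows. This sketch assumes, as the speaker does for the sake of the example, that the 69% CPU waste figure applies to the whole bill, which is a simplification.

```python
# Figures quoted in the conversation; applying the CPU waste share to the
# whole bill is a simplifying assumption.
monthly_bill_usd = 5_000
wasted_percent = 69

# Integer arithmetic avoids floating-point rounding in the result.
wasted_usd = monthly_bill_usd * wasted_percent // 100
useful_usd = monthly_bill_usd - wasted_usd

print(f"paid for unused capacity: ${wasted_usd} per month")
print(f"paid for used capacity:   ${useful_usd} per month")
print(f"wasted per year:          ${wasted_usd * 12}")
```

That gives 3,450 a month, which matches the "almost 3.5k" quoted above.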

25:20 Visualizing and Validating Recommendations (Upcoming)

25:32 Right? And you're gonna go and you're gonna look at a historical graph of this data. And then based on that historical graph, you're gonna, like, draw a line across that and say, okay. How would I have performed on this recommendation in the past three weeks? Right? So what's being worked on right now, which I really wanted to show on the show today, but it wasn't ready on time, and any day now we're gonna have this out, is to be able to also show you the justification for these recommendations, to show it to you on a

26:01 graph and then to draw a dotted line across and say, okay. Here's what we're recommending, and then this is how that would perform on historical data. And so that's one of the things that we're working on right now that I'm really excited about. Because like you said, you're not gonna just go and apply this recommendation, probably. You wanna see why. Right? You wanna visualize it. You wanna go through and do this whole process. And then when we think about how we give you recommendations, it's not just about giving you the recommendation and the number.
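The "draw a line across the historical graph" check described here can be approximated numerically. The sketch below is an illustration only: the function name, the sample history, and the 5m recommendation are hypothetical, and it is not the upcoming KRR feature mentioned in the conversation.

```python
def backtest(usage_samples_m: list[float], recommendation_m: float) -> dict:
    """Count how often historical usage would have exceeded a recommended value."""
    over = [s for s in usage_samples_m if s > recommendation_m]
    return {
        "samples": len(usage_samples_m),
        "exceeded": len(over),
        "exceeded_fraction": len(over) / len(usage_samples_m),
        "worst_overshoot_m": max(over) - recommendation_m if over else 0,
    }


# Hypothetical CPU history in millicores and a hypothetical 5m recommendation.
history_m = [4, 5, 3, 5, 6, 5, 4, 40, 5, 5]
report = backtest(history_m, recommendation_m=5)
print(report)
```

A report like this is the numeric equivalent of the dotted line: it shows how many past samples sit above the recommended value and by how much, which helps decide whether to trust it.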

26:30 It's about giving you the confidence to apply it. Yeah. Definitely. And, like, if we go back to what you said at the start of this demo. Right? You're doing Prometheus requests. How long did you say the data is over? Like, what? Three months or longer? Is it all time? So it depends on your Prometheus. Yeah. Yeah. So if you've got a retention policy of, say, three weeks or maybe it's thirty days, it doesn't really matter. I mean, this is the max value over that retention policy. But I think that's really cool

27:01 for people, just to really get that front and center. Like, you maybe don't need to draw a graph in the terminal. You could even just pop them a link and say, look at this yourself. It's like, go and take a look and see. And then maybe in some cases, you know, we do see a couple of large numbers there. You've got your platform relay using 261. Right? That could have been a spike for a thirty-second window two weeks ago. Like, you really don't... You wanna see it. Yeah. You wanna see it. But so I'll tell you something

27:32 that's very interesting. What the maximum is actually depends a lot on what time periods you're looking at. So imagine that you say, I'm gonna just go into Grafana, and I'm gonna look at a two-week period. And then in that two-week period, if I'm looking at Grafana, I'm gonna take the maximum, right, of CPU. But what that maximum is actually depends on the interval that you're sampling at. Because imagine you have a pod that runs and it's doing nothing all the time, and then it spikes

28:06 for just one second, and it goes all the way up to 10 CPU, and then it goes back down. And now you take it and you graph this data over a two-week period. And when you're graphing over a two-week period, you're, like, taking one-hour intervals and the average over each hour. So you're never gonna see that spike on that graph. It's just gonna be a flat zero the entire time. Yeah. And I think that's still okay. Right? I mean, these CPU requests are a scheduling concern. They're not a runtime concern. It's not like your application can't consume
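The point about sampling intervals can be checked with a small sketch: a one-second spike to 10 CPU almost disappears once the data is averaged into one-hour buckets. The per-second series and the helper name below are hypothetical.

```python
def downsample_avg(samples: list[float], window: int) -> list[float]:
    """Average consecutive, non-overlapping windows, as a coarse graph step would."""
    buckets = [samples[i:i + window] for i in range(0, len(samples), window)]
    return [sum(bucket) / len(bucket) for bucket in buckets]


# One hour of per-second CPU usage: idle except for a single one-second spike.
per_second_cpu = [0.0] * 3600
per_second_cpu[1800] = 10.0

raw_max = max(per_second_cpu)
hourly = downsample_avg(per_second_cpu, window=3600)
graphed_max = max(hourly)

print(f"max at 1s resolution:      {raw_max} CPU")
print(f"max after hourly average:  {graphed_max:.4f} CPU")
```

So the "maximum" read off a two-week graph depends on the resolution and the aggregation behind each point, not only on the workload.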

28:26 Scheduling Concerns

28:38 above these requests. So they're there to help us schedule, so you don't over provision nodes and you reduce a class of problems that probably doesn't really need to exist. Well, considering the cost factor of just infinitely scaling the number of nodes that you have. We have a question from Russell in the chat, so I'm gonna pop that up. Russell asked, does this tool, or could this tool, I guess, have a report that shows the idle time of each node? So, you know, really trying to hammer home that point about that 69%

28:56 Q&A: Node Utilization and Idle Time Reporting

29:09 non-utilization. Like, can you show that on a node basis or pod basis for these workloads? I think that's a good question. Yes. So not yet, but, yeah, I mean, it's also coming. And it matters both on the node level and the pod level. Like, on the node level, you wanna understand, what's the utilization? What's the idle time? Right? Like, how well am I doing, or is my company or team doing, as a whole when it comes to efficiency? Like, am I doing nine out of 10,

29:39 and then I'm not gonna worry about improving this as much? Or am I doing, like, one out of 10, and then I should really go and focus on this? So we don't yet show the nodes, but it's something that's being planned. And then to tie into what you said about the pods, it's not just the number here. What matters is actually the difference here times the number of pods. Because think about it. Like, if you have an application, a replica set that has a hundred replicas. Yeah. And you have a hundred megabyte difference

30:10 on all of them, then it's actually a hundred times a hundred megabytes. So it comes out to be bigger. Yeah. Those numbers could get crazy pretty quickly, actually. You know? So you mentioned that this is a simple strategy and that more strategies are gonna come. I'm assuming you're looking into things like linear prediction, Holt-Winters, other statistical proxies for understanding growth over time, averages over time, etcetera. Maybe you could talk about what's coming, or what ideas you've got for other strategies. Yeah. Okay. So imagine, I'll give you a scenario, mate. And don't say OpenAI ChatGPT. I don't

30:26 Future KRR Strategies and Auto-Configuration

30:52 wanna hear that. I'm not gonna say it. I'm gonna say it. Then I just see Russell said here in the chat that you wanna see it over time, so you can see your nodes at something like 30% on average, and at times they can go up to 70%. So, yeah, absolutely. I don't know if that's something we're gonna expose in the CLI or if it's something we're gonna show in the SaaS platform. Maybe I can just share my screen for a second. And I still wanna answer your question about the strategies. I mean... Yeah. Of course. Let me see.

31:28 Alright. So I'm gonna answer the question about the strategies first, and then maybe I'll show something else afterwards. So imagine you have a workload now, where you're running this workload and, like, every weekend, it does nothing. Right? And then the entire week, it goes and it's burning CPU, or it has some cyclicality. Right? So what you wanna do is you wanna apply an algorithm, let's say, like, Holt-Winters. Right? And you wanna understand that there's some cyclic behavior here, and, like, there's some periodicity, that on certain days it performs a different

32:05 way. But you can't go and just set a request and a limit, because when you're setting a request and a limit, you're setting a flat value. So in order to do this properly, what you have to do is you have to have two parts that go hand in hand together. On the one hand, you have a better algorithm that understands the historical data. But on the other hand, your output has to be richer output than just getting, like, just getting the request and the limit. And then there are two approaches you can go with this. Like, the first

32:35 approach is, okay. We're gonna write this really complicated Kubernetes operator, and we're gonna apply all this different stuff dynamically in cluster. We're gonna do it time based, or we're gonna do it based on, like, what's coming into Kafka now. We're gonna do it based on the CPU loads. But, actually, all that already exists. Like, if I wanna scale based on time, then KEDA has this really cool thing called, like, a time scaler. And I can say with KEDA, run however many replicas on this day of the week. On this day of the week,

33:04 it shouldn't run at all. On this day of the week, it should run five times as many replicas. So what we wanna do with other strategies is to have strategies that have better algorithms. And in terms of the output, though, we're not giving you just a request and limit. We're giving you an HPA config. We're giving you a KEDA config. Like, we're generating the KEDA or the HPA config for you based on that historical data. Yeah. I think that would be really powerful. You know, go back to the example that you were talking about. Let's assume you're an

33:36 ecommerce store where your traffic sits at five m for eleven months of the year or ten months of the year. Right? But then at Black Friday and over Christmas, you've got these incredible spikes in traffic and sales, and you don't wanna lose traffic. You don't wanna lose anything. Right? You don't wanna lose money just because your infrastructure can't cope. This is where Holt-Winters and some of them really excel, because they can detect the cycle within the data and what the effects of other factors are, if you have that kind of data. So I

34:05 think it'd be really cool to have that kind of thing in KRR. And, again, this comes down to people having retention periods and doing downsampling of their metrics, which I think is a whole other world of stuff that people need to get better at. I think Prometheus has just taught people to have a thirty day retention window, only look back thirty days. But, actually, you need to be downsampling that data and storing it for long term analysis over twelve, twenty four, thirty six months. And really, that is if you wanna get good at this stuff, of course. But

34:33 having KRR be able to understand that, work with that data, and create a KRR profile would be really, really cool, especially for these use cases where there are a lot of ecommerce companies where they have to scale up at Black Friday and at Christmas. There are seasonal companies like holiday travel agents that get really busy when people want to pick their summer holidays. There are Scottish people who just drink a lot of whiskey at New Year, and, of course, that's gonna take traffic somewhere on the Internet. Maybe Twitter. Who knows? But, you know, this opens that all up. So that's a

35:01 really cool enhancement. I'd love to see that come to KRR, and especially hooking into KEDA, because KEDA is a really cool project in this space as well. So I can say that, like, now as someone who used to be a developer and someone, like, who occasionally has to set the values in these, there's, like, a black art to how you determine the right KEDA config, how you determine the right HPA scaler. And what we wanna do is we wanna make it all, like, really simple. So if I take it back to, like, where we

35:30 started this conversation, and people are moving to the cloud and you're moving to Kubernetes, and the cloud is supposed to bring you all these awesome cost savings. And then you look at it, and you have these nodes that are at 30% utilization. Right? Mhmm. It didn't deliver on that promise. And a big part of what we're trying to do is, with open source software and with your existing observability data, let's say, okay. Yeah. There's some added complexity here with Kubernetes and, of course, with the cloud and with KEDA and with HPA. All these things add complexity,

36:02 but we can get you to this maximum that is way, way better than where you were when you were running on prem where you had your own data center. But you have to be able to take that data in to be able to extract meaningful insights from that. So this is something that's kicking around in my head. And I know you've said, right, that you were dragged kicking and screaming into not working with the VPA. And you've also told us on the stream that, you know, it's read only. You don't want to modify a cluster.

36:32 But I'm sitting here and I'm looking, like, I have Robusta in my cluster already. Right? You know, there's an operator that does things. Would you not look at just kind of broadening the scope a little bit and trying to hook in to, like, creating these horizontal pod things, updating objects in the Kubernetes cluster? Is that not something? I mean, you're right there. Right? It's like, open that door. Are you gonna open that door? You gonna just keep it shut? I'm not sure. Okay. So the big question here is GitOps.

37:07 I mean, but I will put GitOps aside for a second. Okay? We're open to generating these HPA, like, to generating an HPA or to generating a KEDA config. We're open to creating, like, what you need in order to scale. What we don't wanna do is we don't wanna be the one who's, like, constantly watching that pod and then scaling it. We wanna reuse what's already there in KEDA or HPA. Okay. Cool. I think that was closer to a yes than a no, but I'll leave that up to the interpretation of anyone watching. So I Did you sorry. On you go.
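To make the idea of "richer output" a little more concrete, here is a minimal sketch of turning a weekly usage pattern into a KEDA config instead of a flat request and limit. It is an illustration only, not something KRR produces today: the function names, the sample data, and the `cores_per_replica` figure are all hypothetical, and the per-weekday peak is a deliberately naive stand-in for a real seasonal model such as Holt-Winters. The trigger fields (`timezone`, `start`, `end`, `desiredReplicas`) follow KEDA's cron scaler, which is the scaler KEDA offers for schedule-based scaling.

```python
import json
from math import ceil


def weekday_peaks(samples):
    """Return the peak CPU cores seen per weekday (0 = Sunday, as in cron)."""
    peaks = {}
    for weekday, cores in samples:
        peaks[weekday] = max(peaks.get(weekday, 0.0), cores)
    return peaks


def keda_cron_scaled_object(deployment, peaks, cores_per_replica, timezone="UTC"):
    """Build a KEDA ScaledObject (as a dict) with one cron trigger per active weekday."""
    triggers = []
    for weekday in sorted(peaks):
        replicas = ceil(peaks[weekday] / cores_per_replica)
        if replicas == 0:
            continue  # idle day: no trigger, so scaling falls back to minReplicaCount
        triggers.append({
            "type": "cron",
            "metadata": {
                "timezone": timezone,
                "start": f"0 0 * * {weekday}",   # window opens at 00:00 on that weekday
                "end": f"59 23 * * {weekday}",   # and closes at 23:59 the same day
                "desiredReplicas": str(replicas),
            },
        })
    max_replicas = max(
        (int(t["metadata"]["desiredReplicas"]) for t in triggers), default=1
    )
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": f"{deployment}-weekly"},
        "spec": {
            "scaleTargetRef": {"name": deployment},
            "minReplicaCount": 0,
            "maxReplicaCount": max_replicas,
            "triggers": triggers,
        },
    }


# Hypothetical history: busy on Monday (1), light on Tuesday (2), idle on Saturday (6).
samples = [(1, 4.0), (1, 9.5), (2, 2.0), (6, 0.0)]
scaled_object = keda_cron_scaled_object(
    "checkout", weekday_peaks(samples), cores_per_replica=2
)
print(json.dumps(scaled_object, indent=2))
```

Emitting a config for an existing autoscaler, rather than scaling the pods directly, matches the point made above: the tool stays read-only and KEDA or the HPA does the watching and scaling.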

37:47 No. Is there some lag here, or is it just on my end? I'm curious. No. I've noticed a little bit of lag too. I don't know if it's just the Internet. I don't know if it's the streaming software. I don't know if it's the weather. Who knows? But so far, I don't think we've dropped anything, so I think we're okay. You said you were possibly gonna show something else. Would you like to do that now? I mean, so we're also sending this data to the Robusta SaaS platform. And, I mean, I just wanna show a few

38:09 KRR Integration with Robusta

38:16 other ways that you can use this actually. So I wanna start with Slack reports. Are you sharing your screen? Sorry. Yeah. I'm about to start sharing it. Alright. Okay. So I wonder if the lag might be on my end because my browser is freezing up. But let me see. One moment. I'm just gonna pull this up. Yeah. Of course. Okay. So I wanna start with another use case that's come up that we've heard from a lot of people, which is, okay. I ran KRR, and I, like, I got a report once. But what happens if two weeks from now,

39:41 some new application, like, is deployed to the cluster or something goes way off, two weeks from now, and something changes in one of my applications? So what we've actually done is we've done an integration with our other open source software, Robusta. It's, like you said, already running in cluster. So we can then do a scan on a specific schedule that you set, like, let's say, a week. And then we generate a PDF report for that, and we just send it over to you in Slack. Can you hear me, David? Yeah. I think you maybe dropped off

40:22 for a second there now, but you're back. Okay. So you generate a PDF report and you send it over? Let me refresh my screen for one second. I might drop out of the call, but I think that'll fix the issue. Is that okay? Of course. Okay. Welcome back. Yeah. Okay. Am I back in the livestream? You are indeed. Okay. So let's try that again. So, yeah, what I was saying is, one thing that we wanna do is we wanna help you put this on autopilot and then kind of forget about it until

40:59 there's something else that requires your attention. So one thing that you can do that I don't think we're highlighting enough in the main repo yet is you can configure a scan that runs periodically, like, once a week. And when that happens, then you just get a report here in Slack, and you can go over that and then see any new recommendations. Oh, nice. Okay. So Robusta in cluster does the KRR scan for you, and then you can get a nice little report whenever you want, showing you how good or how bad your current setup is. Right? Exactly.

41:35 So we're sending that to two different places. The first is to Slack for people who are just using the open source side. So we wanna make sure they have a way to get periodic weekly recommendations. And then the other thing we're doing is for people who are using the UI, which is part of our SaaS offering, then we're also sending stuff over here. And here, you can see the same recommendations in the SaaS platform. Oh, cool. Nice. And there's value add to the SaaS platform now then. Yep. I mean, I think, like, philosophically, then we wanna make sure that we're always

42:12 balancing the two things. Like, on the one hand, adding functionality to the SaaS, but on the other hand, making sure that people also have a workaround and they can still get this in a pure open source way if they want. Yeah. And I think it gives people the ability to kinda onboard themselves to this flow at their own pace. Like, if you just want to run KRR once a month from a developer's laptop, sure. Right? But once you get a bit more, I don't know, confidence and you wanna start pushing that more through more automation,

42:25 Scaling KRR

42:42 you get the report. And then, you know, I'm sure we'll be chatting in six months time when Robusta is just hooking into KEDA and doing other cool stuff too. So looking forward to seeing that in the future. I wish we were there already, but it's gonna take some time. But it's one of the things that most excites me, being able to, like, even just speak to people at a conference and just having a casual conversation. Someone says to me, like, okay. Well, how should I set, like, how should I scale this microservice that's new here? And just

43:11 say, like, run one command. You're done. It'll check, like, five different strategies. It'll tell you what's most optimal. It'll give you the menu. You can look it over and choose, or you can take the first one if you don't wanna think about it. I wanna be able to say to people, like, now speaking as someone who's, like, an advocate for Kubernetes, I wanna be able to say to people, like, scaling is a no brainer. Like, you don't even need to think about it. Like, you wanna configure autoscaling? Just run one command. Don't think about it. You wanna

43:39 configure KEDA? Just run one command. Don't think about it. Nice. Awesome. So I'll finish with one last question. Right now, we didn't actually dive into, like, well, we did. Right? It's looking into the metrics of the, I guess, this is from metrics server, from the Kubelet, where it's looking at the CPU utilization and the memory utilization. Is there a future where it tries to go beyond those metrics and actually, like, take a look at the HTTP latency to understand, well, actually, you're only using this much CPU, but maybe that's because there's other things

43:50 Future: Beyond CPU/Memory & Capacity Planning

44:17 in place, maybe the CPU limits or whatever or contention, but your latency is quite high. And it's like, well, okay. Well, you could probably bring this latency down by allocating more CPU, or maybe even it's running on a different type of hardware. Like, maybe there's cloud awareness that comes into KRR that says you're running on, like, an XS tiny box on Amazon with a one megabit network card. Maybe you need to start looking at bigger ones. Maybe you need gigabit Ethernet. Maybe you need multiple Ethernets because you've got a lot of contention on a single

44:46 box. Maybe you need to switch to ARM processors. I don't know. Like, there's a whole wealth of stuff out there. Like, scheduling isn't just taking the hardware that we have. It's capacity planning and looking at how do we improve that. And I wonder, maybe it's not in scope now, but maybe it's something in the future that could be a value proposition for KRR and Robusta too. So you're the second person in two days to say that to me about HTTP latency. So I think there is something to it. I think in the first drop, we're

45:12 gonna be looking at, like, if I think of how we're building this, then, okay, we have base recommendations today. The next thing to do is to get better explainability about why we're giving recommendations. We have that done. It's, like, really gonna land any day now. So you can see that graph. You can understand why we're recommending what we're recommending and play with the thresholds if you want to. And then the next thing, like, the next level is to be able to say, okay. I'm not just gonna recommend this output taking, like, a fixed request or a fixed

45:43 limit. I'm gonna be able to recommend an HPA config. I'm gonna be able to recommend the KEDA config and so on. And then for me, the level, like, one beyond that is then to be able to say, okay. Now when I optimize this and when I analyze it, I'm not just gonna look at CPU and memory. I'm going to be able to take a look at this bigger picture. And then that's one side of it. And then the other thing that you touched on is there's this fundamental relationship between your nodes

46:10 and your cluster autoscaler and between your pods. And a good example of that is, like, let's say we give you a recommendation that you should set, I don't know, like, one CPU and one gigabyte, but all your nodes have 10 CPUs and 20 gigabytes. Then based on the ratios, you will always have wasted memory on your nodes. So there's a relationship here between scaling your pods and, like, setting up requests and limits for your pods, and between handling the actual nodes and the cluster autoscaler side. So that is something we're looking into as well. Ah, so, yeah, identifying those parts of memory

46:47 Memory Allocation

46:49 that can't be allocated because the CPU is utilized. Right? Or not even utilized, just that it's been allocated by the scheduler. So that RAM's sitting there unless you can get a workload that actually takes it over. Yeah. That's a nice idea. I like that a lot. But you have to freeze stuff. So you're looking at this, like, 10 dimensional problem, and you can't run on all the dimensions at once. You have to, like, freeze, like, nine dimensions and say, okay. Right now, we're just optimizing for CPU and memory. And then as time goes on, as you build something

47:16 that's more and more complex, then you start adding, like, more degrees of control and, like, unfreezing some of those dimensions. Yeah. Well, we have a question about one of those other dimensions, which is important for our stateful workloads on Kubernetes, in the chat. Russell is asking about tooling that surfaces the availability of IOPS within container workloads on nodes. I guess that could be another dimension that KRR could hook into. Yeah. I mean, I hit a nasty issue with this once where I had a Kubernetes node that was going missing in action. It just would suddenly freeze

47:26 Container IOPS

47:55 up on AWS. And when you looked at it, it was because it was, like, an AWS node with, like, a burstable number of IOPS, you used up the burst quota on the IOPS, and the whole node froze up. So, I mean, you could get that in a graph on the dashboard. I think the team is adding that to Robusta as well in the dashboards we have. You can't get it in kubectl top. Right? I don't think it shows you IOPS. No. It doesn't. You would have to

48:26 yeah. It's not surfaced to the metrics server on the Kubelet. So I don't think, yeah, you're not gonna have access to them. And you can get it probably with the eBPF based stuff too. Right? Like, with me. Yeah. I mean, I think from the Kubelet metrics, you may get, like, the flush time and FSN rules. I don't know if you're specifically gonna get the reads and writes. It's been a while since I've dug into that, to be honest. So I'm not entirely sure. There may be stuff there, but it may be that you

48:52 have to instrument that another way or use a node exporter from Prometheus to grab extra metrics there. Yep. Alright. Awesome. Well, thank you so much for your time today. Is there anything else you want to cover before we finish up? Just a request from everyone. Like, this is new. It's still in beta from our perspective. So we're testing this out. We're getting feedback from people, and we're taking all that feedback to try and make this better. Like, just in the past day or so, we got feedback from someone who's using this with HPA, about an edge case there where, like,

49:07 Call for Feedback

49:27 the HPA was trying to drive stuff in one direction and was trying to reach 50% utilization, but the recommendations we were giving were trying to reach, like, 90, the p99. Right? So we're trying to really make this, like, the best tool out there for determining requests and limits on Kubernetes for maximizing utilization and to do that in an open source way. But to make it happen, we need help from everyone out there. We need feedback. Like, the biggest challenge when you build an open source project is that people take your project and they use

49:59 it, and they either love it or they hate it and say, I'm never gonna use this again, and they just never speak to you. Whereas if you think of, like, a company that's sales driven, then it never happens because you're like, people have to jump on calls with you and you see the feedback. And if you give them bad advice, you hear about it, you give them good advice, you see they're happy. So I just wanna request from everyone out there who's listening to this, please give us your feedback and open up GitHub issues, jump on the Slack channels, speak

50:24 to us, message me on Twitter or LinkedIn. We wanna hear from you how we can make this work for you. And I'd say the number one thing that I've learned also as a founder is when you like, I can't count the number of times that I've had someone say to me, you know, I guess there is some feedback I could give, but I think it's only specific to my company or it's just about how we work. And then you double click on that, and you discover it's something that applies to, like, nine out of 10 people. So even

50:53 if it's specific to your company, speak up and tell us because very often, it's not just you. Awesome. Alright, everyone. If you're listening, go check out the repository on GitHub. Open up issues with ideas, bugs, and make sure you give them as much feedback as you possibly can. Oh, we did get a last question sneak in there right at the end. I have to answer this one. I have to address it. I have to. It's an excellent question. And so let me give background on what Goldilocks is for people who don't know. So Goldilocks is a platform on top of the VPA

51:13 Q&A: KRR vs. Goldilocks

51:32 that exposes the VPA recommendations in the web UI, and they also prevent you from having to configure a VPA for each and every workload. So they're doing two things. They're giving you the VPA recommendations in the web UI, and they're configuring the VPA for you so that you don't have to configure it for each and every deployment in your cluster. So I guess the difference between KRR and Goldilocks is the difference between KRR and the VPA. And just to go over that again briefly, so we don't require installing anything in your cluster.

52:09 You just run a command line. We take the historical data, we can give you recommendations immediately. And in terms of what we're, like, trying to accomplish, the biggest thing that we're trying to accomplish is to, one, give explainability. So not just to give you a recommendation, but also to show you the data that supports that recommendation and give you the confidence to apply that. And that's something that we're gonna, like, just any moment, really, release something for. And then the other thing that we're trying to do is to give more complex recommendations and, like, analyze this historical data, not just

52:43 output a single request or a single limit, but to be able to output something that's richer than that, like an HPA, like a KEDA config and so on. And then, of course, in terms of reporting features, we're very, like, focused not just on the recommendations, but how you consume them as well, like getting a weekly Slack report, getting the table, like, showing what's the biggest priority thing to look at and so on. Nice. Awesome. Hope that answers your question, Jason. Like, Goldilocks is good. Like, if you're using VPA, you should use Goldilocks too. So me.

53:08 Conclusion

53:20 It's not a project I'm familiar with, but it's something I'll look into after the stream and take a look. Alright. Thank you again for your time. It's always fun to see what new ideas you'll come up with at Robusta. Thank you for coming on to this channel and sharing them with people. I hope people do get involved, open issues, and give you feedback, and hopefully, we'll see each other again soon. So thank you, Natan. Have a wonderful day. And to everyone that watched, we'll see you next time. Bye.

Kubernetes Resource Recommender (KRR)

Meet the Cast

David Flanagan

@rawkode

Natan Yellin

@aantn


Documentation

Code

Robusta KRR on GitHub
