Hands-on Autoscaling with Kubernetes | Rawkode Academy


Overview

About this video

What You'll Learn

Compare when to use HPA with resource, custom, and external metrics for cost-aware pod scale behavior.
Set up Metrics Server, Prometheus Adapter, and RabbitMQ-backed external adapters to feed Kubernetes autoscaler inputs.
Walk through VPA installation and update modes to tune per-pod CPU and memory requests automatically over time.

Guy Templeton, co-chair of SIG Autoscaling, walks through Kubernetes autoscaling end to end: horizontal pod autoscaling on resource, custom (Prometheus adapter), and external (RabbitMQ) metrics, plus vertical pod autoscaling.


Transcript

Generated from the English captions.

1:21 Introduction & Autoscaling Concepts

1:21 Hello, and welcome to today's episode of Rawkode Live. I'm your host, Rawkode. Now, before we begin, I just wanna take a moment to thank my employer, Equinix Metal. They provide the time for me to invest in producing the show and creating cloud native content for us all to learn together. Today, we're gonna be taking a look at Kubernetes and, specifically, autoscaling on Kubernetes. And to do that today, I am join... well, I messed that one up. I'm joined by my friend, Guy Templeton, an engineer at Skyscanner and a co-chair of SIG Autoscaling.

2:01 Hey, Guy. How are you? Hi. Good, thanks. You? Yeah, I'm doing alright. I mean, until I removed myself from the stream instead of bringing you in. You know, that's the joy of live TV, I guess. So, do you wanna give us a quick introduction about yourself, and then we'll talk about what we're gonna cover today? Yep. So, as you already mentioned, I'm a software engineer at Skyscanner. My focus there has generally been on internal developer tooling: enabling Skyscanner's engineers to deploy workloads, build their code fast, etcetera. As part of that, for the last two

2:41 years or so, I've been on the team that has been building out our Kubernetes infrastructure on top of AWS and migrating containerized workloads onto it. As part of that, I've got more and more involved in the Kubernetes community over time, with a particular focus on cost saving and autoscaling, which led to me becoming one of the co-chairs of SIG Autoscaling. Okay. Awesome. So, let's start with the easiest question in the world. How about that? If I run my application on Kubernetes, does it just magically autoscale by default?

3:20 About Kubernetes autoscaling

3:25 I mean, that's the promise of Kubernetes. Right? Unfortunately, not. Kubernetes is clever, but not quite clever enough yet to figure out that it needs to scale things magically, and how and where it should scale them. So, as with a lot of Kubernetes, there are currently a lot of dials for people to choose the way their application scales. That can be a bit overwhelming for users at the moment, because there are so many dials for them to change, and they have to figure out exactly how best to scale

4:06 their workloads, how they expose the metrics they choose to scale their workloads on, and how to make sure that Kubernetes understands those. So that's what we'll try and cover today: scaling in the horizontal sense, choosing your metrics, and setting up the pipelines that make metrics available for scaling, as well as vertical autoscaling, the idea of adjusting the resources on single pods to meet their requirements. Okay. So let's continue with a little bit of background, and then I'll get my screen shared and we'll get hands-on with this stuff. Now, I guess the naive implementation that

4:47 people, organizations, and teams adopting Kubernetes may go for is to have deployments for their applications, use the replicas field in the spec, and set it to a big, high number to keep their traffic satisfied. And that's not ideal. Right? You just said that you're part of a cost-saving initiative. It's all about reducing cost, and that means scaling your application with the expected traffic levels at any given time. Would you say that using the replicas field on a deployment spec is almost an anti

5:20 pattern? I would say if you're doing that manually yourself, yeah, that's probably an anti-pattern. I mean, a lot of these tools under the hood will effectively be manipulating those replica fields on a target, whether that's a deployment or similar. But, yeah, I think if you're having to go in... say you've got predicted traffic spikes at eight in the morning for some reason. If an engineer is going in at 07:50 every morning going, better scale up, kubectl edit deploy,

5:59 that's probably an anti-pattern. And there are ways that you can achieve that without having an engineer manually edit... A cron job. Right? Yeah. I mean, you could do that. I'd still say it's not the ideal solution. Yeah. I mean, obviously, I'm teasing. Right? You know, we want this stuff to be slightly more sophisticated, which is where HPAs and VPAs come in. I know you've got examples. I don't know if we're gonna cover this, so I'll speculate. But, you know, we wanna

6:32 talk about using metrics from our applications to understand whether the ninety-fifth percentile latency with which we respond to our customers is within acceptable SLAs, and scale accordingly. Right? Exactly. Alright. So all of the stuff we're gonna cover today, you have published to GitHub, and it's public. Right? So I can share this link? Yeah. Yep. Okay. So we're gonna take a look at github.com/gjtempleton/, well, I'm not gonna read that out. It's Kubernetes autoscaling by example. I haven't cloned it yet. As I like to do with this show,

6:48 Prerequisites & Metrics Server Setup

7:12 I'll just discuss what I did upfront, which is... Yeah. Almost as little as possible. I don't think it makes a lot of sense for people to sit and watch me spin up a Kubernetes cluster. So we do have a Kubernetes cluster. That's it. Good. Like, I'm gonna run get pods and it's gonna say connection not found or something. But we do have a Kubernetes cluster. There's not a lot going on in this cluster. In fact, it's pretty much... oh, there are a few things running on it, but that's not important. There's networking and

7:41 there's some storage stuff, but no actual workloads yet. Yep. Right. Nice. So let's take a look at this repository. Let me try and zoom in on this a little bit so it's still readable. There's a dark mode now, isn't there? Yes, there is, under settings. Oh, right. Yeah. It's pretty good. It's great. I'll play with that later. So this repository is usable by anyone. Right? Anyone can just come to this repository, and it looks like you've documented it pretty well. And one of the things that you said when we were talking about preparation for this episode was

8:21 that I could have just used a kind cluster. Yep. I mean, I have access to bare metal, so I decided we're not doing that. Yeah. And plus, I think... sorry, go on. Yeah. The kind cluster might stress your CPU and RAM a bit with all the workloads and scaling, but it just about worked on my laptop last night. So... Yeah. No, I figured if I threw a bit more hardware at it, we could maybe have a bit more fun with the scaling rules and just see if we can, like, you know, push it through a little

8:49 bit of a speed run, maybe. So, yeah, feel free to check out this repository and follow along in your own time. Now, I'm gonna assume, because I like to make loads of assumptions, that we should start with the prerequisites. Yeah. I mean, it should explain what we expect you to have. Alright. So kubectl, check. Working Kubernetes cluster, hopefully. Helm? Yes, I'm positive I have Helm. Very good. Okay. Cool. Version three. Awesome. And we don't need to create a kind cluster, so prereqs are checked off. Now, you've broken this down into four specific sections. We've got

9:33 Yeah. Resource metric scaling, custom metric scaling, external metric scaling, and then the VPA. Do you wanna give us a little bit of flavor on one, two, and three there? Why they're different and what they mean? Yep. So Kubernetes, as with all of its functionality, started from the basics, the autoscaling people were most familiar with. So that's CPU or memory utilization: allowing you to configure a target and say, I want to scale this and try and maintain, like, 70% CPU utilization. So some sort of sweet spot where you

9:40 Different types of autoscaling on Kubernetes

10:15 think you can scale up as more users start hitting your site, but you can also scale down and still keep a fairly good resource utilization, so you're not burning money, as you mentioned at the start. That is a core API. And then, over time, people obviously realized, well, I've got a lot of workloads where that's not the best metric. The CPU and memory are, you know, either fairly constant, or they're not coupled to the metrics that I care about, whether that's latency or, if I know my pods can serve a certain number of

10:53 requests per second, then that's the metric that I most care about. So, over time, changes were made to the APIs to go through something called API aggregation. The idea is that the API server can then proxy a number of different APIs, and those API paths become metrics.k8s.io for resource metrics scaling, so the original; custom.metrics.k8s.io for custom metrics, and we'll come back to what the definition of custom metrics is; and external.metrics.k8s.io for

11:33 external metrics, and we'll touch on that a bit as well. But those should effectively cover all use cases of a metric that you want to scale a workload on horizontally. Sweet. Awesome. So let's start with the resource metrics scaling. So first, make sure we can access the metrics API. I mean, can that fail? By default, some setups of Kubernetes will come with nothing installed to serve the metrics API. So, yeah, we can find out how your cluster is set up. Yeah. Metrics Server is not available on my cluster, so this should fail.
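To make the three API groups Guy lists a bit more concrete, here is a small Python sketch that builds the raw request paths you could pass to `kubectl get --raw` to check whether each API is being served. A request like the first one fails until something such as metrics-server is serving `metrics.k8s.io`. The sketch assumes each group is served at version `v1beta1`, and the metric names in the example are hypothetical. Running `kubectl get apiservices` shows which versions your cluster actually registers.

```python
# Sketch: raw request paths for the three aggregated metrics APIs.
# Assumption: each group is served at version v1beta1 on your cluster.


def resource_metrics_path(namespace: str) -> str:
    """metrics.k8s.io: CPU and memory usage for the pods in a namespace (metrics-server)."""
    return f"/apis/metrics.k8s.io/v1beta1/namespaces/{namespace}/pods"


def custom_pod_metric_path(namespace: str, metric: str) -> str:
    """custom.metrics.k8s.io: a metric attached to Kubernetes objects, here every pod ("*")."""
    return f"/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace}/pods/*/{metric}"


def external_metric_path(namespace: str, metric: str) -> str:
    """external.metrics.k8s.io: a metric not tied to any Kubernetes object, e.g. a queue depth."""
    return f"/apis/external.metrics.k8s.io/v1beta1/namespaces/{namespace}/{metric}"


if __name__ == "__main__":
    # Hypothetical metric names; prints commands you could run against a cluster.
    for path in (
        resource_metrics_path("default"),
        custom_pod_metric_path("default", "http_requests"),
        external_metric_path("default", "queue_messages_ready"),
    ):
        # Single quotes stop the shell from expanding the "*" in the custom path.
        print(f"kubectl get --raw '{path}'")
```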

11:50 Autoscaling with resource metrics

12:20 Yep. Apparently, that doesn't copy. So... Okay. Cool. So that's these commands here. Right? Yep. Okay. So, the second command, with the installation of the metrics server, and I'm going to update this later: you might not need all of those options. Those options are generally set up for a kind cluster. So you're using kubelet-insecure-tls and other bits to make it work with a kind cluster, which may not necessarily be required for all installations. So this is a Cluster API cluster, which

13:05 is gonna have a self-signed CA. So I believe we will still need the... Okay. Awesome. Although I don't like the name, insecure. Yeah. It's incorrect. I was gonna say a really Scottish thing there; I think that's because I've got your accent in my ear: skew-whiff. Like, you know, it's not insecure. I think we need to change our vocabulary there. K. So preferred address types InternalIP, kubelet insecure TLS, and apiService.create=true. Alright. I'm just gonna go for it. What's the worst that can happen?

13:39 Let's deploy the metrics server to our cluster, which starts collecting node-based metrics. Yeah. And that will now be available... That just might be the pods not being there yet. Is that right? Yeah. It takes a bit of time for the pods to come up and register themselves with the API server, saying, I'm here and I'm serving this API. I'm assuming that's gonna deploy to the kube-system namespace. No, maybe in default? By default, I'm not sure. There we go. Alright. I guess if I was doing this in production,

14:21 I would have tweaked that helm command. So... Yeah. No metrics available yet. Now, I'm gonna make some more assumptions: this is gonna have some sort of Prometheus-style, ten-second scrape interval thing. Is that right? Yes, it does. Okay. I can't remember exactly what the default scrape duration is. But yeah. So, as you've said, basically, the metrics server is responsible, these days, for going out and scraping the metrics endpoints of kubelets. That's why we had to pass in a kubelet insecure TLS flag to get it working with self-signed certs.

14:59 And that's then responsible for serving up those metrics through the API path of metrics.k8s.io. And as well as node metrics, it's also the one that's responsible for getting pod metrics through those same APIs. Sweet. Well, we now have our top command working. Awesome. So we've got loads of cores, loads of memory. In fact, we're barely tickling this cluster. Yes. We can have a bit of fun with this, for sure. Alright. Very good. And that takes the namespace flag. Let's just see... why is copy-paste weird? There we go. Okay. Yeah. So, as you can see, you can now correctly

15:45 get the CPU and memory usage of different pods through the API. Alright. So, I don't mean to put you on the spot here and test your knowledge, but this metrics server: was this previously the cAdvisor stuff that was deprecated, with metrics server as the replacement for that? So, the component that the metrics server is a replacement for was something called Heapster. Yep. That was built in and, effectively, had pluggable back ends. I think the default installation option was using InfluxDB as a back

16:22 end. That had some problems and shortcomings. Certainly with InfluxDB, we experienced a situation where we managed to fill the Influx back end because we'd scaled up our clusters without scaling up Influx. And one of the problems it had was that whenever you queried it, it first pulled data from the back end before returning it. So it didn't hold any local state at all. So when you queried it, if the back end had dropped your data, it just went, I don't know,

17:00 which wasn't the most fun. So, yeah, metrics server is effectively built from the ground up for this very purpose. Okay. Cool. I have a feeling this command won't work because I haven't cloned your repository. Oh, yeah. Let's copy that. Go back. It's running. I mean, I was expecting something to go wrong today, but not copy and paste. It's always copy. I don't know what it is. I just don't know how to Mac. Like, I'm a Linux user. I'm trying to adapt. It's not going

17:47 well, ever. I don't understand why copy and paste is so difficult. I'm sure it's trivial, and I'm pressing Control instead of Command or something. It's perfect. Yeah. Alright. So we're gonna deploy the stress deployment. Yeah. And before I apply it, you know, I wanna make sure you're not gonna run a crypto miner on this. So... Yep. This is gonna run the program stress, so we're just gonna be artificially hammering the CPU, I guess, a little bit. Yep. Memory stress as well. Alright. I'll trust you. Yeah. Stress. Good. And a watch on that. So it's gonna

17:57 Resource Metric Scaling: Setup & Demo

18:38 be pulling down the images, and it's just gonna create some artificial load. Let's see what we're doing next. What does it say? So you want me to run top pods. We should see them climbing, and then we go on to actually applying our first HPA as well. Shall we take a look at that? Oh. Yep. Oh. There we go. Boom: OOMKilled. Is that supposed to happen? No. And that may well be because the OOM killer seems to be a lot kinder on kind than it is on actual clusters. So we can work around it for

19:17 now: either we can raise the memory limit, or we can remove the memory limit. I'll leave it up to your discretion. Okay. So, I don't think on any of these episodes I've ever really spoken about resource requests and limits. Yeah. Why don't we collectively put our knowledge together and just talk about what this is? So, yeah, resource requests and limits are a way for us to tell... is it the kubelet that is responsible for the enforcement of this? Yes. Well, "the kubelet" is maybe slightly off here. It's passing it

19:59 down to the container runtime, which actually does some of the enforcement. But there's also a bit that's left to the kubelet, in terms of ranking the order in which to kill pods if resource contention actually happens. Okay. So, by setting these resources, and we'll talk about requests and limits in a second... Yeah. So the kubelet may do something, but it also may just create the cgroups with the CRI implementation. Okay. Now, what is 50m in CPU? I've been asked this question before, and I've never had a good answer.

20:36 It is 50 millicores. This relates to a piece of the kernel called the CFS, or Completely Fair Scheduler. There's a long backstory to some previous issues with the kernel's maths and how it does this. Effectively, what Kubernetes does by default is use, I think, hundred-millisecond time slices. These millicores are effectively translated into how much CPU time in a given bucket a pod can use of a CPU. So effectively, if I was giving it one full core, aka 1,000 millicores, I would be saying it can use

21:30 one second... sorry, the full hundred milliseconds of that bucket for a given CPU core. This is slightly complicated, obviously, given that in this case we're definitely running on a big multi-core machine. So the CFS then comes in and basically does throttling: if a container uses more CPU in a given slice than it should, it will get throttled for the next few buckets of time to average out, so that a single container, given a CPU request and limit, can't basically use more CPU than it should

22:08 be using. Oh, sweet. That's a really good explanation. I think even I understood that, and I'm not the smartest. Okay. Now, this is an OOMKill. So I'm gonna assume that this has nothing to do with our CPU limits, and it's our memory limits. Yep. We'll bump that up. Now, do you wanna... what's the difference between a limit and a request? So, in CPU terms, for a CPU request, you're guaranteeing that on average this pod can use the equivalent of, in this case, 50 millicores, so one twentieth, is my maths right, of a CPU on

22:56 average over time. With the limit, however, you're saying that if there's spare CPU capacity, you can use up to, in this case, the equivalent of 75 millicores of CPU. So if there's contention, if the node is really highly utilized, so you've got a lot of CPU-intensive workloads on it and all of them have far higher limits than requests, they won't all be able to get up to their limits, but they'll still always get their requests. And it's the request that's used for scheduling the

23:35 pods onto the node. This is still being OOMKilled. The other alternative, because in the example HPA we're only scaling on CPU, not memory, is not giving it resource requests and limits at all. This is generally a bad idea, because it means that, a, the scheduler doesn't really have much information about how much resource a given pod is likely to use, and that can result in problems because you get contention. Okay. Well, they're happy now. Why is that not showing me the CPU and memory one?

24:37 So it's the same thing we came to the first time: it takes a while for it to first scrape the kubelet and discover the metrics for those newly created pods. Okay. So we'll give that thirty seconds to catch up. What we'll do now, firstly, because I forgot at the start: if anyone wants to try Equinix Metal, then you can use the code Rawkode-live at metal.equinix.com. 50 dollars of credit is roughly a hundred hours of compute on a smaller instance. But using the small instance is not very

25:13 much fun. So, you know, spend it quicker and enjoy it. Now, we also have a comment. Someone has asked: can the memory go beyond the limit when we're running the stress command? And I think the answer is no. When it has the memory limit, the OOM killer is just gonna say, hey, you've had enough, you're barred, and kick it out. Right? Yes. So, yeah, you cannot go beyond the memory limit. And I think, based on what Guy said a couple of minutes ago as well,

25:48 is that you can go past the CPU limit, but you'll be throttled on subsequent requests to bring it back under the limit. Yeah. And so this becomes a problem for really latency-sensitive workloads: if you're getting loads of requests fired at you and responding really fast, you hit your limit, or use up more than your limit within a time window, and then in the next few windows you're effectively throttled. That can cause your latency to spike, because your workload's getting throttled, so it's not able to get the CPU cycles it needs to

26:24 respond. So you're able to lower the window slice to try and combat that. You can also remove throttling completely, like, the CFS quota; some users recommend that. Generally, it doesn't seem to be recommended for multi-tenant environments, so keep that in mind. Alright. Sweet. So as we can see here, we actually needed just under 250 megs of RAM for the CPU stress, and almost a gig for the memory stress, but they're both happy now. So it's stressing the resources as expected. Yeah. Yeah. I'm happy with that. Okay.
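The CPU discussion above can be sketched numerically. This is a simplified Python model, not kubelet code. It assumes the default 100 ms CFS period and cgroup v1 semantics:

- A CPU limit becomes a CFS quota, the CPU time allowed per period.
- A CPU request becomes a `cpu.shares` weight that only matters under contention. cgroup v2 uses `cpu.weight` instead.

The quantity parser only handles plain numbers and the `m` suffix. The throttling function is a rough single-threaded estimate that ignores multi-core bursts and scheduling noise.

```python
from decimal import ROUND_CEILING, Decimal

CFS_PERIOD_US = 100_000  # default CFS period: 100 ms
MIN_QUOTA_US = 1_000     # floor the kubelet applies to a computed quota (1 ms)
SHARES_PER_CPU = 1_024   # cgroup v1 cpu.shares for one full core
MIN_SHARES = 2           # smallest cpu.shares value the kubelet will set


def cpu_to_millicores(quantity: str) -> int:
    """Parse a CPU quantity such as "50m", "0.5" or "2" into millicores, rounding up."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    millicores = Decimal(quantity) * 1000
    return int(millicores.to_integral_value(rounding=ROUND_CEILING))


def limit_to_cfs_quota_us(limit: str, period_us: int = CFS_PERIOD_US) -> int:
    """CPU limit -> microseconds of CPU time the container may use per CFS period."""
    quota_us = cpu_to_millicores(limit) * period_us // 1000
    return max(quota_us, MIN_QUOTA_US)


def request_to_cpu_shares(request: str) -> int:
    """CPU request -> cgroup v1 cpu.shares, a relative weight used only under contention."""
    return max(cpu_to_millicores(request) * SHARES_PER_CPU // 1000, MIN_SHARES)


def throttled_wall_time_us(work_us: int, quota_us: int, period_us: int = CFS_PERIOD_US) -> int:
    """Rough wall-clock time for a single-threaded burst needing `work_us` of CPU,
    starting at a period boundary and capped at `quota_us` of CPU per period."""
    if work_us <= 0:
        return 0
    periods = -(-work_us // quota_us)                 # ceiling division
    leftover_us = work_us - (periods - 1) * quota_us  # CPU time used in the last period
    return (periods - 1) * period_us + leftover_us


if __name__ == "__main__":
    quota = limit_to_cfs_quota_us("75m")
    print(cpu_to_millicores("50m"), quota, request_to_cpu_shares("50m"))
    # A burst needing 20 ms of CPU, which would take about 20 ms if unthrottled.
    print(throttled_wall_time_us(20_000, quota))
```

Take the 50m request and 75m limit from the stress example. The container gets 7.5 ms of CPU per 100 ms period, so a burst that needs 20 ms of CPU is spread over three periods, roughly 205 ms of wall-clock time. That is the kind of latency spike Guy describes for latency-sensitive workloads.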

27:13 what we want to do now: okay, so, yeah, we can see they're hitting the resource limits. Now we want to apply a horizontal pod autoscaler, or HPA, to make that better. So... Yep. Let's have a look. So that's a horizontal pod autoscaler, with the API version, kind, and metadata that we expect. Let's just kinda run through the spec side of this and try and understand what's going on. So we'll come back to scaleTargetRef. I'm assuming that's just like a label matcher on what

27:48 we've got... I guess we're covering it now. This is gonna say, hey, what is it we are trying to scale? Is that right? Yeah. So with apiVersion and kind, effectively, we're saying it's a Deployment in the apps/v1 API that we've got to scale. There are a number of different resources within Kubernetes that all implement this behavior called scale, the scale subresource, and that's effectively what this gets mapped to behind the scenes. Sweet. We then set the minimum and maximum replicas that we want to accept for this. And then we define the metrics that we want

28:29 to use as a benchmark for scaling. Yep. So if I read this one correctly, what we're saying is if the CPU of this pod is at 70% of its CPU limit, then it's gonna add on more replicas. Not quite. There is an algorithm, effectively, that the HPA uses. The HPA, unlike a number of the other things that SIG Autoscaling owns, is actually contained within the core Kubernetes controller manager code base as a reconciliation loop. That's got an algorithm in it which, effectively, each time it's evaluating,

29:19 you can configure that on the controller manager. It effectively does the maths of getting the metrics for the pods currently in the deployment, that is, their current utilization, in this case of the CPU resource, dividing that by the desired utilization, in this case 70, and then multiplying it by the current replicas. So if we're using 100%, we divide that by 70, so we get one point something. Multiply that by the current replicas, take the ceiling of it, and then set that as the new desired replicas.
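Guy's description matches the formula in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). Below is a minimal Python sketch of it. Two details come from the upstream defaults and are stated here as assumptions:

- Utilization is measured against the pods' CPU requests, not their limits.
- The controller skips scaling when the ratio is within a 10% tolerance.

The real controller also handles missing metrics, unready pods, and scale-down stabilization, which this sketch leaves out.

```python
import math
from typing import Sequence


def average_utilization(usage_millicores: Sequence[int], request_millicores: Sequence[int]) -> float:
    """Current utilization as a percentage of the pods' summed CPU requests."""
    return 100 * sum(usage_millicores) / sum(request_millicores)


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """desired = ceil(current_replicas * current / target), clamped to [min, max]."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Two pods, each using its full 50m request, against a 70% target.
    utilization = average_utilization([50, 50], [50, 50])
    print(utilization, desired_replicas(2, utilization, 70))
```

With two pods at 100% of their request and a 70% target, the ratio is about 1.43, so the sketch asks for ceil(2.86) = 3 replicas. A reading of 72% would fall inside the tolerance and leave the count unchanged.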

30:08 Alright. Okay. That makes sense. I like it. So we can just apply this to our cluster. And if we take a look at this, we can remove those replicas now. Right? Yes. I gave a talk at a conference about a month ago where I was bold enough to say that setting replicas in either of these specs was an anti-pattern, and that you should just use HPAs in all scenarios, even for local dev. Not sure how well it went down, but I'm gonna push it. So what this means is, because we have

30:47 that already in place, right? So because that CPU thing is always pushing against the limit, which I guess is what the stress thing is doing, right away this is saying, hey, you probably need more of these, and that's when it starts to scale up. Okay. And you can also see... one of the things you can do: the HPA is quite good at telling you why it's done a scaling action, so you can do kubectl describe hpa. Okay. It set the size of our replica set to three because the CPU resource utilization percentage

31:27 of request went above target. Yeah. Cool. And you can see, higher up, you've got the metrics section as well, with the current and the target. Sweet. And then we delete. So that's it. That's resource metric scaling. We set this up to scale on CPU, but you can also do resource utilization scaling on memory as well. Okay. And as a rule of thumb or a guide for anyone that wants to start using this, is this the first place you would start for workloads? Yeah. I think, unless you're

32:10 coming into it with real knowledge of what the metric that's most important for you is, this is certainly the easiest to set up. The metrics server is owned and supported by a core part of the Kubernetes community, SIG Instrumentation, and it's well tested. So setting up the metrics server and setting up resource scaling is really easy, as you've seen: in this case, one Helm chart to set up the metrics server and then a couple of bare YAML objects. Okay. Definitely.
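For reference, the resource-metric HPA discussed above has four parts: a Deployment target, minimum and maximum replicas, and a 70% average CPU utilization target. It can be written out as plain data. This Python sketch builds it with the `autoscaling/v2` field names and prints JSON, which `kubectl apply -f -` accepts just as it does YAML. The object name, Deployment name, and replica bounds are hypothetical, and the repository's own manifest may use an older HPA API version.

```python
import json


def resource_hpa(name: str, deployment: str, min_replicas: int, max_replicas: int,
                 target_utilization: int, resource: str = "cpu") -> dict:
    """An autoscaling/v2 HPA scaling a Deployment on average CPU (or memory) utilization."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # What to scale: any resource exposing the scale subresource, here a Deployment.
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": resource,
                        # Utilization is a percentage of the pods' resource requests.
                        "target": {"type": "Utilization", "averageUtilization": target_utilization},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names and bounds; pipe the output to `kubectl apply -f -`.
    print(json.dumps(resource_hpa("stress-hpa", "stress", 1, 5, 70), indent=2))
```

Passing `resource="memory"` gives the memory-based variant Guy mentions.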

32:53 Custom Metric Scaling: Concepts & Prometheus Setup

32:53 Alright. Plus, they don't have to re-instrument their own application if they've not added Prometheus metrics or anything like that. Yeah. So, yeah, I'd say it's a good starting point. Alright. Let's take a look at custom metric scaling. Yeah. What are custom metrics? So custom metrics are metrics which correspond to a Kubernetes object. So they're not just any arbitrary metric from something external to the cluster, like a queue, potentially. They are either metrics that are exposed by each pod in, say, a deployment, or potentially metrics on other Kubernetes objects. So

33:00 Autoscaling with custom metrics

33:40 Ingress objects are one example. So, effectively, if you were deploying a service scaling behind an Ingress object, you can make use of metrics produced by that Ingress object as a custom metric as well. Alright. So for this, then, we need to deploy Prometheus. Yep. Do I really need the port-forward? That's just to have a check, if you were unsure about whether it's set up correctly. But... Okay. So helm install. Alright. Let's see what we got here. Right. So we've got a node exporter, kube-state-metrics, a Pushgateway, the server, and Alertmanager.
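Guy distinguishes two flavors of custom metric: one exposed by every pod of the scale target, and one describing another Kubernetes object such as an Ingress. In the `autoscaling/v2` HPA API these correspond to the `Pods` and `Object` metric types. The Python sketch below builds both entries for an HPA's `spec.metrics` list. The metric names, the Ingress name, and the target values are hypothetical.

```python
import json


def pods_metric(metric_name: str, average_value: str) -> dict:
    """Every pod in the target exposes the metric; the HPA averages it across pods."""
    return {
        "type": "Pods",
        "pods": {
            "metric": {"name": metric_name},
            "target": {"type": "AverageValue", "averageValue": average_value},
        },
    }


def object_metric(metric_name: str, api_version: str, kind: str, name: str, value: str) -> dict:
    """A single metric describing another Kubernetes object, such as an Ingress."""
    return {
        "type": "Object",
        "object": {
            "metric": {"name": metric_name},
            "describedObject": {"apiVersion": api_version, "kind": kind, "name": name},
            "target": {"type": "Value", "value": value},
        },
    }


if __name__ == "__main__":
    # Hypothetical metric and object names; these entries go under an HPA's spec.metrics.
    metrics = [
        pods_metric("http_requests", "10"),
        object_metric("requests-per-second", "networking.k8s.io/v1", "Ingress", "main-route", "2k"),
    ]
    print(json.dumps(metrics, indent=2))
```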

34:36 So the Prometheus server is obviously the database. We've got Alertmanager for doing alerts. Why are those pending? Yeah. Oh, they requested... yes. There should be a way to disable that, surely. So the problem is this cluster is not really configured for storage classes and PVCs. So let's fix that first. Let's go over here. Let's grab this Helm chart. I'm sure we can tell it to use a host path. In fact, I can edit it. Let's just do that. Edit deploy prometheus-server. Is it a StatefulSet or a Deployment?

35:39 You can disable the persistent volumes. Well, that's easier. Is that just a Helm flag I can drop in? Yep. So if we go helm upgrade... Yeah. What's the values flag? Do you remember? --reuse-values. Okay. So helm upgrade, reuse values, Prometheus, set... what do you know there? So there's... oh, come on. There's a lot of options in this chart. Are you sure you don't want me just to edit the deployment? Yeah. Might be easier to. Alright. Let's see. So I meant... okay. It looks fine. Deployment.

36:55 Okay. We got data here. Let's step back here. Yeah. We don't really need Alertmanager. No, we don't in this case. So... Oh. Alright. They're gonna make me create that path first. Is that the problem? Oh, this error wasn't there before. Yeah. See. Alright. Come on. logs -f. Copy. Oh, it's the other one. Let's see. Prometheus server: unable to create... open: permission denied. So I have found the flag you want. It is helm upgrade --reuse-values prometheus --set server.persistentVolume.enabled

38:41 equals false. Capital E. Okay. Oh, yeah. In the chart. Oh, yeah, in the Prometheus community one. So that's Prometheus. Should have used kind. I'm taking notes on things that I need to fix for people not using kind. Alright. ContainerCreating. So it's got no persistence, no PVC, and our CrashLoopBackOff will disappear now that that's running. And this one says terminating. It'll go away eventually. Right? We have what we need. That's important. Yep. Okay. So let's do the port-forward and make sure we can browse to it, make sure everything's happy. Port 9090. And we have a Prometheus.

39:45 Now, because those node exporters are running, we should have... yeah, a bunch of stats. So... Yep. Perfect. Cool. Alright. Okay. What then? Yep. So kill the port-forward and go back to this tutorial. So now we're going to deploy our sample application. Let's take a look at our YAML. Sample deploy. So this is running some autoscale demo thing. It's got Prometheus annotations, so that's gonna be scraped by Prometheus, and more resource limits. Oh, and even a service now. Yep. Does it need to hard-code the cluster IP? No. I don't even think 10-dot is

40:48 even in my service CIDR. Okay. So let's deploy. Yeah. Let's deploy that. Okay. So we're gonna apply the sample deploy. We'll get the service. IP, good. Yep. 172 is what I needed. We'll make sure we've got an endpoint. It looks healthy. So you want the port-forward to Prometheus again, and we're gonna take a look at http_requests_total. Now, depending on the Prometheus scrape interval, we may just have to kill a little bit of time. We lowered that as part of the install to thirty seconds. Lowered that to thirty? I would have lowered

41:37 it to four. H-T-T-P. There we go. And let's just graph it over... We don't even have a minute yet. Is that alright? You need an aggregator on it as well to... Right. But yeah. So you need multiple different things for that anyway. But we're getting the main tracking. Yeah. Here we go. So we can see the values, too. Yeah. So we have a couple of comments, so I can just lose a few seconds so we can maybe graph it as well. Never seen a hard-coded service IP before.

42:22 Yeah, I'll blame Guy for being a bit silly with that cluster IP. I'm not sure what you were thinking there. I have no idea. I suspect that was me at 11:30 on a Monday night. But yeah, that seemed like a good idea. Right? Works on kind, so it must work everywhere. So that's risen from two to three. Is the sample application going to be degrading over time? Is that... This sample application is recording all requests made to it, which effectively includes health checks. But the HTTP... oh, okay. This is requests; this is

43:01 not the response time. No. Okay. Gotcha. So, yeah, that's gone up to four. Yeah. So it's an ever-incrementing counter. Got it. Okay. Cool. I mean, it took me a minute, but I got there. Okay. So now we have to install something else. Oh, we can visit the app. Oh, we don't need to do that, do we? Yeah. Okay. We don't need to do that. No. It's fairly simple; it just shows "I've served x number of requests" for all time. Alright. Why not then? Port 8080. Eight, 10... I'll go for a nice round 100.
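Because http_requests_total only ever goes up, it is the rate of change rather than the raw value that is useful for scaling. The Python sketch below shows the idea behind a per-second rate over a window of counter samples. It is a simplification under stated assumptions: the samples are hypothetical (timestamp, value) pairs taken every 30 seconds, a drop in value is treated as a counter reset, and, unlike PromQL's rate(), nothing is extrapolated to the window edges.

```python
from typing import List, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Approximate per-second increase of a counter over the sampled window.

    samples: (unix_timestamp_seconds, counter_value) pairs, oldest first.
    Simplified sketch: handles counter resets but does not extrapolate to the
    window edges the way PromQL's rate() does.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (for example, the pod restarted),
        # so the new value is the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes every 30 seconds across a two-minute window.
window = [(0, 2), (30, 3), (60, 4), (90, 6), (120, 6)]
print(simple_rate(window))  # 4 requests over 120 s, about 0.033 per second
```

This is the same shape of calculation the adapter's sum(rate(...[2m])) query asks Prometheus to do for each pod.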

44:02 So now we need to install the Prometheus adapter. So is that the part that is gonna allow us to use custom metrics as part of our HPA rules? Yeah. So to start to show this, you can run kubectl get apiservices against the cluster. Interesting. You might have to fully qualify it: apiservices.apiregistration.k8s.io. There we go. There is no dash. Okay. Cool. So what you can see here is all the API services that the cluster knows about, where the service it should send requests to

44:59 is, whether it's available, and how old it is. So you can see that we've already installed v1beta1.metrics.k8s.io, which is being served by the metrics server. Custom metrics is a different API path served by a different component, so we need to set up something that registers itself against the API server saying: if you receive a request that matches this API path, don't try and serve it yourself; send it on to me instead, and I will send you data back in the format you expect. Cool. Alright. Well, let's get that enabled then.
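A toy model of the aggregation described above may help. In this Python sketch, APIService registrations are keyed by their name, which Kubernetes forms as <version>.<group>, and a request path under /apis/<group>/<version>/ is routed to whichever backend holds that name. The backend service names (kube-system/metrics-server, monitoring/prometheus-adapter) are hypothetical, and the model ignores the built-in API groups that kube-apiserver serves itself. The path shapes follow the metrics.k8s.io, custom.metrics.k8s.io and external.metrics.k8s.io APIs used on the stream.

```python
from typing import Dict

# APIService objects are named "<version>.<group>".
# Hypothetical registry: APIService name -> "namespace/service" of the backend.
api_services: Dict[str, str] = {
    "v1beta1.metrics.k8s.io": "kube-system/metrics-server",
    "v1beta1.custom.metrics.k8s.io": "monitoring/prometheus-adapter",
}


def resource_metrics_path(namespace: str) -> str:
    return f"/apis/metrics.k8s.io/v1beta1/namespaces/{namespace}/pods"


def custom_pods_metric_path(namespace: str, metric: str) -> str:
    # "*" asks for the metric on every pod in the namespace.
    return f"/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace}/pods/*/{metric}"


def external_metric_path(namespace: str, metric: str) -> str:
    return f"/apis/external.metrics.k8s.io/v1beta1/namespaces/{namespace}/{metric}"


def route(path: str) -> str:
    """Return the backend that would serve an aggregated /apis/... path."""
    parts = path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "apis":
        raise ValueError(f"not an /apis/<group>/<version>/... path: {path}")
    group, version = parts[1], parts[2]
    name = f"{version}.{group}"
    if name not in api_services:
        raise LookupError(f"no APIService registered for {name}")
    return api_services[name]


print(route(custom_pods_metric_path("default", "http_requests_per_second")))
```

Because the registry is keyed by the APIService name, a second adapter registering v1beta1.custom.metrics.k8s.io would replace the first rather than sit alongside it. Nothing is registered for external metrics yet in this model, so routing such a path fails.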

45:36 Custom Metric Scaling: Prometheus Adapter Setup

45:36 Hopefully, that doesn't need a PVC. Oh, yeah. Okay. One thing that's probably worth mentioning here: we're using the Prometheus adapter for this example, but that isn't the only option for setting up custom metrics. There are a number of other implementations. Zalando have got one called kube-metrics-adapter, I think, which has a load of functionality, and there are others that have been built by other companies and communities as well. Cool. Is there a Google Sheets adapter? I think the closest you can get is Stackdriver. And one other thing: I think a lot of people obviously come across this

46:43 and go, oh, that's cool, there are multiple different adapters that can do different things. Maybe I could install multiple of them and that way have the functionality of that one plus the Prometheus adapter, so I could use Datadog metrics and the Prometheus adapter. Unfortunately, at the moment, one of the limitations of API aggregation is that you can only have one thing registering itself as saying "I serve these metrics". There's some discussion at the moment about building, effectively, a proxy that would say: for this namespace, this component serves

47:21 custom metrics, and for that namespace, that other component does. But at the moment, you've got to choose one of the solutions. Yes. So the problem, if we take a look at the API services, is that only one adapter can register as the custom metrics provider? Yes. If you have metrics in InfluxDB and in Prometheus, you really need to pick one for these use cases. Yeah. Alright. And I'm assuming the thing you were talking about there would just be, like, a gateway that registers itself and then proxies requests on to different backends? Yeah. Okay. Yeah.
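The proxy idea from this discussion can be sketched as a single registered backend that fans requests out by namespace. This is purely hypothetical: the class name, the backend names and the fallback behaviour are assumptions made for illustration, not a description of any existing component.

```python
from typing import Dict, Optional


class NamespaceMetricsProxy:
    """Hypothetical fan-out proxy for custom.metrics.k8s.io.

    It would be the one backend registered for the API group and would forward
    each request to a per-namespace adapter, falling back to a default.
    """

    def __init__(self, default_backend: str,
                 overrides: Optional[Dict[str, str]] = None) -> None:
        self.default_backend = default_backend
        self.overrides = dict(overrides or {})

    def backend_for(self, path: str) -> str:
        # Namespaced paths look like:
        # /apis/custom.metrics.k8s.io/v1beta1/namespaces/<ns>/pods/*/<metric>
        parts = path.strip("/").split("/")
        if len(parts) > 4 and parts[3] == "namespaces":
            return self.overrides.get(parts[4], self.default_backend)
        # Cluster-scoped queries carry no namespace, so use the default.
        return self.default_backend


proxy = NamespaceMetricsProxy(
    "monitoring/prometheus-adapter",
    {"team-a": "team-a/other-adapter"},  # hypothetical second adapter
)
print(proxy.backend_for(
    "/apis/custom.metrics.k8s.io/v1beta1/namespaces/team-a/pods/*/queue_depth"))
```

The API server would still see exactly one APIService for the group; the choice of adapter happens inside the proxy.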

47:53 That's kind of the discussion as it stands. But... Alright. And then if we hit the endpoint, these are all custom metrics? Yep, from us. Yeah. So if I grep this for HTTP... oh, nice. And so this is where it gets interesting. Prometheus metrics and PromQL are not what the API server speaks. The API server sends requests to a certain API path and expects a response back in a certain format. So the Prometheus adapter sits in the middle there, and you're configuring it as to how

48:39 you want to... what metrics you want to get out of Prometheus and how you transform them into the format that the API server expects. So one of the things you can see here is that this name, pods/http_requests_per_second, is not the name of the metric in Prometheus. We've been looking at a metric exposed by the app called http_requests_total. Yeah. So do you want to jump to the file that we used when we installed the Prometheus adapter? So we did a bit of... as part

49:18 of the command that we ran then, with some configuration. This one here. There we go. So here you can see we're doing some overriding of the defaults, like the Prometheus endpoint it's sitting on, to match up with the Prometheus chart. And then we've got these rules. This is the Prometheus adapter's configuration language. So here we're saying: serve up some custom metrics based on these queries. In this case, we're saying any series that matches http_requests_total, which doesn't have a blank kubernetes_namespace label and doesn't have a blank kubernetes_pod_name label. That's

50:06 not so important for this sort of thing, but it's there in case you want to stop potentially exposing metrics from, like, pause containers. And then, if we come back to resources, we're saying: if the name matches a regex ending in _total... Alright. Yeah. ...perform this metrics query on the series. So do a sum of the rate of the series that you have found, with the same label matchers. And in this case, the time window of the rate is two minutes. So effectively: what are the requests per second of this metric over the last two minutes,

51:02 grouped by the same labels. All these things in double angle brackets are effectively part of the Prometheus adapter's query language. Got it. You can have a look at the Prometheus adapter documentation for more detail on how these work. But then present that metric as this metric. You can see we've got a regex in the matches, and we're then using the group that we've captured as part of the presentation of the metric. So that's how the transformation from http_requests_total to http_requests_per_second comes in.
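To make the renaming and templating concrete, here is a Python sketch of what such a rule does. It assumes a rule shaped like the common Prometheus adapter example: a name pattern of ^(.*)_total$ rewritten to <prefix>_per_second, and a metricsQuery of sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>). The real adapter fills these placeholders with Go templates and writes the capture group as ${1}; the plain string replacement and Python's \1 used here are simplifications, and the label values in the example are hypothetical.

```python
import re

# Assumed rule, modelled on the one walked through above.
NAME_MATCHES = r"^(.*)_total$"
NAME_AS = r"\1_per_second"
METRICS_QUERY = "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)"


def exposed_name(series: str) -> str:
    """Rename a Prometheus series the way the rule's name section would."""
    if not re.match(NAME_MATCHES, series):
        raise ValueError(f"{series} does not match {NAME_MATCHES}")
    return re.sub(NAME_MATCHES, NAME_AS, series)


def render_query(series: str, label_matchers: str, group_by: str) -> str:
    """Fill in the <<...>> placeholders with plain string replacement."""
    return (METRICS_QUERY
            .replace("<<.Series>>", series)
            .replace("<<.LabelMatchers>>", label_matchers)
            .replace("<<.GroupBy>>", group_by))


print(exposed_name("http_requests_total"))
print(render_query("http_requests_total",
                   'kubernetes_namespace="default"',
                   "kubernetes_pod_name"))
```

The first line printed is the name the HPA asks for; the second is the PromQL the adapter would send to Prometheus for it.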

51:44 Alright. That's nice. So we can actually just request that metric then. Right? Yep. Okay. Let's see that. Alright. That's nice. And as with the CPU, it uses Kubernetes' internal representation of numbers. That's why it says it's had an average of 33 milli-requests per second over the last two minutes: rather than presenting us something like 0.033, it presents us 33m. Yeah. Of course. Why not? Okay. Alright. So now we're gonna use an HPA like we did with the first example to scale on our custom metric, and then you provided another

52:30 Custom Metric Scaling: HPA Demo

52:37 deployment, I assume, that is gonna generate load to increase that number of requests per second. Yeah. In this case, it's a Job, because that way we can see it scaling down again. Alright. So we've got our HPA. It looks very much like the last one. Okay. So the metric type here has changed from Resource to Pods. Why would that not be Custom? So under custom metrics, there are two different metric types. There's Pods, which is a metric that's exposed by every pod in the scale target, or there's Object. So that's

53:19 things like Ingress objects. So effectively, in that case, you're saying this is a metric that comes from a single object that maps to the pods in the scale target. That basically changes how it does the maths behind the scenes. Alright. Okay. I got you. So in this case, we've got more than one pod, so it's gonna have to calculate that across the pods rather than on an individual pod basis. Okay. That makes sense. Oh, I never knew that. That's really useful. Okay. So let's deploy this then.
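The difference in the maths can be sketched roughly as follows. This simplified Python model of the replica calculation assumes every pod is ready and ignores the HPA's tolerance band: a Pods metric is averaged across the pods and compared with a per-pod target, while an Object metric is a single number, compared either with a total Value target or, with an AverageValue target, divided across the pods.

```python
import math
from typing import List


def pods_metric_replicas(per_pod_values: List[float], target_average: float) -> int:
    """type: Pods. Average the per-pod values and compare with the per-pod target."""
    current = len(per_pod_values)
    average = sum(per_pod_values) / current
    return math.ceil(current * average / target_average)


def object_value_replicas(object_value: float, target_value: float,
                          current_replicas: int) -> int:
    """type: Object with a Value target. One number describes the whole object."""
    return math.ceil(current_replicas * object_value / target_value)


def object_average_value_replicas(object_value: float, target_average_value: float) -> int:
    """type: Object with an AverageValue target. The single value is shared across pods."""
    return math.ceil(object_value / target_average_value)


print(pods_metric_replicas([1.0, 2.0], 0.5))      # average 1.5 vs 0.5 per pod -> 6
print(object_value_replicas(30.0, 10.0, 2))       # 3x over target with 2 pods -> 6
print(object_average_value_replicas(30.0, 10.0))  # 30 shared at 10 per pod -> 3
```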

53:57 We'll just copy the path; it's different. Okay. Apply: we want the sample HPA, followed by the load. We now have a HorizontalPodAutoscaler. Now we wanna generate some load. Let's run get pods and see what's going on. Our load generator is spinning up. And we can monitor this. Cool. So because you've got the -w there, it's gonna watch this. And what we should see is the number of replicas climb after thirty seconds, I guess, because that's the Prometheus scrape interval. Start climbing towards max pods.
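How the replica count climbs in steps can be sketched with the HPA's documented formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), combined with the default scale-up behaviour (per 15-second period, add the larger of 100% of the current replicas or 4 pods) and the default 10% tolerance. The Python below is a simplified model under those assumptions: it holds the per-pod metric fixed, and it ignores pod readiness, missing metrics and the default five-minute scale-down stabilization window, so its scale-down is faster than a real HPA's.

```python
import math

TOLERANCE = 0.1  # default HPA tolerance: ignore ratios within 10% of 1.0


def desired_replicas(current: int, current_value: float, target_value: float) -> int:
    """ceil(current * current_value / target_value), skipped inside the tolerance."""
    ratio = current_value / target_value
    if abs(1.0 - ratio) <= TOLERANCE:
        return current
    return math.ceil(current * ratio)


def default_scale_up_limit(current: int) -> int:
    """Default scaleUp policies with selectPolicy Max: +100% or +4 pods per 15 s."""
    return max(current * 2, current + 4)


def next_replicas(current: int, current_value: float, target_value: float,
                  min_replicas: int, max_replicas: int) -> int:
    """One simplified HPA evaluation step."""
    desired = desired_replicas(current, current_value, target_value)
    if desired > current:
        desired = min(desired, default_scale_up_limit(current))
    return max(min_replicas, min(desired, max_replicas))


# 93.2 requests/s per pod against a 0.5 target, with a maximum of 10 replicas.
replicas = 1
history = [replicas]
for _ in range(3):
    replicas = next_replicas(replicas, 93.2, 0.5, 1, 10)
    history.append(replicas)
print(history)  # [1, 5, 10, 10]
```

Even though the raw formula asks for hundreds of pods, the rate limit and maxReplicas cap each step.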

54:42 Yeah. So you can see we've generated load. Suddenly, the current utilization is way above the target, i.e. the pods within that deployment have on average each received 93.2 requests per second over the last two minutes, instead of the target 0.5. So it's scaling up. Yeah. And we can see that it's based on a measurement of how far beyond the target we are. It's actually: hey, we're gonna need more than one of these. We're gonna scale by three, and then it's added on another four because

55:18 yeah. Correct. Yeah. Cool. We have a question from Sampath again, who asks: what is the difference between horizontal scaling and vertical scaling? I mean, we can tackle that now. I guess we're about to show the VPA? It depends how long we've got, but we've got external metrics as well. So in this case, we're horizontally scaling, and each new pod that we're creating here is being spun up with exactly the same resource requests and limits as the previous pods in the deployment. So nothing is modifying,

55:56 like, how much CPU and memory each pod gets. They each get the same amount that we defined right at the start, when we ran kubectl apply on the deploy YAML. With vertical pod autoscaling instead, and we'll show this in more detail later, the resources get adjusted in those pods, scaling them up and down to maintain utilization within each pod rather than across the deployment on average. Yeah. Horizontal scaling: add more instances. Vertical scaling: give the instance more power. That's the simple way. I mean, yours was much more correct, but I just

56:36 wanted to summarize it as well. Okay. So with that higher max, we've got 10. The HPA is working well with custom metrics, especially from Prometheus. Really, really cool demo. I like it so much, I'm gonna delete it now. Okay. Alright. Yeah, that's just the values file. Okay. Yeah, we can ignore that. Alright. So we have another question; I'll come back to that in a second. Let's get any setup we need done for external metrics. So you kind of covered this when we were describing custom metrics. So we've got resource metrics, which come

57:06 External Metric Scaling: Concepts & RabbitMQ Setup

57:18 from the nodes in a cluster. We've got custom metrics, which are metrics that we can associate with Kubernetes objects. And then external metrics could be anything that we want? Okay. Yes. It could be polling how many domains are on dd146.com or whatever. Right? Like, it could just be... Yep. That was a horrible example, but it was the first thing in my head. It could be any arbitrary information. We could just have it count the number of tweets on my profile in the last hour or something. Right. That was a slightly better example of the

57:51 terrible ones. And to do this, based on what I see here, we're gonna deploy RabbitMQ, start throwing messages at a queue, and then use that as a scaling factor. Yeah. Awesome. Alright. Let's get this running then. Is RabbitMQ gonna try and get persistence? We are doing it. I wouldn't want to say. Pending. Alright. That's a StatefulSet this time. Alright. RabbitMQ. And a claim. Oh, it's got templates and everything. Alright. Let's pull up the chart. That's annoying. Okay. :q! doesn't shut down everything.
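For an external metric such as the depth of a RabbitMQ queue, an AverageValue target means "this much queue per pod". The Python below is a simplified model that roughly mirrors how the HPA handles an External metric with an AverageValue target: the total is divided by the per-pod target, with a 10% tolerance and min/max clamping. The queue depths and targets in the example are hypothetical.

```python
import math


def external_average_value_replicas(current: int, total_value: float,
                                    target_per_pod: float, min_replicas: int,
                                    max_replicas: int, tolerance: float = 0.1) -> int:
    """Replicas for an External metric with an AverageValue target.

    total_value: e.g. messages waiting in the queue.
    target_per_pod: how many of those each pod should be left with.
    """
    ratio = total_value / (target_per_pod * current)
    if abs(1.0 - ratio) <= tolerance:
        desired = current  # close enough to the target: leave it alone
    else:
        desired = math.ceil(total_value / target_per_pod)
    return max(min_replicas, min(desired, max_replicas))


print(external_average_value_replicas(1, 95, 10, 1, 20))  # 95 messages at 10 per pod -> 10
```

An empty queue drives the desired count to zero, so minReplicas is what keeps a consumer running.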

58:52 Anyway, this chart was... okay, so it's Bitnami. The Bitnami charts are a pain, because I can never find the example values. persistence.enabled=false, I think, should do it. Alright. So: helm upgrade rabbitmq, --reuse-values, --set... okay, one more time, please? persistence.enabled=false. Okay. Alright. Let's give that a second. A couple more questions and comments. So Andrea says: super interesting topic, what's your take on AWS Karpenter? Do you wanna go first or second? I will go second. I think AWS Karpenter is, firstly, really good and amazing.

1:00:01 What I find frustrating is that we tried to do node autoscaling based on unschedulable pods. I mean, it's not wrong; it worked for a while. But using actual metrics, calculating your application's health, and using all these other metrics to do node autoscaling is, I think, a much wiser solution. So I'm really excited to see that come out. Okay? Yeah. I agree, I think, for the most part. And I think there are historical reasons why

1:00:37 the Cluster Autoscaler, which is owned by SIG Autoscaling, uses pending pods as its signal. And there are shortcomings to it, obviously. There's the fact that you can't really have much headroom, aside from hacking in pause pods that are low priority and get chucked away when high-priority pods scale. I'm really interested in the design of it, and how they're potentially aiming to be able to scale anything which has a scale target implementation. That sounds like it's got a lot of

1:01:23 promise for us at Skyscanner; the ability to scale on any arbitrary CloudWatch metric would be cool. I'm having some conversations with the Karpenter maintainers. They've got office hours starting from January on Tuesday evenings, UK... well, European time; sort of morning, Pacific time. I would encourage anyone who's interested to come on and have a chat. Yeah. So, now that you've been a bit more conservative in your answer: I don't think that it was wrong, as in we should never have done that approach. I just mean that, you know, waiting until

1:02:07 something is actually unschedulable before scaling can be optimized. We can try to use measures to predict when, and, you know, we don't want a failure. It depends how quick or slow it is to add new nodes to the node pool. If that takes five minutes and we're waiting until something is unschedulable, you're talking potentially ten minutes before you're in a good position. And that makes me uncomfortable. So I think that's actually cool, and I'm looking forward to seeing it adopted and hopefully adapted to other cloud providers too.

1:02:37 Alright. That persistence thing did not help, unfortunately. Even though the flag does seem to be correct, we still have a PersistentVolumeClaim. The volume mode is Filesystem. That did not work. Alright. Okay, I just need to try and hack at it. Sorry about this. Oh, no. I should have given you a fully working cluster instead. I cut corners; it was easier for me. Alright. So where's this volume? What was it called, before I delete it? data. Okay. So that's not the volume I need. Volume mounts here.

1:03:32 Yep. Name: data. Where are our volumes? So, name: data. If we can get this working this time: /tmp/rabbit. I mean, that shouldn't have a permissions issue. It's gonna be that the directory doesn't exist, isn't it? I think so. Yeah. So let's just do that, and then I'll work out what node it's on and SSH in. And create. Correct? Oh, no. It did not like that. So maybe we just tried to update something that it doesn't know how to handle. Right? Let's try one more time.

1:04:39 Back. Now let's just make sure that's doing what we expect. So if we disable persistence, does it stop the PVC being created? That's probably important. statefulset.yaml. Yep. That's it. And the container volumes. Okay. So yeah, we use an emptyDir there, and it will not create anything. So it should be okay. Let's just assume it was some issue, and we're not reusing the values this time. Did you give me a values file? No. So it's just all the defaults? Fair. Yeah.

1:05:34 Alright. Let's see. And we have another comment from Justin: nice example. Yeah. Well, I mean, it would be a nice example, wouldn't it? Yeah. We'll try. There we go. Alright. So they're not great at handling disabling the persistence. Yeah. So this is fun. I like it when things go wrong, because you kind of get to challenge some of the assumptions you made, and I think, hopefully, that's useful to some people. So, you know, we'll see. Yep. Alright. So we now have RabbitMQ, then

1:06:08 we want to deploy the publisher to this. So let me fix my path. This is example three. And we are going to upgrade our adapter with our new values file. Yep. Have you tested the helm upgrade on this? Yes. Yes, that's one I've got. Alright. Prometheus. Okay. So now we have to wait while that collects some more metrics. What we should see from our API services then is that we have something new registered for external metrics. There we go. Yep. FailedDiscoveryCheck. I'm assuming it's just not healthy yet, since it hasn't updated. I guess not. It's probably done the registration

1:07:20 and not yet updated itself to say "I'm serving these". Yeah. It's only been thirty seconds. Yeah. It's not part of the readiness probe. So let's give that at least a fair chance. There we go. And grep for external. Much better. There we go. That looks good. We now have external metrics. And what we're gonna do is take a look at the value of our RabbitMQ queue. Yeah. Not available yet. So that may just be the thirty-second delay from the Prometheus scrape interval. And there's also another lag introduced because, effectively, the way the Prometheus

1:08:02 adapter works is, rather than doing ad hoc queries when you create an HPA, it's instead doing discovery of the series against Prometheus and telling the API server: these are the ones I know about, you can create HPAs for these. Oh, have you got a joke or something? Okay. So let's make sure... because I don't think I understood everything you said there. Okay. So we upgraded the Prometheus adapter. Let's take a look at the values that we've modified there then. Let's see what's changed. So, still the Prometheus thing. Okay. We've added

1:08:48 our extra... Yeah. That's the old one, right? Yeah, that's exactly the same as the old one. Okay. And then we've added this new external section, which has got a new series for... okay. So this is just fetching queue length and messages. Yeah. We're seeing this applies to... I don't wanna guess. Why don't you walk through this? So one of the things that I mentioned before is that the Kubernetes API server is making these requests... sorry, the API aggregation layer is making these requests through the API server and then proxying them. They're made in a really

1:09:29 specific pattern. So if you jump back to the query that wasn't working, I guess, the kubectl get --raw. Why is that not working? You can see that it's making this call to external.metrics.k8s.io, then an API version, that's fine, and then namespaces, then a namespace name, and then the metric. So if we then jump back to the config, what we're saying here is that this series actually has a label called kubernetes_namespace. So if you scroll a bit further right... Yeah. So we've got a label on this

1:10:20 metric called kubernetes_namespace. And what we need to do is make sure that that metric is served up under the path that Kubernetes is querying on. So in this case, we're saying: use the value of the label kubernetes_namespace, and that becomes the namespace resource when the HPA is querying for it. If it was a pod-specific metric, you might also want to use the value of a different label as the pod. When we were doing the custom metrics example, you can actually query for a

1:11:05 specific pod's value for a metric. So you can basically map that label value to say "this is the pod name", even if it's some weird label because of the way you're exposing it. Cool. Okay. So we probably wanna work this out. Yep. First things first, let's see if the metric is getting into Prometheus. I thought the same. Great minds. Oh, yours is great. Mine is alright. I wouldn't say that. I'm just... yeah. We were port-forwarding to the service before, and it's still there. So let's refresh. And what we wanna see is, do we have

1:11:57 oh, we do not. Nope. So let's jump to Status, Targets in Prometheus. Okay. Does it even know about the targets? So if you search for rabbit... It says deadline exceeded. Alright. Let's try that then. So let's get pods. Port forward to RabbitMQ. What port was that on? 9419. Okay. No. No, it doesn't look healthy. Okay. Oh, well, yeah. It does say Running, right? Yeah. Okay. Let's take a look. Is that port number correct? 9419? Wait, something... this is an error message: readiness probe, unable to... oh.

1:13:28 Might it be... how good is the headless service support in kind? Yeah, it should work. Let's check. We've got endpoints. Yeah, that looks okay. Okay. Let's describe pods. We'll try a couple of things; we can always jump on to the VPA and come back. So what is this actually telling us? It's trying to speak to... okay, so the service name is supposed to be rabbitmq-headless. Looks good. It's got endpoints. I mean, both those services actually look headless. But okay. Let's see. "Node rabbit not running at all."

1:14:55 Nothing. So, suggestion: start the node. I mean, the fix for everything is to delete the pod, right? If we had a look at the full logs... Oh, we didn't even look at the logs. Now... looks good. Yeah. Are those events old? Where are the probes? Did we make the mistake of trying to upgrade rather than reinstalling from scratch? Again, I... Okay. Those are old events. I think it is healthy now. And yet? I don't know. Take a look at this. What have we got? Where are the ports?

1:16:07 Alright. 9419 is the metrics port. Okay. Let's port-forward 9419. I'm not making a mistake there, am I? Like, that's right, isn't it? Yeah, 9419, metrics. I'm gonna delete the pod, and then we'll see if those events disappear. And then we'll do the VPA, and depending on how we're looking for time, we may come back to it. So... Yep. And I'm sure, if you wanna try this in your own time, it works in kind; it's potentially something to do with the changes we're making

1:16:53 live. So we'll see. I mean, it wouldn't be a live demo without something failing properly hard. Maybe better now. Let's try that again. The response is probably better. Yeah. Error. That is not healthy. Yeah. It's starting. Okay. Last chance, Rabbit. Come on. You could have used NATS. No? I thought Rabbit was the easy choice. I've never really had many problems with Rabbit in the past, to be fair. So, alright. Let's go on to the VPA. We'll come back to Rabbit. Okay. Yeah. Alright. Okay. So let's see what we've got over here then.

1:17:54 Vertical Pod Autoscaling (VPA): Concepts & Setup

1:18:04 Good comment there from Justin: because we have disabled persistent storage, it may struggle to initialize the second time. I mean, it should start from fresh, I hope. But we'll definitely try that. I may try a helm uninstall and reinstall when we come back. In fact, I'll just check one thing. Should we delete it and then come back to it? Let's delete it and come back to it. Alright. So, persistence disabled. Let's just leave that while we do the VPA. Okay. Yeah. So, vertical pod autoscaling: as we just said to people, horizontal means add

1:18:50 more of the thing. Vertical pod autoscaling just means tweaking the resource requests and limits that we have on our objects. Yeah. We're using cowboysysop... what's cowboysysop? I can't remember who they're tied to. It's a set of quite useful charts. You can have a look at them on Artifact Hub, so there's some security scanning, etcetera. Alright. So we add that. We're gonna deploy stress-deploy from example four. And we'll probably need to up the memory again, because this is a similar one. Alright. Okay.

1:19:35 Oh, it's okay. I don't know if that's the autoscaler. I love that everything is just going to the default namespace. Yeah. OOMKilled. Okay. So it's the VPA. Alright. So what was it we said we needed here? I think it was, like, 270 gigs in that... it said 270 megs, sorry, in that one. It was one gig here. One gig or so. Okay. Let's reapply the deploy. Any labels on these, just to make it a bit easier to filter? There's an app one. Yeah. Okay. I may have

1:20:45 may have increased the memory scaling on that one from the previous example. Let's go with... yeah, it's happy. Okay. Cool. And just check the CPU one as well. Okay. So let's bump that up. In fact, that'll probably just be the CPU here. We actually did bump that up in the last one too, but I don't wanna break your actual scaling of that. Yeah, we can tweak as we need. Yeah. And that's now healthy. Oh, it was healthy. Mhmm. Two. I don't know why I'm being so specific with those numbers, but

1:21:48 and ContainerCreating, Running... let's start. Okay. Cool. We did this. Let's actually try and get those limits down close, so the VPA can actually do stuff, I guess. Right? Yeah, that would be better. So our CPU one: why was that being OOMKilled when it's just using three meg? I think I might have just been grabbing it before the pod started up. Oh, right. But no, it actually started going: I'm here to stress your stuff. Right. Right. Of course. I keep forgetting about that delay. Right. So we need 1,100

1:22:29 meg on memory and 80 on the CPU. So let's do that first. So we'll say requests: 80, and memory: 1,100. So we know that's the baseline; the request will be our baseline. Yep. And then we're gonna set some limits. So we'll say 112, because we're being really stingy, so we're not really giving it much wiggle room there. Yeah. Now let's see if our CPU one runs. And there we go. Okay. So we want 800 millicores and 30 meg. So 800. That's spiking on them; that's actually being throttled right now. Yeah. I cannot remember basic numbers.
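The numbers being juggled here are Kubernetes quantities: 800m of CPU is 0.8 of a core, memory such as 1100Mi uses binary (2^20) multiples, and the 33m requests per second seen in the custom metrics section uses the same milli notation. The Python sketch below parses only a subset of the quantity syntax (the m, k, M, G, Ki, Mi and Gi suffixes, no exponent forms); it is an illustration, not the full Kubernetes parser.

```python
from decimal import Decimal

# A subset of Kubernetes quantity suffixes: decimal SI and binary IEC.
SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity string such as "800m" or "1100Mi" to a plain number."""
    # Check two-character suffixes first so "Mi" is never read as "M".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * SUFFIXES[suffix]
    return Decimal(quantity)


print(parse_quantity("800m"))    # CPU: 0.800 cores
print(parse_quantity("1100Mi"))  # memory, in bytes
print(parse_quantity("33m"))     # 0.033 requests per second
```

Decimal is used instead of float so that values like 0.033 compare exactly.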

1:23:21 Okay. So it says: yeah, you can stretch up to a thousand, and we're only gonna give you 42 here. But that's it. Pretty finely tuned, I guess. Yeah, pretty finely tuned, bumping up against those limits. If we grep on stress, it should just return those two. There we go. Okay. Oh, no. They're both being OOMKilled. I think it might be one of those ones where they're actually bumping up and down with the memory, and depending on when you catch it, it's potentially a useful metric or not.

1:24:06 Oh, alright. Yes. I guess we can tweak the VPA anyway. So... Yeah. And 50 should cover everything. And let's just see. We know it didn't really go much bigger than that. So... Sure. Okay. Oh, that's the old one. Okay. Yep. Running and running. Do not touch anything. Nope. Memory's dead. No. Let's just start picking bigger numbers. Okay. Oh, I forgot the -w. I was wondering why that wasn't actually doing the watch. Alright. Okay. Running. Running. No OOMKills. The CPU stress one that's running is the older one, though.

1:25:13 So it is... alright. Let's bump up the CPU one. I guess it's eating a bit more memory than I thought. I'm so confident I'm not gonna have to tweak that again, I'm just gonna go to the next step. Okay. So let's take a look at our VPA. Okay. VPA. Alright. You wanna run us through this? So yeah. You can see this is under the API version autoscaling.k8s.io, kind VerticalPodAutoscaler. So this is a custom resource. Some people may be familiar with them, some may not.

1:25:31 Vertical Pod Autoscaling (VPA): Demo

1:25:59 But this is basically an extension of the base Kubernetes APIs with a new custom resource, and this is effectively declaring an instance of it. That means you can interact with it using kubectl. These custom resources are picked up by the vertical pod autoscaler and used to inform it about what behaviour we want. It also writes status back to these custom resources, so that we can have a look at what it's saying about what behaviour it intends to undertake. So in this case, we give it some

1:26:40 metadata, a name, the usual things we'd expect. And then, as with the HPAs before, we're giving it a spec with a targetRef. In this case, we're pointing it at the CPU stress app. Then this is where the magic of the VPA comes in: we're giving it a resource policy. So we're gonna need to tweak this, because we've had to bump up the resources. So that was gonna be my first question. Right now, we have both of these running, but we bumped them a little bit

1:27:11 higher than we wanted. Yeah. I don't know how the VPA works specifically. If we just put this back down so they're both OOMKilling and apply the VPA, will it fix it? It can potentially run into issues. The problem is that it's pulling its data from the metrics API. So if these things are repeatedly OOMKilling, and therefore the metrics API is giving a misleading idea of how much resource these pods are using, it might not be able to do the maths it needs to do to figure

1:27:49 out: oh, wait, that needs more CPU or memory. Okay. So this --cpu flag on the stress program: does that mean consume four CPUs? It means run it on four threads. Right. Okay. If we've got this at 1,500 and 500, and we go to our VPA, does that mean that we want this at a minimum... I mean, the minimum probably has to match, right? Yes, in this case: 500m and 1,500Mi. And then we can say down here, you can have four whole cores and 3Gi. Yeah. And that should work for the memory and

1:28:38 the CPU. It'll only be targeting the CPU one, but... In this case, we're only targeting the CPU. Okay. And we can also basically tell it to only monitor and modify certain resources. So, for instance, if you're running a Java app where it may always appear to be using all of the heap, the memory utilization always looks really high to Kubernetes. You may only want the VPA to monitor the CPU and never to modify the memory. Okay. Cool. So I just apply that. Is it gonna do anything? So if you apply that, then we can

1:29:20 have a look at the logs of one of the Vertical Pod Autoscaler's components. Right. There are no events. That means it's not doing anything yet, right? Not yet. And that's because, again, we've gotta wait for the metrics to come through. Sorry, I tell a lie. It has already picked it up. If you jump back, you can see that it's got these status conditions. The status is true, and it's provided a recommendation. And you can see the recommendation below that. Okay. So what's the lower bound here, in this context?
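The resource policy set up here puts a floor (`minAllowed`) and a ceiling (`maxAllowed`) on what the VPA may recommend, and the status then reports `lowerBound`, `target`, `upperBound`, and `uncappedTarget` for each container. As we understand it, `target` is `uncappedTarget` clamped into the policy's range, and a pod whose current request falls outside `[lowerBound, upperBound]` is a candidate for the updater to act on. The Python sketch below illustrates those two ideas for CPU only. The 500m floor and four-core ceiling come from the discussion; the recommendation numbers are made up for illustration, and the helper names are hypothetical.

```python
# Hypothetical helpers sketching how a VPA CPU recommendation relates to the
# resourcePolicy bounds. Quantities are handled as integer millicores.

def cpu_millicores(quantity: str) -> int:
    """Parse a simple Kubernetes CPU quantity ('500m', '4', '1.5') into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


# Bounds from the walkthrough: at least 500m, at most four whole cores.
policy = {"minAllowed": {"cpu": "500m"}, "maxAllowed": {"cpu": "4"}}


def capped_target(uncapped_m: int, policy: dict) -> int:
    """Clamp an uncapped CPU recommendation into [minAllowed, maxAllowed]."""
    low = cpu_millicores(policy["minAllowed"]["cpu"])
    high = cpu_millicores(policy["maxAllowed"]["cpu"])
    return max(low, min(high, uncapped_m))


def outside_recommended_range(request_m: int, rec: dict) -> bool:
    """True if the current CPU request lies outside [lowerBound, upperBound]."""
    low = cpu_millicores(rec["lowerBound"]["cpu"])
    high = cpu_millicores(rec["upperBound"]["cpu"])
    return request_m < low or request_m > high


# Made-up values, shaped like one entry of
# status.recommendation.containerRecommendations.
recommendation = {
    "lowerBound": {"cpu": "900m"},
    "target": {"cpu": "1100m"},
    "upperBound": {"cpu": "2000m"},
}

print(capped_target(300, policy))    # below the floor, so 500
print(capped_target(6000, policy))   # above the ceiling, so 4000
print(outside_recommended_range(500, recommendation))   # True
print(outside_recommended_range(1100, recommendation))  # False
```

Clamping `uncappedTarget` rather than storing only the capped value is what lets you see how far the policy is holding the workload back.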

1:30:14 So the lower bound is... I'm trying to remember exactly the maths. The lower bound is how much... is it this? No. It's how much it's used. I can't remember exactly. Alright. So what is this recommendation suggesting right now? Because I'm not sure. Is it telling it to increase it? Is it telling it... So if you scroll up a bit again, there's the update policy, the update mode. VPA has a number of modes it can operate in. There's Auto and Recreate; currently, those two are the same,

1:31:21 which basically says: when you do this, recreate the pods to change the resources. So if you think we need to change the resource request on this, remove an existing pod to trigger recreation of the pod with the new recommendations. There's Initial, which basically says: once a pod has been assigned resources, never delete it to force its recreation and give it new resources. The reason Auto and Recreate are separate modes is that Auto will hopefully, in the future, be able to do in-place updates of pod resources,

1:32:07 and therefore be the recommended way, because it's the least disruptive. And then there's also Off: if you're unsure about the VPA and how it might affect your workloads, you can install the VPA and say, watch these workloads and record your recommendations, but don't actually do anything about them. Alright. So that's what the recommender's doing. Then there are other components. There's the updater, which takes the recommendations produced by the recommender and checks whether the pods managed by that VPA custom resource have the correct resources set. And if

1:32:50 not, it evicts them so they can be recreated by their controllers. So the VPA doesn't actually change anything in the deployment it's monitoring. Instead, it has the third and final component, an admission plugin, make the changes to the pods when they're created. Effectively, when a pod's been deleted by the VPA, the deployment in this case goes, "Well, that's been deleted. I need to create a new pod to match my desired count." It creates a new pod, and the API server passes it through the admission

1:33:27 plugin, which takes the recommendation and sets the resources. Ah, okay. Alright. So you're suggesting we can take a look at the logs from the recommender? Yep. "OOM event will be discarded because it's too old." Alright. I think that's unrelated, isn't it? I think that's the thing we were talking about. Then there's the describe we just did. So now we can scale this up. Sure. But here's what I'm looking for: if you describe the pods that match the label selector, describe pods -l app equals CPU stress

1:34:37 app. Yep. Oh, it's bumped the limit up. Yep. Because our limit was 1,500, and it's now decided you probably want this as 1,700. Phew. Alright. That's the end of the VPA one. Nice. Okay, moment of truth. Where's Rabbit? It's running, after a restart. Let's see. I think I'm getting metrics. No, it doesn't look too happy. Port-forward the service... Okay. And if it works... no. Okay. We don't have... No, never mind. No big deal. Yep. Cool. Awesome. Let me pop that off. So if anyone has any questions, we're gonna finish up in

1:35:56 Conclusion & Community

1:36:01 just a few moments, so get them in now and we'll do our best to tackle them. Thank you, Guy, for putting all that together, and for giving that running commentary and explaining all those concepts and the vocabulary. I think using the HPA and the VPA is really important when adopting Kubernetes. And it's just nice to have an example like this where we can sit and go through it from start to finish, and really break down and understand what's actually happening within the cluster.
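To tie together the VPA flow Guy walked through (the recommender produces recommendations, the updater evicts pods, and the admission plugin rewrites resources on the recreated pods), here is a minimal Python sketch. The mode names mirror the `updateMode` values discussed at 1:30 to 1:32 (`Off`, `Initial`, `Recreate`, `Auto`); the function names and the plain-dict container shape are hypothetical. It also assumes the VPA's default of scaling the limit so that the original limit-to-request ratio is preserved, which is one plausible explanation for the limit moving from 1,500 to 1,700 in the demo; the integer rounding is our own choice.

```python
from enum import Enum


class UpdateMode(Enum):
    """VPA update modes as described in the walkthrough."""
    OFF = "Off"            # record recommendations only
    INITIAL = "Initial"    # set resources at pod creation, never evict
    RECREATE = "Recreate"  # set at creation and evict pods to apply changes
    AUTO = "Auto"          # currently behaves like Recreate


def admission_applies(mode: UpdateMode) -> bool:
    """Whether the admission plugin rewrites resources on newly created pods."""
    return mode is not UpdateMode.OFF


def updater_may_evict(mode: UpdateMode) -> bool:
    """Whether the updater may evict running pods to apply a recommendation."""
    return mode in (UpdateMode.RECREATE, UpdateMode.AUTO)


def admit(container: dict, target_cpu_m: int, mode: UpdateMode) -> dict:
    """Return the CPU resources a newly created pod would receive.

    Sets the request to the recommended target and, if a limit exists,
    scales it to keep the original limit/request ratio.
    """
    if not admission_applies(mode):
        return dict(container)
    request_m = container["requests_cpu_m"]
    admitted = {"requests_cpu_m": target_cpu_m}
    if "limits_cpu_m" in container and request_m > 0:
        admitted["limits_cpu_m"] = target_cpu_m * container["limits_cpu_m"] // request_m
    return admitted


# Hypothetical numbers: request 500m, limit 1500m, recommendation 600m.
pod = {"requests_cpu_m": 500, "limits_cpu_m": 1500}
print(admit(pod, 600, UpdateMode.AUTO))       # {'requests_cpu_m': 600, 'limits_cpu_m': 1800}
print(admit(pod, 600, UpdateMode.OFF))        # {'requests_cpu_m': 500, 'limits_cpu_m': 1500}
print(updater_may_evict(UpdateMode.INITIAL))  # False
```

Keeping eviction (the updater) and mutation (the admission plugin) in separate functions mirrors why the VPA never edits the Deployment itself: the new values only appear when the controller creates a replacement pod.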

1:36:30 Your description of the resource metrics, the custom metrics, and the external metrics was fantastic. That's something I've had a little bit of confusion about in the past, so it was good to have that explained in really nice detail. While we wait one more minute, tell us a little bit about SIG Autoscaling. How long have you been a chair? What does that mean? And can people get involved? Yeah, definitely. We're always looking for more people to get involved, if people have

1:37:04 use cases that aren't currently covered by the components we take care of. SIG Autoscaling currently owns the Cluster Autoscaler, the HPA code within core Kubernetes, the Vertical Pod Autoscaler, and something called the pod nanny, or addon resizer, which is basically a really simple, dumb VPA: it just scales a workload up and down vertically in proportion to the size of the cluster, or something similar. There are some plans for us in terms of what our near-term timeline looks like. We want to... so

1:37:58 you saw that a lot of the APIs we were using were v2beta2. The `autoscaling/v2beta2` API has been in beta for quite a long time, and we want to look at promoting it to GA. There are some bits of functionality that are still in alpha that we want to get into beta and GA, to enable people using managed services, a lot of which restrict people to beta-and-above feature flags. We want to get those promoted so that people can make use of them wherever they're running.
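The `autoscaling/v2beta2` API mentioned here is also where the scaling policies Guy describes at 1:38:39 live, under the HPA's `behavior` field. Taking his example (scale down by at most 10% of the pods or eight pods per period, whichever is higher or lower), the Python sketch below computes the lowest replica count a single scale-down step could reach. It assumes no earlier scaling events inside the policy period, rounds percentage results down, and clamps to a `min_replicas` floor as a simplification. `Policy` and `scale_down_floor` are hypothetical names; the `Pods` and `Percent` policy types and the `selectPolicy` values `Max`, `Min`, and `Disabled` follow the HPA API.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    type: str   # "Pods" or "Percent", as in the HPA behavior API
    value: int


def scale_down_floor(current: int, policies: list, select_policy: str = "Max",
                     min_replicas: int = 1) -> int:
    """Lowest replica count one scale-down step may reach (no prior events assumed)."""
    if select_policy == "Disabled":
        return current  # scale-down is switched off entirely
    proposals = []
    for p in policies:
        if p.type == "Pods":
            proposals.append(current - p.value)
        elif p.type == "Percent":
            # Round down, e.g. removing 10% of 15 replicas leaves 13.
            proposals.append(current * (100 - p.value) // 100)
        else:
            raise ValueError(f"unknown policy type: {p.type}")
    if select_policy == "Max":
        floor = min(proposals)   # allow the largest change
    elif select_policy == "Min":
        floor = max(proposals)   # allow only the smallest change
    else:
        raise ValueError(f"unknown selectPolicy: {select_policy}")
    return max(floor, min_replicas)


policies = [Policy("Percent", 10), Policy("Pods", 8)]
print(scale_down_floor(100, policies))         # 90: 10% (10 pods) beats 8 pods
print(scale_down_floor(100, policies, "Min"))  # 92: only 8 pods
print(scale_down_floor(50, policies))          # 42: 8 pods beats 10% (5 pods)
print(scale_down_floor(50, policies, "Min"))   # 45
```

The same deployment therefore sheds pods by percentage when it is large and by a fixed count when it is small, which is the kind of granularity Guy is describing.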

1:38:39 And we've had a lot of new functionality pushed through by users saying, "we want this." I've not included it in the demos yet, but there's some really cool functionality called scaling policies, where you can get really granular with how you want a deployment to scale up and scale down. So you can say, at most, I want to scale down 10% of the pods in this deployment or eight pods at a time, whichever is the highest or the lowest. So there's a lot

1:39:13 of granularity you can get there, which users have implemented. So, yeah, I've been co-chair of the SIG since earlier this year, around April, maybe. Time has lost all meaning, like for everyone. And I've tried to bring a more end-user-focused view of things, because a lot of the people with heavy involvement, and the maintainers of the code, are at places like Google. They're providers, and they get feedback from users of GCE and GKE. But I've tried to

1:39:55 bring more of the view of someone dealing with clusters at Skyscanner day to day. So, yeah. Alright, thank you. We have office hours every Monday at 4 PM Central European Time; come along if you have any ideas or things you want to discuss. Alright, thank you very much. We've got a comment or question, I'm not sure which. Firstly, Justin says, "Custom metrics is my favorite." Yeah, I can definitely understand why; that was a very powerful custom metrics demo. And then an idea: a cron-based Horizontal Pod Autoscaler, changing the scale policy on a cron

1:40:40 interval. Yeah, I think there's maybe a use case for that. If you're an e-commerce site or a news outlet, you might wanna scale differently during office hours, nine to five, and maybe lower your scaling policies in the evening or during the night. Yeah, definitely. Cool. Alright, I know you've got another call to get to and we've gone over time, so thank you for sticking with me and for sharing your knowledge with us. That was really, really good. And other people can check out that repository,

1:41:11 have a play, and enjoy. Yeah, and let me know what's lacking. Obviously I need to go back and revisit the limits for those stress deployments and figure out why Rabbit doesn't work on it. If you would like a bare metal cluster to play with, I will make one for you. Not a problem. Alright, have a great day, Guy. Thanks again for joining and sharing your knowledge, and I'll speak to you again soon. Bye-bye. Thanks. Bye.

Meet the Cast

David Flanagan

@rawkode

Guy Templeton

@gjtempleton

Documentation

Prometheus Adapter

Code

Kubernetes Autoscaling by Example repository
