Kubernetes Cluster API for Equinix Metal (Formerly Packet)

Overview

About this video

What You'll Learn

Use Cluster API to define Kubernetes cluster lifecycle declaratively with separate management and workload responsibilities.
Initialize a Cluster API management cluster with clusterctl and provider plugins, including version and dependency checks.
Generate Packet provider cluster manifests, apply them, then fetch kubeconfig and add-ons to bring CNI and scaling workflow online.

Jason DeTiberus joins to walk through declaratively defining Kubernetes clusters with Cluster API, installing the Packet infrastructure provider, generating a workload cluster config with clusterctl, and bootstrapping nodes via kubeadm on Packet bare metal.

Transcript

Generated from the English captions. Timestamps jump the player to that moment.

2:25 Introductions

2:28 Hello. And we are now live. Hello, Jason. How are you? I'm doing good. How are you doing today? I'm very well. Thank you. I'm excited. I'm really looking forward to today's session. So for people that are tuning in, we are going to cover Kubernetes deployments using the Cluster API on bare metal compute using Packet host. Now I know a few of those words, which is really cool. And I'm expecting Jason here, who... now we're colleagues. We work together at Packet as of, what, two weeks ago now? I think this is my third week. So

3:00 What is Cluster API?

3:08 Third week. Yeah. It's gone fast, but it's been fun. Awesome. So, if you don't mind, can you just explain to me what Kubernetes is? I won't do that to you. What the Cluster API is? Let's assume I'm okay with Kubernetes. So Cluster API is basically a way of declaratively defining Kubernetes clusters using Kubernetes primitives. So, you know, basically, when the project was started, you know, we decided that, you know, we have this declarative system for defining, you know, basically applications and running them. Wouldn't it be great if we could also,

3:56 you know, define infrastructure this way and manage the Kubernetes clusters themselves this way? So that's kind of what, you know, started Cluster API as a project: taking those declarative concepts that you get in Kubernetes and moving them over to not just infrastructure management, but specifically managing Kubernetes clusters themselves. And that includes, you know, basically, the entire life cycle of a Kubernetes cluster, from installation to ongoing operation, where you would wanna scale the cluster up or scale the cluster down, to upgrades, and then tearing those clusters down when you're done with them. Okay. So I can define a

4:41 Kubernetes cluster inside of a Kubernetes cluster to get a Kubernetes cluster? Yes. Right. Awesome. So I guess my first question there is... I mean, when I first started using Kubernetes, the model seemed to be that we, or at least maybe it's the organizations I was working with, had one cluster to rule them all. Like, one massive cluster, and we ran all of our workloads on it. And is that not best practice anymore? Is the Cluster API a sign that we should have many smaller clusters? So I think there are varying use cases

4:50 Is the world moving from one large cluster to many small clusters?

5:16 depending on your workload. What you're trying to get out of the workload will determine, you know, how big of a Kubernetes cluster you're running. You know, are you specifically trying to fully optimize your resources to the maximum to get maximum bin packing for your applications? In that case, it makes sense to run a really large cluster and run everything in that one cluster because you can get kind of the best resource utilization for your applications that way. However, it also creates basically a singular failure domain around that Kubernetes cluster. So if you

5:53 lose the control plane of a Kubernetes cluster, you generally still have your applications running in there, but you'd lose the ability to manage those applications. You know, you can't scale those applications to meet demand. You can't push updates to those applications if the control plane is down. So, you know, being able to, you know, spin up multiple Kubernetes clusters allows you to create much smaller failure domains around the clusters themselves. And in the past, it was pretty prohibitive to have many small clusters mainly because, you know, installing and managing the life cycle of those clusters

6:32 was a very labor-intensive process. And if you can declaratively define those clusters and manage them, then it's much easier to spin up and tear down clusters as you need them. Okay. So I write a manifest. We're gonna walk through all of this. So, you know, if you're tuned in, you can watch me make a mess of this or hopefully be successful. But I write a manifest to spin up a Kubernetes cluster. Now the state of that cluster, I'm assuming, is stored within the controller cluster. Is that using Kubernetes primitives as well, like

7:10 secrets, or how does that work? Yeah. So we generally have the concept of a management cluster with Cluster API, and that's basically just a Kubernetes cluster that is running the Cluster API controllers themselves. And that will store kind of the external state of the cluster, you know, what cloud provider resources you're using for that cluster, how many nodes you have in that cluster, what's the state of the control plane, which version is that control plane, things like that. The actual state for the workload clusters themselves, which, you know, is how we refer to the clusters that are managed by Cluster API, as workload

7:54 clusters, that's stored within the etcd cluster that's present inside that cluster itself. So, you know, if the management cluster goes away for any reason, you know, the clusters are still fully functional and the control planes for those clusters are still fully functional. The only thing that you really lose is the ability to manage the individual clusters until you restore that management cluster. So that makes me think. I mean, could I use a GitOps model for the management cluster to be able to tear that down and restore that relatively painlessly? Absolutely. That was the idea behind

8:00 Can we use GitOps for durable management clusters?

8:37 driving a declarative model: you know, people already have methods of storing, you know, declarative resources, whether it's more traditional configuration management like Puppet, Chef, or even CFEngine, or using a more cloud native model like GitOps. You know, they can use those same tools to manage Kubernetes clusters with Cluster API. Awesome. Alright. Subjective question. Is Cluster API the best way to deploy a new cluster? And I will say, you know, it really depends on your use cases. There are some folks that are already using other tooling that exists, in which case, you know, that's something that

9:00 Is Cluster API the best way to deploy a new cluster?

9:25 their group knows already. It generally makes sense to, you know, continue using what you know if you're happy with it. If you are searching for something new, I would suggest at least exploring Cluster API. And what we're seeing is, you know, when we built Cluster API, we didn't necessarily build it as the end user interface, you know, for installing Kubernetes clusters. It's not necessarily an easy button to Kubernetes. The idea was specifically around trying to build something that higher level tooling could build upon to kind of create those easy button installers. And we're starting to see adoption

10:09 for, you know, other tooling with Cluster API. For example, you know, there's actually, you know, Kubernetes products on the market today that run based on Cluster API on the back end. Okay. That's really cool. We already have our first question from a familiar face, so I'm gonna drop that in. From Daniel Finneran: why do we think the Cluster API came about? What were the issues with existing tooling? And he mentions Kubespray as an example. So there were various different issues depending on which tool you're talking about. And I don't necessarily wanna pick on any specific tool because

10:20 Why does Cluster API exist?

10:52 all of these tools came about because of specific needs that weren't met at the time. But one of the challenges is that a lot of the tooling may be very specific with how you need to run it. You need to use a specific type of configuration management tool to manage your clusters. Or, you know, they have very imperative models. You have to run the installer. You have to run the upgrader, and if you want to have declarative management, you would have to build that around the tooling.
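The contrast drawn here, between running an installer and an upgrader by hand and declaring the state you want, can be sketched in a few lines of Python. This is only an illustration of the declarative idea, not how the Cluster API controllers are implemented; `ClusterSpec`, `ClusterStatus`, and `reconcile` are hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusterSpec:
    """Desired state, as the user declares it (hypothetical fields)."""
    version: str
    worker_replicas: int


@dataclass
class ClusterStatus:
    """Observed state, as a controller would find it (hypothetical fields)."""
    version: Optional[str] = None
    worker_replicas: int = 0


def reconcile(spec: ClusterSpec, status: ClusterStatus) -> List[str]:
    """Return the actions needed to move the observed state to the desired one."""
    actions = []
    if status.version is None:
        actions.append(f"install {spec.version}")
    elif status.version != spec.version:
        actions.append(f"upgrade {status.version} -> {spec.version}")
    diff = spec.worker_replicas - status.worker_replicas
    if diff > 0:
        actions.append(f"add {diff} worker(s)")
    elif diff < 0:
        actions.append(f"remove {-diff} worker(s)")
    return actions
```

With an imperative tool the user picks which of these actions to run and when; in the declarative model the user only edits the spec, and a loop like `reconcile` works out the rest each time it runs.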

11:25 So, you know, the idea with Cluster API was specifically not being overly opinionated in those areas so that it can be used in a more general purpose fashion and, ideally, would gain adoption from those more kind of opinionated tools. Okay. Right. I'm excited. I'm definitely interested to see how far I can get along here. So what we're gonna do now is we're gonna just use the documentation. I'm gonna do my best to walk through these steps. I have your wonderful guidance as I go. What I would suggest to anyone who's watching, if you have questions, you can

12:02 drop them into YouTube, Twitter, or Twitch, and we will do our best to answer them as we go. So let's pop open my screen. We are gonna deploy on Packet today. There is a Packet provider for the Cluster API. I always see that shortened to CAPI. Do we call it CAPI? So it's a challenge because if we, you know, call out the long name, it's really a mouthful, and we say it a lot of times, as we generally do in meetings. So we generally like to abbreviate it to, you know, a shorter way to say things

12:20 Do we call it CAPI? 🤣

12:41 when we're, you know, among folks that are commonly working on it. But, you know, it's really about the context. I try not to abbreviate it if I'm not speaking to folks that I know already understand the abbreviations. That's a good rule. I like that rule. Alright. So we have here the Cluster API book. But I think we've covered the why of Cluster API. So let's go straight to the quick start. Alright. So the only things I need to get started are kubectl and some version of Docker and Kubernetes. So I am prepared. And, hopefully,

13:00 Checking dependencies (clusterctl, Docker and kubectl) and version constraints

13:32 we have kubectl 1.18 on the client side, and 1.16 on the server. And, trust me, they ship with the beta: 1.16.6-beta.0. So that is okay. Right? You're happy with that version requirement? Yes. Awesome. And the only reason why we have a requirement on 1.16 is because we're specifically using CRD-based resources. If, you know, it wasn't for that, you know, there wouldn't be a real minimum version. Yeah. Okay. Good to know. Next, we need clusterctl, "cluster control". I was gonna say "cluster CTL", but then I say "kube control", and I am

14:14 confusing myself. So "cluster control", I'm gonna call it. I've never committed to that. So I need to just confirm that that works, and I'm good. And are there any minimum requirements I need for this? Or with 0.3.8, am I good to go? That should be good to go. The only thing with the specific version is it might affect which, you know, defaults it uses. But 0.3.8, I believe, is the latest release. Okay. Is my text large enough on that screen? Do you think it should be bigger? It probably wouldn't hurt to make it a

14:51 little bit bigger. Alright. Let's go with that. So task one, initialize a management cluster. Oh, and we have instructions for Packet. I wonder if that's a real API token or someone just made that up. If I remember correctly, it was just a random string that was added for the purpose of the documentation. Okay. Good. Good call on making the text bigger. We've already had a thank you from Lewis. Thank you, Lewis. Right. So now because this is a livestream, and I don't want to expose any keys, I did this before, but let's do it again.

15:00 Preparing our management cluster with the Packet provider

15:52 And I'm going to run the init command. So I'm assuming that when I run clusterctl init --infrastructure packet... So does that mean the providers are all baked in to clusterctl, or is that gonna download some binaries behind the scenes? What's happening when I run this command? So by default, what it's going to do is it's going to reach out on the Internet to GitHub releases for the various related providers. So there's more than one type of provider in Cluster API. We have the infrastructure provider, which in this case is Packet. It will reach out to the

16:34 cluster-api-provider-packet repo, inspect the releases, and look for the latest version of the Packet provider that is compatible with this version of clusterctl ("cluster cuddle", "cluster control", whatever we wanna call it). And it will also do the same for the core Cluster API components and the bootstrap provider and the control plane provider, which, if you don't specify in this case, will default to kubeadm. Alright. Okay. So this is gonna use kubeadm under the hood. Just because I'm a curious individual, I'm gonna run a get pods on all namespaces. And just so I can see what's happening

17:18 on this cluster when we have the management, or control plane... Management cluster. Yeah. Let's do that. Oh, it says installing cert-manager as well. It is installing cert-manager, and the main purpose there is because each of the different providers is actually a set of Kubernetes controllers that also have webhooks for providing validation, defaulting, and, in some cases, API version conversions. Cert-manager automates the certificate management for those components. I got it. Okay. Cool. So that was pretty quick. That was much quicker than I was expecting. That's good. So I'm gonna have to... okay. So do we have our

18:15 own namespace for everything we just created there? There will be a few different namespaces depending on the provider. Each provider is deployed in its own namespace to avoid kind of issues because we allow specifying providers that aren't known upfront. So anybody can kinda create their own Cluster API provider, and we wanted to limit the ability of, you know, any provider deployed to access data from other providers. Okay. That makes sense. So cert-manager, you've kind of covered there. We've got the CAPI webhook system, bootstrap system, control plane, and then the Packet system. So

here... One of the things that may not make a whole lot of sense to begin with is the fact that we have a separate webhook namespace. And that's mainly because at the current time, the multitenancy model for Cluster API is deploying those resources per namespace. And if you have API conversion webhooks, you can't specify on a CRD multiple webhooks or conversions to be able to work with that kind of namespace multitenancy model. So all of the webhooks exist in one namespace and have no RBAC defined. So they can only operate on the input that they're given and

19:45 provide the expected output. They can't actually query additional resources. We are working for the future to build in kind of native multitenancy, which will then move the webhooks into, you know, the same namespace as the providers. So is that multitenancy... say I have two different teams deploying onto Packet. Would they be able to deploy two different Packet providers to the cluster? Or... Yeah. So there's, you know, there's the use case of separating, you know, different users being able to create, you know, and manage different clusters. But then there's also you can provide you

19:50 Generating a config for our first Packet Kubernetes cluster

20:28 know, if you deploy two different Packet providers, you can point them at different credentials for each of the providers, and then you have, you know, the ability to use multiple accounts with Cluster API that way. Okay. Cool. Alright. Let's see what we got next then. So our install went fine, and now we want to create our first cluster. So if I run clusterctl config cluster, I will get some YAML back? Yes. So, basically, the idea of this command is that each of the providers publishes kind of a generic set of templates that can be used and

21:16 define certain variables that they expect the user to replace for kind of common kind of cluster deployments. This command basically reaches out the same way to GitHub releases, inspects the cluster templates that are published for the provider, and, you know, allows you to do the variable substitution and output that to a YAML file that you can then feed in to create the cluster. Cool. Right. Let's see what happened. Oh, unexpected argument. I'm assuming, potentially, like, a cluster name? Yes. Okay. So we will call this livestream cluster. And this is where you see the variables that are published for the provider.
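The variable substitution described here can be illustrated with a small Python sketch. It is not clusterctl's implementation; the template fragment and the variable names in it (`CLUSTER_NAME`, `FACILITY`, `PROJECT_ID`) are assumptions chosen for the example, and the real provider templates are full Kubernetes manifests.

```python
from string import Template

# Hypothetical template fragment, made up for this sketch; real provider
# templates are complete manifests published with each provider release.
TEMPLATE = (
    "clusterName: ${CLUSTER_NAME}\n"
    "facility: ${FACILITY}\n"
    "projectID: ${PROJECT_ID}\n"
)


def required_variables(template: str) -> list:
    """List every ${NAME} or $NAME placeholder found in the template."""
    names = set()
    for match in Template.pattern.finditer(template):
        name = match.group("braced") or match.group("named")
        if name:
            names.add(name)
    return sorted(names)


def render(template: str, values: dict) -> str:
    """Substitute all placeholders, refusing to render if any value is missing."""
    missing = [n for n in required_variables(template) if n not in values]
    if missing:
        raise ValueError("missing variables: " + ", ".join(missing))
    return Template(template).substitute(values)
```

Calling `render(TEMPLATE, dict(os.environ))` after `import os` would mirror the environment-variable approach; failing early with the list of missing names is the same kind of feedback the command gives when required variables are not set.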

22:21 We recently added support for setting defaults for some of these variables, but we haven't added that to the Packet provider yet. We're looking to add that shortly. Okay. So I can, according to this message, either use environment variables or a clusterctl config file. I think I like the sound of a file better. Oh, we have documentation tools. So is there an example of the file approach? I don't think so, off the top of my head. If I'm not mistaken, it's just an INI-based definition similar to what you see with the environment

23:07 variables. Alright. Well, I'm not brave enough to try and see if that works. So what I'll do is build the envvars.sh, and I'll just source it when I run. So do I need... You'll probably wanna replace that project ID with the correct project ID. I'm kinda tempted to try it, to be honest. I mean, it looks almost legit. Like... well, maybe, yeah. Almost legit. Okay. So let's get this done. That's not secret, so we can just grab that from here. Yeah. That is the same format. I think that was a legit project ID. I'm sure

23:53 it doesn't exist anymore, and no one can use it. It's just the same as the key. Am I deployed to... Likely, if you wanna be able to access this cluster afterwards, you might wanna set the SSH key too. But I know you know how to recover, you know, from that scenario anyway. Yes. We could. But I'll try... oh, yeah. So this is just gonna be a public key. Right? Okay. Facility: Amsterdam one, I'm gonna go with. The node operating system... and I'm assuming not every operating system is gonna be supported because that's the same support

24:28 as kubeadm, I would imagine. There is that bit, and there's also a bit of expectation on the Packet provider, with how we're doing the bootstrapping today, that it expects to be at least Debian, if not Ubuntu. So we probably don't wanna change that. Okay. Let's get my public key. That SSH key is actually going to be an SSH key name that's defined in Packet. Oh, of course, it will be. Right. Okay. We can delete test here. I'm sure that won't break anything. Okay. So not my public key. It's the name,

25:21 which is that. I don't need to modify the pod or service CIDRs. And depending on availability in AMS1, those may or may not be available. So let's just quickly check first. New on demand. t1.smalls. When they're listed here, I generally find that's a good indicator that they're available, so we should be okay. So I need to run this again, but first, I have to source... Kubernetes version. And if you go back to the quick start, some of the variables that are exposed can actually be specified as command line arguments to

26:28 clusterctl config itself. So if you just scroll down a little bit, you'll see the example also includes the Kubernetes version and the number of control plane machines to use as well, and the worker machine count, which defaults to zero. Okay. Well, with today being the 1.19 release date, do you think it's gonna work? I would not guarantee it. There's also the way that we're bootstrapping right now, which also has a little bit of impact on this, and some of the work that we're doing with the Packet provider is to kind of get it up to speed with some of the

27:15 other providers that are out there. So I think that, you know, the Kubernetes version that we define here doesn't actually impact the Packet provider today. Okay. Got it. Okay. So we wanna run this command again, and we're gonna specify those flags. There we go. Alright. So let's save this. Livestream cluster. Yeah. No. Because this wrote it to CAPI quick start. So I'm assuming we've not actually created the cluster yet. That's just the config. Okay. Yes. It'll be this next command, when you actually run the kubectl apply, that you'll actually

27:50 Creating the Kubernetes cluster

28:14 create the cluster. Okay. Let's do that then. So apply -f livestream cluster. Too easy. Now one of the things you can do is you can do kubectl get cluster-api, and it will actually give you all of the Cluster API resources that are defined. Oh, okay. Alright. Okay. So we have some... okay. So we got four different kubeadm configs. We got one for the control plane, one each for the worker nodes, a config template. I'm not sure what that is. We have our machine deployment. So that's... In some cases, we have

29:15 kind of template resources, and that kubeadm config template is one of those. It's attached to a machine deployment resource, and it kind of stamps out individual kubeadm configs for each of the individual machines that it creates. Okay. So this is the template, and then I can actually see here by the naming that each of the worker nodes is using that template. I mean, so I could have multiple templates to deploy different... like, if I wanted to have different node groups, node pools, whatever they're called, where I have three t1.small workers and then 12 c1.large.arms or something like that,

29:53 those would be templated machines. Right? Those would be more around the infrastructure templates. The kubeadm config templates are going to impact kind of the kubeadm config that's fed into bootstrapping. So whether it's the initial nodes and taints and annotations for a specific node or if you wanted to specify different kind of cloud provider or, you know, anything, you know, specific to the kubeadm config. Interesting. Could I, using Cluster API, have a cluster that spans more than one cloud provider? Technically, you probably could create one. I wouldn't necessarily recommend it just because, you know, the way that the cloud controller

30:44 managers work, they don't necessarily play nice around those scenarios, especially when you look at some of the behavior like deleting instances if it can't find a node resource related to it or, you know, the other way around. It'll delete a node resource if, you know, it doesn't see an instance for it. Yeah. That makes sense. Even halfway through that question when I was asking that, I realized just how bad an idea that would be. Alright. So we got our control plane machines now. So there's some vocabulary here then. Right? So we have machine deployment,

31:23 machine... a machine set? Yeah. There is a machine set. If you think about kind of some of the existing Kubernetes concepts, you know, you have a pod, which is roughly equivalent to a machine in Cluster API. And then building on top of that machine, we have machine sets, which basically manage multiple replicas of, you know, machines similarly to how a replica set manages multiple pods. And then on top of that, you have the machine deployment that kind of manages a declarative config across different replica sets, or machine sets, similar to the way that

32:07 a Kubernetes deployment manages replica sets. So I'm curious. Like, you know, could I modify the replicas on this machine set demo? Would that work? Would that spin up a new machine? It would, but I would recommend doing that on the machine deployment, not the machine set. Otherwise, the machine deployment would go back. Oh, so it's the same hierarchy then. Like, a deployment to replica set. This is a machine deployment to machine set. Right? Exactly. Okay. And because I'm curious... go ahead. And they also expose the scale subresource so you can actually use

32:48 kubectl scale to scale the machine deployments and control planes. Oh, sweet. That's pretty cool. Alright. I'm just gonna save that then. I can run kubectl get cluster-api again. And I'm gonna... oh, there we go. Oh, so these are all pending. I bet you that machine type is not gonna be available after all. So the reason why they're pending right now is because the control plane is not fully ready yet, because Cluster API doesn't currently automatically manage the CNI provider. So if you jump back to the quick

33:33 start, you'll see there's a step to deploy a CNI provider. But first, you'll actually need to retrieve the kubeconfig for the cluster. And then once you deploy the CNI provider, everything else will spin up. Alright. So I'm just getting too eager then. You tell them I just looked at it. Got it. Alright. So let's just do all the commands that it tells me to do. So I run get cluster, all namespaces. There's my livestream cluster. Cool. And then we wanna make sure the control plane is up. Replicas, updated, unavailable. So is that still maybe just provisioning behind

34:16 the scenes, or is that an error I can ignore? So one of the things we can do is we can look up the Packet machines. So you can just do the kubectl get cluster-api again, and we can just look at those Packet machines, and then that will tell us where in the process we are. Is there any way for me to use, like, the kubectl logs command to get event information, or the describe command to debug this? We do publish events. You can run kubectl get events, or

34:53 you could go into the actual logs. We try to avoid requiring users to do that just because there's a lot of verbose information that is really hard to parse. We've been trying to move more towards conditions on the individual resources to provide user facing information. Okay. That makes sense. So the phase on our control plane machine is Provisioning. So I guess we just be patient. Yep. And if I had to guess right now, the machine's probably still bootstrapping. I just like looking at things. I'm just gonna start describing things and see what's going on.
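Reading conditions instead of logs can be sketched as a small helper that pulls the not-yet-true conditions out of a resource's status. This is an illustrative sketch only: it assumes the usual Kubernetes condition shape (`type`, `status`, `reason`), and the helper name and the sample data used to exercise it are hypothetical, not output from this cluster.

```python
from typing import List


def unmet_conditions(resource: dict) -> List[str]:
    """Summarize every condition on a resource whose status is not "True"."""
    conditions = resource.get("status", {}).get("conditions", [])
    lines = []
    for cond in conditions:
        if cond.get("status") == "True":
            continue
        reason = cond.get("reason", "unknown")
        lines.append(f"{cond['type']}={cond.get('status')} ({reason})")
    return lines
```

The input would be a resource parsed from JSON, for example the output of `kubectl get ... -o json` loaded with `json.loads`. A resource with no conditions at all simply yields an empty list, which is why the phase field is still worth checking alongside it.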

35:47 So I think you're right. It's probably still bootstrapping. I mean, I trust you. Alright. So these are IP addresses. I mean, I guess, in theory, I could SSH onto this machine. I could start running ps. I'd be looking for the usual candidates. I guess there's gonna be, like, kubelet, maybe the API server process, et cetera. Or I could be patient, and we could talk about something else as well. And, basically, what we're doing when we bootstrap the machines, we're injecting a cloud-init config into the user data. And in the case of the Packet provider,

36:25 it's actually downloading all of the dependencies. So it's downloading Docker. It's downloading, you know, the Kubernetes binaries and installing those. So it takes a little bit longer than some of the other providers at this time. Yeah. We can take a look at that. So there is an endpoint we can hit, metadata.packet.net/userdata, which will actually show us that provisioning script. We got some key configuration. These are all generated by cert-manager. Right? Those ones are actually generated by the Cluster API bootstrap... well, no, the control plane provider, to generate the CAs that are used during

37:17 the provisioning process. Okay. And then we get a kubeadm configuration. It's got our cluster name, Kubernetes version. And those are actual provisioning steps then. Yep. And those are the actual steps that we're adding in for the Packet provider to kind of work around the fact that we don't have a prebaked image to use for Cluster API at this time. So is there, if not now, you know, hypothetically, whether it'd be in three or six months... this is using Docker as the runtime implementation. Is there the option to swap that out for containerd or something else?

38:07 So in general, for Cluster API, there's a Kubernetes SIGs project, image-builder, and we generally pre-bake the images for the various cloud providers. We haven't gotten Packet added to that yet. But when we do, that project already defaults to using containerd. Oh, sweet. Good. So if I wanna see how far along this is, then... Probably the easiest way would be to just inspect the cloud-init logs. So if you look at, specifically, /var/log/cloud-init-output.log, I think. Oopsie. That looks bad. Oh, you know what? Is it a good what or a bad what?

39:15 Well, I'm... So it looks like it's trying to start... kubernetes/admin.conf. Let's take a look at the other cloud-init log. Yep. Okay. So let's open that log. I'm wondering if we're hitting an issue because maybe the 1.19 release is out. I don't know if... no. That can't be the right version because that's 01/19/2011. So I just... Yeah. That's it in it. Yeah. Oh, yeah. So what you were saying when we ran the provisioning step, the YAML generation step, was that the Kubernetes version is ignored by the Packet provider for now? Mhmm. Yeah. Okay. Got it.

40:22 Well, sort of. So it's fed into the kubeadm bootstrapping config. So we can check to see that file on disk. I would expect that to be... I think it's written into a temp directory. So where should I go? Sorry. Let's go back to the user data script. Ah, yeah. Okay. So if we look at that, we're, you know, making sure the swap's off. You know, we're getting those packages. We can check to see if, you know, we actually completed any of those steps. Yep. Yeah. Docker. It looks like it got

41:11 past Docker. So we can... assuming the ping is unlikely to cause any errors. So, well, we just try to rerun that. Yep. Yeah. Badly, we're on the same page. Ah, okay. Yeah. So it was that Kubernetes version we specified that got us. Yeah. So, yeah, the kubelet is 1.18.8. The control plane is 1.17.3. So should we just tear this down and provision with 1.18.8? Would that fix it? So let's try something different. Let's go in and just modify the resources to be 1.18.8.

41:54 Okay. So there's gonna be two places. There's gonna be the kubeadm control plane, and then there's going to be the machine deployment that we have defined. Can I modify our generated YAML and reapply? Would that work? Yeah. That would work, or you can just kubectl edit the actual resources. What do you wanna do? It's your call. Edit resources? Yeah. Why not? Alright. So we're gonna modify. Should we do the control plane first or the machine deployments? The control plane is the one that's hanging up right now. So, well, yeah, let's go ahead and do that one.

42:47 Then do we have to fix the machine deployments really, really quickly, or do we have a bit of time? We'll have time. Alright. Okay. So we want to... I'm gonna run get cluster-api again. So what we want to change is the control... No. Is it the cluster? No. It's down at the bottom of the screen. There you go. This one here? Yes. Okay. The kubeadm control plane. Livestream. If I search for 1.17, not in the annotations, and we want 1.18.8. Right? Yep. That's the word. The error message we

43:44 see in the logs. So is that going to shut down this machine and reprovision a new one? So, actually, we're gonna end up in a weird situation here, specifically around the control plane. That's alright. I think we're... Well, the problem is, because we've already technically attempted to initialize the control plane, we won't actually try to reinitialize the control plane with the change. And that's basically to prevent the situation where somebody accidentally scales down their control plane to zero and then scales it back up. We didn't wanna present the situation where somebody thinks that their cluster is healthy,

44:33 but they've just wiped out all the data of their cluster. Okay. That sounds like a good thing, which is good. Yeah. So what's our next step here? So we may want to go ahead and delete this cluster and start over with that in mind. Delete. Delete. Delete. Okay. I think that's fine. So I am going to modify this, and we do have a question. So Adrian has asked, can I provide a custom cloud-init configuration? Yes. So you can definitely specify additions to the cloud-init configuration if you're using the default kubeadm control plane.

45:23 You can specify any of the, you know, kubeadm configuration itself. You can specify commands to run before the kubeadm init or join command is run, and you can specify some other kind of cloud-init-specific things around users, NTP servers, SSH keys, things like that. Cool. So I think what's happening now is it's still deleting the old one. So when I apply the new one, we're gonna have a bit of a delay. So in order to speed this up, what I'm gonna do is, if we run this again with a different name, that should still just start spinning up. It's

46:13 still waiting on the old one to delete. Exactly. Okay. But I'm now gonna have to replace 1.17.3 with 1.18.8 again. Apply. I don't know why I keep typing it in. I never did that. Alright. Okay. Now if I run get kubeadm control planes, and we're just gonna have to wait a moment or two while that machine gets provisioned and cloud-init kicks in again. There we go. Cool. I guess we can jump back to the quick start just now and assume that, besides our little version faux pas, things are gonna spin up quite nicely

47:11 for us. So next, we need to deploy a networking solution. And, yeah, basically, we're gonna have to wait until the workload cluster control plane is available because in order to deploy the CNI, you have to deploy it against the cluster itself. Yeah. That makes perfect sense. So it seems from this documentation here that I can use any CNI, and Calico here is just a default suggestion? So if you're running through the quick start with the default configuration, Calico is known to work. So if you look, the pod CIDR that we specified

47:55 when we created the cluster with the environment variables is also the same default as Calico uses. If you are using a different CNI, you may have to modify that or potentially some other kubeadm configuration to work with that CNI provider. Okay. Is that something you see that would potentially be added to clusterctl in the future, where I could specify the CNI and it tweaks the generated YAML in some way? Like, would it know? Or do you think that's out of scope for clusterctl? So it's one of the things that we're looking at kind of longer term. How do

48:33 we incorporate with, you know, different, you know, deployments that we don't necessarily want to manage the life cycle of specifically in Cluster API? So right now, there's a new feature that you can use to just specify arbitrary YAML to deploy at runtime. But longer term, we're looking at how can we align with projects like the Cluster Addons project and use something like that for managing the life cycle of things like CNI. But then even longer term, how can we leverage projects like that to expose some of these things that may need to be modified

49:17 or that we may need to modify on the Cluster API side to help enable some of this stuff. So right now, you need to manage that stuff externally, but we're looking longer term, you know, how can we investigate ways to kind of automate some of that complexity in the future. And we don't necessarily have the answers for how that's gonna happen yet, but it is something that we want to help solve for users. Do you see this as something that would potentially package up into, like, a Helm chart or use Jsonnet or anything

49:51 like that to distribute prebaked configurations? Potentially, with the kind of ClusterResourceSet feature that's there for deploying arbitrary YAML, if that exposed specific, you know, environment variables, you know, that we can use for kind of replacement in similar ways that we're using on the cluster template, you know, today, you can do something similar to that right now. But you would kinda have to predict, you know, all of the various permutations with that right now. So maybe. I don't know how that integration's gonna look. But if anybody listening has ideas, please come and present

50:45 them. Okay. Cool. Let's keep an eye on that space then. So I'm not worried. I'm sure this is gonna be available shortly. But should we jump onto this machine and check cloud-init, just to... Yeah. Prove that it worked. Right? Like I said, I'm not worried. Mhmm. Let's see. Okay. So it's at the Docker step. kubeadm is running. That's a good sign. Yep. So we can probably tail that cloud-init output. We have a kubelet. Yeah. Yeah. We can go tail that log. Or I can just keep correcting until I

51:41 see more and more kubes in front of me. Okay. Let's do it the proper way. So right now, it's deploying a static manifest. That's awesome. Our control plane is happy. Yep. That kubeadm join command with the public IP address and token should probably disappear. It's a good thing this is a short-lived cluster. Right? Why can I not type? Did my machine disappear? Close. Not sure if I lost my connection or not, but I'll just ignore it. I'm going to assume, if I run get kubeadm control planes... It was not ready yet. At this point, it should

52:46 be ready enough that if you retrieve that kubeconfig secret, you can query that control plane and see the status. So is there a convenient way of getting that, or do I SSH back into that machine? So if you look back at the quick start, there's a... Oh. Set of commands in there. Right down there, the... oh, I forgot. We added a friendly command to retrieve the kubeconfig with clusterctl. So we can give that a shot. Oh, wait. No. That version hasn't released yet, so let's run the workaround. Ah, yes. 0.3.9. Okay.
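
The workaround mentioned here boils down to reading a secret from the management cluster and base64-decoding one field. A minimal sketch of the decoding step follows. It assumes the secret has already been fetched as JSON (for example with kubectl get secret and JSON output), and that the kubeconfig is stored under the `value` key of the secret's data, which is the layout the quick start workaround relies on. The function name is hypothetical.

```python
import base64
import json


def kubeconfig_from_secret(secret_json: str) -> str:
    """Decode the kubeconfig stored in a Cluster API kubeconfig secret.

    Assumption: the kubeconfig lives under data.value, base64-encoded,
    as with any Kubernetes Secret data field.
    """
    secret = json.loads(secret_json)
    return base64.b64decode(secret["data"]["value"]).decode("utf-8")


if __name__ == "__main__":
    # A made-up secret for illustration only; a real one comes from the
    # management cluster.
    fake_kubeconfig = "apiVersion: v1\nkind: Config\n"
    fake_secret = json.dumps(
        {"data": {"value": base64.b64encode(fake_kubeconfig.encode()).decode()}}
    )
    print(kubeconfig_from_secret(fake_secret))
```

The decoded text is what gets written to a file, and the path to that file is what the KUBECONFIG environment variable expects.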

53:38 Oh. Correct. I need to... That command and the warning there should be... Oh, alright. Okay. Okay. So this is the painful way. That's the nice way that just doesn't exist yet. Got it. Yep. Painful is okay. Painful. So I need to rename this to livestream. Let's see if it config admin. Was it livestream two for that cluster? It was. I just did not... and that looks good. So if I do KUBECONFIG equals... do I have to cat it or is it a path? I can never remember. I think cat is the safe way. And I think it's... I don't think there's

54:34 an underscore in there, if I'm not mistaken. Get all. Yeah. I found it. Okay. So... Oh, yeah. It needs the path there. Sorry. Long day. I say it. 10:56 in the morning. Right? And at this point, you can... I was gonna say, at this point, you can use that kubeconfig to deploy the CNI, and then we should see everything become ready and everything else spin up? Got it. Okay. So we're not gonna rock the boat too much here, and we're oh, that's that's keep coming right in front of me. Okay. So let's do this. I'll just keep doing it with

55:30 the environment variables since it works. So I guess that's pulling images right now for the things that it needs. And if you go back to the management cluster and run get cluster-api, you should see some of those additional resources start spinning up. We have... oh, that's the old control plane or the new control plane. We have an initialized control plane. Good. Is our provisioning. I guess we just give that another moment. Yeah. It might take a little bit for it to realize that it's... Yep. Alright. So our workers are very much on their way. That's good.

56:47 So let's step back. Is that it? Does it work? Yeah. I mean, now you can... I would probably wait until everything finishes deploying. But then at that point, you can go ahead and deploy any types of workloads you need to the workload cluster. Awesome. K. So let's run get cluster-api once more. We've got lots of ready machines. Our control plane now says ready. I'm assuming this is... If you see, the control plane's also on its way to scaling up to three replicas as well, so you've got an HA control

57:32 plane as well. Yep. Okay. It's getting interesting. Not that it wasn't interesting, of course. Does it configure etcd backups for me as well? That would be pretty sweet. It does not. You know, we had talked about it, but even if you do etcd backups, it doesn't necessarily guarantee the state of your, you know, cluster being in a good state, especially once you start throwing in persistent volumes and things like that. So we generally recommend folks use some type of Kubernetes-specific backup utility against their workload clusters if there's data that they care about, whether it's

58:19 something like Velero or, you know, anything else in that space. Okay. I don't know whether I should just be patient or I should just keep poking at things. But... And it's just gonna take a little bit of time because we are dealing with, you know, bare metal servers that take a little bit more time to come up. And then the way the Packet provider is doing the bootstrapping, you know, downloading all the binaries on the fly takes a little bit longer. Yeah. I mean, we have seen that our control plane is scaling up, and we have got worker machines

59:02 being deployed. That's pretty awesome. So let's go back to this cluster. When I run get all, I was expecting to see some sort of kube-proxy, Calico pods. Because I've not got all namespaces. Right? Silly me. So we do have... that's cool. Okay. Perfect. You could also run some, like, kubectl get nodes on there. And, generally, if things are in the process of bootstrapping, you can see, you know, part of the progress there as well. So it does say that my worker nodes are ready. So I guess I can deploy something. Yeah. Let's go with nginx.

1:00:33 Resource on it. That should be okay. So I can do... oh, I need my special config. I Internet. If I get pods. Let's take a watch on... oh, we'll find a watch if it's already running. So I see we do have a question out there too. Ah, cool. Oh, wait. Pocket fence. That's how I learn. I mean, I just type as many describe commands and edit commands as I can until I know what's going on. So okay. So we have a question. Does Cluster API orchestrate Kubernetes upgrades? And yes. Yes. It does. That's one of the things that we introduced

1:01:41 with v1alpha3 with the kubeadm control plane. We've always been able to orchestrate the upgrades of machine deployments because we have the declarative config there. You can't necessarily do this in a very good way with the Packet provider because, as you've seen, the way that we kinda bootstrap causes some issues with it. But assuming that wasn't an issue, you can basically just update the version on the machine deployment, and it would orchestrate the rolling update of those worker nodes. But with v1alpha3, when we introduced the kubeadm control plane,

1:02:19 that also provides the same mechanism there. Is it an in-place upgrade, or is it terminate and then reprovision on new machines? It does not actually do an in-place upgrade. We wanted to avoid some of the complexity there. So what it does is it will spin up a new instance. It will run kubeadm join to join the existing control plane instances, and then it will remove one of the existing ones, and it will continue on that until it's made it through all the replicas. So in a bare metal configuration... sorry. So is that upgrading the control plane

1:03:01 or just the worker nodes? Both. With the default configuration, assuming you upgrade the definition on both the machine deployment and the Kubernetes control plane, it's the same process. It will surge up for the replacement and then, you know, remove instances along the way. So I'm assuming it's waiting till etcd is replicated there to the new nodes and then strips them back down. Okay. Correct. Yep. Yeah. We wait until etcd is in a healthy state. We also verify that the static pod manifests for the controller manager, the scheduler, and the API server

1:03:46 are in a good state before we kind of continue in the process. Yeah. You're pretty much removing all the pain from every Kubernetes upgrade I've ever done in my entire life. So this is exciting for me. Now I did try to do a port forward to my nginx pod. I'm not sure why that wouldn't work. Alright. It's not that important. I can... oh, I'm getting pod not found. Well, it's to be like, it was then expose it over a service or just a service prefix. It's just me being, you know, a bit of an idiot.
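
The upgrade flow described above (spin up a replacement, let it join, then remove an old instance, and repeat) can be sketched as a small simulation. This is an illustration only, not the controller's code: machine names and the function are hypothetical, and the etcd and static pod health checks that the real process waits on are reduced to a comment.

```python
from typing import List


def rolling_replace(machines: List[str], new_version: str) -> List[List[str]]:
    """Return the membership after each step of a surge-then-remove rollout.

    Machines are written as 'name@version'. One replacement is added before
    one outdated machine is removed, so the replica count never drops below
    the starting size.
    """
    current = list(machines)
    history = [list(current)]
    outdated = [m for m in current if not m.endswith("@" + new_version)]
    for index, old in enumerate(outdated):
        replacement = f"cp-new-{index}@{new_version}"
        current.append(replacement)  # surge: the new machine joins first
        history.append(list(current))
        # A real rollout would wait here until etcd and the control plane
        # components report healthy before continuing.
        current.remove(old)  # only then is one old machine removed
        history.append(list(current))
    return history


if __name__ == "__main__":
    steps = rolling_replace(["cp-0@1.17.3", "cp-1@1.17.3", "cp-2@1.17.3"], "1.18.8")
    for step in steps:
        print(len(step), step)
```

Running it shows the replica count alternating between the original size and one above it, which is the surge behaviour described in the conversation.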

1:04:36 That's okay. Awesome. That was amazing. Thank you very much. Thank you. I now know what the Cluster API does, and I've now spun up my own Kubernetes cluster on Packet. That is pretty sweet. And then you can start getting into some, you know, fun down the line. You can throw in things like the cluster autoscaler into the mix and start automating scaling of these clusters as well. So I can run the cluster autoscaler to add new nodes when, what, CPU and memory get above a certain threshold, etcetera? By default, with the kind of

1:05:24 Cluster API integration for Cluster Autoscaler, what it'll do is it'll look at the scheduling constraints of the workload cluster. So if you have any pods that are pending, it'll try to determine how many resources it needs to spin up for them, so it'll scale up your machine deployments appropriately for you. And then if you start ending up with too many excess resources, it'll start scaling those resources back down. Okay. Okay. Well, I don't wanna take up too much more of your time by installing that. That's not deployed by default. Right? That's an add-on that I would then

1:06:07 add to my cluster. Okay. Then I think you've just, you know, verbally agreed to do another session with me when we dig into some of the extra features of the Cluster API and the autoscaler. So, you know, thanks for that. Yeah. We can definitely do that. We'll need to wait till some of the changes that are currently in flight land, because right now, you would need kind of like a self-hosted Cluster API cluster to make it work. So, you know, once those changes land, then you'll be able to run, you know, multiple autoscalers

1:06:40 and manage multiple workload clusters, you know, within a single management cluster. Very, very cool. Alright. Is there anything else you would like to share with people before we finish up for this afternoon, or the morning for yourself? Not that I can think of. It's been a pleasure to work through this with you today. No. Thank you very much for joining me. It was really interesting to walk through that and just see how it all works and, you know, be able to ask those questions to someone who's really close to this is just, you know, invaluable. So, you know,

1:07:10 thank you again. I'm looking forward to future updates, and I can't wait to see where this goes. So again, thank you for joining me. Have a nice day, and I will speak to you soon. Adios.

Meet the Cast

David Flanagan

@rawkode

Jason DeTiberus

@detiber

Documentation

Cluster API Book Quick Start

Code

Cluster API Provider Packet repository

Kubernetes SIGs Image Builder

Cluster Addons project

Cluster Autoscaler
