Overview

About this video

What You'll Learn

  1. Provision tenant control planes as pods inside a management cluster.
  2. Expose each tenant cluster through a load balancer endpoint and kubeconfig.
  3. Join AWS worker nodes with kubeadm, then bridge connectivity with an agent.

Dario from Clastix walks through Kamaji, which runs tenant Kubernetes control planes as pods inside a management cluster, then provisions a tenant control plane and joins AWS worker nodes via kubeadm, following kubeadm and Cluster API conventions.

Chapters

  1. 0:00 <Untitled Chapter 1>
  2. 2:44 Introduction
  3. 3:53 Guest Introduction and Background
  4. 5:23 The Problem: Kubernetes Cluster Sprawl & Multi-tenancy
  5. 7:12 What is Kamaji and Why It Was Built
  6. 13:43 Challenges of Multi-Cluster Management (Slides Start)
  7. 18:10 Kubeception: Managing Kubernetes with Kubernetes
  8. 19:39 Kubernetes in Docker
  9. 20:30 Technical Foundation: Etcd Namespaces
  10. 20:36 The Pillars of Kamaji
  11. 22:38 Architecture
  12. 22:44 Kamaji Architecture Explained
  13. 23:24 Installing the CRDs
  14. 25:40 Demo Preparation & Etcd Question
  15. 27:43 Installing Kamaji
  16. 30:22 Deploy Manifest
  17. 33:04 Network Profile
  18. 33:35 Add-Ons
  19. 35:18 Provisioning the Tenant Control Plane
  20. 36:45 Accessing the Tenant Cluster
  21. 38:56 Setting up Worker Nodes on AWS
  22. 42:20 Control Plane Details & Design Discussion
  23. 43:27 Containers
  24. 44:31 Multiple Deployments
  25. 45:25 Checking Worker Node Setup
  26. 46:21 Joining Worker Nodes to the Tenant Cluster
  27. 48:59 Worker Node Becomes Ready
  28. 49:18 Recap
  29. 50:47 The Connectivity Challenge (Control Plane to Worker)
  30. 52:50 Solving Connectivity with the Add-on
  31. 54:11 Deploying Connectivity Components
  32. 56:10 Connectivity Established
  33. 57:29 Demonstrating Control Plane to Worker Communication
  34. 58:14 Demonstrating Advanced Features (Capsule Installation)
  35. 1:00:03 Upgrading the Tenant Control Plane
  36. 1:02:25 Roadmap and Future Development
  37. 1:02:30 Roadmap
  38. 1:09:25 Q&A: Other Kubernetes Distributions
  39. 1:10:57 Conclusion and Edge/IoT Discussion
Transcript

Generated from the English captions. Timestamps jump the player to that moment.

2:44 Introduction

2:44 Hello, and welcome back to the Rawkode Academy. I am your host, David Flanagan. Although, hopefully, you know me from across the Internet as Rawkode. Today, we're taking a look at another open source project. This one is Kamaji. It's going to simplify your Kubernetes control plane life. We're gonna dive into that in much more detail in just a moment. But first, I wanna introduce our guest who's gonna guide us through the Kamaji journey, Dario. Hello, Dario. How are you? Hey. Everything is fine. Hope the same for you. Nice to meet you all. Thank you very much. Yeah. I'm doing good.

3:21 I'm always excited to take a look at a new open source tool, especially within the Kubernetes space. I like playing with Kubernetes, and I like my Kubernetes to be easier. And I'm pretty sure you're gonna show me some really cool things today to make my life happier and better. And if I get more sleep, I'm a happy man. And we've already had a comment in the chat from engine, who is very curious about the claim "with a fraction of the operational burden". So, yes, we are going to make your Kubernetes lives easier today. But before we do

3:53 Guest Introduction and Background

3:53 that, can you please tell us a little bit about yourself? Sure. Sure. Absolutely. So my name is Dario. The surname is, but I know that it's really hard, especially from abroad, to pronounce it the correct Italian way. And I'm based in the north of Italy near Turin, and I'm a software engineer. I fell in love with containers a sizable number of years ago. Honestly, I don't remember how many. And I was a developer, a web developer, and I faced the issue of scaling my applications across the cloud and to infinity

4:28 and beyond. And I fell in love with Docker and then OpenShift. I also used Mesos and DC/OS, and it was a really tough job. And then I started working on Kubernetes. I started working at a cloud provider, and I was responsible for the site reliability engineering stuff. And I started developing operators and controllers. And I had the time to study the code base, especially trying to combine my software development skills with the systems side. And starting from 2020, I joined Clastix as a technical adviser, and I'm doing the technical stuff. So I write software.

5:16 I do demos and so on and so forth, and it's really fun. I'm enjoying this job. Awesome. So this is the first Kubernetes open source project from Clastix. Right? You also help maintain Capsule, I believe? Yeah. Yeah. It's the second one. We started with Capsule because we noticed that there is a gap in the Kubernetes ecosystem, especially regarding multi-tenancy. So instead of spawning several Kubernetes clusters, you can use just a single one, and you can isolate the namespaces into a tenant abstraction. So the tenant abstraction is pretty common because when you've got multiple namespaces, you're ending up

5:23 The Problem: Kubernetes Cluster Sprawl & Multi-tenancy

5:56 grouping them into this abstraction. It's a layer that is a collection of namespaces. But sometimes you have to spin up multiple clusters for various reasons, maybe because you would like to avoid the noisy neighbor effect altogether, or you would like to use different API versions, different Kubernetes clusters, and so forth. So we started thinking about how to address the real, hard multi-tenancy, because Capsule is addressing the soft one. So just namespaces. But I would like to bring my own real nodes and have a real control plane that I can use. So I can

6:32 toggle dynamic admission controllers. I can fine-tune parameters and so on and so forth. And this drove us to draft Kamaji, which I'm going to present today. Yeah. You seem to be quite fond of solving the hard problems with Kubernetes, like multitenancy and control plane commoditization. Like... Yeah. Yeah. I guess it's as if you've just woken up and thought, how can I make my life a little harder today, but at the same time, help everyone else? So I really appreciate that you are working on these hard problems because multitenancy is hard and yeah. But it's good that there's

7:07 much more sophisticated and better tooling coming out to kinda help people with that. Yeah. Totally. Totally agree. So with that being said, you kinda touched on what Kamaji is there, but do you wanna give us a slightly longer description? What is Kamaji for, and what problems does it solve for people? Well, you know, I started using Kubernetes in 2016. And at the time, I don't remember exactly, but I'm pretty sure that there was a huge bash script that was used to install a Kubernetes cluster. It was create cluster or something like that,

7:12 What is Kamaji and Why It Was Built

7:43 and it was a pain to manage that. I remember that all the people that I was meeting in Italy were using Kubernetes, but through managed services like GKE, so by Google Cloud Platform, or by Azure. So it was really hard to set up a Kubernetes cluster, but the community addressed that problem and started developing nice projects. And the most satisfying one is obviously kubeadm, because with kubeadm, we can configure the worker nodes and the control plane. So we've got all the building blocks to create a full-fledged installation of a Kubernetes cluster.

8:26 And after that, we ended up also with Cluster API, and it's really great because in the end, it's a sort of... it's not a wrapper. Sorry if I'm saying something that is not entirely correct, but it's wrapping around kubeadm, although you can specify different bootstrap options. But with Cluster API, you're ending up spawning multiple clusters. And with that said, I noticed that a lot of people, instead of trying to solve the multi-tenancy issue in the Kubernetes ecosystem, start spawning multiple clusters. The first one, the second one, the third one, and you're ending up with dozens

9:04 or hundreds or thousands of clusters according to the dimension of your company and so on and so forth. And I've been there because I was working for this cloud provider. And I remember that the size was a bit impressive to me at that time because I was managing five production clusters on bare metal, and the overall amount was 2,500 nodes, virtual machines, but on bare metal. So you can understand that I was getting paged during the night, and I didn't enjoy that job so much, honestly, because it was really cumbersome to manage. You know, we are

9:45 ending up saying, okay. You can manage Kubernetes at scale. You can have multiple clusters, and you can do everything perfectly, but you're obviously ending up with some side effects. So although you're trying to automate everything, you are ending up with some alerts or some tasks to perform on a schedule. So rotate certificates or eventually controlling the notice of duration and so on and so forth. And at that time, I was developing a lot of operators, a lot of custom controllers, and I ended up saying, we've got operators. Operators are translating the human knowledge,

10:25 the human operator's knowledge, into code. And in the end, I'm doing a lot of repetitive tasks, taking care of a Kubernetes cluster. So why not create an operator that takes care of all these activities? I was thinking about this project for a long time, then I joined Clastix. I was on the same page as Adriano, my business partner. And with that said, we decided to start developing Kamaji. And right now, we've got this... it's not a proof of concept because it's working, and we've got customers that are already using it. We are really happy about that.

11:05 And, yeah, that's the reason why I'm here. I started with multitenancy, and now I'm doing deep Kubernetes stuff, the Kube inception. You know? Nice. Yeah. I mean, you said that, you know, you've done bare metal Kubernetes, and that's something that I'm really passionate about as well. I like to run my Kubernetes as close to the metal as possible, but I think the pattern that I've seen, and I'm sure you've seen it too, and we're seeing loads of other projects follow it, is that for the worker pool, yes, you want to be running on

11:37 the metal. But the control plane, you probably wanna virtualize it. Like, control planes aren't really doing a lot of heavy lifting. They're really just writing to etcd and answering some API requests. And to run them on metal is usually a waste of CPU and memory. And plus, we need to be able to handle the upgrade path for these control plane nodes in a pretty safe manner, and bare metal is very, very, very slow to reboot. So that brings a whole bunch of other challenges to it. And I think this is why we're seeing this virtualization of the control plane become

12:13 a more popular pattern. An interesting one I've seen recently was, you know, Weaveworks pushing Firecracker for the control plane, but then bare metal worker nodes. And I think Kamaji really comes into this space as well, and it's that, yeah, like, we can virtualize this and then you bring your own worker pool, bare metal or not. And I think this pattern will become even more prominent as time kinda progresses. Yeah. Yeah. Absolutely. Anyway, I would like also to thank you because while we were talking about Kamaji, I told you about the Kube inception, and

12:46 you used this term, the Kubeception. And I tried to create a set of slides about this topic of the Kubeception. I hope that you enjoy my memes because I really love crafting memes about Kubernetes. Yeah. I thought that Kubeception was a really good kind of wordplay to capture what we're gonna be looking at today. I really wanted that to be there. And in fact, we have a comment from Russell in the chat as well. So is this Kubernetes inception, many Kubernetes clusters inside the one Kubernetes cluster? Kind of. I think we'll see in just

13:23 a minute. I'm gonna let Adi present some slides, and we'll take it from there. Engine does ask a question as well. I don't know if you wanna tackle it now or wait till after your slides, but he says, is this similar to SAP Gardener in a way to handle the control plane? Yeah. It's pretty similar, but we will show you. I don't want to spoil the plot. Alright. Well, you are sharing your screen, so I'm gonna pop over there. Feel free to take it away. And if I have any questions, I'll throw them at you as we go.

13:43 Challenges of Multi-Cluster Management (Slides Start)

13:51 Sure. Sure. Great. Thank you so much. So welcome here, and I would like to present Kamaji to you. And as David said, this is the Kubeception because we are doing something really weird, at least according to the majority of people. But I would like to start from the beginning because I've been a developer. I've been on the other side, and I was one of the people saying, I would like to use Kubernetes, but I don't want to manage it. And right now, in 2022, a lot of people are using Kubernetes. I'd say most of the companies around the world are

14:28 using Kubernetes because it's used to create your software as a service solutions. It's used by AI, machine learning, or analytics applications. And also on the edge, next to the Internet of Things. And as I said, I've been there. I can say that shamelessly. Sometimes we feel the burden of the complexity of Kubernetes. So we would like to offload everything to managed providers, or if we can, also to the infrastructure team if we have an organization like that. So why do we need that? Because in the end, we are developers. We would like to focus on the applications, and we don't want

15:12 to manage a Kubernetes cluster. I don't know. Maybe there are people around the world who are excited to get paged during the night. I wasn't, so I would like to get something that solves all the problems. So this is the reason why I started developing Kamaji. And what are the challenges? Because with Kamaji, we are aiming to build a sort of building block. It's not a product. I'd say it's just one of the tools to create multiple Kubernetes clusters. And there is a huge lack of tools, especially in open source. We know that there are some players that

15:53 are already providing multicluster management. For example, there is OCP, OpenShift Container Platform, by Red Hat. There is also SUSE, which acquired Rancher a few years ago. We've also got VMware Tanzu. And there's an absolutely great developer experience on top of that. It's absolutely mesmerizing because I also worked with OpenShift Container Platform. And for developers, it's great. But there are some downsides, I'd say, because all these tools are designed for enterprises. So it means that you have to stick to their way of doing stuff. And besides that, there is also a high total cost of ownership. You've got licenses. You

16:37 got operations. It's not straightforward. I'd say it's not real Kubernetes because in the end, you are using Rancher. You're using Tanzu. You're using OpenShift, and all these solutions are built on top of Kubernetes. Although I'm pretty sure that the API server is the same, I hope that you're getting the point. It's not pure Kubernetes. It's an opinionated way of managing Kubernetes. And last but not least, all these solutions push you to manage your clusters by spinning up multiple clusters. So you're ending up, or you could end up, in the cluster sprawl. And what's the cluster sprawl? It's a nice

17:20 term that has been borrowed from urban sprawl, the sprawl of neighborhoods when a city is growing a lot. And this is the first reference to Inception because this is the cluster sprawl because we are in the gap with a lot of clusters, a lot of skyscrapers, a lot of houses, and so on and so forth. So it's really hard to maintain. I've been there, as I told you before, with just a few clusters, but it was really cumbersome. And you can imagine what it could be like in huge organizations where you have to manage and take care

18:04 of multiple clusters. Anyway, there is another meme. And why did we start this project? Because I love complex stuff, to be honest, especially when you have to write code for complex stuff. And I remember the first time that I saw Docker in Docker, used by GitLab. And it was really interesting because you're ending up with Docker inside a Docker container, and you can do a lot of stuff. Essentially, GitLab pipelines are built on top of that. And then suddenly, I discovered operators, which are really great because in the end,

18:10 Kubeception: Managing Kubernetes with Kubernetes

18:44 you are offloading all the silly tasks to software, to a binary. This is running inside the cluster. And it's a binary. It's software. It doesn't have bugs. Well, I'm just joking. But, you know, you can do whatever you want and focus on your infrastructure or your code and so on and so forth. And then I also saw, I'd say at the same time, the rise of databases on Kubernetes. And I remember that a lot of people were saying, no, no, no. We cannot use databases on Kubernetes. It's too complicated. We cannot,

19:20 we can't do that. And in the end, we saw a lot of vendors starting to develop operators and offering database as a service backed by Kubernetes itself. So it means that it could be done. And lastly, I ended up also with kind, which is Kubernetes in Docker. So it's inception inside inception. So we are ending up with a Docker container that is a Kubernetes cluster that you can use for testing, for CI, for development. For example, I'm using kind a lot for development. So it's 2022, and we can do that. But while I

19:39 Kubernetes in Docker

19:57 was thinking about that, I started asking myself, can I really do that? Because we know that in the end, a Kubernetes cluster is pretty complex. I'd say we've got a lot of components, a lot of business logic. I mean, it's not like a classic web page where you've got the database, you've got the front end, you've got the back end, end of the story. But with Kubernetes, obviously, we've got a lot of components involved. We've got the certificate authority. We've got certificate management. We've got the back end storage, and so on and so forth. So I started asking myself, how can I do

20:30 Technical Foundation: Etcd Namespaces

20:32 that? And I started trying to define the pillars of Kamaji because our idea is saying, okay, we've got applications running in Kubernetes that have a back end storage. In the end, Kubernetes is made of several components. I'd say the first part is the stateless one, because we've got the API server that is connecting to etcd. Then we've got the controller manager and the scheduler, which connect to the API server, each component using its own kubeconfig. So the API server is absolutely stateless. All the state is stored in etcd. So the idea was to say,

20:36 The Pillars of Kamaji

21:13 let's try to move the multitenancy from the infrastructure level, so from the virtual machines, to the containers, as we are doing with the applications. But multi-tenancy must also be moved into the data store. So you all know that etcd is the key-value store that we are using for Kubernetes. And starting from version 3.5, a new feature has been introduced that is named etcd namespaces, and, too long; didn't read, it's a sort of database schema. So in MySQL or PostgreSQL or whatever database you would like to use, you've got the schemas,

21:53 and essentially it's the same stuff. So in etcd, you know, to be honest, you're ending up just with a prefix on the keys, but you can use certificates, users, and roles to segregate this data, and you're ending up with namespaces like the ones in Kubernetes. And after that, as I told you all before, it's Kubeception because in the end, we are using Kubernetes to manage Kubernetes itself. So Kamaji is orchestrating the pods that will be the virtual control plane, and it's also going to manage the shared data store. But I guess that also a picture is

22:35 worth more than a thousand words. So here it is, the architecture, and I'll try to describe it in the best way possible. So Kamaji needs a Kubernetes cluster, because otherwise that would be a chicken-and-egg problem. Because we have to manage multiple clusters, but we need a cluster. So the cluster must be created in advance. It means that we need an admin cluster, something that will be named a management cluster. And this could be any Kubernetes compliant cluster. It could be whatever version you like. For our demo, we are going to use

22:44 Kamaji Architecture Explained

23:13 1.21 or 1.22, I don't remember, for this cluster. And inside this cluster, we are also going to install Kamaji through a Helm chart, and this Helm chart is installing the CRDs. We've got the Tenant Control Plane CRD that will spawn these pods that are the virtual control planes. But how are these pods connecting to etcd? Obviously, we've got a multi-tenant etcd that, for the sake of simplicity, has been deployed inside the admin cluster, but it could also be outside of your cluster. It's not a prerequisite. It's up to

23:24 Installing the CRDs

23:54 you to manage etcd. We developed the Helm chart, and the Helm chart is taking care of setting up etcd, enabling the feature for the multi-tenancy, generating the certificates, and so on and so forth. So with that said, when we are creating a tenant control plane using Kamaji, we are ending up with a pod. And this pod is exposed using a NodePort, an Ingress, or a LoadBalancer service. And obviously, those pods are a cluster, but they are missing the nodes. Because Kamaji is not using the nodes of the admin cluster, but is exposing

24:35 all the, I would say, endpoints, all the information required to bootstrap nodes that join that cluster. So in the end, we are ending up with the definition of the tenant node pools. So it means that these are worker nodes, and they could also be containers. They could be Firecracker virtual machines. They could be virtual machines or bare metal instances. They just need to join the cluster using kubeadm. And another meme, because I'm really in love with memes, because you can understand there are so many components that are moving together. Because

25:15 when we are dealing with the API server, we need certificates, we need the kubeconfig, we need the config map with the kubeadm configuration. There is also a spoiler alert down there. We've also got connectivity. So Kamaji is trying to simplify the management of all these silly tasks in an automated way. So I'd say that the next step is starting the demo time. And, obviously, I have to summon the demo gods because I tried the demo several times. It's working, but, you know, there's also Murphy's Law, so fingers crossed. Yeah. Now that we're all looking at a

25:40 Demo Preparation & Etcd Question

25:56 whole different ballgame. But we do have a question in the chat if you're happy to answer that just before you jump into the demo. Sure. Sure. So Russell says, is the admin cluster etcd also the etcd used by each tenant cluster, or do we need separate ones? Yeah. I'd say that, you know, it really depends. But, honestly, I would avoid the single point of failure because in the end, what happens if you break etcd using Kamaji or something like that? So you can do that. You can use the same etcd, but honestly, I would avoid that

26:40 just to be sure to reduce the points of failure. So having a separate etcd for the tenant clusters will be much better. So you're decoupling from the admin cluster one. Okay. You know, you're not putting all the eggs in the same, how to say that? Basket. Basket. Yeah. Sorry. Yeah. I was speaking Italian. You know? So yeah. So I was just gonna continue with that a little bit. So you can have one etcd cluster, using etcd namespaces. You can have the admin cluster on one namespace and all the tenants on another. But in production, you probably want one etcd for

27:21 the admin cluster and then one etcd shared amongst your tenants. Is that what you would kind of promote? Yeah. Yeah. Absolutely. Also, keep in mind that, I don't want to spoil it, but I haven't used etcd in production. But I don't want to spoil that because it's really interesting. I'll show you later. Okay. Anyway... Yeah. Carry on. Keep in mind that we need a Kubernetes cluster. So what I did is, well, not install it, but create a Kubernetes cluster on AKS. So I'm on Azure. Cluster info, as you can see here. So we create

27:43 Installing Kamaji

28:02 a Kamaji in the zone of AKS. It was West Europe. And this management cluster, this admin cluster, which as you can see is 1.22.1, will get Kamaji installed. So how can I install Kamaji? It's pretty straightforward because I have to use helm upgrade --install. I don't remember the order of the arguments, but I've got my history with me. So I'm installing Kamaji from the chart that I've got in my local directory, so helm/kamaji, in the namespace kamaji-system, and creating the namespace. That's great. And now in the other pane, I'm going to watch all the pods that are going to be

28:55 created. So using kubens, I am switching the namespace to kamaji-system and running, with my beloved aliases, kubectl get pods -w. So I am watching the pods. When you're installing Kamaji, by default, you're ending up with an etcd included, and we are taking care of generating certificates, enabling all the features, and so on and so forth. So right now we have to wait a few minutes, I'd say, for the three replicas, and we should end up with everything installed. Yeah. So in the end, as you can see here, we've got etcd, three replicas, and Kamaji.
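
The install and watch steps described above can be sketched as follows. This is a minimal sketch, not the project's documented procedure: the release name `kamaji`, the local chart path `./helm/kamaji`, and the namespace `kamaji-system` are assumptions inferred from the spoken commands, and the helper functions are hypothetical. The script only prints the command lines; it does not run them.

```python
import shlex

# Assumed values, inferred from the spoken commands (not confirmed by the source).
RELEASE = "kamaji"
CHART_PATH = "./helm/kamaji"  # hypothetical local chart directory
NAMESPACE = "kamaji-system"


def helm_install_command(release: str, chart: str, namespace: str) -> list[str]:
    """Build a `helm upgrade --install` invocation that also creates the namespace."""
    return [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace, "--create-namespace",
    ]


def watch_pods_command(namespace: str) -> list[str]:
    """Build a `kubectl get pods --watch` invocation scoped to one namespace."""
    return ["kubectl", "get", "pods", "--namespace", namespace, "--watch"]


if __name__ == "__main__":
    # Print the commands instead of executing them, so the sketch is safe to run anywhere.
    for argv in (
        helm_install_command(RELEASE, CHART_PATH, NAMESPACE),
        watch_pods_command(NAMESPACE),
    ):
        print(shlex.join(argv))
```

Passing `--namespace` explicitly to kubectl avoids depending on a prior namespace switch with kubens, which is a design choice of this sketch rather than something shown in the demo.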

29:43 And we've also got our CRD deployed, and it's tenantcontrolplanes.kamaji.clastix.io. And also, our operator has been deployed, and I can check the logs using kubectl logs -n kamaji-system. I don't remember. So, yeah, I've got my history. I love my history. -l with the kubernetes.io component label set to controller-manager, -c manager, --follow. Just to be sure that the controller is up and running, is watching all the events, and is starting the workers. That's great. And now what we have to do is deploy a manifest of the tenant control plane. So I am entering clastix,

30:22 Deploy Manifest

30:30 kamaji. And for this demo, I already have a sample that I'm sharing with you. So this is the API definition of the CRD, and the name is Rawkode. Keep in mind that all these tenant control plane resources, since they are spawning pods, secrets, config maps, services, and so on and so forth, are namespace-scoped resources. So we can set the namespace to default. And we can also define various options. So we can define the options for the control plane, managing the deployment, the service, and also the ingress. Well, with the ingress, it's really hard because

31:21 Kubernetes has not been designed to get exposed using an ingress. So you are not really using fully qualified domain names. Just for the sake of simplicity, I'm going to use a load balancer. So AKS, well, Azure is going to deploy a service of load balancer type, giving me an API, a public and routable IP address. Not API, IP address. Then I'm also defining the deployment, with two replicas. And why two replicas? Because in the end, if you think about that, we are just managing the API server, controller manager, and the scheduler because all the state would

31:58 be managed by etcd. And now here comes the really important stuff because we can define all the information regarding Kubernetes. So I'm saying I would like to deploy 1.23. And I can also set a configuration for the kubelet, for the nodes that are going to join my new cluster. I'm going to say that I would like to set cgroupfs to systemd, defining some admission controllers, and I can also pass extra parameters, extra args, for all the components of the control plane. So if you would like to enable some specific

32:32 cloud integrations or specific flags, you can do that. So we don't have to create a mapping with all the possible settings of the API server, the scheduler, and so on and so forth. Just the most important ones. Okay. I'm just gonna interject very quickly, just because you answered a question there. So engine did ask in the comments, can we set feature flags? So the answer is yes, using the extra... Yeah. Yeah. The extra args. Yeah. Absolutely. Absolutely. Because otherwise, it will be really hard to follow everything. And then we've also got the network profile. And the network

33:04 Network Profile

33:06 profile is really interesting because we can set the address if we already know the address, or otherwise, we can set the port where we are expecting to get connections. In this example, we are going to use 6443, the default one. So I am going to get my load balancer with my public IP, and the tenant control plane will be reachable using that IP on this port. And then I've also got the add-ons. So if you are familiar with kubeadm, obviously, we are following the same structure. So with

33:35 Add-Ons

33:44 kubeadm init, I don't remember, but I'm pretty sure, I just need to read the docs, there are the add-ons, so you can configure CoreDNS and kube-proxy. kube-proxy is optional, because if you would like to use, I don't know, maybe Cilium, and you don't want to use kube-proxy, you can do that. For information. Sure. Sure. So can we specify our own CNI here, or are we bound by the admin cluster? No. No. It's up to the admin cluster. Oh, yeah. Okay. Cool. That's what I... Not the admin cluster, but I'd say the tenant, the cluster owner,

34:20 so who's going to use Rawkode. In this case, I can also share the kubeconfig with you so you can type from your local machine, interacting with the cluster if you want. That would be great. Yeah. I wasn't thinking about that. I'm going to share the kubeconfig with you as well, so you're interacting with the cluster. Anyway, yeah, we had an internal discussion at Clastix about whether we would like to also provide installation of the CNI, but we try to stick to the convention of kubeadm and Cluster API. So when you're spinning up

34:57 a cluster using Cluster API, the CNI must be installed after. Yeah. The extra provision. Yeah. Because I could use a different CNI for each cluster. So we don't want to be opinionated, but we can talk about that. Don't worry. We are looking for feedback. So this is our definition. That's great. So what I would like to do now is to apply that. So just to be sure, oh, it's Friday also for me. kubens default, then kubectl apply -f on the Kamaji v1alpha1 tenant control plane manifest under config samples. And I'll also open another tab watching for kubectl

35:18 Provisioning the Tenant Control Plane

35:48 get tenant control plane. Yeah. The short name is pretty misleading, TCP, but, you know, we do love making jokes. So TCP is tenant control plane. So now I can create it. Fingers crossed. I'll watch the logs again. And as you can see here, the logs are telling us, I don't have an address. We have to wait for the address. In fact, as you can see here now, in the get TCP, in the watch, we are ending up with a control plane endpoint. So this is the routable IP that we are going to use

36:27 for connecting to the cluster using this port, using this kubeconfig, and the status says that it's in the provisioning state. And in a matter of seconds, I'd say, yeah, just now, after forty-five seconds, we got a cluster that is ready. In fact, if I try to issue curl -k against this control plane endpoint, we are getting our cluster. Right. But, obviously, I cannot use curl. If I would like to interact with the Kubernetes cluster, I would like to use a kubeconfig. That's fair. In fact, Kamaji is also generating the admin kubeconfig that is stored

36:45 Accessing the Tenant Cluster

37:12 in a secret named rawkode-admin-kubeconfig. Rawkode is the name of my tenant control plane. So what I have to do is to search my history: kubectl get secret rawkode-admin-kubeconfig, output JSON, jq -r on the data admin.conf, base64 decoding, and piping to /tmp/rawkode. I should write a bash script or a plugin to do that because it's pretty cumbersome. I always use the history. Okay. We got our kubeconfig. I'll show you just for the sake of sharing. And as you can see here, we got our certificate authority data. We got our server,

38:02 our context, and, obviously, also the certificate to interact with the cluster. With that said, I just need to point KUBECONFIG to /tmp/rawkode, then kubectl version. And as you can see here, we got our cluster, and it's 1.23, the same version that we saw in the tenant control plane definition. So 1.23 and 1.23. If I try to issue kubectl get nodes, obviously, we don't have any node. Why is that? Because we just provisioned the control plane. And we would like to use Kamaji just for the control plane pods, the virtualized

38:53 control plane, because the nodes will be in another region. And for this demo, I'd say that the nodes are not on AKS, well, are not on Azure, but rather they are on AWS. So with that said, I already prepared a launch template on AWS as well. This launch template is using cloud-init to install all the required components. Let me check because I'm not so familiar with AWS, especially using the UI. Yeah. As you can see here, we got the cloud-config, the cloud-init configuration. So we are installing containerd, curl, performing all the... well, all the

38:56 Setting up Worker Nodes on AWS

39:41 obvious tasks that we have to do to set up a Kubernetes cluster. So installing the kubelet, kubeadm, containerd, and so on and so forth. Okay? And we should also have the configuration for containerd. I'm a bit worried because I see that it's missing a parameter, but ah, yeah. Because I have to use the latest version. Yeah. We have to use the systemd cgroup driver because otherwise we will end up with some issues. Anyway, from this launch template, I would like to create a set of nodes. And these nodes that are on AWS will join the virtual control plane deployed by

40:24 Kamaji that is deployed on Azure. So Rawkode worker nodes, I'm going to use the template description. No. I don't want that. Auto Scaling guidance, no. Template tags, source template. I would like to use this one with the latest version. I don't need to use the... oh, my bad. Sorry. I don't have to create a launch template, but I have to start an Auto Scaling group from this template. My bad. So Rawkode... oh, Kamaji Rawkode worker nodes using my launch template, the latest version. That's great. We are going to spin up t3.medium machines.

41:20 I'm staying in my VPC, all of these, t3.medium, it's okay. Let's go to next. No load balancer. No monitoring. Not at all. We just need some machines. So desired capacity, let's start with one. Maximum capacity, 10. And now I'm going to create everything. Okay. VPC settings, create Auto Scaling group. And in the meanwhile, I'm going to share with you, David, the kubeconfig file. And I think that I can use the commands. I'll share it with you via email, if you don't mind. Okay? Yeah. Yeah. Sure. Go for it. In the meanwhile, we can wait for

42:15 the node to get provisioned. Yep. Can I ask a couple of questions as well? Absolutely. Absolutely. Yeah. So you haven't specified the kubeconfig for the worker nodes. Are we gonna be doing that interactively? No. No. No. There are some default values. We are using just the default values for the sake of simplicity. Okay. And the control plane containers, like, I mean, that was pretty fast. Like, thirty seconds, and we had a control plane. Are those images that you're building and publishing? Are those being... like, how do you... The images were already cached, so

42:20 Control Plane Details & Design Discussion

42:53 it really depends, also because I already tested that before the video stream. But, yeah, you have to think also about the pulling of the container images. But what images are those? What images are you using for the... Absolutely. I can show you that. So let's edit the Rawkode deployment so we can also take a look at the internals of Kamaji. So, obviously, we got the owner reference to the tenant control plane resource that is used for the operator pattern, and then we got our containers. And in the containers,

43:27 Containers

43:31 we are defining all the arguments for the API server. So we are defining the etcd servers. And as you can see here, we can also define the etcd prefix. So this is the namespace that we are going to use in the multi-tenant etcd. And then we got the etcd key file and blah blah blah, all the information. Then we got also the following one, which is the scheduler. And as you can see here, we are using the k8s.gcr.io images. Yeah. These are just the standard upstream... This one. Control plane images. Perfect. Alright. Cool.
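
As a rough illustration of the per-tenant etcd prefix described above, the sketch below builds the etcd-related kube-apiserver arguments for tenants that share one etcd cluster. `--etcd-servers` and `--etcd-prefix` are real kube-apiserver flags; the etcd URL, the second tenant name, and the choice of using the tenant name as the prefix are assumptions made for illustration, not necessarily what Kamaji generates.

```python
# Sketch: one shared etcd, one key prefix per tenant control plane.
# The endpoint and the naming scheme below are hypothetical.

def apiserver_etcd_args(tenant: str, etcd_servers: list[str]) -> list[str]:
    """Return the etcd-related kube-apiserver flags for one tenant."""
    if not tenant or "/" in tenant:
        raise ValueError("tenant name must be non-empty and contain no '/'")
    return [
        "--etcd-servers=" + ",".join(etcd_servers),
        # Every key written by this API server lands under /<tenant>/...
        f"--etcd-prefix=/{tenant}",
    ]


def owning_tenant(key: str) -> str:
    """Recover the tenant from a stored key such as /rawkode/pods/default/x."""
    return key.split("/")[1]


if __name__ == "__main__":
    shared_etcd = ["https://etcd.example.internal:2379"]  # assumed endpoint
    for name in ("rawkode", "another-tenant"):
        print(name, apiserver_etcd_args(name, shared_etcd))
```

Because each API server only reads and writes below its own prefix, several tenants can share the same datastore without seeing each other's objects.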

44:13 Absolutely. And also for the controller manager. Obviously, we are mounting all the certificates, the CA, and so on and so forth because all the communication... although all the components are living inside the same pod. We were thinking about ending up with multiple deployments. So as you can see here, we got a deployment, two replicas, and three containers. And the reason for that is that in the end, we would like to use the loopback interface for faster connection between the controller manager and the scheduler. So in the end, we don't need a leader election.

44:31 Multiple Deployments

44:54 We don't need to scale the controller manager or the scheduler differently. Or we could potentially have that need, but we are open to discussion on that. Keep in mind that it's an open source project, so the governance must be managed by the community. So we are really hoping to get your point of view, your feedback about that. From the beginning, it was much easier using a single container... well, a single pod with multiple containers. So let's see also what the community thinks about that. Anyway, we should have now, fingers crossed, our instances. Yeah. We got our instance.

45:25 Checking Worker Node Setup
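
The launch template discussed earlier has to configure containerd with the systemd cgroup driver, and that setting gets verified on the node in this chapter. A minimal sketch of such a check follows, assuming the setting appears as a `SystemdCgroup = true` line in containerd's `config.toml`; the helper name is hypothetical, and a plain line scan is a rough stand-in for real TOML parsing.

```python
import re

# Matches an uncommented `SystemdCgroup = true` line anywhere in the file.
_SYSTEMD_CGROUP = re.compile(
    r"^[ \t]*SystemdCgroup[ \t]*=[ \t]*true[ \t]*(#.*)?$", re.MULTILINE
)


def uses_systemd_cgroup(config_text: str) -> bool:
    """Return True if the config text contains `SystemdCgroup = true`."""
    return _SYSTEMD_CGROUP.search(config_text) is not None


if __name__ == "__main__":
    sample = '''
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
'''
    print(uses_systemd_cgroup(sample))
```

A check like this could run at the end of cloud-init so that a node with the wrong cgroup driver is caught before it tries to join.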

45:35 Now I'm connecting to it just to be sure that everything has been configured properly. So Ubuntu, that machine, and let me check if cloud-init has completed or not. I'd say so because we got kubelet, kubeadm, kubectl, and just to be sure, I like to play safe, containerd config.toml. Oh, it's not my laptop. Okay. SystemdCgroup true. That's perfect. Okay. So now that we got our node, what do we have to do? Well, obviously, we have to extract the token for kubeadm to let the node join the cluster. So with that said, I export the kube

46:21 Joining Worker Nodes to the Tenant Cluster

46:34 config to Rawkode. We don't have any node, and what I have to issue is kubeadm token create --print-join-command. So in the end, it means that we are going to use the exposed IP address and the port with this token, with this discovery token CA cert hash that is used for the CA. So with that said, in the pane that is SSHed to the node, I am pasting the command and adding -v 3 just because... the gods have been really good to me. Everything is working, but, you know, I just want to be sure.
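
The join command printed above carries three things: the control plane endpoint, a bootstrap token, and a hash that pins the cluster CA. The sketch below parses such a line and sanity-checks those parts before it is pasted onto a node. The `--token` and `--discovery-token-ca-cert-hash` flags and the token format are kubeadm's own; the `JoinCommand` type, the function name, and the example values are hypothetical.

```python
import re
import shlex
from dataclasses import dataclass

# Formats kubeadm uses for bootstrap tokens and CA public key pins.
TOKEN_RE = re.compile(r"[a-z0-9]{6}\.[a-z0-9]{16}")
CA_HASH_RE = re.compile(r"sha256:[0-9a-f]{64}")


@dataclass
class JoinCommand:
    endpoint: str      # host:port of the tenant API server
    token: str         # bootstrap token
    ca_cert_hash: str  # pin for the cluster CA public key


def parse_join_command(line: str) -> JoinCommand:
    """Parse and sanity-check a `kubeadm join ...` line before running it."""
    argv = shlex.split(line)
    if len(argv) < 3 or argv[:2] != ["kubeadm", "join"]:
        raise ValueError("not a kubeadm join command")

    flags = {}
    args = iter(argv[3:])
    for arg in args:
        if not arg.startswith("--"):
            continue  # ignore anything that is not a long flag
        name, has_value, value = arg[2:].partition("=")
        # Accept both `--flag value` and `--flag=value`.
        flags[name] = value if has_value else next(args, "")

    token = flags.get("token", "")
    ca_hash = flags.get("discovery-token-ca-cert-hash", "")
    if not TOKEN_RE.fullmatch(token):
        raise ValueError("missing or malformed --token")
    if not CA_HASH_RE.fullmatch(ca_hash):
        raise ValueError("missing or malformed --discovery-token-ca-cert-hash")
    return JoinCommand(endpoint=argv[2], token=token, ca_cert_hash=ca_hash)


if __name__ == "__main__":
    example = (
        "kubeadm join 203.0.113.10:6443 --token abcdef.0123456789abcdef "
        "--discovery-token-ca-cert-hash sha256:" + "0" * 64
    )
    print(parse_join_command(example))
```

This only validates the shape of the command. The actual trust decision still happens inside kubeadm, which compares the pinned hash with the CA presented by the endpoint.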

47:23 And in the meanwhile, I'm going to watch nodes and pods. Obviously, keep in mind that we don't have any node, and obviously we've got CoreDNS and kube-proxy that have not been deployed. It's not a big problem. So now on the node, on the AWS node, I'll issue kubeadm join, and the node will join the virtual control plane that is deployed on AKS. We just need to wait for the TLS bootstrap, but here it is. As you can see here, we got our node, 10.0.28.98, and we got our kube-proxy, our CoreDNS,

48:14 and so on and so forth. But, obviously, there is something that is not working because the node is not ready. So what we have to do is to export our Rawkode kubeconfig again and apply the Calico manifest. As CNI, you can install whatever you want. It was working for me with Calico, so I don't want to test with another one because everything is so perfect right now, and I don't want to ruin everything. So installing Calico... it has been installed. And as you can see here, we are ending up with the calico-node that will install all the CNI

48:53 and all the binaries and configurations. And our node should be marked as ready in a matter of seconds, or at least I hope so. Yeah, you know, it's been marked as ready. So we are ending up with CoreDNS. We are ending up with calico-kube-controllers and all the containers that we are going to deploy inside of our cluster. So just to recap, we got Kamaji installed in a Kubernetes cluster on Azure. This control plane is exposed using a load balancer. The pods are on Azure, and the nodes are joining from AWS.

49:18 Recap

49:39 So you can use whatever you want. You can do that on bare metal. You can do that even on your virtual machines, using KVM virtual machines or Firecracker VMs, whatever you want. And that's great because in the end... obviously, I'm saying that shamelessly because this is a project I work on and contribute to. But getting to the point, you can use Kamaji as a building block for a truly managed Kubernetes service. You can imagine maybe a portal, an application, where you're telling your developers, click here to create your Kubernetes cluster.

50:20 Give me your credentials for the AWS account so you are accountable for the cost of the worker nodes. I'll send you the kubeconfig and so on and so forth. I will install everything on your behalf. So in the end, it's a sort of framework to build whatever you want. But now, there is something that is not working, and it's the following one. So everything is working. So I'm really happy about that. We don't have pods that are restarting. But let's imagine I would like to take a look at the kube-proxy logs.

50:47 The Connectivity Challenge (Control Plane to Worker)

51:04 Obviously, I can do that from the machine. crictl ps. crictl... where is it? kube-proxy. It's the last one. crictl logs -f. So everything is fine, but I don't have access to that machine because that machine has been installed by the developer. I have the credentials, obviously, but I would have to extract them, connect to AWS, and blah blah blah. It's a pain. Also, maybe I don't even have the SSH key to connect to the node. So as an administrator of the cluster, I could say, kubectl -n kube-system logs

51:47 -f kube-proxy, and it's not working. Why? Because the node has IP 10.0.28.98, and it's trying to connect to the kubelet port. And here comes the trouble, because obviously, if you think about that, Kamaji is on AKS, on the Azure network, and the worker nodes are on AWS. Obviously, I could say all the worker nodes must have a routable public IP, and let the nodes join using their public IP, but that would be really cumbersome, also because from the security perspective and the security assessment, you can imagine, for a bigger organization, it would be terrible, I'd say.

52:40 You have to open the ports, all the firewalls, the WAF, and so on and so forth. So it's not manageable. Okay? We noticed that. So there is a nice feature that I would like to show you. And let me unset the kubeconfig, and I'm going to change the definition of the tenant control plane. As you remember, we got the add-ons. And with add-ons, we can configure CoreDNS and kube-proxy. But there is also another component, I'd say an add-on, that unfortunately is not managed by kubeadm, and I'm saying that because trying to wire

52:50 Solving Connectivity with the Add-on

53:21 everything in Kamaji has been a pain due to that, and it's Konnectivity. Konnectivity, I'll try to say that using the terms from the documentation: it's a reverse proxy for the control plane communications to the worker nodes and vice versa. But I prefer to say that it's a reverse SSH tunnel. Because in the end, what's going to happen is that we are ending up with two components: the Konnectivity server that is on the control plane, so on Kamaji, and the Konnectivity agent. So the Konnectivity agent is launched as a pod inside the worker nodes. The

54:10 worker nodes, obviously, can connect to the Kamaji tenant control plane. They establish a connection, and using this connection, they are starting a bidirectional communication from the kubelet to the API server and vice versa. So with that said, I also just set the Konnectivity port, or rather the proxy port. And I'm going to say 8142. That's great. Now what I would like to do is to create another tab, and I will show you with watch, kubectl get pods, secrets, service. Maybe this one. Yeah. What's going to happen? So I can enter

54:11 Deploying Connectivity Components

55:07 the path where I got my manifest, Kamaji, kubectl apply -f config/samples, tenant control plane. And here comes the pain. Because as you can see here now, we got a new certificate. That's the Konnectivity certificate. And we are spinning up a new deployment. It means a new pod. Why that? Because we need to get the proxy agent... well, the proxy server sitting beside the API server, because the API server is aware that there is the Konnectivity server that will be the tunnel to communicate with the worker nodes. So let's wait now for the reconciliation.
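
The server and agent roles described above can be pictured with a toy in-memory model: the agent on the worker node dials out and registers itself, and the API server side then reaches the node only through that already-open connection. This is a conceptual sketch, not Konnectivity's real protocol or API; the class names, the node IP, and the request string are all hypothetical.

```python
from typing import Callable, Dict

# A "connection" is modelled as a callable the agent hands to the server
# when it dials out from the worker network.
Handler = Callable[[str], str]


class TunnelServer:
    """Runs next to the API server; never dials the nodes itself."""

    def __init__(self) -> None:
        self._agents: Dict[str, Handler] = {}

    def register(self, node_ip: str, handler: Handler) -> None:
        # Called when an agent connects outbound to the control plane.
        self._agents[node_ip] = handler

    def forward(self, node_ip: str, request: str) -> str:
        # Used by the API server instead of connecting to node_ip directly.
        handler = self._agents.get(node_ip)
        if handler is None:
            raise ConnectionError(f"no agent connected for {node_ip}")
        return handler(request)


class TunnelAgent:
    """Runs on a worker node and can reach the local kubelet."""

    def __init__(self, node_ip: str) -> None:
        self.node_ip = node_ip

    def connect(self, server: TunnelServer) -> None:
        server.register(self.node_ip, self._handle)

    def _handle(self, request: str) -> str:
        # Stand-in for a call to the kubelet on this node.
        return f"kubelet@{self.node_ip} handled {request}"


if __name__ == "__main__":
    server = TunnelServer()
    TunnelAgent("10.0.0.5").connect(server)
    print(server.forward("10.0.0.5", "GET /containerLogs"))
```

The point of the design is the direction of the connection: only the worker side needs outbound access to the control plane endpoint, so no inbound ports have to be opened on the nodes.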

56:02 But as you can see also from the logs, we got Rawkode, so our instance of the tenant control plane has been reconciled. So if we take a look here, we got our service Rawkode, our load balancer on Azure with this IP, and now we got also 8142. So the Konnectivity port, so far so good. So now we can switch again to our tenant control plane. Well, yeah, to our tenant cluster. And as you can see here, now we got a new pod. Well, it's not a pod, it's a DaemonSet. And it's the Konnectivity agent that has been spun up eighty seconds

56:10 Connectivity Established

56:43 ago. And since I know that things can go wrong, I just want to be sure that the connectivity, yeah, to the load balancer is good enough. So let me check from the logs. crictl logs -f, this one. Yeah. Start serving. So with that said, remember that the AWS node has been announced using a non-routable IP, so the API server cannot connect to the internal cluster network because they don't have the same CNI. So it means that webhooks, validating webhooks, mutating webhooks, or whatever it is, shouldn't work. Now what I have to do is to

57:29 Demonstrating Control Plane to Worker Communication

57:31 export again the kubeconfig for the tenant cluster, and I can issue kubectl -n kube-system logs -f kube-proxy, and here comes the magic. Nice. So it means that our connection is getting processed by the tenant control plane API server, is going to be forwarded to the Unix domain socket of the Konnectivity agent that is connected to the node where the pod is running and can interact with the local kubelet. And with that said, it means that we can retrieve logs. We can install, obviously, our CRDs because in the end we got a different schema.

58:14 Demonstrating Advanced Features (Capsule Installation)

58:16 For example, now what I can do is to say, helm upgrade --install capsule. It's our shameless self-promotion. Sorry for that, Daniel. But I can install custom CRDs. And these CRDs are interacting with validation webhooks, conversion webhooks. So it means that we are ending up with a lot of stuff, a lot of moving parts. And in a matter of seconds, we are ending up with Capsule that has been installed properly. So you're ending up with multi-cluster management made possible by Kamaji, with the multi-tenancy inside that cluster made possible by Capsule. We just need to wait.

59:01 But it should be okay. Fingers crossed. It was working. I think I tried that. Everything has worked so far. It's not gonna stop now. I think you're good. Yeah. Come on, Capsule. It's a matter of resources. I remember that. It's a matter of resources. So just to be sure, capsule-system, capsule-controller-manager, limits, because I'm not using such powerful machines. That's the reason. Bug in production. Okay. Yeah. It has been solved. So Capsule has been installed. So custom resource definitions inside etcd, and everything is working. Very cool. And with that said, I think that

1:00:01 we finished our demo. Well, how brave are you feeling? Because... Yeah. Yeah. Got an idea. And that's all. So... Do you wanna jump back to your terminal? Yeah. Absolutely. Absolutely. Well, our control plane here, can you run kubectl get tcp? So the tenant control plane. In this pane, I can do that. Yes. So that is Kubernetes version 1.23.0. Can we upgrade that to 1.24? Absolutely. Sure. Sure. So let me disconnect from here, and let's go for watch, kubectl get tenant control plane. I'm going to edit the tenant control plane, Rawkode,

1:00:03 Upgrading the Tenant Control Plane

1:00:51 and let's search for the version. Like a free one. And as you can see here, we are entering the upgrading state. And it's a matter of waiting for the images getting downloaded and placed on the nodes, and we should end up with the status upgraded and so ready. So it's just a matter of twelve seconds. But, obviously, keep in mind that we just upgraded the control plane. So you have to manage everything else on your own, with Cluster API, Terraform, Ansible, or whatever it is, because we just upgraded the control plane. And it's okay. I

1:01:41 mean, you upgrade the control plane and then you upgrade the nodes. There is no problem with the version skew between worker nodes and control plane nodes. Nice. I like that a lot. And engine says he had that question too. But engine does have one more question. He was asking if we would be able to add Kilo to the mix, which is a WireGuard-based cross multi-cloud component. Kilo. I don't know what Kilo is. Never heard about that. Yeah. I think it would work. I mean, we don't need to dive into that right now, but there's something to experiment with

1:02:19 another day. So... Yeah. Yeah. Absolutely. Alright. Well, that... sorry. On you go, if you wanna... Yeah. Yeah. I was missing this slide. My bad. Because I fucked up the order. So we got also a roadmap for that because we know that this project is pretty young. It was open sourced during KubeCon Europe, in Valencia in 2022. So it's really, really, really young. And we are looking to get some benchmarking and stress tests. So please try to destroy Kamaji. We are looking for that, and we are really happy for that. But obviously, we know that we

1:02:30 Roadmap

1:02:58 still have a lot to do. Especially, since we think that kubeadm is the right tool to manage Kubernetes cluster integrations, we would like to go down that path. And if you think that's not the correct one, please share that. We got the discussions on GitHub in the Kamaji repository. Start a discussion. We got also, I forgot to say that, the #kamaji channel in the Kubernetes Slack. So join there so we can talk about that. We will be really happy for that. But, obviously, my demo was totally focused on ClickOps. So I connected to AWS. I

1:03:38 clicked here and there, blah blah blah blah. And, obviously, it's just for the demo because having everything automated would be absolutely great, but you would miss all the moving pieces behind that. So we are working on the Cluster API integration. We are working also on Terraform because a lot of people don't want to use Cluster API because they are managing the whole infrastructure with Terraform. And in the end, the tenant control plane is going to be considered as a component of the infrastructure. And, honestly, also with Crossplane. But since I got this heritage from site reliability

1:04:14 engineering, I'm working also... and I'm really happy. I will be really happy also to achieve that. We are working on the Prometheus metrics because we need to get monitoring and alerting. Obviously, you can install the metrics server in the tenant control plane, but we would like to also get an overview of all the clusters. So getting to know how many clusters are performing the upgrade, how many clusters, for example, have a CA that is near the end of its validity, and so on and so forth. Or eventually, how many tenant control planes have been restarted

1:04:58 for any reason? And last but not least, it's really interesting, this one, is the Kine integration. Kine is a nice project. And if you recall, I told you all, if I were to use Kamaji in production, I would never use etcd. And someone will start raising an eyebrow: how can I use Kubernetes without etcd? I need that. And that's true. But this project was originally developed by the Rancher people and right now is in the process of getting donated to the CNCF. It's in the incubating or staging phase, I don't remember. Anyway, with Kine, you

1:05:41 are using an adapter. So it means that there is an adapter that, how to say that, conforms to, is using the same signatures as etcd. So you got this endpoint, and these etcd endpoints have the same signature as a real etcd. And you can use a driver to say all the writes shouldn't be forwarded towards an etcd, but rather to a MySQL, a PostgreSQL, or eventually also a SQLite database. Obviously, I wouldn't use a database like SQLite for production. It really depends on this. You know... so with that said, I truly believe that Kubernetes is great,

1:06:33 although sometimes I feel the pressure of the complexity behind that. But operators are saving our lives. I mean, with the operators, we are removing the toil of managing complex infrastructure. Kubernetes itself is solving a lot of problems. So if we can use an operator to spawn multiple Kubernetes control planes... and then these control planes, obviously, they need to scale out, because etcd has too many limitations. The first one is regarding its size. You cannot exceed eight gigabytes of storage. Otherwise... You know, I'm Italian. I use a lot of Latin words. So it means that here come the dragons. So

1:07:22 you don't know what could happen. And I don't want to deal again with split brain of etcd or unrecoverable data from etcd. And with that said, we are also working on the Kine integration. We already have the MySQL one. We are working on the PostgreSQL one. And that's great because with the operators, there is a nice project by EDB, EnterpriseDB, that developed CloudNativePG. So it's an operator where you are defining a custom resource. And from that custom resource, you're ending up with the CA, with the certificates. So it's just a matter of connecting all the things together.
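
To make the adapter idea behind Kine more concrete, here is a toy sketch of key/value operations stored as append-only revisions in a SQL table, using SQLite from the Python standard library. It only illustrates the general approach of backing etcd-style puts and gets with a relational database; the table layout, class name, and methods are hypothetical and are not Kine's actual schema or API.

```python
import sqlite3
from typing import Optional


class SqlKeyValueStore:
    """Minimal revisioned key/value store on top of SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS kv ("
            " revision INTEGER PRIMARY KEY AUTOINCREMENT,"
            " name TEXT NOT NULL,"
            " value BLOB,"
            " deleted INTEGER NOT NULL DEFAULT 0)"
        )

    def put(self, name: str, value: bytes) -> int:
        """Append a new revision for the key and return its revision number."""
        with self._db:  # commits on success
            cur = self._db.execute(
                "INSERT INTO kv (name, value) VALUES (?, ?)", (name, value)
            )
        return cur.lastrowid

    def delete(self, name: str) -> int:
        """Record a deletion as a tombstone revision."""
        with self._db:
            cur = self._db.execute(
                "INSERT INTO kv (name, value, deleted) VALUES (?, NULL, 1)",
                (name,),
            )
        return cur.lastrowid

    def get(self, name: str) -> Optional[bytes]:
        """Return the latest value for the key, or None if absent or deleted."""
        row = self._db.execute(
            "SELECT value, deleted FROM kv WHERE name = ?"
            " ORDER BY revision DESC LIMIT 1",
            (name,),
        ).fetchone()
        if row is None or row[1]:
            return None
        return row[0]


if __name__ == "__main__":
    store = SqlKeyValueStore()
    store.put("/rawkode/pods/default/demo", b"v1")
    store.put("/rawkode/pods/default/demo", b"v2")
    print(store.get("/rawkode/pods/default/demo"))
```

Keeping every write as a new row mirrors the way etcd exposes a monotonically increasing revision, which is what lets a SQL database stand in behind the same interface.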

1:08:04 And with PostgreSQL managed by Kubernetes itself that can auto scale, you're ending up managing Kubernetes clusters at scale like a hyperscaler, I'd say. I'm not saying that Kamaji could replace AKS. Obviously, no. Neither EKS. But it could be used by your organizations, also by your company when you are selling or providing Kubernetes as a service to your developers or other teams. And I think that's really great because it's the power of open source. So you don't need to rely on companies. Obviously, they need to build. They need to pay the salaries for all the software engineers

1:08:45 employed by AWS, Azure, and so on and so forth. But you can do that on your own, at home, with a shared governance from the open source community. And I think that's more interesting rather than, okay, buy AWS or buy Azure. End of the story. I can't say a word about the direction of the project. I don't even know the internals of AKS or EKS. With Kamaji instead, you just need to open a discussion. With the discussion, we're starting a conversation with all the team members, with all the community, and we are just committed to deliver

1:09:22 our reliable software. End of the story. So these are the links, and I don't know. Maybe I talked too much. My bad. No. No. I love what you're saying, and it's all useful context around what's going on here and how cool this is. I think we have one more question in the chat. What I'll say is before I pass that question on, if anyone else watching has any questions before we finish up, please drop them into the comment section now. The question that we have is from engine. Again, engine is asking, like, do you plan

1:09:25 Q&A: Other Kubernetes Distributions

1:09:52 on supporting other Kubernetes distributions? Could we use Kamaji for K3s or k0s, etcetera? Oh, that's a nice question, honestly. We can think about that. Yeah, yeah. I don't think there is any downside to that. Keep in mind that we are just using the default images because in the end, if you would like to use multiple clusters, you're ending up using products like k0s by Mirantis, or K3s by Rancher, and so on and so forth. So we were trying to commit to the vanilla solution. But, yeah, we can think about that. I

1:10:31 don't know. Maybe we are... but, yeah, K3s is a CNCF-compliant cluster. Yeah. So it's compliant. So I think that also the signatures of the CLI flags are the same. So open an issue, open a feature request. Let's try to discuss that, and we can think about that. Absolutely. I will like... I will love that. Yeah. Absolutely. Very cool. Alright. I don't think we have any more questions. So with that, I will say, Dario, thank you so much for taking time out of your day and showing us Kamaji. It's a really interesting project.

1:10:57 Conclusion and Edge/IoT Discussion

1:11:07 I'm really... I think it's something I'm gonna have to experiment with on my bare metal clusters and try and get more utilization of those control plane nodes and bring my own worker nodes. So I'll definitely kick the tires on it a bit more and open any issues, but I'm excited. I think this is very, very cool. Yeah. Thank you so much. Engine is saying, I see Kamaji in the edge IoT context. Feel free to share more on that. Like, is that something you had considered? Well, in the end, if you think about that, I saw... well, it's not my field,

1:11:39 honestly, because I've been a web developer now playing with containers. So the edge is really hard, also the bare metal, to be honest. But I remember that the great problem of the edge is the fact that you're ending up with a control plane on the edge. You need a control plane on the edge because otherwise, it's not like Docker Swarm. Okay? With Docker Swarm, I remember that you were ending up with multiple nodes that were acting as control plane and also as workers, but it was really a bad pattern, I'd say, because you're mixing up the roles. So what happens

1:12:16 if an application is killing the control plane components? Instead, with Kamaji, in the end, you just need this sort of connection with the control plane, and you don't need to deploy the control plane on the same edge location. But, yeah, it's something that will be absolutely interesting. To be honest, I'm missing the overall picture of the edge because it's a different world for me from my usual one. But, yeah, I think so. And if I'm wrong, tell me that. No offense at all. Alright. Well, I think we'll leave that there. Thank you again for joining me and for sharing.

1:12:57 I hope you have a wonderful day and a wonderful weekend. To everyone watching, thank you for your questions and for joining us today, and we'll be back soon for more Rawkode Live later. Alright, Daniel. Have a good one. I'll see you later on. Thanks. Bye. Bye.
