About this video
What You'll Learn
- Bootstrapping FluxCD Toolkit with gotk and repository-backed source control for cluster state.
- Defining multiple Git and Helm sources, then reconciling changes with dependencies and health checks.
- Deploying and managing Helm releases with values overrides, test hooks, rollbacks, and observability tooling.
Stefan Prodan walks through the GitOps Toolkit (Flux v2): bootstrapping with gotk, the source, kustomize, helm, and notification controllers, then adding Git and Helm repository sources, Kustomizations with dependencies and health checks, and Helm releases on Kubernetes.
Jump to a chapter
- 0:00 Holding Screen
- 1:25 Introductions
- 1:31 Introduction
- 2:00 What is GitOps / GitOps Toolkit?
- 2:21 What is GitOps?
- 3:00 Introduction to Flux v1
- 4:10 GitOps Toolkit and Flux v2 Explained
- 5:00 Should I use Flux v1 or GitOps Toolkit?
- 5:03 Status of GitOps Toolkit (Feature Parity & Roadmap)
- 7:45 Bootstrapping GitOps Toolkit
- 7:50 Demo: Setting up the GitOps Toolkit
- 8:41 Explanation of `gotk bootstrap` Command
- 12:17 Running the Bootstrap Command
- 15:00 What are the GitOps Toolkit components?
- 15:05 Overview of GitOps Toolkit Controllers
- 16:19 Webhooks and Notifications
- 17:40 GitOps Toolkit CRDs
- 18:47 Source and Kustomization Reconciliation Intervals
- 21:00 Suspending reconciliation
- 21:37 Suspending and Resuming Kustomizations
- 22:50 Demo: Suspending a Kustomization
- 23:25 Demo: Resuming a Kustomization
- 23:30 Deploying our first workload
- 23:51 Deploying an Application via GitOps
- 25:32 Demo: Reconciling Changes
- 26:25 Verifying Application Deployment
- 27:10 Questions
- 27:28 Structuring the Git Repository for GitOps
- 31:41 Q&A: Using Toolkit Without Kustomize / Custom Reconcilers
- 33:39 Q&A: Subversion Support
- 34:30 Add another GitRepository
- 39:14 Creating the GitRepository CR for the Second Repo
- 43:30 Dependencies and health-checks
- 59:20 Deploying Helm charts
- 1:00:17 Adding a Helm Repository Source
- 1:01:09 Creating a Helm Release (Contour Example)
- 1:03:09 Customizing Helm Charts with Values
- 1:04:56 Demo: Applying the Helm Release
- 1:05:19 Verifying Helm Deployment
- 1:05:50 Advanced Helm Features (Delegation, Tests, Rollback)
- 1:07:07 Monitoring the GitOps Toolkit (Prometheus/Grafana)
- 1:09:00 Final questions
- 1:11:19 Q&A: GitOps Toolkit vs. Rancher Fleet (Multi-Cluster Management)
- 1:14:09 Multi-Cluster Addons Use Case
- 1:15:06 Summary and Conclusion
Full transcript
Generated from the English captions. Timestamps jump the player to that moment.
1:31 Introduction
1:31 Hello, and welcome to today's episode. Today, we are taking a look at GitOps and the GitOps Toolkit. And who better to join me than Stefan? Hey, Stefan. How are you? I can't hear you. Okay. Sorry. Peter, there. I was like, oh, shit. What's happened? There, we can hear you. How are you, Stefan? Great. Very happy to be here and share the work we've been doing on the GitOps Toolkit and the road to Flux v2. Awesome. Well, I think the best thing for us to do would be to quickly give an overview of what GitOps is.
2:21 What is GitOps?
2:21 Right. So GitOps is a way of doing operations through a Git repo. You define your state there, and it's happening by magic on your Kubernetes clusters. That's the key. So we store our declarative state in a Git repository and it is magically applied to our clusters for us. And to do that, we need tooling. And you've got quite a lot of experience in providing the tooling that actually makes this work. So I guess the... sorry. On you go. Yeah. We started, I think, three or four years ago with Flux as a way to run a daemon
3:00 Introduction to Flux v1
3:08 on the cluster that can be configured. You tell the daemon, hey, here is my Git repo. Here is the authentication for it. Whatever you find in there, try and reconcile that particular state on the cluster and follow any other change, and also do garbage collection. If someone deletes something from Git, you also delete it from the cluster. So it's like a proxy between your Git changes and what's happening inside the cluster, without you having to directly connect to the cluster and perform those changes manually from your laptop. Excellent. So like you said, you've been working on
3:54 Flux for a number of years now. And we're now at a stage where there's the GitOps Toolkit and conversations around Flux v2. Can you maybe just explain what the GitOps Toolkit is and where Flux v2 comes into the picture as well? Yeah. So the toolkit, like the name implies, is a collection of tools. Flux version one was this monolith that you deploy and it does a bunch of things for you. So we took Flux apart. We broke it down into microservices, and then we'll be assembling Flux back from these microservices.
4:10 GitOps Toolkit and Flux v2 Explained
4:33 And these microservices are Kubernetes controllers built with Kubebuilder, and the configuration of your whole GitOps pipeline is made through Kubernetes custom resources, unlike Flux v1, which you would configure in your deployment spec with command flags for Flux. So it gives you more, you know, freedom to do a bunch of stuff. Okay. What is the status of the GitOps Toolkit? I think there's been a fair number of releases over the last week. The last time we spoke, you mentioned the API was stabilizing. I mean, is it in a position where
5:03 Status of GitOps Toolkit (Feature Parity & Roadmap)
5:14 people should now start to pick the GitOps Toolkit over Flux v1, or should they wait a little bit longer? So we started with the toolkit, experimental things, around six or seven months ago, and we've published a roadmap on the website. We broke the roadmap into three milestones. One milestone is getting to feature parity with Helm Operator. Helm Operator is the component that deals with Helm in a declarative way. And for the Helm features, we are at 100% feature parity in the GitOps Toolkit. So for people that are today using Helm
5:57 Operator, they could switch to the toolkit and have the same features there and many other features on top of that. Regarding Flux, we split Flux into two milestones. One is Flux in read-only mode. What that means is Flux connects to a Git repo with a read-only deploy key, so it will not commit back, it does not write anything to it. It looks at it and applies it. And for this milestone, we are at feature parity in the toolkit, so you can use it today to synchronize your Git repos.
6:38 And Flux has a feature where it can scan container registries, find new image tags that you've pushed there, and write those back to your Git repo based on some policies. For example, I don't know, a semver range and so on. And that particular feature is still under development. So if you are using Flux today and you are relying on the write-back mechanism, the image update mechanism, you'll have to wait more, probably till the end of the year when we get that out. For the Helm operations and Git synchronization,
7:17 the current toolkit API is no longer alpha. We published the v1beta1 release last week. And, yeah, I'm thinking it's ready to go for sure. Excellent. Well, I'm very excited for today. I'm also not the only one. We have a comment from Ilya who says he's very excited about Flux v2 and the GitOps Toolkit. So we're all really happy. We're looking forward to seeing this. We all wanna start using it. So let's just share my screen and walk through the process of how people can start to use the GitOps
7:50 Demo: Setting up the GitOps Toolkit
7:54 Toolkit today. Cool. Alright. So there we go. So I've gone ahead and configured as little as possible upfront. Of course, we need a Kubernetes cluster. So I spun one up using Cluster API on Equinix Metal and it is good to go. Let's run get nodes to make sure I haven't broken anything in the last five minutes. Excellent. Now we have a couple of resources. You have very kindly provided this gist, which I will share in the YouTube show notes when the episode is finished. This is a rough kind of playbook of the things we're gonna cover today.
8:32 And at your recommendation, I have also configured a GitHub organization that we're gonna try and use for today's examples. So do you wanna just explain? The first thing I see here is the first command we have to run is `gotk bootstrap` and then a whole bunch of parameters. What is this command going to do to my cluster? So the toolkit is composed of a CLI that you can use to install, to bootstrap, to configure, to debug what's happening on your cluster, and a bunch of controllers, each with its own,
8:41 Explanation of `gotk bootstrap` Command
9:13 you know, scope. For example, we have a controller that deals with sources like Git repositories or Helm repositories or even S3 buckets. So you can add sources, remove sources, and so on. And there are other controllers as well. The bootstrap command, what it does, it takes an owner, let's say, a GitHub organization. You pass it the GitHub organization and a repository name, and it creates that repository for you. If you have, let's say, teams inside your GitHub org, you can also tell it which teams should have access to this repo. The repo is created private by default.
9:52 You can also create the repo as public. And what's happening there: the CLI will create the repo, push the toolkit definitions to the repo, like the deployments, the custom resource definitions, and so on, apply them on the cluster, then it configures this particular repository to be the source of truth for your cluster. What that means is it creates a deploy key, sets up the deploy key in your GitHub repo, and then it starts to listen to whatever is happening on your main branch. So every time you do a commit
10:32 there, you change something, even the toolkit definitions themselves. So if you, let's say, run the bootstrap command once, then in a couple of weeks you run it again, it will not create a new repo. It will see, okay, the repo is there, but maybe the toolkit components have advanced to a new version, so it will do an upgrade. And how it does the upgrade: it just modifies the YAML inside your repo and the toolkit upgrades itself on the cluster. Ah, very cool. That's about it. Alright. So let's just grab this command then.
11:08 And I just need to make a few changes. So my organization is Rawkode Does GitOps. And I can just give this any name, and this is gonna be created for me. Right? Yeah. I did go ahead and create a team, dev. So maybe we can talk about that feature in a moment. And then the path, I'm assuming, is where it's gonna store the manifests in the repository? Yeah. So the repository created by bootstrap is meant to be used for your whole fleet of clusters, not just one. So using the path, you say, hey, this particular
11:53 instance is for that path, and you can use the cluster name for it. One use case for that is you may want to test the toolkit upgrade on your staging cluster before upgrading your production cluster and so on. So the definitions are stored for each cluster separately. Okay. Well, I like to live life on the edge, so we'll call this my production cluster. And return. Oh, I got bad credentials already. Bad credentials. You need the GITHUB_TOKEN environment variable set before you... I will quickly move that off screen and check. I have GITHUB_TOKEN. Right?
12:17 Running the Bootstrap Command
12:46 Yep. It's got a space in it. Alright. Let's fix that. Well, I wouldn't have thought that would have caused the problem. Put loads of text on my screen so that when I move it back, yep, we don't reveal my token. Alright. I'm gonna try that again. There we go. So we've got the repository created. It oh, it's a wave of the world now. Okay. It's created... oh, no. It's granted access to the dev team, cloned the repository, generating some manifests, and then provisioning itself within the cluster. It's doing a lot for me. I don't
13:43 really need to do anything. Right? It just... Yeah. With Flux, we've seen people struggling with the configuration. So we said, okay, let's use the GitHub and GitLab APIs so we can streamline the whole process with a single command. We are looking for people that have experience with Bitbucket. If you know Bitbucket, if you are using it and you are familiar with Go, yeah, we are looking for someone to help us add a Bitbucket implementation to the bootstrap command. Now maybe you are not running on any of those SaaS
14:24 Git providers. So we also have instructions on how you can do the whole thing manually. You create the Git repo on your own, then you generate the manifests, you push them, and you set up the deploy key. But, yeah, the bootstrap command also does a sanity check of what's installed. So it checked all the controllers. If you do, for example, `kubectl -n gotk-system get pods`, we'll see we have four controllers here. So the source controller is the one that pulls data from Git, Helm repositories, S3 buckets, and so on.
15:05 Overview of GitOps Toolkit Controllers
15:13 The notification controller is a daemon that can listen to webhooks from GitHub or GitLab or Jenkins or whatever. So it can trigger instant reconciliation inside your cluster. It also pushes notifications from your cluster to, let's say, Slack, Microsoft Teams, Rocket, and other chat systems. The kustomize controller is a controller that can work with kustomizations or with plain YAML. So it applies that on your cluster, does garbage collection for you, health checking, and all sorts of things. And the helm controller does the same stuff but for Helm charts. And that's the toolkit. Awesome.
16:03 So we've got four controllers. The notification controller, immediately, is kind of interesting to me. So it can push notifications to say that it's updated something within the cluster. That was one aspect of it. The other part was it can listen for webhooks coming from GitHub, GitLab, etcetera. So does it configure any of those webhooks for me to push to my cluster? Or is that something that I need to go ahead and manually set up? So there are a couple of custom resources which you can use to configure things. One custom resource is called the Receiver.
16:19 Webhooks and Notifications
16:39 So you can create a receiver of type GitHub, and it will generate a unique URL that you can use in your GitHub webhooks. And it also gives you a secret to check that GitHub is really sending all those notifications. For GitLab the same, and so on. So that's the receiver part, where you can define these kinds of receivers and, you know, you can suspend them, you can delete them, and so on. And you can also create alerts for different providers. You can add, for example, your Slack channel there with an API token, then you create an alert and say, hey, if my,
17:21 I don't know, my Contour Helm release is failing, post the error message to this Slack channel and so on. So there are custom resources with which you can define receivers and alerts. Okay. Perfect. So if we take a look at my organization here, I'm curious about what's happened on the GitHub side of this then. So we have this new repository here. Okay. So it's created a directory with the path that we provided. And this is the YAML for deploying itself. Yeah. Okay. So if we look at the toolkit-source.yaml... Yep. Can you make that bigger? I can.
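The receiver and alert resources described a moment ago might look roughly like the following. This is only a sketch, assuming the `notification.toolkit.fluxcd.io/v1beta1` API from around the time of the recording; the resource names, the Slack channel, the Secret names, and the `contour` HelmRelease name are hypothetical and were not shown in the video, and field names may differ in other releases.

```yaml
# Hypothetical Receiver: the notification controller generates a webhook URL
# for it, and GitHub calls that URL on push.
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Receiver
metadata:
  name: github-receiver          # hypothetical name
  namespace: gotk-system
spec:
  type: github
  events:
    - "ping"
    - "push"
  secretRef:
    name: webhook-token          # hypothetical Secret holding the shared token
  resources:
    - kind: GitRepository
      name: gotk-system          # source to reconcile when the webhook fires
---
# Hypothetical Provider: where alerts are sent.
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Provider
metadata:
  name: slack
  namespace: gotk-system
spec:
  type: slack
  channel: alerts                # hypothetical channel
  secretRef:
    name: slack-url              # hypothetical Secret holding the webhook address
---
# Hypothetical Alert: post error events from one Helm release to Slack.
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Alert
metadata:
  name: contour-failures         # hypothetical name
  namespace: gotk-system
spec:
  providerRef:
    name: slack
  eventSeverity: error
  eventSources:
    - kind: HelmRelease
      name: contour              # assumed release name
```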
17:40 GitOps Toolkit CRDs
18:12 There we go. Yeah. So this is how a source gets added on the cluster. You give it the URL. In this case, it's an SSH URL. It already created a secret that contains the SSH public and private keys and the known hosts. And using that particular secret, it now monitors the main branch of that URL. Okay. So every one minute, it's gonna check for changes to that repository and then just apply them to my cluster? This particular object only pulls the... Oh, okay. ...repo onto the cluster. There is a different object, the Kustomization. And the Kustomization object says, hey, from this
18:47 Source and Kustomization Reconciliation Intervals
19:01 source named gotk-system, apply the production path on the cluster. And prune true means if you delete something from inside the production directory, it will also be deleted from the cluster. So this enables garbage collection, and this is applied every ten minutes. Okay. So let me clarify those two intervals then. Our GitRepository CRD is pulling changes every one minute, but the actual application of those changes is on a ten-minute schedule. Does that just mean that if I push something to the cluster multiple times before the ten minutes are up,
19:43 are those all gonna apply at once? No. No. So this interval here for the Kustomization, ten minutes, what it does is it applies what is found in the repo at that particular path every ten minutes. What that means is, let's say someone goes into the cluster, has access to the cluster, modifies something. In a maximum of ten minutes, that modification will be undone. If they modify something in the toolkit itself, let's say they change, I don't know, the limits, those kinds of changes will be undone. But if you push something to the repo,
20:25 in one minute, the source controller will detect that there is a change. And through Kubernetes events, it notifies the kustomize controller: hey, there is a change. Don't wait ten minutes to apply. Apply it now. And if you are using a webhook, then you don't even have to wait that one minute, because the webhook will issue a Kubernetes event to the source controller and say, hey, this webhook is telling me there is a new commit in Git. Let me pull it. If there is a change, then it notifies the other controller, and the other controller
20:53 will do the reconciliation. So... Okay. The difference between Flux and the toolkit is that the toolkit reacts to Kubernetes events. So it's also a reactive system, not only a pull-based system where it does things only at a strict interval. Got it. Okay. So the GitRepository interval applies all the changes on that schedule, and then this interval on the Kustomization is really just about fixing stuff that people modified manually. So I'm assuming if something extremely bad happened to my cluster and I was making changes, I'd probably wanna change that interval to be a bit more forgiving,
21:00 Suspending reconciliation
21:32 I guess. No. If you are dealing with an incident, you probably want to suspend this particular Kustomization. And there is a command for that. It's `gotk suspend kustomization` and the name of the Kustomization. And what it does, like a Kubernetes CronJob, is it will never reconcile from that moment on. So you can do whatever you want on the cluster, fix it, then move your changes into Git, commit those changes, then resume the Kustomization, and everything will be back to normal. In the past, what people were doing was scaling Flux to zero so that it would not fight with you.
21:37 Suspending and Resuming Kustomizations
22:16 The thing that we've improved here: maybe you have different things reconciling on your cluster. Maybe you have a Kustomization for your apps and another Kustomization for your databases. But if the incident is only affecting the app deployments, then you shouldn't be stopping everything, your whole GitOps pipeline on the cluster. Maybe you want to stop only that particular thing where the incident is happening. So you don't have to scale to zero anymore or anything. Okay. So can we just type that command here? Like, if I run `gotk suspend kustomization`. I'm just gonna keep following the docs, assuming
22:50 Demo: Suspending a Kustomization
22:59 I know what I'm doing here. And the name of the resource, gotk-system, is the name of it. And that's no longer gonna get any updates. That's it. That's nice. Okay. And I can just, I'm assuming, change this to resume. Awesome. Cool. It also tells you what revision it has applied. So it's the main branch and that particular commit. Okay. So I feel like then we should deploy something to our system. Yeah. Good to hear. So let's see. Now what do you recommend? So I'm sure I've seen that it cloned something earlier, or did I clone it and
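The two objects and intervals discussed above, together with the suspend switch, can be sketched as follows. This is an approximation of what bootstrap generates rather than the exact file from the demo: it assumes the `v1beta1` APIs mentioned earlier, and the repository URL and Secret name are placeholders that were not read out in the video.

```yaml
# Source: fetch the main branch once a minute (it only pulls, it does not apply).
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  name: gotk-system
  namespace: gotk-system
spec:
  interval: 1m
  url: ssh://git@github.com/example-org/example-fleet-repo   # placeholder URL
  ref:
    branch: main
  secretRef:
    name: gotk-system            # assumed name of the deploy-key Secret
---
# Kustomization: apply the production path from that source. The 10m interval
# is the drift-correction loop; new commits are applied as soon as the source
# reports them.
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: gotk-system
  namespace: gotk-system
spec:
  interval: 10m
  path: ./production
  prune: true                    # garbage-collect objects removed from Git
  sourceRef:
    kind: GitRepository
    name: gotk-system
  suspend: false                 # the field that suspend and resume toggle
```

Suspending affects only this one object, so other Kustomizations on the cluster keep reconciling.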
23:51 Deploying an Application via GitOps
23:56 the cluster size? I need to clone my fleet repository. Okay. Got it. Yeah. GitOps, and let's open this up. So would I add a new directory here? Yep. From this moment on, you can organize your repo however you want. The toolkit does not in any way dictate how you want to configure it. The idea is if you want to target your production cluster in this case, all the definitions should be inside that directory. And you can use Kustomize overlays, Helm releases, whatever you want. Okay. Let's start nice and simple then.
24:58 And let's do NGINX Port. Itty. Now I'm not gonna add a service or anything. I just wanna apply this as it is, and then we'll see that roll out to the cluster and then maybe make a small revision or something. So I would just git add, million dollar application, and then I just push this up to GitHub, and we wait one minute. Yeah. So we either wait one minute or we can say `gotk reconcile source git`, and that's gotk-system.
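The manifest committed at this point is not read out in the video. A minimal NGINX Deployment with no Service, of the kind described, could look like this; the file location (for example `production/nginx-deployment.yaml`), the namespace, the image tag, and the labels are all assumptions.

```yaml
# Hypothetical file inside the production directory of the fleet repo.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default             # assumed namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.19      # assumed tag
          ports:
            - containerPort: 80
```

Because the file sits under the path that the gotk-system Kustomization applies, committing and pushing it is enough for it to be picked up on the next source reconciliation.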
25:32 Demo: Reconciling Changes
26:02 I need to do it from this directory because that's where my kubeconfig is. Oh, I think it's reconcile source git. Reconcile. Yeah. Typing is not my strong suit, especially when I'm being watched. There we go. So this has now updated everything inside of the cluster. So if I run get pods, there we go. That's easy. Now you can also do `gotk get kustomizations`, and it will show you the same revision as the source. So it may happen that you push a change, that change gets pulled into the cluster, but it doesn't get applied. Maybe there is an
26:25 Verifying Application Deployment
26:58 error or some new configuration, and there will be a drift between what is supposed to be on the cluster and what's really running on the cluster. Right. Okay. So okay. Right. I get it. I get it. Just thinking aloud here. So I guess this is gonna come down to where your experience is really important here. Like, what is the best way for this directory to be set up? What I feel now is that I've made a mistake storing my NGINX deployment inside of this production folder. I feel like
27:28 Structuring the Git Repository for GitOps
27:39 would you start immediately with Kustomize so that you have bases and the ability to overlay into environments? Or, you know, is this perfectly normal and a GitOps flow? Well, I cannot say, like, use Kustomize. Maybe you don't like it. Maybe you want to do Helm releases or other things. Maybe you want to, I don't know, generate all your YAMLs with Jsonnet or, I don't know, CUE or whatever, and commit the final YAMLs to some other repo. But I think for most people, Kustomize should work really great if you just want to do small changes
28:20 to your deployments across your fleet. Kustomize has a downside in that you cannot just, you know, use variables. It doesn't do that. Right? So if you want to set, for example, the region name, and the region name will be different between clusters, and that region name is, I don't know, somewhere in a config file or, I don't know, down below, then you have to, you know, teach Kustomize how to patch your things and so on. Maybe for that use case, it's better to use a Helm chart and so on. But I like Kustomize.
28:57 Okay. Yeah. Use it if you can, but there are so many other options. Okay. Do you see a future for Flux v2 or the GitOps Toolkit where there's, like, a controller for Jsonnet, a controller for, you know, like, Carvel maybe, all these other tools that do the same templating responsibilities as Kustomize, but giving people a bit more flexibility or choice, etcetera? No. I don't think any of those tools are meant to be reconcilers. Those tools are meant to create YAML. So if you want to use, let's say, CUE or Jsonnet or whatever, you can
29:37 compile the final YAMLs in your CI, validate them with something like kubeval, apply policies, verify them with conftest, for example. And after all these things are okay, then you should push the final YAML to a branch and let the toolkit reconcile that branch. I think the faster you see the YAML in Git, so you can do diffs and you can run this, let's say, offline validation, is better than running all these tools inside the cluster. And if the tool fails, then you have to crawl logs and so on. I think it's better to get
30:21 the final YAML earlier in your pipeline. But that's not what Kustomize is doing. Right? I mean, we are running Kustomize in the cluster to produce the final YAML. I mean, not in this example here, but that's what this controller does. Or have I completely missed the point here? So no. No. No. The kustomize controller uses Kustomize as a library for many other things than just building a kustomization. For example, it does ordering using Kustomize. For example, custom resource definitions before custom resources, namespaces before deployments, and stuff like that. It also uses Kustomize underneath to label all your objects inside
31:03 the cluster. So if you want to say, hey, for this particular Kustomization, what kind of objects have been created on the cluster? You can use a label with the Kustomization name, and you'll get a list of all your things. So, in a way, the kustomize controller is using Kustomize as a library inside of it to deal with the reconciliation. Maybe the name is not good. Maybe it should be, like, sync controller or whatever. But, yeah, I don't think you should create a controller for any templating solution you have out there.
31:40 Okay. Well, let's quickly bring in a couple of questions then that I think relate to this. And I think you've potentially answered a couple of them, but I'll pop them in anyway. So, Elliot, I do see your first question. I'll keep that one till the end because I think that's interesting. But I will pop in this one. Elliot says, as a matter of personal preference, can I use it without Kustomize? And I think the answer there is... Yes. For sure. The controller can synchronize plain YAMLs. It's not operating that way. It doesn't need
31:41 Q&A: Using Toolkit Without Kustomize / Custom Reconcilers
32:13 the kustomization.yaml in your Git repo. It creates one on the fly and adds labels and all the things that it needs to do to enable garbage collection, ordering, and all this stuff. Okay. And then he follows on with, could one build their own custom reconciler? Of course. We intended for the toolkit to also be an SDK so you can build your own controllers on top of the current API. And there is a development guide that we've published on the docs website on how you can build a controller with Kubebuilder that listens to source controller events.
32:52 It fetches the source that source controller has pulled from all the sources, and then it's up to you what you want to do with those manifests. It can be whatever. I've seen people trying to use source controller in CI systems, where source controller pulls the source code from Git. Then there is a different controller that actually does the build of that particular source and, I don't know, pushes the container image to a registry. So notification controller and source controller are not tied in any way to the GitOps idea, like continuous deployment. You can do a bunch
33:29 of other things with them if you want to. Excellent. One more question, and then we'll move on with the demo. So we have a question saying, can I use the GitOps Toolkit with Subversion? Not right now, but, yeah, we've implemented S3 bucket support, and SVN shouldn't be that hard if someone wants to do it. I guess with the GitRepository being its own CRD, adding support for other locations, whether it be, you know, Mercurial or Subversion, would just be a case of adding the CRD and the controller
33:39 Q&A: Subversion Support
34:08 to do the sync aspect of it. Yeah. In that case, you would add a custom resource definition to the source controller API. Let's call it SVNRepository. And then you'll be adding a controller, a control loop, to source controller that deals with that particular custom resource. Excellent. Cool. Let's jump back over to our demo then. So what do you feel like the next step is here? Should we add a Kustomize base's YAML, or should we jump straight into Helm? No. I would say let's show how you can add a different repository to your cluster. Like, here in this
34:30 Add another GitRepository
34:52 example, we've added the deployment YAML directly to our free triple. But in normal cases, you you don't want to do that. You want let's say, if you are developing an application, probably somewhere in your app repo, there is a deployment director or customized director or something like that where you specify how your app should be deployed. And the tool kit, what it tries to do it it with with this custom resources, it let you add let you add register other sources inside the cluster and decide how they are reconciled. So let's add a different repository.
35:34 K. So I I can copy this source and say I want to deploy my other app. And I do have a previous one here. So this is I was messing around with the get ups and I created some YAMLs rendered from Helm that provisions on FluxCDB and telegraph dot telegraph daemon set. Okay. I can the URL. Yep. And then can I just use the public URL? Will that work just fine? Yep. So the git repositories can can have an SSH URL. And when you specify an SSH URL, you are obliged to define the SSH key for it.
36:27 If you use an HTTP address, if that particular repo is public, you don't have to, configure any kind of authentication. So in this way, we let you add to your cluster things from open source, like, I don't know, if I'm having my own open source project, you can reconcile that particular thing without having to clone it or anything like that. If you trust me that I will do, I don't know, sample releases and groundbreaking changes, then you can actually add the upstream to your cluster directly. So, yeah, you can delete the secret ref. So is
37:07 I think I just I'd like to clarify something there then. So if I use a public repository in fact, no. If I use a private repository and specify a secret ref, I remember coming across something in the documentation that talks about read only modes. So does that mean there's a rate mode where the GitOps toolkit modifies my GET repository? Yeah. In the future, the GitOps toolkit will contain two more controllers. One will scan container registries and the other one, based on some policy that you define, will write back to your Git repo when you push new
37:45 container images. Let's say you have your own app and you you push Sendware releases to your registry instead of going every time and changing the the new version inside your YAMLs. You you could configure the toolkit and say, every time there is a new Sendware release in this particular range on on Docker Hub, for example, get that tag, write it back to Git, and some other part of the system, the optimization control, for example, will will apply. Okay. Got it. That makes sense. And I guess in read only mode, does it would that as semantic
38:24 versioning rule still be something that people can consume? Does it use state within our cluster to monitor those tags, or is that still something that's to be determined right now? So in read-only mode, you can tell the toolkit to follow semver releases for a whole repository. It doesn't look at the container registry. So if you version your YAML and you do semver releases, then the toolkit will be able to detect, oh, there is a new release on that particular repository, and it'll apply that release. But with the container registry, it's more about
39:01 writing back for each particular container. OK. Got it. Right. So I am assuming I do not want to override that. So we'll call this custom repo, in line with the directory. With regards to interval, I mean, one minute seems like quite a sensible option. If I were to do one second, I mean, is that likely to cause problems? Could be. I don't think it will actually work. I think it will not work. It will set something like ten seconds or... We can go for a minute and do that manual reconcile step. Okay. So if I save this.
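A minimal sketch of the "follow semver releases in a range" idea discussed above, written in plain Python rather than with any toolkit API. The tag list, the `latest_in_range` helper, and the range bounds are all hypothetical, and the parser assumes plain `MAJOR.MINOR.PATCH` tags with an optional `v` prefix (no pre-release suffixes).

```python
import re

# Accepts "1.2.3" or "v1.2.3"; anything else is treated as a non-release tag.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse(tag):
    """Return (major, minor, patch) as integers, or None if tag is not semver."""
    match = SEMVER.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def latest_in_range(tags, minimum, below):
    """Return the highest tag with minimum <= version < below, or None."""
    lo, hi = parse(minimum), parse(below)
    candidates = []
    for tag in tags:
        version = parse(tag)
        if version is not None and lo <= version < hi:
            candidates.append((version, tag))
    # Tuples compare numerically, so 1.10.1 sorts above 1.2.0.
    return max(candidates)[1] if candidates else None


# Hypothetical tags found on a repository.
tags = ["v1.0.0", "v1.2.0", "v1.10.1", "v2.0.0", "nightly"]
selected = latest_in_range(tags, "1.0.0", "2.0.0")
print(selected)
```

Comparing parsed integer tuples instead of raw strings is the design choice that matters here: a string comparison would rank `v1.2.0` above `v1.10.1`.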
39:14 Creating the GitRepository CR for the Second Repo
39:42 No. No. No. No. No. Wait. Alright. Let's see above if there is something else in this file. Oh, no. It's only this. Okay. That should work. OK. You can commit this one. Alright. So we're adding a second repository. Push this. And then we can do a reconcile here just to speed it up. So I'm gonna assume I can do a get gitrepositories in gotk-system. I like it when things are just intuitive. So a master/main error here, I'm assuming. Yeah. This one has been updated. So it's the correct branch. Reconcile one more time
41:04 and there we go. That's now fetching the other Git repository and applying that to the cluster, so I can run get pods. Don't see anything. Yeah. You just declared the source. Now what you should be doing is applying that source. That's correct. See? I'm glad you're here. So I need to add a Kustomization and tell it to apply it. So that's... Yeah. You can start in the same. Yeah. Oh, yeah. OK. So we want custom repo. No. There you should say what you are actually trying to do here. InfluxDB. Maybe it's a better name because you are applying InfluxDB.
42:16 OK. Then the source ref should be your custom repo. I wish I had named that better now, but I'm stuck with it. So I've seen you had, like, two directories. Are those independent things, or do they have to be applied? Yes. So I guess what you're suggesting there is I could have a Kustomization to apply each of these individually, or I could just do a dot to apply everything recursively. Well, both of those work? Yeah. And you can also do dependency management. Let's say, for example, you don't want to install InfluxDB if
42:56 Telegraf is not there, or the other way around. So you can, for example, define two Kustomizations, one for each directory, and say for the Telegraf one that it depends on the InfluxDB one. And what the kustomize-controller will do is build the dependency graph before it does the apply of your Git repo, and it applies the things in a particular order based on the dependency relationship between those. OK. That's where it goes. Yeah. Why we implemented that particular feature is because of admission webhooks and service meshes. Like, I'll give you an example. If you install
43:30 Dependencies and health-checks
43:39 a service mesh, then you install your apps. Everything is good. When your app is installed, if the service mesh is already there, it will get injected with the sidecar and it will function great. But GitOps is meant to work when you, let's say, bootstrap a cluster from scratch. Let's say you lost your cluster or you want to build an identical cluster. Now if you apply all the things in your Git repo in the same operation, like Flux version one does, there is a great chance that your app pod will start before, let's say, the Linkerd
44:16 injector or the Istio injector. Right? And your app will start, will have no sidecar, and many problems follow from there because, well, it's not meshed and it's meant to be meshed. With the toolkit, you can say: all my apps depend on this infrastructure Kustomization that can contain Istio, Linkerd, Gatekeeper, OPA, or any other stuff that needs to be there before applications get provisioned. And by using the dependency tree, you can reliably, you know, clone clusters, create identical clusters, and it'll happen in that particular order. Okay. That makes sense. So just to make sure I understood that
45:01 correctly, to use the dependencies, I need to define each of these separately. Right? Yeah. And then I'm configuring the path here and changing the name. Yes. And now inside this... Yep. Sorry. On you go. Inside the spec, now you can add the dependsOn entry with... oh, OK. dependsOn, and now it's name, colon, and the name. A list of objects? Yeah. Okay. Yes. And it should be InfluxDB. That's it? That's it? I like it when things are this simple. Okay. So that looks okay to me. I'm sure there's probably a mistake, but... Yeah.
45:56 It should tell us if anything is going wrong. Okay. So we're now going to apply InfluxDB and Telegraf with dependencies. We can reconcile. And I'm assuming I can run get kustomizations like we did earlier. Oh, we're in the correct namespace. Oh, so it's not ready yet. So we can see right away that it hasn't deployed the Telegraf DaemonSet because it's waiting on our InfluxDB containers to become healthy. It's currently pending. And there's Telegraf. Yeah. One thing we should have done was to create a health check on InfluxDB. Like, what we've done right now is that we
47:08 are configuring the toolkit to apply first InfluxDB, then that. But that doesn't mean that InfluxDB is actually running. Right? You apply it, then it can take forever to spin up the pods. What we can do is add a health check entry to the Kustomization and specify: hey, look for the deployment named... or in this case it's a StatefulSet. Look for the StatefulSet, InfluxDB, and check if that is healthy. And only then allow all the other dependsOn relations to go for a reconcile. Going back to the service mesh example,
47:52 you can apply Linkerd, but that doesn't mean that the injection webhook is up and healthy for your pods to be injected. So you can use health checks in this particular way to make sure that everything is running before anything else. So... Okay. So I do have an error, but that's my mistake because I haven't actually configured any PVC options on this cluster. So I'm just gonna ignore InfluxDB for now. But to confirm what you were saying there, if my StatefulSet here had a... oh, no. It does have health
48:34 checks. So how do I connect that up then? So in the Kustomization of InfluxDB, you can add your health check entry. I can pull up the docs. It's the GitOps Toolkit. OK. If you go to toolkit components, kustomize-controller, Kustomization custom resource definition, and health assessment is on your right. Yep. Can you see there? Cool. Oh, yeah. So we tell it the object that we are looking for to be healthy, the namespace, the name. Alright. Got it. So okay. So I want to remove this. We have prune enabled. So I'm assuming if I just comment
49:44 all of this out and push this up, it will prune them away. So I'm gonna do that first. I like it when things go wrong because then I learn how to fix it. So: remove InfluxDB and Telegraf. Git add. There you go. And because we have prune on, when I run the reconcile, it's gonna remove those pods. There we go. Terminating, and InfluxDB is already gone. You should also remove the... oh, this is in the default namespace. Yeah. If you'd had the namespace definition inside your, let's say, InfluxDB manifests,
50:34 what it does first is go through all the namespaced objects. So it will remove deployments, custom resources, and so on. After all of those are gone, then it will remove cluster-level objects, like namespaces, like cluster role bindings, and so on. Why we do this kind of garbage collection is for custom resources to be finalized. Otherwise, you could get into a stuck state where you delete a custom resource. It needs to be finalized by something, but you already deleted that something. So your namespaces and everything else will be stuck. So we try
51:11 to play nice with other controllers and do the deletion in different steps. Okay. Excellent. So what's the best way for me to use those health checks? I'm gonna bring Telegraf back in, but we don't have a satisfied dependency here. Yeah. You can remove it, and you can add the health check for Telegraf itself. OK. So we're removing the dependency. And you want me to apply this as is? Sorry. I don't understand what you were suggesting. I was suggesting to add the health check for the Telegraf DaemonSet, if you want to do that. Oh, in
52:02 fact... okay. Let's change this a little bit. So we already have this NGINX deployment here. So if I just add a health check, a liveness probe or readiness probe, to this, if I just copy them from here, then we can have Telegraf only get deployed if that is healthy. So let me add this. And we'll just do it there because NGINX is gonna respond by a... oh, okay. Okay. This is fine. I shouldn't make this stuff up. Okay. Let's go into GitOps. What I'm gonna do is add
52:43 our production NGINX and deploy that first. Add probes to NGINX. Let me do a reconcile, and then if I run this, we have our new NGINX spinning up and it should become healthy. I'll wait till I see one of one. Yes. I didn't add those probes correctly. See if I can quickly fix this or just move on. Oh, okay. No. It's not a named port. My bad. So let's add that one more time. Reuse my commit message, which I know I shouldn't do, but I'll do it for now. And let's get
54:01 pods. It's now got a probe on it. So in order to demonstrate the health checks, let's come back to here, bring back our Telegraf DaemonSet, and add our dependsOn name, and this is the Kustomization gotk-system. gotk. Thank you. We're also gonna add a health check. I'll just copy this from the docs. No. So let's think about it. You want to make the deployment depend on another object, right, another thing in your infrastructure. So you'll be creating the health check on the Kustomization that creates NGINX. Right? Yes. Then you'll say: hey, all the things that
55:00 I'm trying to add afterwards have to be reconciled only if the gotk-system Kustomization actually works. Right? So the health check will go in the gotk-system Kustomization file. Okay. So the health checks, do they have to reference resources created within its own Kustomization? No. No. But you should. Right? Okay. So you can define health checks in a Kustomization that reference whatever is in your infrastructure, even if it doesn't come from a Git repo. But the idea of a dependency is that you define all these checks on the object that creates them, and then all
55:50 the other things will wait for that. If you define the health check on your DaemonSet there, what the kustomize-controller will do is apply the thing, and then check if it's healthy. What you want to do is not apply the thing if, I don't know, the health check is failing for another Kustomization. Makes sense? Ah, okay. So I got that completely wrong then. I thought the health check here was gonna say: don't deploy this Kustomization unless this resource is healthy. But that's not what it's doing at all.
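The two roles of health checks described above can be sketched as a small simulation. This is an illustrative model in plain Python, not the kustomize-controller's actual implementation: the `reconcile` function and the names `nginx` and `telegraf` are hypothetical. A Kustomization is applied once everything it depends on is ready; its own health checks run after the apply and decide whether its dependents may proceed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def reconcile(depends_on, health):
    """Simulate one reconciliation pass.

    depends_on: maps each Kustomization name to the names it depends on.
    health: maps a name to the result of its own health checks after apply
            (missing entries count as healthy).
    Returns (applied, blocked) as lists of names in processing order.
    """
    ready = set()  # applied AND healthy
    applied, blocked = [], []
    # static_order() yields dependencies before their dependents.
    for name in TopologicalSorter(depends_on).static_order():
        if all(dep in ready for dep in depends_on.get(name, [])):
            applied.append(name)        # dependencies satisfied: apply it
            if health.get(name, True):  # then run its own health checks
                ready.add(name)
        else:
            blocked.append(name)        # wait for a later pass
    return applied, blocked


# Hypothetical names: "nginx" carries the health check, "telegraf" depends on it.
graph = {"nginx": [], "telegraf": ["nginx"]}
unhealthy_pass = reconcile(graph, {"nginx": False})
healthy_pass = reconcile(graph, {"nginx": True})
print(unhealthy_pass)
print(healthy_pass)
```

Note where the check lives: the health result is attached to `nginx`, the thing being depended on, and `telegraf` only declares the dependency. Putting a health check on `telegraf` itself would only report on Telegraf after it was applied.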
56:33 No. Right. Sorry. I completely misunderstood that. Got it. Okay. That makes much more sense to me now. Whoops. Okay. So let's not bother with the health check on this. Well, yeah, we're doing a health check but for Telegraf. Well, there are no probes on it. So let's just take it out. Okay. So let's just apply this and then we can move on. So: add Telegraf. Then shall we take a look at the helm-controller, or is there anything that I've maybe missed here that you'd like to cover? Other than me just not understanding anything so
57:20 far, but that's okay. Mhmm. No. It's okay. It's a learning process because we added all these things that you can do and, yeah, I can understand how health checks can be confusing. There are two roles to health checks. The role on the Kustomization itself is: you apply something and then you want to be notified if that particular something worked or not. That's one use case. Right? Then when you create a dependency tree, if you have defined health checks on your Kustomization, all the dependents will be blocked on those particular health checks. So, for
58:01 example, you will not be deploying your app if the Linkerd injector is failing, because the app will not be injected. That's one example. Another example is: don't apply any YAML if the Gatekeeper deployment is failing, because you want to ensure that everything that gets changed on your cluster is approved by your OPA policy agent. Right? And for that to work, the policy agent has to run. So yeah. Okay. So, I mean, I guess the only mistake I was really making there was thinking this dependsOn is gonna wait until this NGINX resource exists.
58:43 And then I was gonna add a health check here to check the health of this. But, really, what you've just said there is what I should have done is add the health check... Yep. Here. Ah, okay. So I actually had that wrong in my head twice. So really, if I had just added that we have a deployment called NGINX in the default namespace, then my Telegraf wouldn't have been deployed until this was actually healthy. Yeah. Okay. Makes sense. Got it. Finally. Okay. Let's take a look at the Helm integration then. So right now, we have
59:20 Deploying Helm charts
59:24 a GitOps pipeline that has two repositories. One managing the GitOps Toolkit system, one managing our million dollar application, which is nice and trivial. It's just some Telegraf stuff. If I then want to add something from Helm, like MariaDB, MongoDB, some sort of database, what's the best way to deploy that? What are my first steps to doing that? So first step, you have to define where your chart is, where that particular Helm repository is. Right? And the same way you define a Git repo, you can define a Helm repo. And if you scroll down here,
1:00:10 for example, here, we add the Bitnami repository. Okay. So we are using gotk create source. We're telling it it's Helm and not Git, and then we just specify the Bitnami repository. Yeah. Which is public. If you have a Helm repository that's private, you can also give here in the command the username and password or token of your repository. Okay. So if I go into my GitOps directory here... we were feeling brave. So production. I was feeling brave. This is my production. And then we deploy this. That's gonna create... oops.
1:00:17 Adding a Helm Repository Source
1:01:01 Bitnami. We've got our kind HelmRepository called Bitnami, with this chart. And then we... okay. So this is just generating the YAML to create a namespace. Yeah. So one particularity of our helm-controller is the fact that the helm-controller only installs charts. It doesn't create anything besides what's in the Helm chart package. And you shouldn't be placing namespaces, for example, in your charts. So this is where the helm-controller works great together with the kustomize-controller. The kustomize-controller will create namespace definitions, which should be in your fleet repo or your cluster
1:01:09 Creating a Helm Release (Contour Example)
1:01:50 repo. And then it'll apply HelmRelease manifests, and you can tell those: hey, I want this particular Helm chart to be installed in that particular namespace. Okay. So let's grab this and then generate this. And in fact, what I'll do is just add, next to this, a v1 kind Namespace rather than just doing the command. Contour. And then exit. So we're using semantic versioning in here to deploy a v2 version of Contour. You tell it the HelmRepository name, which we just created. We give it a release name, the target namespace, which I just
1:02:46 added to the YAML. And after export, you should be creating the Contour YAML... Yep. So we should have this here. Perfect. So my first question here is how do I customize the chart with values? So here you can type values. As part of the spec? Yes. That's the chart spec. No. Part of the HelmRelease spec. This is just an object where I could say my one, two, three, four. Yep. So the values you specify here will override the values.yaml from your Helm chart and
1:03:09 Customizing Helm Charts with Values
1:03:48 you can do other things, like: I want values to be fetched from a Kubernetes secret. Let's say for some charts you have to pass in, I don't know, usernames, passwords, tokens, and so on. You don't want to place your secrets here in plain text in your Git repo. So you can use something like Mozilla SOPS or Sealed Secrets to manage Kubernetes secrets in a GitOps way. And then here, in values, instead of saying value token something, you'll say values from this particular secret. And what the helm-controller will do is take the secret that lives inside your
1:04:27 cluster and merge the values in your secret with the values you specify here. So here, you should only set values that are not secrets. Alright? Not secrets. Okay. That I definitely understand. But defining the values is quite easy. So I can just add all of this, commit it as added Contour via Helm, and push. And if we run our reconcile one more time... See. Oh, there. Our namespace is ten seconds old, so maybe I was just a bit quick. There you go. We now have Contour running, deployed via Helm via the CRD and the GitOps Toolkit.
1:05:19 Verifying Helm Deployment
1:05:33 It will create many other pods. Right now, it spins up the cert generation, and then it'll spin up an Envoy DaemonSet and the Contour pods, which are the control plane of the ingress controller. But yeah. And with Helm, you can do several things. For example, here, what we are doing is using delegation. If you go back, for example, to the HelmRelease definition, you'll see that there is a target namespace there. So what I'm doing here, I'm placing the HelmRelease definition in my system namespace, gotk
1:05:50 Advanced Helm Features (Delegation, Tests, Rollback)
1:06:09 system. Let's say no one has access to that particular namespace except for cluster admins. And I want to provision some infrastructure for a particular team. And I can use target namespace to say: hey, I'm declaring the HelmRelease here because I don't want anyone else to be able to modify what cluster admins define. I don't want it installed in the same namespace where the HelmRelease is defined. I want to install it in that particular namespace. And this gives cluster admins the power to provision things in other teams' namespaces or tenants' namespaces. And, yeah, no one can modify that original object.
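The values merging described earlier for the HelmRelease (chart defaults, values taken from a Kubernetes secret, and the non-secret values written inline) can be illustrated with a recursive dictionary merge. This is a sketch in plain Python, not the helm-controller's code. The `deep_merge` helper and the sample keys are hypothetical, and the precedence shown (inline values applied last, so they win on conflict) is an assumption to verify against the HelmRelease documentation.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict where override's keys win; nested dicts merge recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical data: the chart's values.yaml, a secret kept out of Git,
# and the inline, non-secret values from the release definition.
chart_defaults = {"replicas": 1, "auth": {"user": "admin", "token": ""}}
secret_values = {"auth": {"token": "s3cr3t"}}
inline_values = {"replicas": 2}

# Assumed order: defaults, then secret values, then inline values last.
final_values = deep_merge(deep_merge(chart_defaults, secret_values), inline_values)
print(final_values)
```

Because the merge is recursive, the secret only has to carry the sensitive key (`auth.token`); sibling keys such as `auth.user` survive from the chart defaults.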
1:06:51 Alright. Excellent. Let's jump back over here. So we've covered Helm, and this was just another good example. We have an add-on for the GitOps Toolkit that's composed of Prometheus and Grafana. And we have instrumented the controllers, so you have an overview of what's happening on the cluster, and here is how you can deploy the monitoring stack. We are currently adding more metrics to our controllers. So in the next release, you'll be able to set up, for example, Alertmanager and say: hey, when a reconciliation fails, send me an alert based on some metric that,
1:07:07 Monitoring the GitOps Toolkit (Prometheus/Grafana)
1:07:43 I don't know, kustomize-controller is failing to reconcile InfluxDB in the last half hour or something like that. We have the notification-controller, which fires all these alerts to Slack, Microsoft Teams, and so on. But Alertmanager has so many backends that it knows how to deal with, PagerDuty and lots of others. And we don't think the notification-controller is the only answer to alerting, so that's why we are going to add metrics. So you can use Prometheus and Alertmanager to do things like that and also build nice dashboards
1:08:26 so you can see all the Kustomizations with their status, how long it takes to do a reconciliation, if it failed or not, and so on. This is, let's say, our Grafana dashboard collection that we'll be shipping. And maybe in the future, we'll have a full-blown user interface. But for now, we try to have, like, top-notch Grafana dashboards so you can see what's going on. Oh, very cool. Well, I will add this. I think these steps here are fantastic for kinda covering everything that we've already been through today.
1:09:00 Final questions
1:09:03 So people can follow along with us in their own time and deploy their own GitOps Toolkit. What I will do is... sorry. Is there anything else you want to demo before I stop sharing my screen, or do you think we've covered most of the things that you want to talk over? Yeah. I think we've covered the basics. The idea is that with the toolkit, compared to Flux v1, we let you define your cluster state from multiple sources, and then you decide how those sources are going to be reconciled.
1:09:32 If you want to add health checks, if you want to add garbage collection, if you want to suspend or resume them, depending on if you have an incident or not. All these things come from feature requests and things that we've seen for Flux v1, where our users wanted more stuff, and, yeah, we added them to the toolkit. Yeah. I think it's a very powerful set of tools. I'm really impressed with just how quick and easy it was for us, you know, to bootstrap the system, start adding our own Git sources, and then applying them to the cluster. And
1:10:08 the Helm integration there was so trivial. Really, really impressive. Just the ergonomics of working with GitOps is nice. I really like it. Yeah. For Helm, we have many other things that you can configure in that particular HelmRelease. For example, you can trigger Helm tests after a chart is installed or upgraded. And based on the result of the test, you can roll back to the previous version. Let's say you are running on 1.0.0. You bump it to 2.0.0. It fails. The Helm test failed. Or the native Helm health check fails, like the deployment is stuck
1:10:52 or a StatefulSet is stuck. You can tell the helm-controller: if that one fails, roll it back and let me know. So you end up with the previous state that should be fine, and you get a notification. Hey, your Helm test failed. This is why it failed, and we rolled it back for you. And that's one feature of the helm-controller that we've added recently. Oh, very cool. So we have one more question that we skipped over earlier, but I think now would be a good time to cover it. And then we can maybe talk about
1:11:19 Q&A: GitOps Toolkit vs. Rancher Fleet (Multi-Cluster Management)
1:11:26 what's coming down the line, and then I'll let you get back to your day. But a viewer left a question earlier, and he was asking about Rancher's new product called Fleet and how that really compares to the GitOps Toolkit. His specific question is: Fleet seems to be designed for scale, so how does that compare to what the GitOps Toolkit is doing? Is it meant for a very specific use case? We are working on integrating the toolkit with Cluster API. So our approach is a little bit different from what Rancher and OpenShift and others are doing. Instead of managing everything
1:12:06 from the management cluster and making that management cluster a single point of failure, what we are trying to do is: when you bootstrap clusters, you'll be adding cluster definitions in your Git. You install the GitOps Toolkit on your management cluster. Then the toolkit will discover that, okay, a new cluster has been created. The toolkit will be able to provision itself on that particular cluster with the repo that you assigned to it. And then even if the management cluster goes down, I don't know, runs into trouble, all these clusters will be able to keep,
1:12:47 keep working, keep synchronizing with their own Git repos. And in a way, our GitOps approach is more distributed than, you know, having everything on a management cluster and driving all the operations from a single point. But you'll also be able to do that with the toolkit if you want to. The kustomize-controller has a feature where it can mount an existing kubeconfig from a secret. So for example, many CAPI providers will create a kubeconfig for each cluster they have created. And you can tell a Kustomization: hey, reconcile this source, but not locally,
1:13:26 use this particular kubeconfig. And it will connect to that cluster and do its job. So we are trying to cover both things: from the management cluster, you apply everything, or from the management cluster, you bootstrap your clusters, and then they can run on their own with no single point of failure. So, yeah, I think our approach is very different from what Rancher is doing. Yeah. And very exciting to me. I'm really enjoying my exploration with Cluster API right now and with the GitOps Toolkit. And now you're pushing both of these together and giving
1:13:58 me this workflow that will allow me to have the GitOps Toolkit on my management cluster bootstrap itself and then go into its own reconciliation loop, which is really exciting. I'm looking forward to playing with that in the future. Yeah. I think another use case is for installing the cluster add-ons after you bootstrap a cluster. So you can have, like, a generic Git repo with all your cluster add-ons there. Right? And you can tell the toolkit: hey, after we create a cluster and you see that the kubeconfig was
1:14:09 Multi-Cluster Addons Use Case
1:14:32 created, it works, you can connect to it, install these add-ons from this repo at this particular revision. And you can semver your cluster add-ons, make releases out of them, and the toolkit will keep updating your fleet based on that. So the idea of multiple repositories gives you this freedom of saying: I want a dedicated repo with only cluster add-ons, and maybe other repos with, I don't know, other things like user applications and stuff like that. Awesome. I look forward to that update coming down the line. So I just wanna say thank you for
1:15:06 Summary and Conclusion
1:15:09 spending the last hour and fifteen minutes walking us through the GitOps Toolkit. And like I said, it's just really impressive how easy it was to get started and start deploying things to your cluster. So thank you for your continued work on this. It's really cool to see it. Thank you very much, David, for inviting me. And yeah. I hope by next year, we'll do another round, and it'll be, like, full-blown CAPI support. Next year, I'm gonna schedule you on every month to follow the updates, I think. I'll just keep harassing your calendar till you finally get
1:15:41 in and see. Yes. But alright. Well, thank you again for joining me. It's been a pleasure. A video will be available online. All of the show notes will be available in roughly an hour. And thanks again, and I'll speak to you all soon. Have a good day, Stefan. Bye.