Overview

About this video

What You'll Learn

  1. Install Komodor on a Kubernetes cluster using the Helm agent chart.
  2. Inspect service timelines, cluster events, and related workload details.
  3. Configure deployment alerts to Slack and test log redaction rules.

A hands-on tour of Komodor for Kubernetes troubleshooting: installing the agent via Helm, browsing service and event timelines, best-practice checks, config-change diffs, deployment monitors to Slack, and log redaction with regex.

Chapters

  1. 0:00 Introduction
  2. 0:20 What is Komodor?
  3. 1:21 Getting Started & Adding a Cluster
  4. 1:49 Komodor Service Overview
  5. 4:42 Exploring Kubernetes Events
  6. 5:31 Filtering & Related Resources
  7. 6:08 Viewing Service Details
  8. 8:09 Best Practice Recommendations
  9. 10:10 Investigating Specific Events
  10. 10:33 Demo: Config Change Event
  11. 12:24 Demo: Availability Issues & Pod Details
  12. 14:10 Exploring Other Resources
  13. 15:27 Monitors and Alerting
  14. 16:52 Demo: Deployment Alert
  15. 18:50 Security and Data Redaction
  16. 19:18 Configuring Log Redaction
  17. 21:38 Demo: Redaction in Action
  18. 23:01 Testing Redaction Locally
  19. 25:05 Summary & What's Next
  20. 26:07 Conclusion
Transcript

Generated from the English captions.

0:00 Introduction

0:05 Hello, and welcome back to the Rawkode Academy. I'm your host, David Flanagan, although you may know me from across the Internet as Rawkode. Today, I'm going to guide you through Komodor. If you're not familiar with Komodor, it is a SaaS-based product to help you troubleshoot and debug your Kubernetes clusters, something I have a few opinions about. I don't often cover SaaS-based products. However, Komodor just this week announced their new free tier, meaning you don't need to pay to get started with Komodor. Not only that, Komodor sponsored my time to produce the advanced scheduling demo

0:20 What is Komodor?

0:50 and lastly, they've also committed to being on an episode of Clustered. I'm going to put them to the test: I'm going to give them a broken cluster, and they're convinced they can use Komodor to fix it. So thank you, Komodor, for sponsoring the advanced scheduling video, and thank you for joining me on Clustered very soon. But today let's focus on the tutorial and show you how you can get started with Komodor. We're going to start from the beginning but also showcase some advanced use cases for Komodor. But the first thing we need to do

1:21 Getting Started & Adding a Cluster

1:29 is go to komodor.com. From here you can feel free to read the marketing material, go to resources, pricing, documentation, whatever you want. I'm gonna start by logging in, for which I use my Google account. Immediately we're presented with the service list. This is a list of all of the microservices, or maybe huge services, who knows, deployed to your Kubernetes cluster. Now, you won't see this list straight away; I see it because I've already added my first Kubernetes cluster. But let me walk you through the process for doing that yourself. Down at the bottom left you will see

1:49 Komodor Service Overview

2:11 the integrations button. When you select this you can click on add a cluster. You can give it whatever name you want and hit next. This will give you the command that you copy and run in your terminal. It will add a Helm repository and deploy the Helm chart with the Komodor agent. Then you click next, where it will wait for the connection and confirm it. From there go back to the home page and you'll see your Kubernetes services from your cluster. Now there's a couple of nice things on this page right off the bat. First, all my services are healthy.

2:52 But secondly, we get a good overview of the workloads running in this cluster. You can see I have a bunch of Prometheus stuff. I've got 1Password Connect, Weave GitOps, cert-manager, shop at I, lots of cool stuff. Now if things weren't all healthy, we could either exclude the healthy ones or we could filter on them. If you have more than one cluster, you can filter by that too. And if you only want to take a look at a particular namespace, in my case, let's just take a look at my community namespace. You'll see that I'm only running a single

3:32 service. If I want to view the platform and the community namespace, I can do so as well. If you want to filter by workload type, we can click on DaemonSet and see DaemonSets. Just the basic settings that you would expect from a service overview of your cluster. The last thing I'll point out on this page is at the top right. Here we can sort by a few options. By default it's on health, which makes sense: if there's something that is unhealthy in your cluster, you want to see that first. The other view that I've been enjoying over the

last few days is namespace. This is a good way to break it down by namespace without specifically filtering on a namespace itself. And if you're only worried about things that have changed recently, go to last modified and you'll see the most recent resources that have been modified within your cluster. I deployed Prometheus today, so we can see Prometheus front and center. And that's the service overview. It's not life changing, but it's very valuable, with just enough functionality to maybe pry kubectl out of your hands when things go wrong. So let's see what else we can do

with Komodor. We also have the jobs option on the left. I have no jobs on my cluster, but this is just the same as services: if you are using the Job object or the CronJob object, you will see them listed here. Next we have the events. This will show you all the events from your Kubernetes cluster. Now this is something that can be typically quite overwhelming to do from the kubectl command line because events come fast and furious in a Kubernetes cluster. And when we have an abundance of information, bringing a visual layer

4:42 Exploring Kubernetes Events

5:20 to that information is how we develop understanding. So let's see how we can understand the events within a Kubernetes cluster with Komodor. Much like the service page, we have the ability to filter these events on cluster and namespace. However, now we can filter by individual service. We can filter by the event type. We have the ability to filter on the status of the event as well as deploy details and availability reasons. And we'll get into more of these in just a moment. But first, let's take a look at my platform namespace. Now here we can see all the events

5:31 Filtering & Related Resources

5:58 as Komodor was deployed to my cluster and went through the discovery phase. That is discovering all the workloads and resources within my cluster. From here, we can click on a service name. That slides out a nice kind of popover modal dialog, meaning that we don't really lose our original context when we're debugging, which I think is really important for a debugging tool. So very nice addition. We have the service name, the health status, we can see all the events for the service, as well as the pods, the nodes they're scheduled on, and some additional information which gives us access

6:08 Viewing Service Details

6:33 to the labels and annotations on the service. Okay. Let's pop in to the monitor namespace and we'll select our Grafana service. Currently, we only see information about Grafana, which we'd expect. We can see the events, again the nodes, pods and our labels and annotations. Now before we take a look at the best practice recommendations, let's pop back over to events and see here we have the related resources button. Now this is quite nice because it allows us to select other resources within the same namespace when we want to be able to collectively group them and view their events together.

7:19 So I'll pick on Kubernetes services, and I'll mark this as related. I'll pop over to config maps, where I'll select the API server one, and I'll pick one more, which is to pop over to secrets and select the Grafana config. We apply the selection and now you'll see that the events listed for this resource include their related resources. Now I think this feature could be improved. I'd love to see it scan the YAML for referenced config maps, secrets, and services with matching selectors and hook this up for me. However, doing it manually for those few resources that I

do want to group collectively isn't exactly the end of the world. It's a cool feature and one that could have some really interesting improvements over time. So let's go back to the information screen and we'll see these best practice warnings. When we click this, we have a bunch of checks, and here we can see that our deployment has one replica. Now this is a warning just because if we lose that replica, we've lost our service. So maybe you want to run two or three. However, you know your services better than any tool can. So feel free

8:09 Best Practice Recommendations

8:32 to use the ignore button. For Grafana, maybe we determine that we do only ever want one and we're happy for that to be offline if something goes wrong. We can just say ignore for fourteen days, thirty days, ninety days or forever. So perhaps I'm not ready to make a decision on whether this is good or bad yet and I'll ignore it for a couple of weeks. Next we have a critical warning telling us that this workload has no liveness probe. If we expand it, it tells us that liveness probes are designed to ensure that an application stays in a healthy state. When

a liveness probe fails, the pod will be restarted. This is a pretty neat behavior: the kubelet monitors our workloads and if it needs to kick them, it kicks them. So you should always try and have some of these best practices whenever possible, and Komodor brings them front and center. So I'm not going to ignore that one because, you know what, I should have a liveness probe on this workload. Now we've got some passed ones here, where we have a readiness probe, we have CPU and memory constraints, and the last warning is just the pull policy.

9:40 It's not good practice to have an image pull policy of Always. It's usually preferred to set it to IfNotPresent. It just means that when a workload starts, you don't need to go to an image registry and see if it can be pulled down, and Always usually means you're using some sort of alias tag system. Again, we want to kind of get away from that as much as possible. So it's not critical, but it is a warning that maybe you need to update this. And I think this is a nice way to gain more insights and understanding of the

10:08 services within our cluster. So let's see what else we can do from the events page. I'm not gonna filter on an individual service. I don't think that shows anything new, but if we scroll down we could filter on the event type. So let's filter by one of these event types and see what information we get back. Let's start with one of the most common ones, which is config change. This is going to tell you when a ConfigMap is created, modified or deleted within your cluster. So let's create one. Here I have cm.yml, which is a

10:33 Demo: Config Change Event

10:44 ConfigMap called Rawkode. When we go to the terminal, we can apply this to our monitoring namespace. Let's make a quick change to this ConfigMap and say that we no longer want key value, instead we want name David. Go to our terminal, apply this one more time, and let's go visualize this with Komodor. So right away, we can see that a ConfigMap was created in the monitor namespace called Rawkode. We click on this, we have all green because it was the first time this ConfigMap was created. We then have our change. And this time we can see that the key value was

11:31 removed and name David was added. And if you want to view this in more detail, you can expand the diff. We can see the data changed along with some metadata about the resource as well. This is one of these really simple but very valuable features. When things go wrong on a Kubernetes cluster, it's not because the resources haven't changed, it's because of our changes that things sometimes go wrong. Human error is probably still the biggest cause of problems on a Kubernetes cluster. So it's crucial that you understand when config changes in your cluster and how

that can have a cascading effect on the workloads within your cluster. And your ability to see those changes as they happen will substantially lower your mean time to recovery. So beyond config change, let's filter on availability issues. Now availability issues give us an understanding of when a workload was unavailable, perhaps because the pod was being restarted or the probes were failing. If we take a look at the Grafana one here, you can see that the pod was unhealthy. And why was it unhealthy? Well, it was ContainerCreating. Of course it's not healthy if it's just creating. Also what's nice here is it shows you

12:24 Demo: Availability Issues & Pod Details

12:59 each of the containers and the status for them too. If you want, you can click the live pods and logs button. This will show us the current pod in our cluster for that selector, where we could pop it open and go to logs. So it's nice having our logs right front and center when required. If we pop back to details, we can see the conditions that tell us if our pod is healthy. We have the containers, the images they're running, the pull policy, the ports, the mounts and arguments, all the useful information that you need. We have the ability to see the tolerations,

13:39 the volumes and of course the events associated with this workload. If you're a fan of the kubectl describe command, you can click describe and get the exact output on the screen, like so. So to reiterate: from the events page, we saw a pod with availability issues. We went to the current instance of this pod, we saw there was no problem, and we had all the information we need to debug a problem if there was something wrong. Now the rest of the Komodor UI is pretty much more of the same. We can break down all of the resources.

14:10 Exploring Other Resources

14:17 We can see nodes. We can click on a node. We can see all of its conditions, the capacity, and allocatable resources across CPU, memory, storage, etcetera. We have the ability to cordon and drain a node if we wish. For workloads, it's the same. We have deployments. We can click. We can edit. We can scale. We can restart. You can do this for most of the resources within your cluster. For storage, we can see storage classes. For config, we go to config maps. And we can see them. Pretty much, you're getting a visual representation of everything you can do with the

15:03 kubectl command. We can even list the custom resource definitions within our cluster and search. So I'm not going to spend any more time going through this because these webpages are dashboards and as we all know dashboards are not to be looked at until something goes wrong. So how do we get information from Komodor to give us alerts when our attention is needed? And for that we have monitors. We can expand our cluster and we can see that we have some rules already in place. These are shipped by default by Komodor. We have an availability monitor. This will let

15:27 Monitors and Alerting

15:50 us know if the availability of any of our workloads is less than 80% for more than ten seconds. If we need 10 pods on a deployment and, for more than ten seconds, we have seven or fewer, we'll get an alert. If our cron jobs are failing, we get an alert. We can get alerts for when deployments are updated and we can also get alerts for when our nodes are not healthy. So let's take a look at one of these alerts and then configure our own. Here is the deployment alert. This is going to let us know whenever a workload is

16:28 modified within our cluster. You can use the integrations that you have configured with Komodor as destinations. We can use a standard webhook or publish a message to Slack. I have a channel called SRE and I'm going to click save. So now if we modify a deployment, we should get a notification to my Slack channel. So let's test it. I have my Slack here, but first we need a deployment change. So I'm just going to do this through the Komodor UI. I'm gonna go to deployments, and we'll modify the cert-manager deployment. We can click on edit YAML,

16:52 Demo: Deployment Alert

17:14 where I'm just going to add a new label. We hit apply, we can see that we have new events, and we can see our manual action to edit a deployment. We click on it and we see the change. So even though we can see the deployment changed here, this will not trigger a Slack notification, because it uses the resource's generation rather than its resource version. Which is good, because do we really need a notification that a label changed on a workload when the workload itself is not restarted, rescheduled, or modified? Now let's make one more change to our

17:57 resource. This time we're going to add an environment variable, which is going to have a value of high. Now this will trigger a new generation. And if we pop over to Slack, we'll see the notification in the SRE channel. And if we click on this, it takes us directly to the event with the change. We can see that the revision and generation of this resource went from one to two because of an environment variable addition. This workload change is also denoted here by the new deployment, which tells us that the image didn't change but other aspects did.
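
The generation behaviour described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Komodor's implementation: the should_alert helper and the sample dictionaries are hypothetical. They only mirror the Kubernetes convention that metadata.generation changes when the desired state (the spec) changes, while metadata.resourceVersion changes on every write, including a label-only edit.

```python
def should_alert(old: dict, new: dict) -> bool:
    """Hypothetical helper: alert only when the generation changed."""
    return old["metadata"]["generation"] != new["metadata"]["generation"]


# Sample objects are made up for illustration; only the two metadata fields matter.
before = {"metadata": {"generation": 1, "resourceVersion": "1001", "labels": {}}}

# A label-only edit: resourceVersion moves, generation does not.
label_only = {
    "metadata": {"generation": 1, "resourceVersion": "1002", "labels": {"team": "sre"}}
}

# Adding an environment variable changes the spec, so generation moves too.
env_added = {
    "metadata": {"generation": 2, "resourceVersion": "1003", "labels": {"team": "sre"}}
}

print(should_alert(before, label_only))     # False
print(should_alert(label_only, env_added))  # True
```

Comparing resourceVersion instead would fire on both edits, which is the noisy behaviour the demo avoids.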

18:47 So given that we have a pretty sophisticated troubleshooting and debugging tool here, it's also worth noting that Komodor has pretty elevated privileged access to your cluster and as such we need to be able to trust it. Luckily, Komodor provides the ability to protect some of our more sensitive information from being leaked through the Komodor UI. Let's go to resources, workloads, and pods. If I select the default namespace, I have a new super secret workload. If we click on this, there's not a lot to see here, but if we go to the logs, we have a password.

19:18 Configuring Log Redaction

19:36 How do we prevent Komodor from leaking such information? Now it doesn't have to be just in our standard out logging, although when applications crash sometimes they do dump the environment, revealing very sensitive information. But also, how do we redact this from config maps and other sources of sensitive information? Fortunately, by default Komodor hashes all the information that it pulls from secret resources. But we do need to put in a couple of extra steps to protect standard IO and our config maps. Although I hope you're not storing too much sensitive information in a config map. And I'm gonna configure this

through the Komodor UI. So the first thing we want to do is to go to configuration and config maps, where I can select the Komodor namespace. Here we have the Kubernetes watcher config and I'm going to go straight to the edit page. You'll see we have two settings here on lines 20 and 21 called redact and redact logs. These take a list of expressions to redact from Kubernetes resources and from log data. Now this accepts regex pattern matching like you can do with any sophisticated logging library, but I'm going to keep it very simple

20:54 for this demo. I'm going to explicitly say that I want my password one two three omitted, and we'll add one more. This time we'll do a regex match for anything that looks like password equals, then we'll open a matcher to dot star and we'll stick a space on the end. So now we will have to kick the Komodor agent so that it reloads its configuration. We can just delete it. And, already, we can see we have a new one running. So let's go back to our pod in the default namespace where we have our super secret workload.
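
To make the two expressions concrete, here is a minimal Python sketch of applying a list of redaction patterns to a log line. Everything specific in it is an assumption for illustration: the literal password123, the [REDACTED] placeholder, the sample log lines, and the use of Python's re module as a stand-in for whatever regex engine the agent uses. It is not Komodor's code.

```python
import re

# Assumed spellings of the two expressions from the demo.
PATTERNS = [
    r"password123",   # the explicit literal value
    r"password=.* ",  # "password=", then anything, then a trailing space
]

PLACEHOLDER = "[REDACTED]"  # hypothetical; the real replacement text may differ


def redact(line: str, patterns=PATTERNS) -> str:
    """Replace every match of every pattern with the placeholder."""
    for pattern in patterns:
        line = re.sub(pattern, PLACEHOLDER, line)
    return line


print(redact("db password123 in use"))
# db [REDACTED] in use
print(redact("login password=hello-not-secret ok"))
# login [REDACTED]ok
```

Note that the second pattern consumes the trailing space, and because `.*` is greedy it runs to the last space on the line, so a pattern such as `password=\S+` may be a tighter choice for real logs.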

21:38 Demo: Redaction in Action

21:51 And if we view the logs, our password one two three has been redacted. So let's modify this at the deployment level where we can say edit YAML. And we have password one two three, but let's also add password equals hello not secret like so. We'll cause this pod, which we can see here, to be terminated with a new one running. And if we pop open the logs for here, we can see that both values have been properly redacted. So this is a very important feature, but also a very cumbersome feature because security is hard. It's never easy.

22:42 It would probably be worthwhile for your team or organization to have a convention to, well, not log sensitive values. But if you do, always make sure there's some marker in place so you can configure tools like Komodor and other logging systems to redact that information as fast as possible. And you can even run the container locally to test your redactions before pushing them to your Kubernetes cluster. Let's take a look. If I go to my terminal, I have a justfile. It's like a Makefile. However, it allows positional arguments on targets and is generally just a little bit nicer

23:01 Testing Redaction Locally

23:23 to work with. From here we can see redact, and you can already see in the autocomplete here the documentation from the justfile: we provide a redaction phrase and a log line. I'm gonna say just redact Rawkode, and then the log line that I wanna test is, say, I am Rawkode beer me. This pulls down the container image, does a little bit of plumbing and then shows you the input log and output, and you can see what it was before and after the redaction. This means that you can test your regex patterns all you want.
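
The same kind of check can be sketched without the container. The Python below is a hypothetical stand-in for the justfile target, not the tool used in the video: the redact and find_leaks helpers, the [REDACTED] placeholder, and the sample cases are all assumptions. It pairs each log line with a secret that must not survive redaction and reports any that do.

```python
import re

PLACEHOLDER = "[REDACTED]"  # hypothetical replacement text


def redact(line, patterns):
    """Replace every match of every pattern with the placeholder."""
    for pattern in patterns:
        line = re.sub(pattern, PLACEHOLDER, line)
    return line


def find_leaks(patterns, cases):
    """Return the (line, secret) pairs whose secret survives redaction."""
    leaks = []
    for line, secret in cases:
        if secret in redact(line, patterns):
            leaks.append((line, secret))
    return leaks


# Each case: a sample log line and the value that must never appear in output.
cases = [
    ("I am Rawkode beer me", "Rawkode"),
    ("login password=hunter2 ok", "hunter2"),
]

good = [r"Rawkode", r"password=\S+"]
typo = [r"Rawkod3", r"password=\S+"]  # misspelled on purpose

print(find_leaks(good, cases))  # []
print(find_leaks(typo, cases))  # [('I am Rawkode beer me', 'Rawkode')]
```

A script like this could exit non-zero whenever the returned list is not empty, which would make a misspelled pattern fail fast instead of leaking in production.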

24:03 Say you want to do password equals dot star question mark Rawkode, because maybe we got it wrong, and then in our test we'll do password equals blah Rawkod, without an e. Now this won't redact, but we have a problem. We can fix it. So let's run that again with the e on the end and, oh shit, it's still broken. Well, clearly I don't know how to spell Rawkode. There we go. We did do it right. It works. That string was redacted because we were able to test the regex. Well, that's really cool. You can actually

24:48 plumb in a shell script a whole bunch of example log lines that you have from your application with redactions that you know always have to be satisfied and hook this into your CI system and that way you know right away whenever you've got secrets leaking that should be redacted. So that's a quick overview of Komodor. There's an awful lot to love and it gives you great visibility into the wonderfully complex system that is Kubernetes running our wonderfully complex applications which are our microservices. This is just part one, there's going to be a part two of this video dropping

25:05 Summary & What's Next

25:22 early next week. In part two we'll be taking a look at more of the Komodor integrations. We will see how to integrate your source control via GitHub. We'll also take a look at hooking up to Sentry for exception tracking, and Grafana and Alertmanager, giving you full visibility across all of your observability stack. And then at the end of next week, part three will drop where we take a look at two final features. One, the humble webhook and how we can get information from Komodor to do whatever the hell we please. And then one of my favorite features,

25:58 the vCluster integration. Deploying Komodor to all your virtual clusters and multitenant Kubernetes environments. We'll be back next week with the next video. Until then, have a wonderful day, and I'll see you all soon.
