Klustered (Part III) | Rawkode Academy

Overview

About this video

What You'll Learn

Inspect expired Kubernetes control-plane certificates over Teleport and identify why kubectl/API access fails.
Fix a malicious host-level SSHD issue then restore WordPress workloads by repairing deployment and controller flow.
Troubleshoot MetalLB load balancing and DNS problems after node access, then validate application reachability from within the cluster.

Michael Hausenblas joins to debug two broken Kubernetes clusters: Justin Garrison's cluster with expired control-plane certificates and MetalLB trouble, and SIG Honk's cluster with a malicious sshd, a swapped API server image, and disabled controller-manager flags.

Transcript

Generated from the English captions. Timestamps jump the player to that moment.

0:56 Introduction and Show Premise

0:56 Hello, and welcome to today's episode of Rawkode Live. This is episode three of the series, I guess, Klustered. A show where we give perfectly functioning and working Kubernetes clusters to members of the community, and ask them to break them with the intention of me and a friend joining me to try and fix them as best as we can. Now, before we get started, I would please encourage you all to subscribe and click the bell on YouTube. This helps other people find this content. The YouTube algorithms need a little bit of a nudge to show people that this is good

1:32 TV. And finally, there is a Discord channel where you can come and participate in the chat. We're also giving away a Klustered t-shirt today. So you've got about one hour to join the Klustered channel and register, or enter the draw by clicking that little emoji guy on the post. Oh, please do that. Now, today I am joined by Michael from AWS. We're gonna do our best to fix these broken clusters. Would you like to just give us a quick introduction about yourself and then we'll pick up from there? Absolutely. Thank you so much for having me on

2:11 the show, Dave. That's really a big honor. And, yeah, I'm excited. I don't know quite what to expect. I watched the previous installment, so I have some idea. But I also saw the messages on Twitter, and I was like, oh, I'm really scared. So yeah. Now I'm just excited. Let's get into that. Yeah. Yeah. It's one of those things that, you know, I don't wanna say there's an infinite number of ways to break a cluster, but there's a lot of ways that someone can break these clusters. And I think we've seen a very good variety

2:46 so far, and I think there's so much more that we can be put up against. But I'm glad you're here because you have a wealth of Kubernetes experience there that I'm hoping we can tap into and share some of that knowledge with the people that are viewing and joining in. Alright. Cool. So let me share my screen. So one of the things: we want these episodes not just to be entertaining, but to also help other people. So there is a Klustered repository on gitlab.com. We add little reports that show kinda what went wrong.

3:25 Feel free to read through these. Sometimes there's a lot of detail here that really shows what was done to the cluster and the best way to fix it. So hopefully this is a valuable resource moving forward as well. But of course there are no READMEs, there are no post-mortems for the two clusters that we're working with today. So our first cluster is cluster three. This is by Justin Garrison, who has, I don't know if kindly broken it is the right word, but has dedicated some of their time to break those words and we're gonna do our best to

3:43 Introducing Cluster 1 (Cluster Three by Justin Garrison)

3:59 fix it. The way that we're gonna do that is by hopefully lots of communication and one of my favorite projects, Teleport. We both have access to Teleport here, meaning we can, hopefully, click connect, join as root, have an SSH session that we share, and verbalize as much of our thoughts as possible. So normally the first thing I would do is jump into this directory here. This is the tooling that I use to prepare all of the clusters. And there's a kubeconfig here. And I always like to run a get nodes and see what happens. Should we try

4:20 Initial Diagnosis: kubectl Fails (Certificate Error)

4:33 that first? Sure. Is that the first thing you would do? Have you got any special tricks, any commands you'd prefer? I usually, as a first step, look at version. So, you know, just version. Just to have an idea, you know, am I on a 1.15 cluster or 1.2 or whatever. So okay. That's something. Well, at least we have our first problem. That's very nice. Yes. Yes. Cool. I also usually look around in terms of, you know, CRDs and stuff like that. But, you know, kubectl get nodes is also a good way to see coming

5:20 from the infrastructure perspective. Well, that's not gonna work then, is it? Don't think so, based on the certificate error. Yeah. Probably not. So it looks like our certificate has expired. We cannot speak to the API server. As a pair, we're gonna have to just get onto the box. So let's get this Teleport session available. Alright. Let me zoom in on this a little bit. Now, you wanna just type something. Let's make sure we've got a working pairing setup. Okay. Did you click connect or did you go to join session? I am on 003. I am

5:34 SSH Access via Teleport & Finding the MOTD Hint

6:06 on that session. In there. Maybe I need to reconnect. I know you've got a different session. I'll join yours. Okay. Let me It's okay. Okay. Let me close this Yeah. We're now here, echo. Well Yep. I don't need to actually fix that and type it, but alright. Well, what do you wanna do first? So all that we know so far is that we can't talk to the control plane. Right? And what we've done now, we have SSH'd into one of the nodes where control plane components are located. Right? That's what we did with

6:56 the SSH. I just seen a comment from Justin, and he was as kind as possible, which is not filling me with any confidence. No. Whatsoever. Not at So we are now in that session that we see on the screen. We are on a control plane node. Right? We are on a control plane node. That is correct. So why don't we just have a look around what is running there? So, you know, ps and then looking for control plane components like, you know, the Yeah. I normally look for the first. Yeah. Yeah. That's a pretty so

7:40 it's there. It is there. Yeah. We do have an API server. Yes. Yes. Yes. We see what we would expect there. Right? All the things. I love how calm you are. I'd say in my head, I am screaming already and you're just so collected. I'm no. The thing is I'm looking at the tiny screen on our internal one rather than on the big one. So if I increase the font size, then you will literally have the same right here. So I shouldn't be doing that. Sorry. Okay. Cool.

8:24 So what do we see here? Okay. Anything that stands out for you? Well, we did see that the cert had expired. Justin's jumped in just to say that because we're using Teleport, there is no MOTD. That is correct. So, let's run MOTD. Okay. Oh, why is my thing being a bit weird? Well, that's not a command. So we need to I'm just gonna reload this. My session's been a bit funny. There we go. There should be an MOTD file. Right? Yeah. There we go. Help. So, I can't get the cluster to work. I

9:19 have tried running trying to deploy some Kubernetes example. Right. And then exposing it. And then there's another problem. Can't resolve DNS. Right. But given that we do not yet have access to the API server, there is really no point. Right? I mean, that's once we have fixed the access to the API server. Yeah. I think we're gonna have to take a look at these certs and see if we can see what the expiration time on them is and then let me just move that up. I can see that it's kinda clapping on the stream. Do you wanna try it then? We'll go

10:04 take a look at these certs. Go ahead. That's fine. I'm trying to assess how evil Justin might be. What beyond the obvious could be there? Yeah. So we have this inside /etc/kubernetes. We have the pki folder. This is where I would expect to find all of the certs. Now, the API server is probably the one we wanna kind of inspect. I can never ever ever remember the openssl commands to query that. Do you know it or should I pop open a quick to go? I know. Yes. We have to look at that.

10:44 Know. It's either in my history or I have some cheat sheet somewhere. I don't remember these things. Oh, what? Paste is a bit funny. Let's try one more time and I'll save it manually. Yeah. That's not working. Okay. So it's openssl x509. Oh, no. It's testing. It's just the file doesn't exist. I'm just being really silly. Alright. Now we wanna take a look at the API server key, I believe. Nope. That is not our trusted certificate. Let's try this there. Alright. There we go. Our certificate expired. Oh,

11:36 he's timed it well. He's had it expire right before. Yeah. Nice. Nicely done. Thank you. So, we're gonna have to renew this certificate. Nice. How do we do that? OpenSSL. I wish it was just is there a cheat sheet? That's what we need. There we go. I always like when I kinda came up with the idea for this show, I always said to anyone that I mentioned it to, I really hope people don't mess with the certificates. It's like the one thing. I mean, nobody ever remembers this. What people really heard was messing with

11:43 Fixing Cluster 1 Expired Certificates

12:23 the certificate. Gotcha. I have you covered. I'm gonna mess with the certificate. So we don't wanna generate any key. We just wanna It's not super unlikely. Right? These things happen. Right? I mean, it's not totally, you know, made up. So we're getting lots of, I mean, the chat's hard to keep up with, but Justin has dropped us. We hadn't, maybe I don't need to renew this after all, because he just left us a one-liner and yeah. Oh, he did. Very kind of him. So That's a good question. Are they expired

13:20 on all control plane nodes? Yeah. That came in from Duffie. Yeah. We should probably check that as well. So I think Justin has been very kind to us. There is this .apiserver.crt. I mean, I could just blatantly copy that over the top and hope, but you know, we do have that openssl command that we used to check the expiration date. Why not? Yeah. Why don't we check that first? openssl, and run it against the dot version and see if that's a better date. You're far too kind. Okay. I was just, you know, reading from the

14:04 left, like, February 18, like, you and then, oh, '22. Okay. We're good. Alright. So we could just Thank you very much. Yeah. I'm glad I didn't have to try and remember those commands. So now that we had a broken cert, we have replaced it. We just have to restart the API server. Is that all I have to do? I'm trying to catch up with the Yes. Yes. So we can just Well, don't know. Maybe the kubelet's already restarted it upon the cert change. I'm not sure. So we can always just restart the kubelet.
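The expiry check discussed here can also be scripted. The following is a minimal sketch, assuming the certificate's end date has already been printed in the `notAfter=...` form that `openssl x509 -noout -enddate -in <cert>` emits. The helper names `parse_not_after` and `is_expired` and the sample dates are illustrative and do not come from the stream.

```python
from datetime import datetime, timezone


def parse_not_after(line: str) -> datetime:
    """Parse a line such as 'notAfter=Feb 18 14:04:00 2022 GMT' into an aware UTC datetime."""
    value = line.strip().split("=", 1)[1].strip()
    # openssl prints these dates in GMT; drop the suffix and attach UTC explicitly.
    if value.endswith("GMT"):
        value = value[:-3].strip()
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(line: str, now: datetime = None) -> bool:
    """Return True when the certificate end date is at or before `now` (defaults to the current time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return parse_not_after(line) <= now


if __name__ == "__main__":
    # Hypothetical sample line; substitute the real openssl output.
    sample = "notAfter=Feb 18 14:04:00 2022 GMT"
    print(parse_not_after(sample).isoformat(), "expired:", is_expired(sample))
```

Running the same check against every certificate under the PKI directory, on every control plane node, answers the question raised in chat about whether the other nodes are affected too.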

14:46 Oh, there's a key too. Thanks Justin. We need to move .apiserver.key to apiserver.key. Alright. Let's restart the kubelet and hope for the best. Yes. He was very kind, but that just means that the main problem he wants us to fix isn't the certs. We still got two more things. Exactly. All right. Now we're on a control plane node. That means we have access to /etc/kubernetes/admin.conf. If the API server is working, I would expect this to be able to run get nodes now. Computer still says no. Did the API server restart? Let's just check. No. That hasn't restarted, has it? March start.

15:35 Troubleshooting API Server Restart (Static Pod Issue)

15:45 Okay. My favorite command, kill -9. Sure. Why don't you do the sledgehammer? That always helps. I could've said, I think, guess, but, you know, like so the kubelet should detect that it's missing, as it's a static manifest. It should be restarted automatically. I'm just trying to rewatch on this n one. And hopefully that goes back up. I've never kill -9'd the API server before, so I'm hoping there was no other I hope that wasn't too harsh. We also got a comment. So yeah, kubeadm init phase certs apiserver would have, I believe, provisioned or regenerated

16:34 the certs. I believe if we had done that, we'd probably have to do it across all the nodes. I'm assuming we maybe won't have to do that by using the ones that were provided. Thanks. My accent is kind of Scottish and kind of not. That kind of worried me there. Where's my API server, Mako? It didn't come back. Right. Okay. So Oh, the hint there also mentioned it. Right? You need to move the kube-apiserver manifest out and then in again. But we restarted the kubelet. But apparently, the static pod is not

17:30 picked up if it's not moved out and in again. So if you go to Oh, yeah. The comments seem pretty everybody is yelling that. So I will just do as I'm told. Okay. So I think that's interesting then. I assumed restarting the kubelet would have checked to make sure that all the static manifests were running and it would appear that that's not the case. So that was just a silly assumption on my part. So it's a valuable insight. Right? Yeah. Yeah. Definitely. So, I've moved it back in and now

18:06 we do the wait. I'm not very good at the waiting stage. Justin says, is the kubelet running? Oh, didn't touch it, but we can check. We do have a kubelet. Yes. It was restarted a few moments ago, which is what we expect. Okay. I mean, I definitely could Should we check the API server? Adrian's just went, I've broken it more. Yeah. I mean, that's always I was actually more worried about copying the key on top of the cert or the cert on top of the key or messing up something when I was restoring the certs, but I think I got

18:50 that okay. So I'm not sure what the problem is gonna be right now. Like one more. Okay, that's definitely not coming back. So, I need to stop watching the comments and focus on fixing this. This isn't part of the plan. No, it's not part of my plan either. Right. Let's get this API server back up and running. A restart of the kubelet should check all the static manifests are running. We have even removed the manifest. So, let's try one more thing before we go into panic, before I go into panic mode, sorry. You're pretty cool.
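The "move the manifest out and back in" trick from the hint can be written down as a small helper. This is only a sketch under stated assumptions: the kubelet watches a static pod directory (on kubeadm clusters this is normally `/etc/kubernetes/manifests`), removing a manifest makes the kubelet stop that pod, and restoring it makes the kubelet create it again. The function name `bounce_static_pod`, the parking directory, and the 20-second wait are hypothetical choices, not values taken from the stream.

```python
import shutil
import time
from pathlib import Path


def bounce_static_pod(manifest: Path, parking_dir: Path, wait_seconds: float = 20.0) -> None:
    """Move a static pod manifest out of the watched directory, wait, then move it back."""
    parking_dir.mkdir(parents=True, exist_ok=True)
    parked = parking_dir / manifest.name
    # Step 1: take the manifest away so the kubelet tears the pod down.
    shutil.move(str(manifest), str(parked))
    # Step 2: give the kubelet time to notice (the wait length is an assumption).
    time.sleep(wait_seconds)
    # Step 3: put the manifest back so the kubelet recreates the pod.
    shutil.move(str(parked), str(manifest))


# Example call (paths are assumptions for a kubeadm-style node; run as root):
# bounce_static_pod(Path("/etc/kubernetes/manifests/kube-apiserver.yaml"), Path("/root/parked-manifests"))
```

The parking directory should sit outside the watched directory, otherwise the kubelet would still see the manifest.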

19:33 I'm gonna move this out, restart the kubelet and then drop it back in just to see if we can kinda take things along a little bit here. So Just just that this wasn't part of the fact. Alright. Stay the same. Ian, no worries. We said forty minutes each. So no matter what, after forty minutes, we switch over. And it looks like we're gonna spend the first forty minutes in the chat. Have servers on it. Something. Alright. Where are we? We're not gonna split this over too because we're very lucky in that we have all the tech

20:10 conk and Justin joining us in the comments. We will just really leverage any help if we start to run out of time. So Alright. Kubernetes is a team sport. Right? I mean, that's clearly like I don't wanna go walk alone. So moving it back I moved it back and I'm trying to work out whether I want to restart the kubelet one more time or just kinda see if it's gonna detect that. Now we could always pull up the logs, right? So let's do In fact, we should have done this sooner. Let's see what the kubelet is actually

20:40 dropping into our terminal here. So it's not finding the node. Potentially messing with the certs is a little bit more dangerous than we expect. Messing with the certs, never do that. So the kubelet is trying to speak to the API server. The API server is not there. The problem is not the kill -9, Adrian, come on. We've all kill -9'd a few processes before. All right. The problem could be that we just need to do the certs on all of these. So I'm gonna quickly jump on to the other two control plane nodes. Let's just check the certs, see if we've

21:27 Applying Certificate Fixes to All Control Planes

21:37 got the same thing. Let's see if we can get something online here. So pki, do we have dot files? We do. Okay. So, cert on top of cert and key on top of key and API server. Okay. Do we know the number for SIGHUP? I'll do it a bit more gently this time. Can I just do SIG? I can't remember if there's a dash flag for doing that, but let's see. SIGHUP number. One. Got it. Okay. kill -1. And the process isn't there anymore. So it looks like it was restarted this time. Yeah.
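For anyone else who cannot recall which number belongs to which signal, the standard library can answer it. A small sketch, assuming a POSIX system such as the Linux nodes used here; the helper name `signal_number` is illustrative.

```python
import signal


def signal_number(name: str) -> int:
    """Look up the numeric value of a signal by its name, e.g. 'SIGHUP'."""
    return int(getattr(signal, name))


if __name__ == "__main__":
    # SIGHUP is the gentler option used on the second node, SIGKILL the sledgehammer used on the first.
    for name in ("SIGHUP", "SIGTERM", "SIGKILL"):
        print(name, signal_number(name))
```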

22:37 Okay. So control plane node two seems to be coming together a bit better. Let's do this one, back into our pki directory. We've got one more dot cert to put over dot cert and one more dot key to put over. Oh, I copied. I panicked there so much. Right, okay. I was like, don't do that thing you just said you weren't gonna do. ps aux. And I'm gonna monitor this one as well to see if this is auto restarted. So this was started thirty one minutes ago. Nope. Okay. Let's do kill -1 and process. Okay. So that should be a SIGHUP

23:26 here. That's been up for three seconds. Okay. Now let's come back onto this one. This is the one where nothing worked. Correct. And still nothing. Alright. We can ignore that one tonight. Do you wanna join the session on K67? K67, yep. Coming. And we'll see if, I'll just reload it. It's really weird. Like when it must be a bug. I don't know if it's Firefox or something else. Okay. /etc/kubernetes/admin.conf, get nodes. Okay. Cool. I possibly That's the one that is yeah. I possibly broke that one. I'm not sure how I've managed it. Not sure what's going on with that static manifest,

24:04 Cluster 1 API Server Access Restored

24:17 but I'm gonna ignore it for now. Do we have the MOTD on all the machines? Okay. We do. So let's just work from this one. Sorry. You need to pass in we should alias. Let me do alias k equals kubectl --kubeconfig and I'll just save as a lot of pin version. Oh. There we go. Sorry. I remembered 1.2 that was from your machine locally. All good. All good. I thought maybe he has messed with that as well. Okay. Cool. Cool. So we should tackle problem one first.

25:07 Troubleshooting Cluster 1 Load Balancer (MetalLB/CCM)

25:10 Alright? Let's Yeah. Let's see if we can work this in order. So it says that they've tried to do an apply of this Kubernetes deployment. So should we and that's probably worked, but they've tried to expose it and that hasn't worked. If we run a get pods, we actually do have the deployment. So there's nginx. Can we quickly before we do that, can we quickly do a it's just a really wide look. Oh, okay. There's something. Okay. That's maybe a little bit I like to do get all -A, meaning across all namespaces

25:51 to get a, like, wide look to see what's running across everything. Just to have a bit of a feeling, you know, okay. Using Cilium here, even Hubble is installed, the WordPress and other things. So one of the things, like, you know, just looking around a little bit to see if anything, you know, stands out. Okay. So when I did the MOTD and it says it's expecting a load balancer, I'm assuming then that the problem is this pending, is that what they want us to debug? Yeah. Let's describe the nginx deployment.

26:44 Okay. No events. No nothing. Looks pretty standard. Right. And the service? Yeah. As we see, nginx deployment. Allocation failed. Alright. So we have in the events a message that the MetalLB controller is unhappy. So let's have a look at the MetalLB controller. What do we have? Yeah. metallb-system. Namespace. So let's use that. Right? Config. Use this is add on paper. Config. Use oh, without all these plugins, I'm really Well, we have our alias. So why don't we just do this and then for the sake of

27:51 not remembering how to use that thing, we can just do metallb-system. So now your k should be there. So there you go. Fire away. Cool. So shall we have so what is the controller using, the deployment here? Right? Are we in the We have a speaker daemon set and a controller deployment. Our controller deployment is failing because of a node affinity. Oh, we are not in the metallb-system namespace. So get context essentially lists all the contexts there. Right? And we are currently, as you can see here, not in metallb-system.

28:46 So We are. I just changed the alias to pass in the namespace rather than update the context. Oh, okay. Okay. Oh, I forgot that you prefer sledgehammers. Okay. So what I tried to figure out was what the owner of so you did get pods. Right? So we have these pods. Right? Yes. And I wanted to see who owns these pods. I'm not that familiar with MetalLB. The speaker is a daemon set and the controller is a deployment called controller. Okay. So if you say it's gonna play here. Right? Yeah. And I

29:33 think it's got affinity that it doesn't want scheduled together, which is why we're seeing so many of them failing to schedule. These are nice. So, given I'm pretty familiar with running Kubernetes on Equinix Metal, I think the node affinity on the controller is potentially a red herring and what I want to confirm is that our CCM is actually running, which is what would be requesting the Elastic IP from Equinix Metal. Right. But you're essentially, you know already, like, but how do you get there? Right? How do you get from seeing that message in service?

30:32 Right? But the message we saw in the service that says, okay, you know, the MetalLB controller has some issue. Jumping to the conclusion that you just did, I guess that's maybe useful for people to see. Right? How did you get there? How did you besides that you know it. Yeah. So, like, when I see this, I think, yeah, the node affinity is a problem, but we do have one running controller. So my thought process is, like, I'm actually not that worried about that. The speakers are okay. The error that we see when we describe

31:04 Identifying and Attempting to Fix CCM Authentication Secret

31:06 the service sorry, I'm gonna have to retype our alias, my sledgehammer, sorry. So I can just get that back. So if we describe our service, which we know is failing, it says it can't get an IP address. Now, the thing that is responsible for getting an IP address here for a load balancer service is the cloud controller manager. So my hypothesis is he's removed our cloud controller manager or broken it in some way. But then let's check if that So we should see our packet cloud controller manager running here. Yep. And it's been running for fourteen days.

31:42 I was kinda hoping to see that broken. Have a look at the logs. Right? Let's have a look at the logs. If there is anything in the logs of the packet cloud controller manager. Yeah. Maybe we should do it. I've control c'd that but we'll need to give it a second I think. Right. I was gonna add a since to it. So when is that from? That's from today. Okay. So we ah. Okay. So the API token that our cloud controller manager is using to speak to the cloud provider API to get us an Elastic IP

32:24 is now unauthenticated. Right. So we have the line here that there is an invalid authentication token there. Right? Yes. But the controller tries to speak with the packet API and so something in the controller. There should be a config map in the kube-system namespace that has the details that we need for that to work. Now we could theoretically get into and retrieve. Okay. I'm just kidding. Let's where does that config map come from? Like, how do you create it? I'm not that familiar with Yeah. So I think we can just recreate

33:13 it manually. Now, this is where Justin doesn't know the setup and I do and this is actually more painful than I actually wanted it to be because, you know, as much as I trust the people I give these clusters to, I don't trust them. No. Don't trust them. So, there's project keys for each, but let's just check. In fact, Waleed, I think you and I just have to say my dear at the same time there. It's not actually a config map. It's a secret, isn't it? So, yeah. There is a packet cloud config. Right. So But it's also fourteen days old, right?

33:47 There is nothing nothing changed there. That is true. Yeah. Just that you wanna confirm that we are working on the right problem here and I'm not working on something that isn't relevant before I go trying to before I go flashing the auth token from this config anyway and then trying to fix something that isn't actually correct. But we looked at the CCM logs. We know that it's trying to request an IP and it can't. Right. The authentication is the problem. That's config map. That's secret, sorry. Feels to me like we're looking in the right place, but you're right. The fourteen days

34:28 on it is when I wonder if he's just modified the time in I wonder if he's been that sneaky. Justin confirmed it's a missing token. Right? But somehow I think it has not much to do with that secret here. I mean, given that it hasn't been touched. Ah, what if you modify the CCM deployment and then change the configuration? You think he's that sneaky? Yeah. Let's have a look at the deployment. That's I think yeah. He may be mean, but he's not like cool. Fourteen days as well. And he's also I

35:09 know we're pushing our forty minute mark here. Right? So thank you for the tips, Justin. But he said he's surprised the secret doesn't say it was modified and edited. That's probably a bug. So I hope we found the bug. Would be awesome. So this secret has I see what you been modified. Let's fix it. Right? So we can do edit secret. And I don't mind flashing this. This token is not gonna live much more than this episode anyway. So, edit secrets. This there is my token cloud-sa.json. Well, no. It's not a bug because it

35:50 what the list view does, it shows the creation timestamp, not the, you know Last modified one. That's right. Right. So, is cloud-sa.json, is that what we're looking for? Let's check kube-system, edit, deploy, packet, cloud, CCM. What was the packet cloud? Let's just run get deploys again. I'm not gonna remember. Packet cloud controller manager. Edit deploy. Yeah. And I'll just copy and paste. Alright. So what secret is this trying to bring in? Secret. So that's just the packet-cloud-config. Yep. packet-cloud-config. Yeah, that's the right name.
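The point about the list view can be made concrete. The age shown in a listing is derived from `metadata.creationTimestamp`, so an edited object still looks as old as the day it was created; update times, where they are recorded, live in `metadata.managedFields`. Below is a minimal sketch that picks the most recent recorded time out of an object's metadata. The sample metadata, including the manager names and timestamps, is invented for illustration, and whether `managedFields` is present in a given dump depends on the cluster and client version.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 form used in Kubernetes metadata


def last_recorded_change(metadata: dict) -> datetime:
    """Return the newest of creationTimestamp and any managedFields[].time entries."""
    times = [datetime.strptime(metadata["creationTimestamp"], TIME_FORMAT)]
    for entry in metadata.get("managedFields", []):
        if "time" in entry:
            times.append(datetime.strptime(entry["time"], TIME_FORMAT))
    return max(times)


if __name__ == "__main__":
    # Hypothetical metadata for a secret created two weeks before it was edited.
    sample = {
        "name": "packet-cloud-config",
        "creationTimestamp": "2021-02-10T09:00:00Z",
        "managedFields": [
            {"manager": "kubectl-create", "operation": "Update", "time": "2021-02-10T09:00:00Z"},
            {"manager": "kubectl-edit", "operation": "Update", "time": "2021-02-24T16:30:00Z"},
        ],
    }
    print("created:", sample["creationTimestamp"])
    print("last recorded change:", last_recorded_change(sample).isoformat())
```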

36:49 It's not specifying sub keys. So let's just go to the volume mount. So we can just add dot JSON. Okay. Get deploy. Is that restarted yet? Get pods. Probably need to yeah. Oh, No. Nice. Don't we all love to see CrashLoopBackOff? I thought I got that okay. Hold on. Oh yeah. It looked good. Did you store? Oh, wait. That's the mount path. That is wrong. Dot JSON. Yeah. Dot JSON. Yeah. No. No. That's the volume then. I'm not gonna use that. Volume. Okay, let's revert this so that I've not touched

37:51 it. Yeah. Yeah. So, the mount path is to a directory called cloud-sa. That is the name of the volume, isn't it? Yeah. Yeah. So, the file name that the CCM is expecting is. Alright, let's get the logs again. I feel like we're missing something here. Yeah, what are we missing? Sends 10 ms, invalid authentication token. Now he might have just changed the path. Don't know that off the top of my head. I'm gonna So Justin just said he deleted data.token from the secret. So let's get to the secret. Alright. I never told you how much I hate.

38:55 That's sure. We're all together in it. No worries. So is this just supposed to be called token? I'm gonna check another one. I can't do that. Don't have one handy. Let's have another broken one. Okay. I'm not entirely sure. Where does the CCM live? There we go. Is there a Where is the secret? I'll just check the code because I'm gonna drive myself around in circles here. So I'm going to declare this a, does this have an example of kind secret? Nope. Alright. Let's grab this. That looks fine. That's the un-base64'd what? The value of cloud-sa.json.
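Values under a Secret's `data` field are base64-encoded, which is why the value has to be decoded before it can be inspected. A small sketch of that step, plus a check for keys that have gone missing, follows. The secret contents here are entirely made up: the key names `cloud-sa.json` and `token` follow what is said on stream, while the JSON payload and its field names are placeholders.

```python
import base64


def decode_secret_data(data: dict) -> dict:
    """Decode every base64 value in a Secret's `data` mapping to text."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


def missing_keys(data: dict, expected: list) -> list:
    """Return the expected keys that are absent from the Secret, sorted by name."""
    return sorted(set(expected) - set(data))


if __name__ == "__main__":
    # Hypothetical secret: the payload is a placeholder, not a real credential format.
    payload = '{"apiKey": "REDACTED", "projectID": "example-project"}'
    secret_data = {"cloud-sa.json": base64.b64encode(payload.encode("utf-8")).decode("ascii")}

    print(decode_secret_data(secret_data)["cloud-sa.json"])
    print("missing:", missing_keys(secret_data, ["cloud-sa.json", "token"]))
```

A check like `missing_keys` would have pointed at the deleted `token` entry directly, which is the change Justin describes in chat.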

40:53 Marcus has a nice suggestion. Okay. Alright. Let's tackle his second problem. Okay. So we theoretically have an answer for the search third one. Right? So that's something. Problem is there's no DNS. Yeah. Okay. So Justin changed the token. Yeah. That's what I was alluding to when I said that I'm aware of the setup and you're not, but I used very specific tokens for each project and then Okay. Dropping that back in is a little bit more difficult than I'd actually like it to be. Anyway. How are we doing time wise? You said Yeah. We'll give let's see if we

41:21 Troubleshooting Cluster 1 DNS (dig & resolv.conf)

42:06 can fix this tech problem in five minutes and move on to the other cluster. Yep. Let's time that. That makes sense. Yeah. Yeah. So, there's no DNS inside the cluster. Let me jump on this. They want us to run this command. And I need to use k. I get the token. We have a command prompt and then to dig Kubernetes and we didn't get a response. So our cluster DNS is using the correct IP address, looks like, for internal DNS, but not resolving. So, what's step one here then, Michael? Sorry. I tried to get onto that

42:58 session but somehow it's disconnected. Are we not in the same session? Oh, okay. I k f I thought but if I say join here then it says now I am. Right. That's the one. Right? Yeah. He's I'll just reload. Okay. Yes. So Cool. We got a hint. Okay. So let's use the hint since we're time boxed here to take a look at the resolv.conf. default.svc.cluster.local, svc dot. What's that name server here? I'm pretty sure that's the CoreDNS, aka DNS, service IP. Yeah. Good. Okay. But Looks okay. The resolv.conf.

44:23 Let's go back in the container. Let's try doing a dig at mean, that's pretty much the same thing anyway. This could just be an ndots problem. Kubernetes. Svc. Cluster local. Yeah. That works. Okay. Yeah. I think the ndots 5, I think that was the default. I don't know. Cluster DNS worked though. Just not with the dots. Let's change. Get I think Justin, sorry, a comment flew past there. That's not what I had. I don't know why it's broken. Again, I really hate this show. Okay. Edit CoreDNS ConfigMap. cm. No ndots configuration in here.
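The ndots hypothesis being tested here is easier to reason about with the search-list rule written out. The sketch below models the usual stub-resolver behaviour for `search` and `options ndots:N` in resolv.conf: a name ending in a dot is tried as-is only; a name with at least `ndots` dots is tried as-is first and then with each search domain; a name with fewer dots goes through the search domains first. The function name `candidate_names` is illustrative, and the search list shown is the typical one for a pod in the `default` namespace rather than a value confirmed on stream.

```python
def candidate_names(name: str, search: list, ndots: int) -> list:
    """Return the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        # Already absolute: the search list is not consulted.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


if __name__ == "__main__":
    # Assumed search list for a pod in the default namespace.
    search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for query in ("kubernetes", "kubernetes.default.svc.cluster.local."):
        print(query, "->", candidate_names(query, search, ndots=5))
```

Note that some tools, dig among them, do not apply the search list unless asked to, so a short name failing in dig is not by itself proof that the resolver configuration is broken.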

45:52 Alright. Let's see. Duffie says it's not ndots. Okay. And Brent says, look at the search again. I am assuming the dig here. Yeah. So what have we got? Anything that stands out? Okay. Justin said he changed resolv.conf on the worker nodes and updated the kubelet ConfigMap to use the modified resolv.conf. Okay. Yeah. I agree. Definitely. It's interesting. Cool. But we're kind of like we can't really reproduce the error or the actual brokenness. Is that an English word? I don't know. You know, I was so sure we were

47:16 gonna do really well today that I brought myself a victory beer that I don't think I'm gonna get to drink anymore. Nice. Well Alright. I would say I think that was fun and we learned something. How about we prepare ourselves for a little bit more pain? Yeah. Let's jump onto cluster two. We wanna make sure we, you know, allocate enough time to both. Justin has a repository of an Ansible playbook that he used to break this. We'll also do the write up in the post-mortem. So I think we got the first issue, which Well, we got the certs fixed. Thank

47:28 Concluding Cluster 1 (Partially Unsolved)

47:53 you for leaving a nice working service in that directory. The second one was the API token, which we could fix, but I can't get an API token right now. And then the third one is the DNS. I'm not entirely sure what's broken. I'm really surprised to see that a dig lookup, like a fully resolved name, works, but not with the fluffy match. I don't even know if that's a technical term, it's not, but that's what I'm always gonna call it. Not sure why that's working. I'm looking forward to seeing the Ansible playbook. You have us beat, Justin. Well done.

48:00 Kluster 006 by SIG Honk (Ian Coldwater, Duffie Cooley, Rory McCune, Brad Geesaman)

48:32 Absolutely. Alright. Let's move on to cluster number two, which is 006, broken by team SIG Honk. Alright. So, what did we say there? We first need to... Well, I'm worried. We have our Teleport screen, which is only listing our worker nodes. Right. No control plane nodes. No control plane nodes. Yep. Let me grab an IP address for one of these control plane nodes. I have a terminal here. Let's export the kubeconfig. I'm gonna run a get nodes and just hope that it works, but you know, it's not looking good. No. Okay. You'll need to use the

48:52 Initial Diagnosis: kubectl Works, SSH Behaves Strangely

49:24 broadcast view. Wait. Oh, wait. That did work. Are you sure that you're in the right, oh, yeah. 006. Yeah. I remember that one, being in the right directory for the right cluster. Right? That's a gotcha. But is that really, let me double check. Is that really, yeah. No. Everything is ready. That's too nice. I almost don't believe it. Alright. Well, we have an issue report. Oh, okay. And a directory. Hi admins. I can see from kubectl get pods -A there are pods and things running

50:07 on the control plane nodes. But when I SSH in, I can't see anything running. Signed, a confused guest. Maybe Specsavers, if you can't see. But it only works in the UK. Alright. So what have we got here? There are some really nice amicontained binaries. Are you familiar with amicontained? I am a little bit. Yes. Okay. Has our SSH process been replaced? I feel like I wanna go into /proc and start digging around. You know what? If SIG Honk has replaced that one, we are really screwed. We have pretty much

50:53 Investigating Containerized SSH Session

51:15 no chance to, no. So that's your hypothesis. You think that the SSHD is not what we think it is. I mean... Is that your... I don't know. Do you think, like, I'm pretty sure... We do. Our root filesystem is an overlay filesystem. Yeah. This SSH has not taken us into the host. Because our root filesystem is an overlay. We've got cgroups leaking through. Yeah. This is interesting. Let's run a ps ww. That should give us that without cutting anything off. So that's our SSHD binary. Right. Which doesn't look right either.
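
The overlay root filesystem check described above can be sketched as a small parser over the text of `/proc/mounts`. This is a heuristic under stated assumptions, not a definitive container detector: it only looks at the filesystem type mounted at `/`, and `root_fs_type` and `looks_containerised` are hypothetical helper names introduced for illustration.

```python
def root_fs_type(mounts_text):
    """Return the filesystem type of the last entry mounted at '/', or None.

    Each line of /proc/mounts has the form:
        device mountpoint fstype options dump pass
    """
    fs_type = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "/":
            # Later mounts on the same mount point shadow earlier ones.
            fs_type = fields[2]
    return fs_type


def looks_containerised(mounts_text):
    """Heuristic: an overlay root filesystem suggests a container."""
    return root_fs_type(mounts_text) == "overlay"


# Illustrative sample lines, not output captured from the cluster.
SAMPLE_CONTAINER = (
    "overlay / overlay rw,relatime,lowerdir=/a,upperdir=/b,workdir=/c 0 0\n"
    "proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0\n"
)
SAMPLE_HOST = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0\n"
)
```

On a Linux machine the same functions can be fed the real file, for example `looks_containerised(open("/proc/mounts").read())`. A host can legitimately use overlay mounts elsewhere, so only the entry for `/` is considered.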

52:26 And there's our, yeah, I'm not really sure I know where to start with this one. How do we get out of this container? Well, oh, we can do that, right? Because our kubectl worked. So we can get on the host with a privileged pod and fix SSH. Well, let's give it a try. Why not? Yeah. If that is your hypothesis, let's check it. Or do you agree with me? Do you have something... Yeah. No. I haven't seen anything. Well, I just... We can run get pods, we can run

53:13 get nodes. Our control plane is fine. Obviously, SSH doesn't work. We were in a container. They've got themselves access to the host, replaced SSH. Right. So I think if we just run a privileged container, get onto the host, try and fix SSH. Right. I would say let's give it a try, but I'm wondering if they really are that nice to us. But, well, let's give it a try. Why not? Yeah. Okay. So let's get a deployment YAML. If we are going on the wrong path, SIG Honk, please interject. Help us out. Yeah. Yeah. Give us

54:06 a little bit of feedback if we are working in the right direction or not. That would... So we can create a deployment. I know. We can spec this out. I'm not really caring what it's called. I don't even care what image we particularly run. Let's just do Alpine. Duffy just gave us a hint saying proxy your service and take a peek. Yeah. Why not? That's a low hanging fruit. Yeah. Shall we do that? Yeah. So pass our SSH agent through to one of the working nodes and use

54:33 Accessing Host Filesystem via SSH Forwarding

54:47 that to go onto one of the other machines. Is that what we're saying? Yes. That would work. I mean, I think the plan of running a privileged container and using that to get onto the host is more fun. But that would involve... Another hint. Check your deployment. We need to see that WordPress site. Okay. Right. Six days. Port forward. Yeah. That's the cluster IP. I noticed. I used the external IP but... Yeah. I can't remember if I actually saw it working anyways. Yeah. Of course. Of course. Thank you. Thank you. Thank you. Thanks. Thanks, Honk. We have

55:43 been... Always a pleasure. Honk. We have been honked. Okay. So you had your fun. Thank you very much. I'm kind of like, you know, trusting in Duffy here. He was just messing with us. That was awesome. Thank you. Okay. So we can... Good thing that'll be full and paid forward. That's fine. Let's see if our kind of workers are affected. I'm assuming the fact that they're listed in Teleport means they're, yeah, they're systemd and stuff. Okay. How do I put through my SSH agent again? Or how do I

56:15 proxy for a machine? I can never remember the flags for this. Wow. Forward agent. No. Minus t minus t, was it? What's the flag? We could just do forward agent. There must be a flag for that though. Scroll up again. Alright. Thanks for the hint, Duffy. So if we go back into this, this is our I just wanna make sure it's still the broken one. No, that was the other one. Is that the broken one? Yeah. Duffy suggested using a different port number. Oh, it's a small p. Yeah. Okay. Phew. Okay. Okay. We're on the host.

57:29 Fixing Malicious SSHD Binary on Host

57:32 I'm gonna... they disabled my thing. Okay. Let's get us back onto our shared terminal. Okay. We have our control plane node. Okay. Alright. You should be able to join the session. It's gonna break my screen maybe. Let's see. Did it break anything? No. It's still working. Looks like it works. Okay. What creates pod objects? Well, a lot of things create pod objects. Right? Replication controllers or replica sets. Well, let's have a look. Oh, sorry. I'll lay off the typing. You'll need to set an alias for k though. Words. I can't have control.

58:36 And then --kubeconfig /etc/kubernetes/admin.conf. I have that somewhere. Do you want me to do it? You got it. Yeah. If you have it somewhere. Putting everything... I'm getting really good at typing this one line. Does it need... Yeah. You need, yep. That's right. There you go. Okay. Okay. Cool. So. Okay. This is okay. And replica sets. So there is one that's... so something changed there. Right? Six days ago versus forty five hours. It looks like they scaled down my replica set and then spun up their own replica

59:17 Troubleshooting WordPress Deployment (No Pods Created)

59:40 set with the desired one, which must be a broken one. Mhmm. Can you do a get replica sets? Purchase it. Yeah. Just that. Okay. I think I just did get... alright. Does that return? Yeah. Right. Oh, that was already doing it. Right. Okay. I mean, can we just delete their one and scale up our old one? Like, that would be a... Well, if the deployment that owns that replica set hasn't been tampered with, then, you know, you only live once. Yes. Sledgehammer. Come on. Embrace team sledge.

1:00:09 Identifying Permissions Issue (Cannot Delete ReplicaSet)

1:00:34 Yeah. Why not? Right? Do I have to do, yeah, I do have to do a copy paste here. Right? Yeah. Nice. Okay. So kubernetes-admin cannot delete replica sets. Nice. Thanks. So, shall we check our RBAC binding, our role bindings... Yeah. For the user? Yeah. Yeah. Let's do that. Alright. So, we should have a cluster role binding. No. Of course, we're not allowed to do anything. Right? Nope. Sorry. I'm dual typing. Platform. Okay. Anyways. Let's look at this deployment we've got. Let's see. Yeah. You see that he, or they, didn't touch the deployment. Right? The deployment is still the

1:01:50 original one, six days ago. It's really just a replica set. Right? Aren't you rude, love? There is a, yeah, Duffy taught me this command and he's also just been kind enough to put it here. The auth can-i. Let's see. So should those be stars, or is blank also a star? Oh, no. Wait. We've only got get, list, watch on all of this. Yeah. Yeah. Okay. Something. I will not be beaten by geese. So, there's no touch to admin.conf. Let's check the API server configuration. The plot thickens. So there must be something here

1:03:04 Identifying Malicious API Server Image & Admission Plugins

1:03:38 that is modifying the way... Node, RBAC. Looks okay. Not privileged. Do we have any PSPs? Well, I mean, the image doesn't look very trustworthy, does it now? I mean... How did I miss that? I mean, don't take it personally, Duffy, but... Right? Yeah. Maybe let's replace that with a proper, like, upstream, clean, you know, real API server. Yeah. Okay. I actually don't know what that is off the top of my head, to be fair. Kubernetes. Probably kube-apiserver, but probably the full URL. Sneaky bastard. Replacing the API server there. I'm gonna

1:05:01 assume it's gonna be the one with the most... No, it's not the one with the most downloads. Oh, well, maybe it's just not on Docker Hub. Let's check the, oh no. What would our API server be? I'm gonna look it up. Hold on. IP address from the other cluster. That's where it's coming to. Okay. So let me just do this quickly. Yes. Kubernetes manifests, kube-apiserver. The image is very cool. And let's go back to our shared terminal. We can fix it here. Okay. Cool. Now. Running for 62. So kill -1, not -9. So

1:05:56 Fixing API Server Image on Control Planes

1:06:20 Duffy says the rest is fine, which means we should not trust him and double check everything else. Like, that's just me being paranoid. Okay. So we've replaced the image, and what you were trying to do prior to that was delete this new... Right. Replica set. Right. In the hope that... still for a bit. Nice. Did it pick it up? Are we actually running the new API server? The SIGHUP may have just... I was looking at the chat. I didn't follow along if you actually applied that. Well, I modified the manifest and I sent a

1:07:07 SIGHUP to the API server, but does that one actually work, the SIGHUP to the API server? Can we just do a ps and see? Oh, yeah. Oh, yeah. It's there, but... Oh, yeah. You know, it's just a thing. I don't even know if it's using the right image. Now, when we try and speak to the API server, we could be getting thrown to a different machine. So, I guess we're gonna have to SSH onto the other ones and fix the image there too. So, let me grab the IP addresses. Okay. So, SSH port

1:07:50 22. Kubernetes manifests, API server. He changed them all, they changed them all. Sorry, I forget you're a team, just because of Duffy's comments. Copy that. Good. Okay. Let's restart this one. Yeah. Why not? One more tab, and we need to do the other control plane node. Now, we also found a subtitle for Klustered. It's the sledgehammer parade. Alright. Manifests. kube-apiserver image. Delete and copy the image name. There we go. This proves it again. Kubernetes engineers are YAML engineers. Okay. Hopefully, that's... I mean, I'm not entirely sure that that SIGHUP is actually

1:09:08 doing an image pull, but we can run a get rs. Must be the wrong one. Wrong one. Oh, no. It's in Teleport. No. Delete rs. No. Okay. So that SIGHUP may not actually be pulling or inspecting the image. It could just be restarting the process. William suggested let's try running the auth can-i again. Good idea. We've still got get, list, watch. So let's assume the hypothesis here is that the SIGHUP... Well, we had to change the image. We don't have delete. That's why we can't delete it. We only have get, list, watch. Much. Duffy says just look in this file

1:10:05 again. Quite a blatant hint. Wow. Okay. So I knew there. I mean... What we wanna do is check our kubelet configuration. Let's see if the static manifest path has been changed. Let's look at the kubelet.conf. Won't be anything in there actually, that's silly. So we need to look in the flags and maybe this configuration, or is it this one? I don't know. I'll just check them all. You fail. Alright. Next. This one. Manifest. Nope. Alright. Let's check the flags, /etc/default. What was that out there? Oh, garlic. It's probably nothing. Where would the static manifest?

1:11:49 Where's that path configured? Must be one of those files we've looked in. Let's run a status on our kubelet. This service file. Oh, wait, there's logs there too. I feel like I'm chasing my tail. Brent just pointed out container path looked off. Okay. Alright. So did I miss that in the configs? I certainly missed it. Whatever. I searched for manifest. I should have searched for static. Okay. Right. Well, we've already updated the image and I don't wanna have to do that again. So I'm going back to this. Yeah. Kudos to Eagle Eye.
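
The search that was missed here, looking for "static" rather than "manifest", can be sketched as a check on the kubelet configuration's `staticPodPath` field. This is a minimal sketch under assumptions: it scans the text line by line instead of using a YAML parser, it treats `/etc/kubernetes/manifests` as the expected value because that is the usual kubeadm default, and `static_pod_path` and `is_tampered` are hypothetical helper names. The value the breakers actually wrote is not visible in the transcript, so the tampered path in the sample is invented for illustration.

```python
# Usual kubeadm default; treated here as the expected value (assumption).
EXPECTED_STATIC_POD_PATH = "/etc/kubernetes/manifests"


def static_pod_path(config_text):
    """Return the staticPodPath value from kubelet config text, or None."""
    for raw in config_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line.startswith("staticPodPath:"):
            value = line.split(":", 1)[1].strip()
            return value.strip("\"'")
    return None


def is_tampered(config_text, expected=EXPECTED_STATIC_POD_PATH):
    """True when staticPodPath is missing or differs from the expected path."""
    return static_pod_path(config_text) != expected


# Illustrative inputs only; the second path is made up.
GOOD = "kind: KubeletConfiguration\nstaticPodPath: /etc/kubernetes/manifests\n"
BAD = "kind: KubeletConfiguration\nstaticPodPath: /etc/kubernetes/manifest  # changed\n"
```

On a kubeadm node the text would typically come from `/var/lib/kubelet/config.yaml`. If the path has been redirected, edits to the manifests in the usual directory are ignored by the kubelet, which matches what happened with the API server image fix.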

1:13:30 Thank you very much. Alright. And then we will need to restart the kubelet because of that. We have a new API server. We still have no permissions. And do we have a new API server? Is that nineteen seconds or nineteen minutes? That's Nineteen minutes. Okay. That's still the old one. Right? Eighteen o five. That was the old one. Yeah. Yeah. That looks better. Alright. Let's give it a try. Nope. I guess we're gonna have to update them all. I love how it keeps on my actual SSH session. It keeps sending me messages. Well, you have

1:14:25 What was that file called? It was /var/lib/kubelet. Okay. I think no one. No problem. Yeah. Config, static. Oh, that one's okay. /var/lib/kubelet config, static. Yep. That one's broken. I am not frustrated. No, I definitely am. It's like, I think every time someone has broken a cluster, they have, like, maybe it's just me. Right? But I don't think I would break all of the control plane nodes. I think I would be more forgiving than that. Everyone who has been on it as a breaker has been so consistent in their breaks.

1:15:22 Mean. Yeah. They are mean, aren't they? Kubernetes manifest. Okay. Hang on. Hang on. No. That's o o six. Okay. I will earn this beer. Alright. Okay. All those Oh, you mean the home. Okay. Alright. Let's see how that goes. Oh, yeah. One of the machines ah. This is Justin's wall command. Okay. So one of my machines Now now I got it as well. Alright. Well, hopefully that's them all fixed. I don't even know anymore. What day is it? So API server. Okay. So let me just close that with three. Like, go away. Config. Let's try from another machine and see if

1:16:30 we're getting a little bit better success. Oh, can I? No, we're still getting the same thing across them all. So even though we have a new API server with the correct manifests... It looks like nothing but love or heart. Heart, is that a hint? Duffy said he has nothing but heart or love. I don't know which it is for you. I'm not sure if that's a hint. Yeah. Not there. Just making sure it wasn't overwritten on the command line items or something. So we're still trying to figure out if we're actually talking with clean API servers. Right?

1:17:49 I think something, yeah. They pass over, not the kubelet. You need to check the output. Which output? Yeah, which output. There's an additional role that gives you view, but we do view. Right? Oh. Oh. Oh, so, ah, okay. Now I get it. Yeah. Okay. Gotcha. So do we think this is gonna work? Hey. Hey. Confusing ourselves there by not viewing the top of the can-i output. Yeah. Sorry. You can be kind. Thank you. Yes. They are right, Ian. It's your combined evilness. It's not just individuals. Right? Alright. So we have deleted... SIG Honk are

1:18:57 making us... Wait. Did it come up again? Okay. So let's do get service, port-forward. Oh no. Why am I doing this here? 006, export kubeconfig, port-forward WordPress. Do I at least have my normal WordPress back? Yes. Let's see. No. Nope. That would have been too easy. Why not? Oh, no. Because it's scaled down. Yeah. This is the deployment, isn't it? Deployment. No. Edit deploy WordPress. Replicas one. Okay. So I guess we can just make an artificial change. Oh no. We can just fix the actual image. WordPress. I can't remember if that's the right image.

1:20:19 Let's find out. We've got a new replica set. Desired as one. That should be pulling the image. I'm confident. Hopefully. I better. I guess I should confirm the image is definitely called WordPress. In fact, I have it locally. We deploy workload automated here. Teamwork makes the dream work. Yep. I agree with them. Woah. Why is it trying to open it? Computers, just hate them. Yeah. Yeah. I can totally agree. Alright. That I said damaged the right thing. Close enough. Okay. I think the problem is something else. So we took our s. Yeah. That's definitely. Oh, hang on.

1:22:00 I'm gonna set the pull policy to Always in case they've messed with something there too. Okay. And I'm gonna set that up just to force it. So how do we think this is going? Awesome. Alright. So something is wrong. Describe deploy WordPress. We've got our scaling events. But there are no pods. Right? Oh, you're on the... Yeah. We're not getting any pods. I think we need to take a look at kubelet logs. Do you think that they maybe even replaced the controller manager? Maybe? Screwing with the replica set. Shall we have a look? We got a few errors here but I'm

1:23:03 Troubleshooting Controller Manager (No Pods Created)

1:23:31 now at a stage of delusion of debugging this. I have no idea what is a real symptom and what is is noise. Well, Okay. We we take hint. What's going on here? What turns the replica set into pods? Well, deployments for example. Right? So, if we looked at the deployment We're not getting any pods showing up. So Right. But I guess what Duffy hinted at was that there's something wrong with the deployment. That's my interpretation. Should we get the logs on the controller manager? I guess. I think he literally meant just the the WordPress deployment. Right?

1:24:36 No. I think we have fixed that. I think we have a replica set that's being created. I think then the controller manager's not able or not... Okay. Doing something with that to create the pods. Okay. Let's test that. kube-system, get pods, logs -f. Why did that not get any logs? Oh, there we go. Okay. We do have a problem. Okay. So, we're getting a connection refused when the controller manager is trying to speak to... Okay. So cannot get resource endpoints. Right. But that looks more like an RBAC issue, no?

1:25:41 So I think... Let's take a look at the kube-controller-manager. Okay. Deploy. Oh, it's a static manifest, isn't it? Yeah. What's the change here? Look out for honks. Right? Always look out for honks. That would've been too easy. The image looks okay. SIG Honk are off my Christmas card list from now on. Okay. So there's nothing wrong with the controller manager permissions or binary. Okay. Well, that's good and bad. Right. Deployment, deployment controller; replica set, replica set controller. But the replica set controller is part of the controller manager. Right?

1:27:01 It is. Yeah. As far as I know. So those controllers, though. Yeah. Bad controllers. What are we looking at? User system:kube-controller-manager cannot get resource endpoints. This beer was supposed to be a victory beer. Now it's a consolation beer. Alright. Yes, Ian. I will get you a card for all major holidays. So let's get back to the replica set controller. Controller. What? What's the star? I do not know. I mean, should we even be overriding the controllers? If I remove that, is it gonna default to something sensible? Will we comment that out and see what

1:28:18 Fixing Controller Manager Flags (Enabling ReplicaSet Controller)

1:28:28 happens? Sure. Kubernetes has absolutely sane defaults. Why not? I mean, I'm just guessing. I mean, it's worse than a guess now, to be honest. Alright. Let's see. Where's our controller manager? It has been running. Okay. It's been restarted now. So, do we have any pods? No. Wait. Oh, that's MySQL. Yeah. Okay. So maybe we don't have sensible defaults here. Oh, wait. So it's a dash replicaset. It's so small I cannot... Oh, I can actually, because that means don't run the replica set controller. I've never used that syntax before. Did it restart yet?

1:29:33 Don't know off the top of my head if that is the case. Oh, now I don't have a controller manager. Oh, no. It's back. Okay. Do I have pods? Maybe it's just not ready yet. Thought sure that was gonna be it. Yeah. Let's follow valid advice regarding the manifest. Let's go back to the manifest and double check again. The options there. Yeah. I don't like that star. Should it all be star? You know what? Let's have a look at the actual docs. I was told that this is a lot of fun. Let me have a look

1:30:26 at the reference here. Kubernetes. That's the controller... Oh yeah, star is a sensible default. Okay. So, a list of controllers to enable: star enables all on-by-default controllers, foo enables the controller named foo, dash foo disables the controller named foo. Okay. Let's go with... so if we, ah, right. Got it. So by setting any of them here except for just star, that'll be deleted there. So the default should have been... Yeah. Just do star, which is the default, which means all. Right? That's what the docs say. I guess this might not be the active control plane controller. So, I really need to do this on

1:31:21 the other two machines as well. Right. Which is probably why, in fact, removing it should run the default, and I think the reason I never got the default behavior when I tested it last time was purely because it wasn't the leader. So that's j g. That's the six w, which means I need to get onto the machine. Let me look over here. P. Alright. Just this one. Yeah. I know, William. It almost feels like cheating. Right? Looking at the docs and actually RTFM, read the fine manual. I know. Yeah. That was a good idea. I'm not

1:32:03 sure how long I would have given that before I caved in and finally reached for the docs. I should just get in the habit of doing that sooner. Alright. All of our controller managers should hopefully have restarted, and I'm gonna assume the default should... okay. So k get pods. Duffy asks how many controller managers do you have? Well, how many do we need? I wonder how far behind the comments are. I feel like they're giving me advice right after I say something. Time stamp looks okay. Oh, wait. Controller manager. Okay. 32.
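
The `--controllers` semantics read out from the docs above (star enables all on-by-default controllers, `foo` enables `foo`, `-foo` disables `foo`) can be sketched as a small function. This is an illustration of the documented behaviour rather than the controller manager's own code; `parse_controllers_flag` and `controller_enabled` are hypothetical names, and the flag value `*,-replicaset` is an assumed example of the kind of setting that would stop pods from being created, not the exact value found in the manifest.

```python
def parse_controllers_flag(value):
    """Split a --controllers value such as '*,-replicaset' into entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def controller_enabled(name, entries, on_by_default=True):
    """Decide whether controller `name` runs under the given flag entries.

    An explicit 'name' enables it, an explicit '-name' disables it, and
    otherwise '*' enables it only if it is on by default.
    """
    has_star = False
    for entry in entries:
        if entry == name:
            return True
        if entry == "-" + name:
            return False
        if entry == "*":
            has_star = True
    return has_star and on_by_default


# Assumed example value for illustration.
TAMPERED = parse_controllers_flag("*,-replicaset")
DEFAULT = parse_controllers_flag("*")
```

With the tampered value, `controller_enabled("replicaset", TAMPERED)` is false while `controller_enabled("deployment", TAMPERED)` is true, which lines up with deployments producing replica sets and scaling events but no pods ever appearing. Because only the elected leader runs the controllers, the flag has to be corrected on every control plane node for the fix to hold.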

1:32:51 WordPress Service Accessible

1:32:51 Ta da. Well done. Great job, David. Well, let's not let's not, you know, we're not yet allowed to grab a pint because we wanna see we wanna see if we have been de honked. Right? Duffy, I'm not that sure if if if I agree that's oh, there you go. That's what is it? But is it? Oh, I don't I don't think once you said? Let's let's let's move on to the next. I think Yeah. I I bet that you you you folks have a lot of fun seeing us sweating here, our pants for how we doing time work. I didn't

1:33:34 realize it was still dark in my room. Oh, damn it. Let me quickly... Alright. SIG Honk, please let us know if we got everything there. Is there anything else we need to look at? Oh, so Michael, you've got a hard stop. I'm sorry. We went two minutes... I'll just do a quick wrap up. Alright. So those were two great clusters there. Right. I think we had a very varied set of problems there. I'm not sure we've fixed everything on Justin's cluster. I'm looking forward to reading the Ansible playbook and compiling that post mortem together to really show

1:33:59 Final Wrap-up

1:34:20 what happened and how to fix it. It looks like we have some sort of weird discrepancy between what's happening on the Teleport SSH versus the standard SSH. So something... I just say hi to Bartek, who was so kind joining us now. I'll be over in a moment. That was really fun. Thanks a lot. I definitely learned a lot, like, how mean people can be. And hope it was useful and fun for others as well. I certainly have. Alright. Well, thank you, Justin, team SIG Honk, and Michael for joining me today.

1:34:56 It's so challenging being on this side. I thought, like, you were so calm. I don't know how you just keep so collected through that. Like, inside my head, I am literally screaming all the worst words in the world to our breakers. So I learned stuff today about some of the configuration and stuff. And that's the point of this. So thank you to our breakers. We're all learning together and hopefully having some fun while we do it. Michael, thank you again for spending this session with me, and I will speak to you soon. Thanks to everyone

1:35:25 who joined. I've not seen who won the t-shirt, but whoever it was, congratulations. Alright. Thanks a lot. Bye bye. Cheers. Bye now.

Meet the Cast

David Flanagan

@rawkode

Michael Hausenblas

@mhausenblas


Code

Klustered GitLab repository

Justin Garrison's Ansible playbook repository for breaking the cluster
