About this video
What You'll Learn
- Reproduce and fix broken node reporting by restarting kubelet services, checking describe node output, and validating worker registration.
- Detect RBAC and scheduler failures by decoding Kubernetes API errors, finding missing roles, and repairing role bindings.
- Diagnose image and DNS failures by tracing webhook mutations, containerd registry rewrites, and CoreDNS Corefile misconfigurations.
Walid Shaari and Noel Georgi join Rawkode to debug three broken Kubernetes clusters, tackling kubelet failures, RBAC misconfigurations, a missing scheduler role, an AlwaysDeny admission controller, mutating webhooks, containerd registry rewrites, and CoreDNS Corefile sabotage.
Jump to a chapter
- 0:00 Holding screen
- 0:55 Introductions
- 1:14 Housekeeping (Subscribe, Discord, Sponsors)
- 2:14 Introducing Walid and Noel
- 3:34 Debugging Walid's Cluster (Cluster 19 Begins)
- 3:40 Cluster by Walid Shaari
- 5:30 Initial Cluster State & RBAC Issue (Describe nodes fails)
- 7:03 Fixing the Worker Kubelet (Restarting dead kubelet)
- 29:40 Cluster by Noel Georgi
- 1:00:52 Investigating Pod Pending State (Cilium/Postgres)
- 1:02:00 Cluster by Rawkode
- 1:05:30 Identifying Cluster-Level RBAC Issue (Can get, cannot describe)
- 1:11:06 Debugging RBAC Configuration and Roles
- 1:17:22 Finding & Using the Admin Kubeconfig (.honk hint)
- 1:19:56 Debugging the Kube Scheduler (Pods pending due to scheduler)
- 1:24:18 Identifying Missing Scheduler Role
- 1:25:22 Fixing Core RBAC via API Server Restart
- 1:29:26 Walid's Cluster Fixed
- 1:29:43 Debugging Noel's Cluster (Cluster 20 Begins)
- 1:31:09 kubectl Binary Hijacked (honkctl alias found)
- 1:32:04 Identifying the AlwaysDeny Admission Controller (Discussing finding the second plugin)
- 1:32:22 Fixing the AlwaysDeny Admission Controller
- 1:33:25 Post-Fix Discussion & Learnings
- 1:33:30 Initial Pod State & Rollout Failure (Forbidden)
- 1:34:58 Conclusion and Next Episodes
- 1:37:48 Investigating Mutating Webhooks (Finding honk webhook)
- 1:43:59 ImagePullBackOff Error (Incorrect image version)
- 1:46:57 Debugging Port Forward Failure (Investigating iptables)
- 1:52:27 Mutating Webhook Causing Image Revert (Revisiting the webhook config)
- 1:54:18 Containerd Registry Rewrite Issue (Image pull fails)
- 1:57:36 Fixing Containerd Registry Configuration (Deleting hosts.toml)
- 2:00:00 Debugging CoreDNS (Corefile Rewrite)
- 2:01:25 Fixing CoreDNS Configuration
- 2:02:32 Moving to David's Cluster (Cluster 21 Begins)
- 2:04:14 Initial Cluster State & Slowness (API server timing out)
- 2:10:00 Investigating API Server Errors & Missing Logs
- 2:16:01 Fixing Log Directory Permissions
- 2:21:20 Identifying the Source of Slowness (Throttling Proxy)
- 2:23:47 Fixing the Throttling Proxy (Renamed SELinux process)
- 2:24:26 Pod Still Creating & Admission Denied (LimitRange forbidden)
Full transcript
Generated from the English captions. Timestamps jump the player to that moment.
0:55 Introductions
0:55 Hello. Welcome to Rawkode Live. Today is part one of three of a Klustered KubeCon special. We're mixing things up a little bit today and doing it slightly differently than we have done in the past. I hope you all enjoy the new format. Feel free to leave comments as we go, and we'll do our best to get them on the screen and so forth. Now before we get started, there's a little bit of housekeeping. First, please subscribe to the YouTube channel if you haven't already. Also remember to click the bell. That way you'll get notifications whenever I go
1:14 Housekeeping (Subscribe, Discord, Sponsors)
1:23 live covering Klustered or any of the other cloud native technologies out there that we'll be doing learning materials on. If you're not watching live, or even if you are but you just like to chat, join us in the Discord at Rawkode.chat. There's now almost 300 of us in there talking cloud native, Kubernetes, and everything in between. So come and say hello. And lastly, I wanna thank my employer, Equinix Metal. They allow me to do this on their time, which is great. They also provide the resources for me to use for all of these many, many clusters. So thank you
1:54 very much, Equinix Metal. You can check it out with the code Rawkode. This will get you $200. That's up to four hundred hours of bare metal compute. Check it out, have some fun, and let me know how you get on. Alright. Like I said, today we are mixing things up a little bit. This is not breakers... well, it is breakers versus fixers, but a slightly different format. I am joined by Walid and Noel. We have all brought a broken cluster with us today, and we're gonna pair up and fix while the breaker laughs at us along the way. If we need
2:14 Introducing Walid and Noel
2:24 any help, I'm sure we'll all be happy to throw in some advice as well. So first, hey, Noel. Hey, Walid. How are you both? Hi. Hey, good. How are you? I know. I just said hello to you both, and then gave you the opportunity to fight each other to say hello back. So Noel, why don't you say hello and tell us a little bit about you first? Yep. KubeCon week, so deprived of sleep. That's the only downside, and otherwise, I'm doing good and, like, catching up on all the talks and
2:55 being in the hallway section. Yeah. Definitely. A really packed week. So much going on just now. So hopefully, this gives you a little bit of a break from all the talks and a little bit of fun in between. Walid, say hello and let us know who you are, please. Hey, everybody. I'm Walid, the cloud native janitor. I'm still early in my journey to cloud native, watching David's videos, and hoping to learn something today. Awesome. Well, we have three clusters. We've each broken one. And we're gonna start with cluster 19,
3:34 Debugging Walid's Cluster (Cluster 19 Begins)
3:37 which is Walid's cluster. So Noel and I will be taking point on this and trying to debug our way to get this cluster happy and healthy. Walid, we'll let you know if we need any advice, but I'm sure this is gonna be very interesting. I'm curious to see just how many ways we can break clusters. Like, the one thing that continues to amaze me with every episode of Klustered is that we're always seeing new weird ways of breaking the system. Alright. Noel. And probably after watching some talks, like, over this, you've gone, I probably have
3:40 Cluster by Walid Shaari
4:10 more ideas. Yeah. Hopefully, KubeCon has given us all a little bit of advice to try and fix these things. I'm gonna join onto the control plane of Walid's cluster. If you can join that session, please, Noel. Yeah. And just type echo hello. Let us know that you're there. There's mine. And I'm sure Walid just made this nice and easy for us as well. Right? Use kubectl. So alright. I didn't even use Carol. Alright. Let's just get our setup working with the Kubernetes clusters. We'll just export the kubeconfig. We'll do alias k. Otherwise, I'm gonna be
4:49 typing on MacMod all day. And then, Noel, I will give you the honors of running get nodes or get pods whenever you wish. Oh, I'll just have that auto completion. And the completion. I always forget about the completion. Yes. Oh, what did I do? Oh, it's on the... Yes, Lewis. Yeah. Completion bash. Right? We're learning from you, Lewis. Glad and typing. That's because you're being watched. Look at that. We have a control plane. Thank you, Walid. We appreciate that. It looks like we have one worker node offline. Shall we just check... yeah. Go for it. I'll just describe
5:30 Initial Cluster State & RBAC Issue (Describe nodes fails)
5:44 it to see if it's, like, something obvious. Oh, come on. Oh. So we got an access forbidden on describing a node. RBAC. Oh, I see. Wow. I thought this was a CKA issue. All right. So, I mean, taking down one of our worker nodes, that's obviously a problem. Shall we check to see if our workload pod is also running? Just to see if there's more than one issue. There's... Yep. Three issues. Three issues. Alright. Let's do a get pods. Alright. Our Postgres has been terminating for a rather long time. I suspect that's probably because
6:34 the node. Yep. That's not ready probably. Yeah. That's what I'm thinking too. Yeah. Okay. Yep. That's what you think. And not yep. What's it like just sitting there and getting to watch us sweat a little? Is it good on that side of the fence? You know, you laugh last, no? Anyway. Alright. So I've opened a shell on the broken worker, if you wanna join that. And we'll start debugging. Just give me an echo or something when you're there. Did you join, or did you start a new session? My Firefox is acting weird. I can't
7:03 Fixing the Worker Kubelet (Restarting dead kubelet)
7:22 see the other tabs. It's kind of like even better. Alright. I'll poke around just now while you sort that out. So first thing I'm gonna look for... I might have to close and rejoin. Yep. Give me a minute. Like, I just picked in and rejoined. You take your time. I'm just gonna poke around right now. So I'm just checking the status of our systemd service for the kubelet. It appears to be loaded, enabled, but dead. I wonder... I'm just gonna try the completely naive thing first, and I'm just gonna restart it. I guess we're pulling up... oh, where's my
8:15 You're not on the control plane. You're on the node. Yeah. I was just seeing a... Your muscle memory is working here. So I saw some folks basically asking that they wanted to watch this for doing the CKA, and I watched you and the CTF saying that what brought this idea was the CKA, CKAD troubleshooting. So I said, okay. Let's have a CKA troubleshooting session. So this would be simple. Okay. Well, it appears... There we go. We have the node back. Great. I always make the mistake that sometimes I think complicated,
9:08 but one thing I learned from Lewis from Control Plane is to think simple first. We do have some errors. Right? We do. So I restarted the kubelet. It's now running, which is good. I don't know why I was running ps aux and grepping for the API server, because I'm silly. But the kubelet is running, and we do have some errors that we should probably check. That's, I think, fine because... oh, let's see if the Cilium pods are running. Yeah. We'll do that from the control plane, I guess. So... oh, sorry. On you go.
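Editor's sketch: the "loaded, enabled, but dead" state read off the kubelet's systemd unit above can be checked mechanically. The snippet below is a minimal illustration, assuming the `KEY=VALUE` output that `systemctl show kubelet --property=LoadState,ActiveState,SubState,UnitFileState` prints; the sample text and the helper names `parse_unit_state` and `needs_restart` are hypothetical and not taken from the stream.

```python
def parse_unit_state(show_output: str) -> dict:
    """Parse KEY=VALUE lines such as those printed by `systemctl show <unit>`."""
    state = {}
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip anything that is not a KEY=VALUE line
            state[key.strip()] = value.strip()
    return state


def needs_restart(state: dict) -> bool:
    """True when the unit file is loaded but the service is not active."""
    return state.get("LoadState") == "loaded" and state.get("ActiveState") != "active"


# Hypothetical sample resembling a dead-but-enabled kubelet unit.
sample = "LoadState=loaded\nActiveState=inactive\nSubState=dead\nUnitFileState=enabled\n"
state = parse_unit_state(sample)
print(state["SubState"], needs_restart(state))
```

On the stream the fix was simply restarting the service by hand; the sketch only shows how the same "try the simple thing first" check could be expressed.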
9:57 You just take... it's all yours. It's not kube-system. Right? But I'll just check just in case. I guess it's called CapEx. So the first issue was fixed in less than ten minutes. That's very good. Alright. It looks like we've got quite a few pending things going on here. Do you wanna describe one of those pendings? Yep. Oh, man. Alright. I can get the pods, but I can't describe them. Strange. So... Okay. So we need to... Especially the guy watching KubeCon. Okay. So let's do a kubectl auth can-i --list.
11:14 Yes. So for node, we don't have get permissions. And what... so the... Alright. Cluster role. But we have certificates and certificate signing requests. I wonder what you can do with these. Okay. So you haven't modified the cluster-admin role. Role. So... Did I or didn't I? You'd... well, I just opened it. No. There is something called, like, something I found recently about this, cluster role aggregation. Right. Right? The policies can actually be merged from a different role, but I don't think it's happening here. So it's fine. It could be an aggregation. Could be. It couldn't.
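Editor's sketch: the symptom discussed above (get works on pods, but describe and node access are forbidden) follows from how RBAC rules are matched by verb and resource. The snippet below is a simplified model, not the Kubernetes authorizer: it ignores API groups, resource names, and namespaces, and the contents of `restricted_rules` are hypothetical because the captions never show the actual role.

```python
def rule_allows(rule: dict, verb: str, resource: str) -> bool:
    """A rule matches when both the verb and the resource are listed (or wildcarded)."""
    verbs = rule.get("verbs", [])
    resources = rule.get("resources", [])
    return ("*" in verbs or verb in verbs) and ("*" in resources or resource in resources)


def can_i(rules: list, verb: str, resource: str) -> bool:
    """RBAC is purely additive: access is granted if any single rule allows it."""
    return any(rule_allows(rule, verb, resource) for rule in rules)


# Hypothetical restricted role: pods are readable, CSRs are fully open, nothing else.
restricted_rules = [
    {"verbs": ["get", "list"], "resources": ["pods"]},
    {"verbs": ["*"], "resources": ["certificatesigningrequests"]},
]

print(can_i(restricted_rules, "get", "pods"))     # pods are readable
print(can_i(restricted_rules, "get", "nodes"))    # nodes are not covered
print(can_i(restricted_rules, "list", "events"))  # describe also lists events
```

`kubectl describe` issues several requests (including listing related events), which is one reason a user can often get an object but not describe it under a narrow role.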
12:03 I don't think I'm enjoying having a breaker on the stream now. This is fun. Yep. I want to give a shout out to Mark and his RBAC session. It was very helpful and very... Oh, I missed it. Yeah. Never saw that one. No. No. I'm really disappointed with myself, man. Alright. Okay. So we looked at the cluster-admin role, which I believe is the one given to us by Kubernetes. So now we're gonna have to take a look at the system aggregates. And there's no way to get modified times on any of these unless
12:42 Oh, you can actually. If you scroll back. Yeah. In the YAML. Yeah. I was gonna get the YAML of all of them and just kind of grep for... Oh, what? Maybe not in cluster or maybe. That was super helpful. Okay. So let's get only one. Yep. Okay. Let's confirm something. I'd like to do some type of note, and I'll just kind of get some... so it could be a cluster role, but we haven't actually tested to see if the describe was all namespaces or not. We tried Cilium and kube-system. Did we
13:23 try default as well? I think we did. We tried to describe Postgres. Maybe. So you are on the right track, by the way. So if I do this, it shows up, and describe... I don't care which pod. So this is happening at a cluster level. So it's like... yeah. So it's all applicable across the namespaces, everything. Okay. So... And when we did the role, cluster role. Right? Let me just look at it again just to confirm we didn't miss anything. I mean, all I see are stars. This doesn't have the last applied configuration and
14:05 annotation. I think that might be, like, for cluster roles. It would actually tell us, like, when it was modified at least. I haven't played with that one. Alright. So let's take a look at... do we see edits in the event log? Do you wanna try kubectl get events? Sneaky. Okay. Let's go through the system roles and actually work out what we have access to. So I think we should start by taking a look at admin first, and then we'll do the aggregation. Yep. So we did it. We did look at the admin. Right? So let's
14:58 But we looked at cluster-admin, but not just admin. Remember, I was inspired by capture the flag and by Mark's session. I wish I had seen Mark's RBAC session, but I didn't. Okay. So let's do edit cluster role. Oh, no. Off you go. Sorry. Just a minute. I just want to confirm if this is actually, like, the cluster-admin itself. And something Rory said on his session with you, David. Okay. This was not modified. Let me give you a hint. How about cluster role binding? Maybe they would... Ah. Let you come closer.
15:51 Hello. It's on the same issue, mate. I wish I could remember the CTF. It feels like a month ago. Ah, so in the cluster role binding, we might be given a different service account. Okay. So we should probably take a look at the cluster-admin binding and see if the roles are sent through. On this now? This one here. I'm still looking through. There's a 50 layer. Okay. Kubernetes. Let's not delete it. So let's see, actually. No. Don't delete anything. Yep. I like the delete sledgehammer approach. So... Check first before deleting.
16:40 Yep. So for the Kubernetes admin user, it's cluster sys admin. Oh, let's describe cluster... oh, we can't describe. Let's edit cluster sys admin or get -o yaml. Yep. In fact, we don't need to describe. We could just be doing get -o yaml on everything, probably, to get around the describe. What is it called? Cluster sys admin. Cluster sys admin. There we go. See? Yep. So it's, like, limited. So we just need to update the role binding to just use a system... sorry, cluster-admin. I think we can just delete the role binding, and we should be
17:17 No. No. Don't delete the role binding. If you delete the role binding, you'll miss the hints. There are hints somewhere here. Oh, you've taken us on a journey, have you? Okay. There is a flag, as Lewis said. Oh. Alright. Let's take a look at that role binding again. Maybe not the role binding, but it starts... okay. Alright. Look. There's an original file. No. I was just thinking, so if we just... Do you see the annotation? No? Ah. Come on. So it looks like it's in /etc/kubernetes. I think /etc/kubernetes.
17:58 Original file /honk? Okay. I don't think there's a honk in there, but it might be somewhere else. .honk. ls /.honk. And a honk2. Don't look at honk2 yet. Alright. So you have created a new user with a role binding and swapped out... Yeah. I created the new roles there, but I didn't give him the group. Yeah. So... Yes. But the file times... but wait. Didn't we export it to the default one? And the file doesn't seem to be... You modified the timestamp on it. I... yeah. I don't
18:43 yep. Never mind. I just want to confirm it's actually saying Kubernetes admin. Yep. Anyway, so what I will do is give it --kubeconfig admin.conf. It's... Good. Lewis, you'll be on the Caesar's stream soon. So... So we could just switch to the default. I had a challenge because, basically, the Kubernetes administrator is part of system:masters. And you cannot basically redefine the roles for the Kubernetes administrator because system:masters is in the code. You cannot change it. So I had to basically override it. Okay. So now we should be able to kubectl
19:42 describe node. I'll just get the node because... did we fix that? Oh, that was fixed. Why does that... wait. One issue per ten minutes. That's great. Yep. kubectl -n cilium get pods. Alright. So we still need to describe these and work out why they're pending. Yeah. Hi, Philip. Hi. Come on. I suspect we really wanna look at honk2 soon. No. Don't, please. I don't know if it is in honk2, actually. It's not yet even scheduled. I'm sorry. I was too busy. It was in last... Right? It didn't even get scheduled. It's still, like,
20:31 in a pending stage, like, there are no events or anything. Okay. So it's not been scheduled. Yeah. You're right. So the scheduler may be down. Wanna check for a scheduler in kube-system? Yep. Gosh. Oh, look. It's not there. Okay. One of the... oh, we don't have a scheduler there. Yeah. Let's go check the static manifests. Let's see. Kubernetes manifests. And let's get it. Oh, come on. We do have a kube-scheduler, but it's not running. It's probably the mount... okay. Let's look at crictl just to make sure, like, it's probably crictl
21:27 --runtime-endpoint. It's working. It's working. crictl. I fixed it for you. So you have... I am very nice. So the kube-scheduler is not running, so we can actually look. There's two. There's one running and one exited. Okay. So... but it didn't register itself. So... Well, it's not showing up. I hate these silly things on Linux, like, not being able to, like, copy paste. Just go to /var/log/containers. Yep. It's RBAC. Oh, RBAC again. Mark's session must be very good. Alright. Let's pop open the kube-scheduler and check the service account.
22:19 Oh, come on. Stop. Yep. Okay. Kube-scheduler. Yes. Does it have a service account associated with that? No. It doesn't. But it uses a volume mount. So... Yeah. The scheduler.conf. Okay. Let's go check that file. Authentication. Yep. That's the authentication file it uses. Yep. Now those two have the same file. So /etc/kubernetes/scheduler.conf. B Etsy. You're smashing this now. I've got faith in you. scheduler.conf. I have... I am successful. I see you guys. Oh, so this must be, like, a similar one. So there's a role binding for
23:07 system:kube-scheduler with a different one probably. So you'll check other cat. No. No. No. This is fine. Ah, okay. I wanted to change it, but this one is fine. But it has some hints also. Not my hints. Yeah. Certificate authority. Let's... Yeah. Is the cluster name correct? I just want to confirm. Yep. It looks for it. Yeah. But what was the error message again? I don't know. We didn't pay attention to it. That was, like, too long. So I can't do that. That sense. I don't know. Does that work? Actually, we could just... oh, man. This is
24:02 like... so it says failed to watch. So the watch permission is most likely missing. Oh, there we go. And what else does it say? It is telling us that the cluster role system:kube-scheduler is not found. Okay. So we should look at the cluster role. Right? So let's look at the custom. I pay his renamed error. So this is a remake, by the way. If you remember, I don't know which cluster it was. But Phil did the same thing, but you fixed it by mistake. So I wanted to see why it was fixed by mistake. So if you edit the Kube API server
24:53 and add RBAC and remove it again or something, certain roles get recreated. Yep. Because those are, like, admin things. Right? Yes. Because they are built in. So I deleted the role, and it's in... oh, man. Did it get created again? No. No. That's the volume scheduler. Yeah. You're... So the role is in the honk, honk2 folder. Yep. I guess... I mean, but another way to redo it is just edit the Kube API server. Alec loses the base. Make the same mistake twice. And just let it complete.
25:32 Right? Thanks for saving the manifest. Yeah. We appreciate that. Yeah. Let's just apply that then. I can just... you gotta apply that. Just... I think that's useful to share and make sure we get that across to everyone watching. What you're saying is if we delete any of the core cluster roles, if you remove authorization from the API server and add it back, it will recreate all of those roles. Yeah. Why don't you try it? Can we just restart the API server, or do we actually need to disable authorization? It will restart. The moment you change
26:11 something, it will restart anyway. /etc/kubernetes/manifests, and I just need to touch it. Right? No. No. You need to edit it. Let's try touching it and see if it does it. I'm curious if just a restart would do... Yes. Okay. Yeah. Then we can try that. Yes. Go ahead, Noel. That cube. It's sure. Right? Yes. Yeah. Rep. I just API. It's not. Not restarted yet. Give us a second. It does take a click. Yeah. It's like sometimes it takes a while. That's what I've seen when I was breaking clusters. Would have killed my just named it by
26:56 now. The bad point... I think the recreation was actually... we had a question about the same thing in the chat, our Discord chat. Like, someone lost the cluster-admin role or something in a case cluster. They just restarted it, and it just came back. Yeah. Let's do... Oh, it's a static manifest, so I'll just kill it. Last time I killed this thing, bad stuff happened. Let's not kill it. We could just do crictl, like, delete the pod. Simple. Let's just... yeah. Add an... You touched it. Correct? We tried that. It doesn't do anything.
27:36 So we'll add a label. It should, at least. There we go. Oh, there we go. I think you broke your own cluster now. No. There we go. No. We shit. No. This would take a little bit of time. We go. Nope. It came back up. Right? It takes a little bit of time. The watch is faster, but, it's here. Alright. So that's really cool to know. By restarting the API server, any core RBAC stuff will be fixed for you. That's awesome. I like that. And that was the unintentional fix you did with Phil, I mean.
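Editor's sketch: the behaviour described above matches Kubernetes' documented auto-reconciliation, where the API server restores missing default cluster roles and bindings at start-up. The snippet below is a heavily simplified model of that idea. The real reconciler compares permission coverage rather than rule equality and honours an opt-out annotation; the role contents and the function name `reconcile_default_roles` are hypothetical.

```python
def reconcile_default_roles(stored: dict, defaults: dict) -> dict:
    """Return a copy of `stored` with any missing default roles or rules added back.

    Roles that are not part of `defaults` (custom roles) are left untouched.
    """
    result = {name: [dict(rule) for rule in rules] for name, rules in stored.items()}
    for name, default_rules in defaults.items():
        current = result.setdefault(name, [])  # recreate a deleted default role
        for rule in default_rules:
            if rule not in current:            # add back a missing default rule
                current.append(dict(rule))
    return result


# Hypothetical state: the scheduler's built-in role was deleted by the breaker.
defaults = {
    "cluster-admin": [{"verbs": ["*"], "resources": ["*"]}],
    "system:kube-scheduler": [{"verbs": ["get", "list", "watch"], "resources": ["pods", "nodes"]}],
}
stored = {
    "cluster-admin": [{"verbs": ["*"], "resources": ["*"]}],
    "my-custom-role": [{"verbs": ["get"], "resources": ["configmaps"]}],
}

restored = reconcile_default_roles(stored, defaults)
print(sorted(restored))
```

This is why a restart of the API server was enough to bring the scheduler's role back without applying the saved manifest.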
28:20 I'll take as many unintentional fixes as I can. I wonder if it has any security implications. Right? I don't know where the default roles are stored. If we could actually, like, manipulate that, it means, like, if someone restarts, like, something malicious could come up. Right? Yes. Now if you check, most likely, every pending item is running now. Yep. Looks good. Fixed it. Yep. Easy. Could you check the Cilium? One thing I didn't understand... I mean, I killed all of Cilium, but the nodes still show the network as ready. With other network plugins, you will not
28:56 see network ready. Yeah. Cilium is probably doing more... I don't know. Is it running in iptables mode or, like, without kube-proxy? It is... no. The kube-proxy is saying IPVS. I think I checked that. Yes. Oh, okay. Yeah. I think it's probably... So you think it basically works. Right? Yeah. Alright. Walid, have we fixed everything on this cluster? Yeah. Yeah. Everything is done. So, basically, it was two issues: RBAC, and a simple kubelet that was down. Awesome. Well, thank you for that. I don't have good learning. I always fear the RBAC stuff, but we learned a
29:40 Cluster by Noel Georgi
29:41 lot in that one. I really appreciate that. Alright. Let's jump into Noel's. Well, each of you can get that session up and echo something else. It's terminal. No. No. Okay. So active sessions. Alright. Let's see. Got a couple of... Join session. Magna wants to see Andy lose their cluster. We are gonna make that happen. Don't worry. And Mahmoud says, hey. What's up, Walid? Hey, Mahmoud. GitHub ing, man. That's what happens when you don't have GitHub. Alright. Let's get the basics set up. I really should just bake this into the provisioning steps. Alright. Walid, I'll give you the honors
30:29 of the get nodes, get pods, whatever you fancy. Okay. Complete. Oh, the completion. Yeah. We need the... I still don't understand how bash completion works. It's a mistake. Oh, he didn't do it. Just a second. Yeah. I thought he did. No. I just did the kubeconfig export and a k alias. Okay. I don't think that worked. I didn't? It says dash command not found. I'm assuming Noel has messed with our kubectl binary. Yeah. He has. Oh. Oh, we haven't broke... Okay. No. We have a broken control plane. So that means the completion doesn't work? Really?
31:28 Yeah. Because it... Maybe? Yeah. It needs to talk to something, doesn't it? Just gotta check something before we take... Thank you, Noel. Not that I don't trust you, Honk, man. How many counts? He's going to rickroll us on Klustered. That is harsh. That's it. So this is looking for a count equals three. Yep. You might want to, like, close the session and rejoin. It messes a little bit with the terminal. So I think the session today is called don't trust. Whoever gives you anything, a command, a configuration file, don't trust it. Double check. Exactly. Yep. Exactly. Because I knew that the completion
32:34 command doesn't require a working server. So... And that's why the completion didn't work. Yeah. So we need to download. Or you have... You don't need to download. I played nice. If you look in there, you'll find it maybe. Okay. Can we look at the script again? If there was nothing in the script, it tells where it was, and then I deleted it. No. It doesn't. But just grep for some similar keywords. Not kube, maybe ctl. Just ctl. Maybe honk. honkctl. honkctl. Yeah. It's a nice ctl, by the way. Okay. That's better. Or list them by
33:27 time. We have a working kubectl now. Okay. Good. Yep. That's fixed. So the first issue is fixed. It looks like I lost my connection. Oh, yeah. You'll have to join the other session because no work loaders. Oh, the... Yeah. It's outside. I always do that. Oh. Okay. Okay. Why don't you go, Ravi? Go ahead. No. No. Thanks. You got the completion. Thanks, chat. We got no laughing. So we fixed the first issue or not? No? Complete. First issue is fixed. How many issues do we have? Probably three or four. I didn't really count. So, basically,
34:32 like, when you said, like, before, right, we canceled the session. So that time, I actually got time to play with it. So I just had, like, a script ready and, like, did some breakage. I never touched it again. Alright. So that already looked wrong. Yep. It's trying to connect to localhost:8080. So I think he's... I think Noel has modified our... I didn't think... No. I didn't touch it. No. No. Sorry. Sorry. Sorry. You're not exporting it. Got it. Yep. Cool. Lewis, master rickroll. Pardon me. What do we have? So we have one more. I don't like
35:33 by product of my speech. I didn't touch... Yeah. Any of that group thing. So it might be a byproduct of the other breakages I did. So could be. I didn't touch it. That's... Well, it looks like clustered and postgres are both running in the default namespace. So I guess we should just be upgrading clustered to see if we can get the new version. But let's look at this, why it's not starting. Let's not waste time on talking about that. I don't think Noel touched that. Yeah. Okay. Yeah. I didn't touch it. So yep.
36:07 So let's follow the usual things that we usually do. We're gonna rollout restart first. It's just called clustered, deployment clustered. Yep. And if that works, we should just be able to upgrade it. It looks like a timeout. Just wait a minute. Yeah. Yeah. I'll wait for the timeout because it can actually provide a meaningful message. Failed to patch. clustered is forbidden: not yet ready to handle request. That's nice. That's new to me. So this is a very nice thing I found out. Yeah. It's nice. Like it. Okay. So... And just to be what?
37:06 Nice. I didn't touch RBAC. Sorry? Just to be nice, I'm giving you a hint. I didn't touch RBAC. Yeah. Yeah. I know you are a nice guy, Noel. Okay. So what's the problem here? Let's get once again. Yeah. And okay. Let's see. So could this be an admission controller? I mean, the failure, if it was an admission controller, maybe. Let's check for mutating webhooks and validating webhooks, I guess. Okay. Get mutating webhook configurations and yeah. Yeah. The auto... oh, yeah. K. Yeah. kubectl. Why is it not completing? No. It's called mutating. Well, the style is getting unmuting, and I
38:05 probably broke the completion because I'm a terrible programmer. That's okay, though. Wow. That's strange. mutatingwebhookconfiguration, if you want to, like, type custom session? Okay. Oh, sorry. Can type it. mutatingwebhookconfiguration. Get. Yep. There we go. We got a honk. Yes. Can we edit it? See what he did inside? Yeah. I think... oh, sorry. I'll take it too, Walid. Oops. I forgot I'm not in there. Resent them. Bad. I don't know why the auto completion is not working, but, yeah, it's, like, tricky to work with all the extra CRDs
39:02 if auto completion is not working. Probably Noel swapped out bash for dash. That's what it is. I know. I didn't go that route. Okay. Let's see the... Let me see from the rules. Create, update. Oh, you're running it on your own server. Sneaky. Yep. I mean, I think it's safe just to change the failure policy on that to allow? What is it? I don't see. Fail or Ignore? Ignore. I think Ignore. Or we could just delete the webhook. I'm not fussy either way. Okay. Yeah. Let's just... You might want to
39:51 rethink that. What? Why? Is it that you've got finalizers or done stuff? No. But it's something... yep. Alright. Hold on. Let's just... It doesn't look... mutating webhook configuration. Yeah. Let's look at the documentation. It's Fail or Ignore, I think. But... It's enabled here. Match labels, honk enabled. So namespace selector. Failure policy Fail. So, basically... Yeah. Ignore. Here we go. Oh, Ignore. Let's make it Ignore. Yep. Okay. And... Yeah. If we just save that. Again. We should be able to do that rollout command. Yeah. Interesting. So basically, he's changing any deployment that we recreate, and he's doing it for deployments. Yep.
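Editor's sketch: the Fail versus Ignore choice discussed above decides what happens to a request when the webhook endpoint cannot be reached. The snippet below is a simplified model of that decision, assuming only what the Kubernetes documentation states about `failurePolicy` (the default in `admissionregistration.k8s.io/v1` is `Fail`). The hook dictionaries and the `admit` and `unreachable` helpers are hypothetical, and real mutating webhooks also return patches, which this model ignores.

```python
def admit(webhooks: list, call) -> bool:
    """Decide whether a request is admitted after consulting each webhook.

    `call(hook)` returns True/False for allow/deny and raises ConnectionError
    when the webhook endpoint cannot be reached.
    """
    for hook in webhooks:
        try:
            allowed = call(hook)
        except ConnectionError:
            # Unreachable webhook: Fail rejects the request, Ignore skips the hook.
            if hook.get("failurePolicy", "Fail") == "Fail":
                return False
            continue
        if not allowed:
            return False
    return True


def unreachable(hook):
    """Simulates a webhook server that never answers."""
    raise ConnectionError("webhook endpoint did not answer")


hook_fail = {"name": "honk", "failurePolicy": "Fail"}
hook_ignore = {"name": "honk", "failurePolicy": "Ignore"}

print(admit([hook_fail], unreachable))    # request rejected
print(admit([hook_ignore], unreachable))  # webhook skipped, request admitted
```

Switching to Ignore only helps when the webhook call itself errors; a webhook that answers and rewrites the object still takes effect, which is worth keeping in mind before treating the policy change as a full fix.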
40:51 Well, I think we've still got something else to find because this is taking too long for... Woah. Yep. Why? This is... I found this by accident. It says, like, deployment in our cluster is forbidden. Not yet ready to handle request. Maybe... okay. I think it might take some time. Or two before... no. It doesn't. Something's not ready yet. I'll give a hint. I'll wait for a minute, and then I'll give a hint. No. Those are validating. Okay. Actually, I just want to check. Okay. kubectl get events, get events -A, in all namespaces. No
41:40 resources found because it's over an hour, I guess. Can we go check out the static manifest for the API server to see if there's any other static admission controllers? Yeah. Checkpoint. Check the time. By the way, he didn't check the time there. But I think maybe he will do the same thing. He will touch them with the old time. I think that's gonna be a common name in all customers. Maybe. Most likely. I know you know him. Okay. So we have RBAC. And... Search for admission. Oh, there we go. Admission. And... Oh, yeah. It's just normal, though.
42:25 Or is it? MutatingAdmissionWebhook. Okay. What else? Okay. So we have NodeRestriction, MutatingAdmissionWebhook, and ValidatingAdmissionWebhook, which is okay. Oh, I think those are on by default. I thought you wouldn't need to list them. I just called... yep. Just read through it and see if you find anything. There we go. Runtime config. What's the runtime config? Admission registration. Okay. Man, you're thinking there. I think we can call it now. Yeah. Yep. So if you either make it true or delete it, the mutating webhook could start working.
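Editor's sketch: the API server's `--runtime-config` flag takes comma-separated `key=value` pairs that switch built-in API groups and versions on or off, which is why setting the entry to true or deleting it restores the API. The snippet below parses such a value and reports what is disabled. The exact value used on the stream is not legible in the captions, so `admissionregistration.k8s.io/v1=false` is an assumption, as is treating a bare key as enabled; the helper names are hypothetical.

```python
def parse_runtime_config(flag_value: str) -> dict:
    """Parse a comma-separated key=value string into {api: enabled}."""
    settings = {}
    for item in flag_value.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Assumption for this sketch: a key without "=value" counts as enabled.
        settings[key] = (value.strip().lower() != "false") if sep else True
    return settings


def disabled_apis(settings: dict) -> list:
    """List the API groups/versions that the flag turns off."""
    return sorted(key for key, enabled in settings.items() if not enabled)


# Assumed example value; not copied from the stream.
flag = "admissionregistration.k8s.io/v1=false,api/all=true"
print(disabled_apis(parse_runtime_config(flag)))
```

With the admission registration API disabled, webhook configurations cannot be served normally, which would be consistent with the "not yet ready to handle request" errors seen earlier in the segment.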
43:28 It's okay. They do this for the beta ones and the alpha ones. Yeah. And... Yep. API server. It didn't do... Yeah. It will take a minute. It will take a bit. Yeah? Yeah. It will take, like, around ten, twenty seconds. Yep. It's back. Okay. Let's try again. There we go. Okay. Thank you. No. Get pods -A. Oh, no. I don't need to. Okay. So it's running. The Postgres one is running. You want to do your... By restarting, you actually missed one thing I put in there, but it's fine. Well, we have an ErrImagePull, so
44:18 it's not vain. Oh, okay. kubectl describe. What? So how do I do this copy? Or it should be clustered six seven. You could, like, select, select, right click, copy, paste. GSHR, Freshbow. Ah. Alright. So we need set our those... Yeah. We started... No. So... There's too much Rick Astley on this episode. Don't know. Lewis gave me a lot of Rick Astley. Yeah. He's a bad influence. Yep. Bad influence. Yeah. Lewis is saying the next episode of Cluster he is involved with will be named Untitled Rawkode. I'm signing up for it. Let's do it.
45:18 So so we should change this image to what? To Rawkode clustered colon v two. Is it where is it? Here? Also in GitHub? Rawk. Yep. Yep. Slash Cluster. Rawkode Cluster. And tag v two. Why did it move? Yep. It happens. But then I don't know why Vim does that. Like, I always have to fix it. Two more? Oh, one more. Two more. It's not fixed yet. Yeah. The image needs to go back. Yeah. One more level back. Vim does something funky when you edit it. Yep. Delete two more spaces so the hyphen is in left the c. Yeah. That's it. Mhmm.
46:02 Yep. Perfect. And you'll need to delete the... oh, I mean, yeah. You'll need to delete that. Oh, you might need to do a port forward on the running one. Oh, did you change the current image as well? Maybe. I thought forward. The Postgres one, you mean? The Postgres one? No. I think... The other one that's running. Right? The one that's running, just do a port forward. Yeah. Yeah. It will not... Then let's come back to the ErrImagePull. Yeah. Let me pull up the terminal. Let's do that port forward so we can see what all this dropped on us.
47:02 This is three seconds. Oh, just a second. Rollout. We need to give it a different version. Was it version two before? Version one before. Version one. Okay. Did you change the port? I haven't changed the port, but I did something. Yeah. Just wait until it fails. You might get some idea. And Darren from Rancher is giving me this idea. From when? Darren. Right? From Rancher. So he was doing a deep dive on K3s, and I learned about that thing, why port forward is not working. Yep. Just hold on a bit so it actually shows you the error. Don't cancel it
48:01 yet. Oh, wow. So the error here, it says that it's the same image as before, but... so we did the rollout. So the rollout should take us to... No. Let's come back to the rollout the next thing. Otherwise, we would actually miss out a thing I did. So let's fix the port forward maybe first. I didn't touch the replica sets. He didn't touch that. Yep. So is this the error we should be seeing? Unable to upgrade connection? Yep. That should give you a hint. Should it? Yep. I can see the error. So how does
48:48 port forward work? Port forward, it's reverse. No? Reverse. Yep. So how does it work? The message we got, I'll just paste here so you can see it. When I do the port forward to my local machine is that. Port forward is basically connecting directly to the IP of the pod. So it will not use a service. It will connect... Like, is it doing the port forward, like, from the host? It could proxy. Right? No? Yep. Just look at the error and the port number. You might get a hint. So 10250
49:32 is the read only port number for the API server? The kube... the... no. For the kubelet. For the kubelet. So this... So how port forward works is, basically, it actually goes to hit that kubelet API on 10250. It uses that port number to do the port forward. This is also how Kubernetes does some kind of help text and stuff. So 10250 should be open. And is it open? That's a question. Well, I'm gonna assume not. So you mean on the kubelet? systemctl. Nothing to do with the kubelet. It's
50:12 just blocking the port and... Probably iptables or firewall. Stock. Are you playing with the iptables and stuff? Or UFW? Do you wanna just stop UFW, and then we can... or we could just check iptables. Yeah. Yep. You're on the wrong host. This is the control plane. It doesn't matter. Yes. Yes. Yes. Yes. Yes. Yes. Yes. Get pods minus. Oh, wait. So which host is it? This host, which is WorkerRQH4M. Okay. There's a session on that device open. You can join it. Okay. I'll just take your iptables command. Oh, look at that. I've got a drop.
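Whether the kubelet port is reachable from another machine, the question asked in this exchange, can be checked with a plain TCP connect. This is a minimal sketch using only the standard library; `port_open` is a hypothetical helper, and the host you pass in would be the worker node's address. A dropped packet typically shows up as a timeout rather than an immediate refusal.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example: port_open("<worker-node-ip>", 10250) from the control plane node.
```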
51:03 Yep. Should I give a hint on how to delete? Or are you... I could just flush it, or did you persist it? No. I didn't persist it, like, just that entry. Yeah. You can just flush it. There we go. You might need to do it on all the worker nodes... Yeah. It's working there. Yeah. Wanted to stay from host controls. Yes. Oh, doesn't it play? That's sad. Oh, it's Firefox. Yeah. Firefox, I don't know why it just seems to struggle. I know you need to operate a new port forward. I don't know why it happens.
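Spotting the drop entry before flushing can be sketched as a filter over rule text in the format printed by `iptables -S`. The function name and the sample rules are hypothetical. The actual rule used in this cluster is not shown in the transcript.

```python
def drop_rules_for_port(rules_text: str, port: int) -> list[str]:
    """Return rule lines that DROP traffic for the given destination port."""
    hits = []
    for line in rules_text.splitlines():
        tokens = line.split()
        if "DROP" not in tokens or "--dport" not in tokens:
            continue
        idx = tokens.index("--dport")
        if idx + 1 < len(tokens) and tokens[idx + 1] == str(port):
            hits.append(line.strip())
    return hits


# Hypothetical rules, for illustration only.
sample_rules = """-P INPUT ACCEPT
-A INPUT -p tcp -m tcp --dport 10250 -j DROP
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT"""
print(drop_rules_for_port(sample_rules, 10250))
```

Listing the matching rule first makes it possible to delete just that entry instead of flushing the whole chain.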
51:51 Like, if Firefox gets stuck... oh, that's sad. Next time another Rickroll, by the way. Yep. But it was another Rickroll. Scream. Okay. So we've flushed the iptables. We can now do a rollout. No. We need to fix this. Why... yep. Try describing it again. That's the old image. Yep. So who is changing the image on the fly? We have the whole CTF team, by the way. We have Lewis. We have Andy. We have Magnum. Nice. So something is changing the image. And it wasn't a mutating webhook because we deleted all of them. Right?
52:54 But it's it's it's still referring to the old system. Oh, no. We don't still need log in. Have Yep. Yep. Just delete it. It's fine. Yeah. We only told it we had allowed it to fail. So that was actually doing two things. You sneak It was actually valid endpoint. It was not, like, something yeah. I actually wrote a webhook. Actually, it's actually very easy to write one. I didn't realize it was very easy to write one. It, like, probably takes, like, fifteen to twenty minutes. Like, you can find samples and just modify stuff. Okay. Now you could probably do the rollout
53:34 restart or edit the deployments. Need to connect. K pirate symbol. What happened there? Oh, I'm trying to get one. Yeah. There we go. Yeah. That was a brittle combination... Strange. Issues there. Although we're still getting our image pull error. Yep. Why? We deleted the mutating webhook. Yep. This is... Now we should have the right image. Okay. But the problem is... Not found. Failed to resolve reference. Okay. So did you play with the containerd configuration? Registries, like... no. You don't have it. Go. You're on the right track. Okay. So you have maybe web web
54:54 what is it called? Okay. Can you get the resources? Let's get the resources. No. No. Go back. Back. You were right on track, checking the containerd stuff. Let's pop open that config. Okay. Config. Because actually, in cluster, they don't have the configuration. Okay. Login. Next. So go into that certs.d. So etc containerd, certs.d, and ls minus m. Six. Man, I think with that. Just check the certs.d thing. Maybe you'll get a hint. Yeah. What was it? The certs.d? Yep. certs.d. Just a second. containerd. The certs... oh, it doesn't exist?
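Checking the certs.d directory for per-registry files, as suggested here, can be sketched as a directory walk. This assumes the containerd layout of one subdirectory per registry host containing a `hosts.toml`; the default path `/etc/containerd/certs.d` and the helper name are assumptions, since the directory is whatever `config_path` points at on a given node.

```python
from pathlib import Path


def registry_overrides(certs_dir: str = "/etc/containerd/certs.d") -> dict[str, str]:
    """Map each registry host directory to the raw text of its hosts.toml."""
    overrides: dict[str, str] = {}
    root = Path(certs_dir)
    if not root.is_dir():
        # The directory may simply not exist on this node.
        return overrides
    for hosts_file in sorted(root.glob("*/hosts.toml")):
        overrides[hosts_file.parent.name] = hosts_file.read_text()
    return overrides


for host, content in registry_overrides().items():
    print(f"--- {host} ---\n{content}")
```

An empty result on one node says nothing about the others, so the same check is worth running on each worker.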
56:12 It does exist. Okay. ls. Not access. cd into it? Yep. That's cd. I am in the right one. I am in the control plane. But I am on the control plane. Yep. So where is it getting pulled? Sorry? Yep. You are right. Yeah. If we go through the worker node that we had opened earlier... there's a GHCR override. Okay. Let's connect to the other one, the worker node. Sneaky. So this came out in containerd version 1.5.5. So this is a new feature in 1.5.5
57:13 where you could actually, like, set capabilities for a registry. So I had to actually upgrade containerd to 1.5.5. And it was just released two days ago, and I was actually using the RC version before it. So check... You're off my Christmas card list though. I don't think he was in. So if you delete this file, containerd would pick it up on the fly, and the image would get pulled. Just delete the file. You don't even need to restart containerd. I delete the... okay. I deleted it already. Yeah. I'm gonna delete the pods, and then,
57:52 hopefully that's the last little surprise Noel has left for us. Are we done, Noel? Yes. Yay. It's up. Alright. Nice one, Noel. But still there, too. I had no idea about that containerd change. No. I didn't know that. I think someone tweeted about it or something, then I saw red with the restarts and found it. Alright. Last cluster then. You two are up. Ah, I think that was... the application is working. I had another idea based on 1.21, and it was a hint from Rory, but I wasn't evil enough to do
58:40 it. You might want to check whether this is also working. The port forward, there's one more screen is here. Okay. So now we connect to another cluster. Yeah. Hold on. What no one wants us to check. I have one more little snarky stuff left there, and you would be good to go. That should be easy to yep. Okay. Back on we go. Alright. Yep. It's a easy fix. So yeah. So Yeah. It's working. I failed to connect. I created that. I clicked this. Let's check the services and endpoints. I'm out of the cluster. Let me connect
59:40 again. This is... no. Join the session. Yeah. Okay. We have an endpoint. Maybe check CoreDNS? Let's... I'll just do a change. It's much easier rather than letting you go down to the epic call. CoreDNS, twenty six hours. Yes. He did change something. Still do it from DNS because it was covered, I mean, in breadth and depth. I didn't touch this? No. You didn't touch that. You just modified the deployment. Maybe. Most likely the CM and... Like, config map. DNS rolling update. Oh, I'm assuming this dog goose shouldn't be here. Goose.
1:00:52 Investigating Pod Pending State (Cilium/Postgres)
1:00:54 Is that the right path, Noel? Yep. That etc Corefile. That's the DNS. Alright. Oh, it's just etc Corefile. There's no DNS in there. Thank you. And if you scroll down, you can actually look at the original Corefile, like, the modified Corefile. Just check the volume mounts. You would get an idea. Alright. Minute, Rawkode. I don't need to even delete it. It doesn't matter. Yeah. It will update that file. Yeah. I see what you did. Okay. Yeah. I just need a kubectl patch and that's it. We may have to rotate the CoreDNS
1:01:38 pods. Oh, it's probably because of the... just delete that pod also because I think we had... we haven't typed on the... yep. CoreDNS. Yeah. It's the CoreDNS. It goes to the volume. Alright. Yeah. That's it, though. We go. Turning one. I think we probably need to, like, delete the pod for it to pick up stuff. I don't know how really the DNS starts working. Tell you what. I'll just delete the whole cluster. Oh, yeah. Crash the... I was actually trying to do the... You know what? I'm just going... we'll just ignore this. Let's move on to the
1:02:00 Cluster by Rawkode
1:02:33 last cluster. Yeah. Let's ignore it. At least that was the issue. It should have fixed it. Alright. We have we have time. Correct? Yes. We have Yep. You you have twenty four minutes to fix my cluster. I've I I didn't make it too elaborate because I wanted it to be a bonus cluster. You know, I wanted to focus all of our time on both of yours. So I don't think it'll take you too long, but if you need any hints, I will I will let you know. I've opened the session on the control plane. You can both
1:03:01 join, and I'll just sit back and laugh. Cluster. Yep. David is going to have a lot of laughs today. This is the first time I've broken a cluster session. Excited. And it's really hard. All the experience of all the bakers. Yes. I did respect Walid's wishes, and I stuck almost entirely to kubectl. Almost. Cool. So my DNS hack. Right? I was actually trying to do a DNS hijacking, but that doesn't seem to work with Cilium. So I just left out that box. That's why I just, like, went with the whole DNS thing.
1:03:46 Alright. Let me connect them. Activity. Activation. What happened? Oh, you might need to just refresh. I did not. Sorry. It's okay. Deshaun is saying if you wanna live on the bleeding edge of containerd, you can use 1.5.5, but you make a cut. kubectl get nodes. Let's see if we have an API server. Yep. And it is timing... Oh, no. We didn't export it properly. Right? Oh, yes. Only at SQL where it is. I did export it, but then you overrode it. Oh, I'm sorry. So we are clustering ourselves here.
1:04:38 Yeah. Okay. Okay. But it takes a while. Yeah. I think okay. Now it's cached now. Now okay. I like being on that site. At one point, I actually thought of doing some network shaping. Takes a while. Then I got it. The second one? To get it fixed. Sorry, Nolan. I was speaking while you're speaking. But it does take a while. Let me remove the cache, and let's see again. Oops. Do you want to take control, or do you want me to take control? You you drive. Go ahead. That's fine. Okay. I just want to remove the cache and
1:05:28 see again. Oops. Go ahead. Go ahead. Okay. And try again, but do it with verbosity if you can do some verbosity. Okay. This is going super fast. I think it's going fine because the response is pretty fast. It must be, like, some... I know. Okay. So why is it ContainerCreating? Can you describe? Know it's like, detect your cuddle get not, and, like, panic forever. Kevin is asking how evil I can be. I haven't been too evil. So it is all kubectl? Almost. Okay. Let's check the network policies. Oh, the API was modified. So at least during case, the same steps
1:05:30 Identifying Cluster-Level RBAC Issue (Can get, cannot describe)
1:06:25 like I did. Yeah. I sent the same steps to my birthday, and I'm one in the future just just for the. Of my test clusters, I actually set the time forward. No no restrictions only. That was a bad idea. Admission. And it looks fine. Can you go down? Lives. Lives. Just a second. It's not lives. Oh. I think it's healthy. It's health. Health. I I don't know. Is this? This is the liveness. And the other one is ready. Ready. Just for rest of the I never changed either of those just for the record. I didn't change either of those.
1:07:22 Really? Oh, yeah. Okay. Okay. Why is this... look at the request. And 02:50PM, that seems fine. It's okay. 02:50 is okay. Oh, the failure threshold is 24? Is that, like, normal? So after 24 failure threshold, we get something. Yeah. I think that's fine. But the pod restart, we didn't see that. I'm assuming there is... okay. Before checking this manifest, let's check if there are restarts or something. Yeah. Yeah. There is. Oh, yeah. House snipled through. The scene's fine for almost everything. Let's get our facts correct first. So we have a ContainerCreating, but it's not creating.
1:08:29 Okay? And I think we have some slowness. Shouldn't we worry? Yeah. We have a slowness, which we should... So we need to figure out the slowness, I mean. Yeah. You can describe the pod. Right? You don't know why it's in ContainerCreating. Yeah. I can do that. Just before that, I'll just check whether etcd is also fine. And hundred m? We can check from the nodes. I think it's a little bit low. Let me just copy to at least, like... I just matter, I guess. I'll just create it without any changes,
1:09:06 and share this Yarekuddle. Run time. And point units. So which problem are you trying to solve just now? That's what I'm saying. We don't know the problem. So the only problem we see is the container are still creating. And I think there is a slowness also. Yeah. I'm working. Yeah. I I mean Let's look at the slowness first. So you have everything running sixteen hours. S three. Sending update to Oh, look. Hi. Our Google namespaces. Nice. Oh, I had it in my notes to actually install XNS controller and do something, but then I thought it's, like, too much work.
1:10:07 Yeah. I think you need to decide which problem you wanna fix. I think you're both working on different problems right now. So pick which one you wanna fix first. So there is a context deadline exceeded that's happening, and it says custom resource definitions hierarchy list would be done. I don't think this is an issue for now. But we probably need to fix the... alright. Oh, okay. He's laughing. It's just nice being on the safe at all. But it's admission control. It's denying all modification. Oh, so it must be that admission control is timing out, and then, like, it's set
1:10:50 up, the failure policy is set to Ignore, and it's actually timing out. So that's why it's taking a little bit there. Can you describe? Sure. Let's... no. Can you describe the pod? Yep. Ah, it's better. At parts, I hope it returns faster. I actually knew that list. Thanks for sharing. Wish it was something better. That slowness is really killing me. Yeah. I think the slowness was a nice touch. I really enjoyed that. Does this happen? Did I... Okay. Can we get events? Oh. I mean, it's a... Oh, it didn't get scheduled. So yes. Let me check if the scheduler is running.
1:11:06 Debugging RBAC Configuration and Roles
1:11:49 Yeah. It's running. You start to get emails. Right? Did you know events are stored as resources in etcd and can be deleted? Yeah. They can be deleted. I think they get deleted anyway, but after an hour or something. There is a time for it. There is a threshold by default. I've not left you any events or logs. Sorry about that. Really? I can send... oh, it's called validating webhook config. Right? validatingwebhookconfigurations. Yep. Can't they have anything like vwc or something? I need to resolve this timing problem first. Yep.
1:12:48 Okay. That's good. Try the mutating. Yep. This is so much fun. It's good to enjoy once in a while. I want to check the logs, Noel. The pod? Okay. Yeah. Let's go. Yeah. David said he deleted the logs, but we can get them easily. Yeah. So we need to restart. Okay. So... That was a cruel touch, I must say, but I enjoyed it. No. It's okay. You cannot get rid of the logs. I think we should just restart the kubelet. This should do the trick probably. Not yet. You know when you restart the kubelet, what happens?
1:13:58 You play the... Ah, if you restart it quite fast, sometimes it would actually cause issues with containerd and, like, give you weird errors. I just learned it yesterday in the talk. So if you keep on restarting the kubelet, containerd will start to, like, give errors. Okay. So things are up. Yes. Things are up, but I don't see logs. They should start logging already unless he changed all the logging. Yeah. Oh, is it the containerd configuration? So where's the configuration for logging? I think it must be the containerd configuration. The back is what is it?
1:14:47 Is it... how do I know the files? The back is query, is it? Oh, I don't know the... You can run containerd config dump to see the config. Yep. containerd. P. A space... Oh, I know. It's just containerd. Yep. Config, config dump. A space, dump. Config default. Oh, you just started containerd. Yeah. Control-C that. You're cool. Yeah. It's config space dump, I think. Space dump. Percent one. Yep. So percent one is the previous process. And it's doing that. It's like time reading the containerd configuration. No. No? There's nothing in regards to
1:15:45 Okay. Run containerd sock. Okay. I want to try something, actually. Find var. Oh, let's check the permissions of that folder, and maybe it's not able to write to that... var log containerd. Okay. That would be too easy now. You said that it would be too. And let's add a... I didn't modify any attributes on that directory. So just the permissions? Yeah. It's just permissions. I'll give you the... you got it. Yeah. So it's not able to write. Yep. Where's the little log. Ah, yep. Yep. It should have w to actually do
1:16:46 something in there. See, sometimes it's the simple things. Yes. We always ignore the simple things. Not yet. Yeah. I think restarting is easier. Not yet. Yes. I got it. Manager is up. API server is up. Yeah. So because the log file can't be created when the container starts, you'll actually need to rotate all the pods. But you don't... Let's say I gave you a hint. I don't think you need the logs to fix the other two problems. Okay. So what are the two problems right now? What is something said about... I need to check.
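The permissions check on the log directory described here can be sketched with the standard library. The helper name is hypothetical and the path is left as a parameter, since the exact directory is not clear from the transcript. Note that `os.access` reports from the point of view of the current user, so running it as root will usually report writable regardless of the mode bits.

```python
import os
import stat


def describe_permissions(path: str) -> tuple[str, bool]:
    """Return the ls-style mode string and whether the current user can write."""
    mode = stat.filemode(os.stat(path).st_mode)
    return mode, os.access(path, os.W_OK)


# Example: describe_permissions("/var/log") might return ("drwxr-xr-x", False)
# for a non-root user; the actual result depends on the system.
```

A directory that has lost its `w` bit shows up immediately in the mode string, which is quicker than waiting for a container to fail to create its log file.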
1:17:22 Finding & Using the Admin Kubeconfig (.honk hint)
1:18:05 I need just to check anything with -v. Let's say... Yeah. That one should have done in the first place. But okay. Accept. This looks fast, and there is no one? Okay. Let's try with 11. 11 is really verbose. Wow. Okay. So account application. Q1820. 5. 9 seconds. Okay. So... Should we check etcd health? I have not touched etcd. Okay. Thanks. I wouldn't tell you. I want to get api-resources. Is it api-resources or get api-resources? Yeah. api-resources. That is also... yep. I want to check these things. What happens? It's taking, like
1:19:41 I think it's taking time. So the API server is, like, taking a long time to respond. One of the reasons could have been the limits, but the limits what are the default settings in there seems fine. So thank god. That was something that I think Jim did. Right? Changing the limits. Which is, what the hell is with the intermittent slowness? Wait. Yeah. No. No. I mean, slowness could be network, could be disk. I'm guessing it's network here, But I wanted to see I wanted to see now it should be cached. So it should be faster. It's not faster while it's cached.
1:19:56 Debugging the Kube Scheduler (Pods pending due to scheduler)
1:20:33 How come? Or some governance, some checking. But we checked the mutating hooks. Correct? We checked the mutating. We checked the validating. Couldn't find any mutating or validating. It might be that, like, a static one rather than... oh, but I don't think we see the mutating or that we should that talks enabled in the manifest, so I think we should be okay. By default, are they enabled? I don't know which version you called it. Oh, yeah. This is something about the HNC, hierarchical namespaces. So the intermittent slowness is the one thing that I didn't do with kubectl.
1:21:30 You promised, man. One thing is and it's not broken, and I finished just slowed you down. That's it. Okay. Okay. Okay. So you're still not I mean, the real problem is the container creating. The slowness was just for my amusement. That was all. Let's see if there are drops. So nothing. Okay. So it didn't put it. So there is an alias. Is this normal, the alias? Yes. Okay. The problem when you said the verbose control get pods, it was there. The verbose, the 11. I I was there with seven too. It's just subtle. Okay. Let's
1:22:24 yeah. Let's go one by one. Yeah. Loaded from /etc. Get API version, namespaces, default. Okay. Request header looks fine. User agent, response status. Now setting didn't show us anything, does it? It is Kubernetes admin conf. Yeah. I wonder if the audience can see it. The port, 6443. This is 6334. Oh. So did you set up, like, an iptables forward? No. It's a bit more sinister than that. Let's look at the manifest and see which port it's listening on. So do we have 6443 or no? No. Let's look at the kubectl manifest... sorry. The API server manifest and see which
1:23:26 port it's also listening on. Because this is working. Right? That's the... Yes. Yes. That's better now. So okay. Now how did you do it? Yeah. That should be some kind of, like, a proxy or something that does support code. Right? Yeah. It's a proxy. Run ps aux. Throttling proxy. It's Toxiproxy with a 40% toxicity on delays of two and a half seconds or longer. What's the name of the proxy? It's called SELinux. SELinux, man. I renamed it. Yes. Yes. Yes. Yes. Actually, we would have caught it if you
1:24:06 listed the process and look because this is a Debian system and there is no way SELinux could be in here. Oh, yeah. I thought it would be a nice hint, but then you got to... like, if you don't look at the... Yeah. You never let it do. Yeah. Alright. There you go. You fixed that. Okay. So this is one issue. Now what's the other issue? The containers. The containers are still... Yep. Loading. See. Okay. So we have the containers still creating, and this is create container error. Kube scheduler. Ah, scheduler. And okay. So the scheduler has an issue,
1:24:18 Identifying Missing Scheduler Role
1:24:51 the controller manager also has an issue. Okay. Let's describe it. So you can ignore the controller manager. I think we fixed that shit. You're right. Could it just... Yeah. Got it. But it's okay. I have to go watch it. Wow. And if Lewis is watching, please don't do this for the next CTF. Yeah. ClusterRoleBinding. ClusterRoleBinding. I have not modified any RBAC policies. Any roles. Oh, you haven't? No. This must be, like, a static admission controller that must be set up in the manifest. Just a second. Just a second. Just a second. Yeah. So the error
1:31:09 kubectl Binary Hijacked (honkctl alias found)
1:31:23 so the error is saying error from server, forbidden, limit ranges case is forbidden. Admission control is denying. It is an admission control. But the thing is that the API server, if we look at admission, I have only NodeRestriction. Okay. NodeRestriction is the kubelet to kube API things. Correct? Isn't the static admission part of the kubelet config? No. It's 20. It's 24... I don't know. Yeah. Yeah. I am not really... I have really old questions. Oh, enable admission plugins, AlwaysDeny. Yeah. So I figured if I just... one
1:32:04 Identifying the AlwaysDeny Admission Controller (Discussing finding the second plugin)
1:32:09 of the cool... well, not cool. One of the annoying things with the API server is you can do these admission plugins separately. And I figured if I just put it below the bottom of the screen, you wouldn't notice the second one. Yes. Yep. Because I checked that first time we opened it, I checked. Okay. Let's give it a minute. Yeah. Still not up. Oh, no. It's restarting. Yep. Okay. Hey. It's up. Yay. Let's get the pods. Let's delete it. I think we should probably just delete. Yep. Yes. Oops. Too many olds. Too many else in
1:32:22 Fixing the AlwaysDeny Admission Controller
1:33:00 old. Yep. Do we have another surprise, or that's it? No. No. That was it. You got them all. That's a nice one. I love that one. Oh. Because it's cascading. You know? But it's strange. It's not always been so it only applies to oh, it only applies to create or update operations. Yep. Because all gets Yeah. So the limit range actually my idea first. Go ahead. Yeah. The the limit range is just one of those resources. I don't think a lot of people know. I think Walid was very fortunate to pick that up. I'm not sure everyone else would have.
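The second admission plugin entry hidden further down the manifest can be caught by collecting every occurrence of the flag rather than stopping at the first. This is a sketch over a plain argument list: `--enable-admission-plugins` and `AlwaysDeny` are real kube-apiserver names, while the function name and the sample arguments are hypothetical.

```python
def admission_plugin_flags(args: list[str]) -> list[list[str]]:
    """Return the plugin list from every --enable-admission-plugins occurrence."""
    prefix = "--enable-admission-plugins="
    return [arg[len(prefix):].split(",") for arg in args if arg.startswith(prefix)]


# Hypothetical argument list, for illustration only.
sample_args = [
    "kube-apiserver",
    "--enable-admission-plugins=NodeRestriction",
    "--etcd-servers=https://127.0.0.1:2379",
    "--enable-admission-plugins=AlwaysDeny",
]
occurrences = admission_plugin_flags(sample_args)
if len(occurrences) > 1:
    print("flag appears more than once:", occurrences)
```

More than one occurrence is worth a second look on its own, whatever the plugin names turn out to be.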
1:33:30 Initial Pod State & Rollout Failure (Forbidden)
1:33:36 What do you mean, fortunate? No. No. I think you were... I think your experience is showing there, that you knew a limit range was a thing. Like, I think many people could have had no idea. The intermittent... It's, that's the OpenShift exam, actually. The limit range is right. It's actually in the syllabus of the OpenShift exam. So I have played with it, but as a curriculum. I never did it in any actual systems or anything. It's anything, but I think no one really uses it. You got a well done from Deshaun
1:34:03 there. So yeah. There was... And when I was actually breaking the cluster, one of my ideas was when you do a GET, I was thinking, like, of doing a GET mutating webhook, but I didn't realize that we couldn't do a GET on the mutating webhook. I was actually, like, showing you the right image, but it was actually running a wrong image. Yeah. I mean, yeah. Let me go with it. One thing that I'm happy today is that we made David happy, as Lewis said. Yep. He has been miserable with this cluster thing,
1:34:37 but now I think today he's happy. You know what? I I think we all did really well. I think it's three broken clusters, all broken slightly differently. We all stuck mostly to the cube control plane, which is nice. We got to explore a bit more depth. I think that was a good job all around. And we didn't really kind of finish it in time too, so it's actually good. Yeah. We're only a little bit over, so all good for me. Yeah. A little bit over. And if it has not been all the stuff I did, I think, like,
1:34:58 Conclusion and Next Episodes
1:35:05 if I just stuck with probably, like, the iptables on the port forward, we would have probably finished it in time. Alright. Awesome. Well, thank you both for joining me for part one of this KubeCon special. There's two more parts. We've got Thomas Stromberg and Chris Nova joining me this evening. And tomorrow, we have Gifei and Chris Carter. Chris Carter. Wait. The second one is today? It is. It is in a couple of hours. Yes. Yeah. So I'm gonna let tomorrow. Cool. I'll let you get back to KubeCon. Thank you for joining me again. That was
1:35:37 absolute fun. I will see speak to you soon. Thank you. Thanks, everyone. And I'm happy Thank you for having us. Bye. Thanks, everybody. Bye bye.