About this video
What You'll Learn
- Install Cilium on a Cluster API Kubernetes cluster, then validate deployment state and initial pod readiness.
- Troubleshoot connectivity and IPAM errors, including range-full conditions, by reading Cilium logs and adjusting config.
- Demonstrate Cilium features like eBPF replacement for kube-proxy, pod-level identity in Cilium Endpoints, and Hubble visibility.
Ilya Dmitrichenko joins David to install Cilium on a Cluster API cluster, debug IPAM and stale CiliumNode CRDs, and walk through eBPF-powered kube-proxy replacement, L7 policy, and Hubble visibility via the Star Wars demo.
Jump to a chapter
- 0:00 Holding screen
- 3:05 Introductions
- 3:06 Introduction and Plan (Install First, Explain Later)
- 4:06 Initial Cilium Installation Attempt (Quickstart)
- 4:30 Installing Cilium with the Quickstart
- 6:53 Troubleshooting Installation Issues (Wrong Cluster, Initial Crashes)
- 8:50 Running the Cilium connectivity tests
- 8:56 Running the Cilium Connectivity Test
- 11:41 Debugging IPAM Configuration (Range Full Error)
- 11:50 Connectivity test failures: IPAM range is full
- 15:00 Changing the IPv4 CIDR
- 15:26 Modifying Cilium Configuration (Changing IPAM CIDR)
- 16:00 Deleting the Cilium pods to force a config reload
- 16:16 Re-applying Configuration and Further Debugging
- 17:56 Connectivity Test Still Failing
- 18:50 Using the Cilium CLI to fetch Cilium status
- 20:20 Let's just delete everything and start again
- 24:28 Switching Strategy: Using Helm for Installation
- 26:26 Attempting Installation on Minikube
- 26:30 Let's try Minikube...
- 30:30 What is the Container Networking Interface (CNI)?
- 30:41 Understanding CNI and Cilium's Capabilities (eBPF, Kube-proxy, L7 Policy)
- 35:00 Advantages of Cilium
- 39:30 Deploying the Star Wars demo to our cluster
- 41:00 What is Hubble?
- 41:17 Introducing Hubble (Visibility Tool)
- 42:00 Back to Star Wars demo
- 42:57 Attempting the Star Wars Demo Application
- 43:30 Debugging Cilium Endpoints / What are Cilium Endpoints
- 45:30 Minikube Troubleshooting (Docker for Mac Driver Issues)
- 45:50 Let's delete Minikube and start again
- 48:00 Now CoreDNS can't get an IP address
- 48:10 Returning to Packet Cluster Debugging
- 49:00 Let's go back to our Packet cluster
- 53:06 Advanced Debugging: IPAM and Lingering State
- 55:00 Deploying Cilium again, this time with Helm
- 58:00 IPAM range is full
- 1:06:00 Let's reboot all our nodes...
- 1:06:19 Troubleshooting: Rebooting Nodes to Clear State
- 1:06:20 Summary of what has gone wrong thus far
- 1:12:14 Cluster Recovery and Post-Reboot Checks
- 1:16:00 Let's delete CiliumNode CRDs
- 1:17:00 Cilium is working ... but DNS isn't
- 1:17:24 Troubleshooting: Clearing Stale CRDs (CiliumNode)
- 1:18:58 Final Cilium Pod Restart and Verification
- 1:21:13 Connectivity Test Results (Starting to Succeed)
- 1:21:54 Re-attempting Star Wars Demo
- 1:25:01 Troubleshooting Demo: DNS Resolution Issues
- 1:27:31 Conclusion and Future Steps (Addressing Remaining Issues)
Full transcript
Generated from the English captions.
3:06 Introduction and Plan (Install First, Explain Later)
3:06 Hello, and welcome to today's session. Today, we are taking a look at Cilium, a CNI implementation for Kubernetes. And I am very fortunate to be joined by Ilya here, who is a member of the team at Isovalent, the creators and maintainers of Cilium. Hello, Ilya. How are you? Hi, David. How are you doing? I'm very well, thank you. I'm doing very well as well. Thank you very much. A lovely autumn day here in London. Well, yeah. It's getting pretty cold up here in Scotland too, actually. I think this is the coldest day we've had yet. Like, I'm
3:40 actually gonna put the heating back on soon. That's where we're getting to now. That's right. Yeah. So I think we're gonna break the formula a little bit today, because I had done a little bit of preparation: I have used Cluster API to spin up a Kubernetes cluster. However, it's currently in a pending state, because a cluster cannot become ready until it has a CNI implementation. So what I wanna do is jump straight in and install Cilium. And then we'll jump back and have a little bit of a conversation around what
4:06 Initial Cilium Installation Attempt (Quickstart)
4:14 Cilium is, why it exists, etcetera. You alright with that plan? Sounds good to me. Awesome. Alright. So that means we need my screen like so. There we go. And we pop over here. So as you can see, I have run kubectl, passing in the special Cluster API kubeconfig. I've run get nodes, and you can see we're still in our NotReady state. So that means I have to install Cilium. Mhmm. But I assume that if I ask you how to do that, you're gonna tell me to go to the documentation. So? That's right.
4:30 Installing Cilium with the Quickstart
5:02 There may be a guide for Packet that you could have a look at. So I would say this is a self-managed Kubernetes. So. You can see I've clicked on a few links already. I think I did. Yeah. And then managed etcd. Alright. So do you recommend Helm? Do you recommend something else? Yeah. So I think you could either go with Helm or the quick start. The quick start is basically just the default Helm setup: the Helm chart rendered with helm template. So, yeah, I think using Helm is fine. I don't think you wanna be on the
5:46 managed etcd page. No? Yeah. You wanna go to quick start. I think quick installation. Yeah. Gotcha. Okay. And I think there are two options. There is a flat manifest and there's a Helm option, from what I remember. There's just a flat manifest. Is there a Helm option down below? No, looks like. Okay. Fine. Let's just try that. That should just work. Just work. That's my favorite sentence. That's the purpose of this flat manifest: it just has all the defaults. Alright. So we have a service account. We have a config map. We have some RBAC
6:26 configuration. We've got a DaemonSet and a Deployment. Oh, a Cilium operator as well. It's Cilium. Yeah. And it goes into the kube-system namespace, normally, like this. Alright. I guess we should validate. Yeah. I like this, actually. I hadn't noticed this previously, but "copy first line" instead of "copy all". That's pretty sweet. I like that. Goodness. Yeah. Alright. So this runs a kube-system get pods. Oh, no. We got a CrashLoopBackOff. Hey. Let's have a look. It should tell us what is happening. Is it just
6:53 Troubleshooting Installation Issues (Wrong Cluster, Initial Crashes)
7:03 because the operator is not ready yet? No. The operator is not required for Cilium to start. Alright. Let's take a look at some... Yeah. I've not run Cilium on Packet. So, you know, it's a thing. Okay. So it's in the initialization. Oh, I'm not passing... Oh, I just deployed that to the wrong cluster. Oh, have you? Deployed that to Docker for Mac. Oh, that's why it's crashing. Okay. Yeah. It does crash on Docker for Mac. Sorry. Okay. So we need my special kubeconfig. Copy. Quick start dot kubeconfig. Is it
7:50 intentional, the dot? Or is it... Oh, no. It's definitely just me. Generally, when things go wrong, it's always my fault. So it should now install to the correct cluster. And so we now need to go to the kube-system namespace and get pods. Right? You need a speech bubble there, David: "It's always my fault." It's always my fault. Yeah. Okay. That's looking better. Right? That says running now. Not healthy, but running. That's progress. Oh, sweet. Yeah. Sweet. Sorry to worry you. No worries. So if you look at the
8:31 logs of one of the Cilium pods, just to verify that it is truly happy. K. Let's grab one here. Yep. Yeah. I'm assuming that looks alright. So I can now run that validate one. Right? Yeah. That's right. So we did that. Okay. Yeah. The connectivity test. Oh, that one. Alright. Okay. So this is applying a manifest? That's right. And it's just a bunch of pods that try to talk to each other. So there's an echo server and some curl-based readiness probes. Alright. I'll trust you. Let's just copy
8:56 Running the Cilium Connectivity Test
9:21 and paste. Only if I remember. But I should just export that environment variable for my kubeconfig, shouldn't I? I'll do it after. Does that apply? Alright. Oh, no. Dot kubeconfig. And that's gonna get annoying quickly, so let's apply. And then... oh, it creates a lot. Alright. And then export KUBECONFIG. There. I've got it in my history already. Okay. Nice. So let's just run get pods, correct cluster. Good. I'm happy. Mhmm. And this is kube-system there. Right? So if you do get pods -A, capital A, yeah. So the default namespace now has those
10:15 connectivity test pods. So we can see that they're still creating. Alright. Let's just run a watch. Yep. So I guess that's just pulling images right now. I'm gonna assume this will hopefully start quickly. Yeah. I think it's fairly quick. Time to drink coffee. There we go. There's one. Alright. So this connectivity test, is it testing network policies, based on some of these names? It just tests connectivity within the cluster and outbound. Yeah. Because I see pod-to-a-denied-cnp. Yeah. I think there's one
11:14 policy tested: CNP is a CiliumNetworkPolicy. And most of them are, you know, pods on different nodes, pods on the same node, pods to Google, pods to some other fairly reliable Internet addresses. Alright. So I'm gonna do one thing, and that's just confirm that these are not stuck pending. Yep. Let's just check that, and then we can actually do the conversational part where we talk about Cilium. So let's just describe that pod. Oh. IPAM failure: range is full. Okay. So let's have a look at
11:50 Connectivity test failures: IPAM range is full
12:03 Cilium pods. Okay. Yeah. So they're running. The operator is also running. Do you want some logs from... What does this error mean? Can we start with that? What does it mean if it's "unable to allocate IP via local Cilium agent: range is full"? Right. So, well, there are multiple different IPAM modes in Cilium. And the quick start picks the default IPAM mode, which I'm just going to double-check now. Or you can double-check it, actually, if you grab that Cilium quick start manifest.
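To make the "range is full" error concrete: in the cluster-pool mode discussed here, each node gets its own pod CIDR and the local agent hands out pod addresses from it. The Python sketch below is a deliberately simplified model under that assumption, not Cilium's allocator, and the class and method names are hypothetical. It shows that the error simply means no free address is left in the node's range, and that addresses which are never released keep counting against it.

```python
import ipaddress


class RangeFullError(Exception):
    """Raised when a node's pod CIDR has no free addresses left."""


class NodeAllocator:
    """Toy per-node IPAM (hypothetical): hand out pod IPs from one node CIDR.

    Illustration only; this is not how Cilium implements its allocator.
    """

    def __init__(self, node_cidr: str) -> None:
        self.network = ipaddress.ip_network(node_cidr)
        self.allocated = {}  # pod name -> assigned address
        # hosts() skips the network and broadcast addresses.
        self._free = list(self.network.hosts())

    def allocate(self, pod: str):
        if pod in self.allocated:  # idempotent for a pod that already has an IP
            return self.allocated[pod]
        if not self._free:
            raise RangeFullError(f"range is full: {self.network}")
        ip = self._free.pop(0)
        self.allocated[pod] = ip
        return ip

    def release(self, pod: str) -> None:
        # A pod whose IP is never released keeps consuming the range.
        ip = self.allocated.pop(pod, None)
        if ip is not None:
            self._free.append(ip)


if __name__ == "__main__":
    tiny = NodeAllocator("10.0.0.0/30")  # only two usable addresses
    print(tiny.allocate("pod-a"), tiny.allocate("pod-b"))
    try:
        tiny.allocate("pod-c")
    except RangeFullError as err:
        print(err)
```

A real /24 node range holds far more than two pods, so hitting this error on a fresh cluster suggests the agent is not working from the range you think it is.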
13:05 Yeah. This one here. Right? Yeah. So if you look for IPAM... The pool? Cluster pool. Okay. Yeah. I think that has certain implications. Like, if you look in the Cilium docs, the IPAM section, I think we basically just need something else for... it's not so much Packet, actually. It would be Cluster API. IPAM. Your Azure page, for some reason? I'll just click on documentation. Yeah. Go away. It didn't find anything for IPAM. Oh, okay. So... Yeah. I'm not sure. Advanced networking, maybe? I've found cluster pool. So, okay. I'm just
14:19 looking at the documentation for cluster pool. So we can check that. Let me... So I've got two IPAM modes here in the docs: CRD-backed IPAM and cluster pool IPAM. Yeah. So let's have a look at cluster pool, because that appears to be the one that we are currently using. Oh, yes. So, yeah, we need to check the cluster pool IPv4 pod CIDR. So if you look back at that manifest... Mhmm. What do we have here? Cluster pool IPv4 CIDR. If you just go back to where I
15:00 Changing the IPv4 CIDR
15:04 found it. So it's 10.0.0.0/8. Okay. And then IPv4 mask size 24. Okay. Well, this should work, unless you think this might clash. That's clashing with the Packet private network. Oh, okay. So if you basically grab this manifest and store it locally, and then... We'll make some changes. Yeah. I think we're having an impromptu debug hackathon. So... That's right. So if I search for IPAM, we're just gonna change this to be 68. Yeah. We're alright with that? Yeah. And /24 for the nodes is just fine. So the second parameter is how much a
16:00 Deleting the Cilium pods to force a config reload
16:00 node gets. So this is good. Alright. So do I need to uninstall, or can I apply this over the top? Apply it and restart the Cilium pods. K. So I can do a kubectl apply for Cilium. Yeah. This is fine. And then you want me to delete and restart those? Yeah. Alright. So kube-system, delete pods. Can I do a label selector? Yeah. So k8s-app=cilium. Yeah. Nice. And I forget whether the Cilium operator has the same label or not, but I think you might want to kick the Cilium operator as well.
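The arithmetic behind these two settings can be checked with Python's standard ipaddress module. This sketch assumes the quick-start values read out above (a 10.0.0.0/8 pool carved into /24 per-node blocks). The provider range is a made-up stand-in, since the actual Packet private network is not shown in the session, and the 192.168.0.0/16 alternative is likewise only illustrative.

```python
import ipaddress

# Values read off the quick-start manifest in this session.
CLUSTER_POOL_CIDR = "10.0.0.0/8"
NODE_MASK_SIZE = 24

# Hypothetical stand-in for the hosting provider's private network.
PROVIDER_PRIVATE_NET = "10.64.12.0/25"


def node_cidrs(pool: str, mask_size: int):
    """Yield the per-node pod CIDRs a pool of this shape can be split into."""
    return ipaddress.ip_network(pool).subnets(new_prefix=mask_size)


def capacity(pool: str, mask_size: int) -> tuple:
    """Return (number of node CIDRs, usable host addresses per node CIDR)."""
    net = ipaddress.ip_network(pool)
    nodes = 2 ** (mask_size - net.prefixlen)
    # Minus network and broadcast; Cilium may reserve a few more for itself.
    per_node = 2 ** (net.max_prefixlen - mask_size) - 2
    return nodes, per_node


def clashes(pool: str, other: str) -> bool:
    """True if the pod pool overlaps another network, e.g. the node network."""
    return ipaddress.ip_network(pool).overlaps(ipaddress.ip_network(other))


if __name__ == "__main__":
    print(next(node_cidrs(CLUSTER_POOL_CIDR, NODE_MASK_SIZE)))  # 10.0.0.0/24
    print(capacity(CLUSTER_POOL_CIDR, NODE_MASK_SIZE))          # (65536, 254)
    print(clashes(CLUSTER_POOL_CIDR, PROVIDER_PRIVATE_NET))     # True
    # Hypothetical alternative pool outside 10.0.0.0/8:
    print(clashes("192.168.0.0/16", PROVIDER_PRIVATE_NET))      # False
```

The second parameter only controls how the pool is sliced per node; it does not help if the pool itself overlaps the network the nodes already live on.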
16:16 Re-applying Configuration and Further Debugging
17:01 Maybe. Yeah. I believe it's two. Yeah. There's just two of them. Alright. So in theory, give that a minute, and we should see some other ones here springing up as the IPAM error goes away. Yep. Let's see how Cilium looks. Alright. Let's just pull down some logs. I think this seriously works now. See that? Yeah. Let's have a look at the connectivity test pods. I think they started coming up. Okay. No? Still failing? They might just need a kick as well, actually. They could be... Oh, yeah. They could be still... It's just
17:56 Connectivity Test Still Failing
18:05 kind of in limbo, as it were. We'll delete the echo pods and we'll see... Mhmm. ...if that little nudge helps. No. Let's describe that. Well... Still the same error. We could always uninstall Cilium and... Mhmm. Reinstall. Yeah. So we're still getting the same "IPAM range is full". How can we confirm that change took place? Yeah. That's a good question. I think if you exec into one of the Cilium pods... Yep. Grab this one. And you could just run the cilium command. Yep. And it has a bunch of
18:50 Using the Cilium CLI to fetch Cilium status
19:19 things. Yes. For example... okay. For example, take a look at node. cilium status, actually. What is it saying? This 10 seems to still be here. That's... And this. Okay. Right. So there's a config map mounted in the Cilium pod. If you look at the mounts... I forget what it was, but there's a config map mount. Yeah. That's /tmp/cilium/config-map. Yep. And what does... yeah. IPAM is still cluster pool. Cluster pool IPv4 CIDR... Okay. Let's just delete the entire thing, and then we'll just put it back up and see what
20:20 Let's just delete everything and start again
20:29 happens. Okay. So let's confirm. It's not quite over yet. Mhmm. Give it a minute. I wish I had a joke to tell. I don't have any jokes today. I did hear good news. You probably saw my tweet, but not everyone watching will have seen it. They found that dogs can detect coronavirus. Oh, wow. Dogs detecting coronavirus. Sounds pretty cool. I mean, can you imagine? Basically, within ten seconds, they can tell. Wow. Yeah. I'm impressed. Yep. I'm pretty sure in the past I've read or seen a story about dogs
21:33 being able to detect cancer as well, and even Parkinson's. Wow. So I guess I'm not entirely surprised, but still astounded. Okay. So let's see if Cilium is back up and running. Not yet. There are some pods that are pending. And what I'll do is grab this network connectivity check. Fail. Okay. And delete those pods and then just redeploy them. I wanna give it the best opportunity I can. Mhmm. Where was that quick start guide? There we go. Yeah. Self-managed, quick installation, scroll down to the bottom. We want this one here.
22:26 So I can do a kubectl delete. Mhmm. I've got the kubectl delete command. They're all gone. Now, should we confirm the IPAM range again? Or are you confident that if I just apply this, we'll make it a bit further forward? Well, let's double-check. Let's just play safe. Alright. So you wanna exec into one of the Cilium pods. Yep. kube-system, -l k8s-app=cilium. Oh, yeah. Yep. And bash. I can then go to /tmp/cilium and the cluster pool IPv4 CIDR. Alright. I mean, that... Okay. That's good. And cilium status?
23:19 Oh. Yeah. That's... Maybe it reads this from... also... okay. Well, I have a suggestion from my colleague here, one option that we could try. Okay. But could you also just list the files in this config map real quick? Yeah. Is there... Enable PPO. Uh-huh. Let's check here. I'm just looking out for native-routing-cidr. It's not here. Shall we switch to Helm now? Because, clearly, the quick start doesn't work in this setup. The quick start definitely works on Minikube, for example, as-is without any modifications. But, clearly, here we'd benefit from using Helm now.
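The check done by hand above (exec into the agent, look under /tmp/cilium/config-map, list the keys) can be sketched in a few lines of Python. A ConfigMap mounted as a volume appears as one file per key, with the file contents as the value. The key names ipam, cluster-pool-ipv4-cidr and native-routing-cidr are assumed to match the Cilium 1.8-era config map used in this session; other versions may name them differently.

```python
from pathlib import Path

# Mount path seen in the agent pod during this session.
CONFIG_DIR = "/tmp/cilium/config-map"


def read_config_dir(path: str) -> dict:
    """Read a ConfigMap volume: one file per key, file contents are the value."""
    config = {}
    for entry in sorted(Path(path).iterdir()):
        # Kubernetes adds hidden "..data"-style entries to ConfigMap volumes; skip them.
        if entry.name.startswith("..") or not entry.is_file():
            continue
        config[entry.name] = entry.read_text().strip()
    return config


def check_ipam(config: dict, expected_cidr: str) -> list:
    """Return human-readable problems; an empty list means the settings look as expected."""
    problems = []
    mode = config.get("ipam")
    if mode != "cluster-pool":
        problems.append(f"ipam is {mode!r}, not 'cluster-pool'")
    actual = config.get("cluster-pool-ipv4-cidr")
    if actual != expected_cidr:
        problems.append(f"cluster-pool-ipv4-cidr is {actual!r}, expected {expected_cidr!r}")
    if "native-routing-cidr" not in config:
        problems.append("native-routing-cidr is not set")
    return problems


if __name__ == "__main__" and Path(CONFIG_DIR).is_dir():
    # Only meaningful when run inside a Cilium agent pod.
    cfg = read_config_dir(CONFIG_DIR)
    for key in ("ipam", "cluster-pool-ipv4-cidr", "native-routing-cidr"):
        print(key, "=", cfg.get(key, "<not set>"))
```

Note that the mounted file only proves what the ConfigMap says; as this session shows, the running agent can still report allocations from the old range.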
24:28 Switching Strategy: Using Helm for Installation
24:28 Helm has all the options, and the suggestion that my colleague just made refers to a Helm value. Alright. So delete, delete. Okay. We're now CNI-less once more. Mhmm. So we're gonna deploy with Helm. Yeah. So, yeah, if you go for... yeah, if you just go with managed etcd, I guess. Right. But you just don't use those etcd flags, basically. You don't care about those. You set version and namespace, and then any values that you might wanna specify. And I think we'd be looking to
25:21 use... Oh, I guess we should grab the default values. Right? That's right. Yeah. Let's just make sure I'm in the right place. Yeah. Oh, there's no values there. Alright. I guess I can just... Yep. Download it. Oh, wait. The values file would be in the repo, maybe. Yeah. So if you go to the... I mean, the default values will be there anyway. So what I would suggest is that... Search for Helm now. Should I just use Minikube? Oh, we could run it on Minikube first. Yeah. Of course. I think it
26:30 Let's try Minikube...
26:32 is possible that the Cluster API setup will need a little bit more work. Alright. Hopefully, that won't take too long. That should just take... oh, do I need to destroy my old Minikube and pass in some sort of flag for the CNI, or just apply something? Good question. I think you might want to... I'm not entirely sure if Minikube enables CNI mode by default at the moment. As in, it didn't used to do that, but maybe that has changed. Let's take a look at the options. Also, you wanna unset your KUBECONFIG. Right?
27:22 Okay. Let's see what we got here. CNI. Yeah. Okay. So I can just do --cni=cilium. You can do that. And if it's an old version, we can upgrade. So I guess it's important to note here, then, that the issues with the quick start YAML are just with Cluster API, and that's not something that people should really expect to happen on another cluster. Well, I was a little bit hopeful in suggesting that. We should make it work. It'd be nice to have a blog post about that. Oh, yeah. We can... we'll have
28:08 that later. Mhmm. So this... Oh, there we go. Okay. So Cilium is now just spinning up. So that's happening. Oh, sweet. Do you wanna check what version we're getting there? Just describe that pod or whatever. Describe pod... Yep. Looks good. I'll grab the image. No. Okay. Yeah. Of course. Okay. Image. Image. Image. We got 1.8.0. Okay. Yeah. We can do a minor upgrade on that, but it's not necessary for the purpose of the demo anyway. Alright. I think we're good. I think we've
29:02 got Cilium installed on our Minikube cluster. Sweet. So you wanna install the connectivity check? Yes. It should still be... nope. Okay. I'll grab that URL again. That's the quick start. Connectivity. Oh, that was over here. So, a question to you. Does the Cluster API provider for Packet support any of the CNIs with, like, just, you know, a network name or whatever? Like, can you specify the name of a network and get it set up? I'm not entirely sure, to be honest. Yeah. I'm not sure. I wonder if this is
29:57 a Packet-specific thing. The fact that it was pulling out that ten zero six makes me think that something's going on there. I mean, I'll look into that after this session. Yeah. Of course. At least I'll probably write about it, for sure, because I'm very curious. My interest is piqued. So this is much better now. It's running through all of these tests. Sweet. I guess it's a good thing that we've not got any CrashLoopBackOff or anything else. That's looking good to me. Yeah. Some of these take a little while
30:30 What is the Container Networking Interface (CNI)?
30:31 to stabilize. So, yeah. Alright. So should we talk about Cilium now, then? Yeah. Absolutely. We got a little bit distracted there, but we are... Okay. So why don't we start with the CNI and what that means for Kubernetes and people using Kubernetes? Yes. What is the Container Networking Interface? Yeah. I mean, there are many, many things I could say about it, but I just remember starting with Kubernetes back in the day, when my first Kubernetes-related project was to make Weave Net, at the time, work with Kubernetes.
30:41 Understanding CNI and Cilium's Capabilities (eBPF, Kube-proxy, L7 Policy)
31:11 And that basically required me to install Kubernetes in a certain manner. Basically, you just had to make Docker networking somehow work for Kubernetes. You had to configure the Docker bridge in a particular way, and then Kubernetes would just leverage that. But that didn't give you enough flexibility, and it was implicit rather than explicit. You implicitly configured the underlying infrastructure to do certain things with regards to networking, and Kubernetes had zero knowledge of what was going on there. So that sort of worked for
31:59 the early days of Kubernetes, but then the Docker plugins proposal had been floated, the libnetwork plugins. And that API had challenges when we tried to integrate it with Kubernetes. It essentially required its own KV store. It was designed for Docker Swarm, and Docker Swarm's KV store, effectively. And this is just a very rough summary. If anybody is hearing this and wants to correct me, please feel free to tweet at me. But this is a very rough summary of the history
32:45 on this matter. And the CNI proposal had been floated by folks from CoreOS. And then it actually got adopted by the wider Kubernetes community, and now all network plugins for Kubernetes implement CNI. It's the interface between the kubelet and the network plugin. So I guess what's important for people to understand here is that CNI gives them flexibility, the option, the choice, whatever's driving them to pick a CNI implementation. And, clearly, they can swap
33:24 them out as they wish. Yeah. It's a little bit challenging at present to swap out the network in a live cluster. So migration is still one of the less mature aspects of the CNI spec. And, you know, CNI is often... I guess that's one of my main challenges: when you say CNI, I'm thinking of the actual spec of the Container Network Interface that I referred to. But people use it much more broadly. And I think you were actually
34:06 hoping that I'd tell you something a bit broader than that. And that's kind of a problem, I guess, that people who've been around for a while have with some of this terminology. Anyway, what I'm trying to say is that most people actually use CNI to mean a network implementation for Kubernetes, and not the interface that I seem to think about every time you mention it. Yeah. The interface itself is too scary for me to try and understand. I've been so distanced from it. I
34:40 mean, my knowledge of it is purely as a consumer and not as a contributor. That's right. Yeah. So Cilium is one of the network implementations. It does implement the CNI interface, and you can install it in pretty much any Kubernetes cluster. You may need to tweak parameters, as we just saw earlier with the Cluster API provider. The main advantage of Cilium is that it relies on the latest and greatest kernel technology, where Cilium hooks directly into the kernel using eBPF, which is, effectively, the kernel extension mechanism.
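For readers who, like David, know CNI only as consumers: the spec Ilya describes is a small exec-based protocol. The caller (the kubelet at the time of this recording, or the container runtime) runs the plugin binary with CNI_COMMAND, CNI_CONTAINERID, CNI_NETNS and CNI_IFNAME in the environment and the network configuration as JSON on stdin, and reads a JSON result from stdout. The toy below follows the shape of a CNI 0.4.0 result but does no real networking; its fixed address and the handle function are hypothetical.

```python
import json

# Hypothetical fixed address; a real plugin would ask its IPAM for one.
TOY_POD_IP = "10.0.0.5/24"


def handle(env: dict, stdin_config: str) -> str:
    """Handle one CNI invocation and return what the plugin would print to stdout.

    A real plugin binary would call handle(dict(os.environ), sys.stdin.read())
    and print the result; this toy never touches a network namespace.
    """
    conf = json.loads(stdin_config)
    command = env.get("CNI_COMMAND")
    if command == "VERSION":
        return json.dumps({"cniVersion": conf.get("cniVersion", "0.4.0"),
                           "supportedVersions": ["0.3.1", "0.4.0"]})
    for var in ("CNI_CONTAINERID", "CNI_IFNAME"):
        if not env.get(var):
            raise ValueError(f"missing {var}")
    if command == "ADD":
        if not env.get("CNI_NETNS"):
            raise ValueError("missing CNI_NETNS")
        # A real plugin would create the interface inside CNI_NETNS here.
        result = {
            "cniVersion": conf["cniVersion"],
            "interfaces": [{"name": env["CNI_IFNAME"], "sandbox": env["CNI_NETNS"]}],
            "ips": [{"version": "4", "address": TOY_POD_IP, "interface": 0}],
        }
        return json.dumps(result)
    if command == "DEL":
        return ""  # success is signalled by the exit status, with no output
    raise ValueError(f"unsupported CNI_COMMAND {command!r}")
```

Everything Cilium adds (eBPF programs, identities, policy) happens behind this narrow interface, which is why the caller does not need to know which implementation is installed.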
35:00 Advantages of Cilium
35:24 One way I can describe eBPF is that, with eBPF, you can basically reimplement the entire TCP/IP stack if you really wish to. So that enables you to do a lot. It enables you to optimize a lot of things. Right? For example, one very cool optimization that Cilium offers is for pods running with Istio: when they talk to their sidecars on localhost, that traffic doesn't have to go all the way down the stack. We essentially short-circuit the localhost-to-localhost communication,
36:07 improving Istio performance dramatically. I don't have figures to hand, but that's just one of the examples that I think illustrates pretty well what sort of things Cilium can do. Okay. That makes sense to me. Yeah. I mean, I guess it's some of it. Sorry, Ilya, go ahead. Some of the other major elements are: instead of relying on iptables or something like IPVS for load balancing of services inside a cluster, we can essentially replace kube-proxy with an eBPF-based implementation as well. Ah, very cool. Which effectively eliminates any use of iptables.
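Why dropping iptables matters at scale can be shown with a toy model. In kube-proxy's iptables mode, service matching walks a chain of rules in order, so the cost grows with the number of services, whereas an eBPF datapath can key a hash map on the service address. The sketch assumes hypothetical service VIPs and backend names and leaves out everything else either datapath does (NAT, conntrack, backend selection).

```python
# Toy model only: real kube-proxy chains and Cilium's BPF maps carry far more state.


def build_services(count: int) -> dict:
    """Hypothetical services: (VIP, port 80) -> list of backend names."""
    services = {}
    for i in range(count):
        vip = f"10.96.{i // 256}.{i % 256}"
        services[(vip, 80)] = [f"backend-{i}-a", f"backend-{i}-b"]
    return services


def rule_chain_lookup(chain: list, vip: str, port: int):
    """iptables-style: test rules in order until one matches the destination."""
    for examined, ((rule_vip, rule_port), backends) in enumerate(chain, start=1):
        if rule_vip == vip and rule_port == port:
            return backends, examined
    return None, len(chain)


def hash_map_lookup(table: dict, vip: str, port: int):
    """BPF-hash-map-style: one keyed lookup, independent of the number of services."""
    return table.get((vip, port))


if __name__ == "__main__":
    services = build_services(5000)
    chain = list(services.items())
    backends, examined = rule_chain_lookup(chain, "10.96.19.135", 80)
    print(examined)  # 5000 rules checked for the last service
    print(hash_map_lookup(services, "10.96.19.135", 80) == backends)  # True
```

Both lookups find the same backends; the difference is that the chain walk touches every rule in front of the match, for every new connection.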
36:54 Well, as a former SRE who has had to deal with iptables and conntrack over my lifetime of administration, I'm happy to see the back of iptables, if that is an option. So... Yeah. That's right. And one of the challenges that iptables has at scale is that it just doesn't perform too well... Yeah, for sure. ...when you're doing network policy or load balancing in a large Kubernetes cluster. Okay. So would you say that, you know, not all CNI implementations are
37:29 alike? Right? I mean, they all have a similar API surface that they conform to, but Cilium offers stuff above and beyond that. That's right. Like those performance benefits you were just talking about. Yeah. That's right. And Cilium does get a little bit into service mesh territory by integrating with Istio, but also, on its own, we have, for example, layer 7 policies. So you could define policies at the HTTP level. For example, you could deny certain requests unless they contain a particular header. Right. Okay. I think I understand that. So
38:10 would you say that the secret sauce of Cilium, then, is eBPF? Is that what's really allowing all of these unique features to exist, that really tight integration with the kernel? Right. Absolutely. Okay. Excellent. Well, I think we're in a position... let me pop this back up on the screen. Oh, nope. That one. There we go. I think all of those tests have run except for one. Oh, we still got a few pending. You know, they actually take a while. So will we just start playing with this and remove the connectivity test? Are you comfortable
38:50 with that? Sure. Yeah. Alright. Look, some of these will be pending because it's a single-node cluster, David. Oh, yeah. Of course. We're running on Minikube. We don't have access to, like, that worker pool. There's a whole bunch of constraints there. That's right. Because these tests are testing node-to-node communication as well. Oh, that would explain why these multi-node ones are probably never gonna be quite happy. So I can do delete pods --all, and I should only... Oh, they'll come back.
39:23 No. You don't want to just delete the pods. Yeah. These are deployments. Okay. Yeah. Didn't notice that. Delete deployments --all. Delete deployment. Alright. Boom. So let me pop open an editor. I'm assuming we're gonna need some YAML here. So... For sure. We probably want something running on our cluster in order to play with it. So there's the Star Wars demo, if you like. Oh, nice. Okay. And how do I find the Star Wars demo? If you scroll down... I think, yeah, if you go to next.
39:30 Deploying the Star Wars demo to our cluster
40:12 Okay. No. It takes you to... that was, like... A tutorial? Maybe. Yeah. For example. Yeah. So "Identity-Aware and HTTP-Aware Policy Enforcement". That sounds fun. So this is Minikube. Right? This is Minikube. Yep. Yeah. And if you just scroll down, this will tell you how to deploy the Star Wars demo. I can see references to that already. Do I need to do this BPF mount? No. I'm assuming that's out of date, actually.
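The HTTP-aware enforcement this tutorial builds up to, and Ilya's earlier example of denying requests unless they carry a particular header, boil down to matching method, path and headers against a list of allow rules. The Python below is a toy evaluator in that spirit, not Cilium's policy engine: the HttpRule fields only loosely mirror the method, path and headers of a Cilium HTTP rule, the paths are illustrative, and the X-Example-Token header is hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class HttpRule:
    method: str  # regex, e.g. "GET" or "GET|POST"
    path: str    # regex, matched against the whole request path
    headers: dict = field(default_factory=dict)  # required header name -> value


def allowed(rules: list, method: str, path: str, headers: dict) -> bool:
    """Toy semantics: once L7 rules apply, a request passes only if some rule matches."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for rule in rules:
        if not re.fullmatch(rule.method, method):
            continue
        if not re.fullmatch(rule.path, path):
            continue
        # Every required header must be present with the expected value.
        if all(lowered.get(name.lower()) == value for name, value in rule.headers.items()):
            return True
    return False  # default deny for anything no rule allows


if __name__ == "__main__":
    rules = [
        HttpRule("POST", "/v1/request-landing"),
        HttpRule("GET", "/v1/.*", {"X-Example-Token": "letmein"}),  # hypothetical header
    ]
    print(allowed(rules, "POST", "/v1/request-landing", {}))               # True
    print(allowed(rules, "PUT", "/v1/exhaust-port", {}))                   # False
    print(allowed(rules, "GET", "/v1/", {}))                               # False
    print(allowed(rules, "GET", "/v1/", {"x-example-token": "letmein"}))   # True
```

The point of doing this in the datapath rather than in each application is that the decision can also take the caller's identity into account, which is what the tutorial's title refers to.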
40:57 So... Alright. So we're gonna take this first line here, which deploys the Star Wars app. Okay. Cool. And there is a warning here at the top. What I'm gonna do is apply that YAML, and then it tells me that if I haven't read the introduction to Cilium and Hubble yet, it would encourage that. So I don't know what Hubble is. So maybe you could answer that question for us. Oh, yeah. Absolutely. We should probably install Hubble, actually. You see, we should have used the Helm chart, which enables us to do more with Hubble as well. Anyway, so,
41:17 Introducing Hubble (Visibility Tool)
41:31 yeah, Hubble offers visibility into network flows inside your cluster, and visibility into, for example, policy verdicts and such things. So you can actually see what is going on, when policies apply, and other such things. It has a CLI and a UI. We can get to using Hubble later on. Let's try the Star Wars demo. Yeah. I think this would be in the default namespace. Yep. It looks like we've got two Death Stars, one TIE fighter, and an X-wing running in our cluster. Cool. And there are probably some services as
42:00 Back to Star Wars demo
42:15 well? Yeah. Okay. Yep. We still have the connectivity test. We have the Death Star service. Okay. Can you reach the Death Star service, or port-forward it? Can do. I get really weird about things lying around that shouldn't be there. Yeah. I know. Right. Okay. So we wanna port-forward the svc to the Death Star. Mhmm. On port eighty eight eighty eight. So this must be a web service. Yeah. Is that correct? Yeah. I think you could try /v1. Yeah. Right. It has a few endpoints. Yep. Cool. Alright. So if you go back to the
42:57 Attempting the Star Wars Demo Application
43:01 to the tutorial Oh, I cannot I was navigating looking for the whole Of course. We'll go back. There we go. Right. Sorry. I won't touch anything else until I That's fine. Thought you tell me. Okay. So we did this step here. We've confirmed the parts of service. And we can take a look at the Cilium endpoints. Yeah. Or you could you could just do you yeah. You can just do Kukulo get CEB for Cilium endpoint. Okay. So get Cilium endpoints. All All namespaces, I suppose. Don't matter. I wonder why. Okay. If you exec into one of the
43:30 Debugging Cilum Endpoints / What are Cilium Endpoints
43:58 Cilium pods and do what the, no, the the command above Oh. It did an exec into the the Cilium pod and did Cilium endpoint Okay. So keep system get pods at IT Cilium v bash Cilium endpoint. Okay. So this is okay. So, well, it doesn't actually look like it looks like something isn't working actually, because we should see the the pods from from default namespace here. Can you do list dash help? Yep. Jason now has okay. So it actually So can you help me understand what's happened here? We have deployed some pods. Now it's Yeah. So for for each of for each
45:13 of the pods, Cilium creates an endpoint object, so it creates an identity for each pod. The step in the tutorial is to confirm that we have these identities, and clearly we don't have them here right now. Unfortunately, I suspect that Minikube installed Cilium but some configuration is missing. I wonder if that was because my Minikube already existed. So I'm going to feel brave, restart it, and redeploy all of that stuff in the blink of an eye. Alright. Yeah. We can chat more about
45:50 Let's delete Minikube and start again
45:59 eBPF and other cool tech. So I'm assuming... I mean, maybe I should have copied that command from the actual quickstart, but I think --cni=cilium should be it. And that brings up one of the challenges I mentioned earlier: installing any network into an existing cluster is a tricky business. Yeah. Most normal customers are not going to be messing around with their CNI implementation once it's up and working. Right. You know, that's just my crazy attitude. So
46:38 let's make sure Cilium is... It is, however, something that has been done. There was actually a blog post recently where somebody migrated. Was it Monzo? I thought it was Sky Bet. Jetstack wrote the blog post. Oh, there must be more than one, then. There was a talk at KubeCon from the Monzo team where they talked about doing a live migration of their CNI implementation. I can't remember what they moved from. I think they moved away from Flannel to Calico, and they did that live without any downtime, which was pretty impressive.
47:17 Yes. So that was one more recent blog post, from a couple of weeks ago, that documented a migration from, I don't remember which provider, to Cilium. It is a multi-step dance, or shall I say a many-step dance. So I'm trying to work out why my CoreDNS has decided it isn't going to run there. I am having the worst luck with my cluster today. Uh-oh. k -n kube-system describe pod. Yeah. That's broken. It's a comedy of errors today. Okay. Maybe... sorry. Can you show me the output of the
48:10 Returning to Packet Cluster Debugging
48:24 Minikube when you started it up? Yeah, of course. So it said something about the Docker driver, and I wasn't entirely sure what that means. Oh, yeah. My Minikube runs on Docker for Mac. Is that a problem? Oh. Why not kind? kind actually works. But that runs in Docker for Mac too, right? Yeah, but we have a guide for that. Okay. Well, do you have a hypervisor available? I'm thinking... I have no idea. I forget. We do have access to this. You know what? There's a kind
49:00 Let's go back to our Packet cluster
49:25 there's a... yeah. We can try and fix that one. Okay. So tell me a little bit about this. This is using IPv4 addresses in the 147 range on the host side, right? So those are the public IP addresses. Every pod over here got the private... oh, sorry. So, yeah, it took a 10.80 address. So 10.80. Yeah. Okay. Oh, can you show me your Cluster API configs? That I can do. Yeah, that'd be handy. Yep. The YAMLs. So that specifies some CIDRs initially, right?
50:16 It does, yes. So if I pull it up... yeah. So the pod CIDR is that, and the service CIDR is 172. So what we can do is just reuse this when we configure Cilium. I thought we did use 192.168.0.0/16. I'll pull up the cilium.yaml here. Yeah, but let's just use the Helm chart instead, right? Okay. Let's take a look at that, then. Yeah. Let me just give you the exact command, actually, because I think I have something handy that might just work. So it would be... well,
51:13 we can use helm template, I guess. Is there a chat window in this? Yeah. On the right-hand side, there's a private chat. Feel free to drop it in there. Oh, we actually have a few comments. So... oh, yeah. So if you try this, let me just drop some of the things that are unrelated. Cluster pool, so we can have a fairly explicit configuration. And did you say it's 192.168.0.0/16? .168, /16. Yep. And we're going to use that for the cluster-pool IPv4 CIDR and the native routing CIDR. Yeah. And then, yes, we can keep the...
52:15 Yeah. And global endpoint routes too. Okay. Let's try this; I think it's going to be a good starting point. Mhmm. Awesome. I don't know about the formatting; you might have to fix it up. Alright. So I've already run this. Yeah. Cool. So I should be able to do this template. It should... Yeah. Well, you want to get rid of the backslashes, or insert... Yeah. Yep. Good idea to save that in a little file. Yep. We'll call this helm-template.sh. Yep. That's our YAML. So we'll call this Cilium
53:00 Helm.yaml. And do you still have Cilium in the kube-system namespace? So let's re-export our kubeconfig. Let's get our pods. Are you in the same directory? Yes. I can do this manually. So, kubectl config. Why is it speaking to 127? Oh, you know what? Your Minikube has updated your kubeconfig. Did it? I suspect so. Let's have a look. Oh, no. Please don't have done that. Oh, we can get it again. Hold on. So I need to... clusterctl.
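David later traces the trouble to an address conflict: the default configuration clashed with Packet's private network. You can catch this kind of conflict by checking the planned ranges for overlap. The Python sketch below uses only the standard ipaddress module.

As far as I know, 10.0.0.0/8 is Cilium's default cluster-pool pod CIDR, but check the values for your chart version. The 10.80.0.0/16 and 172.26.0.0/16 prefixes are placeholders, because the stream only mentions "10.80" and "172". The 192.168.0.0/16 entry is the Cluster API pod CIDR chosen above.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_pairs(ranges: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named CIDRs that share at least one address."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


# Only "10.80", "172" and 192.168.0.0/16 come from the stream; the prefix
# lengths for the first two are placeholders.
network_plan = {
    "packet-private-nodes": "10.80.0.0/16",   # assumed /16
    "service-cidr": "172.26.0.0/16",          # assumed; the stream only says "172"
    "cilium-default-pod-pool": "10.0.0.0/8",  # Cilium's default cluster-pool CIDR
    "capi-pod-cidr": "192.168.0.0/16",        # Cluster API pod CIDR, reused for Cilium
}

print(overlapping_pairs(network_plan))
# [('cilium-default-pod-pool', 'packet-private-nodes')]
```

With these placeholder values, only the default Cilium pool collides with the node network. That is why reusing the Cluster API pod CIDR is the safer choice here.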
53:06 Advanced Debugging: IPAM and Lingering State
54:07 Let's just do it this way. Whoa. It's a lot of stuff. Alright. So there is a guide here, and it has the command I need, which goes into my management cluster and gets me my kubeconfig. We will write this back to that directory. I am on the wrong cluster. Oh, because I'm on Minikube. I'm a major disaster. Alright. So let's change the context to docker-desktop. Now I can get my... I don't know how you keep track. How many clusters do you have, David? One too many, it would appear. I'm just going to drop this shell.
55:00 Deploying Cilium again, this time with Helm
55:05 I think it's just confused. Yeah, it is just confused. Cool. Yeah. Really? Which means I should now be able to get my... your kubeconfig. Yeah, like, kubeconfig. Yeah, which is now my kubeconfig. I'm going to re-export this, and then do get pods. Right. Oh, yeah. And we have no Cilium. We're currently in the CNI-free zone. How did we get back there? Oh, did you delete it in the previous attempt, or is this your backup cluster? No. This is the first cluster we were playing with, and we deleted the Cilium implementation.
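The exact helm template command pasted into the chat isn't shown on screen. As a hedged sketch, the Python below assembles an equivalent argument list for the settings mentioned: cluster-pool IPAM, 192.168.0.0/16 for both the pool and the native routing CIDR, and endpoint routes.

Several details are assumptions:

- **Value keys.** These are based on recent Cilium charts. The chart in the stream appears to nest values under "global.", so run helm show values cilium/cilium for your version before relying on these names.
- **Names and paths.** The release name, the cilium/cilium repo alias, the cilium namespace (created next with kubectl create ns cilium) and the output file cilium-helm.yaml are also assumptions.
- **Native routing.** The native routing CIDR only matters when Cilium runs in native-routing mode rather than tunnel mode.

```python
import shlex
import subprocess
from typing import List

POD_CIDR = "192.168.0.0/16"  # the Cluster API pod CIDR discussed in the stream

# Assumed value keys for a recent Cilium chart; verify with `helm show values cilium/cilium`.
VALUES = {
    "ipam.mode": "cluster-pool",
    "ipam.operator.clusterPoolIPv4PodCIDRList": "{" + POD_CIDR + "}",  # Helm list syntax
    "ipv4NativeRoutingCIDR": POD_CIDR,
    "endpointRoutes.enabled": "true",
}


def helm_template_args(
    release: str = "cilium", chart: str = "cilium/cilium", namespace: str = "cilium"
) -> List[str]:
    """Build the argv for `helm template` without running it."""
    args = ["helm", "template", release, chart, "--namespace", namespace]
    for key, value in VALUES.items():
        args += ["--set", f"{key}={value}"]
    return args


def render(path: str = "cilium-helm.yaml") -> None:
    """Write the rendered manifest to a file (needs helm and the Cilium repo configured)."""
    with open(path, "w") as out:
        subprocess.run(helm_template_args(), stdout=out, check=True)


if __name__ == "__main__":
    # Print a copy-pasteable shell command instead of invoking helm.
    print(shlex.join(helm_template_args()))
```

The printed command can go into a small helm-template.sh like the one saved above. Calling render() writes the manifest only if helm and the Cilium chart repository are available.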
55:52 And I would just confirm. Yeah. Right? So if you do kubectl create ns cilium. Yeah. Sweet. And now try that helm template command. Yeah, you already did that, so just apply it. And now check pods in the cilium namespace. Right. That looks good. Yep. Okay. Are we going to brave the connectivity test again? So if you pop into one of the Cilium pods, let's just check cilium status. Okay. So we want to exec into Cilium. I should have given you this command
56:39 there now. I'm sure it's going to work now. cilium... oh, -it. Yeah. Okay. That means I want to run cilium health status. Was that right? No? Oh, just status. Status. Woo. I think it's looking good. Endpoints unreachable; maybe that's... I'll just note that. The IPAM part looks good. It does look good. Yes. Yes, man. Yeah. We're getting there. Alright. So now the connectivity check. Let's deploy that to the default namespace again. This one. Yes. Is this fish? No. Zsh with a lot of plugins. A lot of plugins. I like mine without any. Cool. Yeah. It looks like...
57:43 so if you now do k get cep in the default namespace, there should be some Cilium endpoints there. No. Not yet. Okay. This is also kind of... Oh, no. Again, really? Is this a new one? 13 seconds. Yeah, it looks like it. Okay. Yes, this is a new one. Yeah. It says the range is full. Yeah. I don't get why that happens. Okay. Let's go back in here. So... Mhmm. When I'm working with Cilium and we have problems like this, we have the Cilium CLI. What can I do with the Cilium
58:00 IPAM range is full
58:41 CLI? We can have a look at cilium status once again. So is it complaining about anything? Doesn't look like it. It looks healthy. Looks like it's using the... I don't know. So what does it mean? So the node is reachable; these are the private IPs of the node. That seems cool, but it can't... because I have one of eight reachable. Does this mean the Cilium agents aren't able to speak to one another? Oh, do you have any security groups in there or anything like that? No. These can all pretty much reach each other. There
59:29 shouldn't be... So, basically, between nodes, all ports are open? That's correct, as far as Packet is concerned. Okay. So inside the Cilium pod, you could also run cilium health. Yeah, I think there's cilium-health, actually; it's a separate command. cilium-health. Sorry. Alright. There we go. Status. Okay. So we've got a couple of issues here. The ICMP stack is fine. HTTP is fine. And then, what is endpoint connectivity? Where is this 10.0.2.x address coming from? Because that's not going to work on
1:00:15 Packet. Yeah. I wonder why this is what we're seeing here once again. So this would be a node IP. I guess, can you do get nodes? You mean kubectl get nodes, right? Yeah. -o wide. Yeah. No. 10.80s. Can we change... I mean, I'm assuming this endpoint subnet CIDR is coming from the Cilium configuration. Can we override that to not be 10? So this is... okay. Oh, these are pods that had been running before Cilium was installed, right? So can you do get
1:01:11 pods in kube-system, -o wide as well? Yeah. So that etcd pod, what about it? Can you show it to me? Is that etcd pod showing up here? Hold on. Let me just do kube-system get pods. Yeah. Okay. etcd... Yeah. And kube-system get pods -o wide. Yeah. Okay. Alright. The one is here. Oh, yeah. Where is this? Yeah. What are all those IPs now? What's the 10.80? 147 is public, basically, is that the case? And 10.80 is your private one, right? Yeah. 10.80
1:02:07 is our private network. 147 is our public network. So you basically have private nodes, public nodes, and a private control plane? Is that... Yes. Okay. Cool. Yeah. So it's trying to hit the... and I guess, yeah. So you're saying that wouldn't work on the Packet network because of...? Oh, so when I said that, my concern was... let me just exec back into the container. When we ran the health check, there was this endpoint connectivity to 10.0.2.x. I mean, Packet's not going to let you hit the TCP stack on its private network range without blocking traffic, because it'll think you're trying
1:02:55 to jump over to other projects or to other machines that aren't under your ownership. I assumed this endpoint connectivity CIDR is something that Cilium creates within the cluster, which I thought may have been causing confusion. But that was me speculating, to be honest. Actually, all Cilium pods want to talk to one another on port 4240, so they're just pinging one another on that port. But where's this IP address coming from? That's what's got me... That would be... I think it would be a node IP for one of the
1:03:32 control plane nodes. No, not in the 10.0.2 range. No? Okay. Let's just have a quick look. I can do that from get nodes. So... yeah. No. 10... 10.80s. Okay. Well, then I don't understand why there is a 10.0. Oh, wait. No. Didn't we see that on one of the etcd pods, or what was that? No, I don't believe so. So there's no such IP address. Yeah, I know. We're not using that; we're using 192, that's right. Weird. Okay. So my instinct was telling me this is
1:04:25 in the Cilium configuration. So I could go to... yeah, here. Yeah, that's right. You can double-check what we've configured. Yeah, let's do this. No, it's not there, right? So one thing you can check once again is cilium status. Yep. Just double-check that it's reporting the IPAM as healthy. Is it? Yeah. It's reporting that two out of 255 are allocated. Right. So there should be plenty of IPs to go around. Okay. So somehow the allocations are not working. So the hypothesis right now is that that
1:05:11 10.0 IP address is some sort of leftover from the previous deployment? I mean, we did reuse a cluster. Oh, yeah, that's right. So yeah, I see that it's actually possible that Cilium stored some state and is just trying to reuse it. Yeah. Because we deleted all the pods, but I'm assuming there are other secrets or config maps. And we ran the cilium command... yeah, exactly. There could be something stored on the machine. So if you run the cilium command once again, I think there was a way
1:05:45 to start a service preflight. Let's see. Clean up... cleanup. Yeah. Do cleanup. You might want to do it on every... On every pod? Okay. Alright. It says it doesn't work from within a pod. If you reboot the machines, that... Uh-huh. Yeah, that should fix it. You know, things are not going to plan, but this is fun. I'm enjoying it. It's quite fun, David. Yeah. You're challenging me, definitely. So we want to reboot every node, right? Yeah. That's what our viewers are saying. So a reboot is probably going to take us around one minute. But just to provide an
1:06:20 Summary of what has gone wrong thus far
1:06:44 idea of what's going on to people who just joined us and are going "what the...": we had a cluster, and we deployed Cilium to it. Yeah. We used the default configuration; we were following a quickstart guide. Unfortunately, that default configuration conflicted with Packet's private network IP address range. So we abandoned our plan and went to Minikube, but then we ran into problems with Minikube because I am running it in Docker for Mac. Because, you know, I should know better. I didn't know better. And I'm putting Ilya here through an awful lot. We've
1:07:19 come back to our Packet Kubernetes cluster, customized the install using the helm command, and templated our own unique snowflake, which should work with Packet's private IPv4 network. Unfortunately, we think the first attempt left some traces of the old configuration on the machines. So we are now rebooting them all, and we should be able to watch them come back via get nodes once one of the control plane nodes is up. So this may just take another moment. Is there anything I missed? Do you think I covered it? This is very good. Yeah. I mean, the
1:07:56 ...just to clarify, David, I haven't installed Cilium on Packet or any other Cluster API-based setup yet. I had been playing with other providers recently, but not Cluster API and not Packet. So here we go. It's been a challenge for both of us, and I think we're getting close. I think we should be able to sort it out before the show ends. Well, I mean, we can always chalk this one down as an oops and revisit this
1:08:38 another day, once I am more familiar with Cluster API, for sure. Yeah. Definitely. And, you know, it'd be nice to have a little blog post or guide on installing Cilium on Cluster API, perhaps on Packet or possibly any other Cluster API provider. I mean, really, what I should have done is configure my Cluster API cluster with a VXLAN on the Packet side, so that we had free rein over our own... you know, carte blanche for our IP address system there. Mhmm. Yeah. But you don't want
1:09:11 to... like, you don't want to tunnel over a tunnel, right? Because Cilium sets up a VXLAN tunnel itself as well. Oh, okay. Right. So I guess if I had used an overlay, I should say, in Cluster API... We still have that config there, right? So, I mean... Yeah. It would have been a simple... Alright. So has anything come back? Are we seeing any machines rebooting in the Packet console? Let's take a look. Well, let's use the out-of-band console and see how far into the reboot this is. YubiKey. Normally, things would be talking to me by now.
1:10:23 Let's jump back down here where I'm getting nodes. Oh, there we go. Okay. So on that machine the reboot was just taking a little bit longer. That's the kernel. Mhmm. I'll leave that out-of-band console running so we get a good idea of when that control plane is happy. That should mean we can start getting a response to k get nodes and, fingers crossed, all the machines come up at roughly the same time. So yeah. And then there's also one more option to try if we are not seeing good results, but I think we
1:11:21 should be fine. I think there are just some odd leftovers from the first attempt at installing Cilium with a configuration that wasn't correct. So does Cilium use custom resources to store state, like endpoints and stuff? Yeah. It does use custom resources, that's right, for a few things. We will need to check those as well, but it does save some state on the machine, as in inside the kernel. When you restart Cilium, it basically doesn't take the network down, right, as most
1:12:09 network plug-ins for Kubernetes. Alright. We are back in business. Oh, sweet. Okay. Cool. Alright. And all of our nodes and things are back now? They are. They're all ready. Oh, that's magic. Yeah. That was a good bet. Alright. Let's go. At least one thing went our way, right? Yeah. We can jump back into one of the Cilium pods. kube-system... I guess the Cilium pods aren't quite there yet. Oh, they're in the cilium namespace now, right? Oh, yeah. That's right. Okay. Cilium. The first one; we want bash. And you want me to run...
1:12:14 Cluster Recovery and Post-Reboot Checks
1:13:00 cilium status? Not quite ready yet. Oh, no. I've just had the... It would seem the Cilium agent is down. Oh, yeah. Okay. Well, wait. Can you describe one of them? Yeah. Okay. Cool. This seems to be maybe one of the... yeah, I'm not sure what happened there. So, cilium, just do this. Yep. Okay. And cilium-health status. Yeah. I think this is looking a little bit better. There's still some odd... oh, no. There are still some 10.0s. Okay. So if we have a look at... oh. I mean, this one, localhost, is so
1:14:10 the problem is going to be connected to this. Yeah. David, what you do want, after installing Cilium into a cluster, is to bring all the pods under Cilium management, right? So I think it's possible that some of those stale entries are due to the fact that those pods aren't fully managed by Cilium yet. So can you just try and kick some of those pods? We did just now, but some of those pods came up before Cilium started, right? That is one of the challenges of
1:14:50 network migration. And drop out of that pod. And, yeah, look in the kube-system namespace. Basically, you can just do kubectl delete pods -n kube-system --all. Never run this in production, anyone. But what you can do... no. What you can do, David, is actually just kick some of these pods first. I'm more than happy to kick them all. Nothing bad's going to happen. Yeah. We just rebooted the whole thing. Okay. But the containers would have been stopped and restarted rather than recreated, wouldn't they? So this will actually properly kick
1:15:31 them. Potentially. Alright. I'm not going to wait for that to come back. We'll just come in here and watch it. I'll come back this way. Yeah. Twenty seconds, twenty seconds, twenty seconds. We've still got a couple of proxies disappearing. CoreDNS. Yeah. I mean, some of them are having some issues. Some of them are in the host network namespace, but, you know, I don't need to worry about filtering those out. So, yeah, it's looking like we're getting some of those pods restarting now. So if you jump back into that Cilium pod and check
1:16:00 Let's delete Cilium node CRDs
1:16:12 cilium-health status again. So let me... So can you do cilium endpoint list now? So what endpoints do we have? Oh, yep. Oh, it's like that part. Yeah. Endpoint list. So there's... what is that one? Yeah. No. It doesn't look like we have... actually, yeah, we have a node, right? And we don't have any pod endpoints here. Alright. So where does Cilium store the IP addresses of all the agents? Oh, yeah. In the CiliumNode objects. Let's have a look at those. So if you look at... oh, yeah. Well, you can
1:17:00 Cilium is working ... but DNS isn't
1:17:05 have a look here. Node list, yeah, for example. Yeah. So these are wrong, right? Yeah. These are wrong. What? Yeah. That is a problem. One of them is fine, but the other ones are old. So we probably want to try and clear some of those CRDs. I think that's where it's gone wrong, man. So if I do get... Oh, have a look at the... yeah. Not cep. No, do ciliumnodes. Yeah. So we just delete all of them, right? I think it should be fine. Yeah.
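The breakthrough here is spotting that most CiliumNode objects still carry addresses from the first install. With cluster-pool IPAM, the operator carves the pool into per-node blocks (/24 by default) and records each node's block on its CiliumNode, under spec.ipam.podCIDRs.

The Python sketch below makes that comparison explicit. It assumes you have already pulled those CIDRs into a dict, for example from kubectl get ciliumnodes -o yaml. The node names and CIDRs in the snapshot are made up for illustration.

```python
import ipaddress
from typing import Dict, Iterable, List


def node_blocks(cluster_cidr: str, mask_size: int = 24) -> List[ipaddress.IPv4Network]:
    """Split a cluster pod CIDR into equal per-node blocks (a simplified model)."""
    return list(ipaddress.ip_network(cluster_cidr).subnets(new_prefix=mask_size))


def stale_ciliumnodes(pod_cidrs_by_node: Dict[str, List[str]], pool: str) -> List[str]:
    """Names of CiliumNodes holding any pod CIDR outside the configured pool."""
    pool_net = ipaddress.ip_network(pool)
    return sorted(
        node
        for node, cidrs in pod_cidrs_by_node.items()
        if any(not ipaddress.ip_network(c).subnet_of(pool_net) for c in cidrs)
    )


def orphaned_ciliumnodes(ciliumnodes: Iterable[str], k8s_nodes: Iterable[str]) -> List[str]:
    """CiliumNodes that no longer have a matching Kubernetes Node object."""
    return sorted(set(ciliumnodes) - set(k8s_nodes))


# Hypothetical snapshot: one node already re-registered from the new pool,
# two still carrying /24 blocks from the default 10.0.0.0/8 pool.
snapshot = {
    "control-plane-0": ["192.168.0.0/24"],
    "worker-0": ["10.0.1.0/24"],
    "worker-1": ["10.0.2.0/24"],
}

print(len(node_blocks("192.168.0.0/16")))             # 256
print(stale_ciliumnodes(snapshot, "192.168.0.0/16"))  # ['worker-0', 'worker-1']
```

In the stream they skip the check and simply delete all the CiliumNodes (kubectl delete ciliumnodes --all). They then restart the operators and agents so fresh objects are created.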
1:17:24 Troubleshooting: Clearing Stale CRDs (CiliumNode)
1:17:56 The agents should recreate them, I would imagine. Yeah. It's not like a production cluster or anything, right? We're kind of learning. I think this is a cool learning experience. Seriously, I am learning a lot about debugging this stuff. Okay. I have one node come back. Okay. I mean, not just the agent pods. Would that help? I do get pods... I mean, what if I just restart all the operators, rather? It's okay. Do you want me to restart the operators? Why not? I assumed, I guess incorrectly, that CiliumNodes would be created by each of the agents.
1:18:44 Would they not manage their own node? I think it's the operator, from memory. Again. Okay. So now we have one CiliumNode. That's brilliant. I'm going to delete it. I'm just going to... Yeah. Let's delete all the Cilium pods. Yeah. Okay. So now I want to see the pods coming back, I want to see that cilium node list looking really good, and I want this thing to work. Okay. Alright. Oh, sweet. Okay. Good. Yeah. Okay. And now back into any of the Cilium pods, and just double-check what we are seeing.
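Ilya's advice is to make sure every pod ends up under Cilium management, because pods that started before the agent may have no CiliumEndpoint. CiliumEndpoints are named after their pods, so the check reduces to a set comparison. Given the pod list and the CiliumEndpoint names (for example, extracted from kubectl get pods -A -o json and kubectl get cep -A -o json), it lists non-host-network pods with no matching endpoint.

A minimal sketch is below. The input shape is a simplified, hypothetical one rather than the raw Kubernetes JSON, and the pod names are made up.

```python
from typing import Iterable, List, Set, Tuple

PodKey = Tuple[str, str]  # (namespace, name)


def unmanaged_pods(pods: Iterable[dict], cep_keys: Set[PodKey]) -> List[PodKey]:
    """Return pods that should have a CiliumEndpoint but don't.

    Each pod is a simplified, hypothetical dict with "namespace", "name" and
    "host_network" keys. Host-network pods share the node's network namespace,
    so Cilium does not create an endpoint for them and they are skipped.
    """
    missing = []
    for pod in pods:
        if pod.get("host_network", False):
            continue
        key = (pod["namespace"], pod["name"])
        if key not in cep_keys:
            missing.append(key)
    return sorted(missing)


pods = [
    {"namespace": "kube-system", "name": "etcd-cp-0", "host_network": True},
    {"namespace": "kube-system", "name": "coredns-abc12", "host_network": False},
    {"namespace": "default", "name": "xwing", "host_network": False},
]
ceps = {("default", "xwing")}
print(unmanaged_pods(pods, ceps))  # [('kube-system', 'coredns-abc12')]
```

You can recreate any pod this reports with kubectl delete pod -n <namespace> <name>, so that it comes back under Cilium. This is the same "kick the pods" approach used in the stream, just targeted.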
1:18:58 Final Cilium Pod Restart and Verification
1:19:32 Cilium, yeah. And then cilium-health status and node list. Just looking good, David. Yeah. We're going to fix this. Alright. Okay. Oh, this is much closer, right? There's some sort of happiness here. Where are those... oh, wait. Okay. But this is probably the... That's better. This is good. Yeah. This is at least trying to use existing IP addresses and not non-existent ones. Okay. Cool. But there's a connectivity issue between private nodes and public nodes, right? So you're saying that the private nodes... public
1:20:18 nodes wouldn't be able to reach private IPs? I don't understand, sorry. So you said there are two types of nodes in this cluster. You said there are public... Every node has a private IP and a public IP. Oh, okay. Sorry, I misunderstood that earlier. Okay. So I think this is probably fine, and it's definitely better. And there's no confusing stuff in here that I can see anymore. And it seems like there are more entries now? Yeah. And so, in the default namespace. Yeah. So
1:21:07 you can now see these echo-b pods from the default namespace; we see the Cilium endpoints associated with them. So those... yeah. Those tests are all actually running now. Oh, sweet. Okay. And some of them are even succeeding. They are. Yeah. They're getting there. So the connectivity tests now seem to be doing what we expected. You almost have the notes for the blog post now, David, which we can review. Okay. That's definitely way better than earlier. So shall we be brave and deploy the Star Wars demo? We can just
1:21:54 Re-attempting Star Wars Demo
1:21:58 oh, look. There's a CrashLoop there: multi-node headless. Okay. Anyway, yeah, we can try the Star Wars demo. Why not? But you can also just try kubectl get cep and have a look at whether there are any Cilium endpoints, just to understand that. They're there. Yeah. Sweet. And you could basically inspect some of these and find out some of the information that Cilium stores about each of the pods, and confirm this whole idea of pod identity. Yeah. So what do I have
1:22:38 here? What's important? Okay. So ignore the managed fields bit. Alright. So it's just a representation of... Alright. Okay. ...of a pod. There's an ID assigned to it and a set of labels, and then you can use these labels to express policy. Okay. So I think we have a mostly working configuration, then. I think there are a few things we'll need to do some more homework on, but this is pretty good for a first-time install on a new provider. Oh, yeah. Definitely. Like I said, this has
1:23:29 been very valuable for me, just to learn what things are doing. Mhmm. So I'm going to deploy the Star Wars thing again, because I'm just... Yeah. I'm feeling really brave now after all of that. I can't feel any braver than that. Alright. So what does the Star Wars demo do? It's just a couple of pods. The idea is that there's a Death Star thing that has an API, and you can destroy the Death Star by using the API. But you can implement a policy that seals the endpoint
1:24:22 that allows you to destroy the Death Star. And you could override the response, for example. So it's like a potentially vulnerable service that you can protect through policy. Okay. So, quickly scanning this page on the demo: it's saying that any ship, meaning any pod labeled org=empire, is allowed to communicate with the Death Star. Yeah. Which means if I run this command... Mhmm. Oh, no. That's not going to work because of CoreDNS. Oh, no, CoreDNS is running there. Oh, why did that not... Oh, great. Sweet. Okay. And it's all in the default namespace now, right?
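The demo rests on two ideas, label-based identity and label-based policy, and both can be modelled in a few lines. In Cilium, pods with the same security-relevant labels share one numeric identity; in the real system those labels also include the namespace. Policy then selects endpoints by label. The Cilium Star Wars tutorial's L7 rule lets org=empire ships POST /v1/request-landing to the Death Star, while requests such as PUT /v1/exhaust-port are refused.

This Python sketch is a simplified model, not Cilium's policy engine:

- the identity numbering is arbitrary;
- paths match exactly, whereas Cilium's HTTP rules use regular expressions;
- it only returns allow or deny.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Labels = Dict[str, str]


class IdentityAllocator:
    """Toy model: one numeric identity per distinct label set."""

    def __init__(self, first_id: int = 1000) -> None:  # arbitrary starting value
        self._next = first_id
        self._ids: Dict[FrozenSet[Tuple[str, str]], int] = {}

    def identity(self, labels: Labels) -> int:
        key = frozenset(labels.items())
        if key not in self._ids:
            self._ids[key] = self._next
            self._next += 1
        return self._ids[key]


@dataclass
class Rule:
    endpoint_selector: Labels               # destination pods the rule applies to
    from_labels: Labels                     # sources that are allowed in
    port: int
    http: Tuple[Tuple[str, str], ...] = ()  # (method, path) pairs; empty = any request


def _matches(selector: Labels, labels: Labels) -> bool:
    return all(labels.get(k) == v for k, v in selector.items())


def allowed(rules: List[Rule], src: Labels, dst: Labels, port: int, method: str, path: str) -> bool:
    selecting = [r for r in rules if _matches(r.endpoint_selector, dst)]
    if not selecting:
        return True  # no policy selects the destination: traffic is allowed
    for r in selecting:
        if r.port != port or not _matches(r.from_labels, src):
            continue
        if not r.http or (method, path) in r.http:
            return True
    return False  # selected by a policy but no rule matched: default deny


DEATHSTAR = {"org": "empire", "class": "deathstar"}
TIEFIGHTER = {"org": "empire", "class": "tiefighter"}
XWING = {"org": "alliance", "class": "xwing"}

RULES = [Rule(DEATHSTAR, {"org": "empire"}, 80, (("POST", "/v1/request-landing"),))]

print(allowed(RULES, TIEFIGHTER, DEATHSTAR, 80, "POST", "/v1/request-landing"))  # True
print(allowed(RULES, TIEFIGHTER, DEATHSTAR, 80, "PUT", "/v1/exhaust-port"))      # False
print(allowed(RULES, XWING, DEATHSTAR, 80, "POST", "/v1/request-landing"))       # False
```

Once a rule selects the Death Star, anything that rule does not explicitly allow is denied. The TIE fighter, which no rule selects, still accepts traffic. This mirrors Cilium switching an endpoint to default-deny once a policy selects it.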
1:25:01 Troubleshooting Demo: DNS Resolution Issues
1:25:15 No. It's in kube-system... is it? No, it's in default, right? The Star Wars demo is in the default namespace, David? Yeah. So I'll just exec into this pod. Yeah, let's do that. Go into the X-wing. I always liked them. I'm going to check whether DNS is working, then. So that's our default.svc.cluster.local. No, but that would be a service IP; you can't ping it. But I just wanted to see if it would resolve. Yeah. I can use... Oh, yeah. Alright. I know, I get you. I think... Now let's look it up, maybe.
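The DNS checks that follow (look at resolv.conf, then try a lookup) can also be scripted. The small Python sketch below has two helpers:

- **parse_resolv_conf** pulls the nameserver and search entries out of resolv.conf text.
- **check_names** tries to resolve each name and records None when resolution fails.

The resolver is injectable so the logic can be tested offline. By default it is socket.gethostbyname, which follows the pod's normal resolver configuration. The nameserver IP in the sample text is a placeholder, not the address seen on stream.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def parse_resolv_conf(text: str) -> Tuple[List[str], List[str]]:
    """Return (nameservers, search domains) from resolv.conf content."""
    nameservers: List[str] = []
    search: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            continue  # blank line or comment
        parts = stripped.split()
        if parts[0] == "nameserver" and len(parts) > 1:
            nameservers.append(parts[1])
        elif parts[0] == "search":
            search = parts[1:]  # a later search line replaces an earlier one
    return nameservers, search


def check_names(
    names: Iterable[str], resolve: Callable[[str], str] = socket.gethostbyname
) -> Dict[str, Optional[str]]:
    """Resolve each name, mapping failures to None instead of raising."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:  # socket.gaierror and socket.herror are OSError subclasses
            results[name] = None
    return results


sample = (
    "nameserver 10.96.0.10\n"  # placeholder service IP
    "search default.svc.cluster.local svc.cluster.local cluster.local\n"
    "options ndots:5\n"
)
print(parse_resolv_conf(sample))
# (['10.96.0.10'], ['default.svc.cluster.local', 'svc.cluster.local', 'cluster.local'])
```

Inside the X-wing pod, you would feed it the contents of /etc/resolv.conf and names such as kubernetes.default.svc.cluster.local. Suppose every name comes back None while the nameserver matches the kube-dns service. That hints the problem lies between the pod and CoreDNS rather than in the pod's resolver settings.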
1:25:56 The Death Star. Can you ping your DNS? Can you resolve Google or whatever? If I remember right, the IP address for cluster DNS is 1000986. Why have we got... oh, yeah. The service is still called kube-dns. So, yeah, I can't... But it's a service IP; you can't ping it. Can you check resolv.conf? So this is the nameserver, the same one that we just tried. Okay. And if you try an nslookup for kubernetes... right. DNS is definitely broken. Somehow DNS is broken. Alright. Let's not try and fix it now. No. No.
1:27:17 I think you definitely need to investigate that. But one thing I'd recommend is to try and somehow hook the Cilium installation directly into the Cluster API provisioning phase, right? That way you could get Cilium installed from the get-go, and no funny business, hopefully. Well, I think I'll make that my homework for the next week or so: really dig into this and make sure I can get Cilium working properly with Cluster API. I think that's really important, especially with the power and the features that Cilium brings to the table as the CNI
1:27:31 Conclusion and Future Steps (Addressing Remaining Issues)
1:28:00 implementation. I really do want it to work well on Packet with Cluster API. So I will continue to fight the good fight here and get a nice blog post out, making this easy for people to consume and use. I just want to thank you for your time. I know that was a bit of a slog, and there were a lot of things thrown at you that we had to try and debug. But you know what? That's computers. Things don't always work, and you've got to be prepared for that battle. So I really
1:28:28 appreciate your help and perseverance as we tried to get that working. It was good to see you, David. Yeah, you too. So thank you very much for watching. Thanks. Thanks. Yeah. Thank you for your time, and we'll definitely do a part two of this once I iron out the kinks of the Cluster API implementation, and we'll see what happens then. So have a great day, everyone, and thanks again, Ilya.