Watch / Klustered Live

About this video

What You'll Learn

  1. Trace an application update failure in cluster 17 by isolating a container runtime error during pod startup.
  2. Find and stop a suspicious node-level debugger pod while validating other deployment and networking signals.
  3. Resolve cluster 18 by detecting CoreDNS NXDOMAIN behavior and removing a broken mutating webhook.

Marcos Nils joins the show to debug two broken Kubernetes clusters: cluster 17, broken by Sascha Grunert (a rogue node debugger pod and a containerd 'honk' error, with BPF the suspected cause), and cluster 18, broken by Billie Cleek (a CoreDNS NXDOMAIN rule and a malicious mutating webhook).

Chapters

  1. 0:00 Viewer Comments
  2. 1:23 Introductions
  3. 1:24 Introduction & Show Overview
  4. 2:56 Introducing Co-Host Marcos
  5. 3:47 Starting Cluster 17 Troubleshooting
  6. 3:50 Kluster 17 - Broken by Sascha Grunert
  7. 4:43 Initial Cluster 17 Checks
  8. 7:37 Cluster 17 Application Upgrade Failure (v2)
  9. 8:33 Debugging OCI Runtime Error ('Honk')
  10. 10:03 Investigating Cluster 17 Configurations
  11. 19:08 Investigating Node-Specific Issues
  12. 21:58 Discovering Rogue 'Node Debugger' Pod
  13. 24:04 Examining Rogue Pod Manifest
  14. 25:33 Debugging Inside Rogue Pod
  15. 26:40 Finding Suspicious Host File
  16. 27:50 Analyzing Rogue Killing Script
  17. 28:29 Stopping Rogue Service & Pods
  18. 32:16 Retesting Application Upgrade (Cluster 17)
  19. 33:15 Cluster 17 App Works, 'Honk' Remains Mystery
  20. 46:40 Switching to Cluster 18
  21. 46:55 Kluster 18 - Broken by Billie Cleek
  22. 47:16 Cluster 18 Initial Diagnosis (API Server Down)
  23. 49:08 Debugging Control Plane Components (Cluster 18)
  24. 54:39 Cluster 18 Application Networking Issue
  25. 55:48 Debugging Networking from App Pod
  26. 56:47 Identifying DNS Issue (Cluster 18)
  27. 57:35 Checking CoreDNS Configuration
  28. 58:30 Found: CoreDNS NXDOMAIN Rule
  29. 58:47 Fixing CoreDNS
  30. 1:00:59 Discovering Mutating Webhook Issue
  31. 1:05:40 Deleting Problematic Mutating Webhook
  32. 1:06:03 Verifying Cluster 18 Control Plane Health
  33. 1:08:17 Cluster 18 App Connectivity & Upgrade Test
  34. 1:08:40 Cluster 18 Resolved
  35. 1:10:00 Kluster 17 Revisited
  36. 1:10:13 Revisiting Cluster 17 ('Honk' Mystery)
  37. 1:12:46 Searching for 'Honk' Binary/Config
  38. 1:15:58 Debugging Containerd on Worker Node
  39. 1:21:45 Inspecting Containerd Configuration
  40. 1:24:22 Cluster 17 Conclusion (BPF Suspected)
  41. 1:27:18 Wrap-up & Thank You
