Restrict Access to Secure Files with Tetragon


Overview

About this video

What You'll Learn

Write a kprobe tracing policy for sys_write with the correct syscall arguments.
Use selectors and matchArgs to filter file access by kernel path arguments.
Enforce secure-file rules in-kernel with the Sigkill or GetUrl actions.

Write Tetragon TracingPolicy CRDs with kprobes to observe file access in a Kubernetes cluster, then filter on sys_write and paths under /etc using matchArgs, and enforce policy in-kernel with the Sigkill and GetUrl match actions.


Transcript

Generated from the English captions.

0:00 Introduction to File Access Enforcement with Tetragon

0:00 Hello, and welcome back to the Rawkode Academy. In this video, we're following on from our installation and overview guide of Tetragon, an eBPF security and observability tool for your Kubernetes cluster. In this video, we're going to go a little bit deeper than the installation and a general overview, and we're gonna look at one of the main use cases for adopting Tetragon within your clusters, and that is runtime enforcement and observability of file access across your container workloads and the host they run on. So let's get into the docs and have some fun. Alright. Here we are at the Tetragon documentation

0:41 Exploring Tetragon Concepts and Documentation

0:44 or at least we will be now. Now, there are a few things I want to cover before we get to the use cases, and the documentation itself covers file name access and what we can do with it well. But there is a little bit of vocabulary that you have to be comfortable with, and a few concepts that you need to know, before you should really start writing tracing policies. Fortunately, the documentation has us pretty well covered too. So if we go to Concepts, Tracing Policy, we can now begin to dive into what a tracing policy is composed of.

1:11 Understanding Tracing Policies and Hook Points (kprobe, uprobe, tracepoint)

1:18 So the first thing that every tracing policy needs is a hook point. A hook point can be one of three things. First, a kprobe. This is a kernel probe that allows you to hook into functions called within the kernel. Second, we have uprobes. These are user probes. Go figure. And these allow you to hook into any function called within a userland binary. And lastly, we have tracepoints. These are more generalized points within the kernel that you can hook into that are stable across kernel versions. Like the documentation says, going from kernel version to kernel version,

2:01 kprobes are not guaranteed to be consistent, but tracepoints are. And we'll take a look at each of these as we explore how to do file access observability and runtime enforcement. And we're gonna kick things off with the kprobe. Okay. So let's write our first tracing policy using a kprobe. Now when you're working with a new Kubernetes resource, it usually helps to find some examples online or just use kubectl explain followed by the name of your resource. Here, the tracing policy tells you that we need an apiVersion, kind, metadata, and spec, but, of course, you

2:40 Using kubectl explain for Tracing Policy Structure

2:50 already knew that. But at the top, we do get the group, the kind, and the version that we need to successfully write the start of our manifest. So let's do apiVersion cilium.io/v1alpha1, kind TracingPolicy. Like everything, we need a metadata with a name. I'm not fussed what that is. We leave it like so and then we need something. So let's jump back and do kubectl explain again. Only this time, we'll add spec. We can see all of the properties that our spec can take, and we're curious about kprobes. So we'll

3:34 expand that like so. Now we can see that this is a list of kprobes with args, call, return, returnArg, returnArgAction, selectors, and syscall. So let's start to fill this out. We know that we have kprobes. We know it's a list. It needs some sort of call. Well, the call is the name of a function to apply the kprobe spec to. This just means the name of the kernel function or syscall that you wanna monitor. Now we're doing file access stuff today, so we're gonna take a look at sys_write.
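At this point in the walkthrough the manifest is only a skeleton. A minimal sketch of it could look like the following. The metadata name `sys-write` is an assumption (the video treats the name as arbitrary), and the kprobe entry is still incomplete until the syscall flag and the arguments are filled in.

```yaml
# Partial TracingPolicy as built so far in the walkthrough.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: sys-write        # assumed name, not taken from the video
spec:
  kprobes:
    - call: sys_write    # kernel function or syscall to hook
```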

4:08 Finding Syscall Function Signatures

4:14 And this is a syscall, so we're gonna mark this as syscall: true. We know this because we can see here that we can specify a boolean to indicate whether this is a function or a syscall, with syscall being true. Now, unfortunately, this is not enough for us to start tracking and observing the syscall with Tetragon. We have to provide the args, which is a list. Let's jump back to the terminal. And we can see the args here need an index and a type. These are the two required properties that we need, and this has to describe the function signature

4:55 for the kernel function or syscall that you want to observe. Now assuming you probably haven't memorized this stuff, you can look it up in the man pages if you want, or it is available on any good man page website like die.net. And here we can see the function signature for the write syscall. So let's just paste that here for now and work this out. So when we did kubectl explain, it said that we need an index and a type. The index just maps to the position of each argument in the function signature. Meaning, we have an index one

5:25 Mapping Syscall Arguments in the Policy

5:39 and of course zero and index two, because we have arguments zero, one, and two on this function signature. Now, the type for the first argument is int because we have an int. The type on the second is char_buf. Now the C programming language doesn't really have any concept of strings. It has arrays of bytes, characters. When we see this, we do need to provide a little bit of extra detail so that Tetragon understands the size of that argument, which is where sizeArgIndex comes in. This is the index of count, the size_t.
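Putting the argument mapping together, the policy might look like the sketch below. The signature in the comment is the one from the write(2) man page, and the metadata name is an assumption. One detail worth checking against the Tetragon documentation for your version: as far as I know, `sizeArgIndex` counts arguments from 1 rather than reusing the zero-based `index`, so `3` here points at `count`.

```yaml
# ssize_t write(int fd, const void *buf, size_t count);
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: sys-write            # assumed name, must be DNS compatible
spec:
  kprobes:
    - call: sys_write
      syscall: true          # sys_write is a syscall, not a plain kernel function
      args:
        - index: 0           # fd
          type: int
        - index: 1           # buf
          type: char_buf
          sizeArgIndex: 3    # buffer size comes from count (third argument, counted from 1)
        - index: 2           # count
          type: size_t
```

Saved to a file, this would be applied with `kubectl apply -f` followed by the file name.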

6:34 And index two is of type size_t like so. And now we have enough of a tracing policy, which I don't know why I called tracepoints. Let's call this syswrite. In fact, it's not syswrite. It's sys, underscore, write. So let's fix that while we're at it and we'll even rename the file just because these things will bug me. Like so. Now we have a tracing policy that will tell us every time sys_write, the syscall called write, is executed within our kernel. So let's check. So we jump over to our terminal. We're going to do apply on that

7:12 Checking Tetragon Logs for Events

7:17 sys_write file, where I got a property wrong. Let's do the explain again: sizeArgIndex, extra s, and apply. And, of course, resource names have to be DNS compatible. Voila. Alright. So now we have a tracing policy. Wonderful. So the next thing we want to do is start pulling out all of the logs on our Tetragon pods so that we can see the feedback and observe the syscalls within our cluster. So if we do a kubectl -n kube-system get pods, you'll see that this is a three node cluster because we have a DaemonSet for Tetragon, which has three different pods.

7:55 Setting Up Log Tail and Test Pod

8:09 So we're gonna get the logs for everything so that we don't have to chase the logs that we're looking for. And to do that, we'll just do a kubectl -n kube-system logs -f with the label app.kubernetes.io/instance=tetragon. Now if you remember from the first video, by default, Tetragon will show you all the process execs and exits. So we're just gonna skip past that so that we can start to filter on the logs that we care about. So I'm gonna split this pane and we're gonna apply my sleepy YAML. The sleepy YAML just runs Ubuntu with a

8:56 sleep infinity, but it does mean that we can exec into Ubuntu and run bash. So as you can see, even by just exec-ing into this container, our logs are already filling up again. So let's just clear that up. And now we're going to run echo hello. And let's write that to this file here. And we can actually see we're already getting process_kprobe events here for sys_write because, of course, not all writes happen against the file system. So let's write that file. Now we executed a write, but there's so much going on. We're actually getting a lot

9:38 Analyzing Basic Sys_write Log Output

9:43 more writes than we can fully understand, and it's actually gonna be rather difficult to find and pinpoint the one event that we want because, let's jump back to here, when the write syscall is called, it writes to a file descriptor rather than a file name. So we're not even gonna be able to see that we wrote to something called DEF because at the syscall level there's no such thing. What we can see, and why don't we just grab one of these and throw it into a buffer. And we'll save this as output.json and let VS Code do

10:22 Detailed Examination of a Log Entry (JSON)

10:29 some formatting. So we can see here that we have some sort of exec ID. Now this looks like a base64 encoded string. Whenever you see one of these, we can decode it, and we see that this happened on one of our nodes. No surprise. We have the PID, the UID, the current directory, the binary that executed it, and then we get our logs enriched with container and Kubernetes information. Actually, that is pretty sweet. So we can actually see here from Tetragon that this pod was the Ubuntu one in the default namespace with this image that started

11:10 here, its PID is here, it has these labels and so forth. So we're already getting a lot more information about what actually happened in the system. And if we scroll down, we can actually see the parent process, which in this case is runc and containerd, which is responsible for executing a workload. Now down here, we have the information that we matched against in our tracing policy. We can see the syscall, which is sys_write. Now because these are architecture dependent, Tetragon has detected the architecture where the pod is running and prefixed it to get us

11:48 the right syscall. Thank you, Tetragon. We can see that we are writing to file descriptor two and here's our bytes. Again, we have a base64 encoded string. Whenever you see one of these, decode it. And we can actually see that we didn't catch the write that we did with the echo; instead we caught the prompt being written to my terminal. We then have the action and the policy name. So we need to add a little bit more information to our policy so that we're not getting bombarded with all of this noise. So let's dive into selectors.

12:29 Filtering Events with Selectors

12:30 Okay. Let's jump back to our terminal, and we'll close down our Ubuntu for now, pull up our explain, and look at kprobes. We can add one more property, which is our selectors. And this allows us to match on actions, args, binaries, capabilities, capability changes, namespace changes, namespaces, PIDs, and return args. Now we're not gonna sit and dive through lots more explain commands to understand this. The Tetragon documentation is rather good, so it would be a shame not to use it. Let's jump over here and we can see that we have all of the documentation on

13:09 the selectors that we can apply to our tracing policy. So in order to look at matchArgs, I've decided to fetch a rather more complete YAML from the Tetragon examples directory. You can find this on github.com/cilium/tetragon and then find the examples folder. Now this is a really cool example because it has more kprobes. We're now using kernel functions rather than just syscalls, and it already has selectors across each of these that it's matching against. And we're gonna take a look at matchArgs first. So here, we're looking for the security_path_truncate kernel function.

13:13 Using matchArgs for Path Filtering

13:55 Now this has one arg and a return arg, but how do you get these? Right? There won't be a man page for every kernel function. Well, that's where Sourcegraph comes in if you wanna make this a bit easier in your life. You can use Sourcegraph to search the entire Linux code base. We can say type:symbol and copy the function name. Now we can see the header where this is defined and the function itself. In both cases, we see the return value is an int and it takes some sort of path as its only argument.
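The mapping from that kernel signature to a kprobe entry can be sketched as below. This is a fragment of a TracingPolicy spec modeled on the upstream file monitoring example rather than a copy of the file used in the video, so treat the exact set of fields as an assumption.

```yaml
# int security_path_truncate(const struct path *path);
spec:
  kprobes:
    - call: security_path_truncate
      syscall: false         # a kernel function, not a syscall
      return: true           # also capture the return value
      args:
        - index: 0
          type: path         # struct path *, reported as a file path
      returnArg:
        index: 0
        type: int
```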

14:02 Finding Kernel Function Signatures (Sourcegraph)

14:38 As such, our arg is a path and the return value is an int. Now because this takes a path, we can filter using matchArgs on the first parameter, the only parameter, and say Equal, /etc/passwd. So let's apply this to our cluster and pull up our logs. Let's get rid of all this. And then instead of splitting this vertically, let's just jump over to the side where I also have an Ubuntu bash in this cluster. And let's check our path, /etc/passwd. Well, let's open Vim and add bash one two three. Now if we pop over here, let's grab

14:56 Testing Path Filtering with /etc/passwd

15:25 one of our kprobes that we can see like so. Let's put this into our output.json so that we can format it. We can see here, if you scroll to the bottom, that our security_file_permission function was a match for our /etc/passwd file. So let's tweak our security_file_permission selectors. And instead of just filtering on /etc/passwd, let's do a prefix for everything under the /etc directory, like so. We can then come back across here, apply, get our logs, clear the screen, and this time we'll write to

16:11 Testing Prefix Matching with /etc/lsb-release

16:22 lsb-release. Pretend this is Ubuntu 25. Now we can pop over here and we'll see that we still have a kprobe JSON, like so. And back to our JSON and paste. And if we scroll down, it is our security_file_permission, this time against /etc/lsb-release. So our prefix is working. Let's take this one step further and block any write access to this directory. So back on the selector documentation, let's go to match actions. From here, we can do a whole bunch of things, but the first thing I wanna show you is Sigkill.
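The selector exercised by this prefix test can be sketched as follows. It is a fragment that sits under a single kprobe entry whose argument at index 0 carries the path; which kprobe it is attached to in the video's file is not fully visible, so that placement is an assumption.

```yaml
# Fragment: goes under one entry of spec.kprobes.
selectors:
  - matchArgs:
      - index: 0             # the path-carrying argument
        operator: Prefix     # the earlier exact-match test used Equal with /etc/passwd
        values:
          - /etc
```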

17:09 Adding Sigkill Action to the Policy

17:15 This just means stop the process. Now this example is fortunately exactly what we want to do. So let's copy our matchActions here and go back to VS Code. We come down to the end of our selectors and add our matchActions, saying that we want to kill the process rather than letting it succeed. So let's apply this to our cluster, assuming we formatted it correctly, and get our logs. So let's open passwd, and it is killed before we even do anything. Perfect. Okay. So the last thing I want to show you is one more action.
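Adding the kill action to the same selector might look like the sketch below. It reuses the /etc prefix filter from the earlier step and assumes, as before, that index 0 is the path argument of the kprobe the selector is attached to.

```yaml
# Fragment: goes under one entry of spec.kprobes.
selectors:
  - matchArgs:
      - index: 0
        operator: Prefix
        values:
          - /etc
    matchActions:
      - action: Sigkill      # kill the process that triggered the match
```

Because the action is tied to the selector, only calls that match the /etc prefix are affected; other calls to the hooked function are left alone.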

18:20 Implementing GetUrl Action

18:25 There's a bunch here that we could work with, but I wanna take a look at GetUrl because I think it's quite fun. So let's copy this and come back here. Now instead of killing this, let's use the GetUrl action, and instead of going to ebpf.io, let's create a box, a request box. This gives me a URL like so. We'll paste this in, go to the terminal and apply. Alright. Let's pull up our logs, not that we really need them this time, and we delete our swap file, and we get access to the file because this time we

19:09 Testing GetUrl Action (Access Allowed, URL Triggered)

19:18 didn't do a kill. So if we come back over to our box and hit refresh, you'll see that we got an HTTP request, multiple in fact, letting us know that somebody tried to open a path within that prefix. Now a simple GET request in this fashion is not that immediately helpful, but it would allow you to kick off some sort of Slack driven workflow that posts a message saying that one of your security profiles failed and maybe you want to start looking at your logs. Now, of course, there's other ways to do that, but it's still

19:33 Potential Use Cases for GetUrl Action

19:53 fun to think about what-ifs. And I think GetUrl is a what-if. So that is our look at file access permissions with runtime enforcement using Tetragon. I feel like we've barely scratched the surface. There are so many awesome things that we can do with Tetragon. And as much as this may inspire you to go and play with it, and I definitely encourage you to do so, please remember that security is an onion. It has many layers. You're not gonna be able to secure your systems with 30 lines of YAML. You need to layer on various syscalls and

20:00 Conclusion, Security Layers, and Next Steps (Process Life Cycles)

20:28 functions and remediation patterns in order to fully protect and observe your system with a tool like Tetragon. But the best time to start is now. So get it installed. Start observing special paths within your cluster and then decide how to act. Best of luck. We'll be back for part three of this course as we explore process life cycles. I'll see you then.
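For reference, the URL-notification variant demonstrated near the end of the video could be sketched as below. The URL is the documentation's ebpf.io example that the video mentions replacing with a request-inspection URL; the selector placement and the index are the same assumptions as in the earlier steps.

```yaml
# Fragment: goes under one entry of spec.kprobes.
selectors:
  - matchArgs:
      - index: 0
        operator: Prefix
        values:
          - /etc
    matchActions:
      - action: GetUrl
        argUrl: http://ebpf.io   # placeholder; swap in your own endpoint
```

Unlike Sigkill, this action does not stop the process, which matches the behavior seen in the video where the file access succeeds and the endpoint receives GET requests.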


Documentation

Tetragon Tracing Policy Concepts

linux.die.net Man Pages

Code

Tetragon Tracing Policy Examples
