About this video
What You'll Learn
- Deploy Pixie Vizier on Kubernetes with the etcd operator.
- Write PxL scripts to query namespaces, HTTP events, and schema.
- Build Slack alerts from Pixie data using the Go client.
Natalie Serrino walks through Pixie, the eBPF-powered Kubernetes observability tool. We deploy Vizier with the etcd operator, explore PxL scripts, trace HTTP and gRPC traffic, view flame graphs, and wire up a Slack alert via the Go client.
Jump to a chapter
- 0:00 Holding screen
- 0:51 Introductions
- 0:53 Introduction and Housekeeping
- 1:38 Guest Introduction (Natalie Serrino)
- 2:49 What is Pixie? Zero Code Instrumentation via eBPF
- 2:50 What is Pixie?
- 4:15 How eBPF Powers Pixie
- 6:28 CNCF Sandbox Project
- 6:33 Pixie Open Source & CNCF Sandbox Project
- 9:13 Deploying Pixie
- 13:23 Pixie Architecture Explained (Edge Storage, PEMs)
- 17:18 Running Scripts in the CLI & UI
- 17:22 Running First Pixie CLI Query (`px script list`, `px namespaces`)
- 19:33 Introduction to PxL (Pixie Language)
- 20:50 Exploring the Pixie UI and Default Scripts (`px cluster`)
- 21:59 Pixie Stores Data on the Cluster
- 24:40 Inspect a Script
- 28:44 Flamegraph (Continuous Profiling)
- 29:51 Viewer Question: Profiling Impact and Performance
- 32:09 Supported Protocols (HTTP, gRPC, DBs, TLS, etc.)
- 32:27 Protocols Traced by Pixie
- 35:31 NetFlow Graph and Network Traffic Visibility
- 35:35 Network Flow Graph Script
- 39:56 HTTP Data Script (Full Request & Response Body)
- 40:05 Viewing Raw HTTP Request Data (`http_events`)
- 40:50 Editing a Script: Filter HTTP Requests for Errors
- 40:51 Filtering & Aggregating Data with PxL
- 47:30 Learning PxL and Discovering Schema (`px schema`)
- 47:52 Viewer Question: Go eBPF Contributions
- 49:08 Using the Pixie API to Create Slack Alerts
- 49:11 Client APIs and Automating Queries
- 50:49 Setting up an Error Alert via Slack Bot (API/SDK Demo)
- 59:00 Reviewing the Error Alert PxL Script
- 59:01 Data Source Tables
- 1:01:47 Workflow Summary: Build Scripts in UI/CLI, Automate with SDKs
- 1:03:01 Running Custom Scripts from CLI (`px run`)
- 1:03:53 Interactive CLI Exploration (`px live`)
- 1:04:16 Wrap Up
- 1:05:20 Wrap-up and Further Resources
- 1:06:10 Viewer Question: Self-Hosting Pixie
- 1:06:38 Conclusion and Farewell
Full transcript
Generated from the English captions. Timestamps jump the player to that moment.
0:53 Introduction and Housekeeping
0:53 Hello and welcome to another episode of Rawkode live. I am your host Rawkode. Today we are going to be taking a look at Pixie, an open source tool for zero code instrumentation of Kubernetes and your applications. Before we take a look at that, there's a little bit of housekeeping. If you're not already subscribed to the YouTube channel, I would encourage you to do so now. Click subscribe and click the bell and you will get notifications for all new episodes of Rawkode live. Every week, I try to explore as much of the cloud native ecosystem as I can to bring you all of
1:23 the knowledge that you need to learn this mess together. And of course, we have a Discord server available at rawkode.chat. If you wanna chat about Kubernetes, cloud native, rock music, or anything in between, come and say hello. There's over 400 of us in there now chatting away, having some fun. Alright. Now, Pixie. In order to guide me today, I'm joined by Natalie Serrino from the Pixie team. Hi there, Natalie. How are you? Hey there. Doing great. How are you doing? I'm very well. Thank you. Thank you for joining us today. Can you just take a
1:38 Guest Introduction (Natalie Serrino)
1:53 few moments to share a little bit about you and tell us who you are? Yeah, totally. So I am Natalie. I am an engineer on the Pixie team. Pixie was acquired by New Relic in December, so I'm part of that company as well. Before I came into the observability space, I was in the data space. So I guess that's my little micro background, and I wanted to go look for interesting data problems, and I think observability is a great example of one of them. Awesome. Yeah. There's definitely a lot of data in the observability space, especially as we
2:31 are starting to see more teams and organizations adopt microservices with distributed tracing; that data is growing at crazy rates. Yep. And like most data problems, you know, collecting the data is half the battle. Yes. Definitely. I mean, let's just start talking about Pixie because I'm really curious. This has been one of the most requested technologies that I've been asked to cover on this show. So I'm really excited to bring this to people, but it says here on the website that we can instantly debug applications on Kubernetes and it doesn't require any instrumentation.
2:50 What is Pixie?
3:06 Do you wanna share a little bit about that? Yeah. I think that that's a really exciting thing about Pixie that drew me to working on the project in the first place because I think that, you know, historically, there's been a lot of challenge with collecting the data that we need in order to successfully debug our production systems. And, you know, what I and the other members of Pixie found in the past is that debugging your developer environment is one thing, but debugging a production environment is completely different. And it feels like you're flying blind a lot
3:42 of the time, and we'd like to have access to a lot of the same data that we can easily get in our development environments. So I think that was kind of one of the initial inspirations for it. And, you know, this coincided with the, you know, kind of popularization and, you know, adoption of eBPF in the Linux kernel, popularized, like, in large part by Brendan Gregg who has done a lot of amazing work in the space. And I guess just a little bit of background on eBPF for people who, you know, aren't as familiar with it.
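The hook model described in this part of the conversation can be pictured with a toy registry. This is only a plain-Python illustration, not eBPF: real eBPF programs are verified bytecode attached to kernel probe points, and every name below (`ToyKernel`, `attach`, `fire`, the event names) is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Toy stand-in for kernel hook points. None of these names come from a real
# eBPF library; they only illustrate "register a function, the kernel runs it".
Handler = Callable[[dict], None]


class ToyKernel:
    def __init__(self) -> None:
        self._hooks: Dict[str, List[Handler]] = defaultdict(list)

    def attach(self, event: str, handler: Handler) -> None:
        """Register a handler to run whenever `event` fires."""
        self._hooks[event].append(handler)

    def fire(self, event: str, **details) -> None:
        """Simulate the kernel reaching a hook point."""
        for handler in self._hooks[event]:
            handler({"event": event, **details})


kernel = ToyKernel()
observed: List[dict] = []

# The observer only appends to a list; the "application" calls below are unchanged.
kernel.attach("file_open", observed.append)
kernel.attach("net_recv", observed.append)

# Application activity, unaware that anything is watching.
kernel.fire("file_open", path="/etc/hosts")
kernel.fire("net_recv", bytes=512)
kernel.fire("packet_drop", reason="no route")  # no handler attached, so nothing is recorded

for record in observed:
    print(record)
```

The point of the sketch is the separation: the code that generates events never changes, and observation is added purely by attaching handlers.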
4:15 How eBPF Powers Pixie
4:18 It's basically, I think of it as a way of registering hooks in the Linux kernel. So anytime you open a file or anytime you drop a packet or anytime, you know, you receive a request on the network, all these things you can put hooks into and, you know, tell the kernel to run a function that you register with it. And that actually has huge, huge benefits for things like security, where you can use it to put in really advanced networking rules, but it also has great implications for observability because it basically gives us a handle on the
4:58 total state of your system without having to put any code changes or anything like that in to collect that data. And we found that we can collect a very rich set of data and we're only scratching the surface. We can use eBPF and Pixie to automatically collect full-body requests made anywhere in your cluster, you know, DNS stuff, you know, MySQL, Postgres, and also things like network statistics and system metrics. So we're really excited because in, you know, basically what you'll see as a pretty quick deploy process, you basically get all that information immediately when it would have
5:39 previously taken months to get. Yeah. Definitely. I think eBPF is just one of the coolest technologies that we have right now. There's so much cool stuff you can do with it with regards to these probes and the kernel hooks and such. I had never really thought about it before as an application for collecting telemetry without having to modify my code, but you know what, now that I've seen Pixie's website and I've heard you talk about it, I'm like, yeah, why wouldn't we do this? Because, you know,
6:09 our application is an interface to the kernel; we already know what's happening at some level within that stack. There's metrics from it and other things. Okay. Also, I should really just show my screen, shouldn't I? Let's pop that up. There was a, oh, no. It's gone. There was a banner at the top that says that Pixie is moving to the CNCF as well. So Pixie is completely open source and soon to be a CNCF sandbox project. Is that correct? Yeah. I'm really glad you brought that up
6:33 Pixie Open Source & CNCF Sandbox Project
6:41 because it's, you know, recent news and we're so excited about it. So I guess just a little history: Pixie was originally a startup. And then when, you know, we decided to partner up with New Relic and, you know, join them, you know, we were really excited that they shared our vision of reaching even more developers than before. And, you know, in this space, you know, open source is really important for people. People really like to build on top of open source stacks. And so we were really, really happy that they shared our vision of turning
7:16 Pixie into an open source project. And with the CNCF, with the amazing caliber of projects that they have in, you know, in the CNCF, we're so thrilled to be a sandbox project. It was, you know, made formal a few weeks ago that they've decided to accept us as a sandbox project. And so we're really excited to have, you know, like, tighter relationships with those projects and also just show our commitment to remaining a truly open source project. Pixie is entirely open source and licensed under Apache 2.0, and it's gonna stay that way. So that's our commitment to our
7:54 users that that is what the situation is gonna be going forward, and joining CNCF is a good way to show that. Alright. So everyone watching today has the opportunity to learn Pixie, use it in their own applications and never have to pay a dime, which I think is pretty sweet. And they don't have to use New Relic, just to clarify, right? Like you can use Pixie without using New Relic at all. So New Relic are essentially just helping Pixie grow, not integrating it into their product or something like that? Like Pixie is its own thing, right? Yeah. It's a good question.
8:29 So, you know, if you were wanting to use the purely open source, self-hosted version of Pixie, all of those components are available and we have docs on how to get started with that. We also have a version of Pixie where the cloud component is hosted by us, with a very, very generous free tier where, you know, if you want to do a little bit less legwork in getting started, then you can, you know, have us host the cloud side of that. Now New Relic has really awesome integrations with Pixie, but Pixie is its own thing. So
9:09 you don't have to use New Relic to use Pixie. Awesome. Okay. Should we get this installed into our cluster? Let's do it. Alright. Now I can already tell from the homepage here, we have every developer's favorite curl bash. Is that the way that you must install today? Yeah. I mean, if you go into our docs, you can see that there's other ways to install as well for people that prefer other modes. But I think that yeah. If you go to install schemes, and you can just click on the, like, a parent of that, like, schemes.
9:13 Deploying Pixie
9:45 Oh, yeah. Never mind. It just did what? But we have helm based and, you know, YAML based installs as well for people who prefer to go that route. But I think that for the purposes of this, I think that I recommend starting with the bash based install. Yep. You don't have to convince me. I'm so Okay. So, this is obviously running on my local machine. So, is this going to install Pixie CLI to my machine? Is it gonna install something to my Kubernetes cluster? What should I expect here? So what this is gonna do is install
10:20 the Pixie CLI locally to you and you're going to authenticate with oh, I, we use Gmail or G Suite for authentication in this version, so hopefully that's okay and you have a personal Gmail account you can link it to. Yeah. No problem. So it's gonna install a CLI and then basically ask you to sign up. Oh, you'll wanna sign up actually unless you've logged in before. Oh, yeah. Yeah. Okay. Good catch. Yeah. Yeah. Sign up with Google. And then if that's a good Gmail for you, then that's great for us. Oh, maybe you have actually signed up in
11:03 the past. Oh, no. Okay. Sorry. It just lagged. Okay. It looks like it worked. Okay. Sweet. Yeah. It may be okay. And so, you know, basically, the Pixie CLI has a lot of stuff that you can do. The first thing that most people do is install Pixie itself onto their Kubernetes cluster. But before you do that, I just wanna point out that we have, like, two modes in which we can install Pixie. One way, you know, is kind of the default, and it uses persistent volumes and PebbleDB as the backing store. But we also support
11:42 a version that uses the etcd operator, which doesn't require persistent volumes. So I think that for this cluster, what we might wanna do is use that version of Pixie. Yeah. Okay. Happy with that. Because I think that you mentioned that the persistent volumes may need to be set up. So, let's just skip that and just go straight to it. Yeah. I'm okay with that. This is a bare metal cluster, and I do provision Rook on them, but it's definitely not something I'm happy to try and debug live. So if there's a no PVC
12:18 option, I'm definitely gonna grab that. Yeah. So if you do `px deploy -h`, then we can see the deploy options. So what you have there is all the options for, you know, the `px` command. So we can see that `-o` gives us "use etcd operator", and that's the one that we are, you know, looking at right now. So let's see what happens. Okay. So that's all I need. It's just that flag to use the operator. Nothing else is required? Yeah. It's gonna try to deploy to your local Kubernetes context, your current
12:51 context. And so if you need to switch context, then, you know, that would be something good to do. Nope. I am on my Pixie one three node cluster, so we should be okay. Okay. Great. Okay. So this is okay because this is a bare metal Kubernetes cluster, which should still work even though it's giving you a scary warning message, and we like to ride dangerously. Alright. So just say yes. Life on the edge. Life on the edge. Quite literally because Pixie actually collects and stores all of its data on the edge rather than shipping it off
13:23 Pixie Architecture Explained (Edge Storage, PEMs)
13:31 to a cloud. Ah, okay. So it's creating a namespace, looking for a previous installation, I guess, and cleaning up. Now we're getting some secrets and some config maps. I guess this is just a good, you know, Pixie is, it's a set of Kubernetes services, basically. And so we have some dependencies, like, you know, we'll be deploying NATS and etcd, and then also deploying the, what we call the Vizier PEM nodes, which are the ones that are responsible for actually deploying the eBPF probes and collecting the data that you can query. Okay. Well, I'm a big fan of NATS, so
14:18 that's cool. So I'm assuming we've got collectors that are publishing messages to NATS and then are those being consumed by some sort of workers? I'm just gonna guess how your software works now. Sorry. Yeah. No, it's a great question. We're also really big fans of NATS. So, basically, what we have is, a PEM node, or rather a PEM pod, will be deployed on every node in your cluster, unless you have something like a taint that blocks it. And so these PEM nodes are going to deploy the BPF probes and start collecting data
14:52 and storing it, you know, within the PEM. So the data is actually not being sent over NATS. It's actually just being stored on the node that it exists on because we really wanna be a low burden to the network and just kind of keep data where it's collected. That's one of the things that, you know, we're trying to do. But, you know, we use NATS for lots of different stuff, like communicating the current state of the PEMs and making sure they're healthy, or sending messages about Kubernetes metadata that, you know, the PEMs need to
15:26 be aware of when a query is run. Alright. Okay. Well, it looks like everything has deployed and now we're waiting for those PEMs slash Kelvin. Yeah. So basically, at this point, all of the prerequisites for the cluster have been deployed and now the PEMs themselves are being launched. Kelvin is, you can kind of think of it as like the collector node, because when you run a query, you may be asking about data on multiple PEMs at once and they need a common place to send it to for the reduce stage of the query, if you think
16:04 of it that way. So Kelvin fulfills that role. Okay. We have a question on the chat which I'll throw up for you. Noel is asking, what is Sentry? Oh, yeah. That's a good question. So, Sentry is something that allows us to track errors in Pixie when people run into problems. I think that, you know, that is something that is going to be optional. So people can kind of choose if they want to let us know when they're having some kind of crash. Yeah. I think I've used Sentry before. It's like an SDK you can bundle in
16:47 your application which catches exceptions and then throws them off to the Sentry thing, and you can keep an eye on them and see how often they recur and such like that. It's a really cool open source tool as well, I believe. Yeah. It's a really good way of finding out, you know, what your problems are and what problems your users are running into that you may not be hearing about. Yeah. In a former life, I used to work on Android and iOS apps, and we always bundled Sentry. And it's overwhelming in
17:13 that context when you've got enough users. So I don't recommend it for that use case. So it looks like it's successfully deployed, which is great. The demo gods are with us. So I don't know, before we use the UI, if you wanna try running a query in the CLI just to kinda show off the fact that you don't even have to leave the terminal. So just to clarify then, right, that `px deploy -o` has done everything. We're now collecting telemetry from this cluster. Yeah, that's right. Previously where you might have
17:22 Running First Pixie CLI Query (`px script list`, `px namespaces`)
17:49 had to spend quite a bit of time investing in the data collection, it's basically been entirely done for us in this process. You can always extend the data that we collect. We have the ability for you to deploy custom BPF traces and things like that. But, you know, right out of the box, you get access to all of your HTTP requests, your SQL requests, your CPU usage, how loaded is your network, you know, just with this one command. Okay. So you suggested we can do something on the command line here to actually introspect
18:27 that in some way. So why don't you do `px script list` and then we can kinda see, like, if anything pops out to you as something that's interesting. There's a lot of results here, but I can also recommend a few if that's easier. But, you know, I think a good one, maybe a really simple one to do, is `px/namespaces`. Just kinda tells you what are the namespaces that you have running. So if you do `px run`, you can just do, space, `px/namespaces` to execute that script. Right. We can kinda see this is a very
19:13 simple script. There's a lot of fancier ones, but I just didn't wanna overwhelm people all at once in the CLI. So we can see the different namespaces that we have, the number of pods and services that they have, as well as the resource consumption that we're seeing aggregated by namespace. Okay. So yeah. So I just want to make sure I understand this correctly. So `px run` allows me to run Pixie scripts. What does a Pixie script look like? Can we look at what that was? I'm so glad you asked. That's one of my favorite parts about Pixie actually
19:33 Introduction to PxL (Pixie Language)
19:46 because we made it a 100% scriptable platform. And what that means is that the scripts that you write or the scripts that we have provided right out of the box are the thing that you use across the UI, the CLI, and our client APIs. So it's one consistent way to run queries. Pixie's language is Pythonic, so we use Python syntax so that people don't have to learn a brand new syntax. And we were really inspired by pandas, if anyone's familiar with that, which, you know, is a really great tool for cleaning up, preparing, and analyzing your datasets in
20:26 Python. And we found that all the stuff it does, we also need to do those things. So we decided to just follow their API rather than reinvent the wheel. So when you look at it, it will kinda just look like Python code. If you wanna look at scripts, one way that you could do that is you could clone our repo, and I can show you where the scripts are in there. Or we could just open the UI and then we can always inspect the script that's generating a given view. Okay. So if I click this link here
20:50 Exploring the Pixie UI and Default Scripts (`px cluster`)
20:57 that it gave me after I ran. Okay. Let's clarify the most important thing we'll clarify today: the command pronunciation. Is it Pixie or Pix or PX? I don't know, man. I mean, I kinda like to watch flame wars, so maybe we should explicitly not weigh in here and then see if people develop a strong consensus. But I will admit I say PX. PX. Okay. I'll go with PX. I'll go with you, but the audience can feel free to bring in their own flavor. So let's see if I can actually log
21:30 in after that little, maybe, faux pas. Yeah. Flash. It worked. Okay. Cool. Yeah. So as you can see, when you run a script in the CLI, you have the ability to click and just access that same exact thing in the UI if you wanna send it to someone or something like that. Okay. So many questions. So there's no data leaving my servers to go to your servers, right? Is this querying my cluster? How does the data transfer work here? Yeah. Let's dive in. So we want to store data on the edge where it
21:59 Pixie Stores Data on the Cluster
22:18 was collected. We think that that is a really good way to lower the network utilization because in your production system, you don't wanna be maxing out your network cards sending out telemetry data. So that's a really good thing from a system perspective, but it's also a good thing from a privacy perspective because we Pixie pedal. That's funny. Because, you know, we recognize that, you know, telemetry data is sensitive. And when you keep it where it was collected, that's a way of adding another layer of privacy. So we have two modes with Pixie. The first mode,
22:54 basically uses Pixie Cloud as a proxy so that when the UI has a query, it proxies it through Pixie Cloud. And in this form, the data always travels fully encrypted and is never persisted on the Pixie Cloud side of things. But for people who would prefer that this not be the case, we also have data isolation mode, which basically says when I'm running the UI, I will actually query my cluster directly, and there won't be any proxying. This can have downsides because you always need to be able to access the Kubernetes cluster when you use the UI.
23:31 And so we think that, you know, both modes are valid, but we default to the pass through mode because of the user experience it provides while still maintaining the promise that we don't persist it anywhere else. Okay. Perfect. We have another question in the chat, which isn't really specific to this, but Vikram is asking, can we run the UI on our own cluster or localhost? Is that an option? Yeah. It is. So if you do the fully self-hosted version of Pixie, and you can, you know, just check out the docs for how to run that, then
24:05 you would be basically running it yourself. Got it. Okay. So this is the same representation of what we got on the command line here where we have a kind of TL;DR on our namespaces and what's running in them. Then we've got telemetry showing us disk throughput and, yeah, disk throughput and network stats. I'm sure there was some network there. Mhmm. Maybe. There might not be. Maybe I should change the script. But I wanted to start really simple because I think this is a good, before we dive into kind of the rest of what's going on here, I wanted to just show
24:40 Inspect a Script
24:42 the script for this view. So if you scroll up and then you click that little, like, cursor, yeah, that editor, basically, you can see the code that produces the view. So, basically, you can see the namespaces-for-cluster function will produce the table that is the top one. It's basically getting a list of namespaces in the current cluster. And then the next function is producing an overview of the resource consumption by namespace. So these are the two views that we're computing. Okay. Yeah. It does just look like Python code. Yeah. And that is by
25:30 design because we're making this for developers and we, as developers, want our observability pipelines to be more like writing code. Okay. But for people that don't wanna mess with the custom scripts, you don't have to. We have a ton of scripts that ship right off the bat, but we do find that, you know, power users who really like to get into it do enjoy writing custom queries and drilling into stuff that's like especially important for them in particular. Well, yeah, I guess because, you know, Pixie understands what data is being collected that you can already produce all these scripts up front
26:04 that just represent that in the best possible way. So it kind of makes a lot of sense. What's a Vis Spec? Is this just a display we've got here? Exactly. It's just telling you how you wanna lay out the results of the query. So we have more than just tables, as we'll see if, you know, we go into the cluster view, for example. But with the Vis Spec, you can specify what type of visualization you want. So I recommend switching to `px/cluster` because that's kind of like the starting view that we show to people in most cases.
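The script inspected here has one function per view, plus a layout spec that says how each result is displayed. A plain-Python sketch of that shape follows. It is not PxL and does not run inside Pixie; the sample rows, function names, and `LAYOUT` structure are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical per-pod samples; in Pixie these would come from collected tables.
SAMPLES = [
    {"namespace": "pl", "pod": "vizier-pem-a", "cpu_pct": 1.5, "rx_bytes": 4000},
    {"namespace": "pl", "pod": "kelvin", "cpu_pct": 0.5, "rx_bytes": 1500},
    {"namespace": "kube-system", "pod": "cilium-x", "cpu_pct": 2.0, "rx_bytes": 9000},
]


def namespaces(rows):
    """First view: one row per namespace with its pod count."""
    pods = defaultdict(set)
    for row in rows:
        pods[row["namespace"]].add(row["pod"])
    return [{"namespace": ns, "pod_count": len(p)} for ns, p in sorted(pods.items())]


def resource_usage(rows):
    """Second view: resource consumption summed per namespace."""
    totals = defaultdict(lambda: {"cpu_pct": 0.0, "rx_bytes": 0})
    for row in rows:
        totals[row["namespace"]]["cpu_pct"] += row["cpu_pct"]
        totals[row["namespace"]]["rx_bytes"] += row["rx_bytes"]
    return [{"namespace": ns, **agg} for ns, agg in sorted(totals.items())]


# Stand-in for the layout spec: which function feeds which widget.
LAYOUT = [
    {"title": "Namespaces", "func": namespaces, "widget": "table"},
    {"title": "Resource usage", "func": resource_usage, "widget": "table"},
]

for view in LAYOUT:
    print(view["title"], view["func"](SAMPLES))
```

Keeping the query functions separate from the layout is what lets the same script drive the CLI, the UI, and the client APIs: only the layout part is presentation-specific.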
26:38 And I can walk you through what we're seeing here if that would be helpful. Yes. I see a graph, but I'm not entirely sure what the graph is representing. Yeah. So this is our service map. So it looks like we're detecting services on your cluster, but we're not necessarily detecting the services talking to each other. And that can happen when a pod isn't part of a service and it's talking to things. So this is it. You know, it may not be the prettiest graph, but it, you know, we we often see results like that. If you scroll down,
27:16 you can see the list of nodes. And if you look at the node name, you can actually click that link and actually be taken to a view about that node. So we really wanna make a really easy drill down experience for people. And so we have these hyperlinks embedded in our tables in multiple places where you can just click a link to see more about that entity. So here we see the pods that are running on this node, the CPU usage of the node, and other things like the traffic and the bytes. Nice. I like it when you get shared cursors
27:53 like this across panels and graphs. Yeah. Because with the shared cursor lining them up, you can see these two things happened at the same time. So this is a bespoke UI, right? This is written just for Pixie. Like, do people have the option to use Grafana if they want, or would you encourage using the Pixie dashboards? You know, it depends on, you know, your unique situation. So we do have a Grafana plugin that's available and open source. So if that's the environment you're comfortable in, you can configure Pixie as a data source and use it to help build your Grafana
28:29 dashboards. But if you kind of just want a super easy, like, you don't have to, you know, set that up or whatnot, you can also just use the UI and, you know, we think that either one is a good thing to do. But the flame graph, yeah, I'm glad that's being called out. So, yeah, that's a new feature we're pretty excited about because, you know, this kind of data can be kind of hard to get in most cases. So we're basically running profilers on your, you know, various nodes and seeing where the time is being spent.
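As a rough idea of how sampled stacks become a flame graph: a sampling profiler records the call stack at regular intervals, and identical stacks are counted. The sketch below uses invented stack samples and the semicolon-separated "folded" text format common to flame graph tooling; it makes no claim about how Pixie's profiler is implemented.

```python
from collections import Counter

# Hypothetical stack samples, outermost frame first. A sampling profiler
# would capture one of these per tick; the function names are invented.
SAMPLES = [
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "query_db"),
    ("main", "gc"),
]


def fold(samples):
    """Count identical stacks, keyed in the 'a;b;c' folded format."""
    return Counter(";".join(stack) for stack in samples)


def inclusive_share(samples, frame):
    """Fraction of samples in which `frame` appears anywhere on the stack."""
    hits = sum(1 for stack in samples if frame in stack)
    return hits / len(samples)


folded = fold(SAMPLES)
for stack, count in folded.most_common():
    print(f"{stack} {count}")

print("handle_request share:", inclusive_share(SAMPLES, "handle_request"))
```

The width of a box in a flame graph corresponds to the inclusive share computed here: a wide frame is on the stack in many samples, which is why it points toward bottlenecks.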
28:44 Flamegraph (Continuous Profiling)
29:08 And we have heard that people really like that because it can help isolate bottlenecks. Yeah. That's pretty nice. I like that. I wanna see what other scripts we've got. Oh. Oh. I just zoomed. That was cool. Oh, yeah. Yeah. Oh, that is nice. There we go. I kinda wish I had planned ahead and stuck a few thousand workloads or something on this cluster to kinda see more, but we're still getting a lot, I mean, considering this is a really vanilla Kubernetes cluster that's just running the control plane, Rook, and Cilium. We're already getting a whole wealth of information out
29:45 of this immediately, which is pretty nice. We have a question from Noel if you want to tackle that. Noel is asking, does this profiling have an impact on, you know, our CPU and memory and such? Yeah. For sure. So, you know, with these types of tools, we realized that it's so important for us to take as minimal of a footprint as possible while still providing really rich telemetry. Because, you know, I take it very seriously what runs on my production cluster, and I don't want that to be something that in a bad time is eating, like, 50% of my
29:51 Viewer Question: Profiling Impact and Performance
30:21 CPU or something like that. And so with that being said, we target that Pixie will use under 2% of the CPU of each of the nodes. But, you know, just for some buffer room in practice, we say that it should stay under 5%. But we're always looking to drive that down. Okay. Can we confirm that through what's being logged? Oh, hell yeah. Definitely. So if you wanna go click on the PL Vizier PEM pod in this, you don't even have to exit out of this view. You see that PL Vizier PEM pod? Yep. So let's
31:01 take a look and see if we're living up to that. If not, I can file some bugs. So. Oh, wow. That's the CPU usage for this pod and it looks like we are seeing under 2%. So that 5% qualification, I guess, wasn't necessary here. Yeah. So we can actually see, yeah, we've got all the major stats on this pod and what is happening. Are these functions and syscalls? Pod two. Oh, sorry. I think I missed that. Are these the functions or syscalls that that application's calling? You know, so for compiled languages, we're actually able to get the function names.
31:43 And so if you're running a C++, C, or Go application, for example, we're actually able to extract out the functions themselves and how much time they are each taking, not just the system calls. Okay. So you'll see a lot of C++ in there because that's what the PEM is built on. That's pretty magic. I like that. I wanna look at something else now. Can I just change pod? Yeah. You can change pod for sure. You know, you can also, like you were doing, browse the scripts and see what is up with those.
32:09 Supported Protocols (HTTP, gRPC, DBs, TLS, etc.)
32:21 Yeah. We can definitely take a look at more scripts. So this is our Cilium pod which, you know, handles all of the networking, because it's making HTTP requests. I'm curious then, does Pixie understand all the major protocols like HTTP, gRPC, Kafka, or does it not care about that stuff? Yeah. You know, anything that uses HTTP will automatically be traced. So sometimes those protocols will use HTTP, you know, kind of underneath the hood. But we also do support, like, many different protocols in addition to that. And we have a list on our doc
32:27 Protocols Traced by Pixie
33:00 page, but I can just rattle off some examples. We support MySQL, Cassandra, Postgres. We support DNS. We support encrypted encrypted HTTP traffic in addition to unencrypted HTTP traffic. We have support for other ones, and I'm just blanking a little bit. But if you wanna just pull up the docs, we can take a look. Yeah. Yeah. Definitely. I'd only expect you to remember all this off the top of your head. So if you go to supported protocols at the very top of the if you scroll up Yeah. Supported protocols. Yeah. Yeah. Oh, I forgot Redis. Redis is a
33:40 big one. And we're always looking to add more protocols. So Kafka is a very interesting one. Okay. So if the applications on my cluster where Pixie is deployed are using HTTP, HTTP/2, gRPC, DNS, etcetera, telemetry is automatically gonna be collected for me. Mhmm. And we also ship Pixie with views that query that data and present it to you in an easy-to-consume fashion. So support for a protocol also includes, you know, scripts that you can easily access. Okay. That is very, very cool.
34:16 Okay. So we can see the processes on our pod. Network stats. Yeah. I really like being able to see stuff by container too. Even just seeing the requests being made, we've got the p99 here. That's just invaluable information when you're debugging microservices. Very. Yeah. And I don't know how easy this is to discover, but if you see those little, like, white tick marks, if you click, let's say, the one in the middle of the bar in the latency p99 section. So if you just click another one. Oh. Oh, is it? Yeah. If you just click
34:52 that, it switches everything to p50. So now you can see all the p50s. So you can drill down into p50, p90, and p99 in this column. And you can sort as well. Okay. Got it. And then if I expand it, I can actually see those values here too. Yeah. That's useful when you have a lot of columns in the result table and you kind of want to look into a particular record in detail. Okay. That was impressive. I really liked that. Let's take a look at another script. Do you want me to
35:31 NetFlow Graph and Network Traffic Visibility
35:33 pick one or is there something you'd like to take a look at here? Let's do NetFlow graph. I really always enjoy that one. And you're gonna have to put a namespace in. So if you want, you can put in `pl` because I know that one exists, or you can put in one of the namespaces you're interested in looking at. The typing one is not, like, giving me a... If you click on namespace to filter on. So you see where it has the text "namespace to filter on"? Oh, yeah.
35:35 Network Flow Graph Script
36:07 Yeah. Alright. Okay. Yeah. You just type `pl` or `default`. Default works too. There we go. So it's a little bit overwhelming, but what we're seeing is a map of the network traffic that we have in this namespace. And we try to resolve as best we can all the places that your different pods are talking to. Okay. So we can actually see which of the Pixie pods are communicating with one another through all these lanes. What's the difference between the blue and the gray kinda node representation? Mhmm. Yeah. So, basically, the blue is more like,
36:53 it's more considered like remote, I guess. So the non blue is the pod in your namespace, and then the blue is anyone else that it's talking to because we have that namespace filter in here and that filters the gray ones. But we know that these pods may be talking to destinations outside the namespace or even outside the cluster and we wanna still represent that information. Okay. What happens if I click enable hierarchy down here? Let's see. Sometimes that can be a nicer representation. Oh, I can zoom. Mhmm. Mhmm. And drag it around. Yep. So now we can see more clusters of
37:36 behavior. There we go. That's a bit better. Mhmm. Yeah. The other one was a little extra. That's cool. I mean, how does it get, like, the domain names if it's encrypted traffic? Or is that all it gets? It doesn't get paths and such? I'm not actually sure. Well, Pixie is able to trace the encrypted traffic at the point where it's unencrypted on the system. So we can actually give you the full request because we're collecting it in the place where it is already unencrypted, if that makes sense. Yes. Okay.
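Editor's sketch: the gray and blue nodes described above come down to one question per endpoint, namely whether it is a pod inside the filtered namespace or something the namespace talks to. The snippet below is only an illustration of that split in plain Python, not how Pixie renders the graph. The `namespace/pod` naming, the sample flows, and the `classify_endpoints` helper are all assumptions made for the example.

```python
def classify_endpoints(flows, namespace):
    """Label each endpoint 'local' if it is a pod in `namespace`, else 'remote'."""
    prefix = namespace + "/"
    labels = {}
    for flow in flows:
        for endpoint in (flow["src"], flow["dst"]):
            labels[endpoint] = "local" if endpoint.startswith(prefix) else "remote"
    return labels


# Made-up flows; pod names and the IP address are illustrative only.
flows = [
    {"src": "pl/vizier-pem-abc", "dst": "pl/vizier-metadata-0", "bytes_sent": 1200},
    {"src": "pl/vizier-cloud-connector-xyz", "dst": "203.0.113.10", "bytes_sent": 5400},
    {"src": "pl/kelvin-123", "dst": "kube-system/coredns-1", "bytes_sent": 300},
]

labels = classify_endpoints(flows, "pl")
for endpoint, kind in sorted(labels.items()):
    print(f"{kind:6} {endpoint}")
```

In this sketch a pod from another namespace and an address outside the cluster both count as remote, which matches the description of blue nodes as "anyone else that it's talking to".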
38:16 There's a loud siren. Hopefully you can hear that. Okay. So then we've got so this is just all the requests. Okay. So we can see where where the source was, where the destination was and we've got bytes sent and bytes received. Yeah. This can often be a pretty eye opening view for our users and it can produce some unexpected results sometimes. We often hear things like, wait, those two things shouldn't be talking. What the heck? Yeah. Yeah. It's just good to get in, like these are things I think that are so easy to overlook in your clusters that you just,
38:55 you know, you don't have observability into them, or you just forget, or you're gonna do it later, and this took me all of, what, ten seconds to deploy and get this. So yeah. Very, very cool. eBPF is so cool. It's really cool stuff and we're really excited to be building this because, you know, as developers, we're trying to build the thing that we ourselves want. Oh, so this funcs is basically, like, trying to help the user if they wanna compose their own queries and they wanna know what functions are available.
39:29 Now, this also exists in our docs, and I would argue it may be more easily consumable in the docs, but it's just an example of the kind of helper view that we can also provide in addition to the more typical views. Yeah. I kind of accidentally clicked on that. It wasn't something I intended to click on. Oh, okay. But when I clicked on it, I was like, oh, I wonder what this is, and then I had no idea, so I'm glad you explained it. Alright. Let's take a look at maybe two
39:56 HTTP Data Script (Full Request & Response Body)
39:59 more and then we can see what other functionality we have available here. Maybe HTTP data could be a good one. HTTP. I think that a lot of times it's good to see the computed results, but I also just love to see the raw data sometimes. And this is a very raw view. I would expand a row because it's very wide. But this is the type of information that we're able to collect. We're able to collect the request and response body, the headers, the time, the pod that it happened on,
40:05 Viewing Raw HTTP Request Data (`http_events`)
40:32 the path. Yeah. So we've got full visibility into each of the HTTP requests that were captured by Pixie. Right. Right. Okay. And each protocol will have its own kind of equivalent view of just looking at the raw data. So I'm curious, you know, we're using Pixie scripts that are just pre-canned for us. If I wanted to be able to see, you know, because this is HTTP aware, is it possible for me to see all the 400s or 500s and be able to keep track of things that are maybe broken
40:51 Filtering & Aggregating Data with PixelScript
41:07 in my cluster? Oh, definitely. So we do capture the codes, and that's, you know, one of the ways that we produce plots of, like, errors per service and things like that. Maybe it would be good to see response status, see, 200. So you can write a PxL script that, you know, aggregates on response status or, you know, does all the things with the code or counts the number of things that are not success. We do that in a lot of our scripts. And so any of these parameters that you see, you can write scripts
41:43 to analyze. And we hope that the default scripts we provide cover, you know, maybe, like, all the common cases so that people don't have to do unnecessary work. But if there's something special you wanna ask, then, you know, this type of data you can just query out. Okay. So do you want to do something with that or do you want to take a look at one more script and then move on to that? What's your preference? I don't know. Both are pretty good. Why don't we edit this script and just make it only show errors?
42:16 Alright. Okay. So let's go to the PixelScript. So sorry, are you saying, is it PixelScript? PixelScript. Yeah. You can say PixieScript too. Either one. But it's technically PixelScript. PxL is how it's spelled. Gotcha. Okay. Yeah. I think it'll be fun to make a little modification to this script. So, basically, what this script is doing is it's pulling the HTTP data, and it's extracting a little bit more metadata to show in the results, because we're able to resolve things like the namespace and the node and the pod that each request,
42:53 you know, happened on. Mhmm. We're also dropping some columns that we don't think are maybe that useful to people. So I'm assuming, based on kinda what I see, that there's probably like a `df.filter`. Would that be right? You're so close, but I'll tell you the syntax. So you'll say `df = df` and then we're gonna put brackets, like square brackets. And then this is basically pandas' way of doing a filter, and we're gonna put a boolean
43:29 condition in here. And that boolean condition will be used as the filter condition. Alright. So we'll have to, like, say status greater than 400? Yeah. But the only modification, which you got almost perfectly, is that you'll do `df.resp_status`, because we have to specify which data frame the column exists on. So let's try running that and see what we see. And you can collapse that editor. And now we'll see only response statuses that are... Mhmm. That's pretty nifty. That's quite a lot. There's actually more than I was expecting.
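Editor's sketch: as dictated above, the filter line in the script has the shape `df = df[df.resp_status >= 400]`, a boolean condition inside square brackets. The exact comparison typed in the demo is not visible in the transcript; `>=` is assumed here so that a plain 400 counts as an error. PxL only runs inside Pixie, so the snippet below mimics the same idea in plain Python on made-up rows. The row values and the `only_errors` helper are hypothetical.

```python
# Made-up rows standing in for a few columns of the http_events table.
rows = [
    {"pod": "kube-system/coredns-1", "req_path": "/health", "resp_status": 200},
    {"pod": "kube-system/coredns-1", "req_path": "/missing", "resp_status": 404},
    {"pod": "pl/vizier-query-broker-1", "req_path": "/api", "resp_status": 503},
    {"pod": "default/web-1", "req_path": "/", "resp_status": 301},
]


def only_errors(records, threshold=400):
    """Keep records whose status code is at or above the threshold (4xx and 5xx)."""
    return [record for record in records if record["resp_status"] >= threshold]


errors = only_errors(rows)
for record in errors:
    print(record["pod"], record["req_path"], record["resp_status"])
```

The list comprehension plays the role of the boolean mask: every row is tested against the condition and only the rows where it holds are kept.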
44:19 Yeah. Alright. Now if I wanted to do like an aggregate count of that, is that something I can do in PixelScript as well? Oh yeah. Why don't we count it by pod? Alright. Okay. That sounds good. I'm not gonna try and guess that. We're gonna take off the `df.head` because we don't need that. That's just basically saying I only need a thousand records. So I'll just walk you through. So you're gonna do `df =`. Is that equals? It looks like a dash, but maybe it's not. It's equals. Yeah. Okay. Okay. And then you're gonna do `df`
44:54 `.groupby`. No, it's just all lowercase, no underscore. Okay. And then you can do parens. And then in quotes, in single quotes, or you could do double actually too, you'll do pod. So, basically, this is saying we're doing an aggregate that's gonna group by pod. And then after that, after the last paren, you're gonna do `.agg`, a-g-g. Oh, agg. Correct. Okay. And then in here, let's name the output column. So let's just say, like, `num_errors`. It doesn't have to be in quotes. It
45:35 will just be, like, a keyword argument. Mhmm. And then equals. And then we're gonna have, like, a tuple here. And the first element of the tuple is gonna be the input column. So let's just say `time_`. It doesn't really matter because we're doing a count. And that will be in quotes. And then comma `px.count`. Yeah, perfect. So what that's basically saying, and our docs have more detailed information for how to compose these queries if that was, like, completely alien. It does follow the pandas API.
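Editor's sketch: put together, the line dictated above reads `df = df.groupby('pod').agg(num_errors=('time_', px.count))`, that is, group the error rows by pod and count the rows in each group. The snippet below reproduces that group-and-count step in plain Python so the shape of the result is easy to see. The rows and the `count_per_group` helper are made up for illustration and are not part of Pixie.

```python
from collections import Counter

# Hypothetical rows that already passed the error filter; values are made up.
error_rows = [
    {"pod": "kube-system/coredns-1", "time_": 1},
    {"pod": "kube-system/coredns-1", "time_": 2},
    {"pod": "kube-system/coredns-2", "time_": 3},
    {"pod": "default/web-1", "time_": 4},
]


def count_per_group(records, key):
    """Count rows per distinct value of `key`, like a group-by followed by a count."""
    return dict(Counter(record[key] for record in records))


num_errors = count_per_group(error_rows, "pod")
for pod, count in sorted(num_errors.items(), key=lambda item: -item[1]):
    print(f"{pod}: {count}")
```

Because the aggregate is a count, the input column only decides which rows are counted, which is why `time_` "doesn't really matter" in the spoken walkthrough.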
46:20 So there is a method to it, and then the docs basically tell you about all of these different things you can do, so you can use them for reference. But what this is saying is take the pod column and then count the number of errors per pod. Okay. So will I hit run on this? Yeah. Hit run. There we go. And it looks like CoreDNS pods are a source of the 400-plus errors. So that, yeah, is something that we learned about your cluster today. And that's within the last five
46:55 minutes? Yeah. Okay. So you can edit that to be bigger or smaller, but, you know, we find that people tend to be most interested in looking at... Oh, you made a mistake. We need to work on the UX for this a little bit. You actually have to do negative because it's like a relative time. Yeah. Of course. That makes sense. But I consider that to be kinda our bad, not yours, because it would be very easy to make that mistake. Okay. So if someone else wants to learn how to write PixelScript,
47:30 Learning PixelScript and Discovering Schema (`px schema`)
47:31 I'm assuming they come into here. We've got, is it just reference? Mhmm. So in operators, for example, you can click on agg and then see, you know, that's what we did. There we go. Excellent. Okay. Let's tackle a couple of questions and then we'll see what else we can do here. So Suresh is asking, are there any improvements to Go eBPF observability done, and are they available open source? I guess Suresh is asking whether the Pixie team is working on Go eBPF and contributing there.
47:52 Viewer Question: Go eBPF Contributions
48:14 I'm not sure. Yeah. So I would say that we're more consumers of eBPF, but, you know, we hope that the stuff that we provide can be an example to other people who wanna use eBPF themselves. And so you can look at our repo and see what we do as an inspiration for how you can also incorporate it. Okay. And I'll just put up a comment from Noel, who's enjoying my errors there on CoreDNS. I never said I would produce a fully working cluster. Just enough. That's as good as it gets.
48:53 It's not uncommon. Well, Kubernetes is hard. So alright. So we've got links to the Grafana data source plugin here. Let's see, we've got some API documentation. Oh, the Go reference docs. Okay. Yeah. So basically, we have client libraries if you want to automate, you know, PxL scripts and use the results to drive CI/CD stuff, or, you know, we have a tutorial to set up a Slack bot that can ping you with the results of PxL scripts. That can be a pretty quick way to get alerts about what's happening in your cluster if you, you know, wanted to
49:11 Client APIs and Automating Queries
49:36 have been warned sooner about those 404s or something like that. Yeah. I mean, I'm not gonna sit and work through all of this documentation, but I think I'll be playing with this at some point in the near future. Now you mentioned alerts, and there's a Slack alert tutorial here. Mhmm. Do you wanna work through this and take a look at the process involved there, maybe setting up an alert? I don't think we need to throw it to Slack, we can use a HTTP banner or something, but unless it's a Slack function, I'm
50:06 not so for this tutorial, we've actually like, you know, made all the boilerplate to like Oh, okay. Do the Slack ping and stuff like that. So it might be easiest to do. So, you know, we don't have to do it, but this is just a quick way of basically looking at 400 errors in your different services and just being alerted to that. I did it in a few minutes last night and it's pretty easy. So we can dive into this if you think that that would be a good thing. Well, yeah. I think, you know, what we've
50:40 shown so far, let me just close that one, is, you know, Pixie is already collecting a wealth of information on my cluster and I haven't done anything. Alright. So we've got a big massive text there, like, I want this. But now we've taken a look at the scripts and there's loads there. You know, we could always take a look at a few more just before we finish, but there's always different representations of things happening in my cluster. We then modified a script and we're able to actually get a count of the 404s, and we see CoreDNS is
50:49 Setting up an Error Alert via Slack Bot (API/SDK Demo)
51:06 unhappy, and what I'm thinking is, that's definitely something I would want an alert from. I don't know if we can work out the medians or what the standard deviation is and stuff, but I probably want to alert on something with those 404s. So maybe if we can put that together, that would be a nice end-to-end example of what Pixie is bringing to my cluster. Yeah. That sounds great. Why don't we just work through that tutorial and, you know, just bang it out.
51:36 Okay. So do I need to do all the Slack setup or can we just do the actual Pixie bit and have it? Let me do the... I'm gonna create a private channel in the Pixie community Slack, and I'll create the Slack bot on my side. Okay. Yeah. Because I just realized that they don't have a Slack. Right. Right. So let me just get that started and I can send that to you. And in the meantime, you're gonna wanna track down the cluster ID and create an API key in Pixie. So
52:20 why don't we parallelize this. Mhmm. And you can collect those pieces of information. And I can help you find them if you have trouble. Okay. So should I do the Pixie deploy just now? The demo deploy? I don't think that you have to, but... Alright. Okay. I think that's just if people wanna see more data, but I don't think that that's necessary. Alright. Okay. That's deploying the Pixie Sock Shop demo. I guess that's a microservice thing. Okay. Yeah. We're gonna have to make a
52:47 little modification to the PxL script because it filters on that particular namespace, but we can just quickly modify that. That won't be a problem. Okay. Sounds good. So I'm running `px get viziers` and I have my cluster ID. That was easy. Mhmm. Okay. And then now I need an API key, so I'm gonna follow the docs for that too. Nice and easy as well. Perfect. Alright. Should I clone the demo repository? Yeah. Please do. Select. Does it matter if I pick Go or Python? Nope. Nope. Okay. I'll do Python because the instructions are Python. There you go.
54:03 Oh, I don't have pip. Okay. We'll do Go. Alright. So export my cluster ID. I'll let that build first. And, you know, you'll wanna actually modify the PxL script where we filter on namespace. So I would just find that in `http_errors.pxl`, and I would just delete the line that does that filter. Right. Okay. Yeah. I can do that. Okay. So I need to grab my API key and then you want me to modify this. PxDoc. There we go. And can I just use the `pl` namespace? Yep. That works. You can also just take
55:15 it out altogether and just do it across namespaces. Done. Alright. So I'm assuming I have to do a `go build` again. I'm assuming that's maybe templated in somewhere. I don't think so. I think you can just do `go run`. And all I need now is a Slack bot token. Yes. I'm about to send that to you. But first, I'm gonna create the channel. Mhmm. Oh, I'm being mocked again by Noel. Come on, Noel. Be nice to me. I am always flashing API tokens on this show. That's true. But I always delete them straight after. I promise.
56:07 Okay, and then once we have our Slack bot token, we're gonna do a `go run` of the Slack bot. And that's us. Yep. Easy. Okay. Okay. I'm gonna add you and I'm going to invite the bot. Okay. I see Pixie alerting added to the channel. Okay. I'm sending you the code over DM. Thank you. Alright. So you're happy for me to share the test Pixie alerting channel on the stream? Mhmm. Alright. There we go. Oh, actually, one more thing. We have to edit the channel in the code itself, because it actually... but you do the export first.
57:10 Do the export first. Okay. I'll do that over here so people don't see that. Okay. And then we wanna modify our PxL script. Oh, you did that, so that's great. Okay. And then could we also modify the function? It's not in the script, actually. It's in the... so if you do... Oh, Slack bot. Oh, don't open the binary. There we go. Yeah. So just change that. I named it test Pixie alerting. Yeah. Great. So let's try `go run`. Let's see if it works. Yeah. We did everything there.
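Editor's sketch: the setup above has three moving parts, namely credentials exported as environment variables, a channel name edited in the bot's code, and a message built from the query results. The demo used the tutorial's Go program, whose code is not shown in the transcript. The snippet below is an independent Python illustration of the last step only. It builds the `channel` and `text` fields that Slack's `chat.postMessage` method accepts and sends nothing. The environment variable name `SLACK_BOT_TOKEN`, the channel spelling `#test-pixie-alerting`, the message wording, and the `build_alert` helper are all assumptions.

```python
import json
import os

# Assumed spelling of the channel named "test Pixie alerting" in the demo.
CHANNEL = "#test-pixie-alerting"


def build_alert(error_counts, channel=CHANNEL):
    """Return a Slack message payload, or None when no service has errors."""
    lines = [
        f"{service}: {count} errors"
        for service, count in sorted(error_counts.items())
        if count > 0
    ]
    if not lines:
        return None
    text = "HTTP 4xx/5xx errors in the last 5 minutes:\n" + "\n".join(lines)
    return {"channel": channel, "text": text}


# Made-up per-service counts standing in for the script's output table.
payload = build_alert({"kube-system/kube-dns": 12, "pl/vizier-query-broker": 0})

# The token is only checked for presence here; nothing is sent to Slack.
token_is_set = bool(os.environ.get("SLACK_BOT_TOKEN"))
print(json.dumps(payload, indent=2))
print("token set:", token_is_set)
```

Returning `None` when nothing is wrong is one simple way to make the alert conditional, so a scheduled run stays quiet unless at least one service reports errors.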
57:59 I think so. Hey. It works. You can maybe just show the Slack. Oh, look at that. So Pixie Alerting sent us a message telling us that we had 4xx spikes in the last five minutes. Mhmm. And you can, you know, obviously make it more conditional and things like that, or run a different script, but we just wanted to kind of set up the full end-to-end thing for people with an easy-to-use example that they can then customize to their case. Okay. So I mean, that's the exact
58:40 thing that we discovered with my CoreDNS. So was that just complete serendipity that that's what the alerting script was set up to be? I think a lot of people are interested in HTTP errors. Alright. Okay. So I would say it's serendipitous, but I would also say that, you know, we try to target a common use case. Okay. Are you happy for us just to kinda take a look at that script and actually see what we have here? Let's do it. Yep. Okay. So we've got an `import px`. I'm assuming that's just our Pixie data frame integration
59:01 Data Source Tables
59:13 thing. Yeah. That's basically the module that we, you know, provide. Okay. The table is `http_events`. So how would I find what that table is from my Pixie UI? Like, if I've got data that I wanna start writing my own PxL script from, or would I just copy it? Are you saying, could you just copy the script itself? Yeah. I was curious, how do I know what tables I have available? Oh, okay. Here. Let's scroll up and we're gonna exit out of this one, if that's okay, and go to `px/schema`.
59:47 I see you already scrolled up. Was the one scrolled down. So, in here, what you can see is all the different tables in Pixie and the description of what they do. And then in the next table, you can see all the different columns they have and what you can find out about them. Okay. There we go. So that's how I discover tables and columns that are available for me to put in my script. Okay. Got it. Yeah. Because that's not something you could just infer. We have to document that. I mean, it is documented but we also
1:00:24 want it to be programmatically generated as well. Yeah. That makes sense. Okay. So much like the web UI, we've got our start time here. Now we're looking for greater than 400 again. That was just complete luck that we happened to have the same query, but like you said, it's a common use case. It makes sense. Yeah, no, I promise it was legitimately already that way for months. Okay, so now we're pulling things from our data frame context into the top-level data frame. I assume that's just because we want access to the namespace and the service and the
1:01:01 data that we're working with. Then we have our group by service, aggregating the error count and total request count. Mhmm. Okay. And then we get `px.display`. What does `px.display` do? Can I tweak that to display different visualizations? Yeah. So basically, the way you can think of it is there are two ways to write a PxL script. The first is with the Vis Spec, which is what you see if you want fancy charts and things like that. But the second is if you're lazy like me and you just wanna produce a bunch of tables,
1:01:38 you can use this way. And in this way, we'll basically say that anything you call `px.display` on will just be outputted. Got it. Okay. Now I could just run this in the browser to get live feedback on what I'm doing and then move it into the Go tooling and do things that way. So the UI would be a good way to build these out and then I could put them into other tooling. Is that what you... Yeah. Okay. I'm glad I understand that then. And now with regards to the Go code
1:01:47 Workflow Summary: Build Scripts in UI/CLI, Automate with SDKs
1:02:10 here, is this just using the Pixie API to be able to coordinate setting up the alert and then publish to Slack? Mhmm. Okay. So the Slack alerting code isn't something that's available in, like, the `px` module. It's just regular Go code. Okay. That's right. That's right. Because the PxL scripts, you know, that's for querying data. And then because of the API, you can embed those results into whatever control flow logic you want. Okay. The full picture now is starting to cement in my mind. So I've got... Yeah. Yeah. I've got PxL. I can query
1:02:49 the data. I could build my visualizations, and then we've got the Go and Python SDKs if I wanna be able to take that tooling into my own infrastructure, where I can do alerts and a whole bunch of other things. Yes? Yeah. Here's something quick we could do actually. Okay. Let's just run that PxL script with the CLI just so you can show the interoperability. Mhmm. If you just do `px run` and then `-f` to specify the file and then HTTP errors, it will just run and produce the results in the CLI. Oh, I changed directory, so I need to
1:03:01 Running Custom Scripts from CLI (`px run`)
1:03:21 Oh, okay. Pixie one cube. Nice. So that can be another way to iterate on it. And we have another command in the CLI that I would compare to, like, k9s, where it's more interactive and you can switch scripts and drill down easily. But just for the purpose of this, it was just easy to do `px run`. And the one I'm referring to is `px live`. So you wanna do Ctrl+K, and then just select a script. That's a good one. But any of them, honestly. Yeah. That's good.
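Editor's sketch: the alert script reviewed earlier groups requests by service and aggregates an error count alongside a total request count. When iterating on that kind of logic outside the cluster, a small stand-in can help check the expected shape of the output before running the real script. The snippet below is that stand-in in plain Python. The sample requests, the output field names, and the `summarize_by_service` helper are made up and only mirror the aggregation described in the conversation.

```python
from collections import defaultdict


def summarize_by_service(requests, error_threshold=400):
    """Per service, count all requests and those at or above the error threshold."""
    totals = defaultdict(lambda: {"total_request_count": 0, "error_count": 0})
    for request in requests:
        stats = totals[request["service"]]
        stats["total_request_count"] += 1
        if request["resp_status"] >= error_threshold:
            stats["error_count"] += 1
    return {service: dict(stats) for service, stats in totals.items()}


# Made-up requests; service names and status codes are illustrative only.
requests = [
    {"service": "kube-system/kube-dns", "resp_status": 404},
    {"service": "kube-system/kube-dns", "resp_status": 200},
    {"service": "kube-system/kube-dns", "resp_status": 404},
    {"service": "pl/vizier-query-broker", "resp_status": 200},
]

summary = summarize_by_service(requests)
for service, stats in sorted(summary.items()):
    print(service, stats)
```

Keeping the total next to the error count makes it possible to alert on an error ratio rather than a raw count, which is one way to make the alert "more conditional" as suggested in the demo.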
1:03:53 Interactive CLI Exploration (`px live`)
1:04:13 Oh, nice. Alright. I'm running out of adjectives for being impressed here. You're gonna have to slow down. Yeah. I love that I can work in a pretty feature-rich way from the command line or the UI or even from the SDKs. You know, as a developer, just having those options available to me and being able to work in what works best for me or my team, my organization. Yeah. This is just pretty well rounded. I'm really impressed with this. How is this just a startup? I don't get it. It feels more mature than that.
1:04:16 Wrap Up
1:04:51 Well, I mean, you know, we're no longer a startup. We're now a part of New Relic which is great. But yeah, when we were acquired we were I think 10 or 12 people. So, you know, we're just really passionate about making this a great experience for developers like us is what I would say. Yeah. Normally when I have, you know, companies that are relatively new, they're like, okay, it's like, don't click on this and don't run this and don't notice this and everything we've done here has just worked right out of the box. So very
1:05:17 nice. Okay. I'm guessing, I don't know, have we taken a look at most of the functionality or is there anything else you would like us to look at before we wrap up for today? You know, I could go on and on, but I think that, for the benefit of the audience, we could call it here and we can just say that anyone who's interested in learning more should check out our docs or our GitHub repo. You know, please file issues for problems that you see, and we'd also love for people
1:05:20 Wrap-up and Further Resources
1:05:47 to join our Slack community. Awesome. Yeah. I will make sure there is a link to the Slack community in the show notes. It's also available on the Pixie website. Yeah. You should all just definitely go check it out. You've seen now in the last hour how easy that was to get started, and the features just keep coming. We got one question in the chat, but I think we already answered it, but I'll do it again just in case Bob missed the start. But can the whole stack be self hosted? And if not, is it
1:06:10 Viewer Question: Self-Hosting Pixie
1:06:17 on the road map? Yeah. Love to emphasize this. Pixie is entirely open sourced. So even though we went with the free hosted option today, you can entirely self host it and, you know, the instructions for that are in our docs or you can check out our GitHub repo to learn more about how to do that. Awesome. Alright. Thank you very much, Natalie. That was a great tour of Pixie. There was a lot to love there. Definitely gonna be kicking the tires on this more with some real clusters. I just wanted to thank you for taking the time out of your day for
1:06:38 Conclusion and Farewell
1:06:50 guiding us through this and sharing your knowledge with us. It was an absolute pleasure. Yeah. It was really great, you know, diving into it with you as well. It's always fun to see it from a new perspective, like from someone who hasn't looked at it before, because I look at it all day. Alright. Well, thank you again. Have a wonderful day and I'll speak to you again soon. Yeah. Thank you so much. Alright. Thanks. Bye.