About this video
What You'll Learn
- Set up InfluxDB 2 and Telegraf on a fresh Linux server for collecting time-series metrics in practice.
- Create and query buckets, measurements, tags, and fields in InfluxDB while preparing dashboard-ready time-series data.
- Build dashboards and downsampling tasks with Flux, then configure webhook alerts from threshold-based query checks.
Anais Dotis-Georgiou from InfluxData walks through InfluxDB 2 and Flux: installing the OSS release, collecting metrics with Telegraf, exploring data, building dashboards, downsampling via tasks, and configuring webhook alerts.
Jump to a chapter
- 0:00 Holding
- 0:45 Introductions
- 1:21 Introduction
- 1:53 Guest Introduction (Anais Dotis-Georgiou)
- 2:30 What is InfluxData? (InfluxDB & Telegraf)
- 3:06 Setting up the Environment (Equinix Metal Server)
- 4:17 What's New in InfluxDB 2? (Unified Platform, Flux)
- 6:00 Interesting Customer Use Case (Rubin Observatory)
- 7:37 Installing InfluxDB 2
- 8:00 Installing InfluxDB 2
- 9:21 Initial Setup and Web Interface (Quick Start)
- 9:30 Configuring InfluxDB 2
- 10:30 Loading Data (Getting Started)
- 11:22 Collecting Data with Telegraf
- 13:36 Installing and Running Telegraf
- 15:33 InfluxDB 2 Concepts: Buckets, Measurements, Tags, Fields
- 15:40 InfluxDB vocabulary
- 18:15 Exploring metrics
- 18:16 Exploring Data in the UI
- 19:06 Understanding Aggregate Functions and Flux Queries
- 26:27 Exploring More Data / Dashboarding Introduction
- 28:08 Creating a Dashboard Manually
- 30:16 Generating Load and Observing Metrics (Humorous Interlude)
- 35:03 Exploring the Pre-built System Dashboard
- 35:56 Dashboard Visualizations (Graphs, Single Stat, Heat Map, Histogram)
- 38:01 More Attempts at Causing Load (Fork Bomb & Docker pids-limit)
- 44:20 Dashboards
- 44:23 InfluxDB Internal Metrics & Community Templates
- 45:12 Importing a Community Template (Debugging Failure)
- 50:00 Tasks and downsampling
- 50:08 Creating a Downsampling Task using Flux via the UI
- 54:00 Alerting
- 54:10 Setting up Alerting
- 55:38 Configuring Alert Endpoints (HTTP Webhook)
- 56:36 Triggering and Verifying the Alert
- 1:00:46 Summary and Conclusion
- 1:02:26 Getting Help (Slack, Community Forum)
Full transcript
Generated from the English captions. Timestamps jump the player to that moment.
1:21 Introduction
1:21 Hello, and welcome to today's episode. Today, we're gonna be taking a look at InfluxDB 2 and the Flux language. Now before we do that, I wanna say thank you to Equinix Metal for providing the time and resources for me to put this show together so that we can learn from experts around the world about all these cool cloud native technologies. Now joining me today to look at InfluxDB, I have Anais Dotis-Georgiou from InfluxData. How are you, Anais? I'm doing well. Thanks. How are you? Yeah. I'm very well. Thank you. Would you like to firstly take a moment just to
1:53 Guest Introduction (Anais Dotis-Georgiou)
1:55 introduce yourself, tell us a little bit about you, and then we'll talk about InfluxDB. Sure. So I'm a developer advocate at InfluxData, and I've been at InfluxData for about two and a half years now. A little bit about me and then a little bit about InfluxData in general. Well, I guess a little bit more about me. I have two really cute cats. Their names are Chester and Nikita, and they're always snuggling, which is adorable. And I like to make art and get outside and feel the sunshine as much as I can. And then a little bit about InfluxData.
2:30 What is InfluxData? (InfluxDB & Telegraf)
2:30 So InfluxData is a platform for building and operating time series applications. And it consists of two parts: InfluxDB, which is the database and processing engine, and then also the UI. And then the second bit is Telegraf, which is a collection agent for metrics and events. So should we get into using InfluxDB, or do you wanna learn about some interesting customer use cases, or how are you feeling? Well, why don't we do the little bit of setup work that's involved first, then we'll talk about some of those things and then by the magic of the Internet and time, we'll have
3:06 Setting up the Environment (Equinix Metal Server)
3:15 what we need to get started. So Perfect. I put up the InfluxDB website which we'll use to download it, but before we do that, we do need some hardware. So we're gonna spin up a new server. And this is Equinix Metal, so it's all bare metal goodness. I'll spin this up in Amsterdam. Now does it matter what size of machine we use? No. It doesn't really. Let's just use an expensive one anyway. Makes it more fun. Let's go with Ubuntu 20. In fact, let's pick a faster provisioning one, 20.04. And we'll call this InfluxDB.
3:57 I always like to make sure that I don't really do anything upfront so that people can kind of follow along. We won't do any user data here. We're just gonna spin this up as a vanilla Linux machine, and then we'll walk through the installation steps. Now hopefully this will refresh. The box is spinning up and that is gonna take one to two minutes. So why don't you give us one to two minutes of awesome information about maybe what's new with InfluxDB 2. I believe that was GA, was it, last week? Yeah. And then if you want, you can
4:17 What's New in InfluxDB 2? (Unified Platform, Flux)
4:28 feel free to share some user stories if you wish. Sure. So InfluxDB 2 is quite different from the 1.x line. Primarily, it's that previously, all the components of the platform were separate, and now they've been unified from four components into two. And so the collection agent is separate, but all the capabilities of the UI and the data processing and task engine and the actual time series database are now just one unified component. So that's one part. And then the other thing that we introduced with, or not me, but engineers introduced with the 2.0
5:08 is that they created Flux, which is a functional query and scripting language for time series in particular. It's kinda JavaScript-esque, and you can do a lot of fancy things with it, and it, like, takes your data and allows you to play with it in what sort of feels much more relational than it is. It just extends, like, a bunch of functionality to working with and analyzing your data. And I will say when I first started learning it, it took a little second for me to get used to it. But then once I got accustomed
5:44 to it, I don't even understand how people do things in SQL anymore. It's just too many parentheses, too many nested queries, and I really prefer Flux. So you definitely have a shift in thinking, but once you get used to it, then it's a lot more comfortable. Some cool stories. If I still have time... do I have time? You've got time. Okay. Cool. A cool story is, well, recently, the Rubin Observatory is doing some awesome work with InfluxDB. So they are trying to, if I remember correctly... I'm not that well versed in physics and star stuff.
6:00 Interesting Customer Use Case (Rubin Observatory)
6:26 So apologies if I get it wrong, but I think they're trying to prove the existence of dark matter. And so they're doing a ten year survey of the sky and they're collecting 500 petabytes of images of the sky with, I think, the world's largest telescope. And they're using Influx to both monitor their telescope, and I think they're actually also collecting some image related metadata as well with it. But 500 petabytes, that's fun. Yeah. Yeah. That's a sizable chunk of data. That's for sure. There were a couple of things there that I really liked. So of course, the
7:10 physics stuff is cool, but just that quote of physics and star stuff, I think, is now my new go-to for describing astronomy. That is now cemented. And also, I'll share your, I don't know if it was frustration followed by joy of Flux, but I remember the first few times I was trying to write a Flux script and it felt like just banging my head off of a wall. And then eventually I really struggled to write queries in any other language, so I definitely shared that journey with you.
7:37 Installing InfluxDB 2
7:37 Alright. And right on cue, we have a machine as well. So now because this is an entirely professional production, I did forget to put this on the screen when I was spinning up. If anyone wants to spin up their own Equinix Metal, there's a $50 coupon from the show code. Feel free to use it. So dot... oh no. There we go. Don't close the tab. So I'm gonna click get InfluxDB, we're gonna ignore the pop up, we're gonna click on 2.0.1. So this is the Docker images, the static binaries, and we can just pull this down
8:00 Installing InfluxDB 2
8:11 onto our machine and get started. And if I get anything wrong, just interject and tell me I'm wrong. Alright. We're good. So I will copy this and where's my terminal? There we go. Ah, no. I need an IP address, don't I? So let's make sure I can get on my machine. So I've moved all of my hardware around and now my fingerprint scanner is over here which makes it a very smooth move to get onto my SSH as well. Yeah. Alright. Let's grab this again. So we're just going to pull this down, the amd64 Debian file. I mean, I
8:54 think that's the right one. Installed. That's good. Let's confirm. Okay. So it's not running, so we could just start InfluxDB. Easy bit. Yeah. So this is the new single binary that you said is available. This is new with InfluxDB 2. All the components are shipped as one. So now I can just browse to this IP address and we'll get the web interface. Right? Hopefully. There we go. So let's click getting started. So this is the setup screen. Right? This is because InfluxDB 2 can't be unauthenticated, if I remember correctly? Correct. Yeah. And the design team has done really awesome
9:30 Configuring InfluxDB 2
9:59 stuff with integrating tutorials into the UI so that if you are a beginner like this, we can just, like, click quick start and be walked through everything that we need. Alright. Well, choose your own adventure then, I know. What path are we taking today? Yeah. Let's do quick start. Let's pretend that we're not an expert already. Alright. So what did that do? Looks like it brought us to the getting started page where we have an option to either go ahead and load our data or build a dashboard or set up alerting.
10:30 Loading Data (Getting Started)
10:38 But we don't have any data yet, so let's load our data. Oh, those animations are really slick. I know. Right? I also love the color palette. Yeah. I mean, I'm just gonna sit and watch that one I think because it's cool then. Yeah. And Can we do the third one? Oh, yeah. It's not Sorry. What? It's not as good. Like the second one. Yeah. The second one. The first two are really good. You know, this is the important stuff that we need to kind of break down on this show. So and I love that randomly or serendipitously,
11:09 the color palette actually kind of matches my show's colors too. Like it's purpley pinkiness. Alright. So what are we doing? Are we loading data? Yeah. Let's load our data. Okay. So this is offering me client libraries, Tele... oh, wow. So, I mean, I should say, because there's gonna be conversations where I say, oh, I don't remember this: we used to work together in Flux. So I am familiar with the tool and the languages. But I have been detached from it for like what? Six months now? Five months? I can't remember.
11:22 Collecting Data with Telegraf
11:43 So I'm sure it's changed a lot. But this is offering us the client libraries that we have available, which is if we want to instrument something ourselves, which I'm assuming we're not doing today. We'll skip that. It has this Telegraf thing which I've definitely not seen before. Now I'm assuming if I click on one of these, is it gonna offer me like a pre-canned config for Yeah. It should. Yeah. Right. Okay. So I can just copy that to clipboard, and it tells me a little bit about the data structure. Alright. In fact, this just looks like it's
12:16 generated from the... Yeah. I think so. Okay. What's next? What else we got? Is that it? Yeah. I think so. So you can also click Telegraf itself too and generate a new config that way. Well, I mean, we know this stuff. We can manually write a Telegraf config. Yeah. We can, but you don't have to. Right. Okay. So let's see. I just click create configuration. Either way, I mean No. Because then I have to remember bits and stuff and other non star stuff. So let's just do it this way. So I'll click system.
13:05 Off the top of my head, I'm assuming we're gonna get CPU, memory, network, disk. I'm sure there'll be one or two others. Okay. System. Processes, swap. Yeah. I haven't ever remembered that list. So this is probably better. Oh, there's a nice InfluxDB token for everybody. With my IP address, it's very public. So alright. Export. Oh, and we don't have Telegraf. Right? So Oh, yeah. We need to go install Telegraf. OK. apt update. apt install telegraf. Yes. Alright. Let's download the binary. I was sure it was in the Ubuntu archives, but maybe there's an apt repository.
13:36 Installing and Running Telegraf
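The configuration picked in the UI here ends up as a Telegraf TOML file. As a rough sketch only, and not the exact file the UI exports, the Python below assembles a minimal config with the system-style input plugins mentioned above and an `influxdb_v2` output. The URL, organization, and bucket names are placeholders, and reading the token from an `INFLUX_TOKEN` environment variable is assumed so that it is not hard-coded.

```python
# Hypothetical values: replace with your own server, org, and bucket.
INFLUX_URL = "http://127.0.0.1:8086"
ORG = "example-org"
BUCKET = "example-bucket"

# Telegraf input plugins covering the metrics mentioned in the episode.
SYSTEM_INPUTS = ["cpu", "disk", "diskio", "mem", "net", "processes", "swap", "system"]


def render_telegraf_config(url, org, bucket, inputs, interval="10s"):
    """Render a minimal Telegraf TOML config as a string."""
    lines = [
        "[agent]",
        f'  interval = "{interval}"',
        "",
        "[[outputs.influxdb_v2]]",
        f'  urls = ["{url}"]',
        # Telegraf substitutes environment variables, so the token stays out of the file.
        '  token = "$INFLUX_TOKEN"',
        f'  organization = "{org}"',
        f'  bucket = "{bucket}"',
    ]
    for name in inputs:
        # Each input plugin gets its own table; defaults are used for all options.
        lines += ["", f"[[inputs.{name}]]"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_telegraf_config(INFLUX_URL, ORG, BUCKET, SYSTEM_INPUTS))
```

Keeping the token in the environment also avoids the situation joked about above, where the token is shown on screen next to a public IP address.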
14:13 There's a deb file. There we go. Let's use that. Okay. So we go to /etc/telegraf and oh, no. We don't, do we? We just run telegraf with our special flag. Right. Yeah. And then we can see if Influx is gathering data successfully. Yay. And it's worth noting too that if you click setup instructions, you can follow that again. I don't know if that's apparent to everyone that that's a button. And then also, if you actually click system, the config itself, then you have an opportunity to both download the config and change it if you need
15:11 to make adjustments to the configuration. Alright. Well, I'm assuming based on our use case today, we'll probably get away with just this ten second interval. I mean, we could speed it up, see it a little bit quicker, but we've already wasted the first ten seconds. I'm assuming when we go through this explorer view, we will hopefully have something. So let's take a look at that. I've got my bucket here. Do you wanna just break down the vocabulary for people that aren't as familiar with InfluxDB? You know, so far we've mentioned like the system stuff we're gonna scrape and
15:40 InfluxDB vocabulary
15:49 then a bucket, like what words and what things do people need to be familiar with with InfluxDB 2, especially compared to version 1? So a bucket is a combination of a database and a retention policy, all packaged as one, and that's known as a bucket. And I like to think of it as, if you have a stream of data and it's coming into a bucket, eventually, depending on the size of the data, it's gonna overfill and expire some of your old data naturally. So you can think of the size of the bucket as the retention policy
16:22 and otherwise a bucket being like a database. And then after that, we have measurements, tags, and fields. Measurements and tags and buckets are indexed and fields are not. So that means that if you are writing to InfluxDB and, let's say, you're gonna create your own Telegraf plugin, you have to make a decision as to whether or not you want data to be a tag or a field. You wanna keep in mind that any queries that you filter by a specific tag will be a lot faster than if you filter by a
17:03 specific field. And then the other thing to keep in mind is that it's much easier to cross compare data that is in one single measurement than if your data is in separate measurements. So if your data is very correlated or related and you need to be able to visualize two different fields at the same time frequently, then you wanna probably put those in the same measurement. That being said, one of the cool things that Flux lets you do is to perform math across measurements and join data across measurements or even buckets. So that's totally up to you. And then
17:39 the other thing worth noting is that you can scope tokens on a per-bucket basis. So if you're building some sort of IoT app, for example, and you have a reason, for security purposes, that you want to maybe separate out various data sessions with individual tokens or users' data with individual tokens, then you can think about scoping tokens to individual buckets and directing your data in the way that you see fit. So is there anything else that you think I can clarify? Or should we just start? I think we should
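To make the measurement, tag, and field vocabulary concrete: InfluxDB accepts writes in a text format called line protocol, where one line carries a measurement name, optional tags, one or more fields, and a timestamp. The sketch below is a simplified illustration, not a full implementation. It skips the escaping rules for spaces and commas, and the host name and values in the example are made up.

```python
def format_field(value):
    """Format a field value using line protocol type conventions."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; only the quote character is escaped here.
    return '"' + str(value).replace('"', '\\"') + '"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=value,... field=value,... timestamp."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    # Tags (host, cpu) describe the series; the field holds the measured value.
    print(to_line("cpu", {"host": "example-host", "cpu": "cpu0"},
                  {"usage_system": 1.5}, 1600000000000000000))
```

The tag-or-field decision described above shows up directly in this layout: anything placed before the first space is a tag, anything after it is a field.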
18:16 Exploring Data in the UI
18:16 just start clicking and see what happens, basically, is my Yeah. I think you covered everything there that I I thought of at least that we should break down. So I think if we just start clicking now, start exploring this data, show people kind of how the UI works and how you discover this stuff, and then we can try and get a little bit more pragmatic after that. So Sure. What's what's your favorite measurement? I always look at CPU and then like usage system. And then maybe I'll pick two or all of them. Yeah. Let's just
18:52 click go on this. And alright. So right now, we've got data. Yeah. Let's drop this down to five minutes so that we can... there we go. That's a bit more spread out. So It's also worth noting that if you look on the right hand side underneath the submit button, there's a section there that we've selected with an aggregate function. So we've applied the mean with an auto interval based on the amount of data that you're querying for. And this is just to make the visualization easier because if you're querying hundreds of thousands of points and your data is really noisy,
19:06 Understanding Aggregate Functions and Flux Queries
19:32 you're gonna probably wanna apply a trend anyways. So it does that for you off the bat. But if you wanna exclude that, you can just unclick the mean, I believe. No. Computer said can't. Oh, never mind. Well, then if you go to the script editor, we can see the Flux query that was generated for us, and then we could always remove the aggregateWindow function and then hit submit and then we'd see the raw data without the mean. Do you want to remove the aggregateWindow? First, let me look... maybe we should just talk about Flux and what's going on
20:13 here. So Yeah. Go for it. Yeah. So first, we're select Let me try and submit on this because I don't think that's gonna be particularly readable. Right? Yeah. That's better. Also, the data with this aggregateWindow is really visually pleasing. It's got a lot of diamonds in it, which I appreciate. Anyways, so the way that Flux works is that you first specify the bucket that you wanna get your data from using the from function. And then you use pipe-forward operators for every data transformation that you make on your data. So I like the pipe-forward operators because it
20:58 really helps bring my attention to the fact that I'm now changing my data and also is indicative of how the storage engine works and how the Flux engine works and how it operates with pushdowns and is pushing down, essentially, each operation within the Flux engine. So the first thing we're gonna do is select a range and filter for our start and stop times, or you can just filter for a start time and say, like, minus five minutes. You can also manually put in a time, but this v.timeRangeStop variable references
21:43 the dropdown option that you used previously. And you can also select a custom date with that dropdown option as well. And then if you're ever doing any sort of forecasting and you wanna look at data into the future, you would do a stop time of just five minutes without a plus or anything. And then we filter for the measurements and the fields that we wanna look at. And then afterwards, we can apply any other functions that we want. So, yeah, I guess we can get rid of the aggregateWindow to see kinda what
22:18 it was doing just for fun. For fun. Yeah. Alright. So I think there's a couple of really nice things that, you know, we don't see in other programming languages that much but I know for time series it makes sense, and that's like these duration literals. Like the 5m and the -5m I think are really cool. And I like the pipe-forward operator too. Not that partial to the diamond shape but you know, I can appreciate it a little bit. So we are now ignoring just to make
22:48 sure everything is quite clear here by changing that range statement to be start minus five and stop five. We're essentially making this drop down redundant in our use case. Like I could go back and see an hour or actually that no. Changed it because time has passed. Like seconds have passed since the last time I did it. These are I'm not quick enough. That is irrelevant. We've removed the mean and we're still getting a graph that looks pretty much the same. Is that expected? With removing the mean? Yeah. I don't think so. Maybe I think maybe
23:29 because we have so little data that maybe the mean is really similar. So maybe if, instead of doing a window period, I wonder if we did a window period of one hour in the aggregateWindow, if we'd see a difference. Well, we don't have a lot of data yet. But Right. So we would expect to see one point. Okay. So we pulled data from the bucket. We've set up our own start and stop range and we filtered the data. Now we have the ability to aggregate and
24:04 the calculations. What else can we do besides just the mean then? Like, if we were to bring this back in, what what is a a good common use case for for aggregate window? Gosh. That's a hard question because it it's one of those things that feels obvious. And just because if you're close to it, you and then stepping back and trying to explain it is harder. I would just say I would say one common use would maybe be for also downsampling your data. So if you have really high resolution data and you decide that after a certain time
24:46 that high resolution data is no longer valuable to you and you're only concerned with overall patterns in the data rather than the granular shape of it, then you might create a downsampling task with the aggregateWindow to pull that data in a lower resolution form, and only save that and then expire the old data so that you can reduce your overall resources and disk space. So that's one reason. I think I use aggregateWindow when I just want to smooth the data when I'm querying it so I can better understand overall trends in my
25:29 data. But you also have a a window function as well in case you need to maybe window data and then perform unions across windows or I mean, really, what you what you do with the Flux is kinda it's so dependent on your use case and what, you know, data transformation is all about, what your data is, what you're trying to achieve, what insights you're trying to get out of it. So it's so specific. Okay. That that that that makes sense. Like, I I understand that. So now people have the choice. Right? They can they can just use the query builder and
26:13 probably get a substantial way through what they need to do, at least from a superficial level, without really ever writing Flux. Is that a fair statement? Yeah. I think so. For sure. Alright. Well, let's see if we can go back to it just now. So we explored our CPU. Oops. Why don't we take a look at one more thing? And then we'll do some dashboard-y bits. So there's quite a lot in here. How about we look at one of the InfluxDB ones? What do we got? Whatever the rate stats are. Do you know?
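The query shape walked through in this chapter (from, range, filter, then aggregateWindow) can be written out as Flux text, and the effect of the windowed mean can be imitated in a few lines of Python. This is a sketch under assumptions: the bucket name is a placeholder, the one-minute window is chosen for illustration, and the Python function only approximates what `aggregateWindow(fn: mean)` does (fixed windows, one averaged row per non-empty window, stamped with the window's end).

```python
from collections import defaultdict

# Flux text for the query shape discussed above. "example-bucket" is a
# placeholder; "cpu" and "usage_system" are the measurement and field
# picked in the episode.
FLUX_QUERY = """
from(bucket: "example-bucket")
  |> range(start: -5m)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> filter(fn: (r) => r._field == "usage_system")
  |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
"""


def aggregate_window_mean(points, every):
    """Average (timestamp_s, value) pairs per fixed window of `every` seconds.

    Each output row is stamped with the end of its window, and windows
    without any points produce no row.
    """
    windows = defaultdict(list)
    for ts, value in points:
        window_start = (ts // every) * every
        windows[window_start].append(value)
    return [
        (start + every, sum(values) / len(values))
        for start, values in sorted(windows.items())
    ]


if __name__ == "__main__":
    raw = [(0, 1.0), (10, 3.0), (60, 5.0), (70, 7.0), (130, 9.0)]
    print(aggregate_window_mean(raw, every=60))
```

The same idea underlies the downsampling use case mentioned above: fewer, averaged rows are kept while the high resolution points are allowed to expire.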
26:27 Exploring More Data / Dashboarding Introduction
26:57 What about them? Well, maybe that's it there. Let's try that. Go. Yeah. Okay. That's gonna I just wanted to see like because we've got Telegraf which is on the loop which is writing data every ten seconds. We should see this linear growth of the number of frames. Okay. But it's also not that exciting to show. So let's pick one more. How about the memory consumption? Because I'm zoomed in, that makes it a little bit tricky. And we don't need this Flux script right now. So that's mem. We've got all the Go runtime stuff. We've got memory here. So let's take a
27:36 look at available percent. And you can see, I'm assuming here, that just because we are writing to InfluxDB, which has an in-memory write-ahead log, that's why our memory is dropping. And we're just looking at available with that. Well, that's not very good. Alright. Okay. So the query builder is quite cool. We can drop down to Flux when we need it. Shall we try a dashboard, some of those, and see if we can understand what a healthy Linux system looks like? Sure. So if you go back to the explorer, you can actually click oh, yeah. Click your
28:08 Creating a Dashboard Manually
28:27 measurement that you want. Alright. Let's do CPU first. Right? So Okay. So, like, I think the intended workflow here is that once I mean, there's a million ways to skin a cat. That's so dark. But once you once you find your query and you're happy with it in the explore tab, then you can click the right hand little button that says save as, and you can kinda save it as a dashboard if I remember correctly. So we are did we add that? No. So I created it for you. Yeah. Alright. Well, we'll create a new one just
29:06 now, and then we'll we'll take a look. So we'll call this system manual, and we'll call this CPU. It would be nice if it took me to the dashboard. I'll make a note. There we go. Alright. What else will we add? Will we do memory again then? Yeah. And disk maybe. Yeah. Well, disk is something that's relatively easy for us to destroy quickly. Put that in there and memory. I really should just use a search, shouldn't I? Mem let's do just available. Let's add used. Now these don't look terribly exciting just now because I probably threw too big a box at
30:13 this and we're not really, we're barely tackling it, I would say right now. So let's cause a little bit of trouble. We'll pop open another shell, then we can say, write from random to a file called big file. And that should just keep writing random bytes to the disk. So what's happening here with Telegraf, right, is that it's scraping metrics every ten seconds, writing them to InfluxDB. Our dashboard is not auto-refreshing, but we'll just set that on ten seconds now anyway. We'll look at the last five minutes and I'm assuming we should see some movement on
30:16 Generating Load and Observing Metrics (Humorous Interlude)
31:11 some of these cells. Right? Depending on how big the file and how quickly it's writing that to disk, of course. What is that CPU that jumped up? Let's see. No. Because dd is a single core application, one of the CPUs is now essentially continuously just writing this big file. Oh, and we can kinda... I mean, again, the disk is so large in this machine, 800 gig, but I do see a small curve forming. Right? Yeah. Maybe not as quickly as I would have liked, and the memory hasn't moved at all. Wonder how we could eat up memory.
31:58 Let's see. Linux consume memory fast. Googled no one, ever. So I have a script that consumes a constant amount of RAM for a user-defined amount of time. I mean, do we trust Stack Overflow? I always do, blindly. I mean that script looks really safe. I mean if I understood it even remotely maybe I would do it. And this If I could read it, it would be Oh. Thanks. So this one has some sort of imperative loop. It does some multiplication, and then evolves it. I think that's just continually adding to the rate. Let's see if we can
32:57 find one more before I trust that one. I mean, I tried to, like, DuckDuckGo, but it never gives me the answers I want. So. That's its charm. Right? And now Google is telling me how to stop processes eating all my RAM when I actually want to cause a very explosive. I'm just gonna trust it. Let's go with this guy. So we'll get one more tab. All of this just so I can see a graph modify on the dashboard. That's it. So We could also write data to Influx with Flux too, but this is more fun because
33:44 you're just messing with things causing chaos. Well, if I get it working, perhaps. Am I supposed to oh, no. Wait. Because the numbers aren't very big. Right? That's not consuming RAM. While true do... okay. That's maybe gonna consume some RAM. Let's see. Let's see what Influx is. Well, we'll find out in ten seconds. This is gonna be so anticlimactic, isn't it? I don't know. We'll find out. I'm hopeful. Oh, yes. I had a funny comment from Nikolay who said, just install Slack. Yeah. That would definitely eat RAM pretty fast, wouldn't it? We do see oh, no. That is kinda bending too. Okay.
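The disk load generated here comes from writing random bytes to a file with dd. As a bounded stand-in, and not the command used in the episode, the Python sketch below writes a fixed amount of random data and then stops, so it can be run without filling the disk. The file name and sizes are arbitrary choices for illustration.

```python
import os
import tempfile


def write_random_file(path, total_mib, chunk_mib=1):
    """Write `total_mib` MiB of random bytes to `path`; return bytes written."""
    chunk = chunk_mib * 1024 * 1024
    remaining = total_mib * 1024 * 1024
    written = 0
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))  # random data, written chunk by chunk
            written += n
            remaining -= n
    return written


if __name__ == "__main__":
    # A temporary directory keeps the test file from lingering afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "bigfile")
        print(write_random_file(target, total_mib=16), "bytes written")
```

Like dd, this loop runs on a single core, which matches the single busy CPU seen on the dashboard.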
34:48 I mean, we're still not really doing much damage, but it's slowly we can see these graphs being manipulated and we can see resources being consumed. Just not in the fun chaotic way that I was kind of hoping. Alright. Let's do something a bit more practical then. That was that was okay. So we take a look at the system dashboard that got generated for us then. Is that gonna be a bit better than mine? Ours? I guess we can find out. I think that's subjective. Yeah. Okay. Cool. So we've got system uptime, which you can see if we could for
35:03 Exploring the Pre-built System Dashboard
35:25 a eight CPUs. We've got some system load here. We got our memory. Our memory. Yes. I mean, we're not even at 1% memory usage, so and then we got this guy over network processes and swap. If we break this down to the last five minutes, it'll look a little bit more full. There we go and we can see where we tried. Right? We tried our best to consume a bit of load and consume a bit of memory. Could you maybe just then break down the different cell types that are available? Like, this is not a graph.
35:56 Dashboard Visualizations (Graphs, Single Stat, Heat Map, Histogram)
36:01 Right? Yeah. So you have a variety of different visualizations both available to your explorer tab and also your dashboard. So you can add a cell and then change the visualization, or you can configure an individual cell to display a different visualization. So let's do that. Let's click the gear icon on any one of these cells. And then, yeah, hit configure. Yep. There you go. So there's heat maps, histograms. I really like the single stat on top of the graph, graph plus single stat, especially for things like memory and stuff like that. I think it makes it
36:45 pretty powerful. Probably not too useful, as I clicked on one of those boring ones though. Right? So Yeah. Let's use the disk IO since we managed to at least cause a little bit of graphing there. Well, what else did you say? We got heat maps and histograms. You know, I've never really understood the difference between when to use a heat map or histogram. You got any thoughts on that? Oh, fascinating question. Honestly, no. I don't really ever use histograms. Let's click on a heat map. Let's just see if it speaks to us intuitively. Nope.
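On the heat map versus histogram question raised here, one way to see the difference is to count the same points both ways. In the sketch below, which uses made-up points and bin widths and is not how the UI is implemented, a histogram counts values per value bin and discards time, while a heat map counts values per cell of a time-by-value grid and so keeps the time axis.

```python
from collections import Counter


def histogram(points, bin_width):
    """Count values per value bin, ignoring when they occurred."""
    return Counter((value // bin_width) * bin_width for _, value in points)


def heat_map(points, time_width, bin_width):
    """Count values per (time bin, value bin) cell, keeping the time axis."""
    return Counter(
        ((ts // time_width) * time_width, (value // bin_width) * bin_width)
        for ts, value in points
    )


if __name__ == "__main__":
    # (timestamp_s, value) pairs; the last point is an outlier late in time.
    points = [(0, 1), (10, 2), (70, 1), (80, 12)]
    print(histogram(points, bin_width=10))
    print(heat_map(points, time_width=60, bin_width=10))
```

Read this way, a histogram answers "how are the values distributed overall", and a heat map answers "how does that distribution shift over time", which is why neither says much with only a few minutes of nearly flat data.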
37:25 See? It doesn't. I think if the visualization doesn't help you understand what's going on, then it then it's not the visualization for you. Well, I mean, maybe that's not the visualization's fault and just the fact that we have less than twenty minutes worth of data that's not really moving too much either. So, like, I don't I don't think we're gonna get particularly useful things. Like, there's like it's our gauge. Right? It's like, hey, you're zero. Nice work. Try harder. So Yeah. I mean, if I didn't have my rule of doing everything live, perhaps I should have
37:59 spun one up earlier and left it running for a few days with some weird workloads on it. Or maybe a crypto miner. That would have been a good idea. I mean, we could always fork bomb the machine. That would really cause some problems. Let's do that. Are you familiar with a fork bomb? I just... it sounds good. So I'm excited. So it's this voodoo piece of bash that spawns infinite processes until the machine falls over. No. Generally, it may not consume all of the memory or the CPU, but it will give us something interesting in
38:01 More Attempts at Causing Load (Fork Bomb & Docker pids-limit)
38:39 the processes table, and we probably will run out of PIDs, which is generally when the machine stops functioning. And I don't even remember how it works in practice because I haven't been stupid enough to run it on my own machines for a long time. But we may have a better time to run it before it kills itself. Do people ever use it practically? No. People don't. Okay. Alright. Let's do this a little bit safer. Right? I won't be too chaotic. Let's get Docker installed. Because there's a really cool feature with Docker
39:21 where we can limit the PIDs. We can set a PID limit so that it won't destroy the machine. So let's do the command: docker image pull alpine:3. In fact, it needs Bash. Let's just do bash. Okay. So we should be able to do docker container run --rm bash echo hello. Let's make sure the pids-limit flag works first. Memory failure. pids-limit, of course. No? Oh, yeah. What did I get wrong? No space? No equals sign? No. No. I know. docker container... oh, no. It has to be after the run. --pids-limit=2.
40:34 Okay. So now we have... oh, my dog is going wild. Oh, we need to pass that in as a string, so let me re-copy that. This is a long way around just to show something that's not even that exciting. So paste. Copy it as one line. Don't fail me now. I'll do it inline. Okay. I feel like I'm committed now. Yeah. We have to see this through. Alright. So now we're inside of a Bash container. Okay. Cool. Now that wasn't very same because we only ran it with a pids-limit of two. Let's see if we can visualize that.
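The flag-ordering problem hit above (the option has to come after `run`, not after `docker container`) can be sketched as a small Python helper that only builds the command line and never executes it. The helper name is hypothetical, and the `echo hello` payload is the harmless test command from the stream, not the fork bomb itself.

```python
import shlex


def docker_run_with_pid_limit(image, command, pids_limit):
    """Build (but do not execute) a `docker container run` argv with a PID cap.

    Hypothetical helper for illustration; it only assembles arguments.
    """
    if pids_limit < 1:
        raise ValueError("pids_limit must be a positive integer")
    return [
        "docker", "container", "run",
        "--rm",                        # remove the container when it exits
        f"--pids-limit={pids_limit}",  # run options go after `run`, before the image
        image,
        *command,
    ]


argv = docker_run_with_pid_limit("bash", ["echo", "hello"], 500)
print(shlex.join(argv))
# prints: docker container run --rm --pids-limit=500 bash echo hello
```

Keeping the command as a list avoids shell quoting surprises when the payload is later passed in as a single string.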
41:31 I'm not gonna be able to visualize it with two though, so let's bump it up. Let's do 500. Okay. And now it's setting the limit and see if we get anything here. We just need to wait ten seconds, won't we? Maybe we should do this from here. And this is the episode where we stopped looking at Influx and instead just tried to break the machine. So will that give me all fields? It does. Okay. Hey. We did cause some artificial load. What is the, how many PIDs can we have? Max PIDs Linux. I don't wanna raise it and then find
42:20 out that I put it too high and then we do destroy the machine. 4,000,000? That's million, right? Yeah. Yeah. Sure. Just, yeah. Just agree. That's fine. I read 4,000,000. So we'll do 1,000,000. And that's either the end of the episode because we measure at the number or it'll just work. Now it's not printing resource error yet. And I've lost the machine. And now we have helpful comments from people who are suggesting other ways to cause artificial load. Oh, no. The Telegraf one. Okay. So I've just lost this window. So maybe.
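The "how many PIDs can we have" lookup from a moment ago can also be done on the machine itself: Linux exposes the ceiling in `/proc/sys/kernel/pid_max`, and on 64-bit kernels that value can be as high as 4,194,304, which matches the roughly 4,000,000 read out on stream. The sketch below is a minimal Python reader; the `safe_pids_limit` helper and its 25% default are assumptions made up for illustration, not a recommendation from the video.

```python
from pathlib import Path

PID_MAX_PATH = Path("/proc/sys/kernel/pid_max")  # Linux-only procfs entry


def parse_pid_max(text):
    """Parse the contents of pid_max (a single integer plus newline)."""
    return int(text.strip())


def read_pid_max(path=PID_MAX_PATH):
    """Return the kernel PID ceiling, or None if it cannot be read."""
    try:
        return parse_pid_max(path.read_text())
    except (OSError, ValueError):
        return None


def safe_pids_limit(pid_max, fraction=0.25):
    """Hypothetical heuristic: cap a container at a fraction of the ceiling."""
    return max(1, int(pid_max * fraction))


ceiling = read_pid_max()
if ceiling is not None:
    print(f"pid_max={ceiling}, suggested --pids-limit={safe_pids_limit(ceiling)}")
```

Leaving headroom below the ceiling is what keeps the host (and agents such as Telegraf) able to spawn processes while the container is saturated.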
43:12 Yep. Okay. Telegraf is trying to exit nicely, which is not very helpful right now. docker container ls. Destroy that container. There, not too dangerous. Come on. Right. And then we'll do it the nice way instead of that way. Let's see what we see here though. Oh, wow. Okay. That's good. And then that's where we saved it. Now the nicer way would have been this comment, stress-ng, which I'm not familiar with, but I guess we might as well check it out. Let's see what this does. I'm assuming it needs some flags. Right. Let's
44:17 forget that. That was a nice digression, I guess. So we also have this InfluxDB 2 OSS dashboard. This is, what, using the internal metrics? Yeah. So there's InfluxDB OSS, and then there's also a free tier. So hot tip: if you are using either, use one to monitor your own InfluxDB instance as well. Okay. Now, do I have to build all these dashboards myself? Is there somewhere I can get more of these pre-canned ones? There is a community templates repo in InfluxData, and there's a bunch of templates. We should actually test the template I just
44:23 InfluxDB Internal Metrics & Community Templates
45:08 merged yesterday. So this one? Mhmm. And what was your template called? InfluxDB 2 operational metrics. There you go. I think you just passed it. So I just copy this YAML? You can just copy the URL actually. And then go to settings and then templates. And then click the copy and paste. Yeah. Okay. So this is gonna create some task summary dashboards, some cardinality explorer. We've got a bucket. We've got a task and then some labels. Okay. I'll trust it. Let's go and see what we have then. So I don't know. I think we're gonna
45:12 Importing a Community Template (Debugging Failure)
46:09 have to change the frequency at which we're writing data. So we created a cardinality task as a part of this template. So let's change that cardinality-by-bucket every period and make it less than an hour. Maybe make it ten seconds or thirty seconds or something, so we actually start getting some data. It says failed. Let's find out why. How do we do that? So I think if we click on... yeah. Let's try clicking on that view. There we go. View logs. Why'd you fail? Can you make it a little bigger, please?
46:59 Oh, perfect. Okay. Requires host to be specified. Maybe I shouldn't have picked a template that I just added, but we could try to debug it. I could go to the task and see what they mean by... oh, wait. But part of this is a task failure monitoring template. So maybe that'll give us more information as to why this is failing. InfluxDB. What does it mean, problem with host isn't identified? I don't know. Let me look at the header. I should have been paying more attention. So: cannot execute task run, runtime, map, failed to evaluate map function.
48:05 And so it says the InfluxDB provider requires a host to be specified. So to me, that means that this needs some configuration. Maybe here. Does this have a open plate? I'll look at the InfluxDB cardinality package. I'm not sure. Why don't we just write our own task? Yeah. We should just write our own task. Oh, you can specify. Yeah. Does that work when you specify the host? There's a parameter for that. Alright. Okay. Let's see. 127.0.0.1. Save. Can I just force a run? Yeah. There we go. Oh, it wants the port number in the
49:09 host. Okay. I can fix that. 8086. Might as well add the protocol too. Force run. Unauthorized. Oh, so you need to include your organization and your token. Okay. Tokens. Copy. Alright. So organization. It's just org. Yeah. Token. Okay. Org, can that just be the name? Mhmm. Run. Oh. Requires a host to be specified. Alright. Let's write our own task then. Okay. Let's do that. So let's say we wanna downsample our CPU. Mhmm. Let's run this every... well, let's do every ten seconds for quickness. So do we wanna write it from scratch or let the UI write it for us?
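The downsampling idea proposed here (take the max of the CPU data on a short schedule and write it to another bucket) could be expressed as a Flux task roughly like the one rendered by this Python sketch. The function name is hypothetical, and the bucket names (`rawkode`, `cardinality`), the task name, and the Telegraf `cpu` / `usage_system` measurement and field are assumptions based on what is mentioned in the stream; the script the UI generates may differ.

```python
def render_downsample_task(name, every, source_bucket, dest_bucket,
                           measurement, field):
    """Render a Flux task script that writes the per-series max to dest_bucket.

    Hypothetical helper: it only builds the script text, it does not talk
    to InfluxDB.
    """
    return f'''option task = {{name: "{name}", every: {every}}}

from(bucket: "{source_bucket}")
    |> range(start: -task.every)
    |> filter(fn: (r) => r._measurement == "{measurement}" and r._field == "{field}")
    |> max()
    |> to(bucket: "{dest_bucket}")
'''


script = render_downsample_task(
    name="downsample-cpu",       # assumed task name
    every="30s",
    source_bucket="rawkode",     # assumed name of the raw bucket
    dest_bucket="cardinality",   # bucket reused on stream for the output
    measurement="cpu",
    field="usage_system",
)
print(script)
```

Using `-task.every` as the range start means each run only looks at the interval since the previous run, so every execution contributes one max value per series.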
50:08 Creating a Downsampling Task using Flux via the UI
50:27 I feel like I should say UI. So, okay. I guess we just find the measurement we want. Let's do system. Okay. Do I just copy the script? Or... yeah. I think we're just doing a window aggregation. Do we maybe wanna find a metric that's a little bit more obvious? Like, I think we should maybe return the max. So I would just go to the script editor. Like that? I would just go to the script editor and then remove aggregateWindow and just create max instead. Like that? Mhmm. Is that what we want? That's what
51:28 I want. Alright. So I copy this and then we just go to the task here? Yeah. Or no. You can click save as. Oh, sorry. Go back to the explorer. And then you can click save as. And then you can say a task. Ah, okay. So downsample CPU, run every thirty seconds. I guess we can just use that bucket since it's there. I don't think there's any data in it. I was trying to use that cardinality task to write data from your Rawkode bucket to the cardinality bucket about your cardinality, but I obviously need to fix something. So
52:12 We can start to reuse the cardinality bucket, and we'll just downsample CPU into it. So if I hit save... oh, the output. Got it. Yeah. And we'll run this manually now. What we should see is something. I'm not sure. Let's see. Current... oh, yeah. CPU. We've got usage_system. And we probably only got one point right now because it's only run once. Is that why it's not graphable? I think it's because it's finding, if you do a group, maybe you'll find that you'll be able to on time. I would just do a plain
53:02 group without any columns. Oh, that's why. Because it automatically added a window too. So, oh, yeah. I don't think you wanna group. I changed my mind. Ungroup. Ungroup. No. I like the group. You do? Okay. Alright. Okay. So we get a max value at least. I think we just don't have a lot of data. But I think that's really cool anyway. I didn't know we could do that from the UI, which I also really like. So we have a Rawkode bucket writing ten-second-granularity usage data, which looks like this. We then created a
53:47 query to fetch the max value of that over a given window. And that is now being written to the cardinality bucket here. And we do have points, but we just don't have a lot of data in here yet. No. I guess we could speed the task up, but I guess that's not particularly important. Do you wanna walk us through the process of alerting on that max going above a certain value? Sure. So we can just go to the alerts tab, and then I think it'd be useful to create a threshold alert here. And I guess we'll just make one to
54:10 Setting up Alerting
54:29 fail to start with. Do we wanna alert on our downsampled data or our raw data? I think we should alert on our raw data. Yeah. I guess the downsampled data is really for longer-term analytics and look-back, whereas we'd want the real high-resolution data for the alert. Right? Yeah. Okay. So if we find a max... Now we can just click on a threshold basically on the graph, once we click configure check, the next tab. Oh. I'm just clicking now. I don't... okay. Critical. There we go. Yeah. There you go. Yeah.
55:21 Alright. So if we keep this relatively low, we should be able to trigger that, not with another fork bomb, but, you know, with something else. So do I just click done? I believe so. Yeah. And then I see a tab for endpoints. Let's use HTTP. There's this thing called Rbox. App which is really cool. Which just gives you like an endpoint you can throw arbitrary payloads to. R box. And then rules. What do I do here? I think, like, this is where you change the conditions as well as add more information about what's happening. Like, I
55:38 Configuring Alert Endpoints (HTTP Webhook)
56:18 think you can specify. Okay. So a rule being that every five seconds, if we see something in critical, notify this endpoint. Yeah. Done. Cool. Let's make the CPU surpass our threshold. Okay. Let's try. And then do I want it? Yeah. Let's find an example of stress-ng rather than me running that thing again. So stress-ng example. Because we had it recommended to us, so we should use it. I thought that was really neat how you also shared the comment with everybody. This is a cool tool. Oh, yeah. Yeah. I really like this.
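The three pieces configured above (a threshold check, an HTTP endpoint, and a rule that notifies the endpoint when the check is critical) can be modeled in a few lines of Python. This is a conceptual sketch only, not how InfluxDB implements checks: the function names, the payload shape, and the `example.invalid` URL are all assumptions, and the request is built but never sent.

```python
import json
import urllib.request


def check_level(value, crit_above):
    """Classify one reading: 'crit' above the threshold, otherwise 'ok'."""
    return "crit" if value > crit_above else "ok"


def build_webhook_request(url, check_name, value, level):
    """Build (but do not send) a JSON POST describing the check status."""
    payload = {"check": check_name, "level": level, "value": value}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_if_critical(url, check_name, value, crit_above):
    """Notification rule: only produce a request when the level is 'crit'."""
    level = check_level(value, crit_above)
    if level != "crit":
        return None
    return build_webhook_request(url, check_name, value, level)


# Hypothetical endpoint and check name, for illustration only.
request = notify_if_critical(
    "https://example.invalid/hook", "cpu-usage-user", 97.5, 80.0
)
print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

Separating the level decision from the delivery step mirrors the split between checks and notification rules, which is what lets the same check later feed PagerDuty, Slack, or another endpoint.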
56:36 Triggering and Verifying the Alert
57:14 Okay. I mean, I've just copied and pasted that. I don't know what one matrix is. I'm assuming the -t one minute means run it for one minute. Oh, yeah. And we also got another. You're very helpful today. Thank you, Nicolas. stress-ng --cpu. What did I do? Yeah. Let's just document. So --cpu. How many CPUs do I have? Try again. I have 48. So --cpu 48. Alright. So should we view this from here first? We should be able to see it on our CPU metrics here.
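The stress-ng invocation pieced together above can be sketched as a Python helper that assembles the arguments from the machine's CPU count. The helper name is hypothetical and it only builds the command; `--cpu` sets the number of CPU workers and `--timeout` is the long form of the `-t` flag mentioned on stream.

```python
import os
import shlex


def stress_ng_cpu_command(workers=None, timeout="1m"):
    """Build (but do not execute) a stress-ng CPU load command.

    Hypothetical helper: defaults to one worker per CPU reported by the OS.
    """
    if workers is None:
        workers = os.cpu_count() or 1
    if workers < 1:
        raise ValueError("workers must be a positive integer")
    return ["stress-ng", "--cpu", str(workers), "--timeout", timeout]


# 48 matches the CPU count reported for the server in the stream.
print(shlex.join(stress_ng_cpu_command(48)))
# prints: stress-ng --cpu 48 --timeout 1m
```

A bounded timeout is the main safety difference from the fork bomb: the load stops on its own even if the terminal session is lost.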
58:07 System. I'm still seeing this comment, so it's hard for me to... Oh. There we go. Thank you. So while the tool is really cool, sadly, it's still operated by an idiot. So we do our best. Now, yeah. Our CPU is climbing there. So that should mean my alert should be firing. Yeah. It looks like it's got a green check. So is that good or bad? That looks good. Let's click on the eye icon. Oh, that's hard to say. I don't know. Yeah. Which CPU metric are we alerting on, though? I thought we were alerting on all of
58:57 them. No? I don't know. Let's find out. You said system. We're alerting on all of them. And that's what I created here. Oh, no. You should use... ah, okay. So let's change that. User. There we go. So can I... We can't? We just have to wait the ten seconds, won't we? Mhmm. So did you bring a joke? Have you got a joke? I only have the world's worst pun or, like, dad joke, but it's so Texas-specific. So how did Davy Crockett like his pie? Do you know who Davy Crockett is? Yeah. I'm
59:50 familiar. Okay. A la mode. So that updated a few seconds ago. We have a green check. I don't know if the green check is good too. It just means the check ran, or is it failing? I guess the rule would be... I think it says that it's running successfully. And then if you look at the history, we'll see that it... I think it's failing. Oh, yeah. So that means I refresh this bit. Ah, there we go. We have old... oh, wow. So, I know. Obviously, that HTTP webhook thing is entirely contrived, but, you know, that could be PagerDuty,
1:00:36 Opsgenie, Slack, whatever tool people are using. Let's stop that stress testing tool. Alright. So what did we do? We used the quick start. We got a Telegraf config for free. We started to collect the data with Telegraf. We went down the wrong path and started playing with the fork bomb stuff, trying to make our graphs look a bit more interesting. And I failed miserably, but we did manage to cover a few cool features there as well. Let me just pull this back up, see if it's any more interesting. Bucket. And then we imported a dashboard,
1:00:46 Summary and Conclusion
1:01:19 which was nice. It didn't work, but, well, we then downsampled some data from the Rawkode bucket into our cardinality bucket. That was relatively easy to do. I wasn't familiar. I didn't remember that we could just do that from the query builder and then save that to a task. That was a really nice touch. And then we configured an alert and endpoint, and we started to send an alert based on, again, a contrived situation of just making that CPU spike through the roof. Did I miss anything? No. That was a perfect summary.
1:01:53 Is there anything else you wanna show before we finish up for the day? That's a great question. I think that feels pretty well rounded, so I'm inclined to say no. Right. Yeah. I mean, I think we forgot the best feature, which is toggling the dashboard onto the other mode. Anyway, I was so impressed when that came out for a while and then realized I don't like bright colors. But, you know. Okay. Well, that was good fun. I think it was nice to explore InfluxDB 2, to take a look at Flux and
1:02:26 Getting Help (Slack, Community Forum)
1:02:30 just see what's coming out of that space. It was GA last week, so people should, I guess, start to play with this. If they have any problems, I'm assuming they can just come and speak to you. Right? Yeah. So we have a Slack channel and a community forum as well. So come find us. Come ask questions. If you want an immediate response from somebody, I would recommend going to Slack. But if you have a more complicated question or you wanna brag about a solution you found, you should just post it on the
1:03:02 community forum. Alright. Awesome. We got one final comment there, which I'm hoping is in relation to the theme change, but, you know, some people like it, some people hate it. Anyway, thank you for joining me today, Anais. I will speak to you soon, and I hope you have a great day. Thank you. Thank you for having me. Thanks. Bye.