Overview

About this video

What You'll Learn

  1. Set up Grafana on a fresh Equinix Metal host and complete initial admin account hardening.
  2. Add Prometheus and InfluxDB as data sources and inspect collected Linux metrics in Explore and dashboards.
  3. Build alerting rules with webhook notifications, then create panels and import a community dashboard.

Marcus Olsson joins to introduce Grafana from scratch: installing it on Equinix Metal, wiring up Prometheus and InfluxDB data sources, building dashboards, configuring alert rules with webhook notifications, and importing community dashboards.

Chapters

  1. 0:00 Holding screen
  2. 2:20 Introductions
  3. 2:23 Introduction and Sponsor
  4. 3:09 Introducing the Guest and Grafana Overview
  5. 3:30 What is Grafana?
  6. 4:17 Diverse Grafana Use Cases
  7. 6:09 Audience Greetings and Chat
  8. 6:48 Demo Setup and Plan
  9. 7:00 What was prepared upfront?
  10. 8:02 Installing Grafana
  11. 8:10 Installing Grafana
  12. 9:25 Initial Login and Password Change
  13. 9:45 Login / Default authentication
  14. 10:16 Navigating the UI
  15. 10:50 Adding data sources (InfluxDB and Prometheus)
  16. 11:05 Adding Prometheus Data Source
  17. 14:24 Adding InfluxDB Data Source (Flux Beta)
  18. 15:22 Exploring Data with Grafana Explore (Prometheus & InfluxDB attempts)
  19. 15:40 Exploring metrics
  20. 23:38 Creating a Dashboard and Adding Panels
  21. 27:40 Building Dashboards
  22. 28:49 Panel Visualization Options
  23. 35:53 Configuring Alerts and Notification Channels (using Apt Upgrades)
  24. 36:15 Alerting on metrics
  25. 43:11 Adding Disk Usage Panel and Alerting
  26. 49:00 Verifying Alert Trigger (Apt Upgrades)
  27. 51:30 Importing a community dashboard from
  28. 53:08 Explaining the different visualizations
  29. 55:37 Exploring Panel Types (Gauge, Stat) and React Migration
  30. 57:18 Reviewing the Imported Dashboard
  31. 58:18 Conclusion and Call to Action
  32. 58:20 Closing

Transcript

Generated from the English captions. Timestamps jump the player to that moment.

2:23 Introduction and Sponsor

2:23 Hello and welcome to today's episode. Today, we are going to be taking a look at Grafana, a dashboarding tool for building out really cool dashboards with metrics and understanding how your systems are operating. That's a terrible description, but it'll get better soon, I promise. Before we start, I wanna thank my employer, Equinix Metal. They provide the time for me to actually produce the show and, you know, hopefully produce content that you all find useful and that helps you learn. We are giving away a $50 promo code to anyone who is watching and wants to spin up some bare metal servers

3:01 on Equinix Metal. The code is Rawkode-live. You can sign up today at metal.equinix.com. Now, I'm gonna invite on my wonderful guest, Marcus Olsson, a developer advocate from Grafana Labs. Hey Marcus, how are you? Hey, I'm great. Thanks for having me. Awesome. I'm very excited. I'm a big fan of Grafana. I'm hoping we can do some cool stuff today. Yeah, hope so too. Would you like to just take a couple of minutes and tell people about your role at Grafana Labs and, you know, what Grafana is? Yeah, sure. So I am a developer advocate at Grafana

3:30 What is Grafana?

3:40 Labs, which is the company behind the open source project Grafana. And if you've never heard about Grafana before, it helps to think of it as a visualization tool for systems monitoring. That's really just the tip of the iceberg though, but it kinda gets you going. So my role at Grafana Labs is to help users get going, but also up their game and help them take advantage of everything that is Grafana. And lately I've also been focusing a lot on plugin development. So if you're curious how to extend Grafana to, I don't know, integrate with your

4:17 Diverse Grafana Use Cases

4:29 special database, or you wanna offer a new type of visualization, that's usually me who you're gonna talk to. So, more about Grafana. Imagine that, you know, you have this application that you've built and you're really proud of it, but how do you know it actually works? How do you know that it actually does the thing that you promised it would do? So that's really where Grafana comes in: Grafana brings in metrics, information, statistics about that application and then visualizes it, typically in graphs over time. So you can see, for example, how is my application using resources

5:14 over the course of a day? Is my application, you know, using too many resources at 8 PM or something like that? But, you know, that being said, we also see a lot of people using it for all kinds of crazy stuff. I know there's a community of beekeepers that are using Grafana to monitor their beehives, which is really cool. I have a colleague who is growing avocados, and she just set up her monitoring solution for making sure that they have what they need and that they're growing like they should. So it's very

6:01 commonly used for systems monitoring, but we're seeing just tons of different use cases for it. Awesome. Yeah, I don't think there's probably anyone in technology, at least in a sysadmin, operator, or SRE role, who's not familiar with Grafana. But to hear that it's actually now spread to all these other domains, like beekeeping and avocados. Yeah. Yeah. I guess as long as you've got time series data, anything that you track over time, Grafana is a natural fit to dashboard and visualize it. That makes total sense to me. Yeah.

6:09 Audience Greetings and Chat

6:35 Alright. So we got a couple of hellos, so let's go through those. John McCabe, pleasure to have you back. Hello again. Demo, yo to you too. And Carlos, hey buddy, how's it going? Hey everyone. So let's see. Let's cover what I have prepared upfront, which hopefully is as little as possible. I have not installed Grafana, so we're starting from scratch as far as all the Grafana stuff goes. But I have prepared a little bit of infrastructure on Equinix Metal, so that we can have some

7:00 What was prepared upfront?

7:13 metrics to actually work with today. I don't wanna be doing metric collection and setup in real time, because that's not what's important for today's session. So I spun up three bare metal machines. They all have a Prometheus node exporter running and they all have Telegraf running. They're collecting very similar metrics: host, disk, network, CPU, memory, all the basic Linux stuff. I also have, on one node, InfluxDB 2 running and Prometheus running. The reason I've kind of duplicated my effort here is that one of the selling points of Grafana is that it works with any

7:49 data source. So we're gonna take a look at setting up InfluxDB, we'll take a look at Prometheus, we'll build some dashboards, we'll import some dashboards. And it'll all just go amazingly well and nothing will break. Right, Marcus? Absolutely nothing. Alright. Okay. So I've also prepared here the download page for Grafana. So if anyone is curious, you can follow along. This is just grafana.com/grafana/download. I'm assuming the most recent version is this one here from October 28, 7.3.0. Yeah. Which is today. Oh, so it is. So congratulations. You get to try out 7.3.

8:10 Installing Grafana

8:29 I mean, that sounds kinda dangerous, but I love it. So we're gonna do that. So we should get a terminal on the first machine. We're only gonna run one Grafana. I'll run that on the same node as Prometheus and InfluxDB. So step one, get on my machine. And this is Ubuntu. That means I can just run this. Yeah. Yeah, that should work. Things always start off so easy, just to kinda lure you into a false sense of security, and then it all goes downhill. That's what I find every time I write

9:11 code. Like, I think I've got this. This is easy. And then... Yeah, it's always the details that get you. Oh, Grafana server. Okay. Nice. There you go. So by default, what port is my Grafana gonna run on? It's gonna be port 3000. Three thousand. Yeah, you should be able to access it from there. Oh, that's my Prometheus. At least we know that works. Yeah. I really should have checked everything again. Yeah. So I didn't set up any credentials. What do I do here, Marcus? Sure. So the first time you open up Grafana, you'll be given a default

9:45 Login / Default authentication

9:58 admin account, which is admin, a-d-m-i-n, for both username and password. We'll have to change that right away. So there is... Well, I think that you have... Yeah. The responsible thing to do is just to change it right away. I had to change my naming strategy there for my fake password. Perfect. I obviously typed it in front of everybody. Rawkode one two three. That is my Google password too. Good luck everyone. Awesome. Well, so yeah. I mean, once you've logged in, the first thing you'll see is the home

10:16 Navigating the UI

10:38 dashboard, as we call it, which is a dashboard just like the ones that you will create, but it's a little bit more specific. So it kinda helps you get going: where to start, where to go next. So yeah. So we're not gonna follow this tutorial. I mean, we're just gonna dive straight into adding a data source. Right? We're feeling pretty confident. I think so. Let's go. Alright. So we can see that Grafana supports Prometheus, Graphite, OpenTSDB, InfluxDB, and then, oh, there's loads here. I'm not gonna list all of them. I mean, what

11:05 Adding Prometheus Data Source

11:15 I actually think is really cool about Grafana is that, you know, it started with basically just being a visualization tool for Graphite, but it's grown to support so many other data sources. So when I said that seeing it as a visualization tool is just the tip of the iceberg, I really mean it. Right now it's a platform where you can connect all different types of databases. Grafana doesn't actually store anything itself. So all the data is outside Grafana; you control it. And if your database is

11:51 not supported, you can build a plugin to make it appear in Grafana as well. So these are just the ones that are built in, but yeah, you can write your own integration. Alright. Okay. So let's just start with Prometheus. This is the first one on the list. I'm assuming that's because most people are running Prometheus. Is there any rhyme or reason to why Prometheus is first? Yeah. So we are seeing a lot of users using Prometheus and Influx and Graphite, OpenTSDB. So they are definitely the most popular ones. That's pretty much it.
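
Editor's sketch: the data source form used in the demo has a scriptable counterpart, since Grafana's HTTP API accepts a JSON body on POST /api/datasources. The Python below only builds such a body and sends nothing. The helper name datasource_payload and the localhost address are assumptions made for illustration; the "proxy" and "direct" values correspond to the Server and Browser access options in the form.

```python
import json


def datasource_payload(name, ds_type, url, server_side=True):
    """Build a JSON-serializable body for creating a Grafana data source."""
    return {
        "name": name,
        "type": ds_type,
        "url": url,
        # "proxy": Grafana's backend makes the request (the Server option).
        # "direct": the user's browser makes the request (the Browser option).
        "access": "proxy" if server_side else "direct",
    }


if __name__ == "__main__":
    # Assumed address: Prometheus on the same host as Grafana, default port 9090.
    body = datasource_payload("Prometheus", "prometheus", "http://localhost:9090")
    print(json.dumps(body, indent=2))
```

Defaulting to server-side access follows the guest's advice in the session that Server is typically the safer choice.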

12:31 We don't favor any database over another, but we are seeing a lot of people using Prometheus for sure. Alright. Let me just make sure I've still got this IP address. So my Prometheus should be available on port 9090. Yeah, that's right. And we've got access: server or browser. Yeah. So this is actually something that tends to trip some people up: the access mode is really how Grafana accesses, you know, your server, in this case Prometheus. Grafana has a server component and a front end component. So you can decide whether you want to access your

13:22 data source through the browser or through the server. So typically, yeah, you would pick server for security reasons; for example, you might not want your users to authenticate through the browser. Yeah. So I'm assuming if I set any sort of authentication here and set browser, the browser becomes aware of that authentication? So going for the server is just a safer way of doing that? So you could still use server and basic auth. That would mean that the request originates from the server, but you still need to authenticate to the endpoints

14:03 that you're getting the metrics from. Okay. No problem. So I haven't configured any TLS or CAs or anything like that. This is a pretty wild setup that I did very quickly. So I'm assuming I can just hit save and test here, and we get a little green banner there. So I'm pretty comfortable. I love it. Yeah. Good. And I'm just gonna add the Influx one while I'm here too. So, oh, okay. I can pick between InfluxQL and Flux. Well, I'm gonna go with Flux. Yeah, that's in beta right now. So we'll see

14:24 Adding InfluxDB Data Source (Flux Beta)

14:41 how that goes. Token, token, token. Good question. I'll just pop over here for thirty seconds and grab my token, nice and secure. And I even remembered the Influx password. Generally, that's the thing I forget. Okay. Mhmm. So copy. Perfect. I'm pretty sure the bucket was Rawkode as well. So if I hit save and test, we get green. Wonderful. Cool. That means I haven't broken anything since I set it all up, which I'm happy with. Alright. So we have two data sources. I guess now what I wanna do is start querying some of

15:22 Exploring Data with Grafana Explore (Prometheus & InfluxDB attempts)

15:36 these metrics that I've been writing into these TSDBs for the last couple of hours. What's step one there? I think, yeah, the first step is to check out Explore. In earlier versions of Grafana, you would go straight to creating a dashboard, but that does involve a little bit of setting up. So if you just want to explore, play around with the data, look at what's there, I think Explore is a great first step. Yeah, so at the top you can select which one of the data sources you wanna query. And you get a little cheat sheet

15:40 Exploring metrics

16:16 as to what you can query. I think you can click on them to autocomplete them, but... Alright. So we'll start with Prometheus, because there's no InfluxDB cheat sheet and that means I'm gonna have to remember stuff. Which I'm happy to, I will remember of course. But we'll start with Prometheus. Yeah, sure. And see. So if I click on metrics here, it's actually gonna list all of the metric names that we have available. Yeah. You don't actually have to. Okay. So let's start with... Yeah. You can check out node maybe.

16:58 Yeah. You get a lot of metrics from the node exporter. Just pick your favorite one. Yeah. That's a good one. Yeah. So it looks like it's running. You see the little progress up there? Yeah. There you go. Now that says showing only 20 time series, show... Okay. So that takes quite a while there because there's... sorry, I'm not really sure. Why does that take a little while? What does 1152 mean here? That's a lot of time series you got there. And the most likely scenario is that you have

17:39 a lot of labels that are varying a lot. I'm trying to see what kind of labels you have. See if I can zoom in on this a little bit. Computer says no. Why won't you zoom? Let's try it from this photo here. So what you could do, it could be that you have a... Yeah, you could have a lot of... That's a good question. So for every value of a label here, you get another time series. So even though it looks like we've only queried one single time series in node CPU seconds

18:29 total, that is actually gonna return a bunch of time series, one for every permutation of the labels that you have down there. So maybe we can limit the instance. I'm not sure how many instances you're getting the metrics from. We can also limit the mode. So maybe we want to see how much CPU we're currently using. So I think you can select the idle mode and just... Well, my tab there was giving up on life. Uh-huh. Alright. I may even pop this open in Chrome if

19:16 that happens again. But let's see what happens now. Yeah. Alright. So... Sure. Let's pick something that's maybe not as verbose to start with. Let's try up. Right? That should just be a single... There we go. Yeah. Nice and fast. There you go. Yeah. So up is a great, you know, good time series to just check that everything is working. You can alert on it, you know. So it's a good start. Looks like it's working though. Yeah. And each one of these is just each copy of the node exporter, I guess. Yeah. I think so.
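
Editor's sketch: the point about labels multiplying time series can be made concrete with a little arithmetic. In the Python below, the label values are hypothetical (three hosts, ten CPUs, the eight CPU modes the node exporter reports on Linux) and are not taken from the demo machines; the helper names series_count and selector are likewise made up for illustration.

```python
def series_count(label_values):
    """Upper bound on time series: the product of the value counts per label."""
    total = 1
    for values in label_values.values():
        total *= len(values)
    return total


def selector(metric, **matchers):
    """Build a PromQL selector such as metric{mode="idle"}."""
    if not matchers:
        return metric
    inner = ",".join(f'{key}="{value}"' for key, value in sorted(matchers.items()))
    return f"{metric}{{{inner}}}"


if __name__ == "__main__":
    # Hypothetical label values, for illustration only.
    labels = {
        "instance": ["host-a", "host-b", "host-c"],
        "cpu": [str(n) for n in range(10)],
        "mode": ["idle", "iowait", "irq", "nice", "softirq", "steal", "system", "user"],
    }
    print(series_count(labels))
    # Fixing one label value removes that label's factor from the product.
    print(selector("node_cpu_seconds_total", mode="idle"))
```

Pinning a label in the selector, as suggested in the session for instance and mode, shrinks the result by that label's number of values.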

19:54 Let's see why I've got four here. So I think it's Prometheus scraping itself, plus I've got the node exporter on three machines. Yeah. Okay. Cool. Yeah. Alright. Let's try taking a look at... I don't know what APT is. Could it be Aptitude from Ubuntu? I don't know. Maybe the node exporter does some apt stuff. Yeah. Maybe. I'm not sure. Oh yeah. Upgrades pending. There we go. Oh, that's cool. I didn't know that. Oh, well, we'll learn together. So this is, okay. So it's obviously checking the apt cache behind the

20:32 scenes, pulling down the updates. Yeah. Alright. So when we were looking at those node metrics... maybe I'll just pick another one and hopefully we get a bit luckier with the number. Let's try this guy now. Yeah. See, look at that. A blurb. Although this isn't a counter, which is what I was gonna say. Because Grafana gave me a hint. It was like, hey, this is a counter type. You probably don't want to view this raw. You actually want to calculate some sort of aggregation over time.
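
Editor's sketch: a counter only ever goes up (or resets to zero), so the raw value is rarely useful and a per-second rate is what usually gets graphed. The Python below is a deliberately simplified stand-in for what PromQL's rate() computes; the real function also extrapolates to the window edges, which is skipped here. The function name and sample values are made up for illustration.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    A drop between two samples is treated as a counter reset to zero.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # Counter restarted: everything counted since the reset is new.
            increase += current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical CPU-seconds samples taken 10 seconds apart.
    print(per_second_rate([(0, 100.0), (10, 110.0), (20, 130.0)]))
```

In PromQL the equivalent query over a five minute window would be written rate(node_cpu_seconds_total[5m]).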

21:06 So... Let's try. Not a counter. Maybe I'll risk it and just do CPU again. Although, I think bytes total should be a counter, to be fair. Yeah, that's a heavy one. I mean, we could try to limit the number of time series, but yeah, that's a good question. So yeah, it gave me this here. It says metric node_cpu_seconds_total is a counter. So I'm assuming it's understanding what Prometheus is returning over the API and then trying to give me a little bit of a hint about how I should actually visualize this. So if I run a rate over

21:49 five minutes. Yeah. What you could also do is just limit it to one machine or something, if you just wanna make it a little more responsive. Okay. Let's do that then. So let's see if I remember how to do this. If I grab the IP address, I can come in here. Yeah. There we go. Exactly. And instance. Instance equals. Oh, it's actually giving me autocomplete. Nice. Yeah. So I can just hit run. Yeah. See if that... It's still quite chunky. That is curious. Yeah. I wonder why you have a lot of time series. So... Oh, how

22:50 many CPUs do you have? Yeah. It looks like you have 10 CPUs on every... Yeah. That's gonna multiply the time series, but I think it's fine. Okay. Well, we've added the Prometheus data source. We've done a little bit of querying on the Prometheus side. Let's jump over to the InfluxDB side, and then I think, from conversations that we've had, there are some dashboards available online, open source, that would actually allow us to create a quick setup for monitoring these machines. So we can take a look at that

23:25 too. And I'm gonna close this tab, because the minute I do that query, it breaks. And I think it's just my setup. I don't think it's anything Grafana related. Alright. So let's take a look. This is the InfluxDB query thing. So this is the new beta stuff that you said is pretty new. So this is working with InfluxDB 2 and their new query language, Flux. Now let's see if I remember how to do the thing. Okay. I forget if I could zoom this in. Oh, there we go. I may have to zoom it back out

23:38 Creating a Dashboard and Adding Panels

24:05 when I'm done. But so, the Flux syntax: I can say from bucket, range start one day ago. In fact, let's just do one hour, and then filter. I could just do that, right? And that will return all the metrics. That's a good question. I haven't had the chance to play with Flux yet. So I'm in your hands right now. That's alright. I've got at least 5% confidence in my ability. Good enough. Did it run? Yeah. It did. Okay. So I can see we've got CPU. Oh. Yeah. It did. Okay. So let's just filter this on that same metric: filter

25:02 function, row, row dot CPU usage guest. Fine. Let's see what happens. Alright. Run. Have I broken it? Looks like it ran. So I'm a little bit confused by this graph here. I mean, is it not applying my filter here? Right? So I said filter, only show me this measurement where CPU usage guest nice, but I can still see the boltdb one here. I mean, is there any way for me to debug this? First try to run the query again. Maybe... Yeah. Doesn't... okay. That's a good question.
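
Editor's sketch: the Flux pipeline being typed out loud here (from a bucket, a time range, then a filter on the row) looks roughly like the string this Python helper assembles. The lowercase bucket name, the cpu measurement, and the usage_guest field are assumptions based on what is said in the session and on Telegraf's usual CPU field names; they are not copied from the screen, and the helper name flux_query is made up.

```python
def flux_query(bucket, start, measurement, field):
    """Assemble a Flux query: bucket -> time range -> filter on one field."""
    return "\n".join([
        f'from(bucket: "{bucket}")',
        f"  |> range(start: {start})",
        f'  |> filter(fn: (r) => r._measurement == "{measurement}" and r._field == "{field}")',
    ])


if __name__ == "__main__":
    # Assumed names: bucket "rawkode", Telegraf's "cpu" measurement, "usage_guest" field.
    print(flux_query("rawkode", "-1h", "cpu", "usage_guest"))
```

Filtering on both _measurement and _field matters: a filter that names only the field can still let series from other measurements through if they happen to match.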

26:08 You can try the query inspector. If you look under the query field to the right, there you go. If you check the query inspector, I think you can see what the actual... yeah, that's what you're getting back. You can also check what your request was, if you're unsure. Alright. I'm gonna cheat. Let's pop over here. I'm gonna just copy a query so that I don't need to remember anything. So let's hit go. Maybe I'm monitoring too much. Alright. Go away. So let's run this again. There we go. Now I just need to fix

27:00 this because these variables won't exist. Stop isn't important. Let's see if that works. So if I just pop open this response, I've got a data frame. I have no idea what that means. Yeah, that's not very helpful, I admit. That's the encoded data frame. That's not gonna be helpful for you right now. Yeah. If you close that... okay. It didn't seem to get any time series from that. Alright. Let's forget that then. We'll just pretend the InfluxDB stuff never happened. Okay. So Grafana has this ability to explore metrics. We

27:40 Building Dashboards

27:49 kinda got that working. Mhmm. Probably it would have helped if I understood the data model a little bit more, about what I'm throwing into this thing, but we're just making it up as we go along. Mhmm. Now, the Explore thing here, right? That's really good for kind of ad hoc, like, I don't really know what I wanna understand yet. Right? Would you say that the default application of Grafana would be that, you know, you go to dashboards that you've pre-built once you understand the systems a little bit better? Does that make sense?

28:18 Yeah, for sure. If you're not really sure yet what you wanna monitor, I would say Explore is great. Dashboards are kinda, you know, curated insights. You figure out what you want and then you curate a dashboard with the information that you actually want to see in an at-a-glance view. So yeah, I think the next step is to go in and create a dashboard. Okay. So let's see. Yeah. You can go to the plus in the side menu, that works too. Yeah. Alright. So a dashboard has something called a panel. I

28:49 Panel Visualization Options

29:05 click add panel. And this, okay. So I can just build up my query the same way I did in the Explore view. Right? Yes. So if I wanna track the apt upgrades that have been pending over time... Yep. Then we can see that graph pretty quickly there. Alright. Yeah. It's basically the same as you had in Explore. I think at some point, if not already, we should have a feature to go from Explore and then just create it in the dashboard as well. I know that you

29:42 can go from the panel to Explore mode, but I think that we're working on getting it both ways as well. Okay, that makes sense. So how much control do I have over this visualization? Can I tweak this? Can I change... What are my options? Yeah, sure. So we basically have three parts to what we're seeing right now. Right now we're in the edit mode of a panel. And I guess the simplest way of explaining it is that a panel is really a query and a visualization. And where you already went was the query editor, that's the one that's down

30:21 to the left. And then to the right you have the panel editor, which is responsible for controlling how we visualize data. So you can set the title, you can add a description and stuff like that. That is more general for all panels. And then the next section you'll see is... yeah. I'm just clicking stuff. Sorry. Nice. Very nice. So the next section is the visualization section. And here you will actually get the chance to select a visualization that is more appropriate for the data you wanna visualize. So you've got a bunch of built-in

31:02 ones here, but again, you can extend this with more visualizations from plugins. But in this case, I think we can just go with the graph for now. Yeah, I think that makes sense. Yeah. So the next sections are more specific to this visualization type, which is the graph. So, yeah, you can set it to be bars, lines, points; you can set all kinds of stuff here. When it comes to, you know, time series, I usually prefer the lines. And this is just personal taste, but I actually like to remove

31:47 the area fill, so I only see the lines, but it's totally up to you. You can choose however you like. If you wanna get fancy, you can even set a gradient for the fill as well. So... Oh. Yeah, exactly. Killer feature, I know. Yeah, these things are important when you gotta look at them all day. Exactly. Also, staircase could be a good option for you as well, depending on what your data looks like. If you only have discrete points, you don't wanna show them interpolated. So staircase helps you to really show where

32:36 the points are. And what have we? Yeah. You can stack the time series as well. Of course, stacking has some problems in terms of understanding what is actually happening in the graph. But if your use case needs a stacked visualization, then there you go. Yeah. You can change the units of each axis. So you would have to know ahead of time what unit the data source is returning. Yeah. We're working on adding more, you know, hints from the data source to let, you know,

33:34 the visualization make a better guess at what units should be set. But for now, you just have to know what unit. If you're using Prometheus then you have a little idea, because usually you see bytes or you see seconds or whatever it could be. Okay. But yeah, so you have some options here. And for the graph, if you scroll down you have something called series overrides, which is really the next... if you really want to dig in there. I think you have to scroll up a little, no, a little more, there. There we go. Series overrides.

34:17 And this lets you set custom settings for each time series. So you write a regular expression for the time series you wanna adjust. So you can set one time series to be points or another to be bars and so on. So if you really wanna dig into it, that's where you go. A little background: series overrides is something that's been very powerful, but it's really been only available for a couple of visualizations, graph being one of them. So as of 7.0, this is now possible for any visualization. And so that's the tabs up there

35:06 that you see; that's kind of the next step. That's the evolution of series overrides. So there's a lot of work going into moving over to a new platform in Grafana. And you'd think the graph is the first thing that you would migrate to the new platform, but it's actually one of the trickiest parts of Grafana. So we're working on it right now. So it's not been completely migrated to the, you know, field overrides model yet. Okay. So I've got a suggestion then. Right now, complete serendipity, I guess, but we're actually graphing something

35:53 Configuring Alerts and Notification Channels (using Apt Upgrades)

36:00 that's very easy for us to change. I mean, CPU would have been the same. We could have run a process on it and tried to spike it or whatever. But the upgrades is actually probably the easiest thing, because I can just go and upgrade one of these boxes and really change this graph instantly. So what I've seen here, and this is where I'm gonna throw an idea at you and we'll see how painful this might be. But I see alerts, and I'm wondering, can we create an alert so that when the number of packages available for upgrade

36:15 Alerting on metrics

36:25 goes to zero, we do something, like... Yeah. We can try. ...do a thing. You can try. Alright. So if I say zero packages is the name of my alert. I'm just making this up. Feel free to correct me whenever I get this wrong. Evaluate every: can I do five seconds? Yeah. Should be able to. And we're looking at the last five minutes. I mean, I could just match that. Right? So it's... Yeah. So For is, if you put a really high number or duration on For, then we're not gonna see anything for

36:58 five minutes probably. So it makes sense to... I can also set the last value here as well. So I guess that means if the For was over a wider period, but we're only looking at the last value, which should hopefully be zero when I run this and... Yes. But... So evaluate every five seconds, that's just gonna evaluate that condition. But in order for that alert to trigger, it needs to evaluate to true for a full five minutes, or five seconds in this case, before you will actually get an alert. So that's why, in this case,

37:36 you want something low. That makes sense. I get that. I was completely wrong with my naive understanding there, but I've got it now. Okay. So with this, when last of query... I mean, can I drop this down? Let's say 10. We'll try and get this alert to go quickly and... Mhmm. See, when you just click stuff and it's just quite intuitive. It's... Mhmm. It's great. So I'm gonna say below one. No data, we can ignore that. So send alert to. Yeah. So you would have to set up a notification channel, so you would have somewhere to send

38:15 this. Ah, okay. So is that something I can just click to from here? Yes. So you would go to... No. You have another tab in the side menu. There you go. Notification channels. What are my options? Email? Oh, then I'll have to configure some sort of... Yeah. HTTP? Yes, you can. You can do a webhook. So it sends a POST to a URL. So one of my friends built this thing called rbox.app that gives you an endpoint. Oh, that's awesome. The service I've been using for doing that,

39:05 they have started charging, I believe. So I haven't really been using that lately. Okay. I'll look at this one. Rawkode. So I hit test. I refreshed Rawkode and I can see the request came through just fine. So we now have an alert endpoint. Right? Is that it? Yeah. So now we have somewhere to send the alert. So you should be able to... I wonder if you might have to update this. So if you go ahead and save the dashboard first. I don't think we did that. Let's call this David. Save. Yeah. Cool. So if you go back

39:44 now and assign the... Okay. The alert channel. So my alert is still here. The only thing I need to do now is, oh yeah, send it to rbox. Message details. No updates. Yeah. Let's see if emojis work. Test rule. Yeah. So that tells me that it's not gonna fire, which I guess is to be expected right now. Right? Firing false, state OK, condition false, false equals false. Wait. I wonder if... So if you check your rbox, see... Did you actually get a request? I think you can... Let's see. Yeah. And that is the first request.
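
Editor's sketch: the interplay between "Evaluate every" and "For" described earlier can be simulated in a few lines. The Python below is a simplified model, not Grafana's implementation: the rule is evaluated at a fixed interval, goes to "pending" when the condition first holds, and only becomes "alerting" once the condition has held continuously for the For duration. The function name, the state strings, and the sample values are illustrative assumptions.

```python
def alert_states(condition_results, eval_every, for_duration):
    """Return the rule state after each evaluation.

    condition_results: booleans, one per evaluation, oldest first.
    eval_every / for_duration: seconds.
    """
    states = []
    true_since = None
    for index, condition in enumerate(condition_results):
        now = index * eval_every
        if not condition:
            # Any false evaluation resets the clock.
            true_since = None
            states.append("ok")
            continue
        if true_since is None:
            true_since = now
        held_for = now - true_since
        states.append("alerting" if held_for >= for_duration else "pending")
    return states


if __name__ == "__main__":
    # Hypothetical last() values of pending upgrades; condition is "below 1".
    pending_upgrades = [12, 0, 0, 0]
    results = [value < 1 for value in pending_upgrades]
    print(alert_states(results, eval_every=5, for_duration=5))
```

With a For of five minutes instead of five seconds, the same input would stay "pending" for the whole run, which is why a low value was chosen for the demo.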

40:35 Yes. That's just the test request, I think. Isn't it? Yeah. Yeah. So we never got another. Yeah. So let's say I view the last five minutes, and we can turn on auto refresh every five seconds. Mhmm. So in theory, and I'm making stuff up, find my app. Oh, no, back. There we go. I can run upgrades. It's going pretty fast. This is refreshing. Let's just make this a bit bigger. I'm assuming within the next five, ten, fifteen-ish seconds, we should see one of these time series essentially fall off a cliff down to zero.

41:26 Right? Maybe? Maybe? Oh, it's still running. Oh, there's a kernel upgrade in there. So that may just take a few seconds. Oh, we have a couple more comments. Sheidan says, hey, folks. Hey, Sheidan. Thank you for joining us. I'm not sure at what point you got your monkey-eyes-closed and mind-blown, but that's cool. Thanks. Okay. So our apt upgrade is finished. Let's see if we get another refresh here. And not what I was expecting. So... Well, it could also be depending on the scrape interval of Prometheus. If it's set to fifteen or thirty

42:12 seconds, it could take up to that long for Prometheus, actually. Oh, yeah. That's a good point, actually. The scrape interval in Prometheus is set to ten seconds. Okay. So yeah, I guess we may just have to... There you go. I think, yeah, something happened. Oh, no. Okay. God. Diagonal bit. And then let's just do the blast-everything approach. Everything is. Okay. So we should have a zero. We should have a zero. Might not be what we thought it would be. Yeah. That's what I get for making it. I just assumed, which is obviously the first problem that I

42:58 made here, that this was tracking the number of dependencies that apt thought were available. Alright. Let's go to... okay. So we're still making progress here. We've got a dashboard now. Let's save this. Save. I can click on David here and there's my dashboard. So I can add another panel. Right? Mhmm. Here. Okay. Let's maybe do something a little bit more traditional. So something I guess we'd want to alert on would be the disk utilization. Let's go with that. Right? I can write a massive file to the disk? So in order to alert on that, what

43:11 Adding Disk Usage Panel and Alerting

43:33 we wanna do is pick a metric. So this is gonna be node... Yeah. I think you have, in filesystem... Was it available? Disk. So that's read bytes. Alright. Let's see what we have got available here. Oh. I really should have switched to Chrome. I've been persisting with this Vivaldi thing for a while. Oh, so that's the sky. Oh, yeah. We probably don't want that. I think you want this file... Yeah. That one. Oh, we can actually see. Was that my apt running? Because we would have modified the disk. Right? So the bytes that are available

44:21 would have dropped and then... Yeah. To remove the cache, and then it would have freed up a little bit. So okay. So cool. We can identify that. And we can see the three root file systems from each of my devices. So I'm just gonna hit apply. And can I put these side by side? Yeah. I think it's easier if you resize them first. Okay. Cool. There you go. So let's try and see if writing to our disk is gonna cause this. We've still got our five second refresh going on. We can still see this

45:04 that. I've dragged them side by side, but I probably actually wanna make that bigger for everybody. Right? Because the zoom isn't too great on this browser. So there we go. Alright. Let's write a big file. So input /dev/random, output my big file. Did you set an alert? Was that what you wanted to do? Or... I just wanted to confirm that this worked first. Oh, gotcha. Gotcha. So in theory, we should wait the ten seconds for the Prometheus node exporter to detect... Yeah. Make new values. Our five second refresh is running here. I'm assuming this is gonna be

45:44 the same. This blue line should be the same machine, the same server. Yep. And there we go. We can start to see that line just stepping down at the end there. Right? Yeah. I'm not making that up. Alright. So let's add an alert. So what have we got bytes-wise here? 219,000,000,000. Oh, there we go. It's dropping. But we've got enough disk space there that we can survive and weather a little bit of that dd running. So let's... oh. Edit. Alert. Create. So I guess I just do last again. Five seconds, five seconds. We set this to 10

46:32 and is below. Is there an easier way for me to have a value relative to where it is right now? Or am I just gonna have to start typing numbers? Check the condition. Instead of last, I'm thinking you might have a percent_diff, maybe. I haven't really used it, so I'm learning now as well. I think that's what you want. Like if it drops too fast, for example. Okay. So if it drops more than... well, it's not gonna drop 10% in ten

47:23 seconds. So let's just go with the big number approach. At least then I don't have to build up a mental model of what I'm actually expecting to happen. So 217,000,000,000. 2 1 6, 1 2 3. So that's 200,000, 2 million, 20 million, 200 million, 2 billion. Right? Mhmm. Yeah. You can also see where the alert is in the graph right now. Oh, nice. Yeah. You can even drag that with the handle if you wanna adjust it, down to the right on the actual pop up. Yeah. I think, yeah, you can probably drag the whole... Done.
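Counting digits by hand is error-prone. As a small sketch, an absolute "is below" threshold can be derived from the current reading, and the percentage-difference idea mentioned above reduces to simple arithmetic. The helper names and the 1% margin are illustrative assumptions, and `percent_diff` here is plain arithmetic rather than Grafana's own reducer implementation.

```python
def threshold_below(current: int, margin_percent: int) -> int:
    """Return a threshold that sits margin_percent below the current value."""
    return current * (100 - margin_percent) // 100


def percent_diff(first: float, last: float) -> float:
    """Percentage change from first to last (negative means a drop)."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return (last - first) / first * 100


current_bytes = 219_000_000_000  # roughly the reading seen in the panel
limit = threshold_below(current_bytes, margin_percent=1)
# Thousands separators make the magnitude easy to check before typing it in.
print(f"{limit:,}")
```

Underscores in the literal and the `:,` format specifier do the same job as counting "2 million, 20 million" aloud.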

48:06 Yeah. Alright. In fact, it's going pretty fast. Okay. Let's stop the thing. Right? Yeah. What you can also do is set a minimum value for the axis so you don't have to fight it. Because right now it's auto adjusting. So you can instead set a fixed min value for the axis to help you with that. Alright. Well, so we're not writing to the disk anymore. So let's just leave that there. That's just below where we want it to be. I've clicked rbox.
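To double-check the value the panel is plotting, the same series can be read straight from the Prometheus HTTP API with an instant query. This is a sketch under assumptions: the base URL `http://localhost:9090` and the expression `node_filesystem_avail_bytes{mountpoint="/"}` are guesses at the setup used in the session, and both helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def parse_instant_vector(body: str) -> dict[str, float]:
    """Map each series' instance label to its current sample value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return {
        # Each sample is [timestamp, "value as string"].
        sample["metric"].get("instance", "<no instance>"): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }
```

Fetching the URL with `urllib.request.urlopen(...)` and passing the decoded body to `parse_instant_vector` would give one number per host, which is handy when choosing an alert threshold.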

48:48 We'll say disk going to fall over. And apply. Done. And is this my alert line that's here? Yeah. That's the threshold of where it will start, you know, evaluating. So the alert might not trigger right away, depending on our settings, to avoid spikes. So if it spikes over, or under in this case, the threshold line, and then it's back to normal, then we might not send the alarm just yet. So we might want to have it triggering for a few minutes at least or
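The idea of requiring a breach to persist before notifying can be sketched as a small state machine. This illustrates the concept only and is not Grafana's evaluator; the state names, the `hold_for` parameter, and the sample values are assumptions.

```python
def evaluate(samples, threshold, hold_for):
    """Return one state per sample: 'ok', 'pending', or 'alerting'.

    A sample breaches when it is below the threshold. The alert only fires
    once hold_for consecutive samples have breached, so short spikes that
    recover are ignored.
    """
    breaches = 0
    states = []
    for value in samples:
        breaches = breaches + 1 if value < threshold else 0
        if breaches == 0:
            states.append("ok")
        elif breaches < hold_for:
            states.append("pending")
        else:
            states.append("alerting")
    return states


# A single dip recovers without alerting; a sustained drop eventually fires.
print(evaluate([220, 215, 220, 215, 214, 213], threshold=216, hold_for=3))
```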

49:00 Verifying Alert Trigger (Apt Upgrades)

49:38 So I've noticed something. There's a broken heart on this one. Yes. What does that mean? That means that one of your alerts is firing right now. So it finally noticed my updates. Is that right? Yeah. I think so. Right. Well, let's properly test. So let's run that one again. We'll cause this one to get me a broken heart. Can I zoom in? Oh, I can. Yeah. So if you hover those little triangles at the bottom, you can see... yeah. Oh yeah. Apt updates pending, zero. Okay. So it just took a little bit of time

50:12 whatever's happening with the Prometheus exporter. You know, the Prometheus exporter probably doesn't check for updates in the apt cache every ten seconds. Like, that would probably have a negative performance impact on my machine. It probably does that over a wider interval. Yeah. That's a plausible reason. I'll go with it. So in theory, if I refresh here. Yeah. We do. There's a little smiley. There he goes. It worked the first name. Says, that's so cool. Yeah. I'm glad that worked as well. I wasn't nervous. So awesome. So the apt update trick worked. We

50:53 can come back to the main dashboard. Is my dd running? It is. Whoa. It went the wrong way. I'm not sure. For a bit. I have no idea why. Oh. Because I just blew away this device by rewriting over the top. Oh, yeah. Yeah. Yeah. Which actually freed up disk space, which was clever. And now it's got to climb back down to where the thing is here. So let's just leave that for a few minutes. We have already seen it working earlier. That was really cool. I'm happy with that. Yeah. We've got about ten minutes left.
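The `dd` experiment amounts to writing random bytes to a file and watching free space fall. A rough standard-library equivalent is sketched below, assuming a hypothetical `write_random_file` helper. Opening an existing file for writing truncates it first, which is consistent with free space briefly going up before falling again when the command is re-run over the same file.

```python
import os
import shutil


def write_random_file(path: str, total_bytes: int, chunk_bytes: int = 1 << 20) -> int:
    """Write total_bytes of random data to path; return free bytes afterwards."""
    written = 0
    with open(path, "wb") as handle:  # "wb" truncates an existing file first
        while written < total_bytes:
            size = min(chunk_bytes, total_bytes - written)
            handle.write(os.urandom(size))
            written += size
    # Report free space on the filesystem that holds the file.
    return shutil.disk_usage(os.path.dirname(os.path.abspath(path))).free
```

Calling it with a small size first, for example `write_random_file("my-big-file", 10 * 1024 * 1024)`, keeps the experiment harmless; the file name here is only an example.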

51:30 Importing a community dashboard from

51:30 So shall we import a dashboard that maybe makes a bit more sense of our metrics than us? Yeah. We'll try it. So if you go to grafana.com/dashboards, you've got a ton of community built dashboards that you can use. I think there is one if you just search for node exporter. And if you sort by downloads, I think that you're gonna get one that is pretty... Yeah, the first one there. If you click that one. So if you look at the right, you see copy ID to clipboard.

52:16 So now we can go to Grafana again. If you just switch tab. Yep. And then go to, yes, that one, manage. And then you can click import to the right there. Yep. And then in the top field, yep. I mean, it's just so intuitive. Yes. And you need to set the data source you wanna use. But that should be it. Oh, there you go. There you go. Thank you, community. Yeah. There's no way I was building that, and that's comprehensive. Wow. Yeah. It's got a lot of... Would you mind just, like, there's

53:08 Explaining the different visualizations

53:12 more than just the graphs that we've looked at so far here. Can you just maybe pick out a few of these different visualizations and tell us what they are? Yeah. Sure. So on the top we see something called a gauge. So the gauge is typically like something you would see in a car, you know. It has a lower number and a higher number. So it lets you visualize something that has an upper limit and a lower limit, basically. So disk usage, for example. If you wanna know when you're reaching a certain

53:53 percentage of disk available or something like that, then you could use a gauge because you can set thresholds. Actually, if you go to one of the gauges and edit it, I think that we can adjust the thresholds. I believe if you go to field, the field tab to the right, yeah. And then you scroll down to the threshold section. Yeah. You can change that as well. So now the maximum, yeah, would be 10%. If I set it to, yeah, it's almost full. Yeah, exactly. Alright. I'll put it back to a hundred. Yeah. Alright. And if you scroll down to

54:44 the threshold section, you can set when it goes to orange, when it goes to red. So you can try it now if you wanna just update those to something, yeah, like that. Alright. So let's say red is seven, we should be orange. Alright. Nice. Okay. Cool. Yeah. So it's up to you. It's just a visual indication of when things are starting to go wrong, to catch them ahead of time. Oh, you can actually do percentages as well. Yep. So it can be a bit more dynamic then, depending on... that means, you know, if I'm

55:30 setting the max here, then the percentages do seem like a much better workflow here. Cool. Nice. Yeah. Let's see what else we have. And another one that's very heavily used is the stat panel, which I think you got to the... yeah. This one? This one. Yep. So if you edit... yeah. I think you can edit that one. I could click. There we go. Yeah. Compare it so hard for me. So this dashboard is using the old version of this panel. So that's why it's asking you to migrate to the newer version in this case. Can

55:37 Exploring Panel Types (Gauge, Stat) and React Migration

56:08 I just click that? Good question. Try it. Yo, there you go. Easy as pie. So what's different between the old... what's this called? A stat? A stat and the new stat? Yes. So the main difference is the underlying technology. Grafana used to be written in Angular and has now migrated to React. So this let us do a whole bunch of new and cool things. And that's one of the major reasons. Okay. I would say... If I know React, I can write my own visualizations, my own... Oh, for sure. Panels. Nice. For sure. So,

56:52 yeah, you can use... A panel visualization is basically just a React component with custom properties. So for sure you can write your own one. But yeah, so the stat panel is good if you have, like, these KPIs that you wanna report to upper management. How many concurrent users do we have at this moment? Like numbers that you wanna copy real quick. That's a great use of stat panels. But as you can see, graphs are really the... yeah. That's really the main one that you'll want to use. But it's just so interesting, you know, going

57:18 Reviewing the Imported Dashboard

57:41 through these and seeing how my machine is operating by looking at all these, you know, these peaks and troughs, and, you know, there's a lot of things that are pretty static. You know, the temperature obviously isn't gonna change too much. Systemd, I mean... For sure, it's a different kind of doom scrolling. Yeah. You can sit here. You can see that dd command kinda consistently across as we scroll down. At least the first spike. I'm not sure what's going on with the most recent one again.

58:09 It could just be my Prometheus setup, but this is all really cool. So Grafana, you know, based on what we've covered, I'll just pop back over here. You know, based on what we've covered now in, like, an hour, Grafana is just this multi data source visualization and dashboarding tool. But so much more beyond that. Like, you know, we never even touched on the transformations, but there's a whole lot of things that we can do in these panels with the data on the fly, and the alerting was really cool as well. Like, there's just so much power in this. Like

58:20 Closing

58:43 I think we're easily gonna have to do another six of these, and I'm just gonna sign you up right now. So... Yeah. That's okay. I'll be happy to come back. Alright. We can do a lot more. Do you have any closing thoughts before we finish up for today? No. Just that if you wanna get started with Grafana and you're having any issues, feel free to reach out to me on Twitter or any place where you see my name. And if I'm not able to answer myself, I can certainly

59:17 get you to the person who can. Alright. Well, thank you very much for joining me today. It's been an absolute pleasure. You know, I'm looking forward to doing more sessions on Grafana and starting to help people unleash a little bit more of the power that it brings to our infrastructure. So thanks again. If anyone has any questions, leave them in the comments. I'll make sure that we try and follow up afterwards as well. And stay tuned, we will be back with more Grafana in the future. Thanks again. Have a nice day, Marcus. You too. Bye. Bye.
