Part 4 - Tutorial 3: Collecting Metrics with Telegraf

Overview

About this video

What You'll Learn

Configure Telegraf with a remote TOML file hosted on a public endpoint.
Tune batch size and buffer limits to improve metric collection resilience.
Use collection jitter and Prometheus inputs to spread load and scrape metrics.

A walkthrough of Telegraf writing to InfluxDB 2: loading a remote TOML config, tuning agent options such as batch size, buffer limit, jitter and precision, fanning out to a second (file) output, and adding the Prometheus input plugin to scrape a local endpoint.

Transcript

Generated from the English captions.

0:06 Introduction to Collecting Metrics with Telegraf

0:06 Okay. Welcome to part four of the complete guide to InfluxDB 2. In this tutorial, we're going to look at collecting metrics with Telegraf. Telegraf is one of my favorite tools ever because it has such extensive support for collecting metrics from anywhere, and I mean almost anywhere. I've given many talks on Telegraf over the years, specifically on cloud native architectures and patterns for scaling your metric collection pipeline. I will be giving that lecture again in this course, with some new details and tips, so stay tuned for that. Today, we want to understand how to start collecting metrics with

0:55 Telegraf. Here is the InfluxDB instance we set up previously, and we are not collecting any metrics yet. We have the buckets we created in the last tutorial, but that's it. Now we can jump over to the Telegraf configuration page and delete the one we created while we were experimenting, because we'll do that again. We're going to create a System configuration, store it in our one-hour bucket, and click Continue. We'll name it "system monitoring", give it a good description, and click Create. Now, what we're going to do this time is

1:11 Setting Up and Running Telegraf via Remote Config

1:34 copy this to the clipboard and export it, then copy this command and run it. What we're doing here is exporting our InfluxDB token, which I now see is slightly covered by the header, but that's okay. We're using something in Telegraf that's relatively new: remote configuration. You can store your Telegraf TOML files as GitHub gists or on static site hosts like S3, GCS, or Netlify, and just tell Telegraf to consume them, provided they're on a public endpoint or on something you have secured locally. There are also options for providing authentication for these

2:15 endpoints, but we won't be covering that today. So we can hit Return, and Telegraf has started. What we should see, hopefully within about ten seconds, is that it begins to write metrics to our InfluxDB instance. One useful piece of output from Telegraf when you get started is the list of loaded inputs: you want to make sure the plugins you have enabled appear there. We're not using aggregators or processors, although we will have tutorials on them later, and we are writing our data to InfluxDB v2. And if we pop over here and click Listen for Data,

2:53 Telegraf Configuration

2:57 we'll see "Connection Found", so it looks like everything is working as expected. We can click on our one-hour bucket, and now we see our measurements showing up in InfluxDB. We'll change the window to five minutes, and we can see a nice steady line for usage_guest. If we take a look at the network measurement, it may be a bit more active, perhaps not, but that's okay. We can see those metrics coming in now, and of course my face is covering up the most useful bit. Now, configuring and working with Telegraf is really

3:33 Exploring Telegraf Agent Configuration Options

3:36 simple, straightforward, even trivial. You just have to configure it a couple of times to really get to grips with what you're doing. So we're going to make a few changes to our Telegraf config. We'll come over here and click on it, which lets us copy and paste the whole thing, like so. There are a fair number of comments, but you'll see that we have the agent configuration block, denoted by a header in single square brackets, [agent]. We can configure the interval at which we collect our metrics, and we can set round_interval.

4:16 This is great when you have Telegraf running across multiple nodes in your infrastructure and you want your metrics to line up in time. If your interval is set to ten seconds, as in the example here, every point written to InfluxDB will land at zero, ten, twenty, thirty, forty, or fifty seconds past the minute, and then back to zero. It means you don't have to do any juggling if the Telegraf agents are not all started at precisely the same time across your machines.
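A minimal Python sketch of the alignment described above: given an interval, it computes the next wall-clock boundary (for a ten-second interval, the next :00, :10, :20, ... second mark), so two agents started a few seconds apart make their first collection on the same tick. It illustrates the idea behind round_interval = true; it is not Telegraf's scheduling code, and next_aligned_tick is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def next_aligned_tick(now: datetime, interval: timedelta) -> datetime:
    """Hypothetical helper: the first time at or after `now` that falls on a
    whole multiple of `interval`, counted from the Unix epoch."""
    remainder = (now - EPOCH) % interval  # time elapsed since the last boundary
    if remainder == timedelta(0):
        return now  # already on a boundary
    return now + (interval - remainder)


interval = timedelta(seconds=10)
# Two agents started at different moments (3 s and 7.5 s past the minute)...
agent_a = datetime(2024, 1, 1, 12, 0, 3, tzinfo=timezone.utc)
agent_b = datetime(2024, 1, 1, 12, 0, 7, 500_000, tzinfo=timezone.utc)

# ...both make their first collection at 12:00:10 and stay in step after that.
print(next_aligned_tick(agent_a, interval).time())  # 12:00:10
print(next_aligned_tick(agent_b, interval).time())  # 12:00:10
```

Counting from the Unix epoch works for wall-clock alignment here because midnight UTC is itself a multiple of any interval that divides a day evenly.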

4:53 Now, two of my favorite settings in the Telegraf config are metric_batch_size and metric_buffer_limit. This is how you get resiliency when you are collecting metrics with Telegraf. With metric_batch_size, Telegraf sends up to that many metrics to the output in each write, so that you're not writing a single metric with every request. That increases the throughput and performance of your Telegraf agent. The metric buffer limit is there in case your output happens to be down or throwing errors. If your database is down and Telegraf cannot write to it, it's going

5:30 to hold up to 10,000 points in memory and drop the oldest if it comes to that. You can set this to whatever value you want, but you really need to understand how important your metrics are and how much memory is available for Telegraf to hold them if required. Next we have collection_jitter. This parameter lets each Telegraf input plugin stagger the requests it makes to the system for metrics. You don't normally want ten, twenty, thirty, a hundred, or a thousand Telegraf agents collecting

6:05 metrics from similar endpoints or systems and all hitting them at the same time, so we can jitter the collection to avoid that. The flush_interval setting controls how often we write to our output. We have a collection interval and then a flush interval, and we can jitter the flush as well. If you do happen to have a hundred Telegraf agents on a hundred machines, all writing into a single database, you don't want their writes to land at the exact same millisecond; that's potentially bad. So you can use a jitter

6:39 to tackle that too. Okay, precision. You can tell Telegraf what resolution to use for collection timestamps: nanoseconds, microseconds, milliseconds, or seconds. The debug flag is the first thing we're going to change today; we'll set it to true so that we can understand what is happening when we run Telegraf. Now, omit_hostname is an entirely optional flag, and I want to talk about it right now. If you're running Telegraf in an ephemeral environment, maybe containers, maybe Kubernetes, then storing the hostname of where Telegraf

7:28 runs may not be that important to your overall metrics strategy. However, in more traditional environments, where you look after your infrastructure and take care of each of those machines, knowing the hostname may be important. So if you're collecting operating-system-level metrics, yes, keep the hostname. If you're running an ephemeral environment with containers, it's probably best to just omit it: you don't really want the pod name or container name showing up as the hostname. It doesn't make a lot of sense there. Okay, now we're going to leave our output configured,

8:01 Configuring Multiple Outputs (e.g., File output)

8:05 but we're going to leverage something else that is very cool about Telegraf: you can write your metrics to multiple locations. As well as writing to InfluxDB 2, we want to see what these metrics look like on our own system. Maybe we're doing some sort of experimentation, testing, or debugging, so we can just tell Telegraf to write these metrics to standard out. That gives us live feedback from Telegraf as it collects these metrics, and we'll still write to InfluxDB 2. What I see a lot of people doing

8:38 is writing to InfluxDB 2 and InfluxDB 1, and maybe Thanos and maybe M3. You may just be experimenting with which backend store you need for your metrics, and that's okay. Telegraf can hook in there and work with almost anything. We're not going to go over the inputs; they're mostly self-explanatory. What I will show you is how to find the inputs that are available and how to configure them. The best thing you can do is go to the Telegraf repository, github.com/influxdata/telegraf. From there, you can click on plugins, and you will see inputs and outputs. Those are

9:01 Telegraf Plugins

9:14 the two you will use the most, with inputs probably being the one you're more curious about and want to understand. You can scroll up and down this list and see all of the systems Telegraf knows how to fetch metrics from. We'll go to Prometheus and click on it, because we want to collect some Prometheus metrics. You can see an example configuration showing how to enable it. Any optional flags are listed here too, and we can uncomment them or

9:49 copy and paste the snippet we need into our configuration. At the bottom of every plugin README there is also an example of the output; here we can see the metrics this plugin is going to emit. We'll add this one more input plugin to our configuration. I have Prometheus running on my local machine, so hopefully we should see some metrics come from there. Now we're going to run Telegraf as we did before, but instead of a remote configuration, we'll use the telegraf.conf that we have just edited and saved on

10:20 Running Telegraf with Local Configuration and Verifying New Metrics

10:30 our machine. We hit Return, and within a few seconds we should start to see the extra debug output, which we already have here, and the output from the file plugin, which spits a whole bunch of metrics to our terminal, like so. This lets us confirm that we're seeing what we expect. Now we can jump back over here, click Cancel, and go to Explore, where we now have more metrics and measurements available in our one-hour bucket. You can see the Prometheus-style ones all start with go_, because they come from the Go runtime instrumentation used to understand

11:05 that Prometheus is healthy. There are some Prometheus-specific ones below here. So, we have configured Telegraf. We understand the agent settings that control the interval, the flush interval, batching and buffering, and any jitter our environment may need. We know how to use the Telegraf GitHub page to find new plugins and understand their configuration and output. And we can run Telegraf with remote or local configurations to get metrics written to InfluxDB. So, I hope that helps. We'll be doing

11:13 Summary and Conclusion

11:44 much more with Telegraf throughout this course, and I look forward to sharing more of it with you. Have a great day.
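The batching, buffering and flush jitter recapped in the summary can be modelled with a small Python class. This is a simplified sketch under stated assumptions, not Telegraf's output code: OutputBuffer and next_flush_delay are hypothetical names, metrics are plain strings, and a failed write simply leaves everything buffered for the next flush. It shows the behaviours described in the video: a full buffer drops its oldest metrics, each write sends at most one batch, and jitter adds a random extra wait on top of the flush interval.

```python
import random
from collections import deque
from itertools import islice
from typing import Callable


class OutputBuffer:
    """Toy model of one output's in-memory buffer (not Telegraf's implementation).

    buffer_limit stands in for metric_buffer_limit: once the buffer is full,
    the oldest metric is discarded to make room for each new one.
    batch_size stands in for metric_batch_size: each write request carries
    at most that many metrics.
    """

    def __init__(self, batch_size: int, buffer_limit: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        # A deque with maxlen evicts from the left (oldest) when a new item is appended.
        self.buffer: deque[str] = deque(maxlen=buffer_limit)
        self.dropped = 0

    def add(self, metric: str) -> None:
        if len(self.buffer) == self.buffer.maxlen:
            self.dropped += 1  # the append below evicts the oldest metric
        self.buffer.append(metric)

    def flush(self, write: Callable[[list[str]], bool]) -> int:
        """Send batches until the buffer is empty or a write fails; return the number written."""
        written = 0
        while self.buffer:
            batch = list(islice(self.buffer, self.batch_size))
            if not write(batch):
                break  # output unavailable: keep everything buffered for the next flush
            for _ in batch:
                self.buffer.popleft()
            written += len(batch)
        return written


def next_flush_delay(flush_interval: float, flush_jitter: float, rng: random.Random) -> float:
    """Seconds until the next flush: flush_interval plus a random extra in [0, flush_jitter)."""
    return flush_interval + rng.random() * flush_jitter


# Usage: a buffer of 5 with batches of 3, fed 7 metrics.
buf = OutputBuffer(batch_size=3, buffer_limit=5)
for i in range(7):
    buf.add(f"cpu value={i}")
print(buf.dropped, buf.buffer[0])  # 2 cpu value=2

# The database is down: nothing is written and nothing leaves the buffer.
print(buf.flush(lambda batch: False))  # 0

# The database is back: the 5 buffered metrics go out as batches of 3 and 2.
batch_sizes: list[int] = []


def healthy_write(batch: list[str]) -> bool:
    batch_sizes.append(len(batch))
    return True


print(buf.flush(healthy_write), batch_sizes)  # 5 [3, 2]

# With flush_interval = 10 s and flush_jitter = 5 s, flushes land 10 to 15 s apart.
print(next_flush_delay(10.0, 5.0, random.Random(1)))
```

A deque with maxlen is a convenient fit for the "drop the oldest" behaviour because it evicts from the opposite end automatically; everything else Telegraf does around writes is left out of this sketch.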

Documentation

Telegraf Plugins Reference

Code

influxdata/telegraf GitHub Repository

More about InfluxDB

InfluxDB 3 & Rust

Dynamic Scrape Targets

Part 2 - Tutorial 1: Installation

More about Telegraf

Part 3 - Tutorial 2: Getting Started with InfluxDB

Part 5 - Workshop 1: Introduction to Flux

Part 1 - Lecture 1: Introduction to Time Series