Observability for Developers: What You Need to Know


Overview

About this video

What You'll Learn

Understand why developers own instrumentation, and why observability teams should set standards and guidance rather than instrument other teams' code.
Compare OpenTelemetry auto and manual instrumentation, including when zero-touch coverage adds too much noise or technical debt.
Use sampling, feature flags, and collector routing to balance telemetry coverage with dollar cost, storage, and environmental impact.

Adriana Villela explains why developers, not observability teams, should instrument their own code. We cover OpenTelemetry's auto vs manual instrumentation, sampling and cost trade-offs, single-pane-of-glass backends, and the environmental impact of telemetry.

Transcript

Generated from the English captions.

0:00 On today's episode, we're talking about observability, which is basically how you figure out why your microservices are crying at 3 AM. And we got to chat with Adriana, who's a principal developer advocate at Dynatrace, an OpenTelemetry maintainer, and apparently enjoys climbing walls when she's not instrumenting code. She's also a CNCF ambassador who somehow finds time to podcast, blog, and contribute to open source while maintaining a full-time job. I'm exhausted just thinking about it. We dive deep into OpenTelemetry and discuss why your observability teams shouldn't be instrumenting your code. Spoiler: developers, it's your job.

0:42 And we explore the eternal struggle between getting good observability data and not bankrupting your company. And, of course, David asked approximately 47 impossible questions about instrumentation strategies and cost optimization. Classic David. Hey, at least I didn't ask about Rust this time. True. Though we did talk about the environmental impact of observability, which was genuinely eye-opening. We also learned that Sweden has the greenest data centers in the world and that swivel chair observability is not the goal. Who knew? Enjoy the episode. Thank you so much for joining us today on Cloud Native Compass, Adriana. For anyone who's

1:23 not familiar with your work or with you, could you please take a moment to tell us what you're up to and what you enjoy? Yeah, sure. So my name is Adriana Villela. I am a principal developer advocate at Dynatrace. I also have a podcast called Geeking Out. I blog a fair bit on Medium, and I work in the observability space, and have for, I guess, the last couple of jobs. I'm heavily involved in OpenTelemetry. I'm one of the maintainers of the OTel End User SIG. And by night, I like to climb walls. I'm into bouldering. I've sustained

2:06 a few injuries over the years as a result, including, I guess, my most recent severe injury, an ankle sprain back in October 2024. So when I was at KubeCon in Salt Lake City, I was semi-limping around, because I think I was, like, two or three weeks fresh off the injury. Yeah. Yeah. I definitely practice something. You're sport. Very cool. A great hobby to be doing. I know there's quite an active bouldering and climbing community inside the cloud native community as well. I know that Dan Fennerin is always talking about trying to get people

2:41 up to her... Mhmm. Or pails or rock walls or whatever whenever we go to KubeCon. But I prefer the quiet, not-breaking-my-ankle life, so I'll keep it this way. Anyway. I don't blame you. Cool. Yeah. I'm your stereotypical geek. I prefer not to touch grass, but that's me. So, anyway, let's get back on topic. There's no point in digressing right off the bat, right? So you're a wonderful person. We've met before, and I did some research before we decided to record this as well. And I ended up

3:14 on your bio page on GitHub, and the amount of talks and panels and blogging and podcasting that you do is amazing. I can't believe the output and how much you're doing. First of all, thank you for everything that you're doing for the observability, OpenTelemetry, and declarative space. But I'd love to know: how do you manage to find the time to juggle this with a full-time job and be so present in the cloud native community? I think, to be honest, I'm lucky that my job allows me to build that into

3:47 what I do, because otherwise I think I'd need, like, clones of myself running around, because it is a lot. And even with that, my days are definitely full, but I think it's really important. You know, I'm a CNCF ambassador, and for me, it was really important when I was looking for my next role to be able to build that into what I do, to build the contributions to OpenTelemetry into the role. So I was very thoughtful in my job selection process. This is my second developer advocacy role. So before

4:30 that, I mean, I was a Java software developer for, like, sixteen years. I've led teams off and on. You know, I was, like, manager, not manager. Decided I don't wanna be a manager. I'm much happier as an IC. But when I was looking for my next developer advocacy role, a lot of very nice, well-meaning friends kept throwing whatever DevRel role they saw at me, and it's like, database DevRel and networking-related DevRel. And it's like, cool, but that's not really my passion. So I wanted to make sure that I got

5:04 to continue doing the same things that I was super passionate about and interested in. And, I mean, I've been in tech for twenty-four-plus years, but all the stuff that you listed, I've only done in the last few years, since I became a developer advocate. And it feels like this awakening of my career. I didn't even know tech could be cool like that until I got into cloud native and developer advocacy. So, yeah, I definitely credit my employer for giving

5:38 for allowing me to build that stuff into my job, because otherwise... Yeah. That would be a lot to... I think there's this common misconception about developer relations and advocacy that we just go around conferences on our private jets, jump on a stage, have a drink, and it's, like, so easy. But it is a hard life trying to be that present with so many people and so many communities. My god. I mean, it's hard to say this, right? But traveling is hard, full stop. Doing it with intent

6:07 to educate... It is. ...to inspire and help people and be engaging, it's a tough gig. So, yeah, I don't envy that anymore. I've not been in DevRel for a while. It was tough. Especially since I've got two young kids, so traveling a lot was very hard as well. Anyway. Yeah. Yeah. Oh, yeah. With that out of the way, it's a month now since KubeCon in London, and I'm just gonna bring up a couple of the conversations that I had there with people, because I think it leads us nicely into what we wanna talk about,

6:37 which is observability. Now, every year we go to KubeCon. They do the keynotes. They ask how many people are brand new to Kubernetes and cloud native, and it's an overwhelming show of hands year on year. This year was no exception, with 12 and a half dozen people. People are new to microservices, cloud native, Kubernetes, containers, all of this. And they get promised this ideology that if they work in microservices, their jobs and their lives will be easier because they're slinging less code with fewer responsibilities and all of this. But there's the downside of microservices and cloud

7:12 native, which is that you now have 12,000 applications to keep running, an arbitrary number I'm throwing out there, and you need to know when they're broken. And I think the classic example that I love to talk about with people is: if you have a monolith, you stick a health endpoint on it, and if it returns a 200, you can kind of assume that the application is running okay. When you have many microservices, understanding the health of the whole application is almost impossible to a certain degree, and then there's the question of healthy for what actions and what systems,

7:42 what personas, etcetera. And this is where I think people need to start diving into observability and understanding how to bring a little bit of sanity to the chaos that is a distributed system. And there's obviously observability. There's this recent conversation about Observability 2.0. There's OpenTelemetry. There are a lot of moving parts in this space, and you're the expert. So for the people who are listening to this and are on that journey to cloud native, can you help them? How do they get more sleep while shipping cloud native microservice applications?

8:21 Sorry, that's a pretty wide, broad question that you're looking at. I mean, I think you hit it spot on. With microservices architectures, you've got so many moving parts interacting with each other, and it's sometimes unpredictable. What might seem okay on the surface might have some weird-ass crap going on deep down below that you might not be aware of. And observability is the way to help surface that, right? Especially with distributed traces. I know observability itself has been a bit of a journey.

9:03 Right? Because I think a lot of people started with traditional monitoring, which is very much a reactive thing, and it's very metrics-based and logs-based and all that. And that's great. We gotta start somewhere. But then observability kind of opened the doors and basically said, hey, you know what? Let's look at everything in terms of distributed traces. I see distributed traces as the star of observability, because they give us that end-to-end view of what's going on, from start to finish. You know, I love

9:40 using the example of, like, you're shopping for shoes. You click on the button that says add to cart, and you have a distributed trace that captures exactly what happens from the minute you click on that add to cart to when the item is added to your cart. And the trace shows you exactly what is going on each step of the way, right down to those database calls. And then you've got your supporting actors. You've got your metrics, and you've got your logs. Your logs tell us, hey, these are

10:14 the things that happened along the way, along that journey. They're almost like little bookmarks, point-in-time captures of what's going on. And then you've got your metrics, which tell us things like how long did it take to get from this service call to this service call? How long did we spend waiting on that database to return? We can even take it more broadly and look at, like, how many shoes did we sell in the month of November compared to December? And is it because we saw some performance blips, because maybe November

10:55 was busier than December, or December was busier than November? So having that overall view of what is going on, I think, makes life so much easier. And what I will say about observability, and I've written about this recently, is that I think sometimes observability is treated as an adjacent thing. It's either an adjacent-to-SRE thing, or it's like, oh, it's an SRE concern. And it's like, yeah, it will definitely help your SREs understand your system, but everyone has a part to play, right? I mean, someone's instrumenting your code. It's

11:34 gotta be the developers. You're not gonna ask your SRE team to do it, or, my favorite pet peeve, have the observability team instrument your code and create dashboards for you. It's like, bro, I don't know what you need. I don't have that context. So how can you ask an observability team to do that? Developers need to instrument their own code. Instrumenting their code enables them to troubleshoot their own code, first of all. So, two for the price of one. Secondly, when you hand off that code for testing,

12:11 your testers can go in and say, oh, I found a bug. Because the developers have instrumented the code, they can go in and say, hey, I found a bug, and I know where it is. Or they can say, I found a bug and I don't know where it is; developers, you need to go back and instrument the code, because I don't have visibility into what's going on. And then by the time it gets to the SREs, hopefully, if there's an incident or whatever, they have the information required to troubleshoot.

12:43 It's not gonna be perfect. Observability is an iterative process, but everyone has a part to play, right? And I think that's the thing that people really need to keep in mind. I also take issue with so-called observability teams, because oftentimes companies will spin up observability teams as the catch-all: you will take care of all things observability, including instrumenting code and creating dashboards. And, you know, I was running an observability practices team at Tucows, and I had to push back a lot because our team

13:16 kept being asked to instrument code and create dashboards. I'm like, this is not what we do. We are the experts on observability. We'll tell you what practices you need to follow to instrument your code. We can give you guidance. We'll come up with the company-wide standards, because I think you need to have some sort of standard oversight around it. But this is not a we-do-your-work-for-you thing. You have to do your own work. Like, a developer writes log statements. So what's the difference between that and adding some traces

13:50 to your code? You just have to wrap your mind around it, right? Because with everything new, there is a tendency toward resistance. There's the, oh my god, there's the learning curve, I have to learn yet another thing. It's not for a living. But then I look at it and, like, if your house is on fire, are you going to keep building the living room? No? I sure hope not. Take the time to learn this stuff. Learn the observability stuff so you can save yourself. Save your house. Put out

14:25 the... When you talk about observability teams, it takes me back. I mean, you've been in this industry, like you said, for twenty-odd years. So when DevOps was gaining ground, every company in Scotland was hiring DevOps people and DevOps teams, and you're like, hold on. You've kinda missed the important message of what you're supposed to be doing here. You can't just stick it on a job title and say, problem solved. That's not what we're trying to do. Yeah. So there's... Right. Like I said, it's an iterative process. So with that in

14:51 mind, and there are so many avenues we could go down here, right? But let's start from the idea that people are trying to do the migration to cloud native. I think it's very rare that people start with a microservice architecture or a cloud native architecture. They have something that's old, and they wanna modernize it. What does it mean, then, to instrument your code? When we talk about adding traces and metrics, and hopefully logging already exists for these companies and, hopefully, it's centralized (this isn't 1994 anymore), what is the starting point? And I

15:22 know there are ways to do this with auto instrumentation and manual instrumentation. So maybe we can touch on what the approach is today, in 2025. Yeah, that's a great point. And as you pointed out, there are two ways using OpenTelemetry, which is the CNCF standard. It's the CNCF tool that has become, I would say, the de facto standard for instrumenting applications. It has the backing of most of the major observability vendors. It's got the second-highest number of contributions, behind Kubernetes. So it's massive. It's

16:00 incredible, and it's a very thoughtful community where there isn't this attitude of one vendor reigning supreme over all of them. By design, vendor neutrality is the message. I work with competitors all the time. I don't see them as competitors. They're friends, and we're all working towards a common goal. So with instrumenting with OpenTelemetry, as you said, there are two forms of instrumentation, right? There's manual instrumentation, and then there's auto, or zero-code, instrumentation. And as the name implies, zero-code instrumentation really means you are not touching

16:36 the code to instrument it. There's usually some sort of wrapper around your code. So, for Python or Java, which I've worked with, there is a Python or Java wrapper around your code, which then injects the auto instrumentation for your code. And especially if you're using libraries that have been auto-instrumented, like Python Flask, for example, you don't have to go in and instrument your requests, because that's already taken care of for you. So you've applied this wrapper, and then magic. You send your data to your OpenTelemetry Collector, which is a

17:12 vendor-neutral agent that is basically an ETL tool. It ingests OTel data, transforms it, and then spits it out somewhere, which can be an observability vendor. And it magically appears. Then you see your traces there. And zero-code instrumentation is a great starting point, I think, because it is low effort. But you can get into a little bit of trouble with zero-code instrumentation, in the sense that you can end up with more data than you bargained for. More data than is relevant to you.
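To make the "wrapper" idea concrete, here is a toy sketch in plain Python (standard library only, not the OpenTelemetry API) of what zero-code instrumentation does conceptually: it wraps the public functions of a module without editing that module's source, which is also why it can emit more spans than you care about. The `auto_instrument` helper, the `SPANS` list, and the `shop` module are hypothetical illustrations. In real OpenTelemetry Python, this job is done by the `opentelemetry-instrument` launcher plus per-library instrumentation packages, and specific instrumentations can be switched off with the `OTEL_PYTHON_DISABLED_INSTRUMENTATIONS` environment variable.

```python
import functools
import time
import types

# Finished "spans" as (name, duration_seconds); a real SDK would export these instead.
SPANS: list[tuple[str, float]] = []

# Stand-in application code that we never edit by hand (hypothetical example).
SHOP_SOURCE = """
def normalize_sku(sku):
    return sku.strip().upper()

def add_to_cart(cart, sku):
    cart.append(normalize_sku(sku))
    return cart
"""


def make_shop_module() -> types.ModuleType:
    """Build a fresh 'shop' module from SHOP_SOURCE."""
    module = types.ModuleType("shop")
    exec(SHOP_SOURCE, module.__dict__)
    return module


def _wrap(span_name, fn):
    """Return a wrapper that records a span around every call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            SPANS.append((span_name, time.perf_counter() - start))
    return wrapper


def auto_instrument(module: types.ModuleType, disabled: frozenset = frozenset()) -> None:
    """Rebind every public function in `module` to a span-recording wrapper.

    The module's source is untouched; names listed in `disabled` are skipped,
    which mimics switching off a noisy instrumentation.
    """
    for name, value in list(vars(module).items()):
        if isinstance(value, types.FunctionType) and not name.startswith("_") and name not in disabled:
            setattr(module, name, _wrap(f"{module.__name__}.{name}", value))


if __name__ == "__main__":
    shop = make_shop_module()
    auto_instrument(shop)
    shop.add_to_cart([], " sneaker-42 ")
    # One user-facing call produced two spans, including the helper's.
    print([name for name, _ in SPANS])  # ['shop.normalize_sku', 'shop.add_to_cart']
```

A single call to `add_to_cart` yields two spans because the helper it calls was wrapped too. Passing `disabled=frozenset({"normalize_sku"})` to a fresh module leaves only the `add_to_cart` span, which is the "turn off the noise" step in miniature.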

17:47 Fortunately, with zero-code instrumentation, there is a way to turn off certain libraries, for example: hey, I don't want to instrument everything in this library. So you turn off the noise. And then beyond that, I think you need to go in and start manually instrumenting the code, because zero-code instrumentation makes the decision for you, right? It deems something important for instrumentation, and that is what's gonna happen. Manual instrumentation requires you to be a lot more thoughtful. Because I think it's gonna be one of those things where it's kind of like creating

18:20 an SLO, right? You take a stab and you keep refining it. So you're gonna start adding traces in the same way that you add a log to your code. You select a chunk of code that your trace is going to apply to. And you can add attributes. You can add what are called span events, which are like a log embedded in your span. And so that's how you go about it. And it really is a process. I would say that as developers are going along,

18:52 they can use the process of debugging their own code to understand which pieces would be important to instrument. And I would say the same thing for metrics. For metrics, a lot of the time, the things that are really important to us are things like how long the span is, and a span basically represents an operation or a process. So you wanna capture that kind of information, which can be derived from a span, by the way. And then you can create metrics for other things that you may deem important.
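The manual workflow described here (pick a chunk of code, wrap it in a span, attach attributes and span events, then derive a duration metric from the spans) can be sketched with a small standard-library model. This is not the OpenTelemetry API: `Span`, `Tracer`, `duration_metric`, and attribute names such as `cart.item_count` are hypothetical stand-ins. In the real Python SDK, the roughly equivalent calls are `tracer.start_as_current_span(...)`, `span.set_attribute(...)`, and `span.add_event(...)`.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)
    # (name, attributes) pairs, like logs embedded in the span; real span events also carry a timestamp.
    events: list = field(default_factory=list)
    duration_s: float = 0.0

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def add_event(self, name, **attributes):
        self.events.append((name, attributes))


class Tracer:
    def __init__(self):
        self.finished: list[Span] = []

    @contextmanager
    def span(self, name):
        """Time the enclosed block and record it as a finished span."""
        current = Span(name)
        start = time.perf_counter()
        try:
            yield current
        finally:
            current.duration_s = time.perf_counter() - start
            self.finished.append(current)

    def duration_metric(self, name):
        """Derive a simple metric (call count and mean duration) from finished spans."""
        durations = [s.duration_s for s in self.finished if s.name == name]
        return {"count": len(durations), "mean_s": mean(durations) if durations else 0.0}


tracer = Tracer()


def add_to_cart(cart, sku, quantity):
    # The developer chooses which chunk of code the span covers.
    with tracer.span("add_to_cart") as span:
        span.set_attribute("cart.sku", sku)
        span.set_attribute("cart.quantity", quantity)
        if quantity <= 0:
            span.add_event("rejected", reason="non-positive quantity")
            return cart
        cart.extend([sku] * quantity)
        span.add_event("items_added", count=quantity)
        span.set_attribute("cart.item_count", len(cart))
        return cart
```

Calling `add_to_cart([], "SNEAKER-42", 2)` and then `add_to_cart(cart, "SNEAKER-42", 0)` leaves two finished spans, one with an `items_added` event and one with a `rejected` event, and `tracer.duration_metric("add_to_cart")` reports a count of 2 along with their mean duration.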

19:24 You've got your typical ones, like your CPU and your RAM usage, and then other things like I mentioned in our shoe example, like how many shoes are we processing per month? So it's gonna be that kind of process. You don't have to boil the ocean. When instrumenting code, especially existing code, by the very act of instrumenting it, you're basically introducing technical debt into it. It's just the way it is, right? You're adding new code. It is code,

19:57 so you're probably opening yourself up to bugs as a result, because you can goof up when you're instrumenting. And so you just have to be mentally prepared for what to expect when you're instrumenting, so that you're not like, oh my god, I thought this was gonna be so much easier, and all of a sudden it's 10 times harder than you expected. Manage your expectations. I think that's super important. So there's that: manage those expectations. But then also, one sort of quick win is to

20:30 instrument your homegrown libraries and frameworks, because chances are your code touches a lot of that. So now you're instrumenting a bunch of stuff automagically just by doing that. And then the other one is any new code that you're writing: instrument it as you go along. In the same way that you would add log statements as you're going along, just get into that habit. Hopefully many people have gotten into the habit of test-driven development. Think of it the same way. It's called observability-driven

21:02 development. You're instrumenting as you go along. So if you forge that habit right away, then at least the new code will be taken care of. I'm not gonna lie, it's not an easy process. It's not like waving a magic wand. I guess auto instrumentation is more like waving a magic wand, but it'll only take you so far, and that's why you do eventually have to go into manual instrumentation. So, first of all, let's clarify a couple of things. OpenTelemetry, as you said, is now almost ubiquitous within this space. I think this is the standard

21:36 that everyone should be building on. It supports Go, TypeScript, Rust, Python. I think whatever language people listening to this are writing in is probably covered, and I would assume with at least pretty good support. Now, there's manual instrumentation; let's focus on that. Auto instrumentation is there for people who want to experiment, but, as you said, the value comes from you deciding... Yeah. ...where to create spans and events and all of this stuff. And we can get into the cost of observability as we get into this

22:04 conversation, but the challenge is that we're building distributed systems. We've got network traffic. We've got other services. These are things we have to correlate as a request comes in through the front door and all the way through the system, and this is where OpenTelemetry and traces are so important. But we're seeing now that people can use this information for spans within a single service. So these are function calls, and probably just function calls, I guess. But even business events: you talked about shoes, and I don't know if that's a real example that you've seen

22:38 in the world before, but, yeah, why not, if you have an ecommerce store, emit event information on business domains? Because there's analytics within all this too that can be propagated up to some other system. So this is a challenge, and I don't wanna ask an impossible question, but when is it too much? How do people work out what is important to instrument and what isn't? Is there a heuristic that you would say helps people be successful when they're taking a new function and adding a span to it? Is

23:10 that always the right approach? How do I turn that off? And what is the cost of infinitely adding these to every single function within my application, across hundreds of different services? And, again, sorry, that's a really tough question as well. Yeah. I mean, that's a lot to unpack, but it's a really good point. The answer is always "it depends," right? But I think the best way to answer the question is to look back at the definition of observability, and I've been quoting this definition a

23:39 lot. It's from Hazel Weakly, and I think it's a great definition because it encompasses so much: observability allows us to ask meaningful questions, get useful answers, and act effectively on that information. If you find yourself at a point where you can't ask the questions, you can't get the answers, or you can't act effectively on the information, any of these or any combination of these, then you need to go back to the drawing board and revisit your instrumentation. Now, you mentioned the cost aspect, and that's tricky, because the temptation is to add distributed tracing

24:23 to everything, spans to everything, and that can get very costly, not just from a dollar-sign standpoint but from an environmental standpoint as well. All of these things use up energy, and data centers suck up a lot of energy. They add a fair bit to the carbon footprint, and it's only going up, especially when we consider things like AI. It's a double-edged sword, right? Lots of compute. And it's not just your application generating the telemetry. It's also the ingest of the telemetry, whether you're sending it

25:06 to a homegrown observability tool, like running a self-hosted observability setup in your own data centers, or whether it's a SaaS tool. And as a little sidebar, I did a talk at... on examining whether we can tune the OTel Collector so that it consumes less energy. And as part of the talk, one of the things that my talk partner, Nancy Chauhan, and I researched was that, depending on where your data center

25:45 is hosted, it can use less energy. And fun fact: the data centers in Sweden are, like, the greenest data centers in the world. So, yeah, we've got our dollar-sign costs, and we've got our environmental costs. For the dollar-sign costs, I think it's a matter of being very mindful and effective with your observability data. What are you sending over? And I think one of the ways to do that is to limit what you send through sampling. Another interesting thing that I've seen suggested is to use feature flags.
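Both cost levers mentioned here can be sketched together: head sampling that keeps a fixed fraction of traces, and a feature flag that temporarily turns on full-fidelity telemetry while you investigate. The keep/drop decision is derived from the trace ID, so every service that sees the same trace makes the same decision; this is similar in spirit to OpenTelemetry's `TraceIdRatioBased` sampler, but the code below is a standalone model, not the SDK. The `FLAGS` dictionary and the `verbose_telemetry` flag name are hypothetical stand-ins for whatever feature-flag system you use.

```python
import random

# Sampling looks at the low 64 bits of a 128-bit trace ID.
TRACE_ID_LIMIT = (1 << 64) - 1

# Hypothetical feature-flag store; in practice this would come from your flag service.
FLAGS = {"verbose_telemetry": False}


def ratio_bound(rate: float) -> int:
    """Convert a sampling rate in [0, 1] into a threshold on the low 64 bits."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return round(rate * (TRACE_ID_LIMIT + 1))


def should_sample(trace_id: int, rate: float) -> bool:
    """Keep every trace while the flag is on; otherwise keep about `rate` of them.

    The decision depends only on the trace ID, so every service that sees the
    same trace reaches the same keep/drop decision.
    """
    if FLAGS["verbose_telemetry"]:
        return True
    return (trace_id & TRACE_ID_LIMIT) < ratio_bound(rate)


if __name__ == "__main__":
    rng = random.Random(42)
    trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
    kept = sum(should_sample(t, 0.1) for t in trace_ids)
    print(f"normal operation: kept {kept} of {len(trace_ids)} traces")  # close to 10%
    FLAGS["verbose_telemetry"] = True   # something is going wrong: capture everything
    print(sum(should_sample(t, 0.1) for t in trace_ids))  # 100000
    FLAGS["verbose_telemetry"] = False  # problem found: back to sampling
```

With the real SDKs, a ratio sampler is commonly configured through the `OTEL_TRACES_SAMPLER=traceidratio` and `OTEL_TRACES_SAMPLER_ARG` environment variables rather than in code; how a feature flag would toggle that at runtime depends on your flag system.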

26:24 So, for example, maybe you wanna instrument all the things, but you have feature flags that turn off that instrumentation until things start going wrong. Then you switch on those feature flags so you have that extra bit of visibility and can see all of the things that are going wrong. And once you've figured out what the problem is, you can shut off those feature flags and limit what's being emitted to your observability back end for analysis. So those two things can definitely help. I

26:56 guess it's easy for us to overlook the environmental impacts. I'm glad you brought that up. It is really important. And you said AI, but, yeah, that is literally consuming the environment right now. So we need to be a bit more careful with these decisions. There's also the dollar cost you talked about, and the one you mentioned earlier, the technical debt. If companies and teams are building libraries with manual instrumentation that effectively becomes auto instrumentation for the downstream consumers,

27:24 they're automatically getting all of these spans and events that they don't really know exist unless they dig into it. So, yeah, there are so many considerations. It's a hard problem for people to get right. And the thing I like there, the quote you mentioned from Hazel, is that if you're not asking these questions, you probably shouldn't be instrumenting it, though then you're on a path of resiliency engineering. I don't know. When things are broken and you ask a question, you work backwards, you instrument it, and you move forward. Like,

27:52 I don't know. There must be some sort of golden path here, but that's not something we're gonna solve on a forty-minute podcast. So we'll move on. Now, this is great. I think people have a good idea of everything they're doing now, right? Traces, spans, events, OpenTelemetry. They're happy. They're like, yes, I'm gonna do this. This is perfect. It supports my language. I've got a Collector, and I'm laughing all the way to sleep now. But, unfortunately, there are still a lot of things to consider. Now, there are so many databases where you can put this telemetry data, right?

28:26 I think there's the Tempo project from Grafana. There are all the SaaS companies like Dynatrace and Datadog and New Relic. There's Jaeger. I don't even know what database they use. They have their own database, but there's Jaeger for the tracing stuff as well. When it comes to the technology stack beyond the OpenTelemetry specification, how do people decide where to put this data? How do I visualize it? And is there a right or wrong answer to any of that? That's a great question. And I think it boils down to personal choice, right?

29:02 I think because OpenTelemetry is a standard, all of the vendors that support OpenTelemetry are ingesting the same data. So now what differentiates one vendor from another is what they do with your data that makes it useful to you, that allows you to ask those questions, get those answers, and act effectively on the information. And so it becomes a matter of personal choice at that point, because there might be a feature from a particular vendor where you're like, oh my god, this thing is blowing my mind, and I cannot

29:36 live without it. In which case, it's kind of a no-brainer. But you also have to balance that against cost, because some vendors can be more expensive than others. If you don't do your sampling properly and just instrument all the things, that can add up a fair bit. So these are the types of things to consider when choosing a vendor. The other thing that you mentioned, that I've seen even in use cases that I've read

30:10 or, you know, I review CFPs for KubeCon, I've done it a number of times, and it's interesting to see the number of proposals that come in that say, we use this tool for logs and this tool for metrics and this tool for traces. And my thought is, I feel like you're not getting the most out of your observability story by doing so, because you don't have a single data store, and not only a single data store, but a single place where you can correlate

30:51 all the data. So I think a lot of organizations end up losing out as a result, either because they're using a tool that doesn't support all three main telemetry signals, i.e., traces, logs, and metrics, or because they've decided, due to some legacy whatever, oh, you know, we send our logs to Elastic, our metrics go to Prometheus, and our traces go to Jaeger. And it's like, okay. And so now, and I'm gonna borrow a term from my husband that describes it perfectly, you're doing swivel chair

31:32 observability, where you're swinging back and forth from one system to another trying to correlate this stuff, when you should be doing put-your-feet-up-on-your-desk observability, where you can see everything in the same interface, everything nicely correlated, you know, single pane of glass observability. And that's, I think, what we should be aspiring to, and I think the vendors that provide that single pane of glass observability and the organizations that embrace that. Because some organizations are using vendors that support all three signals and provide

32:14 the single pane of glass observability but are still, you know, forking different signals to different back ends. So I think those are the ones that are going to get the most out of their observability. I'm gonna ask this question in two ways. Let's start with the first one. We understand, I think it's very obvious now, right? There is a cost. You can't just write every single span and event to a SaaS provider because the cost will be, I mean, it can be pretty large. Have you seen in practice people taking a tiered approach to this where,

32:51 say, maybe they have, like, a Grafana, what do they call it these days? Loki, Grafana, LGTP? I don't know. [unclear] Yeah. Yeah. Oh, I think it's LGTM. So do people do that for, like, high-frequency, high-fidelity data that lives for an hour, where they can sample it based on success rates and then push the outliers and anomalies to a SaaS for them to do their magic? Right? Like, nobody's ever said that Dynatrace and Datadog and New Relic don't offer an exceptional product. It's just very expensive. Like, is there a way where I

33:29 can just say, okay, I'm gonna write (Yeah. Yeah.) everything here locally on a big chunky bare metal machine, and I'm gonna condense that down and send everything else, and then take advantage of Dynatrace to understand what went wrong with the things that are really important? Like, is that something you've seen in practice? Does it work? And should people aspire to that in some capacity? I have seen it. In my last job, before I got into DevRel, that's something that one of my teams was trying to do when I was managing the observability practices team at.

34:04 We were looking at, basically, long-term storage of logs, for example, and not doing it through the SaaS vendor that we were paying. And we were looking at, like, self-storage of logs using an open source tool to enable that, for two reasons. A, it gets really expensive. And B, we wanted to make sure that we had that storage for compliance reasons. Right? And that's another thing that organizations need to consider, because I think that's why a lot of the time they'll also want that long-term

34:46 storage. Because I think some observability companies initially were like, we'll only store your data for a few days because, you know, observability is in the here and now. If the problem's happening now, that's what we wanna be able to troubleshoot. But what if you wanna go back and look at history, or, as I said, for compliance purposes you need to retain the data for whatever reason? That's where it gets very expensive. And that's where, you know, because OpenTelemetry gives you that flexibility via the OTel Collector

35:23 where you can basically send your data to multiple destinations, you can send your stuff to SaaS vendor X for your here-and-now analysis, and then your long-term storage of, say, logs to whatever open source tool, and maybe something that compresses your data, because you're still paying for storage. It's just internal, in your own stack. That's definitely an option. Awesome. Yeah. I think that's a good pattern. I mean, again, I'll lean on my own experience here, and my OpenTelemetry and observability knowledge is

36:05 is very poor. Right? So you'll need to fill in some gaps here. But six years ago, when I worked at Influx, it was just working with time series data. And one of the really cool things that we could do there is I could store samples every 10 and keep that for twenty-four hours, then we'd enter a downsampling loop, which is, okay, that data was useful for twenty-four hours, and after that, up to three months, we only want the resolution to be maybe every five minutes, because that average over that time is

36:33 valuable enough to us. Is this something that's even possible with traces? Obviously, with metrics that can be done because it's just, you know, vanilla time series data, but traces are very different. And I've heard of a term that I don't fully understand, so maybe you can fill me in as well. People talk about exemplars, and maybe that ties into this conversation. I'm not really sure, but what are your thoughts on that? Oh, yeah. So exemplars, at least in the context of OpenTelemetry, are about correlating, like, your metrics to your traces. And that one's kind of an interesting one

37:09 because it's only been fleshed out, I think, and it's been a while since I've checked, for Java, and experimentally for .NET, and nowhere else. Wow. Okay. Maybe a bit too soon then to be kicking the tires on that. But... That's still a work in progress. But, yeah, that's kind of an interesting one. That's my understanding of exemplars from what I've seen in OTel. But, yeah, I mean, in terms of, like, tweaking the granularity of traces, I think

37:41 it's really, I guess, a matter of, like, do you store all the traces or some of the traces? So then I think it becomes kind of a sampling question. What about the dimensionality or cardinality of these traces? Like, you know, I think Charity Majors has been vocal about how we should have lots of different properties on a trace, revealing as much information as we can. But that feels to me like something that gets very costly very quickly as well, and the value of that data three months from now might not be as high as it was three seconds after

38:14 the actual trace. Yeah. Yeah. That one's an interesting one because, like, I think she's right in the sense that it's good to have as much information about our traces as possible. But then, again, you run into a data storage problem. And I think, also, different observability vendors will charge you for the data differently. So maybe having high dimensionality might be cost-effective if you're working with one vendor, and then you switch over to another vendor, and all of a sudden, it's

38:55 like, your costs have blown up. So that's the other thing to consider as well. Like, you know, you're still capturing the same data, but do you need to tweak your data because now it's not so pocketbook-friendly? Right. I'm sure that people will listen to this, and I don't know if we're helping people or scaring the absolute crap out of them. Or I'm just confusing them. Yeah. It's like, it depends. I'm sorry. Alright. Well, let's just tack a little bit. I swear observability is worth it. Let's just tack a little bit. Right? I mean, I think

39:29 you've done a great job of explaining observability, OpenTelemetry, and, you know, all the [unclear], the lexicon that people need to understand. But you have been in developer advocacy now for a while, and I'm curious, you know, what are some of the lessons learned that you've seen as you have personally adopted OpenTelemetry or spoken to other people doing it? Right? Like, how can people be successful? Do you have any tips, or even just general advice on how to get started beyond npm install or go mod get or whatever

39:57 it is they're using? I think, first of all, for learning OpenTelemetry, the best way to learn is based on how you learn best. Right? We actually have a series as part of the OTel End User SIG called OTel Me, and we have different OTel practitioners come on and talk about their experiences using OpenTelemetry. And we had a guest last week who got into OpenTelemetry through Outreachy, and he was talking about his journey into OpenTelemetry. And he said for him, like, he's a visual learner, so he really craved having, like, those videos explaining

40:42 how things worked for him. And then the videos kind of gave him enough of an overview where he's like, okay, these are the topics that I wanna dig deeper into. And then he dove into the docs, into the OTel docs, for more information. Obviously, I think in an ideal world, I would love for everyone to go to the OTel docs as their, you know, one-stop shop for all things OTel. But, you know, docs aren't perfect. I think the folks running the OTel docs team are fantastic and are doing a really great job

41:13 in, you know, constantly improving the docs. But I think in some cases, some people don't find the docs useful enough as a starting point. So then, you know, honestly, Google is your best friend. And, you know, there are, like, so many people writing about OpenTelemetry from various walks of life, whether it's, you know, blogs from observability vendors or personal blogs. Like, I myself documented my own OpenTelemetry journey. I was, like, learning in public. I was managing this observability practices team, and I'm like, oh, damn

41:51 it. I don't know anything about observability. I'm gonna learn, and I'm gonna blog about it as I go. I guess DevRel was, like, perfect for me in that sense. I think, like I said, Google is your best friend because there's, like, a wealth of resources, from videos to blog posts. I think having these good overviews, either videos or blog posts, to sort of give you an idea of what the base concepts for OpenTelemetry are is great, because then you can use that to, like, as did in his journey, dig deep,

42:27 you know, go into the docs and dig deep on a particular topic. So it sort of helps to direct your learning journey. I'm also gonna do a shameless plug here. I do have an O'Reilly video course on observability with OpenTelemetry. So if you have an O'Reilly subscription, you can check it out if you're a visual learner. So, yeah, I mean, tons of resources. There's, you know, also, I think my former colleagues Ted Young and Austin Parker have a great book on OpenTelemetry as well. I mean, the sky's the limit. It's

43:00 a matter of, like... I think having a good getting-started resource of, these are the main concepts, and then these are the things that I need to dive into, is really the moral of the story. And it's not a shameless plug if it's super valuable to people. So I'll make sure all these links are in the description for people to click on and make it easier. And lastly, obviously, you are a CNCF ambassador. You are an OTel contributor. You're in this space. What if someone is listening to this and going,

43:31 you know what? I wanna help. I wanna join this mission to make OpenTelemetry easier for people. How can they get involved, and how can they contribute to the project? Amazing. Yes. Great question. So I'll send you afterwards a blog post that I wrote on how to contribute to OpenTelemetry from my viewpoint as someone who was in those shoes. But in a nutshell, basically, you know, first of all, join CNCF Slack. In CNCF Slack, there are gajillions of OTel channels. Like, they all start with otel-. Pick an area of OpenTelemetry that interests you.

44:10 So, for example, maybe you wanna learn more about the OTel SDK, or maybe you're a Pythonista and you would love to contribute more to OpenTelemetry Python. Join those channels that interest you and just, you know, monitor the conversations. Just be an observer. Fly on the wall. Join the SIG meetings. You don't have to be actively involved. You can just sit and observe. If you're looking for more active involvement but are afraid of, you know, touching code at this point, not because you can't code, but because contributing to an open source project can be

44:47 overwhelming, a great place to get started is always the docs. Because, as I've said, you know, I would love for the OTel docs to be, like, the book of record for all things OTel. And the only way for that to happen is to have people who have used OpenTelemetry and have found a gap in the docs make an effort to contribute back to the docs. Like, for example, I was doing some research for my KubeCon talk on, like, the greener OTel Collector, and I

45:26 was using this tool called the OpenTelemetry Collector Builder. And I went to the OTel docs for some guidance on it, and I got stuck. And, again, Google was my friend. I phoned a friend, got some help. And then I wrote a blog post about, like, what I did. But then I'm like, well, I wanna be a good open source citizen. So I actually made a point of contributing back to the docs with the stuff that I learned so that people wouldn't get stuck like I was when

45:57 they go back to the docs. And I'm happy to say the PR was merged last week, so that always makes me happy. But, you know, bottom line, it's such a great way to contribute to OpenTelemetry. Joining the OTel End User SIG is also a great way. Even if you're not necessarily an end user per se, it's a great way to contribute because we have tons of things that we run as part of the SIG. Like I mentioned, we have this OTel Me series, so we're always looking for people

46:27 to contribute, to interview. We have OTel in Practice, where if you have an interesting OTel topic that you want to talk about and present on, maybe you want to test out a talk: you wrote a talk, you want to flesh it out, use this as a guinea pig. So you can join that. We work with the SIGs to run surveys, so we liaise with the SIGs. And so we've got a couple of people who joined recently who have taken on that mantle of, like, really streamlining our survey process. There's always stuff to be done. And

47:07 so if you're looking for a way to contribute, that's a great way to get started. Fantastic advice. Alright. I think that is us at time. Do you have any last words for the audience before we say goodbye? I would say, you know, don't be shy about contributing to open source, especially OpenTelemetry. When you're submitting your first PR, don't feel overwhelmed, because everyone has been nothing but nice to me and to others that I've talked to since contributing to OpenTelemetry. The comments are always thoughtful. No one is ever aggressive or

47:47 rude, so it makes me wanna contribute more. So don't be afraid, because this is honestly, like, a wonderful community. And if there's one CNCF community that I recommend that you join, and, of course, I am very biased, I would say definitely join OTel. We are... Well, it's been an absolute pleasure. Thank you so much for your time. Thanks for having me. Thanks for joining us. If you wanna keep up with us, consider subscribing to the podcast on your favorite podcasting app or even go to cloudnativecompass.fm. And if you want us to talk with

48:23 someone specific or cover a specific topic, reach out to us on any social media platform. Until next time, when exploring the cloud native landscape, on three. On three. One, two, three. Don't forget your compass.
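The tiered approach discussed from 32:51 to 35:23 (keep everything locally at full fidelity, and forward only what really matters to a SaaS vendor) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not OpenTelemetry's API: the `Span` record, `TieredRouter`, `keep_by_trace_ratio`, the 10% ratio and the 500 ms slow-span threshold are all hypothetical, and the two lists stand in for a self-hosted long-term store and a vendor exporter. The ratio check keys on the lower 64 bits of the trace ID, similar in spirit to OpenTelemetry's TraceIdRatioBased sampler, so the ratio-based part of the decision is the same for every span in a trace.

```python
from dataclasses import dataclass, field

TRACE_ID_LOW_64 = (1 << 64) - 1  # mask for the lower 64 bits of a 128-bit trace ID


@dataclass
class Span:
    """Hypothetical, simplified span record (real OTel spans carry much more)."""
    trace_id: int          # 128-bit trace ID as an integer
    name: str
    duration_ms: float
    is_error: bool = False


def keep_by_trace_ratio(trace_id: int, ratio: float) -> bool:
    """Deterministic ratio sampling keyed on the trace ID, so every span of a
    trace gets the same keep/drop decision (assumption: ratio in [0.0, 1.0])."""
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_LOW_64) < bound


@dataclass
class TieredRouter:
    """Hypothetical router: full fidelity locally, a curated subset to a vendor."""
    vendor_ratio: float = 0.1          # assumed: forward ~10% of ordinary traffic
    slow_threshold_ms: float = 500.0   # assumed: slower spans count as outliers
    local_store: list[Span] = field(default_factory=list)    # stand-in for self-hosted long-term storage
    vendor_export: list[Span] = field(default_factory=list)  # stand-in for a SaaS exporter

    def route(self, span: Span) -> None:
        # Tier 1: keep everything locally (cheaper storage, compliance retention).
        self.local_store.append(span)
        # Tier 2: forward errors, slow outliers, and a sampled slice to the vendor.
        is_outlier = span.is_error or span.duration_ms >= self.slow_threshold_ms
        if is_outlier or keep_by_trace_ratio(span.trace_id, self.vendor_ratio):
            self.vendor_export.append(span)


if __name__ == "__main__":
    # Ratio 0.0 isolates the outlier rule: only the slow and failed spans go out.
    router = TieredRouter(vendor_ratio=0.0)
    router.route(Span(trace_id=1, name="GET /cart", duration_ms=12.0))
    router.route(Span(trace_id=2, name="GET /cart", duration_ms=900.0))
    router.route(Span(trace_id=3, name="POST /pay", duration_ms=30.0, is_error=True))
    print(len(router.local_store), [s.name for s in router.vendor_export])
    # prints: 3 ['GET /cart', 'POST /pay']
```

Deciding per span like this can forward only part of a trace. In a real deployment the fan-out would more likely be expressed as an OTel Collector pipeline with more than one exporter, with tail-based sampling making the decision over whole traces, rather than in application code.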

Meet the Cast

David Flanagan

@rawkode

Laura Santamaria

@nimbinatus

Adriana Villela

@avillela

Documentation

OpenTelemetry docs

Build a custom Collector with OpenTelemetry Collector Builder

Additional Resources

Fundamentals of Observability with OpenTelemetry (O'Reilly video course)

Thinking about contributing to OpenTelemetry? Here's how I did it.

Learning OpenTelemetry by Ted Young and Austin Parker (O'Reilly)

More from Cloud Native Compass

View all 23 episodes

Navigating Kairos: Immutable Operating Systems with a Cloud Native Twist

Flatcar Linux: A Modern OS for the Always-On Infrastructure

Platform Engineering: Asking "Why"? with Evelyn Osman

AI-Augmented Programming

The Future of Sustainability in Open Source

Atlantis: The Terraform Automation Powerhouse

More about OpenTelemetry

View all 4 videos

Cloud Server-Side WebAssembly

Microservice Troubleshooting, Built for Developers

Hands-on Introduction to Quickwit

More about Jaeger

View technology

Hands-on Introduction to Quickwit

More about Prometheus

View all 26 videos

Flatcar Linux: A Modern OS for the Always-On Infrastructure

Hands-on with Headlamp: The Kubernetes UI

Hands-on Introduction to Perses

More about Grafana

View all 20 videos

Build a Production-Ready Kubernetes Cluster with Spectro Cloud Palette | No-Code Tutorial

Hands-on with Qovery

Hands-on Introduction to Quickwit

More about Loki

View technology

Hands-on Introduction to Loki
