About this video
What You'll Learn
- Load NYC taxi rides into TimescaleDB with parallel copy, then verify counts.
- Query ride counts by day and hour using date_trunc and grouping.
- Analyze trip data geospatially, then visualize time series in Grafana dashboards.
Avthar Sewrathan explains how TimescaleDB packages Postgres with hypertables, compression, and continuous aggregates. We launch a Forge instance, load NYC taxi data with parallel copy, then run geospatial queries and visualize in Grafana.
Jump to a chapter
- 0:00 Holding screen
- 0:50 Introductions
- 0:51 Introduction to TimescaleDB & Guest
- 2:35 What is Timescale? (Slides)
- 2:50 What is TimescaleDB? (Overview & Benefits)
- 5:42 Examples of Time Series Data Use Cases
- 7:33 Understanding the Hypertable Abstraction
- 13:30 Launching a cloud Timescale instance with Forge
- 19:00 Installing psql and pgAdmin4
- 24:30 Enabling the Timescale extension
- 25:30 Connecting to Timescale with psql
- 30:10 Importing the New York taxi data
- 59:00 Querying our data
- 1:10:21 Developer Productivity & Postgres Ecosystem
- 1:11:58 The Value of SQL for Time Series
- 1:13:27 Getting Started with Timescale Cloud/Forge
- 1:13:40 Advanced demo of geospatial time series data with Grafana
- 1:16:01 Connecting & Database Setup
- 1:19:41 Loading Data (Troubleshooting & Parallel Copy)
- 1:59:04 First Queries: Basic Time Series Analysis (Count)
- 2:04:08 Analyzing Data: Average Fare & Relational Joins
- 2:07:32 Advanced Analysis: Geospatial Data (PostGIS)
- 2:16:46 Visualization with Grafana & Advanced Functions (time_bucket, gap_fill)
- 2:25:08 Conclusion, Resources & Part 2 Discussion
- 2:31:35 Farewell
Full transcript
Generated from the English captions. Timestamps jump the player to that moment.
Read the full transcript
0:51 Introduction to TimescaleDB & Guest
0:51 Hello. Welcome to today's episode of Rawkode live. I am your host, Rawkode. Today, we're gonna be taking a look at TimescaleDB, a time series database that is a little bit different from what we may be accustomed to. Timescale is a relational time series database, which brings with it a lot of flexibility, value, and power that we're gonna take a look at today. No. I'm not gonna do this alone. I am gonna be guided. And for that, I am joined by Aftar Serrafan from TimescaleDB. Hi there. How are you? Hi, David. Thanks for having me on the stream. It's a
1:26 pleasure to be here. The the pleasure is all mine. I'm really looking forward to to playing with Timescale and just showing how interesting and cool Timescale is in general, but even how cool are some of the databases are that we're getting coming out nowadays as well. So would you like to take a few moments just now to kind of introduce yourself, tell us a little bit about you, and then we'll talk about Timescale itself. Fantastic. Well, hi, everyone. I'm basically streaming in today from New York City. I'm Afthar. I'm a developer advocate here at Timescale,
1:58 and I'm originally from South Africa, but now live in New York City. I'm super interested in using technology to empower people. That's really why I enjoy being a developer advocate here at Timescale. You get to learn new things all the time and explore a bunch of different use cases and and different projects, some of which we're gonna get into in the stream today. And as I mentioned, you know, part of my job is learning new stuff all the time. So you can, follow me on Twitter or my personal website is offthora.com if you'd like to learn more. But yeah,
2:28 excited to get into TimescaleDB today with you, David. Awesome. Very, very cool. Alright. I believe we're gonna skip the chat around what it is and you're gonna walk us through some slides that you're prepared just to kinda give us a little bit of a overview and some flavor on what Timescale DB is. So let me pop those up onto our screen. Please take it away. Fantastic. So I just wanna give a a short intro to what is TimescaleDB, keeping in mind that, you know, some people may be familiar with time series databases in general, whereas other people that might actually be a
2:50 What is TimescaleDB? (Overview & Benefits)
3:03 foreign concept. So let's just take an overview of of what exactly is TimescaleDB. The simplest way to think of Timescale is that it's Postgres with time series superpowers. A lot of our users describe Timescale as times Postgres for time series. And there's really what what that really means is that Timescale is a relational database packaged as a a relational time series database packaged as a Postgres extension. And this is in contrast to a lot of other time series databases out there, which are traditionally no SQL. So for example, influx DB, which is something that David has
3:40 worked with before and knows very well. And some other databases that are used for time series data, things like MongoDB and Cassandra. So there's sort of six main things that I wanted to introduce you to around Timescale. So the first one is that, you know, Timescale is built on top of Postgres. And so that means the same Postgres that people might be familiar with from just storing relational data, it all the same features are available and Timescale basically runs on top of that. You inherit the whole Postgres ecosystem with all of its reliability and tooling and
4:18 and and community that it's built over the past twenty to thirty years or so. The other thing to keep in mind with Timescale is a lot most the reasons why people gravitate towards it is because Timeseries data requires performance that, traditional databases can't really give. You know, when you're dealing with Timeseries data, it's a lot of data coming in, constantly from, the devices that you're monitoring or whether it's some sort of deployment. There's a like a DevOps use case as well. So you wanna pick a database that can actually give you the kind of performance
4:54 both in ingest and querying. And so Timescale is is a really good one to use for that. And the other reason is, you know, once you reach a certain scale, that's kinda why Timescale has a scale in the name. Once you reach once you reach a certain scale, you're gonna require a database that can actually keep up with, in in our case, supporting terabytes or petabytes of data. And then the last three points is just, you know, it's it's actually easier to have this relational and time series data together. And we also have advanced time series features,
5:28 some of which we're gonna get into things like compression and continuous aggregates and automated data retention policies so that it's not just about performance, but it's about just working with time series data in general. Some other things I wanted to mention, you know, those people who are unfamiliar with time series databases, a good way to think about it is that, you know, time series data is data that show that that represents how things change over time. So four examples of this that I just wanted to introduce to you is the first one is of DevOps monitoring.
5:42 Examples of Time Series Data Use Cases
6:03 So for example, I just have a SQL query on the screen here that shows the change of memory consumption for containers in a Kubernetes cluster over the past ten minutes. So that's an example of something that you'd wanna wanna monitor with Timescale. Some other examples are things like application metrics. You know, you're looking at, you know, user behavior on a website or in an app each time a user performs some sort of event, that event stream is is time series data. And another example is the Internet of Things. We're gonna take a look at two demos later
6:36 on. Any devices that you that you may use, for example, smart devices, anyone who has an Alexa or a Google Home or something like that. And then traditionally, the traditional example of time series data is financial data, stock prices. Those of you who are familiar with, you know, GameStop and and stuff like that, all that stuff that's been happening recently will realize that, you know, as the prices go up and down, that's time series data for for finance. Okay. And then I have some slides here about, like, what makes Timescale a bit different, but maybe we can save that for later
7:11 on or I just wanna give you a chance to jump in here before I just, you know, carry on. But our this is about Timescale's major abstraction called the hyper table, but maybe we can deal with this later on in the stream. Yeah. Whatever you think is best for for people to be aware of just now before we start kicking the tires on this. So, you know, if you wanna cover it, please feel free. Otherwise, we can come back to it later. Okay. Well, it might just be two minutes. I think that the reason why I'm bringing
7:33 Understanding the Hypertable Abstraction
7:37 this up is, you know, a lot of people, it can be difficult to understand, you know, how can a relational database scale for time series data. And so that's why I want to introduce the major abstraction of TimescaleDB, which is this thing that we call a hyper table. So, you know, one of the benefits of using Timescale is that it allows you to ingest these get really high ingestion rates on your data. So in this case, millions of data points a second. And the way that we do that is to do this abstraction called a hyper table,
8:10 which is you can kind of think of so so this is how the the hyper table works. So what Timescale has is automatic partitioning in both space and time. So if you think of your traditional relational table, what we do is we have these different subtables, which we call chunks. So in this case, you know, you have data coming in in time, but you also might have different kinds of data maybe from different devices or different locations or something like that. And so what we have is each chunk represents both at a point in time as well
8:52 as you can also partition in space, which is, you know, the the the type or the location. And so these chunks are kinda like subtables. And so what happens is that the these because you're dealing with these subtables underneath and not just one large table, the time that it takes to actually query data and ingest data is shortened quite a bit. It's it's sped up quite a bit. But as a user, you're still dealing with just one abstraction of a of a single table when in reality, as you can see, there's actually, in some cases, hundreds
9:29 or, you know, even more of these subtables underneath. So that's like the main abstraction. And I think what that allows you to do is to make what's called wide and shallow queries as well as narrow and deep queries. So wide and shallow queries, can think about, you know, you wanna understand what is the state of all my devices in the past five minutes. So here, you just wanna look at this latest time period and you can see like, okay, it's these three chunks right here. But if you wanted to do what's called a deep and narrow query, which is like, you know,
10:02 for a sudden location, show me the history going back one year or two years, that's just for example, if I'm just looking at the blue location, then that's gonna be this middle row right here. So you can see you can that's kind of to give you an intuition about why you get this kind of performance gains when you do use TimescaleDB. Then I have some slides here about, like, developer productivity. It's basically the showing the fact that, you know, a lot of people are can be confused that, you know, time series timescale is built upon Postgres.
10:32 So what does that mean? It basically means that, you know, for all your relational queries, it uses the Postgres query planner. And then for time series queries and stuff that use TimescaleDB special functions, we have modified the query plan in such a way that it then uses TimescaleDB for those things. So all the reliability and all the kind of backup and tooling that you associate with Postgres, all that stuff still works. The only thing that we modified is how you deal with time series related queries and and functions. And And as a result of being built
11:09 on Postgres, it basically means that everything that you're used to in the Postgres ecosystem, whether it's visualization or ingestion tools and stuff like that, that all that still works. You know, whatever works with Postgres works with Timescale. And so all these things, I'm gonna do a demo later on using Grafana as an example of a visualization tool. But, you know, all the things that you're used to ingesting and visualizing data with, those all work with TimescaleDB. And this is just a last thing to give you an idea of the amount of people who are using it today. You know,
11:40 sometimes it's when you have a new software that you're evaluating, it's good to get a sense of, like, where else is it used in production. So we have over a million active databases right now and used at places like AppDynamics, Cisco, Samsung, IBM, all those all those places. So but that's just just an overview for you to get a taste of Timescale. I know we're gonna get stuck in in just a moment into the actual code of it. Nice. Awesome. I think I would just I I think I've got two points now. Right? Got it.
12:13 One one is, you know, I love it when I see examples of, you know, this is the the type of time series, you know. And that's great because we show people something that is applicable within their context, whether they're from financial background or software engineering or, you know, SRE, etcetera. Yeah. But what I always love is just that there's not a single second of this any given day that anybody on this planet isn't surrounded by time series data. Literally, everything in Everything. Humidity, temperature, rainfall, precipitation. All of this stuff could be tracked and monitored in time series database. Even, you know,
12:46 exchange rates when you buy a cup of coffee. So I I love Timescale data. And I'm really really looking forward to playing with us today. And also, a second thing I wanted to mention is that, you know, I do have a lot of experience in Timescale, you know, formally working in the Prometheus community and then moving to influx. And one of the biggest points of frustration or questions I always would always get is why can't I just use query languages I'm familiar with? And this is one of the biggest selling points I feel for for Timescale is that
13:14 it's built on a backbone of an open source database that many, many people are familiar with using query languages and tools that many, many people are familiar with. So I'm very excited, and I think we should just start playing with Timescale. Let's let's get stuck in. Yeah. That sounds good to me. Alright. So let's see. Let's get my machine's screen shared. Alright. So here I have the Timescale website. Typically, what we'll do is we'll just click on the docs. We'll find some sort of getting started guide. There's always a getting started guide and try and just cut through some of
13:30 Launching a cloud Timescale instance with Forge
13:53 the basics that people need to make and start their own Timescale journey. So I think the first thing we really need to do is to install Timescale. Now what's your recommendation here? Do we go with a docker, a container image? Has there brew packages? What what would you do? Yeah. So there's a couple ways to get started. I think depending on on what you're most familiar with, Docker is is usually good. The one that we recommend to people, obviously, because it's the easiest to get started is using Timescale Cloud or Timescale Forge which is a hosted
14:31 option. You know, there is a a free trial that you get $300 in credits. So, you know, if you just wanna check it out or if you're just using it for a small project and don't wanna actually pay to to upkeep it, you know, if you're if you're not running in production or or seriously or if you're just kicking the tires, then that's that's also another option as well. Well, I do love to I do love showing off cloud products when they're available. Do I need to enter a credit card detail or will it No. So yeah. Actually, I'd
15:00 recommend choosing the Timescale Forge. So let me actually explain a little bit about why there's two options there. It's funny because when people usually kind of sign up for Timescale, it's just through the the front page of the website and and the docs kinda need to get updated a little bit. That's showing that, you know, we're a bit behind on on updating ourselves. But basically, difference between Timescale Cloud and Timescale Forge is that Forge is a newer product that we're building that is gonna have much more deep integration with things like you mentioned Prometheus before
15:36 as well as just a more rich experience for users. It's a it's a bit of a younger product. Whereas Timescale Cloud is a product that's been around for almost two years now and that just has more regions and more configurations for folks who want more flexibility, whereas forge is just, you know, instances based in The US. So that's the that's the the difference there. Alright. So let's try forge. Where else is it? Oh, well, we'll go with it. And I'm not these things. K. Let me just pop that open over here. Copy link. Oh, so I have Timescale now. Right?
16:56 Okay. So the page that you're seeing now is just the the direct page to create a database. So you can just go ahead and and enter a name and select a configuration from there. Alright. So we can just say this is Rawkode Live. I mean, I'm assuming for what we're gonna be playing with today is the smallest instances. It's probably okay. Yeah. Yeah. That should be fine. 25 gig of data. I'm sure that will be cool. Sure. I'll save that over there. That's me now showing the world two passwords, but I will remove this current and change the
17:40 password afterwards. Yeah. Alright. So what we get now is just our pretty standard postgres connection line. Does does that just to clarify. Right? Does that mean I can use any postgres compatible GUI locally? Yeah. So I actually use p g admin. Later on, I'll show you my p g admin screen. But, yeah, you could use whatever you want that connects with Postgres to interact with Timescale, whether it's good old p s equal, which is the the terminal based UI for Postgres or stuff like p g admin, d beaver, you know, whatever whatever takes your fancy. So,
18:22 you know, you get to choose what tools you wanna use, and it's gonna be compatible. Excellent. Definitely a major one. Now what about this multi node connection info? Is Timescale horizontally scalable? Yeah. So in the in the most recent release of Timescale Timescale two point o, we've introduced horizontal scalability or multi node capabilities. You know, one of the original criticisms of Timescale when it first released was that, you know, it really scales well on one node. But what happens when you wanna get into really large scale production deployments where you have, you know, these 20 node clusters dealing with,
19:00 Installing psql and pgAdmin4
19:05 you know, billions of data points every every minute or so. And so that's that's what we've just actually introduced that. So that's a new feature on Timescale Forge. And, yeah, if you need that, I think there's one or two tutorials on YouTube that takes you through exactly, you know, like, how the multi node connection info is different from the general connection info. But for the sake of simplicity and not confusing folks at home, I'll point you to that to to learn more. But That's just the information that you'll need to get the multi node set up up and running.
19:40 Okay. Awesome. So I guess we should go back to we've we've got our instance. We had that little pop up that says it's running. So now we can go back to again, Startup Guide and I'd recommend the one that might be good to do is under tutorials. It's getting started hello timescale because this current getting started guide kinda explains the stuff that I've been telling you that I explained in the slides already where it's kinda like the abstraction of hyper tables and what is TimescaleDB, whereas this one is actually a really nice tutorial. I worked on this myself
20:18 about using a real world dataset and analyzing it using TimescaleDB. In this case, it's about taxis in New York City and analyzing their ride behavior and and stuff like that. So it's actually quite fun. Okay. Let's do the NYC data then. So do I have to download this file? Yeah. So the first step is just to download the dataset. Alright. Sure. That won't take too long. Yeah. Ten seconds. Okay. Cool. So let's see. Prerequisites, this tutorial. Yeah. Okay. I probably know enough SQL to be dangerous, and I'm sure you'll keep me right as we go. So Yeah. Yeah. It the
21:05 SQL is actually spelled out quite what is it? In in detail in the tutorial. So even if you're a beginner, you'll still be able to understand, you know, what's going on. Okay. So how does this want me to interact with the cluster? Do I need to install a post course client? So what you can do is you can use p SQL, which I think is installed by default on a on a Mac. But, yeah, basically, you'll need some sort of post course client in order to complete this. So whatever you'd you'd like. Right. Computers are no on that one. So Oh,
21:45 no. Okay. Let's just pull up the brew page. See if it's got a client on its own. Oh, it should install. Postgres. Yeah. So it should come with that. Maybe just update. And what was the name of the GUI you also recommended there? Another one that I like is called PG admin. It's kind of the more old school database administrators one. I'm not sure if there's a proof for that, but Yep. P g admin four and also install Postgres. And I think that gives us a better that gives us a few options with how we wanna work with it. So I'm gonna
22:31 Exactly. See both. Yeah. So, hopefully, this is pretty quick. Is p g admin the web one or is that an actual Mac app? So it's a it's a browser based one, kinda like, you know, if you've you've you've used Grafana and stuff before, it just opens up a web browser or web page in your browser, and that's how you interact with it. Yeah. I can I always remember p g you know, PHP my admin from back in my PHP days? That was Yeah. Yeah. Yeah. Back I've developed a couple of websites using that very rudimentary
23:10 stuff when I was still learning, like, web development a few maybe six years ago or so. Yeah. Alright. So I'm sure this will be very quick. So why don't we pop over here just now? Yeah. So the first thing I want us to do is just Go ahead. Oh, yeah. Feel free to interrupt me. Was just about to say there's one for folks who are seeing this and, you know, who who have no idea about how to use P SQL, which is one of the things that you might use later on. There's actually a tutorial
23:39 in the docs about how to install and use P SQL. So that's, know, if you're a total beginner or just wanna wanna refresh the basics, that's there as well. I think if you just search p SQL on the docs, you should should find it. But that's there to reference. Here we go. Yeah. Hey. So this looks like we've still got a minute to go. So you say PCQL, I say SQL. I guess people will be wondering what the difference there is. And that's the I think the short answer is this technology. We all pronounce things differently,
24:14 and that's okay. Exactly. Not one for these dogmatic wars and correct pronunciation of technical terms. I think it's silly. Okay. So we're gonna create a database called NYC data. And so this is how we interact with Timescale. We have to enable an extension. Yeah. So because you're using Timescale's like hosted instance, that's actually been already done and it's been enabled by default. But for folks who are using like Docker or just installing on bare metal on your on your computer, you just have to enable that extension. That's the create extension. What is it? That command,
24:30 Enabling the Timescale extension
25:03 and that'll that'll enable TimescaleDB on on on your Postgres instance. So that just just so I understand how this works then. Like, if I don't enable that extension, I'm I'm really just working with the native Postgres and its query planner and such. But on a database level, I can enable that extension, which brings in the custom query planner and the custom storage layer. Is that right? Exactly. % correct. Cool. That'll be the only time you see that today. Everything else, I'm gonna get completely wrong. But alright. We now have those things installed. So, yep, we have PSQL
25:30 Connecting to Timescale with psql
25:39 and let's see. The admin is running on a port. Who knows? It's been a while since I've installed all of this. I forgot how much trouble it can be. Alright. I'm just gonna use this because I'm I think I'll be able to pull enough of this back out of my head to make it work. So let's see if we can connect. Right? So here is our line. I'm sure it's the s h. Well, this has got credentials on it too. Right? Yeah. You what you will need to do is after the TSTP admin, you will need to put in your password.
26:28 So it'll be a colon followed by your password, or you can I think if you do this, then it might prompt you for your password afterwards, which you can enter it? So that might be the safer way. Yeah. Okay. So let's not make it up. Let's do that. I won't make it up. We'll do that myself. So h s host. But we actually have more than a host. We have a full connection string. Yeah. Yeah. So Let me just double check how to how to actually connect. Alright. So I don't see end of an obvious there. So I'm also gonna take a
27:07 look at p g admin four and see what the port number is there. That may just be running there in the background quite happy. So if I were gonna deploy and install oh, no. Mac. Nope. Container. Let's see what port it runs on. So I think if you remove the the dash h there and just put the it's the the service URI afterwards, it should work and take out the that send me the quotes at the end of it there. I didn't like that. That's what I just tried. Oh, I didn't? I think it would need to yeah. Okay.
27:55 We need to close because of, you know, the Oh, because of the string. Yeah. Okay. Cool. Alright. So now I need my password, which I kept over here. And then there we go. Success. Okay. And then if you just do a backslash d x, that will show you that TimescaleDB is actually installed on this Postgres on your Postgres instance, on your TimescaleDB instance. And, you know, you can actually verify that, you know, you're you're good to go from here. Okay. Is that just shorthand for show databases? So x is the extensions. Showing your databases will be
28:36 I think it's now you're testing my PC called knowledge. Let me double check. Let's consult Google. Yeah. I have no idea. It's been too long for me. Okay. I think show databases is with my influx command. Yeah. It'll just be backslash l, like list. Yeah. So then you should see a databases. Postgres is the database by default. If you want to oh, I'm sorry. So Postgres is the default database, and then the the database that you'll interact with is called TSDB. The last one at the bottom, which is TimescaleDB. To do that, to choose TimescaleDB,
29:28 you do backslash c, which means choose and then TSDB. I'm just making up commands. It's all good. You might as well try try try what you know. Most of the time, it actually works. So yeah. Yeah. I wonder if that's just MariaDB slash MySQL, the show and use syntax. Maybe it is okay. Yeah. So now we are using our TSDB. So we wanna be able to will there be any tables that are created or is that something we know we're gonna do? So that's something we're gonna do in in step one here. So if you just go back to the
30:02 tutorial. So this actually create database thing. Since we already have a database and we've enabled Timescale, we can actually skip that step and just move move right ahead. Alright. So we have our SQL file and well, yeah. I'm assuming there's a CSV on it too. I've not extracted it yet. So why don't we do that? And notes. Okay. We've already got our connection working. Oh, the command was right there in front of us. I just wasn't going through the It's okay. It's it's good to be impatient sometimes. But, yeah, that's the that's the command that we figured out.
30:10 Importing the New York taxi data
30:43 Alright. We've done that. It's then we did the d x command. So we've seen the extensions, and now we're gonna need three tables. So we're gonna have a hyper table, which we've already spoken about in the slides. If anyone has any questions and you're watching, please feel free to drop them in the comment section. We will tackle as many of them as we can as we go. We're also gonna have a regular post gross tables. Again, this is one of the superpowers of Timescale is that we have the choice to work with relational data and time series
31:12 data. And that's gonna store payment types. So we're using that as like a I guess, like a metadata store and then another regular postgres table called rates. Okay. And I'm assuming that's just for some sort of calculations. Yeah. We're gonna we're gonna be using all the tables later on. But, yeah, basically, the the the the hyper table is gonna be storing the time series data. So every time a ride takes place, it'll store the time of the that that ride took place, so the start time and the end time. And then all sorts of information about the ride,
31:49 which will then be associated with the other relational tables. So you have, like, rates and I think there's, like, ride types or something like that where those are just IDs to stuff in the relational table. So that's it's yeah. As you mentioned, just metadata and and descriptions in the relational tables. Okay. So this seems to create, I guess, as rates just a regular table and then a hyper table is a I don't wanna say view because I think that word is overloaded in this context, but maybe you could try explain what this create hyper table is is doing.
32:29 Yeah. So basically, the way you create a hyper table is that you first just create a relational table with all the with the schema that you want. So in this case, we have all the things like let me actually just open this up here. So we have all the things like the the pickup date time and passenger count and all that kind of stuff. And then what you do afterwards is run the create hyper table command, which actually will convert your regular postgres table to a hyper table. And then you can perform all sorts of functions reserved
33:06 to be performed on the hyper tables on that. So there's some advanced functions like data retention policies, down sampling, and things that can only be performed on hyper tables. And so that's just to distinguish which tables are regular tables and which tables are actually hyper tables. And is there any default, you know, time to live or data exploration policy when you create the hyper table or is it all infinite? So yeah. Everything is infinite. Yeah. There's no time to live and there's no data exploration policy. Those are things that you can actually set up on a on a per hyper table basis
33:42 or there's actually a way that you can set up a custom retention policy for all your hyper tables, but that's only if you if you'd like to. And for those people who are interested in that, the timescale feature related to that is called user defined actions. And the the actual stuff to do with data retention is under the automated retention policies. So, you know, for folks at home wanna check that out, you can. Cool. Alright. So we've created those. Now we're just gonna create a couple more. So this is just our yeah. The metadata payment types and sort of
34:21 SQL enum and then our rates. So let's copy this. And now we can run slash d t, which is I'm assuming just described tables. Yeah. And we can see those three tables that we have already created. So far, it's still good. Okay. Now we wanna copy data from my downloads folder into this instance. That's a neat command if it works. Okay. Now is that going to be from where my client was executed, or is this gonna expect something that's living on the cloud server? Should we just test it and see? Yeah. Let's test it and see.
35:16 Okay. Let's try copy from downloads. I have no idea. And did that go into a directory? No? Okay. Looks good. Yeah. Okay. So that's from wherever I executed my client. I've just passed on an absolute path, so we should be good. Let's see. There it is. I have a I have a hunch that it might not be working, but let me actually try and figure that out. Would you expect some sort of output to suggest that it was copying? Yeah. Yeah. Yeah. I think you might have to just create a new tab in your terminal.
36:01 And then Mhmm. I'm just consulting the docs to make sure that I'm gonna say the right thing. Yeah. I feel like it's doing something. Surely, would crash. We can confirm the hypothesis. Right? Let me control c. Could have done a wrong file name first to see what happens. I've anger it now. Alright. Let's try. K. And I'll never find it. Okay. There is our command. We put the bash x back on. Yeah. That's for my password. So now I've got an active session, and I'm gonna copy this copy command again. Get the filename wrong. Yeah. I think that is working because when
37:31 the filename's wrong Okay. Cool. I think just maybe this is gonna finish first because I tried to control c it. It's not control c ing, so it's probably gonna wait till Okay. It finished sendings. Guess how big was the fail? Oh, that's that's all. T crap. NYC. One Point Seven gig. It can it can take a few minutes. Yeah. Yeah. Yeah. Okay. So oh, there's a fast alternative. Yeah. So there's a fast alternative. It's a little bit more involved, but maybe we can try that. Basically, we've created this Go program, you know, for folks who have more higher
38:23 ingest rate requirements. It just takes advantage of parallelism, And so it just it's a multithreaded way to insert into your Timescale database. Okay. Well, why don't we open? Would that mean if okay. What was the command? Was it slash d t s t b? Nope. Slash c would be yeah. T s t b. Alright. Select current. Like, I wonder if there's any data in this table now from NYC. Was it no. Was it just rides? It'll just be rides. Yeah. Oh, no. So we'll have to wait and wait for that operation to finish. Let me double check
39:12 to ensure that that's working. Yeah. As I mentioned, I haven't done this in a while, so it's good to know where the friction points are. I personally actually use p g admin all the time or have something that's inserting data, like, from a Python script or something like that. But this is hopefully, will be done soon. Yeah. I'm not really sure. I wonder if it's a special command I can send to postgres to print the status. Yeah. If you wanna try if you wanna try using p g admin as well, that one has a bit of
39:59 a better UI to show you, like, the progress of the insert. And you just kinda drag and drop which file you wanna set inside, and then it'll do it for you. But you know what port that runs on? I'm not sure. It's just five. I know that's a postcard port. Six Oh, it'll be six five two eight eight, maybe. Try that. I found this slightly different number. What you're saying? 65288? Yeah. Nothing. Okay. Maybe that's just for my my particular installation. Yeah. We got a suggestion from. It's like, Suggest I check my network IO. Thanks, Vignesh.
40:55 I'm pretty sure it quite fast. I don't even know how to do that on my Mac. However let's see. Let's find out how to run this PT admin thing. So all through through. My Google Foo is failing me this evening. So I did find another command which suggests I may try and run that locally and get a time from it. So let me show you this. So because of Stack Overflow, someone shared this command where it seems I can just type the files through PV. I'm not sure what that is. It doesn't appear I have PV.
42:13 Good postgres. We could drop in our string there. Would that work? Or am I just being really impatient and we should just wait? There we go. Well, I think it actually faintly canceled, didn't it? Oh, wow. Okay. It'll be from Rides? Of course. Yeah. Rites. Do you think I should run a local one? Let me just double check something. Yeah. I think the the hack around this will definitely be PG admin. So if we can find a way to open that, that'll probably be the best thing. Yeah. Sure. Alright. Let's see if it's running. Yeah. I actually use the Mac app for
43:20 PG admin, but that still just opens the browser as well. So it's not like a separate UI, but it should be the same thing regardless. Okay. Well, we got a comment from Ryan who says the issue is single threaded in a single transaction. So using the parallel copy is significantly faster because the copy is single threaded. Yeah. Maybe let's try the parallel copy. That's the command. That's I think it's the next command in the doc. Shout out to Ryan for his help. Do I need to install something locally for this to work? Because I'm not gonna have
43:54 Timescale DB parallel copy. Right? Yeah. So if you just go to the go back to the tutorial and there's a link to parallel copy just above yeah. Right there. And there should be some installation instructions as well. Yeah. I think it just wants me to compile it with Go. Alright. I'm pretty confident this will be fast. Once we have that, we are going to run. So we need to drop in the connection string of the DB. Okay. Let's grab all that just now. K. That looks better. Okay. So our connection string and our DB name. So our DB name is T s
45:13 t b. Yep. And our connection string, is that gonna be that fill forge thing? That'll just be the the service you are at the end. Alright. And we need the link to our file, Oh, it says download MIC data CSV. Right. CSV. Password authentication failed. So I think what you might do is Flash the password. Right? Yeah. That that's honestly the hackiest way. Not good security practices. Don't don't do what we're doing. Yeah. Please, nobody break my instance until after the stream. Afterwards, knock yourself out. Okay. So do we get any output from this, or are we just gonna be waiting
46:25 for oh, up every thirty seconds, we're gonna get some progress at the Yeah. Okay. So we can wait thirty seconds. So why don't we discuss some of the feature sets that Timescale brings to us? So we've kinda mentioned that we create hyper tables from our normal tables, and that brings in all of the time series functionality. We kinda spoke about time to live, which was that, you said, through user defined actions, which allows you to downsample that data. And it's So there's so on the on the topic of, you know, how long you wanna retain your data
47:06 for, time to live of of your data, there's a couple of features there. There's one that is part of the the data management features, which is called setting automated retention policies. So that's the the name in the docs that folks can search for. But the thing that I was mentioning is that these retention policies are set on a per hyper table basis, meaning that for every hyper table you have, you can set different retention policies for them. But if you would like to have one retention policy that applies to, like, all your hyper tables, you can use
47:44 a feature called user defined actions, which is basically so we have this background scheduler and things that that we've created in order to enable stuff like the retention policies to happen. So what that does is allows you as a user to take advantage of the scheduler and that basically, that stuff that's going on in the back end to run your own jobs on a on a specific schedule, and one of them can be this down sampling or this compression job or this data discarding job as well. Okay. I mean, and time sampling's important with time series data. Right? Like, it's probably one
48:25 of the first tasks that people should be doing when they start to store time series data because unlike traditional data that we store and relational or no SQL stores is that time series data isn't valuable forever. Is it? Like Yeah. It depends. Because I think a lot of the time when you think about the two major kinda uses that people have for it, there's kinda the real time monitoring, which is more, you know, the recent data days is what you're interested in. But there's also a case where you wanna look at historical analysis or just analytics of
49:00 how something has been doing. And there, you know, the recent data isn't quite important, but I think you're you're right that and so for the analysis, it's definitely more you know, the the more data you have depending on how long you wanna go back, the better. But I definitely agree that, you know, after a certain amount of time, data usefulness does degrade. But I think also, you know, the way it's going today, people wanna keep data around for an increasing amount of time because they're doing all sorts of analysis on it, machine learning models, you know, all those kinds of things. And
49:36 so there's actually a feature that Timescale does have that allows you to do that, which is native compression. Then that's just to keep your your data storage costs and the actual size on disk lower than if it's uncompressed. So traditionally with Postgres, one of the disadvantages is that the table sizes used to be quite big. But and especially with time series data, you know, you're inserting data every you know, whatever your insert period is, it can pile up quite quickly. So compression does help to keep that down and within costs if you're using like a hosted or if you're even if you're
50:11 running a self managed on something like AWS or something like that. Cool. Thank you. So Jared says evening. Hey, Jared. Morning depending on where you are. What time is it for you, how far? It's just past my day. It's like one 1PM now. Good day to everyone. And is saying, what about data archival? I mean, I can try and and fair what I think you mean there, but please feel free to just follow-up with a little bit more detail. So when we talk about downsampling with time series, let's use, like, a real time monitoring as an example. Like, there's something, you know,
50:51 if you're collecting data at what what would you say is standard for a Timescale customer? Ten seconds, thirty seconds, one minute? What what do you see in the field? Honestly, it really depends on, like, the use case. So for example, if you're monitoring machines or say, for example, it's a classic, like, DevOps use case where you're monitoring a bunch of servers that you have, then it's probably gonna be on the second or millisecond basis. But if you're another use case is like monitoring like weather or air quality, then you're probably gonna be looking at it
51:28 on like a minute basis because, you know, the weather doesn't really change that much within a second or so. So it really just depends on on your use case. But, yeah, I think along the the the the lines of like a second or a minute is is usually to what people use. And then for machines, if you're getting device data, it'll be at a much higher fidelity than that, like, millisecond. Is fidelity the timescale term? I would probably call it resolution. Are those interchangeable, or is there any different characteristics to of timescales? Oh, no. It's just it's just my
52:04 what that I use. Yeah. There's no standard definition. We would probably use the word granularity. Just like how granular you want your data. You know, if you're gonna downsample that makes it less granular, whereas if you don't, you're gonna have the the little grains. I in another presentation, I have this thing where you have like mountains of sand. And then when you downsample it, it becomes like a big rock or like chunked together. So, yeah, that's something a good way to visualize it. So maybe we can try and explain, you know, going back to Vignesh's question about
52:37 data archival. I mean, data archival probably isn't something that most people would do with Timescale data, but we we would downsample it and reduce that resolution or granularity over time multiple times. So maybe we could kinda give a traditional workflow there. Exactly. What I can do is I actually have a slide on this if I can share my screen. Oh, yeah. Go for it. Yeah. Let's do it. Yeah. Okay. Give me a second. There you go. Okay. So yeah. Basically, like, just to talk you through with the with the visual here. I'm not sure why this is taking so
53:18 long to go into presentation mode. Everything seems to be going wrong. Going live. Yeah. It's like Murphy's Law. K. Well, let's wait and see. Otherwise, I'll just exit. Okay. Whatever. Let me just exit. But I think maybe I can zoom in a little bit. No. Okay. Zoom out, it actually works. Okay. But in this picture, you can see basically, like, the bottom part of it is the raw data. And so raw data, you have a lot of these little data points coming in. And then what you would wanna do is you'd wanna use your raw data for certain
53:59 things. You'd only keep it around for a a certain amount of time. And then for kind of historical analysis so say for example, you have like a server and this server, you're monitoring like the CPU usage on the server. And you'd wanna keep that raw data around for, like, real time monitoring purposes. But then after, like, a day or so or after, like, a week or so, you'd no longer need to have, you know, millisecond level data about, like, what was going on on the server. So then what you would then do is use you you basically aggregate that data
54:31 to say, okay. Instead of keeping everything on a millisecond level, just give me things in five minute intervals. So showing me, like, every five minutes how my different metrics that I'm monitoring are changing. And so you'd use a feature called continuous aggregates for this that Timescale has. So this basically takes the raw data and says, okay, we're gonna just calculate the five minute average in our case. But you could use any function you want. So you instead of taking the average over five minutes, you can take the maximum. You can take some sort of calculation,
55:01 whatever method or, like, function you wanna aggregate on, you can you can do that. And then what often what people do is keep the aggregated data around, but discard the actual raw data. So in this case, you can see the trash can with the with the yellow is basically saying, After a certain amount of time, we're actually gonna get rid of this raw data, but the aggregated data still stays around. And then you can use that for reference or but it's you can see it's at a higher it's at a lower fidelity or lower resolution
55:36 as as you would say, David, whereas this one is at a higher resolution because it's just more data points. And then on the on the topic of data archival, you can then take this to say if we, like, go further move further in time, you can actually use your own data storage so that you can put stuff in like cold storage and stuff like that. It just depends on your your workflow and like what kind of data pipeline you wanna set up. But I know they are they are capabilities such that, you know, every you can set a procedure
56:09 to run, I don't know, every six months or so or something like that that says, okay. Data that's older than a certain amount of time, put it in my cold storage where I don't actually need to keep it on the database anymore. And, you know, have that cycle such that it goes from raw data to downsample data at the aggregate level. And then after a certain amount of time, discarding even the downsample data if you want, that's some option as well. Awesome. Yeah. Yeah. Great explanation there. So while you were kind of chatting through that, I
56:41 ran the numbers on our file. So we have a transfer rate of around 15,000 rows per second, which is gonna take roughly twelve minutes for the file that we're on, and we're currently just at eleven minutes. So we're almost there. Almost there. Okay. Well, hopefully, the the rest of it can be quick. But it's also assuming that my very crude math is is correct. But I think I think we'll be close for within a minute, probably. Awesome. So I guess, let's assume I'm even if I'm not right that now thereabouts, we're see what we're gonna be doing with
57:18 this data. So we're just going to do a very simple select from our rates table, limited to five. I guess we're just expecting to see a few a few rows come back. Like, that's Yeah. That's just to check that the import worked similar to what you were doing earlier with the count. Okay. We'll see some data like so. And then we're gonna start doing some analysis in this data. So this query wants to take a look at how many rates took place on each day for the dataset that we have. So this is just an a date trunk
57:55 as day with account and then grouping by. So pretty standard relational query or even a time series query, I guess. Yeah. Exactly. Yeah. It's just about, you know, for a certain day, you wanna look at what happened. So that's just an intro query. Moment of truth time. We're at twelve minutes. I don't wanna see a 12:30. Wanted to finish. Same. Yeah. Same. Yeah. In fact, we can see here that the total number of rows inserted were about 1,000,000, and my account that I did was 1.09. Yeah. Yeah. I never saw 12 start ace in my mouth. It was good. I'm happy.
58:52 Awesome. Okay. So that worked. So we know how. For those of you who are at home, definitely use the parallel copy if you are short on time or doing a live stream like we are. Well, yeah. There's a good point. Right? Because if we'd stuck with a single threaded copy, we run four workers there. So that would have taken forty eight minutes. Just crazy. I think maybe I just have a really fast computer because what I usually do when I insert this is I just use like p g admin and it inserts and then I just go make a
59:00 Querying our data
59:21 cup of tea and it's done. So I don't know. Maybe it's just like the the Mac that I'm using. But I wonder if PG and then does the parallelism too for you. Who knows? That might be something to investigate. I'm not sure. Alright. Now now I'm getting a bit more confident. So yeah. That's definitely gonna give me a big number. Okay. Any second. So let's grab our limit. That should be nice and quick. Yeah. So this is our check. Make sure we got the thing. Regretting that count now. Yeah. There we go. Well, this is also because
1:00:00 I think we're using, like, the very smallest instance. So Yeah. We did pick the smallest instance. But even then, it should be enough to do this tutorial. So We'll be fine. Something to talk to my team about. Alright. So let's try our first time series query. So this is gripping assistant account on a window. The window is per day. Right? Yeah. So this is just a query that, you know, to give people a taste of what sorts of questions you might ask when you're doing time series analysis or time series based analysis on something. So the way the tutorial is
1:00:36 broken up is the first part is historical analysis, and the second part is more like real time monitoring. Okay. So this is what you've So I'm gonna go slightly off script now. Go for it. Curious. So we've done a select as the current group. Now Yeah. Does does that mean I could do hour as hour? And let's limit this so it's not gonna be too long. So from where pickup date time is greater than or equal to and pickup. I it can just be one. Okay. Yeah. Less than or equal to yeah. That's fine. That's fine. Let's just do a
1:01:26 year. Right? So I want Aniston as oh, no. So the thing about this dataset is that it's data from, I think, 2017 or so. So we'll just have to use that that time. Well, I I can just give you just put in, yeah, January 1 and, like, January 8. Because the thirty one days is, like, the full month of the rides. So And let's just do three days. Three days. Yeah. Three days. And five hour. Because that's that's to me would be an another standard time to do. You know, when I have this kind of
1:02:01 an aggregated data, I wanna be able to see how many rides are were in a day or how many sales I had in a day. Then I may want I may want to break that down by the hour and say, okay. Well, which hours am I selling the most? So I have no idea whether my date trunk hour is gonna work, but I'm gonna assume okay. We still got the account from rides. Now that that the three day when three day period and an hourly window. Oh, I broke it. So what did I get wrong here?
1:02:31 I just made up that where syntax. Is that wrong? Do I need the time? It shouldn't really need the time. I think maybe the double quotes That what it is. Okay. Might be the issue. I have a reference query that I can check out as well. Alright. This is what I get for going off script and It's all good. But if I went to the Yeah. It's definitely the the it's use single quotes for the date. Okay. And it's the month first before the day. So, again, American data standards, unfortunately, for you know, I'm from South Africa, I'm still
1:03:19 not used to people in America, like, with their own ways of of notating things. So yeah. So it'll be o one can't be right. Surely, this is ISO standard though. Right? This is ISO eight six zero one. So the I I just can't see how that would be right. It has to be a three here. Yeah. So three there and then yeah. That's correct. Yeah. Yeah. So that would There we go. So we can see clearly this is people leaving clubs. Yeah. On New Year's Eve, twenty five thousand, thirty thousand taxis. It kinda dies down at 5AM,
1:03:59 and then normality seems to kick in eleven the next day or at least leave the larger numbers. Awesome. Nice. Cool. So thank you Stack Overflow once again. So now the tutorial is walking us through and it's asking us, yeah, what was the average fare then for these passengers? So this is a select by day run an average function on the fare amount where passenger count was one. Oh, okay. So you have quite a lot of data in the sample dataset then. But even though so many passengers were in each of the Yeah. So it's actually the reason
1:04:41 why we selected this. So just a good example of, the kind of things that you'd wanna track. Like, as much as you also you wanna know, like, when the rides took place and how many, you also wanna know all sorts of, like, metadata associated with. So it's a good example of, like, why where you would need both time series and metadata together to ask, like, real world questions that that people generally ask about. Okay. So we can see that the average fare for a New York City cab with one person is around 11 to $12.
1:05:17 Per per day. Yeah. So Oh, that's why I changed the number for the sake of it there. Right. Let's go. So now we can see our results. We can also do rate types. Let's see what this looks like. This one, if you just scroll down a bit on the tutorial, there's like a enriched version of that that query that is actually the next one. Yeah. So if you just do that one, that one actually might be better. It it calculates the same thing, but gives us a little bit more to talk about. Okay. So this is then
1:05:58 select from the rates description. So one of our metadata non time series tables. Yeah. Count vendor as number of trips. What's the rank function? I'm not familiar with that one. So that's just to rank the output. So we wanna have we're ranking the output by the number of trips that took place. There might be a lot of advance for some people. But, yeah, that's just you I I'm not sure if it's a % necessary, but I think it's just to make it quicker. Okay. Well, it's definitely two times for me so we can gloss over it. We then
1:06:38 joining our metadata web, the time series data, and then just providing on here. So we're just saying Exactly. Match on the rate of the case. So that's the sorry. On your go. No. I was gonna say that's just a big thing that to take away from this query. So you can see, like, the original question is, like, how many rides took place for each type of ride. So you can imagine, like, in any other case, wanna segment your measurements by something. So in this case, we're using, like, the the the type of ride, like, the fare type. So you can see,
1:07:10 like, rides that are grouped or going to a certain airport in New York City. JFK is one of the airports. And that's just a simple example of how you can join time series and relational data together in one query, which is again a a unique thing that's enabled by Timescale by keeping both of them on the same database. Awesome. Okay. So we're now gonna oh, we're getting a big one now. So let's actually see what this is doing. This one is gonna analyze the rates to JFK, I'm not familiar with my New York airport codes. Yeah. It's just another airport called Newark
1:07:47 International Airport. Yeah. It's there in the disc in the text. Alright. Okay. Let's run the query and see. Yeah. So this is an example of, like, a question that you might wanna know. So for any two types of things that you're monitoring, you wanna just do a quick comparison and, you know, see how how they compare to each other. This is an example of that just using rides to an airport as as example. Alright. So compared in those, we can see the number of trips, the average duration, forty five versus thirty five minutes. We can
1:08:31 see the average of the cost, the average of the tip. Weird that the forty five minute trip to JFK has an average tip of $7, but this thirty five minute drive has an average tip of nearly 10. That's weird as well. Also, the total is a lot higher for the the the shorter airport, the Newark Airport. Oh, yes. So it is. Yeah. So we actually if I don't wanna keep saying go back to the tutorial, but basically, this is this is an example of, like, you know, people keep talking about, you know, you wanna do data driven analysis and,
1:09:06 you know, have this kind of insight into, like, what's going on. So this is an example of, like, an ad hoc query that someone might wanna do on your time series data, but then combining it to the relational data and saying, like, okay. What's interesting in this dataset that we can talk about? And as you've just said, you know, there's some interesting things we can get from here. Like, even though the trip is shorter, the the the total is much higher. And even though the trip was was longer for the other airport, the tip is
1:09:36 much lower. And so, like, there's also the interesting things that you can get from Timescale's data. Alright. Yeah. Very cool. Let's see what else does it got. So mission three is is monitoring. So this is we our question is how many regs took place every five minutes for the first day of 2016? So this is very similar to the query that you did kind of off the script earlier, which is yeah. So so this is just using vanilla Postgres. What is it? Notation or syntax is the correct word. So that's how you would do it using the date trunk
1:10:21 Developer Productivity & Postgres Ecosystem
1:10:25 function. But the next query is gonna illustrate one of the time series specific functions that Timescale has, which is called time bucket, which basically allows you to more easily select, like, what kind of time periods you wanna analyze your data in. So, like, for example, you know, using day trunk earlier, we saw that it was easy to analyze things in time periods of, like, an hour or a day. But what if your main unit in your business is, like, fifty minutes or, like, six minutes or ten seconds or something some, you know, nonstandard thing? This is a function that we've developed time
1:11:09 bucket to to help make that really easy because this is the kind of question that's often asked in in time series use cases. Yeah. That's really cool. So it's just a convenience function that allows you to do window and work on my simpler syntax without worrying about trunking the dates and and things like that. That's nice. And you can just change it, you know, from if you don't wanna look at five minutes, you wanna look at, like, forty five minutes or something like that. It's a one line change whereas with the day trunk function, the math, you're
1:11:37 gonna have to do a little bit of maths to make sure that you're right that you're right. Yeah. Awesome. Very handy. Yeah. Okay. Our next question is how many rays on New Year's morning originated within four hundred miles of Times Square and thirty minute buckets? Yeah. Okay. Oh, we need to do something fancy here. So this is another example of, like, you know, talking about how timescales based on Postgres. You can use any other Postgres extension with Timescale. And one thing that we often see people using is combining time series analysis with geospatial analysis. So for example, instead of just knowing how
1:11:58 The Value of SQL for Time Series
1:12:19 something changes over time, you wanna know how something changes over time and space. So, you know, any logistics company or any someone that's tracking the movement of their fleet of trucks or something, this is, like, relevant to them. And so this is just a a quick query to show that, you know, if you wanna actually do geospatial analysis, this is how you do it with this extension called post GIS. K. So we enable the extension, and then we also have the table to tell it. Okay. Right. So this is we're saying that we've got some sort of coordinates
1:12:57 inside of our data and we need to change the column type over. And I'm assuming oh, we just got a quick update. Yeah. This is just allowing the the, what is it, the location data that we have and converting it to, like, standard coordinates so that, you know, the queries will work. There's an example that I can pull up after this to show you, like, what a what this would look like on, like, a live dashboard versus now we're just gonna obviously gonna get a number in the terminal of, like, the number of rides. Yeah. I must have been because we spun
1:13:27 Getting Started with Timescale Cloud/Forge
1:13:31 up a quite a small instance, the update here maybe just take a little. That might actually be true. Yeah. So I think this is this is really nice. I like that, you know, it's not just the, you know, the storage layer, the query plan and extensions, but it's the providing of helper functions to help make more traditional times use queries simpler for people to kind of type and and work with, I guess, on a on a daily basis. Yeah. The the integration with the post guest extension, I think, is is very clever. You know, you're very
1:13:40 Advanced demo of geospatial time series data with Grafana
1:14:07 anytime she just do it and it has latitude and longitude information on it, you're you're gonna wanna be able to map it over over time and space. And I think being able to visualize that. I can imagine that would work really well with your fan hooked up to it, and you could choose Exactly. Really cool visualization languages to see. While we wait, let me actually share the screen to to show people what this looks like. Yeah. Go for it. Let's see. So I'm just I've got a Grafana dashboard right here. And what's going on the screen in front of me is I'm
1:14:35 looking at the current locations of buses in New York City, and I'm using Timescale as my data source. So if I just zoom into one of these these panels in Grafana, So you can see here I'm using Timescale, the Postgres data source in Grafana. And I've got a little SQL query here to just basically show me the the the location of the different buses. So you can see I've got my latitude and longitude, and I'm just coloring the buses based on their route. So each there's a different, like, bus routes that they have. And so you
1:15:11 end up with something like this panel right here, which shows you, like, you know, for different buses. This is bus type m. It's in this location. These are bus type b. They're in originally mostly run-in that location. So if, you know, you get, like, this overview of, like, what's going on right now. And then this is another interesting example where, you know, you don't just wanna know the location of your buses. You also wanna know, like, when something is wrong. So in this case, let me actually zoom in a little bit. The purple buses are the buses that are
1:15:46 actually off their specified route. So what I've done is I've specified some some, like, kind of geographic lines where each bus has their own, like, line that they need to follow. And wherever they're off that line, I just have an is a case statement here in SQL that says, like, okay, then that means this bus is on route and off route. You can imagine doing this where, you know, if your device is on, like, low power, if you're monitoring, like, a bunch of devices in the field or something like that or if there's some kind of errant
1:16:01 Connecting & Database Setup
1:16:23 behavior, you can actually identify to say, like, okay. That's where it's happening and everything else is green. It's okay. And again, this is is using SQL to do it. So it's quite a powerful query that's just a few lines long. And that's another example of, like, the real time aspect of, like, okay, combining time series and geospatial data. Very cool. I like that. So this update query is still running. So we're also approaching about an hour and twenty minutes now. So why don't we is there any more demos that you'd like to show? Maybe we could finish on that. And then
1:16:59 I know we had a few problems and there's more content for us to cover, but maybe we could schedule a a part two where we do a slightly different live. I think we need to because the part one was mostly let's just get everything set up. So I I wouldn't be opposed to that. Yeah. Yeah. Let me let me actually share my screen again and and take you through some quick high level demos. So okay. So we're back in Grafana. So on this keeping on this theme of buses, you know, David talked about earlier, you know,
1:17:29 you wanna know how your volume is changing over time. So this is just like buses on a specific route, how they how many buses are on the road at any particular time. And you can see, you know, this pattern of during the day, obviously, the buses are gonna be at the highest and in the evenings or at after midnight, everything kinda shuts down. This is just a very easy way to to detect the patterns of, like, what's going on. And I'm using a function here, the time bucket function, to look at these in five minute intervals.
1:18:03 I could easily look at these things in something like sixty eight minute intervals for some reason. And I guess I'm running a quite powerful instance here so that the thing is quick. So you can actually see and change the fidelity at which you look at data. The other thing you can do as well is if I wanna look at something over, like, seven days, that's just, you know, a a quick change. And there, I can actually see the general trend is that it's it's trending downward over the past seven days. But I'm gonna keep this as five minutes.
1:18:36 Go back to the original one. And then some other things that are interesting is we have functions to deal with common time series time series problems that happen. So one of them is a function called gap fill, which is basically allows you to fill in gaps in your data. So the comparison here is let me find a table that has a gap in it. Okay. So say in this example here, you can see at 3AM on the February 7, I didn't have any vehicles on the road. And so it just shows me like a dash here, meaning like there's no there's no
1:19:21 data. But we have like two functions that you can use depending on how you wanna treat these gaps because often you wanna have like a continuous line when you're doing some sort of a time time series analysis. So one of them is gonna be to interpolate between the last two values. So let me just find the right time, And that's gonna be at 3PM. Three AM, I think. So, yeah, between two and 3AM. Okay. Cool. So you can see here the number here is a one and the number here is a three, and we have
1:19:41 Loading Data (Troubleshooting & Parallel Copy)
1:20:04 two gaps in the middle. So if I use this interpolate function, the output between the one and the three, I'm just gonna get the linear interpolation. So it's gonna assume that, okay, between one and two between, yeah, one and two, it's gonna go from three to three to two, and then from two to 1.6, and then all the way to one. But, of course, since I'm counting buses, there's no such thing as, like, 2.33 buses that doesn't exist. So we have another function that you can use called LOCF, which means last object or last observation carried
1:20:41 forward, which allows you to solve this problem by just taking the last number and carrying it forward. So you can see here between this time of, like, 1AM and 4AM, it just assumes that I just had three buses on the road at that time until I hit 4PM where my next data point comes in. And these functions, just to give you a sense of how they look, they're just these one line functions where you're saying LOCF, the the quantity that you want to count, and then you're using this function called time bucket gap fill instead of a date trunk
1:21:16 or time bucket to calculate your time values. So those are just two other examples of these, like, handy SQL functions for time series analysis. And then I just wanna show you one last thing, which is okay. So this is this is a different dataset. This is actually monitoring live air quality data in India. And so you can see here, I've got a bunch of, like, air quality metrics, ozone, nitrous dioxide. This is, like, particulate matter over 25 nanometers. So there's a bunch of metrics. And I've also got a certain location that I'm monitoring, which is in Agra, which is a fun
1:21:57 fact where the Taj Mahal is in India. So they probably wanna know what the air quality is in there. They got a lot of tourists. So this is actually quite a real use case. And so I've shown you a time bucket gap film already. There's another function that allows you just to get the last reading for each location. And so if I go to view this classic live scenario, it's taking long. But basically, over here, it shows me, like, for each of these parameters at the place that I'm looking at, this is the last reading that I have. And so you can
1:22:33 imagine, like, you wanna know the status, the last received reading from some sort of devices that you're monitoring, and you don't know where that when that's gonna where that's gonna be. And so we have this function handy time series function that just shows you the last value as ordered by time. So this is different from just calculating the value and then ordering by time because sometimes depending on this has to do with, like, the the query the way the query planner executes. If you use this function, it's a lot more efficient than scanning the whole database
1:23:12 for to find this this last value. This actually takes advantage of some of the chunking and and stuff that I talked about earlier with the hyper table. So that's a a an overview of the features of, like, the handy SQL functions. And then here, I've got just got an overview of, like you know, we talked about earlier about downsampling and continuous aggregates. The data that I've that what I'm showing here is going to be the daily average. Again, I'm not sure why this full screen view is not working. But basically, this shows, like, the daily average of different
1:23:49 air quality metrics. And that's an example of, like, okay, we have data coming in, like, every 15 minutes or something like that, but we've down sampled it here to be every day just to show the max, minimum, and average. So again, it's example where you don't just have to down down sample it to the average, you can take whatever function you want, whatever aggregate function. So here we have both the max, the average, and the minimum of all these different metrics for and that's using Timescale continuous aggregates. So, yeah, you know, that's just a a
1:24:23 little overview of, like, some of the real world use cases that you can do. I like to demonstrate things with Grafana dashboards because, you know, this is something that people would look at all the time. It's also easier than working with the terminal, which we see has caused us a lot of problems today. But yeah. This is just an overview of the sorts of things you can do at Timescale. And I've just got a slide later on just for with some links that people can go to if they wanna learn more. But we'll we'll get to that in a
1:24:49 in a few minutes. Yeah. You're right. Timescale is data is very visual, and I think Grafana dashboards are a great way to kinda understand trends and, you know, peaks and troughs and really just get a glance at what is happening through the data. So, yeah, Grafana dashboards and graphing in general is really important. %. Okay. So I think what we'll do is you and I can catch up after this episode. We will get something scheduled because I feel that we've got so much more we can cover. Yeah. Yeah. Yeah. Arranged that. And then if you wanna pull up your slides, we can
1:25:22 give some people some more resources to take a look at. And then I will let you get back to your day, and I will speak to you again soon. Yeah. No. For sure. I I think that there's definitely some interest in folks wanting to learn about, you know, how would, like, a data life cycle look like from raw data to downsample to discard it. So maybe that's actually a topic that we could do. But in the meantime, for folks who enjoyed what we saw today and who wanna get started with Timescale, the best place is to do so in
1:25:52 the docs, docs.timescale.com. We also have a newsletter that sends out any feature updates or any tutorials and stuff that we have, learning material. So you can just go to the short link tstb.co/timescale- newsletter. If you wanna get help using Timescale, we have a community of thousands of users in our developer Slack as well as folks like myself, Timescale CTO, Mike Friedman, Ryan, who helped out on this livestream. He's actually a developer advocate at Timescale. And so you can find us all in the Timescale Slack where we can give you help, you know, if you're trying to figure out if Timescale
1:26:33 fits your needs or if you can't quite get something to work. Folks are there to help you. And then at the end oh, and then and then if you wanted to use Timescale, the easiest way to get started is with the hosted database. You can start a free trial at tstb.co/getstartedcloud. That's actually quite a long short link. But, yeah, hit that up. Might have to change that one. But, yeah, there's there's so many things there. And then my name is at aftar in Slack. So you can just find me and at me, and I'll be happy to help you personally.
1:27:05 Awesome. Thank you very much. Alright. That was we had a few challenges, but we showed off some really cool features of Timescale. We'll get something started to show off even more of that. We'll tackle data. You know? I I think we can really dig in to the data sampling data over time and show people how that works and how to work within that. Lots of great content to come. Tara, thank you very much for joining me today. I will speak to you again soon. Have a great day. Thank you, David, and thanks everyone who watched and who's gonna watch the
1:27:33 recording. Please use Timescale. Awesome. Alright. I'll see you soon. Thank you. Bye bye.
Technologies featured
Meet the Cast
Stay ahead in cloud native
Tutorials, deep dives, and curated events. No fluff.
Comments