About this video
What You'll Learn
- Use GitLab Duo to turn a hardcoded frame extractor into a CLI tool.
- Reduce frame output by sampling every ten seconds instead of every frame.
- Combine OCR, OpenAI analysis, and SQLite inserts into one pipeline.
David uses GitLab Duo's chat and code suggestions to build a Python pipeline that extracts frames from a video with OpenCV, runs OCR via PyTesseract, classifies the text with OpenAI, and stores the results in SQLite.
Jump to a chapter
- 0:00 Starting with an AI Prompt
- 0:35 Introduction to GitLab Duo and Video Processing Goal
- 1:16 Task: Generating Script for Frame Extraction
- 1:28 Reviewing Initial Python Code
- 3:22 Refining Script: Adding CLI Argument
- 5:09 Testing the Frame Extraction Script
- 6:40 Refining Script: Limiting Frames per Second
- 7:29 Reviewing and Integrating Frame Interval Logic
- 8:47 Testing the Refined Frame Extraction Script
- 9:35 Task: Extracting Text from Frames
- 11:04 Exploring Text Extraction Methods (Local vs. Remote AI)
- 11:53 Generating Code for PyTesseract (Local OCR)
- 13:23 Generating Code for OpenAI API (Remote OCR)
- 13:36 Preparing to Test Both Extraction Methods
- 15:52 Testing Local OCR (PyTesseract)
- 16:54 Decision: Use Local OCR for Batch Processing
- 17:05 Refining Script: Batch Processing All Images
- 20:13 Using GitLab Duo's Refactor Feature
- 21:33 Testing the Batch Text Extraction Script
- 22:04 Debugging File Output Path
- 22:58 Batch Text Extraction Complete
- 23:03 Task: Analyzing Text with OpenAI API
- 23:20 Generating Script for OpenAI Text Analysis
- 24:18 Reviewing OpenAI Interaction Code
- 26:24 Initial Test and Debugging OpenAI API Calls
- 30:48 Refining OpenAI Prompt to Avoid Filters
- 33:49 Successful OpenAI Text Analysis
- 35:00 Requesting Structured (JSON) Output
- 36:28 Receiving and Reviewing JSON Output
- 37:00 Task: Storing Data in SQLite Database
- 37:06 Generating Database Connection and Schema
- 39:38 Applying Schema to the Database
- 40:31 Task: Parsing JSON and Inserting Data
- 40:54 Generating Code to Clean JSON Wrapper
- 42:40 Generating Test Cases for JSON Cleaner
- 43:46 Integrating JSON Cleaning and Parsing
- 45:13 Generating Code for Database Insertion
- 46:16 Testing and Debugging Database Insertion
- 49:00 Verifying Database Contents
- 49:42 Conclusion and Summary
Full transcript
Generated from the English captions. Timestamps jump the player to that moment.
Read the full transcript
0:00 Starting with an AI Prompt
0:00 Hey, GitLab Duo. I need you to help me write a script. I want to process a video file, and what I actually want is to save a screenshot or image into a directory called frames for every frame within the video file. I don't know what language is best for this, so you can choose. But I do ask that you comment the code to help me understand it. Thank you. Hello, and welcome back to the Rawkode Academy. I'm your host, David Flanagan, although mostly known across the Internet as Rawkode. Today is a fun video. I have the opportunity to play with and
0:35 Introduction to GitLab Duo and Video Processing Goal
0:49 explore GitLab Duo. This is GitLab's integrated AI that allows you to write code faster. I've decided that I want to do some processing of my YouTube videos so that I can use AI to generate thumbnails, descriptions, summaries, command extraction and a whole bunch more. So I figured let's put these two projects together and see what we can come up with. The first step was just to write some script that will generate images from every frame in a video. You've already seen the prompt. Let's see the code. Let's take a look. So just before we go through this line by line, I want
1:28 Reviewing Initial Python Code
1:32 to clarify that I am by no means a Python developer. I write Python a handful of times a year and not very well. I also wanna point out that none of the tasks that we're gonna ask it to do are that difficult to code yourself. And I think that's where AI really excels in the software development life cycle. Saving your time from things that aren't hard even if they are a little bit complicated, allowing you to free up more time to focus on novel solutions to more difficult challenges. With that being said, let's see what we
2:09 got. So I asked GitLab Duo to generate me some code. It doesn't specify a language. It decided that Python was a good decision because of the OpenCV library. And in retrospect, I completely agree. It was a fantastic decision, thank you. I asked it to comment. Now the comments aren't that useful, but then the code isn't really that difficult to understand. So I can kind of forgive it for that. As we give it more complicated tasks, hopefully, it'll be able to elaborate on these comments a little bit further. However, we can see that it's kind of
2:46 divided our task up into three major steps. One, a little bit of preparation. First, you just configure where the video lives and then create or ensure that a frames directory exists. And then opens the file and immediately starts to loop through reading it frame by frame. The last task is to save each of those frames as a JPEG. This is really easy with OpenCV, but would I have been able to write it as fast as GitLab Duo did? No chance. So let's make our first improvement. This is nice but I don't like the hardcoded video dot m p four even though
3:22 Refining Script: Adding CLI Argument
3:28 that is the name of our file. Let's assume we're going to be able to run this on any arbitrary file within my machine. Now again, I'm not a Python developer. I don't know how to write a CLI with Python. I don't even know how to process argv. I'm hoping GitLab Duo does. So let's take a look. Let's try and be bold. Let's select everything, come over to our chatbot and let's say if I turn this into a man line application, we want to ensure that the filename can be passed in by the user. Type in its code.
4:33 There we go. Let's copy this and replace. All this done is tell me or show me, educate me that we can use sys. Rv with a positional indicator, in this case one, I'm assuming zero is going to be the entire command line and it's even telling me how to use this which I think is a useful comment. So let's add this to our code. Now we haven't actually run this code yet though maybe now is a good time to test it out. So here we are in the terminal. We're going to run Python frames dot py
5:09 Testing the Frame Extraction Script
5:19 followed by video dot m p four. Now now occurs to me that we don't really have any print statements, so there's a lot of assumptions about whether this is actually running. So let's confirm that by splitting our pane and opening and finder. And right away, we can see frames directory. And if we just show this in preview mode, we can see the ticker scrolling along the top as we run through hundreds and I'm sure thousands of frames. Given that this video is probably about an hour and a half long at a minimum at 60 frames per second.
6:04 That is a lot of frame. But we'll let the script get there. And I won't bother going through it anymore. So our script does work. So let's use some movie magic and come back in a moment when our frames are rendered. Okay, so that is now finished and it did take a little bit of time but we did get a 71,000 frames. So that's 60 frames per second for a little over a hundred minutes. Now we don't want to spend a lot of time during this video waiting on my computer or cheating with movie magic.
6:40 Refining Script: Limiting Frames per Second
6:46 Well, we're going to limit this to not save every single frame. And of course, we're not going to do this manually. We're gonna ask GitLab Duo. So let's say instead of capturing every frame, let's capture one frame every ten seconds. Alright. Now we have some updates to our script. Now we're not gonna copy and paste all of those and put it over the top because we've been making modification. We've got comments, we know how to run the command, and we can see here that it's right back to hard coding our video dot m p four. So we're just gonna select the
7:29 Reviewing and Integrating Frame Interval Logic
7:46 things that we need. Now we can see that it imports math, so let's bring that in in alphabetical order and we can see that it gets frames per second after the video capture. So we're going to store that as FPS two. We then have a frame interval that we wish to configure for us that is 10. We want to skip 10 times the frames per second for one image every ten seconds. We're then just going to grab this if statement and indent our code. This time, we only save the JPEG when we have a round ten
8:42 second frame. Let's go back to our terminal. I'm going to remove our frames directory, which on itself may take another while at a 70,000 frames and rerun our script. And hopefully for the last time we'll be back in a moment. Okay. Let's take a look inside of brems, pipenet, wc and we can see that we are now working with 572 file. This is a lot less than a 70,000. This is something we can work with on a local machine. So let's go back to Versus Code. Now my mission here is I want to take the 600 frames and in fact let's
9:35 Task: Extracting Text from Frames
9:41 pop them open before we go back to GitLab Duo and take a look at our preview. Now that's only 600, we should be able to get an overview, at least a broad overview of what happens on the stream. We've got people, we've got Kubernetes commands, a whole bunch of YAML, Goes on and on and on. We've got comments from the audience. We've got lots of journal d, system d commands. I think we can have some fun with this for sure. Now wouldn't it be cool if you were watching an episode of clustered, and we could give you a list of
10:23 the commands that were executed on the episode, provided they were important or useful. It would also be nice if we were able to extract and highlight the error messages. Now, of course, we could do this all with manual work, but we live in the age of AI. We're using AI to write more AI. Can GitLab Duo help me interact with other AI models to remove the laborious and tedious task of watching every single episode to describe fine text and error messages, logs, and commands? I sure hope so and we're about to find out. So I really have no idea what the
11:04 Exploring Text Extraction Methods (Local vs. Remote AI)
11:06 best approach here is. We could continue using OpenCV, which I do think has text capabilities, OCR, or maybe we need to start shipping these images to other AI models. Let's ask GitLab Duo. I want to extract all the tech from the generated images. What is the best way to do so? Alright. So it is suggesting we use PyTeseride, which runs OCR against the imaging. Alright. And seems like a good first approach and has given me some code. We'll test that code in a moment, but let's ask it a follow-up question. Would it be easier or better,
11:53 Generating Code for PyTesseract (Local OCR)
12:18 better is a terrible word, or more accurate to send these images to an AI model. In fact, let's make it a bit more specific and let's just say to OpenAI's API. And it says yeah. Okay. Can I have code to do that? Please remember to be nice to our AI overlords. Oh, not enough details. Alright. Let ask our question again and keep our contact. Provide the code, Duo. No, please, because it made me ask twice. Awesome. And now we have code to interact with OpenAI's image to text API. That is very cool. So let's try both approaches and see how
13:36 Preparing to Test Both Extraction Methods
13:38 we get on. Let's copy the code from the py tesseract and call this local image to text and paste it in. This wants a single image. So right away we need to make some modifications because we are gonna have to loop over our directory. However, to test this maybe we will hard code one image and see what the results look like and we'll also do the same with the OpenAI code. So we'll call this remote image to tech. And now we have two scripts that we can work with, both of which want a single image.
14:22 Then we'll ask it to change the code. So let's find a suitable image. If we go back to the terminal, open frames and finder, scroll to the the the close my eyes here. Dreadful. Although we at we do just want to get text to be fair. Alright. Okay. Here we have some good tech. We have some pod output at the top. We've got some hidden, but it's a crash lit back off. You can't actually see a complete version of that, except for maybe the top array. I'm curious to see if the remote model OpenAI would be able to inject
15:07 some knowledge into that. And we have a command and I have one error message about daemon set cilium not found. I really want to know if we can start to process this with local OCR at remote. This image, this frame here, B4200. And why can I not copy that image name? I have no eye dear. Well, Mac just doesn't want me copying image names. Alright. V4200. 4 2 0 0. Alright. So let's start with low. I think that makes a lot more sense. Paste our comment, to ls frame and format is frames frame three four two zero zero dot JPEG.
15:52 Testing Local OCR (PyTesseract)
16:11 Easy. We open it. We get the text, and we're going to print it. I guess there's only one thing to do, run our local script. That was a face of pure joy. That has done really, really well. Awesome. But generally just impressed that I haven't written a single line of code myself and we're doing OCR against frames generated from a video file. Such a great time to be a developer. Okay, I'm so happy with the local text and I know it's gonna be a lot faster than using OpenAI to process 600 images that we're just gonna go with local.
17:05 Refining Script: Batch Processing All Images
17:05 So we're gonna update scripts to look over all the frames, grab all the text and save it to text file with the same name as the frame. Then we're gonna get GitLab Duo to send that text to OpenAI to try and help us understand all the text that we've found. Okay. Update this code to look through every image within the frames directory. And can protect. Alright. So it's telling me that we can use a clock. So we can remove this. Then we open the I don't think anything else has to change, but now we don't want to print the
18:30 text. But instead, printing the text. Save it with the same file name but with a text extension. Alright. So it's broken it down into tasks for me. Oh, we got this import statement that we're going to need. That's not changed. This hasn't changed. Now we're getting a base name. And then we write. Oh. That it? Oh, these imports are separate. Check that. We got that one. Oh, we got that one. Alright. I like them in alphabetical order. Thank you very much. So go up the frames directory, get the base name, do the OCR, raise it to fail.
19:52 Could not be more simple. Okay. Before we run this, which I'm sure we could do that would work just fine, let's take a look at some of the quick actions provided by GitLab Duo. Here we can ask GitLab Duo to refactor our code. Let's see what we get. So here's one way to refactor the selected code. It has now broken it down into individual tab, which will take. Alright. So we have one function, image detect, where we look over the globe. Perfect. Alright. I think we can refactor this further. Let's see. Image to text seems to do more than
20:13 Using GitLab Duo's Refactor Feature
20:55 one job. And and we refactor further. Yes, we can. Thank you. Let's copy this code. So now we've got our for loop of our glob, we got the base name, open the file, we extract the text and we save the text. This I'm quite happy with, but let's see if we can run it. So we run our local image to text and if we look inside frames, all we have are JPEGs right now. So local and we'll split this. Do we have any text files yet? No. Wow. Alright. Let's kill that. And make sure we write it to the
22:04 Debugging File Output Path
22:06 same directory. Oh, save tech. Filename. So we just need this to use our frame there. And we'll delete all dot text files. Python local l l print. And let's just r g. And now we're getting our text files. So I'm not sure how long it's gonna take to run the OCR and 600 files, but let's just get by the bottom. Alright. So that's now finished and we have our 572 text files. What we want to do is start to send these to OpenAI to do command extraction, error message extraction, and maybe we'll think of something else along
23:03 Task: Analyzing Text with OpenAI API
23:15 the way. So let's start chatting to GitLab Duo. We need a script. Let's batch up all the text files within the frames directory. We need to ensure that each batch does not surpass the 28 token limit of GPT four turbo preview. Let's send each batch to OpenAI and ask it to extract command and error messages and print them to the screen with a discriminator. Oh, we're getting the script. Undo, make me happy. So we need let's call this OpenAItextdiscovery.py. Alright. So we are importing OpenAI globin OS. We still have a frames directory. We have a max token limit. We know now that
24:18 Reviewing OpenAI Interaction Code
24:49 we can do a gate NV open AI token, this is what it gave us before. We've already imported away. Now it's creating a creating batches, which using the glob, looping over, reading the bytes, concatenating the bytes and oh yeah and we have a check here to make sure that no batch goes above the token length. When we have the batches, it's then enumerating over them, sending a completion to da Vinci o three which will change the model. We have a prompt to extract commands or error messages from the text with a print statement at the end. So
25:32 it's not filled everything, asked for a discriminator, that was maybe a bit of a long shot but we'll tweak it to our needs. We do however want to use GPT four turbo preview. Now this is using the completion API. I'm not that familiar with OpenAI stuff, but I know there's instructions, completions, and chat. I felt maybe we just chat, but I guess if this is a one time thing, not a big deal and I really just want to run this and see if it works. So we've changed the model because we need to context size with the large amount of
26:08 max tokens. Max token response I guess we're only asking for this I'm going to bump this up. I don't know what the normal value is, but let's just throw it up anyway and we'll print the batch and we'll see what we get back from API. No idea right now if this is going to work. Oh, so that's the bit I broke. We run it again. Okay. So OpenAI completion is no longer supported. We either have to downgrade or fix this to use the correct API. Okay, so just in the interest of time and I'm not sure how up to date
26:24 Initial Test and Debugging OpenAI API Calls
26:56 GitLab Duo's code samples are, I did just from the OpenAI docs grab this. But I know I said I wasn't gonna write a line of code but I'm just gonna make my life easier for the next few minutes. And we're gonna do from OpenAI import OpenAI. And it feels from what we've seen in the error message and what I've seen as a quick giggle there, just that this API has changed drastically for the one point zero release. And I don't feel it makes it too difficult to forget that too. But now that we have the right import,
27:36 we can create our client like so. And if I had called us the right key, I wouldn't have to actually do this, but it might just open AI token and we can remove this. And then we have our completion code here. So how close were we? Client chat. Completions create messages. Alright. That's format's different. Role user content f string like so. And the model, we're not using that one so that's okay. And if we don't need to set that then let's not bother. And we're gonna add a break here so we just do one batch and see the response.
28:42 Let's see if that helps things along. Oh. Oh, come on. Give me some commands and enter messages. Alright. Okay. I think I know what this is. Now because we changed this API here, I wonder what the response is actually going to be. Let's do what I do remember. The only thing I remember from my Python days is that we could do a PP print PP print response. We'll just comment this line out just now. I just want to know what we're gonna get. Alright. We have stuff and things. I mean, we can't in our message saying
29:52 that can't do what I want it to do, but we're gonna tweak our prompt. So we got a chat completion, choices, choice, message contents. Alright. It looks like it's this, the little and it's not text. It is message content. And I guess we can do that. Maybe. Again, my Python is abysmal. Okay. And then let's tweak the prompt so we try not to get an error message. So let's try and tell it what we wanted to do is extract anything that looks like a Linux command or error message from this text. And it's already slow, which is a good
30:48 Refining OpenAI Prompt to Avoid Filters
31:18 thing we didn't use this for the image to text. Right? Alright. So we still got that choices message. So I wonder if we could just paste that in. Help me out. What does how do I fix this? Alright. So it says that this is happening because it is an object. Oh, yeah. Okay. So choice is zero. And I wonder if that's gonna be the yeah. Message content. I could do. I'm sorry, but I cannot execute our parts. Linux commands are directly. If you have specific questions or needs about Linux commands, Kubernetes concept, or error message, feel
32:34 free to ask. Alright. So it seems to be struggling oh, execute. I don't say execute. Ah, okay. So I think we've had some sort of safety filter where it thinks I wanted to execute commands. So we're gonna change our language. We'll give it more context. This is the output from my terminal after executing some Linux commands. What was good is it it said Kubernetes and we hadn't mentioned Kubernetes yet. So it's definitely understanding what's going on. Can you tell me what command I executed and what error messages I received? Take 20. Feeling pretty positive about this one.
33:49 Successful OpenAI Text Analysis
34:42 Awesome. So awesome. Okay. So it's definitely doing what we want now. We bypass that filter about executing command and we've got a ton of information. Let's see if we can get this in a structured format. Return your answer in a structured JSON format. Oh, cool. Alright. We have JSON, we got all the executed commands and the outputs, which we didn't even ask for. We can see p s, a u x, scrap it for API. We got the response. We got kubectl describe daemon set cilium. We get the response. We got some system controls, and then we
36:28 Receiving and Reviewing JSON Output
36:47 have a whole bunch of error messages down there. That is phenomenal. Thank you so much. GitLab Duo and OpenAI. Okay. Let's go back to GitLab Duo. I need some more help. I need a function that creates an in memory equal light database connection. Alright. Let's put our import to the top. Now we have a create memory database. I'll take it. It's telling me how to use the database, so let's put this into our main bit of code. Before we start our batching, let's get our connection to the database. Now, I need a schema to store
37:06 Generating Database Connection and Schema
38:07 command and their output associated with a live stream. Let's see what we get. Alright. It looks like we're not getting the schema in SQL format, so let's follow-up. And I have this in SQL. Please and thank you. Let's copy that. Let's just store this in main dot SQL. Now it's already inserted and we don't need an insert. It's actually not give us any information about a livestream. I would like the command associated by foreign key to livestream. Alright. Let's grab our new version. So now we have a livestream, which has an ID, a name, a start
39:22 and end time. Perfect. And we have command, primary key, integers, date, title, etcetera. So now we have a valid schema. Let's write this to our database connection. How do I apply my schema, which I've stored in main dot SQL to the in memory SQLite connection? Alright. So the bit we need is command execute web are open. Like so. So we open the file, we read it in, and we do an execute. Perfect. Now what we want to do is parse this response instead of printing it and generate inserts and text to our database. How hard can that be?
40:31 Task: Parsing JSON and Inserting Data
40:46 Well, when we printed out this, I noticed that we got a markdown style wrapper on our JSON. We'll write a Python function to remove da da da code block from a string. The code block may contain a language such as JSON. This should be removed too. Alright. So we got two Verition. That's removing the entire block. Yeah. We don't want to remove the entire code block. I want the code, but not the wrapper. Please just remove the the the JSON and the the the from the end. I'll take it. Alright. So let's put this new function
40:54 Generating Code to Clean JSON Wrapper
42:34 towards the top and move our import further up. I'm not entirely convinced this is correct, I need this code to be tested. So let's select it and say give me test. And it looks like we have a whole bunch. Let's call this code wrapper test dot py. Well, important to bet is running a test against JSON hello world. Let's make sure it takes them off the end too. Or not, it's got test cases for that. Okay. Remove opening tag. That's pretty neat. The fact that we got actually, it hasn't just written a bunch of tests. It's written specific tests.
42:40 Generating Test Cases for JSON Cleaner
43:36 So we got opening tag, closing tag, both tag, no tags. Perfect. Okay. So let's bring back our code and now what we want to do is call remove code wrappers. Our response so that we get some sort of JSON like so. How do I parse JSON to a dict in Python? JSON Lloyd. Thank you very much. So now parses and we probably should call it JSON. I'm sure there's an import JSON. Yeah. So let's import JSON. And now we can print our deck data which should have you know what? Don't remember. Let's run. Well, we don't want to run our TET.
43:46 Integrating JSON Cleaning and Parsing
44:58 Alright. So our output gives us a list under executed command. So let's copy this structure. Alright. Let's get some help writing this to our SQLite. Let's see. Assuming the following JSON payload. Right? A Python function to look over executed command and insert the command into a SQL ite table. Alright. What have we got? Oh, we already have a connection. Before command and our deck data, execute a command. Insert into commands command. Let's just check our schema. Commands command. Perfect. Values command. Commit and close. Alright. Now we wanna be able to confirm this, so we're gonna make one more change
46:16 Testing and Debugging Database Insertion
46:20 to our script. Change this connection from memory to local file. Let's see how far we get. Python open. Don't. What'd I get? What'd get? Oh, it's con. Con. Con. Con. Oh, no. It's because this creates cursor. And I didn't think that was important. My fault, GitLab. My apologies. One last try. Let's open Beekeeper Studio, Or we can say SQLite. Here's file. Navigate my huge directory structure. Video enrichment server. RamesDB. Open command, and there we have it. Awesome. So that was a journey, but I barely wrote a single line of code. Using GitLab Duo, we were able to write a ton of
49:42 Conclusion and Summary
49:53 Python code. First, processing a video file and saving all the frames as JPEGs. We then looked over all those JPEGs extracting the text. We sent this text to OpenAI where it classified commands and error messages as well as the output for the commands, which we didn't even ask it to do. GitLab Duo then wrote test cases for functions that parsed and removed code block and provided a database schema and inserted our data into the database for us. Not bad for under a never's work. That is a fairly chunky task made easy by the power of AI
50:45 and GitLab Duo. So go forth. Get yourself some GitLab Duo and make your life a whole lot easier. Have a great day.
Technologies featured
Stay ahead in cloud native
Tutorials, deep dives, and curated events. No fluff.
Comments