Manage and Run AI/ML Models as OCI Artifacts with ORAS

Overview

About this video

What You'll Learn

Package AI and ML models as OCI artifacts with ORAS.
Push model artifacts to any registry and mount them in Kubernetes.
Sign and verify model artifacts with Notation and ORAS attach.

Josh Duffney and Feynman Zhou from Microsoft show how to package AI/ML models as OCI artifacts with ORAS, store them in any container registry, sign them with Notation, and mount them into Kubernetes pods via OCI image volumes.

Transcript

Generated from the English captions. Timestamps jump the player to that moment.

0:02 Alright. Alright. Welcome back. You know the score. Rawkode Academy's live. We're knocking at your door. With knowledge bombs and tech that's red hot. Cloud native AI, yeah, we've got a lot. From ORAS to OCI, the future's in our sights, managing models day and through the night. Kubernetes clusters and registry so grand, we're taking AI/ML, putting it in your hand. Today we're diving deep. No time to delay stop. Cloud native AI, gonna go the extra mile. So buckle up, get ready to learn. Hello and welcome back to the Rawkode Academy. I'm your host, David Flanagan, also known across

2:23 the Internet as Rawkode. In today's session, we're gonna take a look at running AI/ML workloads on Kubernetes and all of the fun, interesting aspects of doing so. There's a lot to learn here today and we've got two fantastic demos from two fantastic people from Microsoft. So let's move over there and meet them now. Hey, everybody. Hey, Feynman. How are you? Good. Good. I just have to say to start, that is probably the coolest intro music that I've had or I've seen. So props. Yeah. Well, you know, it's only fitting that we're doing an AI/ML episode. We have

2:58 some AI generated music, and I love that you can just, I mean, it's just so available these days. It's amazing what AI can do for us. And there's a lot of cloud infrastructure, CPUs, GPUs working behind the scenes. And hopefully, today, we can give people a bit of insight on how to do this correctly, properly, safely. I'm not sure, but I'm interested to find out. Could you both please take a moment to introduce yourself to the audience and share a little bit about you, please? Sure. I'll go first. So my name is Josh Duffney. I

3:27 am a senior cloud advocate on the cloud native team inside Microsoft. Been in the role a couple years. Before that, I was an SRE at Stack Overflow, and then a DevOps lead in previous roles. So it was kind of a good natural progression and fit for me, but I really enjoy the work. And then, you know, just a little bit more background on me. Mostly writing Go, but starting to write in Rust, and I kind of float around a lot of CNCF projects and contribute and do conference talks and stuff like that. But that's kind of like a

3:56 TLDR on me. Feynman, over to you. Yeah. My name is Feynman. I'm a CNCF project maintainer, currently maintaining the CNCF Notary Project, ORAS, and Ratify. Notary Project and ORAS will be the major focus today because we're going to share how to manage and run AI and machine learning models as OCI artifacts with ORAS and also verify them with Notation. I'm also a product manager for Microsoft Azure. I have been working on the secure supply chain over the past three years, and I enjoy the work, including open source and securing the supply chain. All right, awesome. Thank you both so much.

4:46 Let's start off with a couple of questions and then we'll ease into the demo today. Well, you know, I've been in this Kubernetes space for a while. AI and ML workloads are very new to me. I'm gonna say running them on Kubernetes does fill me with a little bit of fear, because when I think of Kubernetes, I think of stateless services, small containers, low memory and CPU footprint, and bin packing as much stuff into a single machine as I can. But everything I know about AI and ML is gigs, terabytes, petabytes of data being shifted about.

5:20 I've seen some crazy container image sizes that I don't even want to think about. Is Kubernetes important in AI and ML workloads? And how do we make that sensible for people that are watching? It's really brave. Do you wanna take a stab? I can take a stab if you want, Feynman? Yeah. You can. Go ahead. For me, it just kinda brings me back. You know, containers started at about that image size. You know, if you look back, like, even the Windows containers, I know they're not as popular as the

5:51 Linux containers, but they started around that same size. So Kubernetes as a platform, so to speak, can handle that type of size. I think where it's most appealing for me to use Kubernetes is just the orchestration and the abstractions that it gives you and being able to move components around. So I would hate to give up that flexibility of the orchestrator from a deployment perspective, and then being able to have everything in YAML, as much as YAML is a pain to work with. From the CI/CD perspective, the velocity that you can

6:24 achieve with those systems, I think, is still pretty important. Granted, our image sizes are increasing again, but it's really not too different from the recent past, to be honest. Alright. Awesome. Yeah. I mean, we've already got about ten years of Kubernetes expertise under our belt. And as we explore these new workloads, which are AI and ML based at the moment, I mean, everything seems to be going this way, with agentic workflows now. You know, there is a potential future where we don't even write software anymore. We're just writing LLM agents and then having them talk

6:58 to each other. Like, I don't know how far away that is, but the AI buzz would tell us that it's next week or it was last week and we're just not. Yeah. It's hard to tell. But, yeah, I'm very excited for today's session. Can you both take a minute just to tell us what you're going to be showing the audience today and what they're gonna walk away with? Sure. I'll let Feynman go first because he's doing the first demo. Yeah. Today, we're going to walk through the challenges of managing AI and machine learning models

7:25 in the cloud native world, especially on Kubernetes. And then we're gonna briefly share what OCI is, the OCI standard, the Open Container Initiative, and also what the CNCF ORAS project is. And next, we're gonna briefly share the Notary Project. I remember we have another fellow, Yi, who delved into the Notary Project and Ratify for signing and verification. But today, we're gonna demonstrate the Notary Project in another aspect, which is signing and verification of AI models. And we will have two demos today. The first one is to package and run AI models as an OCI artifact, and we

8:10 can push it to the OCI registry. And the second demo will be taken by Josh to demonstrate how you can sign and verify your AI models locally and distribute them via OCI registries. All right, awesome. Well, thank you for sharing. I think it's about time we share a screen, get hands on, and show off some pretty cool open source technologies. So let's jump over to the screen share. Alright. Take it away, Feynman. Awesome. Thank you, David. So when we're thinking about managing AI or machine learning models, what are the challenges that come to your mind? Actually, I'm not an AI expert. I'm also quite new

9:03 to AI, but what attracts my interest? Actually, I have seen a lot of customers and community fellows who are asking questions like, hey, we have very large LLM models. How can we run them in a container? How can we containerize them and make sure they can be stored in a centralized place, just like an OCI registry? And how can we distribute them to different platforms and also from dev to production? And eventually, they want to build a central platform to host those models and also securely distribute them from on-premises to multi-cloud environments. Those are the questions that come from our real

9:56 customers and the real world. We have seen users like Bloomberg, Red Hat, and also a lot of end users who are trying to store their AI models in the container registry. That's very magic, right? So previously, our awareness about the container registry was that we just use it to store container images and maybe Helm charts, right? But with the new standard, OCI 1.1, we are able to store anything in OCI registries, such as arbitrary files, supply chain artifacts like SBOMs and signatures, and maybe even AI models in a central place, right? So here's the thing. So we have seen

10:49 a lot of customers and industries complaining about the lack of a unified standard to manage and distribute AI models in a cloud native world. So basically, the lack of a standard makes it hard for model management to have versioning and reproducibility, because AI models are evolving very fast. They may have multiple versions, but they don't use Git and OCI to manage those versions. So versioning and reproducibility might be the first challenge for the industry. Second, as we all know, AI models are sometimes at least one gigabyte, right? It's very large. So it's a big challenge for them to

11:44 distribute it from their local environment to the remote one securely. How can they secure the AI model distribution? That's another challenge. And how can they transfer those models from their local environment to maybe different cloud platforms and eventually run them on Kubernetes? Even more, they are thinking about how they can deploy them in an efficient way. So those are the challenges that we have observed from the industry and also from our communication with the customers, with our communities. Okay, so the AI or AI infra, machine learning infra developers, they may want to use the same tools

12:36 without any change. They want to have only minimal change in their workflow to achieve a unified management norm in their AI or machine learning standard, right? They want to have easy model evaluation and deployment. Those are the real requirements that we heard from the communities. But from the engineering perspective, we may receive the request from the engineering team that they want to ensure software best practices by following the engineering conventions. For example, they want model file management to be immutable, and they want very strict versioning for model development. Because in their mind, models are also just

13:28 normal software. There's no big difference between other software and models. So they want to have very strict immutability and versioning, as well as a proper release process and DevOps. They want to automate everything, right? Because they are facing a bunch of pain points managing thousands of software deliveries in the company. And also they want to make sure the model distribution and deployment meet the security and vulnerability checks and eventually meet the company compliance criteria. So that comes to a problem. Should the ML or AI developers host those models on a separate model registry,

14:20 such as the Ollama registry, or do they want to build an additional snowflake? But from the engineering point of view, they want to have a unified platform. Maybe it's an OCI registry. Maybe it's an S3 bucket or blob storage. So the thing is, people want to unify model management and also model distribution in a central way. Okay. I heard about a practice from Bloomberg. They demonstrated how they currently manage thousands of AI models in an OCI registry, because they noticed there are several benefits of using OCI registries as a model registry. First, they can easily standardize the

15:25 packaging process. So let's say, for example, they can package the AI models as OCI artifacts. With the OCI 1.1 standard, there's an artifactType property. They can easily define the type of the AI models, package them into an OCI artifact, and eventually store the OCI artifact in the container registry. This should enable consistent distribution across different cloud environments because the OCI artifact will be stored in a central container registry. Then users can pull the image or the OCI artifact from the registry and easily distribute it across different environments. And also, all of those dependencies can be packaged together with the AI

16:17 models in an OCI artifact. They can be distributed together with the AI models. The second benefit is that users can leverage the registry's efficient storage, because the container registry uses content addressable storage to store the container images. And if you look at the model files and the OCI image, you will find the directory structure looks quite similar. We will have another picture to showcase the structure of AI models and also the OCI image. Obviously, if you have anything in your container registry, you will have centralized version control,

17:14 and it will be easy to roll back the AI model. If you find there's any vulnerability or there's any unexpected change, you can roll back to the last version. And you can also reproduce the change for different environments, right? And obviously, you will also have unified access control because you can leverage role based access control for AI models and container images in a central place. And you can also enhance supply chain security for your AI models by attaching signatures, SBOMs, and attestations to the model files in the OCI artifact. Those are the benefits of using an OCI registry

18:01 as a model registry to distribute and store your AI models. So go ahead. So we're going to deep dive into the OCI standard. So Josh, would you want to take it? Yeah. Of course. And apologies if I went a little bit darker. It looks like my light died, and I don't know where the cord is to plug it in. So apologies. But, anyway, so, yeah, we're gonna take a look at the OCI specification here and what this looks like. You might have seen this already. Most likely, you've kind of inadvertently seen this in GitHub or whatever your repository looks like,

18:35 but you've probably not really taken a close look at what the metadata is. And what was interesting, so I don't know, it was probably a year ago, I just decided to go on a little programming adventure and figure out, like, what actually makes a container? What would I need to do to create my own, like, Go runtime and Go packaging for a container? And I ran into this layout. I started to use OCI and realized why every registry uses this. But if we just take a look at this real quick, I don't know if I

19:02 have a mark or anything, but I'll just kind of call it out. But if you look at it, there's kind of this bigger frame that you need to look at, or a bucket, called the manifest. And the manifest has the config, and then it has the layers. And then those layers are actually, in the case of a container, gonna be the file system that is gonna be archived and compressed into a particular layer. And those are all the layers. So, like, the first time you build it, you get your one layer. The next time you build, the next one. And

19:28 then when it runs a container, it extracts them all out into a unified file system that gives you all the files. So, like, layer two might just have, you know, foo.txt in that layer, and the rest of it has the operating system, the root file system. And then the config is what would be passed to the runtime to modify that container. So if you had any kind of runtime constraints, like privilege or CPU limitations, any of those different things are gonna be inside the configuration. And so what the manifest does is it

19:57 puts those together in this metadata layout here that you see. So you can see in here we have the media type that says it's an image manifest, and that's more of a generic container than you might think. It's not just for containers. It's like any OCI artifact can be used and then bundled in there. And then each one of those things, each item in here, the configuration, which you see in that next block down below, that has a media type of config, but then it has a digest. So each of these has a digest. It's just a unique

20:24 address for all that content. And then if we go down to layers, that's gonna be what's actually stored in there. And then annotations are actually what you'd see in GitHub if you're pushing to, like, GHCR. You can tag it with different stuff. Like, here's gonna be the URL for the project and all that good stuff. So that's kinda just like a quick rundown of what the OCI specification is. But if we can adhere to this standard, then we can push the models to any registry. We don't need to have a special models registry for it, and then it can

20:58 work straight into all of our clusters or any of our existing CI/CD pipelines because we're using the standard. So go ahead, Feynman. Go to the next slide, and I'll dive in a little bit more. So I guess I should have waited for this slide. But if we look at this, this is a deeper dive into all these. So we can see down in the layers, we're actually replacing. Instead of using container layers and archiving and compressing, we can actually replace the layers with files themselves. So this is gonna be a way we can take those models and

21:27 put them into OCI by using files instead of the compressed layers themselves. And you just see that underneath, kind of like the middle of the screen, layers. You see the media type there that is considered a layer, but then its reference is gonna be the foo.txt and then the bar.txt at the end there. So this is how we can use the image manifest to not only store container images with their layers, but also files. And that'll be more important later on. Let's go ahead and move to the next one.
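As a minimal sketch of the manifest shape described above, the following Python builds an image manifest whose layers are plain files rather than compressed file system layers. The media type strings come from the OCI image specification; the helper names, the `application/vnd.example.files` artifact type, and the file contents are assumptions made up for illustration, not what was shown on the slide.

```python
import hashlib
import json
from typing import Dict, Optional

IMAGE_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
EMPTY_CONFIG = "application/vnd.oci.empty.v1+json"
LAYER_TAR = "application/vnd.oci.image.layer.v1.tar"
TITLE = "org.opencontainers.image.title"


def descriptor(media_type: str, content: bytes, title: Optional[str] = None) -> dict:
    """Describe a blob by media type, sha256 digest, and size in bytes."""
    desc = {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
        "size": len(content),
    }
    if title is not None:
        # The title annotation records the original file name.
        desc["annotations"] = {TITLE: title}
    return desc


def file_manifest(files: Dict[str, bytes], artifact_type: str) -> dict:
    """Build an image manifest whose layers are the given files."""
    return {
        "schemaVersion": 2,
        "mediaType": IMAGE_MANIFEST,
        "artifactType": artifact_type,
        # OCI 1.1 allows an empty JSON object as the config for non-image artifacts.
        "config": descriptor(EMPTY_CONFIG, b"{}"),
        "layers": [descriptor(LAYER_TAR, data, name) for name, data in files.items()],
    }


# Hypothetical artifact type and file contents, for illustration only.
manifest = file_manifest(
    {"foo.txt": b"foo\n", "bar.txt": b"bar\n"},
    "application/vnd.example.files",
)
print(json.dumps(manifest, indent=2))
```

Each entry carries its own digest, which is the unique address the registry uses to store and look up that content.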

21:56 So there's two ways to refer. So this is a relatively recent development in the OCI standard. Before, it was just: we're gonna push a container. It has a configuration. It has layers. We're gonna pull it down, extract them out so Docker can run it and other runtimes can run it. But with the supply chain work, with, like, signatures and SBOMs, it became important to be able to refer to other artifacts. So this is a signature. I'm just gonna push it up to the registry. How do I know that it ties into this container

22:28 image? How do I know it's related? So referrers, there's two different ways to refer. There's the Referrers API, which is newer, and then there's just a referrer. And I'll go into both of those. So we'll go into referrer first. Referrer would use a subject section, so you can see that on the right side there, where it says, okay, here are the subjects that are referring to this image or this artifact. And so then, with oras discover and other tools, you can use those links, because with those digests, unique addresses,

23:01 things that are pushed to the registry can now have relationships built between them. The Referrers API basically gives you an API endpoint that you can relate to, and it keeps the manifest the same. There's no changes to your actual manifest, but through the Referrers API, you can link them. And you can see that metadata with different network calls to the Referrers API. So that is important for my part of the demo that I'll be doing with the signature, which is how we can relate, or show a relationship between, and connect OCI artifacts. And then for the OCI

23:39 artifact format in the model, so if we look at the container, let's see. I'm just trying to refresh on the slide. And, Feynman, if you have more context for the slide, let me know. But it looks like, yeah, I'll actually let Feynman take on this one because I think, oh, this is more relevant. Yeah. Go ahead and take it away. Do you wanna go to this slide? I'm trying to recall here. Oh, this is just the layout. Right? Yeah. So okay. Yes. Absolutely. So on the top here, we can see

24:09 so if we were to pull down... this one makes much more sense to me. Thanks, Feynman, for switching. Yeah. I wanna compare. Like, there's actually not a lot of difference between the model registry format and the OCI format. And if you look at it, the model registry is actually using a version of OCI in this metadata here. It's just changing the format enough that you can't just take an existing tool for an OCI registry and use it. But we'll walk through changing that in this talk. But if we look at

24:42 the top part, this would be the equivalent if you were to point ORAS, the tool we're about to talk about, at it and pull down the artifact, and we're gonna pull down the files that it has. We're gonna extract it out. So it's not just some nebulous thing up in the registry. This is actually what would happen if you pulled it on the cluster, and it's gonna extract it out and run it. So we have the io.containerd.content.v1.content directory, then we have the blobs. The blobs then have a SHA, and then inside that SHA

25:09 is gonna be the digest itself minus the SHA. And so those are gonna be all the layers of that image. And if we go into manifests, there would be the overall manifest that would have the config and then have the layers inside of that, which bundles it. And one of those layers in there would actually be the config JSON as well. And then Ollama is just a little bit different. It actually splits it out. It says, here's gonna be your manifest directory, and so instead of each artifact having the manifest with the layers, it

25:41 splits it out and says, here is the manifest, and then here's all the blobs for all the models. So they're disconnected. The manifest that has the configuration for it and then the actual layer that holds the model are in separate directories. So... Okay, back to me. Yep. So as Josh just shared, if you compare the directory structure of an AI model, like the Ollama model, and the OCI image, you will find they are quite similar, because they both use content addressable storage to store the AI models and the OCI images in the AI model registry and the OCI registry

26:34 respectively. So why not unify them in a central place? Can I add one thing to this, Feynman? Do you mind? Yeah. So a good analogy just popped in my head. So, you know, back when containers started, there was, you know, big awareness of, like, don't put your database inside the container image, mount it as a volume. Right? Because it's large. Well, the same thing's really true with models, and this is what Feynman is kinda setting up here: well, what if we could store the model on the registry? That's cool. We

27:08 talked about that. It could be in a different place. But how do we get that and use it inside of our container image? It's not gonna be pulled down the same way the container image is. That's to actually start the thing that's running. But what if you could take images or things that are on a registry and then mount them as volumes just like you would a database? Because it's too large to completely replicate and bake into each image. That's exactly what this feature here sets up. So I'll let Feynman take over that,

27:33 but I just wanna draw that analogy. Like, think of it as we had this big database file that we wanna mount into our container. We don't wanna bake it in and copy it everywhere. How can we leverage volume mounts for that? And there's actually a new alpha feature for that, and I'll let Feynman take it away, so I won't steal the thunder. But go ahead. That's a good point, Josh. So actually, if you're looking at a relatively new Kubernetes version, for example, Kubernetes v1.31, you will find there's a brand new feature, an alpha feature introduced in this release, which

28:08 is the OCI image volume. The Kubernetes community leveraged the benefits of the OCI standard so that an OCI image or artifact can be used as a volume, which lets Kubernetes users mount the model directory, the file directory, into a container and run it in a pod. That is very efficient, right? And you can also take a look at the Kubernetes blog post for details and its use cases. Looking at their blog post, the major use case is to solve AI model and machine learning model management and also make users able to run it in a container.
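A minimal sketch of what such a pod could look like, written as a Python dictionary and printed as JSON, which `kubectl apply -f` accepts. It uses the `image` volume source from the Kubernetes v1.31 alpha feature, which assumes the `ImageVolume` feature gate is enabled and the container runtime supports it. The pod name, image references, and mount path are hypothetical, and this is not the PV and PVC approach used later in the demo.

```python
import json


def pod_with_image_volume(name: str, runtime_image: str, model_ref: str, mount_path: str) -> dict:
    """Return a Pod manifest that mounts an OCI image or artifact as a volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "runtime",
                    "image": runtime_image,
                    # The model files appear under mount_path inside the container.
                    "volumeMounts": [{"name": "model", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {
                    "name": "model",
                    # Image volume source: the reference is pulled by the runtime
                    # and mounted, instead of being baked into the runtime image.
                    "image": {"reference": model_ref, "pullPolicy": "IfNotPresent"},
                }
            ],
        },
    }


# All names and references below are placeholders for illustration.
pod = pod_with_image_volume(
    name="model-runner",
    runtime_image="registry.example.com/runtime:latest",
    model_ref="registry.example.com/models/tinyllama:v2",
    mount_path="/models",
)
print(json.dumps(pod, indent=2))
```

The design point is the same as the database analogy earlier: the runtime image stays small, and the large model is attached as a volume by reference.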

28:56 So how can we achieve that? So basically, you will need to first push your AI model files into a container registry or OCI registry. To achieve this, we will need an artifact tool to push your files from the local environment to an OCI registry. The answer will be ORAS. ORAS is a fully functioning OCI management tool, and it can be used with any OCI registry. Today, most of the popular container registries support OCI artifacts and OCI 1.1, and are compatible with the ORAS tool. So that means you can use ORAS to distribute any kind of

29:49 arbitrary files and AI models across different registries, including Azure Container Registry, AWS Elastic Container Registry (ECR), Docker Hub, Harbor, and Zot. ORAS is also a CNCF sandbox project. You can use it in your on-premises environment. And it is also fully OCI standard compliant. We provide CLI command line tools and libraries in different programming languages. And also, we have a very strong community backing this project. And we have the ORAS CLI supported by Microsoft, Red Hat, and several community organizations. We have ORAS Go, ORAS Python, ORAS .NET. And recently, we have the ORAS Java SDK. You can go to the ORAS website for

30:51 more details. And also, you can install the ORAS CLI via different channels. Okay, go ahead. So as we can see, we have a lot of open source projects and registries embracing ORAS and OCI artifacts. So let's say, for example, Helm has integrated ORAS since Helm v3. And also a lot of security projects like Falco and policy notation. They leverage ORAS to authenticate with the registries and also manage OCI artifacts across container registries and their local file system. And as I just shared, most of the popular container registries are compatible with OCI artifacts. That means you can have very good portability

31:55 for your AI models if you store your AI models in OCI registries. So here's the list for installing ORAS from different channels. On Windows, you can use winget. On macOS, you can just use brew install oras. That is quite straightforward and very easy to set up. Okay. Let's jump to the demo. Talking is cheap, right? So we can have a live demo right now. And I also have a step by step hands-on article that can be shared after this session. So this is the hands-on lab that I'm going to demonstrate. As a whole, we will share how to

32:47 package an LLM model. I use an Ollama model as a sample model and package it as an OCI artifact. Then I will use ORAS to push the AI model into an OCI registry. And next, I create a PV and PVC to mount the OCI artifact as a volume into the Kubernetes cluster. This is just the latest alpha feature in Kubernetes. And finally, I'm going to deploy the AI model in a pod, mount the model from the OCI registry, and run the AI model inside the container. The second demo will cover how to sign and verify AI models locally and also in

33:41 the container registry. So this is the high level process for this demo. So let's jump to the terminal. And I'm going to show you the hands-on steps. Okay, first of all, assume you already have the tool to manage your AI models. Here, I'm using Ollama, one of the most popular AI model management tools. So actually, I already have Ollama installed locally. And if I run ollama list, I already have the TinyLlama model in my local file system. Then I'm going to push these Ollama models into an OCI registry. So here,

34:46 if you look into the models directory, you'll find there are two directories. One is blobs. Another one is manifests. This is the structure that Josh just shared, which is quite similar to OCI images. If we dig into the blobs folder, you'll find it has a bunch of model layers, which is similar to the OCI image layers in the blobs. And if you're looking at manifests, it has an AI model manifest inside the manifests folder. So here are the raw files that the Ollama client tool can use to run an AI model from a file system, right? So the goal

35:49 here is to package the whole models folder into an OCI artifact and store it in a container registry. So here we can use the ORAS tool. I installed ORAS v1.2. This is the latest stable version in the community. So we just need to use the oras push command and specify which directory you want to package and push to the remote registry. So here, we specify the registry. I'm just using ACR as a sample registry, but you can use any registry you like. And then you can specify the target directory you want to package.
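The push step can be sketched as follows. This Python helper only assembles the `oras push` argument list and prints it, so it can be read or pasted without a registry at hand. The `--artifact-type` flag is part of the ORAS CLI; the registry host, repository, and artifact type below are placeholders, not the values used in the demo.

```python
import shlex
from typing import List


def oras_push_command(reference: str, path: str, artifact_type: str) -> List[str]:
    """Assemble an `oras push` invocation for a file or directory.

    reference      target in the form <registry>/<repository>:<tag>
    path           relative path of the file or directory to package
    artifact_type  value recorded as the manifest's artifactType
    """
    return ["oras", "push", reference, "--artifact-type", artifact_type, path]


# Placeholder registry, repository, and artifact type, for illustration only.
cmd = oras_push_command(
    reference="registry.example.com/ollama/tinyllama:v2",
    path="models",
    artifact_type="application/vnd.example.ai.model",
)
print(shlex.join(cmd))

# To actually run it, with ORAS installed and after `oras login`:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a relative path matters here because ORAS records the path as the layer title, which is why the command is run from inside the Ollama folder.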

36:38 So here, as my model is located in the Ollama folder, I just specify the target directory here. And you can also specify the artifact type for your AI model. So here I just use a sample artifact type. So if I run this command, I should use a relative file path. So I just specify v2 and models because I'm already in the Ollama folder. So here, ORAS is going to package the directory into a TAR file and make it an OCI artifact. So if we run this command, ORAS will start to package and

37:51 archive the folder into a TAR file and make it a layer of an OCI artifact. It may take some time to execute the command. As someone who tried to build that tooling on their own, I could definitely appreciate archiving it, zipping it up, creating the metadata layout for it. So I'm a big fan of ORAS for a number of things, even WebAssembly components. I actually hooked into that library as well. Yeah. I remember Josh also has a nice demo to demonstrate how to package WebAssembly modules as an OCI artifact and store it in

38:37 an OCI registry. This is similar to how we distribute and package AI models. So here, you'll find the AI model has been packaged into a TAR file, into a layer of an OCI artifact. So we can look at the registry side. And you will see there's a repository named Ollama TinyLlama. And if you're looking at the tags, v2 is the tag that we just pushed to the OCI registry. So if we zoom in and look at the layers, you'll find there's a specific layer to store the AI model. It would just be a cool way to

39:34 show off some of that. This is what I end up doing that keeps me out of the browser. I'm a terminal junkie personally, so if I can get away from a browser, I do. So here, this is the ORAS command that you can use to display your manifest information. Correct. Yeah. There you go. We can have the same JSON format in the terminal, and we can use oras manifest fetch to fetch an OCI artifact's manifest from the container registry. And it will be a similar view to the container registry portal.
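For readers who prefer the terminal view, here is a small sketch that summarizes the layers of a manifest such as the JSON printed by `oras manifest fetch <reference>`. The sample manifest is made up for illustration: the digest is a placeholder of zeros, and the size and artifact type are not the values from the demo.

```python
import json
from typing import List, Tuple

TITLE = "org.opencontainers.image.title"

# Made-up manifest standing in for the output of `oras manifest fetch`.
SAMPLE_MANIFEST = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "artifactType": "application/vnd.example.ai.model",
    "layers": [
        {
            "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
            "digest": "sha256:" + "0" * 64,  # placeholder digest
            "size": 1024,                     # placeholder size
            "annotations": {TITLE: "models"},
        }
    ],
})


def summarize_layers(manifest_json: str) -> List[Tuple[str, str, int]]:
    """Return (title, media type, size) for every layer in a manifest."""
    manifest = json.loads(manifest_json)
    rows = []
    for layer in manifest.get("layers", []):
        # Layers pushed from files usually carry a title annotation.
        title = layer.get("annotations", {}).get(TITLE, "<untitled>")
        rows.append((title, layer["mediaType"], layer["size"]))
    return rows


for title, media_type, size in summarize_layers(SAMPLE_MANIFEST):
    print(f"{title}\t{media_type}\t{size} bytes")
```

In practice the input would be the captured standard output of the fetch command rather than the embedded sample.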

40:07 Okay, I think we finished the first step, which is to push the AI model as an OCI artifact to the registry. And then we're going to create the persistent volume and also a persistent volume claim to mount the AI models into a container and run it in a pod. Here, you will need to create two YAML files for your persistent volume and persistent volume claim, which is the PV and PVC. And to save time, I already created those two storage resources. The first one is the Ollama models PV. Here you have to specify

40:56 your volume handle here, which is your image reference. We can just copy the image reference from your container registry and put it into the volume handle property here. And next, we can create a PVC to consume the PV. And here you just need to specify which PV you want to consume in this PVC. And let me show you how it looks on my Kubernetes cluster. So here I have an Ollama models PV created on my local file system. And I have a PVC that consumes this PV. And looking at the status, the PVC has been bound

41:53 with the PV. That means we can use the PVC to mount the OCI image volume. I have also defined the pod for our AI model. So here you can just use this YAML file to create your pod. So here I have already created this YAML file locally. I just need to apply it. Okay, it might be in another folder. Oh, here. I just need to apply this Ollama pod YAML and create this pod on my Kubernetes cluster. So by default, the image volume will be mounted into your specified path in the container.
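
Editor's sketch: the PV and PVC pairing described above can be written out as plain data. This Python is a rough illustration under stated assumptions, not the manifests used in the demo. This part of the transcript does not name the CSI driver, so DRIVER is a placeholder, and IMAGE_REF, the object names, the size, and the access mode are all hypothetical. The field names themselves (csi.driver, csi.volumeHandle, volumeName) are standard Kubernetes fields.

```python
import json

# Placeholders: the session does not name the CSI driver or the registry host.
DRIVER = "image.csi.example.com"                          # hypothetical driver name
IMAGE_REF = "registry.example.com/ollama/tinyllama:v2"    # hypothetical reference


def build_pv(name, image_ref, driver, size="5Gi"):
    """PersistentVolume whose CSI volumeHandle carries the OCI image reference."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadOnlyMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": "",
            "csi": {"driver": driver, "volumeHandle": image_ref},
        },
    }


def build_pvc(name, pv_name, size="5Gi"):
    """PersistentVolumeClaim bound to one specific PV through spec.volumeName."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadOnlyMany"],
            "storageClassName": "",
            "volumeName": pv_name,
            "resources": {"requests": {"storage": size}},
        },
    }


pv = build_pv("ollama-models-pv", IMAGE_REF, DRIVER)
pvc = build_pvc("ollama-models-pvc", pv["metadata"]["name"])

# kubectl accepts JSON as well as YAML, so this output could be saved and applied.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [pv, pvc]}, indent=2))
```

The point of the sketch is the linkage: the image reference lives in the PV's volumeHandle, and the PVC selects that PV by name, which is why the claim shows as bound once both exist.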

42:54 So I just need to apply that pod and mount the specified PV and PVC in that pod. Okay. There might be something broken. But ideally, we should have the AI model as an artifact running as a pod in our local file system. Maybe I can troubleshoot later, but there might be some configuration problem here. Let me check. Oh, there might be some disconnection of my Kubernetes API server because my kubectl is broken. But I can troubleshoot it later. So the idea here is to run the AI model as an OCI artifact and mount the image, the OCI image, into an image volume

44:13 so that it can be run as a normal pod in a Kubernetes cluster. And we can use kubectl exec to get into the container and run ollama list and also ollama run tinyllama to use the Ollama model in your container. But unfortunately, the demo is broken, and I will provide a recording after this session and also share the script with all of you. I will hand it over to Josh. A question before we move over to Josh. Yeah. Yeah. Yeah. Shoot. So hack the Gibson, great username, by the way, dropped some of the questions in, you know,

45:01 when you use the ORAS commands, you push the entire models directory. Mhmm. I'm assuming it's pushing all the models that are downloaded into a single OCI artifact, or does it know that you were trying to push the TinyLlama one? So how do you work with all models and individual models when you're using ORAS with Ollama? You would have to... so Feynman got away with it because he'd only downloaded or pulled one model, so it was only going to compress that. So that'd be one way. The other way is

45:28 you would have to look at the manifest and work out which layers belong to each model, and then you'd have to run separate ORAS commands to make sure you're only pushing the elements or the layers of the image that you want to bundle. So he can more accurately, instead of just models and putting... or he could just put all the models in there and just, here's the models. But he would have to pick out the digests for those layers and then push them as TinyLlama and whatever the next model would be. Yeah. Alright. So I guess the preferred workflow, if

45:56 you are doing something like this, is to pull one image, build the OCI artifact, clean it up, and then repeat. That would be the simplest way. The simplest way. Yeah. Yep. Yeah. Alright. Awesome. Thank you. And if there are any more questions from the audience, please feel free to drop them in the chat. Other than that, I'll hand over to Josh for demo two. Alright. Let me see if I can move this. Oh, I'll just disappear for a while. My screen up? It's up there. Alright. Awesome. So we'll we'll go with this one. I couldn't help myself from

46:30 trying to get demo magic to work, so we'll see how well it works for me. But, yeah, just to start. Hey, Feynman. Do you mind muting your keyboard right off? Sure. You got MX Blues? I think. Could even come up with keyboards on this stream. That's the problem. So I got the HHKB, which is the quietest one I've ever had. But, anyway, yeah, to set up this one, what I'm gonna show real quickly is a new feature from Notation, which is blob signing and verification. And to kinda set the stage, signatures are becoming more and more important. The

47:02 analogy is... I actually got this from Kelsey Hightower's talk before he retired, sadly. I know he's still around on Bluesky and stuff, but he's not working full time. But, anyway, he gave a supply chain talk a couple years ago, and he used this really great story where, you know, he went into a coffee shop. He found a thumb drive, and it was labeled, you know, like, latest Go packages or whatever else, for compression or whatever. And so he's like, oh, well, naturally, I'm just gonna plug that into my computer, and I'm like, oh, this is exactly the

47:33 source code I need. I'm gonna pull it over. And he used that analogy to say, like, of course, we would never do that. We're not gonna take a random thumb drive from a coffee shop and plug it into our computer, but we pull down container images and packages that way. We don't have any verification. And so signing is one way that's emerging now where we can actually have some trust with whatever it is. And so we have container image signing through Notation and other tools like Cosign, but we keep when we're doing these arbitrary

48:03 blobs since there isn't a predefined OCI format and everything for the image quite yet, and we're just pushing random bits of you know, random archive or whatever, can we still use those signature best practices and supply chain security on those things? And the answer is yes. And so that's what I'm gonna I'm gonna show off today. So without further ado, this is the notation. It should say blob, not blog. That's a typo. So blob signature. This is only available in version two, the alpha. So if you don't have this one, you won't have the subcommands.

48:36 So just to be aware of that. So first thing, I've gotten the model into an archive, and I've compressed it. So that's what this is. And I'm gonna create a signature for that with notation. That signature will be local, and then I will use another command in a second that will connect them, which is why we prefaced the talk with the referrers subject. So I've successfully signed it. And let me just create a new one here so we could see it real quick. So if I do ls here, I have this signature file now. K?

49:09 We'll jump back to the demo. So now I'm gonna push the model as this artifact of layers tar, and I'm gonna push it to my registry. So now I have the model on my registry, and I have a signature locally. But now I want the registry to have the relationship between the signature and the artifact. So I'm gonna use oras attach, and this is gonna use the referrers API. So if we were to look at the manifest, we're not actually gonna see the relationship. We actually have to hit the referrers API to see the relationship between them. It's not

49:44 gonna be in the subject that I talked about before. That's a separate way to establish relationships in the registry. So now it's attached. Now that it's attached, I'm just gonna use notation ls, and this is gonna show us the relationship between the image and the signature. So we can see here, here's my model. It's yelling at me because I'm using the tag and not the digest because that's more clear. Tags are mutable, so they could change. And so it's wanting me to use the immutable one, which would be the digest. But, anyway, we have the model

50:16 here, and this is the digest here. So that's another way to reference the OCI artifacts. And then underneath of that, I can see that there's this CNCF notary signature with a different digest. So that's the digest of the signature itself being linked to it or referred to it. So I'm gonna create a policy, a trust policy that will allow me to do verification. So just to create the signature, I don't need to do any of this. But if you ever wanted to validate the signature, these are the steps now that you need to follow.
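
Editor's sketch: two ideas from this part of the demo, digest references and the signature-to-artifact link, can be shown in a few lines of Python. This is an illustration of the data model only, not how notation or oras are implemented. The repository name and the in-memory "registry" list are hypothetical. The sha256 digest format and the application/vnd.cncf.notary.signature artifact type are the standard ones. In the OCI data model, the attached signature manifest is the one that carries a subject field pointing at the model's digest, while the model's own manifest is unchanged, which is why a registry's referrers endpoint is needed to discover the link.

```python
import hashlib

# Artifact type used by Notary Project signatures stored in a registry.
SIGNATURE_TYPE = "application/vnd.cncf.notary.signature"


def oci_digest(blob: bytes) -> str:
    """Content digest in the form OCI registries use: 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def pinned_reference(repository: str, digest: str) -> str:
    """Immutable reference (repo@digest), unlike repo:tag, which can be moved."""
    return f"{repository}@{digest}"


def referrers_of(subject_digest, manifests, artifact_type=None):
    """Mimic a referrers lookup: manifests whose 'subject' points at the digest."""
    found = []
    for manifest in manifests:
        subject = manifest.get("subject")
        if not subject or subject.get("digest") != subject_digest:
            continue
        if artifact_type and manifest.get("artifactType") != artifact_type:
            continue
        found.append(manifest)
    return found


# Hypothetical stand-ins for a repository and the manifests stored in it.
REPOSITORY = "registry.example.com/models/tinyllama"
model_digest = oci_digest(b"stand-in for the model manifest bytes")

registry_manifests = [
    {"artifactType": "application/vnd.example.model"},  # the model: no subject
    {"artifactType": SIGNATURE_TYPE, "subject": {"digest": model_digest}},
    {"artifactType": SIGNATURE_TYPE, "subject": {"digest": "sha256:" + "0" * 64}},
]

print(pinned_reference(REPOSITORY, model_digest))
signatures = referrers_of(model_digest, registry_manifests, SIGNATURE_TYPE)
print(len(signatures), "signature(s) attached to the model")
```

Because the digest is derived from the content, changing a single byte of the model manifest yields a different digest and the signature no longer refers to it, whereas a tag can silently be repointed.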

50:47 So I already have it. I'm just gonna override it locally. So now I have a policy that states, okay, it's basically setting up what certificate should I trust for this entity, for this policy. And so this is information from the certificate that was generated that I used to sign, and so that needs to match. The trust store that it was stored in was the CA demo. So now that I have a policy, I'm gonna just pull down the artifact, not validate the one that's local because that one was assigned to someone in the registry.

51:19 I'm gonna put it in a downloads folder. So I'll pull that down, and then blob verify, give it a policy, and then I'm gonna point to the signature and then the path to the thing that I wanna verify. And there we have it. You know, pretty straightforward. But that's how you can use the new notation blob, not blog, blob verify and sign together to kinda secure your AI/ML models. That's it for me for that demo. Is that the demo, both demos finished? Yes? Yeah. Actually, I just saved myself because I just got the problem here and I can

52:08 finish my demo. Oh yeah, over to you. Good work. If you could share your screen again, please. Yeah, sure. Let me do it. It turns out that live demos are not always stable. But actually, the Kubernetes cluster is not down. I actually have a kubectmin to run my local Kubernetes cluster. I just forgot to switch to the root user because I installed it as the root user. kubectl only has access if I run it as the root user. So here, if I apply the pod YAML file, it will create a new pod,

52:59 consume the OCI image packaged with the AI model. So here, as we're looking at the Ollama model file, the YAML file, we can see the OCI artifact that we just pushed to the OCI registry. So we have two versions here. I use v1. That is the version that I previously pushed to the registry. That also shows that we can have version control for OCI artifacts in the OCI registry. And it also applies to managing the AI models in the registry. So here, if we apply that pod and create a

53:46 new pod, then we should have that pod running right now. But as the model file is relatively large, it takes some time to create a new container here. So ideally, if the pod is running, we should be able to exec into the container of that pod and run ollama run tinyllama, run this model in the container, and interact with the AI model. So here, assume we already exec into the container, and we should be able to interact with the AI model. So who you are? It should tell me, hey, I'm an artificial

54:43 bot. Okay. That's it. Alright. Awesome. Thank you. So let's tackle a couple of questions, and then we'll wrap this up just on the hour. I think ORAS is an absolutely fantastic tool. Right? I'm seeing it used more and more. It's almost the de facto standard now for, you know, pushing and pulling the things that aren't, you know, typically container images. I mean, even to the point, I think anyone watching this session is probably already using OCI artifacts, and they don't even know it. Because if you run the brew command, all of those packages are

55:21 now coming from GitHub's container registry to the best of my knowledge. Mhmm. So it's nice to see the OCI standard and these registries being used for such non-container-image cases. What I'm curious about though is I'm assuming that people maybe are not as familiar with the Notary project as they are with OCI and ORAS, at least at the start of 2025. And I think that Notary version one was always considered quite academic and a difficult program to use. You've shown us now that there's Notary V2 currently in alpha one. So my question is,

55:57 if I get to it, is the Notary developer experience improving with two point zero and what is the roadmap like to get to a GA release of Notary two? Just to correct the branding, actually, we should say Notary Project two point zero because we rebranded the project and we no longer use Notary V2 as a brand. So in terms of the roadmap, we want to make sure the blob signing feature is stable. And also going forward, we will support formatted output. It will be used to pipe the output to CI/CD pipelines and also automation scenarios.

56:43 So here, all of the demos are running in our terminal. But in a production environment, developers will run Notation or ORAS commands in a CI/CD pipeline or in scripts. So with the formatted output support, developers will be able to pipe the output and use it in a CI/CD pipeline. So two things. One is to make the blob signing and verification stable and also support formatted output. One quick note too. Notation does have a GitHub Action that works for container images. If you're worried about signing container images, you don't need to have some kind of bash script running

57:29 inside your workflow anymore. You can just use an action, and it gives you all the features and stuff like that. So do you have any personal experience? Like, what were some of the rough edges that you had in your mind that you'd like to see addressed? Maybe they have been. I don't know. Yeah. I think the scriptability of it was always a major challenge, but also just, you know, the commands themselves are quite verbose. You know? I'm assuming there's a reason you recorded your demo or at least set up some automation because those commands are

57:55 not easy to type, especially. So I don't think that's necessarily a bad developer experience, but it can be quite daunting following along with tutorials and copying big masses. Yeah. It does take a fair amount of context. Like, okay. Well, what, like, the envelope size or the envelope itself. You know? Like, I don't even fully understand the difference between the two. You know? Like, I know that there's a difference. I know that it's changing how the signature is formatted, but a lot of that gets into cryptography. So, yeah, maybe, like, some defaults to shorten the commands would be good.

58:28 So you don't need to necessarily... what's a good default for those more verbose flags? I think one of the other challenges was also key management. You know, people don't like having to manage root certificates and such like that. Uh-huh. Yeah. But, you know, when it comes to trust, right, you gotta trust someone. And if you don't manage those keys, there is no trust. So Right. I guess what people really should be doing is leaning in to Azure and all the other providers that have KMS because, you know, these tools are invaluable for management. This type of support material. I

58:59 could definitely see that. Yeah. We're getting notation getting bundled in with a PKI infrastructure. Right? Like, to use it, you actually have to have a fairly mature PKI infrastructure, and that's a big prerequisite for a lot of people that would be interested. Yeah. One more thing. So actually, Notation is also working with the Azure Trusted Signing service to provide a keyless signing experience. That would be coming soon. All right. And you're not gonna stack a date on that, are you? You're just gonna leave me with the same here. Coming soon. To a theater near you. Maybe spring, who

59:35 knows? Yeah. I mean, this definitely is addressing some of those challenges, and I would like to see projects like this get more uptake because security is often an afterthought. And I think that is because, you know, developers have a lot on their plate with cloud native and Kubernetes in general. Right? They're already going from, just wanted to write some code to now I'm responsible for shipping it and packaging it and distributing it and all of these things. And the easier and the more hurdles we can remove to allow people to use something like Notary and have attestations

1:00:00 and all these things. Would honestly, I I would love sorry to interrupt, but I would love where you can just it's done for you on your push command. Like, Docker push signed. You know? Like, how awesome would that be? Yeah. Work on that, please. I think that would be absolutely fantastic. And, you know, it's it's because you could have this whole integration in the cluster if we just speak into the registry and we get these keys and this key exchange and all this, we can do it for OpenID Connect and stuff. I'm sure there's ways

1:00:34 that we can bridge this. But it's all identifying challenges and solving them in time. Right? And I think the cloud native community in general is really good at that. We're seeing so much progress with Kubernetes and all these other tools. I only have one more question, and then I'll leave you both to share any final thoughts that you'd like to share. We've taken a look at ORAS as well now, and I love that we can push anything we want with ORAS. I'm curious about those type definitions that we have, the ability to see that this is

1:01:00 an image layer. This is an Ollama model. Who's responsible and controls those definitions? Can I make them up to ship my Rawkode data layer and just type in whatever string I want, or is there some sort of OCI spec and registration process? Like, how does that work? There are predefined types in the spec that you'll wanna adhere to. It depends on what you're packaging, but if you're kind of packaging an arbitrary thing, you ought to pick, like, a generic one, right, like what we do with TAR. So there's a couple defined. You can

1:01:33 define your own, but then it's not gonna give any help to, like, how it should actually be structured. But you totally can, if there's nothing on the end, expecting that format to be there. So, yeah, if you go to the OCI I think he had a link to it. But the OCI spec, the different type definitions, there's a number of generic ones that you could probably squeeze into, like, whatever file you're gonna just archive and zip it. You could use that that type like we did in the demo, or you can create your own if there's

1:01:58 no tooling again, like on the other side expecting a particular format. Nice. Awesome. Yeah. Actually, the industry is kind of wanting to standardize the AI model spec. There's a working group under CNCF named Cloud Native AI Model Spec Group. We are working with them to define the standards for packaging and managing AI models in the cloud native world. Here's the open source repository with the AI model spec. Coming soon. We don't have a commitment right now. It's still in a very early stage. Yeah. Awesome. I mean, it's just nice to know that all these things are being worked on. And in

1:02:47 due course, you know, we're making the cloud native ecosystem and landscape that little bit easier day by day. So, you know, thank you for your continued commitment and making things awesome for us. Do either of you have any final words to share with the audience before we wrap this up? Just always welcome the feedback. Like, even just your couple notes on notation and the pain points and where that would be good. So, yeah, if anybody has any feedback from this session, things that you'd like to dive deeper into or questions, you know, feel free to reach out to

1:03:13 us. We're both on many different social media platforms. So you can you can find us in all those places. But, yeah, just mainly feedback if anybody has any. More more than welcome to take that in. Alright. And for everyone watching this not live, all of the links and things that we have discussed will be in the description soon. I'm not gonna put time on that very soon. So go check them out and please feedback to the projects. The GitHub links will also be in the show notes below. Alright. Thank you both so much for taking

1:03:42 time out of your day to join us. Those are great demos, revealing aspects of cloud native and ORAS and Notation that I think people should just go and kick the tires on and try out and definitely provide feedback. So thank you both again, and I hope you both have a lovely day and a lovely week. Thank you. Thanks for having us. Thank you, everyone. Bye, all. Native patterns are sure to please.

Meet the Cast

David Flanagan

@rawkode

Josh Duffney

@duffney

Feynman Zhou

@FeynmanZhou

Documentation

Code

Notation GitHub Action

Cloud Native AI Model Spec repository

Additional Resources

Kubernetes blog post on OCI image volumes
