About this video
What You'll Learn
- Use Guidepad environments to provide AWS credentials and run Python control planes.
- Model reconciliation with state machines and CloudTrail requirements that react to bucket deletions.
- Build S3 behavior in code, including create, ACL repair, rename, and undeploy handling.
Terraform's resource model only understands CRUD and polls for drift. Using Guidepad primitives (environments, control planes, state machines, requirements), we build an event-driven S3 provider in ~100 lines of Python that reconciles from CloudTrail events.
Jump to a chapter
- 0:00 Introduction
- 0:41 The Problem with Traditional IaC (Terraform Model)
- 1:39 The Problem
- 3:22 The Future of IaC: Real-time & Event-Driven Reconciliation
- 4:19 Terraform Way
- 4:20 What is Guidepad? Platform Overview
- 5:16 Guidepad Environments and Control Planes
- 6:14 Terraform Control Plane in Guidepad (Briefly)
- 7:04 Guidepad State Management
- 8:05 Guidepad Way
- 8:08 Building the Event-Driven Solution with Guidepad Primitives
- 8:34 Deep Dive into the Event-Driven S3 Service Code (Python)
- 10:01 Advantages of Custom Code (Handling Complex Scenarios)
- 11:40 Guidepad State Machines for Resource Management
- 12:25 **Demonstration:** Event-Driven S3 Bucket Reconciliation
- 13:53 How it Works: CloudTrail Requirements & Event Detection
- 14:56 Guidepad User Interface Overview
- 17:09 Conclusion
- 17:11 Recap: Guidepad Primitives Used
- 18:55 Broader Applications & Potential
- 19:29 Conclusion & Final Thoughts
Full transcript
Generated from the English captions.
0:00 Introduction
0:00 If you follow me on Twitter, you no doubt will have seen that I have some pretty spicy takes on infrastructure as code. And you know what? I feel like I am allowed. I've been doing infrastructure as code with pretty much every tool there is since around 2015. I even spent twelve months working at Pulumi, one of the many infrastructure as code tools there are. And my spiciest take is that while all these tools have gotten us so far, they're not what we need to take the next step. And the problem is the Terraform resource model. It relies on individually managing resources
0:41 The Problem with Traditional IaC (Terraform Model)
0:49 and polling to check for drift. The Terraform resource model has no understanding of real time or events. So what if that were to change? We're gonna take a look at what the future of IaC could look like if we built a real-time, event-driven system to manage individual resources, and not even just individual resources, but build in the ability for actor-based resources where resources communicate with one another. Now we can't dive into all of this today because it just doesn't exist. But I set a challenge for the team at Guidepad. Can you take us one step closer to
1:34 where we need to be? Let's take a look. So let's focus on the problem statement first. Why is the Terraform resource model broken? Well, maybe broken is a strong word, but let's set the scene. You write enough HCL to have Terraform create an S3 bucket with an ACL, an access control list that dictates whether it's public or private and so forth. You run Terraform plan and it tells you that it wants to create a bucket and set the ACL. You do a Terraform apply and the job is done. Now, unfortunately, that's not really enough.
1:39 The Problem
2:20 There are still ways for people to modify and tweak that resource, notably, the AWS portal. And while I'm using AWS for the example, this is not unique to them and is the case for all cloud providers. The challenge is someone can go into the admin interface, the portal, whatever you wanna call it, the console, and start to make changes through the UI. We sometimes call this ClickOps because you make changes with a click. The way that Terraform or any other infrastructure as code tool will detect this change is to then rerun a Terraform plan or apply to
3:02 detect that the desired state is different from the current state. This gives us a plan that may have to recreate a bucket if it's been deleted or fix the ACL. And this is just a trivial case. And even with this trivial case, there are some huge problems. What if the future of IaC could account for all of these external influences on our resources? Now how would that work? Well, fortunately, with pretty much every cloud provider, we have access to some sort of audit log, which is a real-time or near-real-time, depending on your cloud, set of
3:22 The Future of IaC: Real-time & Event-Driven Reconciliation
3:43 events that tell us everything that happened within our accounts. So our IaC tool could detect a deleted bucket or a change in ACL and trigger the reconciliation for us. And that's what we're gonna try to demonstrate today. And not just that, I'm gonna show you how to use Guidepad just to execute Terraform because, well, why the hell not? But Guidepad gives us the primitives to take things one step closer to where we need to be. I hope you understand the problem. I hope you're excited to see a potential solution. Let's have some fun. So let's do a
4:20 What is Guidepad? Platform Overview
4:20 very quick recap about what Guidepad is and how it works. Guidepad is a meta platform that helps you build applications. It's data driven, meaning that you provide your data, your values, using the Guidepad type system in a declarative fashion, and it makes magic happen. Now what's actually happening under the hood is that we can request for a service to be running. Guidepad is composed of multiple environments, which have control plane classes (we'll get into this in a second), which will then orchestrate and manage those services within those environments to have access to secrets, load balancers,
5:09 ingress, and all the bits that you need for an application to be deployed to production. If we run Guidepad entity list environments, like so, you will see that I have two environments available on my Guidepad instance. These are two environments I can use to run Guidepad services. Now you may have seen I deleted an attribute where we were requesting the control plane class. And I just want to show this in its full value so we can see what's happening. But our AWS environment is configured to specifically run Python code as we can see by the control plane
5:16 Guidepad Environments and Control Planes
5:53 class. Now this control plane class could be Rust, Go, Kubernetes, Ansible, and so forth. Guidepad already supports a bunch of different types of environments off the bat that you can use. And, of course, based on today's video, one of them is Terraform. So what do these control plane classes look like? Let's go into the Guidepad source code where we can open Guidepad, environment, and control planes. If we pop open Terraform, we will see the code. Now all this does is give Guidepad enough information to tell it how to execute Terraform programs. It knows it needs to do a Terraform
6:14 Terraform Control Plane in Guidepad (Briefly)
6:40 init followed by a Terraform apply. Now because we're running this in an automated fashion, of course, we need to auto-approve and there's not much point in running a plan. However, that doesn't mean you couldn't have a bespoke control plane that executes the plan and publishes it to a Slack channel, where it then waits for another event before executing the apply. With Guidepad, anything is possible. What's also very cool is that Guidepad handles state management for you. It stores the outputted state files as artifacts within the Guidepad system, meaning you don't need to worry about where
7:04 Guidepad State Management
7:17 to put it and so forth. With this control plane, we could, in theory, create a new service, deploy it to Guidepad, and all we'd have to do is provide the HCL. It would run Terraform on a loop based on an interval that we configure, and that would be job done. But, of course, you can do that with your CI/CD systems, with a cron job interval, and there wouldn't really be much of a value proposition other than a managed state file and secrets being in a consolidated place. So, yeah, there are some benefits. However, we want to talk about the future of
7:55 infrastructure as code, and that means real time and event driven. So we're not gonna use the Terraform control plane, although it exists. We're going to try something different. And as I said, I gave this challenge to the Guidepad team. And I wanna just highlight that when I gave this to them, they came back to me with a solution in not months, not weeks, but days. Two days to be exact. And we'll take a look at the code. It's probably less than 200 lines of Python. Let's take a look. So let's take a look at our service.
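For contrast with what follows, the interval-driven Terraform control plane described above can be sketched in a few lines of Python. This is only an illustration, not Guidepad's actual control plane: the function names `terraform_commands`, `run_once`, and `run_on_interval` are hypothetical, and the only Terraform details assumed are the standard `init`, `apply`, `-auto-approve`, `-input=false`, and `-chdir` options.

```python
import subprocess
import time
from typing import Callable, List


def terraform_commands(workdir: str) -> List[List[str]]:
    """Commands for one unattended run: init, then apply without a prompt."""
    return [
        ["terraform", f"-chdir={workdir}", "init", "-input=false"],
        ["terraform", f"-chdir={workdir}", "apply", "-auto-approve", "-input=false"],
    ]


def run_once(workdir: str, runner: Callable = subprocess.run) -> None:
    """Run init and apply once; `runner` is injectable so this can be dry-run."""
    for cmd in terraform_commands(workdir):
        runner(cmd, check=True)


def run_on_interval(
    workdir: str,
    interval_seconds: float,
    iterations: int,
    runner: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Polling loop: drift is only noticed when the next interval comes around."""
    for i in range(iterations):
        run_once(workdir, runner)
        if i < iterations - 1:
            sleep(interval_seconds)
```

The point of the sketch is the limitation: nothing happens between intervals, which is exactly the gap the event-driven service is meant to close.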
8:34 Deep Dive into the Event-Driven S3 Service Code (Python)
8:37 It is called S3 provider, and it is 105 lines long. Now there's a lot of things that were implemented here that were done just to keep things simple and quick, like embedding Python code into a variable. We could fetch this another way by using a git repository that we pull in, loading local files or remote files and so forth. However, for today, we have a very simple Python program. This is the boto package, which speaks to the AWS API. The bucket name is just going to be the name of the service. If we want
9:16 to manage more than one bucket, we deploy more than one service. We could take this a step further and actually define a bucket type within Guidepad, and then we're just creating a type and the service manages the resources. But for today's demo, let's keep things simple. This fetches our S3 credentials from the environment and then does a list buckets. If it doesn't find a bucket, it creates it with our configured ACL. And if it does find the bucket, it just makes sure that the ACL is correct. Now there's a few things I wanna point out here because I'm sure you're staring at
9:57 this and yelling, why are we doing it this way? Yes. We are writing handcrafted Python code to handle resource creation on a cloud provider. But let me reiterate that the Terraform resource model is broken. It only understands CRUD: create, read, update, and delete. It understands that you can update some resources, but there are times where it'll have to do a delete and a recreate. With the Python approach, we can build in much more smarts. At the start of this video, I talked about renaming a bucket. And I'm sure a few of you realized that, well, in most
10:01 Advantages of Custom Code (Handling Complex Scenarios)
10:38 cloud providers, you can't rename the buckets. You have to actually create a new one and copy over all the files and then delete the old one. And Terraform would delete the bucket. However, it wouldn't copy over all your files. With this approach, we could bake that in. We could create a new bucket, copy all the files, and delete the old one because we're writing the code. We get to decide how this works. For an undeploy, all we're doing is deleting all the objects and our bucket. Again, for this, we could decide that we want to keep a backup. Now, of
11:17 course, there are life cycle policies that we could apply across AWS and other cloud providers. But again, we're trying to keep things relatively simple for today. The goal here is not to show you a feature-complete solution, but to give you an idea of how you could build up, using Guidepad primitives, a future-facing infrastructure as code tool. So stay with me. Now, we're using a state machine as part of this service to understand what phases we go through. Now we keep things simple in this demo where we go from deployed state to deployed state, much like always running a Terraform apply.
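The bucket handling described in this chapter (create if missing, repair the ACL, undeploy, and the hypothetical rename) can be sketched roughly as follows. This is not the 105-line service from the video; the function names are invented for illustration. The `client` argument is assumed to be a boto3 S3 client (for example `boto3.client("s3")`), and only standard S3 client calls are used: `list_buckets`, `create_bucket`, `put_bucket_acl`, `list_objects_v2`, `copy_object`, `delete_object`, and `delete_bucket`.

```python
from typing import Any, Iterator


def iter_keys(client: Any, bucket: str) -> Iterator[str]:
    """Yield every object key in the bucket, following pagination tokens."""
    kwargs = {"Bucket": bucket}
    while True:
        page = client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            return
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def ensure_bucket(client: Any, name: str, acl: str = "private") -> str:
    """Create the bucket if it is missing, otherwise re-apply the desired ACL."""
    names = {b["Name"] for b in client.list_buckets().get("Buckets", [])}
    if name not in names:
        # Assumption: us-east-1. Other regions also need CreateBucketConfiguration.
        client.create_bucket(Bucket=name, ACL=acl)
        return "created"
    client.put_bucket_acl(Bucket=name, ACL=acl)
    return "acl-enforced"


def undeploy_bucket(client: Any, name: str) -> None:
    """Delete all objects, then the bucket (assumes an unversioned bucket)."""
    for key in list(iter_keys(client, name)):
        client.delete_object(Bucket=name, Key=key)
    client.delete_bucket(Bucket=name)


def rename_bucket(client: Any, old: str, new: str, acl: str = "private") -> None:
    """'Rename' by creating the new bucket, copying objects, and removing the old."""
    ensure_bucket(client, new, acl)
    for key in list(iter_keys(client, old)):
        client.copy_object(
            Bucket=new, Key=key, CopySource={"Bucket": old, "Key": key}
        )
    undeploy_bucket(client, old)
```

Because the client is passed in, the same functions can be exercised against an in-memory stand-in before being pointed at a real account. A backup step before `undeploy_bucket`, as mentioned above, would slot in as one more function call.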
11:40 Guidepad State Machines for Resource Management
11:55 But this could be elaborated on. And in fact, I think I may spend a little bit more time in a future video building something a bit more feature complete. But all we're doing is telling it to run the deploy commands for the deployed phase and the undeploy commands for the not deployed phase. Simple. We then configure the state plan with the two different phases, deployed and not deployed. And job done. So let's deploy this and then see what happens. So the first thing we're going to do is run the AWS s3api command to get the ACL for our existing bucket.
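The two-phase state plan described above can be pictured with a small, generic state machine. This is a sketch of the idea only, not Guidepad's state machine API: the class `StateMachine`, the helper `build_s3_machine`, and the phase names `deployed` and `not_deployed` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Action = Callable[[], None]


@dataclass
class StateMachine:
    """Tracks the current phase and runs an action on each allowed transition."""

    phase: str = "not_deployed"
    transitions: Dict[Tuple[str, str], Action] = field(default_factory=dict)

    def add_transition(self, src: str, dst: str, action: Action) -> None:
        self.transitions[(src, dst)] = action

    def transition(self, dst: str) -> None:
        key = (self.phase, dst)
        if key not in self.transitions:
            raise ValueError(f"no transition from {self.phase} to {dst}")
        self.transitions[key]()  # run the deploy or undeploy commands
        self.phase = dst


def build_s3_machine(deploy: Action, undeploy: Action) -> StateMachine:
    """Wire up the demo's shape: deploy, reconcile (deployed to deployed), undeploy."""
    machine = StateMachine()
    machine.add_transition("not_deployed", "deployed", deploy)
    machine.add_transition("deployed", "deployed", deploy)  # reconciliation
    machine.add_transition("deployed", "not_deployed", undeploy)
    return machine
```

The self-transition from `deployed` to `deployed` is what makes reconciliation possible: re-entering the same phase simply reruns the deploy commands. A more elaborate machine would add phases and transitions rather than change this mechanism.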
12:25 **Demonstration:** Event-Driven S3 Bucket Reconciliation
12:45 Well, we have a bucket. Now what? Well, let's run our Guidepad service like so. We can then come down here and, using the AWS s3api on the command line, we're just going to delete our bucket, like so. And what's going to happen is our service is going to start, realize the bucket doesn't exist and create it. We'll then delete it again. This time, it's going to detect the CloudTrail event and then rerun the create command. And as we can see, the bucket has been recreated. Now, we can continue to delete this as many times as we want and it will
13:39 continue to come back because we have a long-running process which is polling the AWS API as well as subscribing to the CloudTrail log to monitor for events. So how do we make this work with Guidepad? Let's take a look at some code. Inside of our plugin, we have requirements.py. In this, we define a new class which extends requirement, called CloudTrail requirement. All this requirement is doing is allowing us to set a number of lookups and parameters to filter the CloudTrail log and then to trigger things within Guidepad. In this case, we're using a state machine and
13:53 How it Works: CloudTrail Requirements & Event Detection
14:23 a state transition. Our state transition has some CloudTrail requirements that, if they become true, cause the transition to be executed from deployed to deployed, meaning reconciled. Now we could have different handlers for each type of event and have an array of phases where we handle lots of specific scenarios, but that would require me sitting here for hours upon hours showing you all the amazing stuff that we can do. And we want to keep this brief. So let's come over to Guidepad and we're going to take a look at our state transition. Specifically, we want to see the requirements.
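To make the idea of a CloudTrail requirement more concrete, here is a minimal sketch of event filtering with a logical OR. It is not the code in the plugin's requirements.py: the classes `CloudTrailRequirement` and `AnyOf` and the function `handle_event` are hypothetical. The record fields used (`eventSource`, `eventName`, `requestParameters.bucketName`) follow the shape of CloudTrail records for S3 bucket-level events, and the event names are the two the demo monitors.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class CloudTrailRequirement:
    """Matches one CloudTrail event name for one bucket."""

    event_name: str
    bucket_name: str
    event_source: str = "s3.amazonaws.com"

    def satisfied_by(self, record: Dict[str, Any]) -> bool:
        params = record.get("requestParameters") or {}
        return (
            record.get("eventSource") == self.event_source
            and record.get("eventName") == self.event_name
            and params.get("bucketName") == self.bucket_name
        )


@dataclass(frozen=True)
class AnyOf:
    """Logical OR: satisfied when at least one child requirement matches."""

    requirements: List[CloudTrailRequirement]

    def satisfied_by(self, record: Dict[str, Any]) -> bool:
        return any(r.satisfied_by(record) for r in self.requirements)


def handle_event(
    record: Dict[str, Any], requirement: AnyOf, reconcile: Callable[[], None]
) -> bool:
    """Run the reconcile transition when the record satisfies the requirement."""
    if requirement.satisfied_by(record):
        reconcile()
        return True
    return False


def demo_requirement(bucket: str) -> AnyOf:
    """The two events the demo reacts to, combined with OR."""
    return AnyOf(
        [
            CloudTrailRequirement("PutBucketPublicAccessBlock", bucket),
            CloudTrailRequirement("DeleteBucket", bucket),
        ]
    )
```

In this shape, `reconcile` would be whatever triggers the deployed-to-deployed transition, and events for other buckets or other API calls fall through without side effects.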
14:56 Guidepad User Interface Overview
15:07 Now what we can see here is that we have a logical OR and a list of requirements, with the first one being a type of AWS CloudTrail where it is monitoring for the event name PutBucketPublicAccessBlock. We then have another AWS CloudTrail where the event name is DeleteBucket. So if any of these events come through and satisfy our requirements, our state transition will be executed and voila, everything that we want to be true for our bucket remains true. Now, of course, because everything in Guidepad is data written to a system, we can query in this
15:50 way. You may be thinking, isn't there something a little bit nicer where I don't have to learn a ton of commands? And yes, there is a UI. From here, we can see my Guidepad dashboard. We have lots of services, lots of deployments, lots of events, a pretty solid overview of everything happening at the moment. We can click on state machines where we can take a look at our S3 provider and we can see that it's a relatively trivial state machine that just goes from deployed to deployed. But you can imagine as you build out
16:28 more complicated scenarios with lots of stages and phases and transitions, that this diagram is a very good visual representation of what your provider is doing. We could also dig into the state plan where we can take a look at our S3 provider and we can see the code that we've seen earlier inside of our plugin and the Python file. And you can browse anything you want using the UI. If I jump over to types, you can see the types available within my plugin itself and the global Guidepad system. Like so. So that's a rough idea
17:11 Recap: Guidepad Primitives Used
17:13 of what we can do to improve infrastructure as code using a system like Guidepad. It provides a whole bunch of primitives that allow you to build anything you can dream up. In this case, we took advantage of Guidepad environments and control planes. We used an AWS environment, which means our service had the ability to query the AWS API because the environment provided secret credentials that could be consumed at runtime. We used a very specific control plane that allowed us to execute Python. And just as a reminder, that control plane could be written by you in less than a hundred lines of code
17:56 to run Go, Rust, Perl, PHP, OCaml, whatever you want. We then used the state machine and state transition primitives from Guidepad, which take requirements. These requirements allow you to hook into any system. That could be CloudTrail, it could be webhooks, it could be syslog, it could be container logs. You can filter and transform those events and handle them to emit more events. And when those requirements are satisfied, you can execute a service or a state transition. That state transition is part of a state machine that can have one, two, or more phases depending on the complexity of the reconciliation that
18:43 you want, allowing you to build in real-time or near-real-time event loops for handling events within your infrastructure. CloudTrail and an S3 bucket were just the easiest implementation to show you for today's video. There's no reason those requirements couldn't hook into the Kubernetes audit log, allowing you to get information out of your Kubernetes cluster to detect potential bad actors. You could also monitor DNS records, TLS certificate expiry times, and anything else in your infrastructure through a single control plane, which allows you to use a command line interface or a UI to build that understanding.
19:29 Conclusion & Final Thoughts
19:29 I've struggled to decide or describe what Guidepad really is, but I am confident that it is a new platform for building sophisticated, distributed, event-driven applications. I hope you enjoyed this video. We'll see you all next time. And to the Guidepad team, thank you for letting me play with this. It's been a whole lot of fun.