Reading Puppet: The Transaction

So far in this blog series, we’ve talked about the configurer and pluginsync, and how both of them use the catalog. However, those posts didn’t get into the nuts and bolts of how a catalog is actually applied. The catalog itself is really only a data structure; when you call catalog.apply, the catalog hands off all of the actual work to the Transaction.

The Transaction is the part of Puppet that drives the actual state changes performed on the system. Part of this process involves handling dependencies and ordering for resources (which is harder than you might expect). Another part is taking a single resource, determining what state it’s currently in, and applying changes if they’re required. The Transaction is also responsible for recording all of these events, such as resources being out of sync and how they were synced, and logging them accordingly.

Puppet::Transaction is designed in a similar manner to Puppet::Configurer insofar as they both act as what I’m going to call “execution classes.” That is, they are built to coordinate and perform a set of complex operations. With classes like these, you’re unlikely to do more than instantiate an object and then call a single method that kicks off all of the resulting behavior. In the case of the Configurer this is the #run method; in the case of the Transaction it’s the #evaluate method.
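To make the “execution class” idea concrete, here is a minimal sketch using a hypothetical MiniTransaction class (not Puppet’s actual code): the caller constructs the object with its inputs and calls one public method, and every internal step stays private.

```ruby
# Hypothetical "execution class": construct it with its inputs, then call a
# single public method that drives everything else. Not Puppet's source.
class MiniTransaction
  attr_reader :log

  def initialize(resources)
    @resources = resources
    @log = []
  end

  # The one entry point callers use, similar in spirit to Transaction#evaluate.
  def evaluate
    @resources.each { |name| apply(name) }
    self
  end

  private

  # Internal steps stay private; callers never orchestrate them directly.
  def apply(name)
    @log << "applied #{name}"
  end
end

log = MiniTransaction.new(["Package[ntp]", "File[/etc/ntp.conf]", "Service[ntp]"]).evaluate.log
```

The appeal of this shape is that callers cannot get the ordering of internal steps wrong, because they never see those steps.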

NOTE:

While traversing the graph, and the ordering used for that traversal, are core to the Transaction, I’m going to blithely skip over them because they’re a very big topic by themselves. Instead, I’m going to focus mainly on how the data from a catalog gets converted to actual changes on a system.

Before we dive into the work that the Transaction performs, let’s look at the data that it’s going to be using.

Instance Attributes

attr_accessor :catalog

Class: Puppet::Resource::Catalog

When Catalog#apply is called, the catalog creates a new Transaction and passes itself to it. This is how the Transaction is able to inspect the Catalog that created it.

attr_accessor :ignoreschedules

Class: Boolean

Puppet can be configured to apply resources only during specific time windows, so you can avoid running resource-intensive commands during peak times. The :ignoreschedules option can be turned on to ensure that all resources are applied regardless of their schedules. This is handled in a somewhat strange way: :ignoreschedules is also a global Puppet setting, and Puppet checks on a per-resource basis whether that system-wide setting is enabled, so this looks like a duplicated code path.

attr_accessor :for_network_device

Class: Boolean

Puppet is able to configure network devices like F5 load balancers and Cisco switches via a proxy host; if this catalog is destined for a device, this will be enabled. This option prevents insane things from happening, like device resources being applied to a host and vice versa.

attr_reader :report

Class: Puppet::Transaction::Report

The report is instantiated by the Configurer and passed to the Catalog, which then hands the report instance to the Transaction. The Configurer creates the report because it’s responsible for sending it at the end of the run; the Transaction records all events, such as state changes, in the report; and the Catalog acts as an intermediary between the two.
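A minimal sketch of how the catalog and report get threaded through to the Transaction. MiniReport, MiniCatalog, and MiniTransaction are hypothetical stand-ins, and the apply(report) signature is an assumption for illustration, not Puppet’s real API.

```ruby
# Hypothetical stand-ins showing how the report is threaded through.
# Names and signatures are illustrative, not Puppet's actual API.
class MiniReport
  attr_reader :events

  def initialize
    @events = []
  end
end

class MiniTransaction
  attr_reader :catalog, :report

  def initialize(catalog, report)
    @catalog = catalog # the catalog that created this transaction
    @report = report   # created upstream by the "configurer"
  end

  # Records one event per resource into the shared report.
  def evaluate
    catalog.resources.each { |r| report.events << "#{r}: changed" }
  end
end

class MiniCatalog
  attr_reader :resources

  def initialize(resources)
    @resources = resources
  end

  # The catalog is mostly data: apply hands the work to a new transaction,
  # passing itself along with the report it was given.
  def apply(report)
    MiniTransaction.new(self, report).evaluate
    report
  end
end

# The "configurer" side owns the report, so it can send it after the run.
report = MiniReport.new
MiniCatalog.new(["File[/etc/motd]"]).apply(report)
```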

attr_reader :event_manager

Class: Puppet::Transaction::EventManager

The event manager is responsible for inter-resource events. When you have resources doing notifications and subscriptions, this class propagates the events between them. Say you have a bunch of configuration files that notify a service: as each file is processed, the event manager queues up its notifications, and when the service resource is evaluated, it processes all of those queued events.
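The queue-then-process behavior can be sketched with a small hypothetical MiniEventManager. The subscription map and the queue_events/process_events method names are assumptions made for this illustration; they are not a claim about Puppet::Transaction::EventManager’s exact interface.

```ruby
# Hypothetical event manager: events from a source resource are queued for
# every resource subscribed to it, then handed over when the target is evaluated.
class MiniEventManager
  def initialize(subscriptions)
    @subscriptions = subscriptions          # source => [targets]
    @queue = Hash.new { |h, k| h[k] = [] }  # target => [queued events]
  end

  # Called after a resource is applied, with the events it produced.
  def queue_events(source, events)
    return if events.empty?
    @subscriptions.fetch(source, []).each do |target|
      @queue[target].concat(events.map { |e| "#{source}: #{e}" })
    end
  end

  # Called when the target is evaluated; returns and clears its queued events.
  def process_events(target)
    @queue.delete(target) || []
  end
end

mgr = MiniEventManager.new(
  "File[a.conf]" => ["Service[app]"],
  "File[b.conf]" => ["Service[app]"]
)
mgr.queue_events("File[a.conf]", ["content changed"])
mgr.queue_events("File[b.conf]", []) # unchanged file: nothing queued
pending = mgr.process_events("Service[app]")
```

Here only the changed file produces a notification, so the service sees a single queued event.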

attr_reader :resource_harness

Class: Puppet::Transaction::ResourceHarness

The resource harness is the component that actually takes a resource, inspects its state, ensures that it’s present or absent, and then synchronizes each of its properties. This is the point where the higher layers of Puppet, like the Catalog and Transaction, interact with the underlying layer of Types and Providers.

attr_reader :prefetched_providers

Class: Hash<String:'type name', Hash<String:'provider class name', Class<Puppet::Provider>>>

Holy nested data structures, Batman! This attribute tracks which providers have been prefetched. Since prefetching is lazy and only occurs right before the first resource using a given provider is evaluated, the Transaction needs to keep track of what has and has not been prefetched.
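Here is a sketch of lazy, once-per-provider tracking with a nested hash keyed by type name and then provider name. MiniPrefetcher and prefetch_if_necessary are hypothetical names, and for brevity the inner value is just a true flag rather than a provider class.

```ruby
# Sketch of lazy, once-per-provider prefetch tracking with a nested hash:
# type name => { provider name => flag }. All names here are hypothetical.
class MiniPrefetcher
  attr_reader :prefetched_providers, :prefetch_calls

  def initialize
    # Auto-create the inner hash the first time a type name is looked up.
    @prefetched_providers = Hash.new { |h, type| h[type] = {} }
    @prefetch_calls = []
  end

  # Called right before each resource is evaluated.
  def prefetch_if_necessary(type, provider)
    return if @prefetched_providers[type][provider]
    @prefetch_calls << "#{type}/#{provider}" # the expensive bulk lookup would go here
    @prefetched_providers[type][provider] = true
  end
end

fetcher = MiniPrefetcher.new
fetcher.prefetch_if_necessary("package", "apt")
fetcher.prefetch_if_necessary("package", "apt") # already prefetched: skipped
fetcher.prefetch_if_necessary("package", "gem")
```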


There are also a few attributes that aren’t actually referenced in any of the code, which is interesting. Their presence doesn’t hurt anything so they linger on. They’re like the hips on whales; entirely vestigial.

attr_accessor :configurator

Source code isn’t generally read top to bottom, and when you do read it that way, some interesting things jump out. This seems to be a reference to the Configurer, but funny story: it’s never used. After some serious git bisecting, I tracked this down to this commit; the code that used it was removed, but the attr_accessor itself was left behind. Since it doesn’t hurt anything, nobody has removed it, and it has been left adrift in a sea of code, a remnant of the distant past…

attr_accessor :component

This appears to be a reference to the Puppet Component type, but it’s not used anywhere either. As a matter of fact, I wasn’t able to figure out when it was ever used.

Evaluating a transaction

So now that we have a general idea of what sort of data the Transaction is going to be using, let’s see how it does all sorts of neat things, starting at #evaluate.

#evaluate

The #evaluate method itself is reasonably simple at first glance. It calls #add_dynamically_generated_resources, then iterates over all resources in the graph and calls #eval_resource on each one. If you are using the --evaltrace option, it will also print how long each resource took to evaluate. If you haven’t used --evaltrace, definitely give it a shot: it can give you a lot of insight into which resources are taking the most time.
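The loop just described can be sketched as follows. MiniTransaction is hypothetical, the helper bodies are left as placeholders, and the timing message only approximates the spirit of --evaltrace; its wording is not Puppet’s actual output.

```ruby
# Sketch of the #evaluate loop: add generated resources, then evaluate each
# resource, optionally recording timing in the spirit of --evaltrace.
# Hypothetical names; not Puppet's source.
class MiniTransaction
  attr_reader :messages

  def initialize(resources, evaltrace: false)
    @resources = resources
    @evaltrace = evaltrace
    @messages = []
  end

  def evaluate
    add_dynamically_generated_resources
    @resources.each do |resource|
      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      eval_resource(resource)
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
      @messages << format("%s: evaluated in %.2f seconds", resource, elapsed) if @evaltrace
    end
  end

  private

  # Placeholder: a real implementation would let resources generate others.
  def add_dynamically_generated_resources; end

  # Placeholder for the skip?/apply/event-handling flow.
  def eval_resource(resource); end
end

txn = MiniTransaction.new(["Package[ntp]", "Service[ntp]"], evaltrace: true)
txn.evaluate
```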

So this method leads us to two other methods, each of which is a rabbit hole in its own right:

#add_dynamically_generated_resources

Some resources need to generate other resources in order to do their job; for instance, a resource might need to ensure that other resources are present or absent. The tidy and resources types are the primary examples of this: they exist mainly to remove or manipulate other resources. The #add_dynamically_generated_resources method itself is pretty simple; it iterates over all the resources in the catalog and passes each one to the #generate_additional_resources method. This aptly named method checks whether a single resource responds to #generate, and if it does, calls it and adds the returned resources to the catalog.
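A sketch of the respond_to?(:generate) pattern, with a hypothetical TidyLike class standing in for a tidy-style resource and a plain array standing in for the catalog. The choice to iterate over a snapshot, so that newly generated resources are not revisited in the same pass, belongs to this sketch and is not a claim about Puppet’s behavior.

```ruby
# Sketch of dynamic resource generation: any resource responding to #generate
# may add more resources to the catalog. Classes here are hypothetical.
class TidyLike
  def initialize(dir, stale_files)
    @dir = dir
    @stale_files = stale_files
  end

  def to_s
    "Tidy[#{@dir}]"
  end

  # Returns new resources to manage: one absent file per stale entry.
  def generate
    @stale_files.map { |f| "File[#{File.join(@dir, f)}] ensure=absent" }
  end
end

def generate_additional_resources(resource, catalog)
  return unless resource.respond_to?(:generate)
  resource.generate.each { |extra| catalog << extra }
end

def add_dynamically_generated_resources(catalog)
  # Iterate over a snapshot so this pass doesn't revisit generated resources.
  catalog.dup.each { |resource| generate_additional_resources(resource, catalog) }
end

catalog = ["Package[ntp]", TidyLike.new("/tmp", ["a.log", "b.log"])]
add_dynamically_generated_resources(catalog)
```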

#eval_resource

This method is an important point of demarcation. Up to this point, everything was about high-level constructs: fetching catalogs, applying them, generating reports, and so on. There was very little focus on the actual application of resources. From here on out, though, we’re drilling down to the actual method calls that cause state changes.

This method uses #skip? to check whether the resource should be skipped; if not, the resource is passed to #apply for evaluation. After this is done, the resource is passed to the event manager so that any events queued for it can be processed.
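The per-resource flow can be sketched like this. Resource and MiniTransaction are hypothetical, and having the event-manager stand-in see every resource, whether or not it was applied, is an assumption of this sketch.

```ruby
# Hypothetical sketch of the per-resource flow; not Puppet's source.
Resource = Struct.new(:name, :skip)

class MiniTransaction
  attr_reader :statuses, :event_checks

  def initialize
    @statuses = {}
    @event_checks = []
  end

  def eval_resource(resource)
    if skip?(resource)
      @statuses[resource.name] = :skipped
    else
      apply(resource)
      @statuses[resource.name] = :applied
    end
    # Assumption: the event-manager stand-in sees every resource, applied or not.
    @event_checks << resource.name
  end

  private

  def skip?(resource)
    resource.skip
  end

  # Stand-in for handing the resource over to the resource harness.
  def apply(resource); end
end

txn = MiniTransaction.new
txn.eval_resource(Resource.new("File[/etc/motd]", false))
txn.eval_resource(Resource.new("Exec[reboot]", true))
```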

#skip?

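This method runs a series of checks to see whether the resource should, and can, be applied. Among other things, it checks whether the resource is scheduled to run now, whether it’s meant for a host or a network device, and whether all of its dependencies were applied successfully.

If any of these checks fails, the resource is skipped and Puppet goes on its merry way. If you’ve ever seen this line…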

“Skipping because of failed dependencies”

Then this is the method that notes the failure, logs this warning, and skips the resource.
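A hypothetical skip? sketch. The three checks (schedule, host versus device, failed dependencies) are assumptions drawn from the attributes covered earlier; the real method may check more, or check in a different order. MiniSkipper and Res are illustrative names.

```ruby
# Hypothetical skip? sketch. The set and order of checks are assumptions.
Res = Struct.new(:name, :scheduled, :device, :deps, keyword_init: true)

class MiniSkipper
  attr_reader :warnings

  def initialize(failed:, for_network_device: false, ignoreschedules: false)
    @failed = failed # names of resources that failed earlier in the run
    @for_network_device = for_network_device
    @ignoreschedules = ignoreschedules
    @warnings = []
  end

  # Returns true when the resource should NOT be applied.
  def skip?(res)
    if !@ignoreschedules && !res.scheduled
      true
    elsif res.device != @for_network_device # device resource on a host, or vice versa
      true
    elsif res.deps.any? { |d| @failed.include?(d) }
      @warnings << "#{res.name}: Skipping because of failed dependencies"
      true
    else
      false
    end
  end
end

skipper = MiniSkipper.new(failed: ["Package[nginx]"])
svc  = Res.new(name: "Service[nginx]", scheduled: true, device: false, deps: ["Package[nginx]"])
motd = Res.new(name: "File[/etc/motd]", scheduled: true, device: false, deps: [])
vlan = Res.new(name: "Vlan[42]", scheduled: true, device: true, deps: [])
results = [svc, motd, vlan].map { |r| skipper.skip?(r) }
```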

#apply

This method is responsible for handing the resource over to the resource harness, recording the result of the evaluation, and queuing up any events emitted by the resource during evaluation. This is the last method call inside the Transaction before the resource is handed off to Puppet::Transaction::ResourceHarness#evaluate.


The Resource Harness

While we’ve been continually drilling down through high-level concepts and layers of abstraction, the transition from the Transaction to the ResourceHarness is the last layer that we have to penetrate. A lot of work has been invested in Puppet to keep different components separate and to delegate responsibilities.

This is why the Transaction and the ResourceHarness were split: the Transaction handles the details of how an entire catalog is applied, while the ResourceHarness handles only the details of how a single resource is applied. Although the ResourceHarness’s responsibilities are really just a subset of the Transaction’s, this split means that all of the details of applying a single resource are contained in a single, fairly small source file.

On top of handling resource application, the ResourceHarness also performs auditing. Auditing is a lesser known Puppet feature that is powered by the way that Puppet does modelling. For every resource, Puppet has a method of determining what the current state is, and what it should be. Auditing a resource allows you to specifically track resource changes to distinguish them from normal state changes. In addition, you can audit a resource without actually managing it, so you can use Puppet in a tripwire fashion to make sure that the system hasn’t been tampered with - for instance, you can audit /etc/shadow for changes without smashing people’s passwords.

Sadly, this code is a bit hairy. In fact, the complexity of this file is one of the reasons I started this blog: no matter how many times I read it, I couldn’t retain an understanding of how it works. The tests for this code are a bit insane; there are around a thousand test cases making sure it behaves correctly. A while back the Resource Harness underwent a major rewrite, and since then only a handful of bugs have turned up in it, all rapidly squashed, which indicates that this code is quite solid. Even so… it’s hairy.

#evaluate

The #evaluate method hints that Puppet::Transaction::ResourceHarness is another “execution class,” meaning that it’s a sort of fire-and-forget object. You call #evaluate on a ResourceHarness object, pass it a resource, and it does the necessary work to apply that resource to the system.

On top of applying a resource, this method generates a Puppet::Resource::Status object and records in it all of the events that were generated while the resource was applied.

It works by starting a timer, calling #perform_changes on the resource, and capturing the events returned by that method. If the resource responds to #flush, it then calls that method to perform any final actions the resource needs, such as closing database connections or writing files to disk.
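A sketch of that shape: build a status, time the work, collect the events from perform_changes, and call flush if the resource supports it. Status, FlushingResource, and MiniHarness are hypothetical stand-ins, and perform_changes here just pretends one property changed.

```ruby
# Hypothetical resource harness; not Puppet's source.
Status = Struct.new(:resource, :events, :evaluation_time)

class FlushingResource
  attr_reader :flushed

  def initialize
    @flushed = false
  end

  def to_s
    "Mysql_user[app]"
  end

  # Final step after property changes, e.g. writing a batched update.
  def flush
    @flushed = true
  end
end

class MiniHarness
  def evaluate(resource)
    status = Status.new(resource.to_s, [], nil)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status.events.concat(perform_changes(resource))
    resource.flush if resource.respond_to?(:flush)
    status.evaluation_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    status
  end

  private

  # Stand-in: pretend exactly one property was synchronized.
  def perform_changes(resource)
    ["#{resource}: password changed"]
  end
end

res = FlushingResource.new
status = MiniHarness.new.evaluate(res)
```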

#perform_changes

This is the method that directly inspects your system. It takes a resource, configures auditing for that resource if desired, checks for properties that are out of sync and synchronizes them accordingly, and generates events when things change. This method is the absolute core of Puppet, so there are a lot of moving parts. Let’s break it into the different tasks it performs.

Configure auditing

For auditing to work, Puppet needs snapshots of the resource from before any changes are made. The state of the resource from the last run is loaded from Puppet::Util::Storage so that Puppet can determine whether the resource was modified outside of Puppet. This means that we have three states for a given resource: the historical state recorded during the previous run, the current state on the system, and the desired state declared in the catalog.

Once the historical state from the last Puppet run is loaded and the current state is recorded, the current state is flushed to disk as the historical state for the next run of Puppet.
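A sketch of the historical-versus-current half of this, using a hypothetical YAML-backed MiniStorage class loosely in the spirit of Puppet::Util::Storage (not its real interface). The audit helper and its message format are illustrative too; comparison against the desired state isn’t shown.

```ruby
require "yaml"
require "tmpdir"
require "fileutils"

# Hypothetical on-disk state cache; not Puppet::Util::Storage's actual API.
class MiniStorage
  def initialize(path)
    @path = path
    @data = File.exist?(path) ? YAML.safe_load(File.read(path)) : {}
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
  end

  def flush
    File.write(@path, @data.to_yaml)
  end
end

# Returns a message if the value changed since the last run, then records the
# current value as the next run's historical value.
def audit(storage, key, current)
  historical = storage[key]
  message = if !historical.nil? && historical != current
              "#{key} changed from #{historical} to #{current} since the last run"
            end
  storage[key] = current
  storage.flush
  message
end

dir = Dir.mktmpdir
path = File.join(dir, "state.yaml")
first  = audit(MiniStorage.new(path), "File[/etc/shadow]/mode", "0600") # no history yet
second = audit(MiniStorage.new(path), "File[/etc/shadow]/mode", "0644") # changed between runs
FileUtils.remove_entry(dir)
```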

Side note: You know that Puppet::Util::Storage bit? It was created by the Configurer during preparation, so this is one of the places where that generic cache is used.

Checking and syncing properties

The next step is determining what work needs to be done. There are two main cases: if the ensure value is out of sync, the resource needs to be created or destroyed; otherwise, the remaining properties need to be synced.

It may seem strange that if the ensure value is out of sync, it’s the only property being synced. However, if the ensure parameter is out of sync, then the resource is either absent and needs to be created, or exists and needs to be removed. The implication of this code is that if something is absent but should be present, making it present will synchronize all of its values. Alternatively, if a resource exists but should be absent, there’s no point touching any other properties when we’re about to blow away the entire resource.

If the resource has the right ensure value, then each one of the properties is checked. All of the resource properties are fetched and checked to see if they’re in sync. This is a very big point that helps clear up the difference between parameters and properties. Simply put, parameters do not have state. When applying a resource, Puppet does not do anything with parameters by themselves. They may serve to alter how a resource is applied, but it doesn’t directly matter if a parameter changes state.

If any property is out of sync, ensure included, it’s handed over to #apply_parameter for the actual synchronization.
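The ensure-first decision can be sketched as a plain function over two hashes of property values. properties_to_sync and the hash shapes are hypothetical. Only properties appear in the hashes, because parameters carry no state to compare.

```ruby
# Sketch of the decision: if ensure is out of sync, only ensure is synced;
# otherwise every out-of-sync property is. Data shapes are hypothetical.
# `desired` holds only properties; parameters have no state to compare.
def properties_to_sync(desired, current)
  if desired.key?(:ensure) && desired[:ensure] != current[:ensure]
    [:ensure]
  else
    desired.keys.reject { |name| name == :ensure || desired[name] == current[name] }
  end
end

create = properties_to_sync({ ensure: :present, mode: "0644" }, { ensure: :absent })
update = properties_to_sync({ ensure: :present, mode: "0644", owner: "root" },
                            { ensure: :present, mode: "0600", owner: "root" })
```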

Record events

Puppet has a very robust event system, and since this method does a great deal of work, it has a lot of events to record. Every time a property is changed, it emits an event that needs to be captured. These events drive the subscribe and notify metaparameters, and they also make up the logs component of the Report. The events are generated when the property is handed over to #apply_parameter.

Record the change of any audited parameters

As properties are synchronized, any that are audited have their state changes specially recorded for auditing. However, Puppet can also audit properties that it doesn’t manage. After the managed properties are synchronized, each unmanaged but audited property is checked to see whether its historical value matches its current value.
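A sketch of the audited-but-unmanaged check. unmanaged_audit_changes is a hypothetical helper: it removes managed properties from the audited list, then reports any remaining property whose historical value differs from its current value.

```ruby
# Hypothetical check for audited-but-unmanaged properties.
def unmanaged_audit_changes(audited, managed, historical, current)
  (audited - managed).filter_map do |prop|
    before = historical[prop]
    next if before.nil? || before == current[prop] # no history, or unchanged
    "#{prop}: #{before} -> #{current[prop]}"
  end
end

changes = unmanaged_audit_changes(
  [:content, :mode, :owner], [:mode],
  { content: "abc123", mode: "0644", owner: "root" },
  { content: "def456", mode: "0600", owner: "root" }
)
```

Here mode is managed, so its change is handled by normal synchronization; only the unmanaged content change is reported.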


So #perform_changes applies state for a single resource. However, we need to drop down one more level to see how the different parts of a resource are applied. This final leap goes from a single resource to a single property of that resource.

#apply_parameter

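This method takes the responsibilities that #perform_changes has for a whole resource and applies them to a single property. It takes the property to synchronize, its current value, whether it is being audited, and its historical value from the previous run.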

The really annoying bit is that this method is very poorly named. It will never apply an object of class Puppet::Parameter by itself (unless something has gone horribly, HORRIBLY wrong). Instead, it only applies Puppet::Property objects. When you see “parameter” in this context, think “property.” Yes, this is really fucking confusing.

Here’s how it works.

If the property is being audited, there’s a historical value, and it doesn’t match the current value, that means that something outside of Puppet changed this property, and an audit message is generated.

If the property is marked as noop, then an event is generated noting the current state and the (unapplied) desired state.

“current_value is X, should be Y (noop)”

This is the section of code that generates those events.

Otherwise, the property is updated on the system by calling property.sync. The changes made are noted in an event.
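The three branches just described can be sketched as follows. MiniProperty and Event are hypothetical, the argument order of this apply_parameter mirrors the description above but is an assumption, and the event messages only approximate the wording quoted earlier.

```ruby
# Hypothetical apply_parameter sketch: record any change made outside the run
# (when audited), then either record a noop event or sync the property.
Event = Struct.new(:property, :message)

class MiniProperty
  attr_reader :name, :should, :synced

  def initialize(name, should, noop: false)
    @name, @should, @noop = name, should, noop
    @synced = false
  end

  def noop?
    @noop
  end

  # A real provider would change the system here.
  def sync
    @synced = true
  end
end

def apply_parameter(property, current_value, audited, historical_value)
  events = []
  if audited && !historical_value.nil? && historical_value != current_value
    events << Event.new(property.name, "audit: #{historical_value} changed to #{current_value} outside the run")
  end
  if property.noop?
    events << Event.new(property.name, "current_value is #{current_value}, should be #{property.should} (noop)")
  else
    property.sync
    events << Event.new(property.name, "#{property.name} changed #{current_value} to #{property.should}")
  end
  events
end

mode = MiniProperty.new(:mode, "0644")
events = apply_parameter(mode, "0600", true, "0640")
dry = MiniProperty.new(:mode, "0644", noop: true)
dry_events = apply_parameter(dry, "0600", false, nil)
```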

This call to property.sync marks the crossover from the Catalog and Transaction to the Resource Abstraction Layer: the Types and Providers that you’re familiar with.

Any events generated by this method are passed back to #perform_changes, which passes them back to the Transaction for further handling.


So, that’s it. That’s how we go from a catalog to actual system state changes. Hopefully this demystifies some of the stuffing of Puppet.

The Transaction is the core of a Puppet run; it takes an abstract catalog and turns it into actual state changes. After this point, the next thing to look at is the Resource Abstraction Layer, or RAL. This is the type and provider layer, which is another big part of what makes Puppet so unique and powerful.

I would like to take this moment to proudly announce that this is my longest blog entry. I like words. Congratulations on making it all the way down here.

Addendum: Puppet 2.7.19 was used as the reference version for this blog.