So far in this blog series, we’ve talked about the configurer and
pluginsync, and how those both use the catalog.
However, neither of those posts gets into the nuts and bolts of how the
catalog is actually applied. The catalog itself is only really a data
structure, and when you call catalog.apply, the catalog hands off all of
the work to the Transaction.
The Transaction is the part of Puppet that drives the actual state changes performed on the system. Part of this process involves handling dependencies and ordering for resources (which is harder than you might expect). Another part is taking a single resource, determining what state it’s currently in, and applying changes if they’re required. The Transaction is also responsible for recording all of these events, like resources being out of sync and how they were synced, and logging them accordingly.
The Puppet::Transaction is designed in a similar manner to the
Puppet::Configurer class insofar as they both act as what I’m going to call
“execution classes.” That is, they are built to coordinate and perform a set of
complex operations. With things like these, you’re unlikely to do more with a
Transaction than instantiate an object and then call a method that kicks off all
of the resulting behavior. In the case of the Configurer this is the #run
method; in the case of the Transaction it’s the #evaluate method.
NOTE:
While traversing the graph and the ordering used for this traversal are core to the Transaction, I’m going to blithely skip over them because this is a very big topic by itself. Instead, I’m going to mainly focus on how the data from a catalog gets converted to actual changes on a system.
Before we dive into the work that the Transaction performs, let’s look at the data that it’s going to be using.
Instance Attributes
attr_accessor :catalog
Class: Puppet::Resource::Catalog
When catalog#apply is called, the catalog creates a new Transaction and
passes itself to the new Transaction. This is how the Transaction is able to
inspect the Catalog that created it.
attr_accessor :ignoreschedules
Class: Boolean
Puppet can be configured to only apply resources during specific schedule windows, so you can avoid running resource-intensive commands during peak times. The :ignoreschedules option can be turned on to ensure that all resources are applied regardless of schedule. This is handled in a somewhat strange way, because :ignoreschedules is also a respected Puppet option, and Puppet checks on a per-resource basis to see if the system-wide option is set. It seems to be a code path duplication.
attr_accessor :for_network_device
Class: Boolean
Puppet is able to configure network devices like F5 load balancers and Cisco switches via a proxy host; if this catalog is destined for a device then this will be enabled. This option is used to prevent insane things from happening, like device resources being applied to a host and vice versa.
attr_reader :report
Class: Puppet::Transaction::Report
The report is instantiated by the Configurer and passed to the Catalog, which then hands the report instance to the Transaction. It’s created in the Configurer because the Configurer is responsible for sending the report at the end of the run. The Transaction passes all events, such as state changes, to the report, and the Catalog acts as an intermediary between the two.
attr_reader :event_manager
Class: Puppet::Transaction::EventManager
The event manager is responsible for inter-resource events. When you have resources doing notifications and subscriptions, this class is responsible for propagating events. Say you have a bunch of configuration files that notify a service: as the files are processed, their notifications are queued up. When the service is evaluated, it’ll process all of those queued events.
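The queue-then-process behavior can be sketched with a toy class. Everything here (ToyEventManager, its method names, and the subscription map) is invented for illustration and is not the real Puppet::Transaction::EventManager API.

```ruby
# Toy model of queuing notifications until the notified resource is evaluated.
# All names are hypothetical; this is not Puppet's actual event manager.
class ToyEventManager
  # subscriptions maps a source resource name to the names it notifies.
  def initialize(subscriptions)
    @subscriptions = subscriptions
    @queued = Hash.new { |hash, target| hash[target] = [] }
  end

  # Called after a source resource is applied: queue its events per target.
  def queue_events(source, events)
    @subscriptions.fetch(source, []).each do |target|
      @queued[target].concat(events)
    end
  end

  # Called when the target itself is evaluated: return and clear its queue.
  def process_events(target)
    @queued.delete(target) || []
  end
end

manager = ToyEventManager.new({
  "File[/etc/ntp.conf]" => ["Service[ntp]"],
  "File[/etc/ntp.keys]" => ["Service[ntp]"]
})
manager.queue_events("File[/etc/ntp.conf]", ["content changed"])
manager.queue_events("File[/etc/ntp.keys]", ["mode changed"])
puts manager.process_events("Service[ntp]").inspect
```

The point of the sketch is only the timing: events pile up as the files are applied, and nothing happens to the service until its own turn comes.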
attr_reader :resource_harness
Class: Puppet::Transaction::ResourceHarness
The resource harness is the actual component that takes a resource, inspects its state, ensures that it’s present or absent, and then synchronizes each one of the properties. This is the actual point where all the higher layers of Puppet like the Catalog and Transaction interact with the underlying layer of Types and Providers.
attr_reader :prefetched_providers
Class: Hash<String:'type name', Hash<String:'provider class name', Class<Puppet::Provider>>>
Holy nested data structures, batman! This attribute is used to track which provider instances have been prefetched. Since prefetching is lazy and only occurs before the first instance of that provider is evaluated, this is used to track what has and has not been prefetched.
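A minimal sketch of this kind of bookkeeping, assuming a two-level hash keyed by type name and then provider name. ToyPrefetchTracker and its methods are made up for this example, and the innermost value is simplified to a plain marker.

```ruby
# Toy model of lazy, once-only prefetching. Hypothetical names throughout;
# the innermost hash value is simplified to `true` for illustration.
class ToyPrefetchTracker
  attr_reader :prefetch_calls

  def initialize
    # type name => { provider name => marker }
    @prefetched = Hash.new { |hash, type_name| hash[type_name] = {} }
    @prefetch_calls = []
  end

  # Prefetch only the first time a given type/provider pair is seen.
  def prefetch_if_necessary(type_name, provider_name)
    return false if @prefetched[type_name][provider_name]

    @prefetch_calls << [type_name, provider_name] # stand-in for the real work
    @prefetched[type_name][provider_name] = true
  end
end

tracker = ToyPrefetchTracker.new
tracker.prefetch_if_necessary("package", "apt")
tracker.prefetch_if_necessary("package", "apt") # already done, so skipped
puts tracker.prefetch_calls.length
```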
There are also a few attributes that aren’t actually referenced in any of the code, which is interesting. Their presence doesn’t hurt anything so they linger on. They’re like the hips on whales; entirely vestigial.
attr_accessor :configurator
Source code isn’t generally read top to bottom, and when you do this some interesting things jump out. This seems to be a reference to the Configurer, but funny story - it’s never used. After some serious git bisecting, I tracked this down to this commit, and while the changes were removed, the attr_accessor was never removed. Since it didn’t hurt anything, it hasn’t been removed, and has been left adrift in a sea of code, a remnant of the distant past…
attr_accessor :component
This appears to be a reference to the Puppet Component type, but it’s not used anywhere either. As a matter of fact, I wasn’t able to figure out when it was ever used.
Evaluating a transaction
So now that we have a general idea of what sort of data the Transaction is
going to be using, let’s see how it does all sorts of neat things, starting at
#evaluate.
#evaluate
The #evaluate function itself is reasonably simple at first glance. It calls
#add_dynamically_generated_resources, and then proceeds to run over all
resources in the graph and call #eval_resource on each one. If you are using
the --evaltrace option, then it will print out how long it took to evaluate
each resource. If you haven’t used the --evaltrace option, definitely give
it a shot - it can give you a lot of insight as to what resources are taking the
most time.
So this method leads us to two methods, each of which is a rabbit hole in its own right:
- Generate additional resources at runtime, with the #add_dynamically_generated_resources method.
- Traverse the graph and for each resource, evaluate it with the #eval_resource method.
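The overall shape of #evaluate can be sketched as follows. This is a toy, not the real method: ToyEvaluateLoop is an invented class, the resources are plain strings, and iterating a list in order stands in for the graph traversal that this post skips over. The trace message format is also an assumption.

```ruby
# Toy outline of the evaluate loop with optional per-resource timing.
# Hypothetical class; list order stands in for real graph traversal.
class ToyEvaluateLoop
  attr_reader :evaluated, :trace

  def initialize(resources, evaltrace: false)
    @resources = resources
    @evaltrace = evaltrace
    @evaluated = []
    @trace = []
  end

  def evaluate
    add_dynamically_generated_resources
    @resources.each do |resource|
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      eval_resource(resource)
      next unless @evaltrace

      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      @trace << format("%s: evaluated in %.2f seconds", resource, elapsed)
    end
    self
  end

  private

  # Placeholder: the real step may add resources to the catalog.
  def add_dynamically_generated_resources; end

  # Placeholder: the real step checks, applies, and handles events.
  def eval_resource(resource)
    @evaluated << resource
  end
end

run = ToyEvaluateLoop.new(["File[/etc/motd]", "Service[ntp]"], evaltrace: true)
puts run.evaluate.trace
```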
#add_dynamically_generated_resources
Some resources need to be able to generate other resources in order to do their
job - for instance, some resource might need to ensure that other resources are
present or absent. The tidy and resources types are the primary examples
of this; these resources primarily exist to remove or manipulate other
resources. The #add_dynamically_generated_resources method itself is pretty
simple; it runs over all the resources in the catalog and passes them to the
#generate_additional_resources method. This obviously named method checks a
single resource to see if it responds to the #generate method, and if it does
respond to that method then it’s run and those resources are added to the
catalog.
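Here is a small sketch of that "does it respond to #generate?" check. The GenResource, GenTidy, and GenCatalog names are invented for the example, and the generated resource is just a stand-in for whatever a real tidy resource would produce.

```ruby
# Toy model of generating extra resources at runtime. Hypothetical names.
GenResource = Struct.new(:name)

# A resource that knows how to generate other resources.
class GenTidy < GenResource
  def generate
    [GenResource.new("#{name}/old.log")]
  end
end

class GenCatalog
  attr_reader :resources

  def initialize(resources)
    @resources = resources
  end

  # Walk a snapshot of the resources so additions don't affect the loop.
  def add_dynamically_generated_resources
    @resources.dup.each { |resource| generate_additional_resources(resource) }
  end

  # Only resources that respond to #generate contribute anything.
  def generate_additional_resources(resource)
    return unless resource.respond_to?(:generate)

    @resources.concat(resource.generate || [])
  end
end

catalog = GenCatalog.new([GenTidy.new("/tmp"), GenResource.new("/etc/motd")])
catalog.add_dynamically_generated_resources
puts catalog.resources.map(&:name).inspect
```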
#eval_resource
This method is an important point of demarcation. Before we reached this point, everything was about high level constructs - fetching catalogs, applying them, generating reports, and so on. There was very little focus on the actual application of resources. From here on out though, we’re drilling down to the actual method calls that cause state changes.
This method uses the #skip? method to check whether the resource should be
skipped. If it shouldn’t be skipped, then it’s passed to #apply for
evaluation. After this is done, the resource is passed to the event manager
to handle any events that the resource generated.
#skip?
This method runs over the checks to see if the resource should be/can be applied. It has to check the following:
- If we’re only applying resources with a certain tag, does this resource have that tag?
- If the resource is scheduled, are we in that schedule window?
- Does this resource have any failed dependencies?
- Is this resource entirely virtual?
- Is this resource on the right device type? I.e., is this a device resource and has the :for_network_device attribute been set?
If any condition fails, then this resource is skipped and Puppet goes on its merry way. If you’ve ever seen this line…
“Skipping because of failed dependencies”
Then this is the method that notes the failure, throws this warning, and skips the resource.
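The checks above can be sketched as a chain of conditions. This is a toy under stated assumptions: SkipCandidate and ToySkipCheck are invented, each check is reduced to a boolean field on the resource, and every reason string other than the quoted warning is made up.

```ruby
# Toy version of the skip checks. Hypothetical names; each real check is
# reduced to a simple field on the candidate resource.
SkipCandidate = Struct.new(:tags, :in_schedule, :failed_dependencies,
                           :virtual, :device, keyword_init: true)

class ToySkipCheck
  def initialize(only_tags: [], ignoreschedules: false, for_network_device: false)
    @only_tags = only_tags
    @ignoreschedules = ignoreschedules
    @for_network_device = for_network_device
  end

  # Returns a reason string if the resource should be skipped, else nil.
  def skip_reason(resource)
    if !@only_tags.empty? && (@only_tags & resource.tags).empty?
      "missing required tag"
    elsif !@ignoreschedules && !resource.in_schedule
      "not scheduled"
    elsif resource.failed_dependencies
      "Skipping because of failed dependencies"
    elsif resource.virtual
      "virtual resource"
    elsif resource.device != @for_network_device
      "wrong device type"
    end
  end

  def skip?(resource)
    !skip_reason(resource).nil?
  end
end

resource = SkipCandidate.new(tags: ["web"], in_schedule: true,
                             failed_dependencies: true, virtual: false,
                             device: false)
puts ToySkipCheck.new.skip_reason(resource)
```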
#apply
This method is responsible for handing the resource over to the resource
harness, recording the result of the evaluation, and queuing up events emitted
by the resource during the evaluation. This is the last method call inside of the
Transaction before the resource is handed off to
Puppet::Transaction::ResourceHarness#evaluate.
The Resource Harness
While we’ve been continually drilling down through high level concepts and degrees of abstraction, the transition from the Transaction to the ResourceHarness is the last level of abstraction that we have to penetrate. A lot of work has been invested in Puppet to keep different components separate and delegate responsibilities.
This is the reason that the Transaction and the ResourceHarness were split; the Transaction handles the details of how an entire catalog is applied, while the ResourceHarness handles just the details of how a single resource is applied. While the responsibilities of the ResourceHarness are really just a subset of the Transaction’s, this split means that all the details of applying a single resource are contained in a single fairly small source file.
On top of handling resource application, the ResourceHarness also performs auditing. Auditing is a lesser known Puppet feature that is powered by the way that Puppet does modelling. For every resource, Puppet has a method of determining what the current state is, and what it should be. Auditing a resource allows you to specifically track resource changes to distinguish them from normal state changes. In addition, you can audit a resource without actually managing it, so you can use Puppet in a tripwire fashion to make sure that the system hasn’t been tampered with - for instance, you can audit /etc/shadow for changes without smashing people’s passwords.
Sadly, this code is a bit hairy. In fact, the complexity of this file is one of the reasons I started this blog, because no matter how many times I read this file I couldn’t retain understanding of how it works. The tests for this code are a bit insane; there are around a thousand test cases applied to this to make sure it’s behaving correctly. A while back there was a major rewrite of the Resource Harness, and since then there were only a handful of bugs in the code that were rapidly squashed, which indicates that this code is quite solid. Even so… it’s hairy.
#evaluate
The #evaluate method hints that Puppet::Transaction::ResourceHarness is
another “execution class,” meaning that this class is a sort of fire and forget
object. A ResourceHarness object is called with #evaluate and passed a
resource, and it does the necessary work to apply the resource to the system.
On top of applying a resource, this method generates a Puppet::Resource::Status, and captures all events to this Status object that were generated when the resource was applied.
It works by starting a timer, calling #perform_changes on the resource, and
capturing the events returned by this method. If the resource responds to
#flush, then it calls this method to perform any sort of final actions
desired by this resource, such as closing database connections or writing
files to disk.
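A rough sketch of that flow, using invented stand-ins: HarnessStatus plays the role of the status object, ToyFlushableResource is a fake resource whose changes are predetermined, and ToyHarness is not the real ResourceHarness.

```ruby
# Toy version of the harness evaluate flow. All names are hypothetical.
HarnessStatus = Struct.new(:resource_name, :events, :evaluation_time)

class ToyFlushableResource
  attr_reader :name, :flushed

  def initialize(name, pending_changes)
    @name = name
    @pending_changes = pending_changes
    @flushed = false
  end

  # Stand-in for real work: one event per pending change.
  def apply_pending_changes
    @pending_changes.map { |change| "#{name}: #{change}" }
  end

  # Final action after all changes, e.g. writing a file to disk.
  def flush
    @flushed = true
  end
end

class ToyHarness
  def evaluate(resource)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    events = perform_changes(resource)
    resource.flush if resource.respond_to?(:flush)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    HarnessStatus.new(resource.name, events, elapsed)
  end

  private

  def perform_changes(resource)
    resource.apply_pending_changes
  end
end

status = ToyHarness.new.evaluate(ToyFlushableResource.new("File[/etc/motd]", ["mode changed"]))
puts status.events.inspect
```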
#perform_changes
This is the method that directly inspects your system. It takes a resource, configures auditing for that resource if desired, checks for properties that are out of sync and synchronizes them accordingly, and generates events when things change. This method is the absolute core of Puppet, so there’s a lot of moving parts. Let’s break this into the different tasks that are being performed.
Configure auditing
For auditing to work, Puppet needs to have a few snapshots of the resource before any changes were made. The state of the resource from the last run is loaded from Puppet::Util::Storage so that Puppet can determine if the resource was modified outside of Puppet. This means that we have three states for a given resource:
- The state of the resource from the last Puppet run
- The state of the resource right now
- The desired state of the resource if we’re managing it.
Once the historical state from the last Puppet run is loaded and the current state is recorded, the current state is flushed to disk as the historical state for the next run of Puppet.
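The load-compare-store cycle might look something like this toy. ToyAuditStore is an invented class, a plain in-memory hash stands in for Puppet::Util::Storage, and the message wording is made up.

```ruby
# Toy model of comparing historical and current state between runs.
# Hypothetical class; a plain hash stands in for the on-disk state cache.
class ToyAuditStore
  def initialize(storage = {})
    @storage = storage
  end

  # Returns an audit message if the value changed since the last run,
  # and records the current value as the history for the next run.
  def audit(resource_name, current_value)
    historical_value = @storage[resource_name]
    @storage[resource_name] = current_value
    return nil if historical_value.nil? || historical_value == current_value

    "#{resource_name}: audit change, was #{historical_value}, now #{current_value}"
  end
end

store = ToyAuditStore.new
store.audit("File[/etc/shadow]", "md5:aaa")      # first run: nothing to compare
puts store.audit("File[/etc/shadow]", "md5:bbb") # second run: change detected
```

Note that no desired state appears anywhere in the sketch, which is what makes it possible to audit something without managing it.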
Side note: You know that Puppet::Util::Storage bit? That was created in the Configurer during preparation, so this is one of those places where this generic cache is used.
Checking and syncing properties
The next thing to be done is determining what work needs to be done. There are two main cases for this: if the ensure parameter is out of sync then the resource needs to be created or destroyed; otherwise, the properties need to be synced.
It may seem strange that if the ensure value is out of sync, then it’s the only property being synced. However, if the ensure parameter is out of sync then the resource is either absent and needs to be created, or exists and needs to be removed. The implication of this code is that if something is absent that should be present, making it present will synchronize all of its values. Alternately, if a resource exists but should be absent, we don’t need to touch any other properties if we’re about to blow away the entire resource.
If the resource has the right ensure value, then each one of the properties is checked. All of the resource properties are fetched and checked to see if they’re in sync. This is a very big point that helps clear up the difference between parameters and properties. Simply put, parameters do not have state. When applying a resource, Puppet does not do anything with parameters by themselves. They may serve to alter how a resource is applied, but it doesn’t directly matter if a parameter changes state.
If any property is out of sync, ensure included, it’s handed over to
#apply_parameter for the actual synchronization.
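The ensure-first decision can be sketched with plain hashes. This is an illustration only: ToySyncPlanner is an invented module, and current and desired state are reduced to hashes of property name to value.

```ruby
# Toy version of deciding which properties need syncing. Hypothetical module;
# current and desired state are simplified to plain hashes.
module ToySyncPlanner
  module_function

  def out_of_sync(current, desired)
    if desired.key?(:ensure) && current[:ensure] != desired[:ensure]
      # Creating or destroying the resource is the only work to do.
      [:ensure]
    else
      # Otherwise compare every other managed property individually.
      desired.keys.reject do |name|
        name == :ensure || current[name] == desired[name]
      end
    end
  end
end

desired = { ensure: :present, mode: "0644", owner: "root" }
puts ToySyncPlanner.out_of_sync({ ensure: :absent }, desired).inspect
puts ToySyncPlanner.out_of_sync({ ensure: :present, mode: "0600", owner: "root" }, desired).inspect
```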
Record events
Puppet has a very robust event system, and since this method does a great deal
of work it has a lot of events to record. Every time a property is changed, that
emits an event that needs to be captured. These events drive the subscribe
and notify Puppet metaparameters, and events also make up the logs component
of the Report. These events are generated when the property is handed over to
#apply_parameter.
Record the change of any audited parameters
As properties are synchronized, if any of those properties is audited then that state change will be specially recorded for auditing. However, Puppet can audit resources that it doesn’t manage. After the normal properties are synchronized, all of the unmanaged but audited resources are checked to see if the historical values match the current values.
So #perform_changes applies state for a single resource. However, we need to
drop down one more level to see how the different parts of a resource are
applied. This final leap goes from a single resource, to a single property of a
given resource.
#apply_parameter
This method takes the responsibilities that #perform_changes has on a resource
and applies them to a single property. It takes the following arguments:
- property [Puppet::Property]: The resource property to apply.
- current_value [String]: The state of the resource as it is right now, as a string.
- do_audit [Boolean]: Whether this property is being audited.
- historical_value: The value of this property on the last run of Puppet.
The really annoying bit is that this method is very poorly named. This method will never apply an object of class Puppet::Parameter by itself (unless something has gone horribly, HORRIBLY wrong). Instead, it will only apply objects of Puppet::Property. When you see parameter in this context, think property. Yes, this is really fucking confusing.
Here’s how it works.
If the property is being audited, there’s a historical value, and it doesn’t match the current value, that means that something outside of Puppet changed this property, and an audit message is generated.
If the property is marked as noop, then an event is generated noting the current state and the (unapplied) desired state.
“current_value is X, should be Y (noop)”
This is the section of code that generates those events.
Otherwise, the property is updated on the system by calling property.sync. The
changes made are noted in an event.
This method call marks the crossover from the Catalog and Transaction to the Resource Abstraction Layer, which are the Types and Providers that you’re familiar with.
Any event generated by this method is passed back to #perform_changes, which
passes them back to the Transaction for further handling.
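Putting those three branches together in a toy form: ApplyToyProperty and ToyApply are invented for this sketch, events are plain strings rather than event objects, and the message wording (apart from the noop line quoted above) is made up.

```ruby
# Toy version of applying a single property. Hypothetical names; events are
# plain strings here rather than real event objects.
ApplyToyProperty = Struct.new(:name, :should, :noop) do
  attr_reader :synced

  # Stand-in for the call that would change the system.
  def sync
    @synced = true
  end
end

module ToyApply
  module_function

  def apply_parameter(property, current_value, do_audit, historical_value)
    events = []

    # Audit: something outside of Puppet changed the value since last run.
    if do_audit && !historical_value.nil? && historical_value != current_value
      events << "audit: #{property.name} was #{historical_value}, now #{current_value}"
    end

    if property.noop
      # Noop: report what would change without touching the system.
      events << "current_value is #{current_value}, should be #{property.should} (noop)"
    else
      property.sync
      events << "#{property.name} changed #{current_value} to #{property.should}"
    end

    events
  end
end

mode = ApplyToyProperty.new(:mode, "0644", true)
puts ToyApply.apply_parameter(mode, "0600", false, nil)
```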
So, that’s it. That’s how we go from a catalog to actual system state changes. Hopefully this demystifies some of the stuffing of Puppet.
The Transaction is the core of a Puppet run; it takes an abstract catalog and turns it into actual state changes. After this point, the next thing to look at is the Resource Abstraction Layer, or RAL. This is the type and provider layer, which is another big part of what makes Puppet so unique and powerful.
I would like to take this moment to proudly announce that this is my longest blog entry. I like words. Congratulations on making it all the way down here.
Addendum: Puppet 2.7.19 was used as the reference version for this blog.