Reading Puppet: the Configurer

Every time I dive into the source of Puppet, I seem to forget everything about as fast as I figure it out. I have the attention span of a small, overstimulated chipmunk, and there’s just a lot of detail to absorb, so it tends to slip out of my brain. In light of this, I’ve decided that I’m going to try to blog on each module/class that I manage to decipher. It’ll force me to get my thoughts in one place. I’m also hoping that this will help other people who go delving into the source.

Note: All of this is done against 2.7.x. While I would love to start tearing into 3.0.0, it introduces some new behavior that I don’t want to talk about yet.

Also, this is just what I’ve been able to derive while reading the source, so I could be wrong. If you find something erroneous, please find me in #puppet on freenode and let me know. (Yes, I need to add comments to my blog. It’s on the TODO list.)


Getting started: Puppet::Configurer

The Configurer is the heart of the normal Puppet agent. When you think about the different stages of a normal agent run, it’s all kicked off by the Configurer. It handles pluginsync, uploading facts, retrieving a catalog, applying the catalog, and then submitting the report.

The Configurer class doesn’t seem to be designed as a general-purpose class. From what I’ve gleaned, the expectation is that you’ll instantiate the object, call #run on it, and call it a day. But considering that it’s the class that drives pretty much everything, it’s definitely good to be familiar with it.

It’s also worth noting that the Configurer might eventually become obsolete. With the advent of Puppet Faces, the work that the Configurer does now can probably be replaced by assembling Faces. In fact, I believe the secret agent face does just this. It does make sense to see things moving from the monolithic, one-shot architecture used by the Configurer to behavior more akin to the secret agent face.

That being said, if you’re running puppet agent then you’re using this code.

Before we get started, this code makes heavy use of the indirector. If you aren’t familiar with the indirector, you should read Masterzen’s blog post on the indirector.

Alright, let’s go source diving.

Class Attributes

Puppet::Configurer.instance

Example:

Puppet::Configurer.new      # => Your configurer
Puppet::Configurer.instance # => the same configurer

It’s important to note that the Configurer is expected to be a singleton instance. If you instantiate a Configurer object, Configurer.instance is how you can get it.
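The pattern behind this is simple: the class remembers the most recently created object so other code can look it up later. Here’s a minimal, self-contained sketch of that idea (the class here is a stand-in, not Puppet’s actual implementation):

```ruby
# A minimal sketch of the "the instance" pattern: initialize stashes
# the new object on the class, and .instance hands it back.
class ExampleConfigurer
  # Just so we can specify that we are "the" instance.
  def self.instance
    @instance
  end

  def initialize
    self.class.instance_variable_set(:@instance, self)
  end
end

c = ExampleConfigurer.new
ExampleConfigurer.instance.equal?(c)  # the very same object
```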

Now, what’s interesting is that the agent itself handles locking. By the look of it, the Puppet::Agent class was split off from the Configurer. In both classes, there’s this comment:

# Just so we can specify that we are "the" instance.

It looks like when the Configurer and Agent were split, some of the locking/singleton logic was left behind here. This is mostly a historical artifact; it may not be used anymore, but it will be relevant later.

Instance methods

Puppet::Configurer#run

(Grossly oversimplified) example:

c = Puppet::Configurer.new
c.run # OMG PUPPET RUN! No, really, this is basically all you need to do a run.

This is where the magic happens. There’s a pattern that pops up in Puppet fairly frequently: a number of small, normal methods, and one method that runs everything else. Nothing too unusual; it just means that there’s a single point that ties together all the class logic. This is it.

This does a lot, so I’ll summarize.

1. Set up reporting

The first thing #run does is create a new Puppet::Transaction::Report object and add it as a log destination, so all logged actions are captured in the report. This way, the report that’ll be submitted to the master is populated in the same way that logging is done to syslog, or to the console if you’re using puppet agent -t.

2. Prepare storage and sync plugins

Some basic prep is done with the #prepare method. It sets up caching for the application. If pluginsync is turned on, #prepare will download our plugins - Facter facts, types, providers, etc.

After that, facts for catalog compilation are gathered with the #facts_for_uploading method.

3. Retrieve and apply the catalog

Once we have our facts, we have everything we need to actually perform the run. The #retrieve_and_apply_catalog method is called with the facts we just retrieved.

4. Upload the report

After we’ve applied the catalog, then the run is complete. The report generated at the beginning of the run is then sent with the #send_report method.


Whew. Yeah, this method does a lot. Starting from the top, let’s work down through the methods that #run calls to see what’s done at a lower level.
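Stripped of all error handling and options, the four stages above amount to something like this. This is a paraphrase with stubbed-out bodies so the sequence itself runs, not the actual source; each method stands in for the real one described below:

```ruby
# A heavily simplified paraphrase of the flow of Puppet::Configurer#run.
class MiniConfigurer
  def run
    report = start_reporting            # 1. new report, added as a log destination
    prepare                             # 2. set up caching, run pluginsync
    facts = facts_for_uploading         #    gather facts for catalog compilation
    retrieve_and_apply_catalog(facts)   # 3. fetch a catalog and apply it
    send_report(report)                 # 4. ship the report to the master
  end

  # Stub bodies; the real methods are covered one by one below.
  def start_reporting; { :logs => [] }; end
  def prepare; end
  def facts_for_uploading; { :facts_format => :b64_zlib_yaml, :facts => "..." }; end
  def retrieve_and_apply_catalog(facts); end
  def send_report(report); report; end
end

MiniConfigurer.new.run
```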

Puppet::Configurer#prepare

This method handles two things - setting up a cache for puppet, and running pluginsync if necessary.

The first part instantiates the Puppet::Util::Storage singleton object for the rest of the run. This way, the rest of the system can use that for caching, and not have to worry about how it gets there.

Have you ever CTRL-C’d a puppet run, rerun it, and got an error about a corrupt state file? This is where it whines, and then nukes the old statefile.

(Taken from the aforementioned code)

Puppet.err "State got corrupted"

Familiar? If some part of Puppet was writing to the statefile when Puppet was terminated, this statefile might get mangled. If this file exists and is corrupted, it’s deleted.
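The recovery logic is roughly the following. This is a self-contained sketch using a temp file; the real code uses Puppet.err and Puppet’s own paths, so the method name and file handling here are illustrative:

```ruby
require 'yaml'
require 'tempfile'

# Sketch of statefile recovery: try to load the YAML state; if it's
# corrupt, complain and delete it so the next run starts fresh.
def load_state(path)
  YAML.load_file(path) || {}
rescue StandardError
  warn "State got corrupted"   # the real code calls Puppet.err
  File.unlink(path)
  {}
end

# Simulate a run that was killed mid-write, leaving mangled YAML.
file = Tempfile.new('statefile')
file.write("--- { this is not : valid")   # truncated flow mapping
file.close

state = load_state(file.path)   # warns, nukes the file, returns {}
```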

The other part of #prepare is pluginsync. It’s been entirely delegated to Puppet::Configurer::PluginHandler, which in turn uses Puppet::Configurer::Downloader. We’ll discuss this later; for now, just know that the first thing that’s really done in a puppet run is the pluginsync, and it’s kicked off by this method.

Puppet::Configurer#facts_for_uploading

This is the part where we go out and grab our facts. Fact retrieval has actually been indirected, so we don’t grab the facts from Facter directly. Instead, the indirector is called, which defaults to Facter itself on the agent. This indirection allows for some interesting injection of behavior.

So you know the b64_zlib_yaml format mentioned all over the place when you’re running puppet agent -t --debug? It turns out that this is a custom format that’s built for handling facts. It’s YAML (a standard Puppet serialization format) that’s been compressed with zlib and then base64 encoded. This compressed format was added because of a limit on the size of the fact upload, which has since been fixed.

So we have these facts, and they might be really hefty. We attempt to use the aforementioned b64_zlib_yaml format on them; otherwise we fall back to uncompressed YAML. After this is done, the format used to store the facts is returned, as well as the CGI-escaped facts. The goal of all of this is to get our facts into the format best suited for sending to the master.
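The format itself is easy to reproduce with the standard library. This sketch shows both directions of the transformation; Puppet wraps this in its format/indirection machinery, and the method names here are mine, but the encoding is just this:

```ruby
require 'yaml'
require 'zlib'
require 'base64'

# b64_zlib_yaml: YAML, deflated with zlib, then Base64 encoded.
def to_b64_zlib_yaml(data)
  Base64.encode64(Zlib::Deflate.deflate(data.to_yaml))
end

def from_b64_zlib_yaml(text)
  YAML.load(Zlib::Inflate.inflate(Base64.decode64(text)))
end

facts = { "hostname" => "agent01", "osfamily" => "Debian" }
encoded = to_b64_zlib_yaml(facts)
from_b64_zlib_yaml(encoded) == facts   # round-trips cleanly
```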

The logic for all of this is implemented in the Puppet::Configurer::FactHandler module, and it’s mixed into Configurer.

Puppet::Configurer#retrieve_and_apply_catalog

So we have all our plugins, we have our facts, so we’re ready to roll. We need to run our prerun command if it exists, apply the catalog, and then run the postrun command.

Getting a catalog is more complex than it looks, because Puppet can either fetch a new catalog or reuse an existing cached one. Once we have it, we call catalog.apply and we’re off to the races. Once the catalog is applied, we send the report. And that’s it! That’s a puppet run!

The logic for catalog retrieval is split into a few methods, so I’ll address them individually.

Puppet::Configurer#retrieve_catalog

This method tries to get a catalog from somewhere. We’ve got the two cases mentioned above: by default, get a new catalog; otherwise, reuse an existing one. Most of the work is delegated to two other methods.

Puppet::Configurer#retrieve_new_catalog

The default behavior implemented in this method is to do a standard REST call to the master. This REST call uploads the facts generated earlier, which the master uses to compile a new catalog. This is then downloaded and cached on the client.

Puppet::Configurer#retrieve_catalog_from_cache

If the configuration indicates that a cached catalog should be used, or if catalog retrieval fails and :usecacheonfailure is enabled, we’ll try to use the catalog that we cached on the last successful run. This is where catalogs cached on the client in $vardir/yaml come into play.
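Put together, the decision logic amounts to something like this sketch. The outer method name mirrors the real one, but the bodies are stubs, and the hypothetical fetch_from_master stands in for the REST call described above:

```ruby
# Sketch of catalog retrieval: prefer a fresh catalog from the master,
# but fall back to the cached one if the settings ask for it outright,
# or if retrieval fails and :usecacheonfailure is on.
def retrieve_catalog(settings, cached_catalog)
  return cached_catalog if settings[:use_cached_catalog]

  begin
    fetch_from_master   # hypothetical stand-in for the REST request
  rescue StandardError
    settings[:usecacheonfailure] ? cached_catalog : nil
  end
end

def fetch_from_master
  raise "master unreachable"   # simulate a failed compile/download
end

retrieve_catalog({ :usecacheonfailure => true }, :cached_catalog)
```

With the master down, the first call falls back to the cache; without :usecacheonfailure, it would come back empty-handed instead.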

Puppet::Configurer#send_report

After the run has been completed, the resulting report data needs to be handled in one of a number of ways. If the :summarize option is turned on in Puppet, then the last run summary will be displayed to the console. A copy of the run report will be saved to /var/lib/puppet/state, and if reporting is turned on then a copy of that report will be sent to the master.
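As a sketch, the three destinations look like this. The method name and the settings keys here are illustrative stand-ins for Puppet’s actual settings machinery:

```ruby
# Sketch of report handling after a run: maybe summarize to the console,
# always keep a local copy, and send to the master if reporting is on.
def process_report(report, settings)
  destinations = []
  destinations << [:console, report[:summary]] if settings[:summarize]
  destinations << [:local_state_dir, report]      # e.g. /var/lib/puppet/state
  destinations << [:master, report]               if settings[:report]
  destinations
end

report = { :summary => "2 resources changed" }
process_report(report, :summarize => true, :report => true)
```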


In summary, when you think of a typical puppet agent run, this is where it’s done. Pluginsync is performed, facts are prepared, they’re sent to the master when the catalog is retrieved, that catalog is applied, and then the report of the whole run is sent to the master. This is enough of a view from 50,000 feet that you’ll be able to see how other parts fit in later.

Coming up next: pluginsync, in more detail than you EVER WANTED TO KNOW!

Addendum: Puppet 2.7.17 was used as the reference version for this blog.