Stop writing scripts and start writing libraries

A lot of us (especially the sysadmin types) started off programming with scripts. It was probably bash, or perl, or (like me, to my chagrin) Windows batch scripts. It was the simple way of getting started - no fighting with compilers, simple syntax, and bam, shit started getting done. This was awesome, and I’m sure that the thrill of creation has gotten many people hooked on coding.

However, a lot of people decided that scripts were good enough and stopped there. Work was getting done, so why fix what’s not broken? And admittedly, this seems like a pretty straightforward approach, and it’s why shell and perl scripts are the underpinnings of a good chunk of the Internet.

Let’s talk about gitweb.

No, really, go take a quick look. Go get a line count and then come back.

I’m guessing that gitweb started out as a small CGI script. It didn’t need to do much, just print out trees and blobs, right? Well, it would be cool if we could view diffs. And logs would be nice. And hey, being able to fork would rock! And you know, we should print better HTML, so have a nice header! And having authorization would be really nice too. Oh wait, we almost forgot syntax highlighting -

And then you have a seven thousand line monstrosity.

How do you test this? How do you validate that it’s doing what you expect? Can you reuse it elsewhere? Admittedly, for perl it’s quite clean and well documented (if it weren’t, I would be in a corner, weeping softly). But while it’s all self-contained, you’re stuck with it as it is. There is no meaningful separation of the logic and the views, there’s no real model, it’s one block of functions. How do you extend this? How would you start adding logging or debugging? Where do you go from here?

Here’s the thing - if you’re writing something non-trivial, say more than 100 lines, stop and make it a library. Split it up. Break it into classes. Stop passing around hashes and arrays, and make proper objects. Figure out what you want it to do, and write tests for it. Write a proper README and add documentation for the parts. And then, write a small executable to call into the library and launch the necessary parts.
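To make that concrete, here is a minimal sketch in Ruby of what the split might look like. Everything in it is made up for illustration: the LineCounter module, the Report struct, and the file names in the comments are assumptions, not code from any real project. The point is only the shape: a proper object instead of a hash, logic that takes plain text and does no I/O, and a thin entry point at the bottom.

```ruby
# lib/line_counter.rb (hypothetical library file)
module LineCounter
  # A proper object instead of a bare hash: the fields are named and
  # derived values live next to the data they come from.
  Report = Struct.new(:path, :lines, :blank) do
    def code_lines
      lines - blank
    end
  end

  # Pure logic: takes text, returns a Report. No file access here,
  # so it can be tested without touching the disk.
  def self.analyze(path, text)
    all = text.lines
    blank = all.count { |line| line.strip.empty? }
    Report.new(path, all.size, blank)
  end

  # Thin command-line layer: read each file, print one summary line.
  def self.run(argv, out: $stdout)
    argv.each do |path|
      report = analyze(path, File.read(path))
      out.puts "#{report.path}: #{report.code_lines} non-blank of #{report.lines} lines"
    end
  end
end

# bin/line_counter (the small executable; it only calls into the library)
LineCounter.run(ARGV) if __FILE__ == $PROGRAM_NAME
```

Because analyze never reads a file, a test can hand it a string and check the resulting Report directly, and another program can require the library and skip the command-line layer entirely.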

If you have found a problem, chances are someone else is going to have to fight with the same issue. Writing something clean and reusable is a greater investment of time up front, but you’ll get more benefit from it than you would from a one-off, and you’ll be able to contribute back.

A good example of a program that’s written as a library is thor. Thor was initially written as a replacement for rake and sake, so that you could run various tasks easily, without Rakefiles or the like. You write a class, install it, and you’re done. You can also use Thorfiles, and treat it like rake. However, thor is at heart a scripting framework, not just an application. You can use it as a library to drive your own application, so you don’t have to reinvent the wheel. One example of this is vagrant; the entire command framework is thor. I’ve done the same thing with gtool. This is a really fantastic demonstration of this point - by itself, thor is great; as a library it’s amazing.

We can also look at the converse - something that’s basically a script, that could have been generalized a lot better. At a previous gig, I wrote a tool called copycfg that was used to back up files like SSH and puppet keys, so when a host was reloaded we just copied them back in. I was actually rewriting a ruby script into something more modular, so while it was a step in the right direction I missed a lot of things. A lot of the code is tightly coupled, and I was so determined to solve this one problem of copying files that I failed to break things up. We had a Frankenstein-ish LDAP server that was half NIS, and so a wrapper around that to do general queries and return data would have been great. I didn’t do that. I also could have improved the logic for handling NFS shares, but I just smashed something together rather than trying to solve the issue of dynamically sharing out files. I didn’t do that either.
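As a rough idea of what such a wrapper could have looked like, here is a small Ruby sketch. None of this is copycfg code: HostDirectory, MemoryBackend, the search method, and the "cn" and "role" attribute names are all hypothetical. The only assumption is that the backend is some object with a search method that takes a hash of attribute filters and returns an array of attribute hashes; a real backend would wrap whatever LDAP client was in use.

```ruby
# Hypothetical sketch; none of these names come from copycfg.
class HostDirectory
  # A small object for each result instead of a raw attribute hash.
  Host = Struct.new(:name, :attributes)

  # backend: any object responding to #search(filter_hash) and
  # returning an array of attribute hashes (an assumed interface).
  def initialize(backend)
    @backend = backend
  end

  # General query: callers pass attribute filters and get Host objects.
  def find_hosts(filter = {})
    @backend.search(filter).map do |entry|
      Host.new(entry.fetch("cn"), entry)
    end
  end
end

# In-memory stand-in for the directory server, handy for tests.
class MemoryBackend
  def initialize(entries)
    @entries = entries
  end

  # Keep entries whose attributes match every key/value in the filter.
  def search(filter)
    @entries.select do |entry|
      filter.all? { |key, value| entry[key] == value }
    end
  end
end
```

With the directory quirks hidden behind one class, the file-copying code would only ever ask for hosts, and any other tool that needed host data could reuse the same wrapper.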

Admittedly, when I wrote the aforementioned code, I was still quite new to ruby, and the abstraction that I added was a definite improvement. But when you’re focused on just solving one problem, you miss the bigger picture and you miss opportunities to solve problems once and for all. A big part of writing good, reusable code is taking a moment to look at the bigger picture and see what problems are actually out there that you’re trying to solve, rather than focusing on getting a single thing done.

It’s hard to solve problems right the first time, and it’s a fair amount of work to try to handle general cases instead of taking care of your immediate problem. But in the end, you’re doing things right and you’re saving time in the long run. And if you’re going to do something, it seems you should try to do it as well as you can.