For part three of the Change Mis-management series, I want to pick on the tradition of NOT keeping system management scripts in version control. This is a fascinating illustration of the cultural difference between Development and Operations. Operations is obsessed with ensuring stability, yet it tolerates fairly loose control over scripts that can decimate the environment at the full speed of whatever machine happens to be running them. Development is obsessed with making incremental changes to deliver value and would never tolerate such loose control over its code. I have long suspected that Development's discipline here is a product of having to deal with, and track, a LOT of change.
Whatever the cause, and whether or not you believe in Agile and/or the DevOps movement, leaving these scripts out of version control is a fundamental misbehavior, and we all know it. There really is no excuse for it. Most shops have scripts that control substantial swaths of the infrastructure, and various application systems depend on those scripts to run predictably. For all intents and purposes, these scripts are production-grade code.
This is hopefully not a complex problem to explain or solve. The really sad part is that every software delivery shop of any size already has every tool needed to version-manage all of its operations scripts. There is no reason you can't have an Ops Scripts tree in your source control system. Further, those repositories are often set up with rules that require a descriptive note (a commit message) for every change and record who checked it in, so you get better auditing right out of the gate.
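As an illustration of the kind of rule mentioned above, here is a minimal sketch of a Git `commit-msg` hook written in Python, assuming your Ops Scripts tree lives in Git. Git runs this hook with the path to the file holding the proposed commit message as its only argument and aborts the commit if the hook exits non-zero. The ticket format `CHG-1234` and the minimum summary length are hypothetical conventions chosen for the example; substitute whatever your change process actually requires.

```python
#!/usr/bin/env python3
"""Hypothetical commit-msg hook: save as .git/hooks/commit-msg and make it executable."""
import re
import sys

# Hypothetical convention: every ops-script change must cite a change ticket like CHG-1234.
TICKET_PATTERN = re.compile(r"\bCHG-\d+\b")
MIN_SUMMARY_LENGTH = 10  # assumed minimum length for the first line of the message


def check_message(text: str) -> list[str]:
    """Return a list of problems with a commit message (empty list means acceptable)."""
    # Ignore comment lines that Git adds to the message template.
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    problems = []
    summary = lines[0].strip() if lines else ""
    if len(summary) < MIN_SUMMARY_LENGTH:
        problems.append(f"summary line must be at least {MIN_SUMMARY_LENGTH} characters")
    if not TICKET_PATTERN.search("\n".join(lines)):
        problems.append("message must reference a change ticket (e.g. CHG-1234)")
    return problems


def main(message_path: str) -> int:
    """Read the proposed message, report problems, and return the hook's exit status."""
    with open(message_path, encoding="utf-8") as f:
        problems = check_message(f.read())
    for problem in problems:
        print(f"commit rejected: {problem}", file=sys.stderr)
    return 1 if problems else 0


# Only act when invoked the way Git invokes the hook: with exactly one argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Client-side hooks are not copied by `git clone`, so a shop that wants to enforce this centrally would typically put an equivalent check in a server-side hook or in its hosting platform's settings.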
Further, you now have a way to know, or at least have a good idea of, what has been run on your systems. That is particularly important if the person who ran the script is unavailable for some reason. If your operations team can agree on the doctrine of always running the ‘blessed’ version and never hacking it on the filesystem, life will get substantially better for everyone. Of course, the script could still be changed after checkout without the changes being logged. Any process can be circumvented, most rather easily when you have root. The point is to make such an event an anomaly, maybe even a noticeable one, though I will talk about that in the next part of this series.
This is really just a common-sense thing that improves your overall organizational resilience. Repeat after me:
- I resolve to always check in my script changes.
- I resolve to never run a script unless I have first checked it out from source control to make sure I have the current version.
- I resolve to never hack a script on the filesystem before I run it against a system someone other than me depends on. (Testing before check-in is allowed, just like for developers.)
- I resolve to only run scripts of approved versions that I have pulled out of source control and left unmodified.
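The last two resolutions can be backed by a small pre-run check. The Python sketch below assumes you obtain the approved version's Git blob ID from source control (for example with `git rev-parse HEAD:path/to/script.sh`) and refuses to run the local copy if its contents hash differently. It recomputes the ID the way Git does for a blob in its default SHA-1 object format, hashing a `blob <size>\0` header followed by the file contents, so the target host needs no Git installation. It also assumes no line-ending conversion or clean/smudge filters are configured, so the checked-out bytes equal the committed blob. The function names are hypothetical.

```python
import hashlib
import subprocess
import sys


def git_blob_id(data: bytes) -> str:
    """Compute the ID Git assigns to a blob with these contents (SHA-1 object format)."""
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


def verify_blessed(path: str, approved_blob_id: str) -> bool:
    """Return True if the file at `path` is byte-for-byte the approved version.

    Assumes no line-ending conversion or clean/smudge filters are configured,
    so the working-tree bytes equal the committed blob.
    """
    with open(path, "rb") as f:
        return git_blob_id(f.read()) == approved_blob_id


def run_if_blessed(path: str, approved_blob_id: str) -> int:
    """Hypothetical wrapper: run the script only if it matches the approved blob.

    Assumes the script is executable and starts with a shebang line.
    """
    if not verify_blessed(path, approved_blob_id):
        print(f"refusing to run {path}: local copy differs from approved version", file=sys.stderr)
        return 1
    return subprocess.run([path]).returncode
```

For example, `run_if_blessed("./rotate_logs.sh", approved_id)`, where `rotate_logs.sh` is a hypothetical script and `approved_id` came from `git rev-parse HEAD:rotate_logs.sh`, runs the script only if nobody has touched it since checkout. Anyone with root can still bypass this, but running a modified script now takes a deliberate act.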
It is good, it is easy, it does not take significant time, and it prevents countless time-consuming screw-ups. Just do it.