Change Mis-management (Part 2)

In my last post, I mentioned three things that need to be happening reliably in order to achieve a faster, more predictable release process.  The first one was to unify change management for the system between the Ops and Development sides.  On the surface, this sounds like a straightforward thing.  After all, a tool rationalization exercise is almost routine in most shops.  It happens regularly due to budget reviews, company mergers or acquisitions, etc.

Of course, as we all know, it is never quite that easy even when the unification is happening for nominally identical teams.  For example, your company buys another and you have to merge the accounting system.  Pretty straightforward – money is money, accountants are accountants, right?  Those always go perfectly smoothly, right?  Right?

In fact, unifying this aspect of change management can be especially thorny because of the intrinsic differences in tradition between the organizations.  Even though complex modern systems evolve together as a whole, few sysadmins would see themselves as ‘developing’ the infrastructure, for example.  Additionally, there are other problems.  For instance, operations are frequently seen as service providers who need to provide SLAs for change requests that are submitted through the change management system.  And a lot of operational tracking in the ticketing system is just that – operational – and it does not relate to actual configuration changes or updates to the system itself.

The key to dealing with this is the word “change”.  Put simply, system changes should be handled the same way code changes are handled, whatever that way might be.  For example, a system change could be a user story in the backlog.  The “user” might be a middleware patch that a new feature depends on, and the work items might be around submitting tickets to progressively roll that patch up the environment chain into production.  The goal is to track needed changes to the system as first-class items in the development effort.  The non-change operational stuff will almost certainly stay in the ticketing system.  It is a simple example, but applying the principle will mean that the operating environment of a system evolves right along with its code – not as a retrofit or afterthought when it is time to deploy to a particular environment or there is a problem.
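To make the backlog idea a bit more concrete, here is a minimal Python sketch of a system change modeled as a story with one work item per environment.  Everything in it is an assumption for illustration: the class names, the `for_rollout` helper, and the four-step environment chain are hypothetical and not taken from any particular tracking tool.

```python
from dataclasses import dataclass, field

# Hypothetical environment chain; substitute your own promotion path.
ENVIRONMENTS = ["dev", "qa", "staging", "production"]


@dataclass
class WorkItem:
    description: str
    done: bool = False


@dataclass
class SystemChangeStory:
    """A system change tracked in the backlog like any other story."""
    title: str
    needed_by: str  # the feature that depends on this change
    work_items: list = field(default_factory=list)

    @classmethod
    def for_rollout(cls, title, needed_by, environments=ENVIRONMENTS):
        # One work item per environment, in promotion order.
        items = [
            WorkItem(f"Submit ticket to apply '{title}' in {env}")
            for env in environments
        ]
        return cls(title, needed_by, items)

    def next_step(self):
        """Return the first unfinished work item, or None when the rollout is complete."""
        return next((item for item in self.work_items if not item.done), None)


# Illustrative usage with made-up names.
example = SystemChangeStory.for_rollout("middleware patch", needed_by="new feature")
example.work_items[0].done = True  # the dev ticket has been completed
print(example.next_step().description)
```

The point of the sketch is only that the patch and its path to production sit in the same place as the feature that needs them, so the team can see at a glance how far the change has been promoted.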

The tool part is conceptually easy – someone manages the changes in the same system in which the backlog, stories, and work items are handled.  However, there is also the matter of the “someone” mentioned in that sentence.  An emerging pattern I have seen in several shops is to co-locate an Ops-type with the development team.  Whether these people are called ‘ops representatives’ or ‘infrastructure developers’, their role is to focus on evolving the environment along with the code and ensuring that the path to production is in sync with how things are being developed.  This is usually a relatively senior person who can advise on things, knows when to say no, and knows when to push.  The real shift is that changes to the operating environment of an application become first-class citizens at the table when code or test changes are being discussed, and they can now be tracked as part of the work that is required to deliver on an iteration.

These roles have started popping up in various places with interesting frequency.  To me, this is the next logical step in Agile evolution.  Having QA folks in the standups is accepted practice nowadays, and organizations are figuring out that the Ops guys should be at the table as well.  This does a great job of proactively addressing a lot of the release / promotion headaches that slow things up as they move toward production.  Done right, this takes a lot of stress and time out of the overall Agile release cycle.


New Toy!!! IBM Workload Deployer

The company I work for serves many large corporations in our customer base, many of whom are IBM shops with the commensurately large WebSphere installed bases.  So, as you might imagine, it behooves us to keep abreast of the latest stuff IBM delivers.

We are fortunate enough to be pretty good at what we do and to be in the premier tier of IBM’s partner hierarchy, and we were recently able to get an IBM Workload Deployer (IWD) appliance in as an evaluation unit.  If you are not familiar with it, the IWD is really the third revision of the appliance formerly known as the IBM WebSphere Cloudburst appliance.  I do not know for certain, but I would presume the rebrand is related to the fact that the IWD handles more generic workloads than simply those related to WebSphere and therefore deserved a more general name.

You can read the full marketing rundown on the IBM website here:  IBM Workload Deployer

This is a “cloud management in a box” solution that you drop onto your network and point at one or more of the supported hypervisors, and it handles images, load allocation, provisioning, etc.  You can give it simple images to manage, but the thing really lights up when you give it “Patterns” – a term which translates to a full application infrastructure (balancing webservers, middleware, DB, etc.).  If you use this setup, the IWD will let you manage your application as a single entity and maintain the connections for you.

I am not an expert on the thing – at least not yet, but a couple of other points that immediately jump out at me are:

  • The thing also has a pretty rich Python-based command line client that should allow us to do some smart script stuff and maintain those scripts in a proper source repository.
  • The patterns and resources also have intelligence in them, so you can’t break the dependencies of a published configuration.
  • There are a number of pre-cooked template images, which don’t seem very locked down, that you can use as starting points for customization, or you can roll your own.
  • The Rational Automation Framework tool is here, too, so that brings up some migration possibilities for folks looking to bring apps either into a ‘cloud’ or a better managed virtual situation.

I do get to be one of the first folks to play with the thing, so I’ll be drilling into as many of these and other things as time permits.  More on it as it becomes available.

Change Mis-management (Part 1)

One of the pillars of DevOps thinking is that the system is a whole.  No part can function without the others, so they all should be treated as equals.  Of course, things rarely work that way.  One of the glaring examples in a lot of shops is the disparity in the way changes are managed / tracked between Dev and Ops.  There are multiple misbehaviors we can examine in just this one area.  Some other day we can discuss how there are different tracking systems for different parts of the system and how many shops have wholly untracked configurations for some components.  Today, instead, we’re going to talk about the different levels of diligence that get applied in Ops versus Dev when dealing with change.

Think about this for a second.  No developer in an enterprise shop would ever think of NOT checking in all of their code changes.  And if they did, they would view it as a pretty serious bypass of good practice.  Code goes into the repository and is pulled from that repository for build, test, and deployment.  It is a constant, backbone practice of commercial software development.  Meanwhile, the Ops team has a bunch of scripts that they use to maintain the environment.  How many of those are religiously checked into a version control system (VCS)?  And of those that do end up in a VCS, how many have change tickets attached when they are modified?  And then there are the VM template images, router configs, etc. that may or may not be safely stored someplace.

All too often the change management that happens here is a script update or a command executed someplace on some piece of infrastructure.  The versioning takes the form of a file extension; you know – “.old”, “.bak”, “.orig”, “.[todaysdate]” – so that there is some… evidence… that a change was made to the system.  The tracking of the change is often a manually updated trouble ticket or change request.  And let’s not forget that the Ops ticketing system probably does not talk to the change management system the developers use.  Is it any wonder that things get screwed up when something comes down the pipe from Dev?

To really have things working properly, you have to:

  • Unify the change management between Ops and Dev.
  • Track scripts the way you would any source code on which your app’s function depends.
  • Have a method to automatically capture changes made to the environment and log them.

All three of these things are necessary if you really want to achieve a higher-speed and more predictable release process.
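As a rough illustration of the third item, here is a minimal Python sketch that captures changes to a directory of scripts or config files by comparing content hashes and appending what changed to a log.  It is a sketch under assumptions, not a finished tool: the function names, the JSON-lines log format, and the `ticket` field are all hypothetical, and a real setup would more likely lean on a VCS or a configuration management tool.

```python
import hashlib
import json
import time
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file under root (by relative path) to the SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(old: dict, new: dict) -> list:
    """Return one record per file that was added, removed, or modified."""
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes.append({"path": path, "change": "added"})
        elif path not in new:
            changes.append({"path": path, "change": "removed"})
        elif old[path] != new[path]:
            changes.append({"path": path, "change": "modified"})
    return changes


def log_changes(changes: list, log_file: Path, ticket: str = "UNTRACKED") -> None:
    """Append each change as a JSON line, tagged with a timestamp and a ticket id."""
    with log_file.open("a", encoding="utf-8") as fh:
        for change in changes:
            record = {
                "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
                "ticket": ticket,  # hypothetical link back to the change request
                **change,
            }
            fh.write(json.dumps(record) + "\n")
```

The idea would be to take a `snapshot` on a schedule, compare it to the previous one with `diff_snapshots`, and pass the result to `log_changes`.  Keep the log file outside the directory being watched, or it will show up as a change itself.  Anything logged without a real ticket id is exactly the kind of “.bak” change described above.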

A Sports Analogy for DevOps Thinking

I have been known to go off a bit on how typical management culture defeats itself in its attempts to execute more quickly.  This is as much a cultural problem as a management problem, and a pretty common one.  Here is some perspective.  Football (American Football, that is) has an ineligible receiver rule.  The roles of the individual players, on offense especially, are so specialized that only certain people can receive a pass.  Seriously.  Then there are very specialized ‘position coaches’ who make sure that individual players focus on the subset of skills they need to perform their specific job.  There is also very little cross-training.  This works fine in the very iterative, assembly-line way the game is played.  Baseball is the same way – very specialized.  And both are quintessentially 20th century American games that grew during (and reflect) an industrial mindset.

However, business is a free-flowing process.  There are no ‘illegal formations’.  Some work.  Some don’t.  The action does not stop.  A better game analogy for releasing software (or running IT, or even the whole business) in the modern era would be Soccer (Football in the rest of the world).  The game constantly flows.  There are no codified rules about who passes the ball to whom.  The goalie is actually only special in that he can use his hands – when standing in his little area.  There is no rule that says it is illegal for him to come out of that area and participate as a regular player.  This occasionally happens in the course of elimination tournaments, in fact.

I draw this comparison to point out a soccer team’s relative ability to adapt to an ever-changing game flow.  Football teams only function when there is a very regulated flow of events and where there are a number of unrealistic throttles and limits on the number and types of events.  When you compare this to how most IT shops are set up, you find a lot of football teams and very few soccer teams.  And guess which environments are seen as more adaptable to the needs of their overall organizations…

Author’s note:  I picked soccer over hockey and basketball principally because the latter two sports rely heavily on rapid substitution and aggressive use of timeouts.  Those are luxuries that modern online business most certainly does not have.  Substitutions happen slowly in the enterprise and there darn sure are no timeouts.

“Enterprise” DevOps

Anyone in the IT industry today will note that much of the DevOps discussion is focused on small companies with large websites – often tech companies providing SaaS solutions, consumer web services, or some other solution content.  There is another set of large websites, supported by large technology organizations, that have a need for DevOps.  These are large commerce sites for established retailers, banks, insurance companies, etc.  Many of these companies have had large-scale online presences and massive software delivery organizations behind them for well over a decade now.  Some of these enterprises would, in fact, qualify among the largest software companies on the planet, dwarfing much more ‘buzz-worthy’ startups.  It also turns out that they are pretty good at delivering to their online presence predictably and reliably, if not as agilely as they would like.

Addressing the agility challenge in an enterprise takes a different mindset than it does in a tech startup.  This has always been the case, of course.  Common sense dictates that solving a problem for 100 people is intrinsically different than for 10,000.  And yet so many discussions focus on something done for a ‘hot’ website or maybe a large ‘maverick’ team in a large organization.  And those maverick team solutions more often than not do not scale to the enterprise and have to be replaced.  Of course, that is rarely discussed or hyped.  They just sort of fade away.

It is not that these faded solutions are bad or wrong, either.  A lot of times, the issue is simply that they only looked at part of the problem and did not consider the impact of improving that part on the other parts of the organization.  In large synchronized systems, you can only successfully accelerate or decelerate if the whole context does so together.  There are many over-used analogies for this scenario, so let’s use the one about rowers on a boat to make this point and move on.

Let’s face it, large organizations can appear to be the “poster children” for silos in their organizational structure.  You have to remember, though, that those silos often exist because the organization learned hard lessons about the cost of NOT having someone maniacally focused on one narrow, specialized activity.  Think about this as your organization grows, runs into problems or failures, and puts infrastructure in place to make sure they don’t happen again.

One of the main points of this blog is going to be to look at the issues confronted by organizations that are or are becoming ‘enterprises’ and how they can balance the need for the Agile flexibility of DevOps with the pragmatic need to synchronize large numbers of people.