A System for Changing Systems – Part 5 – Top-level Categories

The first step to understanding the framework is to define the broad, top level capability areas. A very common problem in technology is the frequent over-use of terms that can have radically different meanings depending on the context of a conversation. So, as with any effort to clarify the discussion of a topic, it is very critical to define terms and hold to those definitions during the course of the discussion.

Top level categories of capabilities around various environments in which applications typically must run.

Top level capability areas for sustaining application systems across environments.

At the top level of this framework are six capability groupings

  • Change Management – This category is for capabilities that ensure that changes to the system are properly understood and tracked as they happen. This is a massively overused term, but the main idea for this framework is that managing changes is not the same thing as applying them. Other capabilities deal with that. This capability category is all about oversight.
  • Orchestration – This category deals with the ability to coordinate activity across different components, areas, and technologies in a complex distributed application system in a synchronized manner
  • Deployment – This category covers the activities related to managing the lifecycles of an application systems’ artifacts through the various environments. Put more simply this area deals with the mechanics of actually changing out pieces of an application system.
  • Monitoring – The monitoring category deals with instrumenting the environment for various purposes. This instrumentation concept covers all pieces of the application system and provides feedback in the appropriate manner for interested stakeholders. For example, capacity usage for operations and feature usage for development.
  • System Registry – This refers to the need for a flexible and well-understood repository of shared information about the infrastructure in which the application system runs. This deals with the services on which the application system depends and which may need to be updated before a new instance of the application system can operate correctly.
  • Provisioning – This capability is about creating and allocating the appropriate infrastructure resources for an instance of the application system to run properly. This deals with the number and configuration of those resources. While this area is related to deployment, it is separate because in many infrastructures it may not be desireable or even technically possible to provision fresh resources with each deployment and linking the two would blunt the relevancy of the framework.

The next few posts will dig into the sub-categories underneath each of these top-level items.

Advertisement

A System for Changing Systems – Part 4 – Groundwork for Understanding the Capabilities of a System Changing System

In the last couple of posts, we have talked about how application systems need a change application system around them to manage the changes to the application system itself. A “system to manage the system” as it were. We also talked about the multi-part nature of application systems and the fact that the application systems typically run in more than one environment at any given time and will “move” from environment to environment as part of their QA process. These first three posts seek to set a working definition of the thing being changed so that we can proceed to a working definition of a system for managing those changes. This post starts that second part of the series – defining the capabilities of a change application system. This definition will then serve as the base for the third part – pragmatically adopting and applying the capabilities to begin achieving a DevOps mode of operation.

DevOps is a large problem domain with many moving parts. Just within the first set of these posts, we have seen how four rather broad area definitions can multiply substantially in a typical environment. Further, there are aspects of the problem domain that will be prioritized by different stakeholders based on their discipline’s perspective on the problem. The whole point of DevOps, of course, is to eliminate that perspective bias. So, it becomes very important to have some method for unifying the understanding and discussion of the organizations’ capabilities. In the final analysis, it is not as important what that unified picture looks like as it is that the picture be clearly understood by all.

To that end, I have put together a framework that I use with my customers to help in the process of understanding their current state and prioritizing their improvement efforts. I initially presented this framework at the Innovate 2012 conference and subsequently published an introductory whitepaper on the IBM developerWorks website. My intent with these posts is to expand the discussion and, hopefully, help folks get better faster. The interesting thing to me is to see folks adopt this either as is or as the seed of something of their own. Either way, it has been gratifying to see folks respond to it in its nascent form and I think the only way for it to get better is to get more eyeballs on it.

So, here is my picture of the top-level of the capability areas (tools and processes) an organization needs to have to deliver changes to an application system.

Capabilities

Overview of capability areas required to sustain environments

The quality and maturity of these within the organization will vary based on their business needs – particularly around formality – and the frequency with which they need to apply changes.

I applied three principles when I put this together:

  • The capabilities had to be things that exist in all environments that application system runs (ie dev, test, prod, or whatever layers exist). THe idea here is that such a perspective will help unify tooling and approaches to a theoretical ideal of one solution for all environments.
  • The capabilities had to be broad enough to allow for different levels of priority / formality depending on the environment. The idea is to not burden a more volatile test environment with production-grade formality or vice-versa. But to allow a structured discussion of how the team will deliver that capability in a unified way to the various environments. DevOps is an Agile concept, so the notion of minimally necessary applies.
  • The capabilities had to be generic enough to apply to any technology stack that an organization might have. Larger organizations may need multiple solutions based on the fact that they have many application systems that were created at different points in time, in different languages, and in different architectures. It may not be possible to use exactly the same tool / process in all of those environments, but it most certainly is possible to maintain a common understanding and vocabulary about it.

In the next couple of posts, I will drill a bit deeper into the capability areas to apply some scope, focus, and meaning.

A System for Changing Systems – Part 3 – How Many “Chang-ee”s

As mentioned in the last post, once there is a “whole system” understanding of an application system, the next problem is that there are really multiple variants of that system running within the organization at any given time. There are notionally at least three: Development, Test, and Production. In reality, however, most shops frequently have multiple levels of test and potentially more than one Development variant. Some even have Staging or “Pre-production” areas very late in test where the modified system must run for some period before finally replacing the production environment. A lot of this environment proliferation is based on historic processes that are themselves a product of the available tooling and lessons organizations have learned over years of delivering software.

Example Environment Flow

This is a simplified, real-world example flow through some typical environments. Note the potential variable paths – another reason to know what configuration is being tested.

Tooling and processes are constantly evolving. The DevOps movement is really a reflection of the mainstreaming of Agile approaches and cloud-related technologies and is ultimately a discussion of how to best exploit it. That discussion, as it applies to environment proliferation, means we need to get to an understanding of the core problems we are trying to solve. The two main problem areas are maintaining the validity of the sub-production environments as representative of production and tracking the groupings of changes to the system in each of the environments.

The first problem area, that of maintaining the validity of sub-production envrionments, is a more complex problem than it would seem. There are organizational silo problems where multiple different groups own the different environments. For example, a QA group may own the lab configuraitons and therefore have a disconnect relative to the production team. There are also multipliers associated with technical specialities, such as DBAs or Network Administration, which may be shared across some levels of environment. And if the complexity of the organization was not enough, there are other issues associated with teams that do not get along well, the business’ perception that test environments are less critical than production, and other organizational dynamics that make it that much more difficult to ensure good testing regimes are part of the process.

The second key problem area that must be addresssed is tracking the groups of changes to the application system that are being evaluated in a particular sub-production environment. This means having a unique identifier for the combination of application code, the database schema and dataset, system configuration, and network configuration. That translates to five version markers – one for each of the main areas of the application system plus one for the particular combination of all four. On the surface, this is straightforward, but in most shops, there are few facilities for tracking versions of configurations outside of software code. Even when they are, they are too often not connected to one another for tracking groupings of configurations.

They typical pattern for solving these two problems actually begins with the second problem first. It is difficult to ensure the validity of a test environment if there is no easy way to identify and understand the configuration of the components involved. This is why many DevOps initiatives start with configuration management tools such as Puppet, Chef, or VMWare VCenter. It is also why “all-in-one” solutions such as IBM’s Pure family are starting to enter the market. Once an organization can get a handle on their configurations, then it is substantially easier to have fact-based engineering conversations about valid test configurations and environments because everyone involved has a clear reference for understanding exactly what is being discussed.

This problem discussion glosses over the important aspect of being able to maintain these tools and environments over time. Consistently applying the groups of changes to the various environments requires a complex system by itself. The term system is most appropirate because the needed capabilities go well beyond the scope of a single tool and then those capabilities need to be available for each of the system components. Any discussion of such broad capabilities is well beyond the scope of a single blog post, so the next several posts in this series will look at framework for understanding the capabilities needed for such a system.