A System for Changing Systems – Part 7 – Deployment Capabilities

The third capability area is Deployment. Deployment deals with the act of actually putting changes into a given target environment. It is not prescriptive about how this happens. Many shops handle deployment mechanically through their provisioning system. That is a good thing: it is an efficiency gain because it removes a discrete system for performing deployment activities, and it is a practice of the most mature organizations. However, this taxonomy model is about identifying the capabilities needed to consistently apply changes to a whole application system. And, let’s face it, best practices tend to be transient as new, even better, best practices emerge.

Deployment Capability Area

Additionally, there are a number of reasons the capability is included in this taxonomy. First of all, the framework is about capabilities rather than technologies or implementations. It is important to be deliberate about how changes are deployed to all environments. The fact that some group of those changes is handled by a provisioning tool does not mean that all of them are covered, nor does it remove the deliberate work expended in fitting the changes into the provisioning tool’s structure. Most provisioning tools, for example, are set up to handle standard package mechanisms such as RPM. The deployment activity in that scenario is more one of packaging the custom changes. But the provisioning answer is not necessarily a solution for all four core areas of an application system, so there needs to be a capability that deals generically with all of them. Finally, many, if not most, shops have some number of systems where legacy technical requirements force deployment to happen separately.

All of that being true, the term “Deployment” is probably confusing given its history and popular use. It will likely be replaced in the third revision of this taxonomy with something more generic, such as “Change Delivery”.

The sub-category of Asset Repository refers to the fact that there needs to be an ability to maintain a collection of changes that can be applied singly or in bulk to a given application system. In the third revision of the taxonomy, it is likely to be joined by a Packaging sub-capability.  Comments and thoughts are welcome as this taxonomy is evolving and maturing along with the DevOps movement.
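To make the Asset Repository idea a little more concrete, here is a minimal sketch of a store whose changes can be pulled one at a time or as a batch. Everything in it is hypothetical and chosen only for illustration: the class names, the change IDs, and the artifact file names are assumptions, not part of the taxonomy or of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    """One deployable change (hypothetical structure)."""
    change_id: str
    component: str  # one of the four core areas, e.g. "code" or "database"
    artifact: str   # hypothetical reference to the stored artifact


@dataclass
class AssetRepository:
    """Keeps changes so they can be applied singly or in bulk."""
    changes: dict = field(default_factory=dict)

    def add(self, change: Change) -> None:
        self.changes[change.change_id] = change

    def get(self, change_id: str) -> Change:
        # Single change, for a one-off application.
        return self.changes[change_id]

    def bundle(self, change_ids):
        # Several changes as one batch, in the order requested.
        return [self.changes[cid] for cid in change_ids]


repo = AssetRepository()
repo.add(Change("CHG-1", "code", "app-1.4.2.tar.gz"))
repo.add(Change("CHG-2", "database", "add-orders-index.sql"))

single = repo.get("CHG-1")
batch = repo.bundle(["CHG-2", "CHG-1"])
```

How the artifacts are actually stored and delivered is left open on purpose, since the capability is the point here rather than the implementation.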

A System for Changing Systems – Part 6 – Change Management and Orchestration Capabilities

This post covers the first two capability areas in the system taxonomy. This discussion will begin with where the changes come into the “system for changing systems”, Change Management, and proceed around the picture of top-level capability areas.

The first capability area to look at is Change Management. Change is the fundamental reason for this discussion and, in many ways, the discussion is pointless unless this capability is well understood. Put more simply, you cannot apply changes if you do not know what the changes are. As a result, this capability area is the change injector for the system. It is where changes to the four components of the application system are identified, labeled, and tracked as they are put into place in each environment. For convenience, and in recognition of the fact that changes are injected from both the “new feature” angle and the “maintenance item” angle, the two sources of change are each given their own capability sub-area.

Change Management Capability Area

The second capability area is Orchestration. In a complex system that is maintained by a combination of human and machine-automated processes, understanding what is done, by whom, and in what order is important. This capability area has two sub-areas – one for the technical side and one for the people. This reflects the need to keep the technical dependencies properly managed and also to keep everyone on the same page. Orchestration is a logical extension of the changes themselves. Once you know what the changes are, everyone and everything must stay synchronized on when and where those changes are applied to the application system.

Orchestration Capability Area
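As a rough illustration of the "in what order" part of Orchestration, the sketch below orders a handful of steps by their dependencies using Python's standard `graphlib` module (Python 3.9 or later). The step names are hypothetical, and they deliberately mix a people step with technical ones to reflect the two sub-areas.

```python
from graphlib import TopologicalSorter

# Hypothetical steps: each key maps to the set of steps that must finish first.
steps = {
    "notify QA of change window": set(),
    "apply schema migration": {"notify QA of change window"},
    "deploy application code": {"apply schema migration"},
    "update load balancer": {"deploy application code"},
}

# static_order() yields every step only after all of its prerequisites.
order = list(TopologicalSorter(steps).static_order())

for position, step in enumerate(order, start=1):
    print(position, step)
```

A real orchestration capability also has to track who performs each step and where it is applied; this only shows the sequencing piece.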

A System for Changing Systems – Part 5 – Top-level Categories

The first step to understanding the framework is to define the broad, top-level capability areas. A very common problem in technology is the frequent over-use of terms that can have radically different meanings depending on the context of a conversation. So, as with any effort to clarify the discussion of a topic, it is critical to define terms and hold to those definitions during the course of the discussion.

Top level categories of capabilities around various environments in which applications typically must run.

Top level capability areas for sustaining application systems across environments.

At the top level of this framework are six capability groupings:

  • Change Management – This category is for capabilities that ensure that changes to the system are properly understood and tracked as they happen. This is a massively overused term, but the main idea for this framework is that managing changes is not the same thing as applying them. Other capabilities deal with that. This capability category is all about oversight.
  • Orchestration – This category deals with the ability to coordinate activity across different components, areas, and technologies in a complex distributed application system in a synchronized manner.
  • Deployment – This category covers the activities related to managing the lifecycles of an application system’s artifacts through the various environments. Put more simply, this area deals with the mechanics of actually changing out pieces of an application system.
  • Monitoring – The monitoring category deals with instrumenting the environment for various purposes. This instrumentation concept covers all pieces of the application system and provides feedback in the appropriate manner for interested stakeholders. For example, capacity usage for operations and feature usage for development.
  • System Registry – This refers to the need for a flexible and well-understood repository of shared information about the infrastructure in which the application system runs. This deals with the services on which the application system depends and which may need to be updated before a new instance of the application system can operate correctly.
  • Provisioning – This capability is about creating and allocating the appropriate infrastructure resources for an instance of the application system to run properly. This deals with the number and configuration of those resources. While this area is related to deployment, it is separate because in many infrastructures it may not be desirable or even technically possible to provision fresh resources with each deployment, and linking the two would blunt the relevancy of the framework.

The next few posts will dig into the sub-categories underneath each of these top-level items.

A System for Changing Systems – Part 4 – Groundwork for Understanding the Capabilities of a System Changing System

In the last couple of posts, we have talked about how application systems need a change application system around them to manage the changes to the application system itself. A “system to manage the system” as it were. We also talked about the multi-part nature of application systems and the fact that the application systems typically run in more than one environment at any given time and will “move” from environment to environment as part of their QA process. These first three posts seek to set a working definition of the thing being changed so that we can proceed to a working definition of a system for managing those changes. This post starts that second part of the series – defining the capabilities of a change application system. This definition will then serve as the base for the third part – pragmatically adopting and applying the capabilities to begin achieving a DevOps mode of operation.

DevOps is a large problem domain with many moving parts. Just within the first set of these posts, we have seen how four rather broad area definitions can multiply substantially in a typical environment. Further, there are aspects of the problem domain that will be prioritized by different stakeholders based on their discipline’s perspective on the problem. The whole point of DevOps, of course, is to eliminate that perspective bias. So, it becomes very important to have some method for unifying the understanding and discussion of the organization’s capabilities. In the final analysis, it is not as important what that unified picture looks like as it is that the picture be clearly understood by all.

To that end, I have put together a framework that I use with my customers to help in the process of understanding their current state and prioritizing their improvement efforts. I initially presented this framework at the Innovate 2012 conference and subsequently published an introductory whitepaper on the IBM developerWorks website. My intent with these posts is to expand the discussion and, hopefully, help folks get better faster. The interesting thing to me is to see folks adopt this either as is or as the seed of something of their own. Either way, it has been gratifying to see folks respond to it in its nascent form and I think the only way for it to get better is to get more eyeballs on it.

So, here is my picture of the top-level of the capability areas (tools and processes) an organization needs to have to deliver changes to an application system.

Capabilities

Overview of capability areas required to sustain environments

The quality and maturity of these within the organization will vary based on their business needs – particularly around formality – and the frequency with which they need to apply changes.

I applied three principles when I put this together:

  • The capabilities had to be things that exist in all environments in which the application system runs (i.e., dev, test, prod, or whatever layers exist). The idea here is that such a perspective will help unify tooling and approaches toward a theoretical ideal of one solution for all environments.
  • The capabilities had to be broad enough to allow for different levels of priority / formality depending on the environment. The idea is not to burden a more volatile test environment with production-grade formality, or vice versa, but to allow a structured discussion of how the team will deliver that capability in a unified way to the various environments. DevOps is an Agile concept, so the notion of minimally necessary applies.
  • The capabilities had to be generic enough to apply to any technology stack that an organization might have. Larger organizations may need multiple solutions based on the fact that they have many application systems that were created at different points in time, in different languages, and in different architectures. It may not be possible to use exactly the same tool / process in all of those environments, but it most certainly is possible to maintain a common understanding and vocabulary about it.

In the next couple of posts, I will drill a bit deeper into the capability areas to apply some scope, focus, and meaning.

A System for Changing Systems – Part 3 – How Many “Chang-ee”s

As mentioned in the last post, once there is a “whole system” understanding of an application system, the next problem is that there are really multiple variants of that system running within the organization at any given time. There are notionally at least three: Development, Test, and Production. In reality, however, most shops frequently have multiple levels of test and potentially more than one Development variant. Some even have Staging or “Pre-production” areas very late in test where the modified system must run for some period before finally replacing the production environment. A lot of this environment proliferation is based on historic processes that are themselves a product of the available tooling and lessons organizations have learned over years of delivering software.

Example Environment Flow

This is a simplified, real-world example flow through some typical environments. Note the potential variable paths – another reason to know what configuration is being tested.

Tooling and processes are constantly evolving. The DevOps movement is really a reflection of the mainstreaming of Agile approaches and cloud-related technologies and is ultimately a discussion of how to best exploit them. That discussion, as it applies to environment proliferation, means we need to get to an understanding of the core problems we are trying to solve. The two main problem areas are maintaining the validity of the sub-production environments as representative of production and tracking the groupings of changes to the system in each of the environments.

The first problem area, that of maintaining the validity of sub-production environments, is a more complex problem than it would seem. There are organizational silo problems where multiple different groups own the different environments. For example, a QA group may own the lab configurations and therefore have a disconnect relative to the production team. There are also multipliers associated with technical specialties, such as DBAs or Network Administration, which may be shared across some levels of environment. And if the complexity of the organization were not enough, there are other issues associated with teams that do not get along well, the business’ perception that test environments are less critical than production, and other organizational dynamics that make it that much more difficult to ensure good testing regimes are part of the process.

The second key problem area that must be addressed is tracking the groups of changes to the application system that are being evaluated in a particular sub-production environment. This means having a unique identifier for the combination of application code, the database schema and dataset, system configuration, and network configuration. That translates to five version markers – one for each of the main areas of the application system plus one for the particular combination of all four. On the surface, this is straightforward, but in most shops, there are few facilities for tracking versions of configurations outside of software code. Even when such facilities exist, they are too often not connected to one another for tracking groupings of configurations.
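One way to picture the five version markers is the sketch below: four per-area versions plus a fifth marker derived from the combination. Deriving the fifth marker from a hash is my assumption for the sake of illustration (a simple sequence number would serve just as well), and all of the version strings are made up.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemVersion:
    """Version markers for the four main areas of an application system."""
    code: str
    database: str
    server: str
    network: str

    @property
    def combined(self) -> str:
        """Fifth marker: a short digest identifying this exact combination."""
        joined = "|".join([self.code, self.database, self.server, self.network])
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


# Hypothetical environments that differ only in the application code version.
qa = SystemVersion("1.4.2", "schema-37", "img-2012.11", "lb-cfg-9")
prod = SystemVersion("1.4.1", "schema-37", "img-2012.11", "lb-cfg-9")

print("QA  :", qa.combined)
print("Prod:", prod.combined)
print("Same configuration?", qa.combined == prod.combined)
```

The useful property is that any difference in any one of the four areas produces a different combined marker, so two environments can be compared with a single value.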

The typical pattern for solving these two problems actually begins with the second problem. It is difficult to ensure the validity of a test environment if there is no easy way to identify and understand the configuration of the components involved. This is why many DevOps initiatives start with configuration management tools such as Puppet, Chef, or VMware vCenter. It is also why “all-in-one” solutions such as IBM’s Pure family are starting to enter the market. Once an organization can get a handle on its configurations, it is substantially easier to have fact-based engineering conversations about valid test configurations and environments because everyone involved has a clear reference for understanding exactly what is being discussed.

This problem discussion glosses over the important aspect of being able to maintain these tools and environments over time. Consistently applying the groups of changes to the various environments requires a complex system by itself. The term system is most appropriate because the needed capabilities go well beyond the scope of a single tool, and those capabilities need to be available for each of the system components. Any discussion of such broad capabilities is well beyond the scope of a single blog post, so the next several posts in this series will look at a framework for understanding the capabilities needed for such a system.

A System for Changing Systems – Part 2 – The “Chang-ee”

As discussed last time, having a clear understanding of the thing being changed is key to understanding how to change it. Given that, this post will focus on creating a common framework for understanding the “Chang-ee” systems. To be clear, the primary subject of this discussion is software application systems. That should be obvious from the DevOps discussion, but I prefer not to assume things.

Application systems generally have four main types of components. First, and most obviously, is the software code. That is often referred to as the “application”. However, as the DevOps movement has long held, that is a rather narrow definition of things. The software code cannot run by itself in a vacuum. That is why these posts refer to an application *system* rather than just an application. The other three parts of the equation are the database, the server infrastructure, and the network infrastructure. It takes all four of these areas working together for an application system to function.

Since these four areas will frame the discussion going forward, we need to have a common understanding about what is in each. It is important to understand that there are variants of each of these components as changes are applied and qualified for use in the production environment. In other words, there will be sub-production environments that have to have representative configurations. And those have to be considered when deciding how to apply changes through the environment.

  • Application Code – This is the set of functionality defined by the business case that justifies the existence of the application system in the first place. It consists of the artifacts created by the development team for the solution, including things such as server code, user interface artifacts, business rules, etc.
  • Database & Data – This is the data structure required for the application to run. This area includes all data-related artifacts, whether they are associated with a traditional RDBMS, “no sql” system, or just flat files. This includes data, data definition structures (eg schema), test datasets, and so forth.
  • Server Infrastructure (OS, VM, Middleware, Storage) – This represents the services and libraries required for the application to run. A broad category ranging from the VM/OS layer all the way through the various middleware layers and libraries on which the application depends. This area also includes storage for the database area.
  • Network Infrastructure – This category is for all of the inter-system communications components and links required for users to derive value from the application system. This includes the connectivity to the users, connectivity among servers, connectivity to resources (e.g. storage), and the devices (e.g. load balancers, routers, etc.) that enable the application system to meet its functional, performance, and availability requirements.

Application System Components

Conceptual image of the main system component areas that need to be in sync in order for a system to operate correctly

The complicating factor for these four areas is that there are multiple instances of each of them that exist in an organization at any given time. And those multiple instances may be at different revision levels. Dealing with that is a discussion unto itself, but is no less critical to understanding the requirements for a system to manage your application system. The next post will examine this aspect of things and the challenges associated with it.

Another Example of Grinding Mental Gears

I recently got a question from a customer who was struggling with the ‘availability’ of their sub-production environments. The situation brought into focus a fundamental disconnect between the Ops folks who were trying to maintain a solid set of QA environments for the Dev team and what the Dev teams needed. To a large extent this is a classic DevOps dilemma, but the question provides an excellent teaching moment. Classic application or system availability as defined for a production situation does not really apply to Dev or multi-level Test environments.

Look at it this way. End user productivity associated with a production environment is based upon the “availability” of the application. Development and Test productivity is based upon the ability to view changes to the application in a representative (pre-production) environment. In other words, the availability of the _changer_ in pre-production is more valuable to Dev productivity than any specific pre-production instance of the application environment. Those application environment instances are, in fact, disposable by definition.

Disposability of a running application environment is a bit jarring to Ops folks when they see a group of users (developers and testers in this case) needing the system. Everything in Ops tools and doctrine is oriented toward making sure that an application environment gets set up and STAYS that way. That focus on keeping things static is exactly the point to which DevOps is a reaction.  Knowing that does not make it easy to make the mental shift, of course.  Once made, however, it is precisely why tools that facilitate rapidly provisioning environments are frequently the earliest arrivals when most organizations seek to adopt DevOps.

About Those “QA” Environments…

DevOps is about getting developed software to users faster and getting feedback from that software back to developers faster. The notion of a clean cycle with very low latency is a compelling vision. Most IT shops are struggling with how to get there and many must maintain a division between the full production environment and the test environments that lead to production. Some of the reasons are historical, some are managerial, and some are truly related to the business environment in which that organization operates.

Fortunately or unfortunately, most shops have plenty to do before they need to worry about tying production in more directly. The state of QA environments is usually relatively weakly managed. Part of that is historic – the environments are not viewed as very important and it is never quite clear who owns keeping them properly current to the production environment. Part of this is the traditional focus of narrowly testing features of the application to minimize test cycles. And part, too, is a historical view that lab configurations did not matter as long as they were ‘close enough’ for testing features.

Of course, recent lessons are teaching us that the historical approach is not necessarily conducive to rapid iteration of software. We are also learning that theoretically small changes to the production environments can potentially invalidate theoretically tested deliverables. Given the advent of very good rapid provisioning systems, automated configuration management tools, and highly virtualized infrastructures, there are few reasons not to have a first-rate QA environment.

But is the paradigm of QA “environments” really the right paradigm for how we approach rapidly releasing features into the wild? As teams try to lessen the notion of a big, standing lab environment for testing software, the approach looks somewhat less like traditional testing and more like qualifying a new feature for use in the system. This is a subtle difference. “Quality Assured” and “Qualified for Use” are two different notions. One says you delivered what you set out to deliver. The other says you know it works in some situation. Some would say that “Quality” implies the latter, but I would answer that if you have to parse a definition to get a meaning, you probably are using the wrong word.

But words only matter to a point; it is the paradigm they represent that is ultimately interesting and impactful. There are extreme examples in the “real” world. For example, just because a part is delivered with quality to its design goals does NOT mean that it is certified for use in a plane. As someone who flies often, I view this as good.

So the question I would ask is whether you simply test to see if it meets design goals in some hopefully representative lab somewhere or do you use DevOps techniques to truly qualify releases for use in the real production environment.

How Fast Should You Change the Tires?

I am an unabashed car nut and like to watch a variety of motor racing series. In particular I tend to stay focused on Formula 1 with a secondary interest in the endurance series (e.g. Le Mans). In watching several races recently, I observed that the differences in how each series managed tire changes during pit stops carried some interesting analogies to deploying software quickly.

Each racing series has a different set of rules and limitations with regard to how pit stops may be conducted. These rules are imposed for a combination of safety reasons, competitive factors, and the overall viability of the racing series. There are even rules about changing tires. Some series enable very quick tire changes – others less so. The reasons behind these differences and how they are applied by race teams in tight, time competitive situations can teach us lessons about the haste we should or should NOT have when deploying software.

Why tire changes? The main reason is that, like deploying software, there are multiple potential points of change (4 tires on the car; software, data, systems, and network for the application system). And, in both situations, it is less important how fast you can change just one of them than how fast you change all of them. There are even variants where you may not need to change all 4 tires (or system components) every time, but you must be precise in your changes.

Formula 1

Formula 1 is a fantastically expensive racing series and features extreme everything – including the fastest pit stops in the business. Sub 4-second stops are the norm, during which all 4 tires are changed. There are usually around 18 people working on the car – 12 of whom are involved in getting the old tires off and clear while putting new tires on (not counting another 2 to work the jacks). That is a large team, with a lot of expensive people on it, who invest a LOT of expensive time practicing to ensure that they can get all 4 tires changed in a ridiculously short period of time. And they have to do it for two cars with potentially different tire use strategies, and do it safely, while competing in a sport that measures advantage in thousandths of a second.

But, there is a reason for this extreme focus / investment in tire changes. The tire changes are the most complex piece of work being done on the car during a standard pit stop. Unlike other racing series, there is no refueling in Formula 1 – the cars must have the range to go the full race distance. In fact, the races are distance and time limited, so the components on the cars are simply engineered to go that distance without requiring service, and therefore time, during the race. There are not even windows to wash – it is an open cockpit car. So, the tires are THE critical labor happening during the pit stop and the teams invest accordingly.

Endurance (Le Mans)

In contrast to the hectic pace of a Formula 1 tire change is Endurance racing. These are cars that are built to take the abuse of racing for 24 hours straight. These cars require a lot of service over the course of that sort of race, and the tires are therefore only one of several critical points that have to be serviced in the course of a race. Endurance racers have to be fueled, have brake components replaced, and the three drivers have to switch out periodically so they can rest. The rules of this series, in fact, limit the number of tire wrenches the team can use in the pits to just one. That is done to discourage teams from cutting corners and also to keep team size (and therefore costs) down.

NASCAR

NASCAR is somewhere between Formula 1 and Endurance racing when it comes to tire changes. This series limits tire wrenches to two and tightly regulates the number of people working on the car during a pit stop. These cars require fuel, clean-up, and tires just like the Endurance cars, but generally do not require any additional maintenance during a race, barring damage. So, while changing tires quickly is important, there are other time eating activities going on as well.

Interestingly, in addition to safety considerations, NASCAR limits personnel to keep costs down to help the teams competing in the series afford the costs of doing so. That keeps the overall series competition healthy by ensuring a good number of participants and the ability of new teams to enter. Which, to contrast, is one of the problems that Formula 1 has had over the years.

In comparing the three approaches to the same activity, you see an emerging pattern where ultimate speed of changing tires gets traded based on cost and contextual criticality. These are the same trade-offs that are made in a business when it looks at how much faster it can perform a regular process such as deploying software. You could decide you want sub-four second tire changes, but that would be dumb if your business needs 10 seconds for refueling or several minutes for driver swaps and brake overhauls. And if it does, your four second tire change would look wasteful at best as your army of tire guys stands around and watches the guy fueling the car or the new driver adjusting his safety harnesses.
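The arithmetic behind that point is simple enough to sketch. The example below assumes, as a simplification, that all of the work in a stop happens in parallel, so the slowest activity sets the length of the stop. The 4-second and 10-second figures come from the paragraph above; the 9-second tire change is a made-up comparison point.

```python
def stop_time(activities):
    """Length of the stop, assuming every activity runs in parallel."""
    return max(activities.values())


def idle_time(activities):
    """Seconds each crew spends waiting for the slowest activity to finish."""
    total = stop_time(activities)
    return {name: total - seconds for name, seconds in activities.items()}


# Seconds per activity; the 9.0 value is hypothetical.
fast_tires = {"tires": 4.0, "refuel": 10.0}
modest_tires = {"tires": 9.0, "refuel": 10.0}

for label, stop in (("fast tires", fast_tires), ("modest tires", modest_tires)):
    print(label, "stop:", stop_time(stop), "idle:", idle_time(stop))
```

Both stops take the same 10 seconds. The expensive 4-second tire crew simply waits 6 seconds instead of 1, which is the waste the analogy is pointing at.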

The message here is simple – understand what your business needs when it comes to deployment. Take the thrill of speed out of it and make an unemotional decision to optimize, knowing that optimal is contextually fastest without waste. Organizations that literally make their living from speed understand this. You should consider this the next time you go looking to do something faster.

DevOps is NOT a Job Title

Given my recent posts about organizational structure, I feel like I need to clarify my stance on this…

You know a topic is hot when recruiters start putting it in job titles.  I do believe that most organizations will end up with a team of “T-shaped people” focused on using DevOps techniques to ensure that systems can support an Agile business and its development processes.  However, I am not a fan of hanging DevOps on the title of everyone involved.

Here’s the thing, if you have to put it in the name to convince yourself or other people you are doing it, you probably are not.  And the very people you hope to attract may well avoid your organization because it fails the ‘reality’ test.  In other words, you end up looking like you don’t get it.  A couple of analogies come to mind immediately.

  • First, let’s look at a country that calls itself the “People’s Democratic Republic of” somewhere.  That is usually an indicator that it is not any of those modifiers and the only true statement is the ‘somewhere’ part.  Similarly, putting “DevOps Sysadmin” on top of a job description that, just last week, said “Sysadmin” really isn’t fooling anyone.
  • Second, hanging buzzwords on job titles is like a 16 year old painting racing stripes on the four door beater they got as their first car.  With latex house paint.  You may admire their enthusiasm and optimism.  You certainly wish them the best.  But you have a pretty realistic assessment of the car.

Instead, DevOps belongs down in the job description.  DevOps in a job role is a mindset and an approach used to define how established skills are applied.  You are looking for a Release Manager to apply DevOps methods in support of your web applications.  Put it down in the requirements bullet points just as you would put things like ‘familiar with scripting languages’, ‘used to operating in an [Agile/Lean/Scrum] environment’, or ‘experience supporting a SaaS infrastructure’.

I realize that I am tilting at windmills here.  We went through a spate of “Agile” Development Managers and the number of  “Cloud” Sysadmins is just now tapering.  So, I guess it is DevOps’ turn.  To be sure, it is gratifying and validating to see such proof that DevOps is becoming a mainstream topic.  I should probably adopt a stance of ‘whatever spreads the gospel to the masses’.  But I really just had to get this rant off my chest after seeing a couple of serious “facepalm” job ads.