Predictability is Predictably Hard

In order to successfully automate something, the pieces being automated have to be ‘predictable’. I use ‘predictable’ here – rather than ‘consistent’ – deliberately. A ‘predictable’ environment means you can anticipate its state and configuration. ‘Consistent’ gets misconstrued as ‘unchanging’, which is the opposite of what Agile software delivery is trying to achieve.

Consider deploying a fresh build of an application into a test environment. If you cannot predict what the build being deployed looks like and how the environment will be set up, why would you expect to reliably be able to get that build working in that environment in a predictable window of time? And yet, that is exactly what so many teams do.

The proposed solution is usually to automate the deployment. That, however, leads to its own problems if you do not address the predictability of the underlying stuff being automated. I talk to teams with stories about how they abandoned automation because it ‘slowed things down’ or ‘just did not work’. That leads teams to say, and in some cases believe, that their applications are ‘too complex to deploy automatically’.

At the heart of the difficulty in achieving predictability of the code packages and environments is the fact that they are owned by different teams. Somehow it is harder to collaborate with the development or operations team than it is to spend months attempting to build a mountain of hard-to-maintain deployment code. A mountain of code that stands a good chance of being abandoned, by the way. That represents months of wasted time, effort, and life because people working on the same application do not collaborate or cooperate.

And so we get another example of why so many DevOps conversations end up being about culture rather than technology… Which really sucks, because that example comes at the cost of a fair bit of pain for the real people on those teams.

The lesson here is that there is no skipping the hard work of establishing predictability in the packaging of the code and environments before charging into automating deployments. We are in an era now where really good packaging and configuration management tools are very mature. And the next generation of tools that unify code and environment changes into immutable, deployable, and promotable artifacts is coming fast. But even with all of these awesome tools, cross-disciplinary experts will have to come together to contribute to the creation of predictable versions of those artifacts.

The ‘C’ in CAMS stands for “Culture” – and culture is built on collaboration. There are no shortcuts.

This article is also on LinkedIn here: https://www.linkedin.com/pulse/predictability-predictably-hard-dan-zentgraf/

Your Deployment Doc Might Not be Useful for DevOps

One of the most common mistakes I see people making with automation is the assumption that they can simply wrap scripts around what they are doing today and be ‘automated’. The assumption is based on some phenomenally detailed runbook or ‘deployment document’ that has every command that must be executed. In ‘perfect’ sequence. And usually in a nice bold font. It was what they used for their last quarterly release – you know, the one two months ago? It is also serving as the template for their next quarterly release…

It’s not that these documents are bad or not useful. They are actually great guideposts and starting points for deriving a good automated solution to releasing software in that environment. However, you have to remember that these are the same documents that are used to guide late night, all hands, ‘war room’ deployments. The idea that their documented procedures are repeatably automate-able is suspect, at best, based on that observation alone.

Deployment documents break down as an automate-able template for a number of reasons. First, there are almost always some undocumented assumptions about the state of the environment before a release starts. Second, reusing the last one does not account for procedural, parameter, or other changes between the prior and the upcoming releases. Third, the processes usually rely, unconsciously, on interpretation or tribal knowledge on the part of the person executing the steps. Finally, there is the problem that steps that make sense in a sequential, manual process will not take advantage of the intrinsic benefits of automation, such as parallel execution, elimination of data entry tasks, and so on.
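
To make the first of these failure modes concrete, here is a minimal sketch – in Python, with hostnames, paths, and thresholds that are entirely invented – of how a runbook’s unstated assumptions about the environment can be turned into explicit, automated pre-flight checks that run before a deployment is allowed to start:

```python
import shutil
import socket

# Each check makes explicit an assumption the runbook leaves unstated
# ("the app server answers on 8080", "there is room to unpack the build").
# Hostnames, paths, and thresholds here are made up for illustration.
PREFLIGHT_CHECKS = []

def preflight(fn):
    """Register a function as a pre-flight check."""
    PREFLIGHT_CHECKS.append(fn)
    return fn

@preflight
def app_server_reachable():
    with socket.create_connection(("appserver.test.example.com", 8080), timeout=5):
        return True

@preflight
def enough_disk_for_build():
    free_gb = shutil.disk_usage("/opt/releases").free / 1e9
    return free_gb > 2.0

def run_preflight():
    """Run every check; abort the deployment if any assumption does not hold."""
    failed = []
    for check in PREFLIGHT_CHECKS:
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(check.__name__)
    if failed:
        raise SystemExit(f"Aborting deployment; failed pre-flight checks: {failed}")

if __name__ == "__main__":
    run_preflight()
    print("Environment matches the documented assumptions; safe to deploy.")
```

Every check that fails points directly at an assumption the deployment document never wrote down.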

The solution is to set the expectation – particularly with those holding organizational power – that the document is only a starting point. Build the automation iteratively and schedule multiple iterations at the start of the effort. This can be a great way to introduce Agile practices into the traditionally waterfall approaches used in operations-centric environments. This approach allows for the effort that will be required to fill in gaps in the document’s approach, negotiate standard packaging and tracking of deploy-able artifacts, add environment ‘config drift’ checks, or address any of the other common ‘pitfall’ items that require more structure in an automated context.

This article is also on LinkedIn here: https://www.linkedin.com/pulse/your-deployment-doc-might-useful-devops-dan-zentgraf

A System for Changing Systems – Part 4 – Groundwork for Understanding the Capabilities of a System Changing System

In the last couple of posts, we have talked about how application systems need a change application system around them to manage the changes to the application system itself. A “system to manage the system” as it were. We also talked about the multi-part nature of application systems and the fact that the application systems typically run in more than one environment at any given time and will “move” from environment to environment as part of their QA process. These first three posts seek to set a working definition of the thing being changed so that we can proceed to a working definition of a system for managing those changes. This post starts that second part of the series – defining the capabilities of a change application system. This definition will then serve as the base for the third part – pragmatically adopting and applying the capabilities to begin achieving a DevOps mode of operation.

DevOps is a large problem domain with many moving parts. Just within the first set of these posts, we have seen how four rather broad area definitions can multiply substantially in a typical environment. Further, there are aspects of the problem domain that will be prioritized by different stakeholders based on their discipline’s perspective on the problem. The whole point of DevOps, of course, is to eliminate that perspective bias. So, it becomes very important to have some method for unifying the understanding and discussion of the organization’s capabilities. In the final analysis, it is not as important what that unified picture looks like as it is that the picture be clearly understood by all.

To that end, I have put together a framework that I use with my customers to help in the process of understanding their current state and prioritizing their improvement efforts. I initially presented this framework at the Innovate 2012 conference and subsequently published an introductory whitepaper on the IBM developerWorks website. My intent with these posts is to expand the discussion and, hopefully, help folks get better faster. The interesting thing to me is to see folks adopt this either as is or as the seed of something of their own. Either way, it has been gratifying to see folks respond to it in its nascent form and I think the only way for it to get better is to get more eyeballs on it.

So, here is my picture of the top-level of the capability areas (tools and processes) an organization needs to have to deliver changes to an application system.

[Figure: Capabilities – an overview of the capability areas required to sustain environments]

The quality and maturity of these within the organization will vary based on their business needs – particularly around formality – and the frequency with which they need to apply changes.

I applied three principles when I put this together:

  • The capabilities had to be things that exist in all environments that the application system runs in (i.e. dev, test, prod, or whatever layers exist). The idea here is that such a perspective will help unify tooling and approaches toward a theoretical ideal of one solution for all environments.
  • The capabilities had to be broad enough to allow for different levels of priority / formality depending on the environment. The idea is not to burden a more volatile test environment with production-grade formality or vice-versa, but to allow a structured discussion of how the team will deliver that capability in a unified way to the various environments (see the sketch after this list). DevOps is an Agile concept, so the notion of minimally necessary applies.
  • The capabilities had to be generic enough to apply to any technology stack that an organization might have. Larger organizations may need multiple solutions based on the fact that they have many application systems that were created at different points in time, in different languages, and in different architectures. It may not be possible to use exactly the same tool / process in all of those environments, but it most certainly is possible to maintain a common understanding and vocabulary about it.
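
As a way of picturing that second principle – one shared definition of each capability, applied with more or less rigor per environment – here is a small, purely illustrative Python sketch. The capability names and formality levels are mine, not a prescribed list:

```python
from dataclasses import dataclass
from enum import Enum

class Formality(Enum):
    MINIMAL = 1      # enough to keep a volatile dev environment moving
    CONTROLLED = 2   # change tracking and lightweight approvals
    AUDITED = 3      # full production-grade controls

@dataclass
class Capability:
    name: str
    formality: dict  # environment name -> Formality level for this capability

# One shared definition of each capability; only the formality dial changes
# as a change moves toward production. Names and levels are illustrative.
CAPABILITIES = {
    "provisioning": Capability("provisioning",
        {"dev": Formality.MINIMAL, "test": Formality.CONTROLLED, "prod": Formality.AUDITED}),
    "deployment": Capability("deployment",
        {"dev": Formality.MINIMAL, "test": Formality.CONTROLLED, "prod": Formality.AUDITED}),
    "monitoring": Capability("monitoring",
        {"dev": Formality.MINIMAL, "test": Formality.MINIMAL, "prod": Formality.AUDITED}),
}

def formality_for(capability: str, environment: str) -> Formality:
    """Same capability everywhere, applied with environment-appropriate rigor."""
    return CAPABILITIES[capability].formality[environment]

print(formality_for("deployment", "test"))   # Formality.CONTROLLED
print(formality_for("deployment", "prod"))   # Formality.AUDITED
```

The value is not in the data structure itself; it is that every environment is described in the same vocabulary, so “how formal does test need to be?” becomes a concrete, answerable question.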

In the next couple of posts, I will drill a bit deeper into the capability areas to apply some scope, focus, and meaning.

A System for Changing Systems – Part 3 – How Many “Chang-ee”s

As mentioned in the last post, once there is a “whole system” understanding of an application system, the next problem is that there are really multiple variants of that system running within the organization at any given time. There are notionally at least three: Development, Test, and Production. In reality, however, most shops frequently have multiple levels of test and potentially more than one Development variant. Some even have Staging or “Pre-production” areas very late in test where the modified system must run for some period before finally replacing the production environment. A lot of this environment proliferation is based on historic processes that are themselves a product of the available tooling and lessons organizations have learned over years of delivering software.

[Figure: Example environment flow – a simplified, real-world example flow through some typical environments. Note the potential variable paths – another reason to know what configuration is being tested.]

Tooling and processes are constantly evolving. The DevOps movement is really a reflection of the mainstreaming of Agile approaches and cloud-related technologies and is ultimately a discussion of how best to exploit them. That discussion, as it applies to environment proliferation, means we need to get to an understanding of the core problems we are trying to solve. The two main problem areas are maintaining the validity of the sub-production environments as representative of production and tracking the groupings of changes to the system in each of the environments.

The first problem area, that of maintaining the validity of sub-production environments, is a more complex problem than it would seem. There are organizational silo problems where multiple different groups own the different environments. For example, a QA group may own the lab configurations and therefore have a disconnect relative to the production team. There are also multipliers associated with technical specialties, such as DBAs or Network Administration, which may be shared across some levels of environment. And as if the complexity of the organization were not enough, there are other issues associated with teams that do not get along well, the business’ perception that test environments are less critical than production, and other organizational dynamics that make it that much more difficult to ensure good testing regimes are part of the process.

The second key problem area that must be addressed is tracking the groups of changes to the application system that are being evaluated in a particular sub-production environment. This means having a unique identifier for the combination of application code, the database schema and dataset, system configuration, and network configuration. That translates to five version markers – one for each of the four main areas of the application system plus one for the particular combination of all four. On the surface, this is straightforward, but in most shops there are few facilities for tracking versions of configurations outside of software code. Even when such facilities exist, they are too often not connected to one another, so groupings of configurations cannot be tracked.
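
As a rough illustration – assuming, purely for the sake of example, that each of the four component areas already carries its own version string – the five markers can be as simple as a small record plus an identifier derived from the combination:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class EnvironmentConfiguration:
    """The four independently versioned parts of the application system."""
    application_code: str   # e.g. a build number or VCS revision
    database: str           # schema plus reference dataset version
    system_config: str      # OS / middleware configuration version
    network_config: str     # load balancer, firewall, DNS configuration version

    def grouping_id(self) -> str:
        """The fifth marker: one identifier for this particular combination."""
        combined = "|".join([self.application_code, self.database,
                             self.system_config, self.network_config])
        return hashlib.sha256(combined.encode()).hexdigest()[:12]

# Record what is actually deployed in a given test environment
# (all version strings below are invented for the example).
qa2 = EnvironmentConfiguration(
    application_code="build-1482",
    database="schema-3.9+dataset-7",
    system_config="sysconf-2024.02",
    network_config="netconf-11",
)
print(qa2.grouping_id())  # a stable ID for exactly this grouping of changes
```

Two environments that report the same grouping identifier are running exactly the same combination of the four parts, which is the kind of unambiguous reference point the rest of this discussion depends on.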

The typical pattern for solving these two problems actually begins with the second problem. It is difficult to ensure the validity of a test environment if there is no easy way to identify and understand the configuration of the components involved. This is why many DevOps initiatives start with configuration management tools such as Puppet, Chef, or VMware vCenter. It is also why “all-in-one” solutions such as IBM’s Pure family are starting to enter the market. Once an organization can get a handle on its configurations, it is substantially easier to have fact-based engineering conversations about valid test configurations and environments because everyone involved has a clear reference for understanding exactly what is being discussed.

This problem discussion glosses over the important aspect of being able to maintain these tools and environments over time. Consistently applying the groups of changes to the various environments requires a complex system in itself. The term system is most appropriate because the needed capabilities go well beyond the scope of a single tool, and those capabilities need to be available for each of the system components. Any discussion of such broad capabilities is well beyond the scope of a single blog post, so the next several posts in this series will look at a framework for understanding the capabilities needed for such a system.

A System for Changing Systems – Part 1 – Approach

This is the first post in a series which will look at common patterns among DevOps environments.  Based on these patterns, the posts will attempt to put together a reasonable structure that will help organizations focus DevOps discussions, prioritize choices, and generally improve how they operate.

In the last post, I discussed how many shops take the perspective of developing a system for DevOps within their environments.  This notion of a “system for changing systems” as a practical way of approaching DevOps requires two pieces.  The first is the system being changed – the “change-ee” system.  The second is the system doing the changing – the “DevOps”, or “change-er” system.  Before talking about automatically changing something, it is necessary to have a consistent understanding of the thing being changed.  Put another way, no automation can operate well without a deep understanding of the thing being automated.  So this first post is about establishing a common language for generically understanding the application systems; the “change-ee” systems in the discussion.

A note on products, technologies and tools…  Given the variances in architectures for application (“change-ee”) systems, and therefore the implied variances in the systems that apply changes to them, it is not useful to get prescriptive about products for either.  In fact, a key goal with this framework is to ensure that it is as broadly applicable and useful as possible when solving DevOps-related problems in any environment.  That would be very difficult if it overly focused on any one technology stack.  So, these posts will not necessarily name names other than to use them as examples of categories of tools and technologies.

With these things in mind, these posts will progress from the inside-out.  The next post will begin the process with a look at the typical components in an application system (“change-ee”).  From there, the next set of posts will discuss the capabilities needed to systematically apply changes to these systems.  Finally, after the structure is completed, the last set of posts will look at the typical progression of how organizations build these capabilities.

The next post will dive in and start looking at the structure of the “change-ee” environment.

DevOps is about Developing a Change Application System

As the DevOps movement rolls on, there is a pattern emerging. Some efforts are initiated by development, seeking relief on test environment management. Others are initiated by operations departments trying to get more automation and instrumentation into the environments they manage. I frequently hear comments that are variations on “same stuff I’ve been doing for xx years, different title” from people who have DevOps in their job title or job description. Their shops are hoping that if they encourage folks to think about DevOps and maybe try some new tools, they will get the improvements promised by DevOps discussions online. Well, just like buying a Ferrari and talking it up won’t make you Michael Schumacher, having Puppet or Chef to do your configuration management won’t “make you DevOps” (whatever that means). Successful DevOps shops are bypassing the window dressing and going at DevOps as a project unto itself.

There are a number of unique aspects to undertaking a project such as this. They require a holistic perspective on the process, touch a very broad range of activities, and provide an approach for changing other systems while being constantly changed themselves.

These projects are unique in the software organization because they require a team to look at the whole end-to-end approach to delivering changes to the application systems within that organization FROM THE SIDE, rather than from a position somewhere in the middle of the process. This is an important difference in approach, because it forces a change in perspective on the problem. Someone looking from either the development or the operations “end” of the process will often suffer from a perspective problem where the “closer” problems in the process look bigger than the ones “farther” up or down the process line. It is a very human thing to be deceived by the perspective of our current position. After all, there are countless examples of using perspective for optical illusions. Clever Leaning Tower of Pisa pictures (where someone appears to be holding it up) and the entire Lord of the Rings movie trilogy (the actors playing the hobbits are not that short) provide easy examples. Narrowness of perspective is, in fact, a frequent reason that “grassroots” efforts fail outside of small teams. Successfully making large and impactful changes requires a broader perspective.

The other breadth-related aspect of these programs is that they touch a very wide range of activities over time and seek to optimize flow both through and among them. That means that they have some similarities with supply chain optimization and ERP projects, if not in scale then in complexity. And the skills to look at those flows probably do not exist directly within the software organization, but in the business units themselves. It can be difficult for technology teams, which see themselves as critical suppliers of technology to business units, to accept that there are large lessons to be learned about technology development from the business units. It takes a desire to learn and change at a level well above a typical project.

A final unique part is that there must be ongoing programs for building and enhancing a system for managing consistent and ongoing changes in other systems. Depending on your technology preference, there are plenty of analogies from pipelines, power grids, and aircraft that apply here. Famous and fun ones are the flight control systems of intrinsically unstable aircraft such as the F-16 fighter or B-2 bomber. These planes use technology to adjust control surfaces within fractions of a second to maintain steady and controlled flight within the extreme conditions faced by combat aircraft. Compared to that, delivering enhancements to a release automation system every few weeks sounds trivial, but maintaining the discipline and control to do so in a large organization can be a daunting task.

So the message here is to deliberately establish a program to manage how changes are applied. Accept that it is going to be a new and unusual thing in your organization and that it is going to require steady support and effort to be successful. Without that acceptance, it will likely not work out.

My next few posts are going to dig into this deeper and begin looking at the common aspects of these programs, how people approach the problem, how they organize and prioritize their efforts, and the types of tools they are using.

Another Example of Grinding Mental Gears

I recently got a question from a customer who was struggling with the ‘availability’ of their sub-production environments. The situation brought into focus a fundamental disconnect between the Ops folks, who were trying to maintain a solid set of QA environments for the Dev team, and what the Dev teams actually needed. To a large extent this is a classic DevOps dilemma, but the question provides an excellent teaching moment. Classic application or system availability as defined for a production situation does not really apply to Dev or multi-level Test environments.

Look at it this way. End user productivity associated with a production environment is based upon the “availability” of the application. Development and Test productivity is based upon the ability to view changes to the application in a representative (pre-production) environment. In other words, the availability of the _changer_ in pre-production is more valuable to Dev productivity than any specific pre-production instance of the application environment. Those application environment instances are, in fact, disposable by definition.

Disposability of a running application environment is a bit jarring to Ops folks when they see a group of users (developers and testers in this case) needing the system. Everything in Ops tools and doctrine is oriented toward making sure that an application environment gets set up and STAYS that way. That focus on keeping things static is exactly the point to which DevOps is a reaction.  Knowing that does not make it easy to make the mental shift, of course.  Once made, however, it is precisely why tools that facilitate rapidly provisioning environments are frequently the earliest arrivals when most organizations seek to adopt DevOps.
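
To make that disposability concrete, here is a rough Python sketch of the lifecycle; the provisioning and teardown functions are placeholders for whatever tooling an organization actually uses:

```python
import contextlib
import uuid

def provision_environment(configuration_id: str) -> str:
    """Stand-in for the real provisioning tooling (cloud API, templates, etc.)."""
    env_name = f"qa-{uuid.uuid4().hex[:6]}"
    print(f"provisioning {env_name} from configuration {configuration_id}")
    return env_name

def destroy_environment(env_name: str) -> None:
    """Stand-in for tearing the instance back down."""
    print(f"destroying {env_name}")

@contextlib.contextmanager
def disposable_environment(configuration_id: str):
    """The instance exists only as long as the change being evaluated needs it."""
    env = provision_environment(configuration_id)
    try:
        yield env
    finally:
        # What matters to Dev and Test productivity is that this provisioning
        # capability (the 'changer') stays available, not that this instance survives.
        destroy_environment(env)

# A developer or tester gets a fresh, representative environment on demand...
with disposable_environment("build-1482+sysconf-2024.02") as env:
    print(f"running tests against {env}")
# ...and it is gone again once they are done with it.
```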