Predictability is Predictably Hard

In order to successfully automate something, the pieces being automated have to be ‘predictable’. I use ‘predictable’ here – rather than ‘consistent’ – deliberately. A ‘predictable’ environment means you can anticipate its state and configuration. ‘Consistent’ gets misconstrued as ‘unchanging’, which is the opposite of what Agile software delivery is trying to achieve.

Consider deploying a fresh build of an application into a test environment. If you cannot predict what the build being deployed looks like and how the environment will be set up, why would you expect to reliably be able to get that build working in that environment in a predictable window of time? And yet, that is exactly what so many teams do.

The proposed solution is usually to automate the deployment. That, however, leads to its own problems if you do not address the predictability of the underlying stuff being automated. I talk to teams with stories about how they abandoned automation because it ‘slowed things down’ or ‘just did not work’. That leads teams to say, and in some cases believe, that their applications are ‘too complex to deploy automatically’.
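One way to make 'predictable' concrete is to have the deployment refuse to start when the environment is not in the state it expects. The following is a minimal sketch of that idea, not a recommendation of any particular tool. The `find_drift` function, the `Drift` record, and the setting names in the usage note are all hypothetical, and a real shop would pull both dictionaries from its own packaging and configuration management tools.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Drift:
    """One setting whose actual value does not match what the build expects."""
    key: str
    expected: str
    actual: Optional[str]  # None means the setting is missing entirely


def find_drift(expected: dict, actual: dict) -> list:
    """Compare expected settings to actual ones and return every mismatch.

    Both arguments are plain dicts of setting name -> value (hypothetical
    format). Settings present only in `actual` are ignored here.
    """
    return [
        Drift(key, want, actual.get(key))
        for key, want in sorted(expected.items())
        if actual.get(key) != want
    ]


def safe_to_deploy(expected: dict, actual: dict) -> bool:
    """Deploy only when the environment matches what the build was made for."""
    return not find_drift(expected, actual)
```

For example, calling `find_drift({"java": "17", "db_schema": "42"}, {"java": "17", "db_schema": "41"})` reports a single `Drift` for `db_schema`. The point of the sketch is that the expected state has to be written down by both teams before the check can exist at all.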

At the heart of the difficulty in achieving predictability of the code packages and environments is the fact that they are owned by different teams. Somehow it is harder to collaborate with the developers or operations team than it is to spend months attempting to build a mountain of hard-to-maintain deployment code. A mountain of code that stands a good chance of being abandoned, by the way. That represents months of wasted time, effort, and life because people working on the same application do not collaborate or cooperate.

And we get another example of why so many DevOps conversations become about culture rather than technology… Which really sucks, because that example comes at the expense of a fair bit of pain for the real people on those teams.

The lesson here is that there is no skipping the hard work of establishing predictability in the packaging of the code and environments before charging into automating deployments. We are in an era now where really good packaging and configuration management tools are very mature.
And the next generation of tools that unifies the code and environment changes into immutable, deployable, and promotable artifacts is coming fast. But even with all of these awesome tools, cross-disciplinary experts will have to come together to contribute to the creation of predictable versions of those artifacts.

The ‘C’ in CAMS stands for “Collaboration”. There are no shortcuts.

This article is also on LinkedIn here: https://www.linkedin.com/pulse/predictability-predictably-hard-dan-zentgraf/

Ops Heroes are NOT Qualified to do Anything with Nothing

There is a certain “long-suffering and misunderstood” attitude that shows up a lot in Operations. I have seen this quote on a number of cube walls:

We the willing, 
led by the unknowing, 
are doing the impossible 
for the ungrateful. 

We have done so much, 
with so little, 
for so long, 
we are now qualified to do anything, 
with nothing. 

Note: This quote is often mistakenly attributed to Mother Teresa. It was actually from this other guy called Konstantin Josef Jireček that no one has heard of recently.

The problem, of course, is that this attitude is counter-productive in a DevOps world. It promotes the culture that operations will ‘get it done’ no matter how much is thrown their way in terms of budget cuts, shortened timeframes, uptime expectations, etc. It is a great and validating thing in some ways – you pulled off the impossible and get praise heaped on you. But it is really the root of the defective ‘hero culture’ behaviors that show up in tech companies or tech departments. And no matter how many times we write about the defectiveness of hero culture in a sustained enterprise, the behavior persists due to a variety of larger societal attitudes.

If you have seen (or perpetuated) such a culture, do not feel too bad – aspects of it show up in other disciplines, including medicine. There is a fascinating discussion of this – and the cultural resistance to changing the behaviors – in Atul Gawande’s book, The Checklist Manifesto. The book is one of my favorites of the last couple of years. It discusses the research Dr Gawande (yes – he is a surgeon himself) did on why the incidence of complications after surgery was so high relative to other high-criticality activities. For comparison he chose aviation – which is a massively complex and yet very precise, life-critical industry. It also has a far better record of incident-free activity relative to the more intimate and expertise-driven discipline of medicine. The book proceeds to look at the evolution of the cultures of both industries and how one developed a culture focused on the surgeon being omniscient and expert in all situations while the other created an institutional discipline that seeks to minimize human fallibility in tense situations.

He further looks into the incentives surgeons have – because they have a finite number of hours in the day – to crank through procedures as quickly as possible. That way they generate revenue and do not tie up scarce and expensive operating rooms. But surgeons really can only work so fast and procedures tend to take as long as they do for a given patient’s situation. Their profession is manual and primarily scales based on more people doing more work. Aviation exploits the fact that it deals with machines and has more potential for instrumentation and automation.

The analogy is not hard to make to IT Operations people having more and more things to administer in shorter downtime windows. IT Operations culture, unfortunately, has much more in common with medicine than it does with aviation. There are countless points in the book that you should think about the next time you are logged in with root or equivalent access and about to manually make a surgical change… What are you doing to avoid multitasking? What happens if you get distracted? What are you doing to leverage/create instrumentation – even something manual like a checklist – to ensure your success rate is better each time? What are you doing to ensure that what you are doing can be reproduced by the next person? It resonates…

The good news is that IT Operations as a discipline (despite its culture) deals with machines. That means it is MUCH easier to create tools and instrumentation that leverage expertise widely while at the same time improving the consistency with which tasks are performed. Even so, I have heard only a few folks mention the book at DevOps events and that is unfortunate, because the basic discipline of just creating good checklists – and the book discusses how – is a powerful thing that any shop, regardless of platform, toolchain, or history, can adopt immediately and readily benefit from. It is less inspirational and visionary than The Phoenix Project, but it is one of the most practical approaches to working toward that vision that exists.

The book is worth a read – no matter how DevOps-y your environment is or wants to be. I routinely recommend it to our junior team members as a way to help them learn to develop sustainable disciplines and habits. I have found this to be a powerful tool for managing overseas teams, too.

I would be interested in anyone’s feedback who is using checklist techniques – particularly as an enhancement / discipline roadmap in a DevOps shop. I have had some success wrapping automation and instrumentation (as well as figuring out how to prioritize where to add automation and instrumentation) by building checklists for things and would love to talk about it with others who are experimenting with it.
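As a rough illustration of what wrapping automation around a checklist can look like, here is a minimal sketch. It assumes each checklist item is either still manual or has an automated check attached. The `Step` and `run_checklist` names and the example steps are hypothetical and are not taken from the book or from any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One checklist item. `check` is None while the step is still manual."""
    description: str
    check: Optional[Callable[[], bool]] = None


def run_checklist(steps):
    """Walk the checklist in order and stop at the first failed automated check.

    Returns a list of (description, status) tuples, where status is
    "pass", "fail", or "manual". Manual steps are recorded, not verified.
    """
    results = []
    for step in steps:
        if step.check is None:
            results.append((step.description, "manual"))
            continue
        ok = step.check()
        results.append((step.description, "pass" if ok else "fail"))
        if not ok:
            break  # do not continue past a failed check
    return results


def automation_candidates(steps):
    """List the steps that are still manual, in checklist order."""
    return [step.description for step in steps if step.check is None]
```

Starting with a checklist where every step is manual and then attaching checks one at a time keeps the written procedure and the automation in the same place. `automation_candidates` is one simple way to see what is left to instrument.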

What would “The Matrix” look like now?

All of the recent talk about matrix organizations has gotten me thinking a bit about “The Matrix” – the movie…

That movie came out in March of 1999 with an R rating.  It was therefore targeting folks born in the early 1980s or earlier.  A demographic that grew up with the popular image of computers – at least very large ones – as having “green screen” interfaces.  Despite the proliferation of WIMP GUIs in the 90s, the classic terminal screen was still a common paradigm when discussing computers.  It is also useful to remember that the web was only a few years old in popular culture when this movie was being designed – which probably started in the 1996-1997 timeframe.  A time when 56K dial-up was the common smokin’-fast access to relatively simplistic web sites.

So, the iconic visualization of “The Matrix” as a big cascade of green characters makes a great deal of sense.

The Matrix image

Since then, we’ve had the explosion of high-speed connectivity to the home and the subsequent advent of rich media websites.  By the way, mobile phones are a LOT cooler than the ones Neo & company were using, too.  In fact, I daresay my iPhone is more powerful/capable than the computer Neo had on his desk…

But as I watched the movie again with my sons, I realized that they had never really seen a green screen terminal except in a movie.  Those glowing green letters were basically as relevant/real to them as a typewriter.  The question that came to me at that point was what visualization would make sense to them?  My unscientific survey resulted in these description ideas:

  • Google or Facebook
  • A cloud.  With lightning.  And maybe some colors.
  • iPad – there’d be an app for that…
  • Video game – the guys watching would be like watching Call of Duty or Battlefield.
  • Maybe an RTS like in Age of Empires

So, there you go.  Of course, by the time the inevitable remake/sequel/reboot comes along, I’m sure we’ll have even cooler paradigms.

DevOps is NOT a Job Title

Given my recent posts about organizational structure, I feel like I need to clarify my stance on this…

You know a topic is hot when recruiters start putting it in job titles.  I do believe that most organizations will end up with a team of “T-shaped people” focused on using DevOps techniques to ensure that systems can support an Agile business and its development processes.  However, I am not a fan of hanging DevOps on the title of everyone involved.

Here’s the thing, if you have to put it in the name to convince yourself or other people you are doing it, you probably are not.  And the very people you hope to attract may well avoid your organization because it fails the ‘reality’ test.  In other words, you end up looking like you don’t get it.  A couple of analogies come to mind immediately.

  • First, let’s look at a country that calls itself the “People’s Democratic Republic of” somewhere.  That is usually an indicator that it is not any of those modifiers and the only true statement is the ‘somewhere’ part.  Similarly, putting “DevOps Sysadmin” on top of a job description that, just last week, said “Sysadmin” really isn’t fooling anyone.
  • Second, hanging buzzwords on job titles is like a 16 year old painting racing stripes on the four door beater they got as their first car.  With latex house paint.  You may admire their enthusiasm and optimism.  You certainly wish them the best.  But you have a pretty realistic assessment of the car.

Instead, DevOps belongs down in the job description.  DevOps in a job role is a mindset and an approach used to define how established skills are applied.  You are looking for a Release Manager to apply DevOps methods in support of your web applications.  Put it down in the requirements bullet points just as you would put things like ‘familiar with scripting languages’, ‘used to operating in an [Agile/Lean/Scrum] environment’, or ‘experience supporting a SaaS infrastructure’.

I realize that I am tilting at windmills here.  We went through a spate of “Agile” Development Managers and the number of “Cloud” Sysadmins is just now tapering.  So, I guess it is DevOps’ turn.  To be sure, it is gratifying and validating to see such proof that DevOps is becoming a mainstream topic.  I should probably adopt a stance of ‘whatever spreads the gospel to the masses’.  But I really just had to get this rant off my chest after seeing a couple of serious “facepalm” job ads.

Agility Comes from Knowledge

One of the members at Agile Austin is fond of saying that ‘the only true source of Agility is knowledge’.  I think that is very true in a lot of situations.  The more you know and understand, the more adaptable you can be.  It might be a geek-spun buzzword version of the old aphorism that “knowledge is power”, but that doesn’t mean that it is in any way bad.  Indeed, old aphorisms, rephrased or not, stand the test of time because they speak to human nature.  For all of our technology, we’re still pretty much the same.

So, where does this cultural comment hit DevOps?  Many places, really, but today I am going to pick on the fact that Agile and DevOps require participants to know and understand more than what would have been present in their traditional job role.  It is no longer OK to just be the best coder or sysadmin or architect.  You have to maintain a much higher level of generalization to be effective in your specialized job role.

This can hurt people’s heads a bit.  Particularly in larger organizations where the message has been to specialize and be the best [technical role] that you can be.  The irony is that larger organizations tend to have large and complex application systems.  So, they have traditionally compensated by having teams of people who specialize in fitting things together across the specialties.  While this certainly works, there is often a lot of time spent on rework and polish to get the pieces to all fit together.  That also implies time, which directly impacts the responsiveness (agility) of the development organization to the business.

Now those organizations are faced with needing to retool their very culture (and the management structures entwined within it) to place some amount of value on generalization for their people.  That means deliberately encouraging staff to learn more and more about the “big picture” from all aspects – not just technical.  That means deliberately DIS-couraging isolationism in specific disciplines.  It also means deliberately blowing up organizational fiefdoms before they take hold.  And it means rewarding behaviors that focus on achieving the larger goals of the organization while rooting out incentives for very parochial behaviors.

The funny thing is that this is not new.  When I was first a manager, I worked for a company obsessed with this sort of thing.  We were very high on the notion of ‘lifetime learning’ and organizational development in general.  It was a way that the company encouraged/taught/focused people to aggressively adapt to the changes that came with fast growth.  That company returned more to its investors than any other tech startup I have seen in a long time.  We never worried about solving problems – we all understood a lot about the business and had a common understanding of how it worked.  It was easy and fast to get people working on a problem because we did not have to waste time bringing people ‘up to speed’.  We knew how it fit together and understood the value of proactively pushing it into newbies’ heads.  They wouldn’t be newbies for long, after all.

This month’s book club selection at Agile Austin is focusing on Peter Senge’s keystone work in this area – “The Fifth Discipline”.  I have not read it in a while, but it is damn good to hear people focusing on this stuff again.  I really liked a lot of the concepts in that book, probably because they are relatively timeless as they relate to human nature / behavior.