Managing To Get To Agile is Harder

There are a variety of jokes or snarky comments made about management.  A lot of them are modern echoes of Industrial era factory practices.  And that is the hard thing – most large corporate management doctrines are merely evolutions of principles established in the industrial era.  That era was characterized by hyper-specialization of jobs so that the companies could achieve economies of scale, from which competitive advantage could be derived.  That, of course, led to a whole system of policies, rules, and even laws that reflected that era.

Of course, competitive advantage from economies of scale is not a focus today.  Business moves too fast.  The key advantages are responsiveness to the changing market and the ability to exploit shorter-lived opportunities.  Even manufacturing processes have evolved to enable faster retooling to serve different markets with the same facility and equipment.  Agile development and DevOps are how the need for business agility gets reflected in IT.

Despite these market dynamics, the people management doctrine for most businesses still looks like an old-school industrial approach.  There is still a push to specialized roles and to do appraisals within a narrow set of rules for that specialty.  The generalization that is intrinsic to agile execution is not valued and corporate structures often limit what lower and middle level managers can do in terms of incenting the behaviors of folks on their teams.  A lot of that is based on the fact that the HR structures are designed to stay within a narrow band of safe and easily defended legal structures so that a pissed-off employee can’t really sue if there is a problem with perceived fairness.

These things are easier to achieve in smaller companies where there is not the same legacy and, frankly, there just isn’t as much sue-able money.  It is naive to not think about these aspects and would be patently unfair to simply slam things for being the way they are.  Things are the way they are for a lot of very good and very complex reasons; some of which are beyond the direct control of the business.

It is solvable, of course.  You can use creative organizational structures that put nominal specialists ‘on assignment’ in other specialty teams.  A popular extension of this is full-on matrix management.  A variation on the matrix is to have people farmed out to project teams in a similar manner to how consulting companies do things.

The common point of these solutions is that they require managers to work together in new ways.  They require managers of managers to encourage good team dynamics for their teams of managers.  They require a lot of communication and interaction among managers and with scattered teams.  Lower-level managers will have to be empowered to invest in and coach their teams into adaptable groups with good team dynamics.  It means the managers will have to be a lot more “hands-on” and act more like leaders than managers, which may not be what they are used to.

That is a lot harder for all levels of management than scientifically managing a group of theoretically interchangeable specialists.  The odds are that the managers have been trained far more in management skills than in leadership skills.  So, as the organization goes Agile, make sure that the investment includes an investment in how to actually manage in an Agile environment.  It really is different.  It really takes an investment.  And it really will eventually take structural change.

Agility Comes from Knowledge

One of the members at Agile Austin is fond of saying that ‘the only true source of Agility is knowledge’.  I think that is very true in a lot of situations.  The more you know and understand, the more adaptable you can be.  It might be a geek-spun buzzword version of the old aphorism that “knowledge is power”, but that doesn’t mean that it is in any way bad.  Indeed, old aphorisms, rephrased or not, stand the test of time because they speak to human nature.  For all of our technology, we’re still pretty much the same.

So, where does this cultural comment hit DevOps?  Many places, really, but today I am going to pick on the fact that Agile and DevOps require participants to know and understand more than what would have been present in their traditional job role.  It is no longer OK to just be the best coder or sysadmin or architect.  You have to maintain a much higher level of generalization to be effective in your specialized job role.

This can hurt people’s heads a bit.  Particularly in larger organizations where the message has been to specialize and be the best [technical role] that you can be.  The irony is that larger organizations tend to have large and complex application systems.  So, they have traditionally compensated by having teams of people who specialize in fitting things together across the specialties.  While this certainly works, there is often a lot of time spent on rework and polish to get the pieces to all fit together.  That also implies time, which directly impacts the responsiveness (agility) of the development organization to the business.

Now those organizations are faced with needing to retool their very culture (and the management structures entwined within it) to place some amount of value on generalization for their people.  That means deliberately encouraging staff to learn more and more about the “big picture” from all aspects – not just technical.  That means deliberately DIS-couraging isolationism in specific disciplines.  It also means deliberately blowing up organizational fiefdoms before they take hold.  And it means rewarding behaviors that focus on achieving the larger goals of the organization while rooting out incentives for very parochial behaviors.

The funny thing is that this is not new.  When I was first a manager, I worked for a company obsessed with this sort of thing.  We were very high on the notion of ‘lifetime learning’ and organizational development in general.  It was a way that the company encouraged/taught/focused people to aggressively adapt to the changes that came with fast growth.  That company returned more to its investors than any other tech startup I have seen in a long time.  We never worried about solving problems – we all understood a lot about the business and had a common understanding of how it worked.  It was easy and fast to get people working on a problem because we did not have to waste time bringing people ‘up to speed’.  We knew how it fit together and understood the value of proactively pushing it into newbies’ heads.  They wouldn’t be newbies for long, after all.

This month’s book club selection at Agile Austin is focusing on Peter Senge’s keystone work in this area – “The Fifth Discipline”.  I have not read it in a while, but it is damn good to hear people focusing on this stuff again.  I really liked a lot of the concepts in that book; probably because they are relatively timeless as they relate to human nature / behavior.

Change Mis-management (Part 3)

For part three of the Change Mis-management series, I want to pick on the tradition of NOT keeping system management scripts in version control.  This is a fascinating illustration of the cultural difference between Development and Operations.  Operations is obsessed with ensuring stability and yet tolerates fairly loose control over things that can decimate the environment at the full speed of whatever machine happens to be running the script.  Development is obsessed with making incremental changes to deliver value and would never tolerate such loose control over their code.  I have long speculated that this level of discipline for Development is in fact a product of the fact that they have to deal with and track a LOT of change.

Whatever the cause and whether or not you believe in Agile and/or the DevOps movement, this is really a fundamental misbehavior and we all know it.  There really is no excuse for not doing it.  Most shops have scripts that control substantial swaths of the infrastructure.  There are various application systems that depend on the scripts to ensure that they can run in a predictable way.  For all intents and purposes these scripts represent production-grade code.

This is hopefully not a complex problem to explain or solve.  The really sad part is that every software delivery shop of any size already has every tool needed to version manage all of their operations scripts.  There is no reason that there can’t be an Ops Scripts tree in your source control system.  Further, those repositories are often set up with rules that force some sort of notation for the changes that are being put into those scripts and will track who checked it in, so you have better auditing right out of the gate.

Further, you now have a way to, if not know, then at least have a good idea, what has been run on the systems.  That is particularly important if the person who ran the script is not available for some reason.  If your operations team can agree on the doctrine of always running the ‘blessed’ version and never hacking it on the filesystem, then life will get substantially better for everyone.  Of course, the script could be changed after checkout and the changes not logged.  Any process can be circumvented – most rather easily when you have root.  The point is to make such an event more of an anomaly.  Maybe even something noticeable – though I will talk about that in the next part of this series.

This is really just a common-sense thing that improves your overall organizational resilience.  Repeat after me:

  • I resolve to always check in my script changes.
  • I resolve to never run a script unless I have first checked it out from source to make sure I have the current version.
  • I resolve to never hack a script on the filesystem before I run it against a system someone other than me depends on.  (Testing is allowed before check-in; just like for developers)
  • I resolve to only run scripts of approved versions that I have pulled out of source control and left unmodified.

It is good, it is easy, it does not take significant time to do and saves countless time-consuming screw-ups.  Just do it.
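
As a rough illustration of the last resolution, here is a minimal Python sketch of a guard that refuses to run a script that has been hacked on the filesystem.  It assumes the digest of the ‘blessed’ version was recorded when the script was checked in; the function names (sha256_of, is_blessed, run_if_blessed) and the idea of passing in a runner are hypothetical, not something taken from any particular source control tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_blessed(script: Path, blessed_digest: str) -> bool:
    """True only if the on-disk script is byte-identical to the checked-in version.

    blessed_digest is assumed to have been recorded at check-in time.
    """
    return sha256_of(script) == blessed_digest


def run_if_blessed(script: Path, blessed_digest: str, runner) -> bool:
    """Call runner(script) only when the script is unmodified.

    Returns True if the script was run, False if it was refused.
    """
    if not is_blessed(script, blessed_digest):
        print(f"REFUSING to run {script}: it differs from the checked-in version")
        return False
    runner(script)
    return True
```

A wrapper like this does not stop anyone with root from going around it, but it does turn an unlogged edit into something that gets noticed instead of something that silently runs.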

Change Mis-management (Part 2)

In my last post, I mentioned three things that need to be reliably happening in order to achieve a faster, more predictable release process.  The first one was to unify change management for the system between the Ops and Development sides.  On the surface, this sounds like a straightforward thing.  After all, a tool rationalization exercise is almost routine in most shops.  It happens regularly due to budget reviews, company mergers or acquisitions, etc.

Of course, as we all know, it is never quite that easy even when the unification is happening for nominally identical teams.  For example, your company buys another and you have to merge the accounting system.  Pretty straightforward – money is money, accountants are accountants, right?  Those always go perfectly smoothly, right?  Right?

In fact, unifying this aspect of change management can be especially thorny because of the intrinsic differences in tradition between the organizations.  Even though complex modern systems evolve together as a whole, few sysadmins would see themselves as ‘developing’ the infrastructure, for example.  Additionally, there are other problems.  For instance, operations are frequently seen as service providers who need to provide SLAs for change requests that are submitted through the change management system.  And a lot of operational tracking in the ticketing system is just that – operational – and it does not relate to actual configuration changes or updates to the system itself.

The key to dealing with this is the word “change”.  Simplified, system changes should be treated in the same way as code changes are handled.  Whatever that might be.  For example, it could be a user story in the backlog.  The “user” might be a middleware patch that a new feature depends on and the work items might be around submitting tickets to progressively roll that up the environment chain into production.  The goal is to track needed changes to the system as first-class items in the development effort.  The non-change operational stuff will almost for sure stay in the ticketing system.  A simple example, but applying the principle will mean that the operating environment of a system evolves right along with its code – not as a retrofit or afterthought when it is time to deploy to a particular environment or there is a problem.

The tool part is conceptually easy – someone manages the changes in the same system where backlog/stories/work items are handled.  However, there is also the matter of the “someone” mentioned in that sentence.  An emerging pattern I have seen in several shops is to co-locate an Ops-type with the development team.  Whether these people are the ‘ops representative’ or ‘infrastructure developers’, their role is to focus on evolving the environment along with the code and ensuring that the path to production is in sync with how things are being developed.  This is usually a relatively senior person who can advise on things, know when to say no, and know when to push.  The real shift is that changes to the operating environment of an application become first-class citizens at the table when code or test changes are being discussed and they can now be tracked as part of the work that is required to deliver on an iteration.

These roles have started popping up in various places with interesting frequency.  To me, this is the next logical step in Agile evolution.  Having QA folks in the standups is accepted practice nowadays and organizations are figuring out that the Ops guys should be at the table as well.  This does a great job of proactively addressing a lot of the release / promotion headaches that slow things up as things move toward production.  Done right, this takes a lot of stress and time out of the overall Agile release cycle.

A Sports Analogy for DevOps Thinking

I have been known to go off a bit on how typical management culture defeats its own attempts to execute more quickly.  This is as much a cultural problem as a management problem, and a pretty common one.  Here is some perspective.  Football (American Football, that is) has an ineligible receiver rule.  The roles of the individual players, on offense especially, are so specialized that only certain people can receive a pass.  Seriously.  Then there are very specialized ‘position coaches’ who make sure that individual players focus on the subset of skills they need to perform their specific job.  There is also very little cross-training.  This works fine in the very iterative, assembly-line way the game is played.  Baseball is the same way – very specialized.  And both are quintessentially 20th century American games that grew during (and reflect) an industrial mindset.

However, business is a free-flowing process.  There are no ‘illegal formations’.  Some work.  Some don’t.  The action does not stop.  A better game analogy for releasing software (or running IT, or even the whole business) in the modern era would be Soccer (Football in the rest of the world).  The game constantly flows.  There are no codified rules about who passes the ball to whom.  The goalie is actually only special in that he can use his hands – when standing in his little area.  There is no rule that says it is illegal for him to come out of that area and participate as a regular player.  This occasionally happens in the course of elimination tournaments, in fact.

I draw this comparison to point out the relative agility of a soccer team to adapt to an ever-changing game flow.  Football teams only function when there is a very regulated flow of events and where there are a number of unrealistic throttles and limits on the number and types of events.  When you compare this to how most IT shops are set up, you find a lot of football teams and very few soccer teams.  And guess which environments are seen as more adaptable to the needs of their overall organizations…

Author’s note:  I picked soccer over hockey and basketball principally because the latter two sports rely heavily on rapid substitution and aggressive use of timeouts.  Those are luxuries that modern online business most certainly does not have.  Substitutions happen slowly in the enterprise and there darn sure are no timeouts.

“Enterprise” DevOps

Anyone in the IT industry today will note that much of the DevOps discussion is focused on small companies with large websites – often tech companies providing SaaS solutions, consumer web services, or some other solution content.  There is another set of large websites, supported by large technology organizations that have a need for DevOps.  These are large commerce sites for established retailers, banks, insurance companies, etc.  Many of these companies have had large-scale online presences and massive software delivery organizations behind them for well over a decade now.  Some of these enterprises would, in fact, qualify among the largest software companies on the planet, dwarfing much more ‘buzz-worthy’ startups.  It also turns out that they are pretty good at delivering to their online presence predictably and reliably – if not as agilely as they would like.

Addressing the agility challenge in an enterprise takes a different mindset than it does in a tech startup.  This has always been the case, of course.  Common sense dictates that solving a problem for 100 people is intrinsically different than for 10,000.  And yet so many discussions focus on something done for a ‘hot’ website or maybe a large ‘maverick’ team in a large organization.  And those maverick team solutions more often than not do not scale to the enterprise and have to be replaced.  Of course, that is rarely discussed or hyped.  They just sort of fade away.

It is not that these faded solutions are bad or wrong, either.  A lot of times, the issue is simply that they only looked at part of the problem and did not consider the impact of improving that part on the other parts of the organization.  In large synchronized systems, you can only successfully accelerate or decelerate if the whole context does so together.  There are many over-used analogies for this scenario, so let’s use the one about rowers on a boat to make this point and move on.

Let’s face it, large organizations can appear to be the “poster children” for silos in their organizational structure.  You have to remember, though, that those silos often exist because the organization learned hard lessons about the cost of NOT having someone maniacally focused on one narrow, specialized activity.  Think about this as your organization grows, runs into problems or failures, and puts infrastructure in place to make sure it doesn’t happen again.

One of the main points of this blog is going to be to look at the issues confronted by organizations that are or are becoming ‘enterprises’ and how they can balance the need for the Agile flexibility of DevOps with the pragmatic need to synchronize large numbers of people.