Category Archives: DevOps
Management, Leadership, Continuous Improvement, and DevOps
There is a management aphorism that “If you can’t measure it, you can’t manage it.” That gets attributed to Peter Drucker, though it is not actually what he said. What he said was “If you can’t measure it, you can’t IMPROVE it”. That’s an important difference as we talk about bringing DevOps and its related practices and disciplines to enterprises.
If you think about it, measuring with an intent to improve something is a much more challenging statement. Management of something is usually about keeping it within known parameters – maintaining a certain status quo. That is not to imply that Management is not valuable – it is absolutely crucial in maintaining a level of rigor to what is going on. But Improvement deliberately pressures the status quo to redefine the status quo at a new point. In fact, redefining a status quo to a better state sounds an awful lot like what we talk about in the DevOps movement.
Improvement always sounds very cool, but there is also an icky truth about Improvement – it is a relative thing. There are no easy answers for questions like:
- ‘What point are we improving to?’
- ‘How do we know when we have improved enough for a while in one area?’
- ‘What is the acceptable rate of progress toward the improved state?’
- and so on…
Those must be answered carefully and the answers must be related to each other. Answering those questions requires something different from Management. It requires Leadership to provide a vision. That brings us to another famous Drucker quote: “Management is doing things right; leadership is doing the right things.”
That quote is a sharp observation, but it does not really judge one as ‘better’ than the other. Leadership is exciting and tends to be much more inspirational at the human level. It therefore usually gets more attention in transitional efforts. However, without the balance of Management, Leadership may not be a sustainable thing in a business situation.
In terms of DevOps and the discipline of Continuous Improvement, the balance of these two things can be articulated with relative clarity. Leadership provides the answers for the hard questions. Management provides the rigor and discipline to maintain steady progress toward the new status quo defined by those answers. Put more simply, Leadership sets forth the goals and Management makes sure we get to those goals.
There is a certain bias in DevOps where we value Leadership – the desire to set and pursue improvement of our daily tech grind. Maybe that is because DevOps is an emergent area that requires a certain fortitude and focus on the doing the right things to get it started. And Leadership is certainly good for that. However, I also work with organizations where the well-intended, but unfocused, efforts of leadership-minded people lead to chaos. And those DevOps ‘transformations’ tend to flounder and even make things worse for the people involved. Which is not very DevOps at all.
I have seen enough of these that I have been spending time lately trying to organize my thoughts on the balance point. In the meantime, a piece of advice when you want to pursue a great idea / innovation – figure out how you want to answer the hard questions so you can make them stick in your organization and truly reap the benefit of that idea. Then, you can get on to the next one, and the next one, and the next one – to achieve the steady improvement of your status quo that is near the heart of DevOps culture.
This article is also on LinkedIn here: https://www.linkedin.com/pulse/management-leadership-continuous-improvement-devops-dan-zentgraf
Predictability is Predictably Hard
In order to successfully automate something, the pieces being automated have to be ‘predictable’. I use ‘predictable’ here – rather than ‘consistent’ – deliberately. A ‘predictable’ environment means you can anticipate its state and configuration. ‘Consistent’ gets misconstrued as ‘unchanging’, which is the opposite of what Agile software delivery is trying to achieve.
Consider deploying a fresh build of an application into a test environment. If you cannot predict what the build being deployed looks like and how the environment will be set up, why would you expect to reliably be able to get that build working in that environment in a predictable window of time? And yet, that is exactly what so many teams do.
The proposed solution is usually to automate the deployment. That, however, leads to its own problems if you do not address the predictability of the underlying stuff being automated. I talk to teams with stories about how they abandoned automation because it ‘slowed things down’ or ‘just did not work’. That leads teams to say, and in some cases believe, that their applications are ‘too complex to deploy automatically’.
At the heart of achieving predictability of the code packages and environments is the fact that they are different teams. Somehow it is harder to collaborate with the developers or operations team than it is to spend months attempting to build a mountain of hard to maintain deployment code. A mountain of code that stands a good chance of being abandoned, by the way. That represents months of wasted time, effort, and life because people working on the same application do not collaborate or cooperate.
And we get another example of why so many DevOps conversations become about culture rather than technology… Which really sucks, because that example is at the expense of a fair bit of pain from the real people on those teams.
The lesson here is that there is no skipping the hard work of establishing predictability in the packaging of the code and environments before charging into automating deployments. We are in an era now where really good packaging and configuration management tools are very mature.
And the next generation of tools that unifies the code and environment changes into immutable, deployable, and promotable artifacts is coming fast. But even with the all of these awesome tools, cross-disciplinary experts will have to come together to contribute to the creation of predictable versions of those artifacts.
The ‘C’ in CAMS stands for “Collaboration”. There are no shortcuts.
This article is also on LinkedIn here: https://www.linkedin.com/pulse/predictability-predictably-hard-dan-zentgraf/
Old Habits Make DevOps Transformation Hard
My father is a computer guy. Mainframes and all of the technologies that were cool a few decades ago. I have early memories of playing with fascinating electro-mechanical stuff at Dad’s office and its datacenter. Printers, plotters, and their last remaining card punch machine in a back corner. Crazy cool stuff for a kid if you have ever seen that gear in action. There’s all kinds of noise and things zipping around.
Now the interesting thing about talking to Dad is that he is seriously geeky about tech. Always fascinated by the future of how tech would be applied and he completely groks the principals and potentials of new technology even if he does not really get the specific implementations. Recently he had a problem printing from his iPhone. He had set it up a long time ago and it worked great. He’s 78 and didn’t bat an eye at connecting his newfangled mobile device to his printer. What was interesting was his behavior when the connection stopped working. He tried mightily to fix the connection definition rather than deleting the configuration and simply recreating it with the wizard. That got me thinking about “fix it” behavior and troubleshooting behavior in IT.
My dad, as an old IT guy, had long experience and training that you fix things when they got out of whack. You certainly didn’t expect to delete a printer definition back in the day – you would edit the file, you would test it, and you would fiddle with it until you got the thing working again. After all, you had just the relatively few pieces of equipment in the datacenter and offices. That makes no sense in a situation where you can simply blow the problematic thing away and let the software automatically recreate it.
And that made me think about DevOps transformations in the enterprise.
I run into so many IT shops where people far younger than my dad struggle mightily to troubleshoot and fix things that could (or should) be easily recreated. To be fair – some troubleshooting is valuable and educational, but a lot is over routine stuff that is either well known, industry standard, or just plain basic. Why isn’t that stuff in an automated configuration management system? Or a VM snapshot? Or a container? Heck – why isn’t it in the Wiki, at least?! And the funny thing is that these shops are using virtualization and cloud technologies already, but treat the virtual artifacts the same way as they did the long-lasting, physical equipment-centric setups of generations past. And that is why so many DevOps conversations come back to culture. Or perhaps ‘habit’ is a better term in this case.
Breaking habits is hard, but we must if we are to move forward. When the old ways do not work for a retired IT guy, you really have to think about why anyone still believes they work in a current technology environment.
This article is on LinkedIn here: https://www.linkedin.com/pulse/old-habits-make-devops-transformation-hard-dan-zentgraf
Your Deployment Doc Might Not be Useful for DevOps
One of the most common mistakes I see people making with automation is the assumption that they can simply wrap scripts around what they are doing today and be ‘automated’. The assumption is based around some phenomenally detailed runbook or ‘deployment document’ that has every command that must be executed. In ‘perfect’ sequence. And usually in a nice bold font. It was what they used for their last quarterly release – you know, the one two months ago? It is also serving as the template for their next quarterly release…
It’s not that these documents are bad or not useful. They are actually great guideposts and starting points for deriving a good automated solution to releasing software in their environment. However, you have to remember that these are the same documents that are used to guide late night, all hands, ‘war room’ deployments. The idea that their documented procedures are repeatablly automate-able is suspect, at best, based on that observation alone.
Deployment documents break down as an automate-able template for a number of reasons. First, there are almost always some number of undocumented assumptions about the state of the environment before a release starts. Second, using the last one does not account for procedural, parameter, or other changes between the prior and the upcomming releases. Third, the processes usually unconsciously rely on interpretation or tribal knowledge on the part of the person executing the steps. Finally, there is the problem that steps that make sense in a sequential, manual process will not take advantage of the intrinsic benefits of automation, such as parallel execution, elimination of data entry tasks, and so on.
The solution is to never set the expectation – particularly to those with organizational power – that the document is only a starting point. Build the automation iteratively and schedule multiple iterations at the start of the effort. This can be a great way to introduce Agile practices into the traditionally waterfall approaches used in operations-centric environments. This approach allows for the effort that will be required to fill in gaps in the document’s approach, negotiate standard packaging and tracking of deploy-able artifacts, add environment ‘config drift’ checks, or any of the other common ‘pitfall’ items that require more structure in an automated context.
This article is also on LinkedIn here: https://www.linkedin.com/pulse/your-deployment-doc-might-useful-devops-dan-zentgraf
Automation and the Definition of Done
“Done” is one of the more powerful concepts in human endeavor. Knowing that something is “done“ enables us to move on to the next endeavor, allows us to claim compensation, and sends a signal to others that they can begin working with whatever we have produced. However, assessing done can be contentious – particularly where the criteria are undefined. This is why the ‘definition of done’ is a major topic in software delivery. Software development has a creative component that can lead to confusion and even conflict. It is not a trivial matter.
Automation in the software delivery process forces the team to create a clear set of completion criteria early in the effort, thus reducing uncertainty around when things are ‘done’ as well as what happens next. Though they at first appear to be opposites, with done defining a stopping point and automation being much more about motion, the link between ‘done’ and automation is synergistic. Being good at one makes the team better at the other and vice-versa. Being good at both accelerates and improves the team’s overall capability to deliver software.
A more obvious example of the power of “done” appears in the Agile community. For example, Agile teams often have a doctrine of ‘test driven development’ where developers should write the tests first. Further examples include the procedural concepts for completing the User Stories in each iteration, or sprint, so that the team can clearly assess completion in the retrospective. Independent of these examples, validation-centric scenarios are an obvious area where automation can help underpin “done”. In the ‘test-driven development’ example, test suites that run at various points provide unambiguous feedback over whether more work is required. Those test suites become part of the Continuous Integration (CI) job so that every time a developer commits new code. If those pass, then the build automatically deploys into the integration environment for further evaluation.
Looking a bit deeper at the simple process of automatically testing CI builds reveals how automation forces a more mature understanding of “done”. Framed another way, the fact that the team has decided to have that automated assessment means that they have implicitly agreed to a set of specific criteria for assessing their ‘done-ness’. That is a major step for any group and evidence of significant maturation of the overall team.
That step of maturation is crucial, as it enables better flows across the entire lifecycle. For example, understanding how to map ‘done-ness’ into automated assessment is what enables advanced delivery methodologies such as Continuous Delivery. Realistically, any self-service process, whether triggered deliberately by a button push or autonomously by an event, such as delivering code, cannot exist without a clear, easily communicated, understanding of when that process is complete and how successful it was. No one would trust the automation were it otherwise.
There is an intrinsic link between “Done” and automation. They are mutual enablers. Done is made clearer, easier and faster by automation. Automation, in turn, forces a clear definition of what it means to be complete, or ‘done’. The better the software delivery team is at one, the stronger that team is at the other.
This article is also on LinkedIn here: https://www.linkedin.com/pulse/automation-definition-done-dan-zentgraf
Lean Software Development and Automation
Over the past decade or more, the interest in Agile and Lean topics has grown substantially as businesses have come to see software as a key part of their brand, their customer experience, and their competitive advantage. Understanding the principles at a basic level is relatively straightforward – a testament to how well they are articulated and thought out – however, executing on them can be difficult. One of the key tools in successfully executing around the vision of Lean is exploiting the power of automation. A frequent source of confusion is that automation itself is not a goal – rather, it is a very powerful means to achieving the goal. To clarify the point of automation helper rather than end, this post will look at the Principles of Lean Software Development as defined by Tom and Mary Poppendieck in their seminal work “Lean Software Development: An Agile Toolkit” (2003) and some ways automation enables them.
Principles of Lean Software Development
1- Principle: Eliminate waste
Tracing its way all the way back to the core of Lean manufacturing, this principle is about eliminating unnecessary work, such as fixes, infrequently used features, low-priority requirements, etc. that does not translate to actual customer value. In many ways, this principle underpins all of the others as guiding context for why they are valuable and important in their own right.
Automation carries both direct and indirect value for eliminating waste. In direct terms, it simply cuts cycle times by doing things faster relative to doing them manually. The indirect value is that speed enables the shorter feedback loops and the amplified learning that allow the team to make better decisions faster. Between the direct and indirect value, it is easy to see why there is so much focus on automation among the Lean, Agile and DevOps movements – it is at the core of waste elimination.
2 – Principle: Build Quality In
The notion of ‘building quality in’ deals with the point that it is fundamentally more efficient and cost-effective to build good code from the beginning than to try to ‘test quality in’ later. Testing late in the cycle, even though seen as a norm for years, is actually devastating to software projects. For starters, the churn of constantly fixing things is waste. Further, the chances of introducing all new problems (and thus extending expensive test cycles) increases with each cycle. Finally, the impact to schedules and feature work can be very damaging to customer value. There are numerous other problems as well.
The developers building the system need to have the proper facilities for ensuring that the code they have written actually meets the standard. Otherwise, they are relying on downstream tests to put the quality in and have thus violated this principle. Since the developers are human and their manual work, including any manual testing they might do, is therefore relatively slow error prone, automation is the only practical answer for ensuring they can validate their work while they are still working on it. This takes the form of techniques like Continuous Integration, automated tests during build cycles, and automated test suites for the integrated system once the changes have passed their unit tests. Automation provides the speed and consistency required to operate with such a high level of discipline and quality of work.
3 – Principle: Create Knowledge
This principle, sometimes written as “Amplify Learning”, addresses the point that the act of building something teaches everyone involved new ways of looking at both the original problem as well as what the best solution would be. Classic ‘omniscient specification’ at the beginning of a project carries the bizarre assumption that the specifier knows and understands all aspects of both problem and solution before writing the first word of the specification. This is obviously very unlikely at best. Lean and Agile address how the team quickly and continuously seeks out this learning, distributes the knowledge to all stakeholders, and takes action to adjust activities based on the new understanding. This behavior is one of the core maxims that really delivers the ‘agility’ in Agile.
Automation, as we have seen, provides speed and consistency that is not otherwise available. These capabilities serve to create knowledge by enabling the faster, easier collection of data. The data might be technical, in the form of the test results mentioned above as part of “Build Quality In”. A more advanced scenario might be more frequent value assessment – achieved by giving the business owners an easy facility for seeing a completed, or nearly completed, feature sooner – in order to validate the implementation before it is final. Even more advanced variants involve techniques such as “Canary deployments” or A/B testing – in which a limited audience of live customers receives early versions of features in order to analyze their response.
4 – Principle: Defer Commitment
The Defer Commitment principle addresses the point that teams would not want to take a design direction that they later learn was a fundamental ‘dead-end’. This principle is a response to the impact of knowledge creation (Principle #3 above). By delaying decisions that are hard to reverse, the team can dramatically reduce the risk of hitting a ‘dead end’ that might cause expensive rework or even kill the project.
Automation as applied to this principle also reflects the tight relationship to “Create Knowledge“. By exploiting the ability to collect more knowledge faster, and with a more complete context, teams can ensure they have the most thorough set of information possible before making a hard-to-reverse decision. Fast cycle time can also enable experimental scenarios that would not otherwise be possible. Promising architectural ideas can be prototyped in a more realistic running state because it is not too hard or time consuming to do so. This opens the team up to new and potentially better solutions, which would otherwise might have been too risky. Whether about a particular feature, architectural point, or design element, automation enables the team to ensure that it has real data from many deployment and test cycles before committing.
5 – Principle: Deliver Fast
The principle of delivering fast uses the fact that short cycle times mean less waste, more customer feedback, and more learning opportunities. A fast cycle will generally have less waste because there will be less wait time and less unfinished work at any given point in time. The ability to deliver quickly also means that more frequent customer feedback, which, as we have discussed, reduces waste and risk while increasing knowledge. Finally, delivering quickly will cause the team to focus on finishing a smaller number of features for each delivery rather than leaving many undone for a long period.
As has been described, speed is a common byproduct of automation. Getting working code into the hands of stakeholders is a key part of every approach to enabling business agility. In the case of applying automation’s speed to this principle, however, it is better to think in terms of frequency. Speed and frequency are closely related factors, of course – a long cycle time implies less frequent delivery and vice-versa. The point is simply that without automation, frequency will always be much lower. That means less feedback, less learning, and less knowledge for the team to use.
6 – Principle: Respect People
Beyond the obvious human aspects of this point, this principle is actually very pragmatic. The people closest to the work are the ones who know it best. They are best equipped to identify and solve challenges as they come up. Squashing their initiative to do so will diminish the team’s effectiveness and, through missed opportunities over time, the cost-effectiveness and value of the software itself.
In the previous principles, there are numerous statements about giving new capabilities to the team. This principle deals with how automation empowers the members of the team. Indeed, the alternate expression of this principle is “Empower the Team”. That phrasing gets to the crux of how automation does or does not respect the people. Automation itself cannot show respect, but how it is deployed most certainly can. For example, the contrast between a self-service facility that anyone on the team can use at any time and a similar facility for which individuals must ask permission each time they use it speaks volumes about the respect the organization has for the team’s professionalism. It will also drive behavior and self-discipline as the team matures. Consider how the practice in high-maturity Continuous Delivery scenarios has a direct, automatic path from check-in to production while so many shops still require multiple sign-offs. Which team is more likely to be effective, innovative, and efficient?
7 – Principle: Optimize the whole
This principle really focuses on how all of the principles are interrelated across the whole lifecycle. Other management theories, such as the “Theory of Constraints” address this with statements such as ‘you can never be faster than the slowest step’. This principle continues the theme of continuous learning and adjustment that pervades Lean thinking. It deals with the fact that in order to take time and waste out of a system, you need to understand its goal and then continuously and deliberately eliminate the root cause of the largest bottlenecks that prevent the most efficient realization of that goal.
The core of this principle is to optimize delivery of value to the customer – effectively starting with the value and working backward to the start of the process. Automation as a tool in that effort makes optimization substantially easier. When starting with a process that has manual steps, the very act of automating a process is an optimization by itself. Until the entire process flows from end to end with automation, the manual phases will be the more obvious bottlenecks. Then, once the automation spans the whole flow, the automation itself generates metrics for further improvement in cycle time and efficiency in pursuit of delivering value to the customer.
That optimization effort of the last principle takes the discussion somewhat circularly back to the first principle, which is to eliminate waste. That is quite appropriate. Given how interrelated all of these principles are, the discussion of the contributions of automation to them should be similarly interrelated.
This post is also on LinkedIn here: https://www.linkedin.com/pulse/lean-software-development-automation-dan-zentgraf
Grow a (Delivery) Backbone
Feature delivery in a DevOps/Continuous Delivery world is about moving small batches of changes from business idea through to production as quickly and frequently as possible. In a lot of enterprises, however, this proves rather problematic. Most enterprises organize their teams more around functional teams rather than value delivery channels. That leaves the business owners or project managers with the quandary of how to shepherd their changes through a mixture of different functional teams in order to get releases done. That shepherding effort means negotiating for things like schedules, resources, and capacity for each release– all of which take time that simply does not exist when using a DevOps approach. That means that the first step in transforming the release process is abandoning the amoebic flexibility of continuous negotiations in favor of something with more consistency and structure –a backbone for continuously delivering changes.
The problem, of course, is where and how to start building the backbone. Companies are highly integrated entities: disrupting one functional area tends to have a follow-on effect to other functions. For example, dedicating resources from shared teams to a specific effort implies backfilling those resources to the other efforts those resources previously supported. That takes time and money that are better spent. This is why automation very quickly comes to the forefront of any DevOps discussion as the driver of the flow. Automation is relatively easy to stand up in parallel to existing processes, it lends itself to incremental enhancement, and it has the intrinsic ability of multiplying the efforts of a relatively small number of people. It is relatively easy to prove the ROI of automating business processes; which, really, is why most companies got computers in the first place.
So, if automation is the backbone of a software delivery flow, how do you get to an automated flow? Most organizations follow three basic stages as they add automation to their software delivery flow.
- Continuous Integration to standardized artifacts (bonus points for a Repository)
- Automated Configuration Management (bonus for Provisioning)
Once a business has decided that it needs more features, sooner from a software development organization, the development teams will look at Agile practices. Among these practices is Continuous Integration (supporting the principle from the Agile manifesto that “working software is the primary measure of progress”). This will usually take the form of a relatively frequent build process that produces a consumable form of the code. The frequency of delivery creates a need for consistency in the artifacts produced by the build and a need to track the version progression of those artifacts. This can take the form of naming conventions, directory structures, etc., but eventually will lead to a binary artifact repository.
The availability of frequent updates stimulates a demand for more deployments to more consistent test environments and even to production. The difficulties driven by traditional test lab Configuration Management practices, the drift between test labs and production, and the friction pushing back on iterations causes Development to suddenly become a lot more interested in how their code actually gets into the hands of consumers. The sudden attention and spike in workload causes Operations to suddenly become very interested in what Development is doing. This is the classic case for why DevOps became such a topic in IT. The now classic solution for solving this immediate friction point is to use an automated configuration management solution to provide consistent, high-speed enforcement of a known configuration into all environments where the code will run. These systems are highly flexible, model and script-based and enable environment changes to be versioned in a way that is very consistent with how developed code is versioned. The configuration management effort will usually evolve to include integration to provisioning systems that enable rapid creation, deletion, or refresh of entire environments. Automating Configuration Management, then, becomes the second piece of the delivery backbone; however, environmental sprawl and lack of feedback loops quickly show the need for another level.
The third stage of growing a delivery backbone is Orchestration. It deals with the fact that there are so many pieces to modern application systems that it is inefficient and impractical for humans to track and manage them all in every environment to which those pieces might be deployed. Orchestration systems deal with deliberate sequencing of activities, integration with human workflow, and management of the pipeline of changes in large, complex environments. Tools that deal with this level of activity address the fundamental fact that, even in the face of high-speed automated capabilities, some things have to happen in a particular order. This can be due to technical requirements, but it is often due to fundamental business requirements: coordinating feature improvements with consuming user communities, coordinating with a marketing campaign, or the dreaded regulatory factors. Independent of the reason or need, Orchestration tools allow teams to answer questions of ‘what is in which environment’, ‘are the dependent pieces set’, and ‘what is the actual state of things over there’. In other words, as the delivery backbone grows, Orchestration delivers the precise control over the backbone and enables the organization to capture the full value of the automation.
These three stages of backbone growth, perhaps better termed ‘evolution’ may take slightly different forms in different organizations. However, they consistently appear as a natural progression as DevOps thinking becomes pervasive in organizations and those organizations seek to mature and optimize their delivery flows. Organizations seeking to accelerate their DevOps adoption can use variations on this pattern to deliberately evolve or accelerate the growth of their own delivery backbone.
This article is also on LinkedIn here: https://www.linkedin.com/pulse/grow-delivery-backbone-dan-zentgraf
The “Other” Value Levers of Automation – Part 4 – Traceability
Computers are far better at keeping records than humans and good logs are a crucial part of getting value from automating anything. Sometimes this aspect of automation gets lumped in with logging, but there is a difference between recording events and providing traceability. Both have value – the history of what happens in a system is important for a range of reasons ranging from the reactive to the proactive. On the reactive end, this record provides root cause analysis – an understanding of who did what and what happened. As things shift toward the proactive end of things, the valuable information can be used to trace how well an automated process is working, identify how it is evolving, and where it can be improved.
Starting at the basic end of the automation traceability spectrum it the simple concept of access and event logs. The very word ‘traceability’ often calls to mind the idea of auditors, investigators, and even inquisitors seeking to answer the question ‘who did what and when did they do it?’ In some organizations this is a very critical part of the business and is a valuable part of automation because it makes answering this question much easier. There is no time lost by having staff dredge up records and history. The logs are available to be turned into reports by anyone who might be interested. The productivity saved by letting people get on with their work while ensuring that those whose work it is to ensure the business meets its regulatory requirements can also get on with theirs. It actually is a true win-win, even if it is an awkward topic at times.
The other great reactive value lever of traceability in an automated environment is that it eases root cause analysis when problems occur. No system is perfect and they will always break down. The automation may even work perfectly, but still let an unforeseen problem escape into production. Good records of what happened facilitate root cause analysis. That saves time and trouble as engineers seek to figure out how to fix the problem at hand and are then tasked with making sure that the problem can never happen again. With good traceability, both sides of the task are less costly and time-consuming. Additionally, the resultant fix is more likely to be effective because there is more and better information available to create it.
Closely related to using traceability for root cause analysis and fixes is the notion of ensuring the automated process’ own health. Is there something going on with the process that could cause it to break down? This is much like a driver noticing that their car is making a new squeaking noise and taking it to the mechanic before major damage is done. The benefit of catching a potential problem early is, of course, that it can be dealt with before it causes an unplanned, costly disruption.
The fourth way that traceability makes automation valuable is that it provides the data required to perform continuous improvement. This notion is about being able to use the data produced by the automation to make something that is working well work better. While ‘better’ may have many definitions depending on the particular context or circumstance being discussed, there can be no structured way of achieving ANY definition of ‘better’ without being able to look at consistent data. And there are few better ways to get consistent data than to have it produced automatically as part of the process on which it is reporting.
Reaching the more proactive end of this spectrum requires time and a consistent effort to mature the tools, automations, and organization. However, traceability of automation builds on itself and is, in fact, the one of the three levers discussed in these posts that has the potential to build progressively more value the longer it is in use with no clear upper limit. That ability to return progressive value makes it worth the patience and discipline required.
The “Other” Value Levers of Automation – Part 3 – Everybody is Empowered
Implicit in DevOps automation is the idea that the decision to make technical changes should be delegated to non-experts in the first place. Sure, automation can make an expert more productive, but as I discussed in my last post, the more people who can leverage the automation, the more valuable the automation is. So, the next question is how to effectively delegate the automation so that the largest number of people can leverage it – without breaking things and making others non-productive as a result.
This is a non-trivial undertaking that becomes progressively more complex with the size of the organization and the number of application systems involved. For bonus points, some industries are externally mandated to maintain a separation of duties among the people working on a system. There needs to be a mechanism through which a user can execute an automated process with higher authority than they normally have. Those elevated rights need to last only for time when that execution is running and limit the ability to affect the environment to a scope that is appropriate. Look at it this way – continuous delivery to production does not imply giving root to every developer on the team so they can push code. There are limits imposed by what I call a ‘credentials proxy’.
A credentials proxy is simply a mechanism that allows a user to execute a process with privileges that are different, and typically greater than, those they normally have. The classic model for this is the 1986 wonder tool _sudo_. It provides a way for a sysadmin to grant permissions to a user or group of users that enable them to run specific commands as some other user (note – please remember that user does NOT have to be root!!). While sudo’s single system focus makes it a poor direct solution for modern, highly distributed environments, the rules that sudo can model are wonderfully instructive. It’s even pretty smart about handling password changes on the ‘higher-level’ account.
Nearly every delivery automation framework has some notion of this concept. Certainly it is nothing new in the IT automation space – distributed orchestrators have had some notion of ‘execute these commands on those target systems as this account’ for just about as long as those tools have existed. There are even some that simply rely on the remote systems to have sudo… As with most things DevOps, the actual implementation is less important than the broader concept.
The key thing is to have an easily managed way to control things. Custom sudo rules on 500 remote systems is probably not an approach that is going to scale. The solution needs to have 3 things. First, a way to securely store the higher permission accounts. Do not skimp here – this is a potential security problem. Next, it needs to be able to authenticate the user making the request. Finally, it needs to have a rules system for mapping the requestors to the automations that they are allowed to execute – whatever form they may take.
Once the mechanics of the approach are handled and understood, the management doctrine can be established and fine tuned. The matrix of requesters and automations will grow over time, so all of the typical system issues of user groups and permissions will come into play. The simpler this is, the better off the whole team will be. That said, it needs to be sophisticated enough to enable managers, some of whom may be very invested in expertise silos, to understand that the system is sufficiently controlled to allow the non-experts to do what they need to do. Which is the whole idea of empowering the team members in the first place – give the team what they need and let them do their work.