How Fast Should You Change the Tires?

I am an unabashed car nut and like to watch a variety of motor racing series. In particular I tend to stay focused on Formula 1 with a secondary interest in the endurance series (e.g Le Mans). In watching several races recently, I observed that the differences in how each series managed tire changes during pit stops carried some interesting analogies to deploying software quickly.

Each racing series has a different set of rules and limitations with regard to how pit stops may be conducted. These rules are imposed for a combination of safety reasons, competitive factors, and the overall viability of the racing series. There are even rules about changing tires. Some series enable very quick tire changes – others less so. The reasons behind these differences and how they are applied by race teams in tight, time competitive situations can teach us lessons about the haste we should or should NOT have when deploying software.

Why tire changes? The main reason is that, like deploying software, there are multiple potential points of change (4 tires on the car – software, data, systems, network with the software). And, in both situations, it is less important how fast you can change just one of them than how fast you change all of them. There is even the variants where you may not need to change all 4 tires (or system components) every time, but you must be precise in your changes.

Formula 1

Formula 1 is a fantastically expensive racing series and features extreme everything – including the fastest pit stops in the business. Sub 4-second stops are the norm, during which all 4 tires are changed. There are usually around18 people working on the car – 12 of whom are involved in getting the old tires off and clear while putting new tires on (not counting another 2 to work the jacks). That is a large team, with a lot of expensive people on it, who invest a LOT of expensive time practicing to ensure that they can get all 4 tires changed in a ridiculously short period of time. And they have to do it for two cars with potentially different tire use strategies, do it safely, while competing in a sport that measures advantage in thousandths of a second.

But, there is a reason for this extreme focus / investment in tire changes. The tire changes are the most complex piece of work being done on the car during a standard pit stop. Unlike other racing series, there is no refueling in Formula 1 – the cars must have the range to go the full race distance. In fact, the races are distance and time limited, so the components on the cars are simply engineered to go that distance without requiring service, and therefore time, during the race. There are not even windows to wash – it is an open cockpit car. So, the tires are THE critical labor happening during the pit stop and the teams invest accordingly.

Endurance (Le Mans)

In contrast to the hectic pace of a Formula 1 tire change is Endurance racing. These are cars that are built to take the abuse of racing for 24 hours straight. These cars require a lot of service over the course of that sort of race and the tires are therefore only one of several critical points that have to be serviced in the course of a race. Endurance racers have to be fueled, have brake components replaced, and the three drivers have to switch out periodically so they can rest. The rules of this series, in fact limit the number of tire wrenches the team can use in the pits to just one. That is done to discourage teams from cutting corners and also to keep team size (and therefore costs) down.

NASCAR

NASCAR is somewhere between Formula 1 and Endurance racing when it comes to tire changes. This series limits tire wrenches to two and tightly regulates the number of people working on the car during a pit stop. These cars require fuel, clean-up, and tires just like the Endurance cars, but generally do not require any additional maintenance during a race, barring damage. So, while changing tires quickly is important, there are other time eating activities going on as well.

Interestingly, in addition to safety considerations, NASCAR limits personnel to keep costs down to help the teams competing in the series afford the costs of doing so. That keeps the overall series competition healthy by ensuring a good number of participants and the ability of new teams to enter. Which, to contrast, is one of the problems that Formula 1 has had over the years.

In comparing the three approaches to the same activity, you see an emerging pattern where ultimate speed of changing tires gets traded based on cost and contextual criticality. These are the same trade-offs that are made in a business when it looks at how much faster it can perform a regular process such as deploy software. You could decide you want sub-four second tire changes, but that would be dumb if your business needs 10 seconds for refueling or several minutes for driver swaps and brake overhauls. And if they do, your four second tire change would look wasteful at best as your army of tire guys stands around and watches the guy fueling the car or the new driver adjusting his safety harnesses.

The message here is simple – understand what your business needs when it comes to deployment. Take the thrill of speed out of it and make an unemotional decision to optimize; knowing that optimal is contextually fastest without waste. Organizations that literally make their living from speed undestand this. You should consider this the next time you go looking to do something faster.

Advertisements

Rollback Addiction

A lot of teams are fascinated with the notion of ‘rollback’ in their environments.  They seem to be addicted to its seductive conceptual simplicity.  It does, of course, have valid uses, but, like a lot of things, it can become a self-destructive dependency if it is abused.  So, let’s take a look at some of the addictive properties of rollback and how to tell if you might have a problem.

Addictive property #1 – everybody is doing it.  One of the first things we learn in technology (Dev or Ops) is to keep a backup of a file when we change something.  That way, we know what we did in case it causes a problem.  In other words, our first one is free and was given to us by our earliest technology mentors.  As a result, everyone knows what it is.  It is familiar and socially acceptable.  We learn it while we are so professionally young that it becomes a “comfort food”.  The problem is that it is so pervasive that people do it without noticing.  They will revert things without giving it a second thought because it is a “good thing”.  And this behavior is a common reason rollback does not scale.  In a large system, where many people might be making changes, others might make changes based on your changes.  So, undoing yours without understanding others’ dependencies, means that you are breaking other things in an attempt to fix one thing.  If you are in a large environment and changes “backward” are not handled with the exact same rigor as changes forward, you might have a problem.

Addictive property #2 – it makes you feel good and safe.  The idea that “you are only as good as your last backup” is pretty pervasive.  So, the ability to roll something back to a ‘known good’ state gives you that warm, fuzzy feeling that it’s all OK.  Unfortunately, in large scale situations with any significant architectural complexity, it is probably not OK.  Some dependency is almost certainly unknown, overlooked or assumed to be handled manually.  That will lead to all sorts of “not OK” when you try to roll back.  If rollback is the default contingency plan for every change you make and you don’t systematically look at other options to make sure it is the right answer, you might have a problem.

Addictive property #3 – It is easy to sell.  Management does not understand the complexity of what is required to implement a set of changes, but they do understand “Undo”.  As a result, it is trivial to convince them that everything is handled and, if there should be a problem with a change, you can just ‘back it out’.  Being able to simplify the risks to an ‘undo’ type of concept can eliminate an important checkpoint from the process.  Management falls into the all to human behavior of assuming there is an ‘undo’ for everything and stops questioning the risk management plan because they think it is structurally covered.  This leads to all sorts of ugliness should there be a problem and the expectation of an easy back-out is not met.  Does your team deliberately check for oversights its contingency plan every time or does it assume that it will just ‘roll it back’?  If the latter, you might have a problem.

As usual, the fix for a lot of this is self-discipline.  The discipline to do things the hard and thorough way that takes just that little bit longer.  The discipline to institutionalize and reward healthy behaviors in your shop.  And, as usual, that goes against a fair bit of human nature that can be very difficult to overcome, indeed.

What would “The Matrix” look like now?

All of the recent talk about matrix organizations has gotten me thinking a bit about “The Matrix” – the movie…

That movie came out in March of 1999 with an R rating.  It was therefore targeting folks born in the early 1980s or earlier.  A demographic that grew up with the popular image of computers – at least very large ones – as having “green screen” interfaces.  Despite the proliferation of WIMP GUIs in the 90s, the classic terminal screen was still a common paradigm when discussing computers.  It is also useful to remember that the web was only a few years old in popular culture when this movie was being designed – which probably started in the 1996-1997 timeframe.  A time when 56K dial-up was the common smokin’-fast access to relatively simplistic web sites.

So, the iconic visualization of “The Matrix” as a big cascade of green characters makes a great deal of sense.

The Matrix image

Since then, we’ve had the explosion of high-speed connectivity to the home and the subsequent advent of rich media websites.  By the way, mobile phones are a LOT cooler than the ones Neo & company were using, too.  In fact, I daresay my iPhone is more powerful/capable than the computer Neo had on his desk…

But as I watched the movie again with my sons, I realized that they had never really seen a green screen terminal except in a movie.  Those glowing green letters were basically as relevant/real to them as a typewriter.  The question that came to me at that point was what visualization would make sense to them?  My unscientific survey resulted in these description ideas:

  • Google or Facebook
  • A cloud.  With lightening.  And maybe some colors.
  • iPad – there’d be an app for that…
  • Video game – the guys watching would be like watching Call of Duty or Battlefield.
  • Maybe an RTS like in Age of Empires

So, there you go.  Of course, by the time the inevitable remake/sequel/reboot comes along, I’m sure we’ll have even cooler paradigms.

DevOps is NOT a Job Title

Given my recent posts about organizational structure, I feel like I need to clarify my stance on this…

You know a topic is hot when recruiters start putting it in job titles.  I do believe that most organizations will end up with a team of “T-shaped people” focused on using DevOps techniques to ensure that systems can be support an Agile business and its development processes.  However, I am not a fan of hanging DevOps on the title of everyone involved.

Here’s the thing, if you have to put it in the name to convince yourself or other people you are doing it, you probably are not.  And the very people you hope to attract may well avoid your organization because it fails the ‘reality’ test.  In other words, you end up looking like you don’t get it.  A couple of analogies come to mind immediately.

  • First, let’s look at a country that calls itself the “People’s Democratic Republic of” somewhere.  That is usually an indicator that it is not any of those modifiers and the only true statement is the ‘somewhere’ part.  Similarly, putting “DevOps Sysadmin” on top of a job description that, just last week, said “Sysadmin” really isn’t fooling anyone.
  • Second, hanging buzzwords on job titles is like a 16 year old painting racing stripes on the four door beater they got as their first car.  With latex house paint.  You may admire their enthusiasm and optimism.  You certainly wish them the best.  But you have a pretty realistic assessment of the car.

Instead, DevOps belongs down in the job description.  DevOps in a job role is a mindset and an approach used to define how established skills are applied.  You are looking for a Release Manager to apply DevOps methods in support of your web applications.  Put it down in the requirements bullet points just as you would put things like ‘familiar with scripting languages’, ‘used to operating in an [Agile/Lean/Scrum] environment]’, or ‘experience supporting a SaaS infrastructure’.

I realize that I am tilting at windmills here.  We went through a spate of “Agile” Development Managers and the number of  “Cloud” Sysadmins is just now tapering.  So, I guess it is DevOps’ turn.  To be sure, it is gratifying and validating to see such proof that DevOps is becoming a mainstream topic.  I should probably adopt a stance of ‘whatever spreads the gospel to the masses’.  But I really just had to get this rant off my chest after seeing a couple of serious “facepalm” job ads.