Computers are far better at keeping records than humans and good logs are a crucial part of getting value from automating anything. Sometimes this aspect of automation gets lumped in with logging, but there is a difference between recording events and providing traceability. Both have value – the history of what happens in a system is important for a range of reasons ranging from the reactive to the proactive. On the reactive end, this record provides root cause analysis – an understanding of who did what and what happened. As things shift toward the proactive end of things, the valuable information can be used to trace how well an automated process is working, identify how it is evolving, and where it can be improved.
Starting at the basic end of the automation traceability spectrum it the simple concept of access and event logs. The very word ‘traceability’ often calls to mind the idea of auditors, investigators, and even inquisitors seeking to answer the question ‘who did what and when did they do it?’ In some organizations this is a very critical part of the business and is a valuable part of automation because it makes answering this question much easier. There is no time lost by having staff dredge up records and history. The logs are available to be turned into reports by anyone who might be interested. The productivity saved by letting people get on with their work while ensuring that those whose work it is to ensure the business meets its regulatory requirements can also get on with theirs. It actually is a true win-win, even if it is an awkward topic at times.
The other great reactive value lever of traceability in an automated environment is that it eases root cause analysis when problems occur. No system is perfect and they will always break down. The automation may even work perfectly, but still let an unforeseen problem escape into production. Good records of what happened facilitate root cause analysis. That saves time and trouble as engineers seek to figure out how to fix the problem at hand and are then tasked with making sure that the problem can never happen again. With good traceability, both sides of the task are less costly and time-consuming. Additionally, the resultant fix is more likely to be effective because there is more and better information available to create it.
Closely related to using traceability for root cause analysis and fixes is the notion of ensuring the automated process’ own health. Is there something going on with the process that could cause it to break down? This is much like a driver noticing that their car is making a new squeaking noise and taking it to the mechanic before major damage is done. The benefit of catching a potential problem early is, of course, that it can be dealt with before it causes an unplanned, costly disruption.
The fourth way that traceability makes automation valuable is that it provides the data required to perform continuous improvement. This notion is about being able to use the data produced by the automation to make something that is working well work better. While ‘better’ may have many definitions depending on the particular context or circumstance being discussed, there can be no structured way of achieving ANY definition of ‘better’ without being able to look at consistent data. And there are few better ways to get consistent data than to have it produced automatically as part of the process on which it is reporting.
Reaching the more proactive end of this spectrum requires time and a consistent effort to mature the tools, automations, and organization. However, traceability of automation builds on itself and is, in fact, the one of the three levers discussed in these posts that has the potential to build progressively more value the longer it is in use with no clear upper limit. That ability to return progressive value makes it worth the patience and discipline required.