I grew up in the world of IT Ops and services. I saw ITIL and COBIT take shape and used those frameworks as the basis for our practices. What I missed was the opportunity to get the language right.
Consider: what is the definition of a defect? Obstruction, shortcoming, imperfection, flaw and similar concepts spring readily to mind.
Now, what is the definition of an incident (not the long-winded official one, please)? Something liable to happen as a result of something else happening that you wish had not happened.
How is that for a definition? Or, of course, you could simply say: an incident is the result of a defect. Does this mean that when we hold a Change Approval Board we are actually approving defects to become incidents?
Some of you will say, "We know when things are wrong, so we do a risk assessment. We have the proper documentation and training in place, and we have added the defect back to the backlog, so incidents will be unlikely, or at least will not occur for a long time." But most of you will say, "Hmmm, yes, that is what we are doing." In truth, we don't know, because our testing is not that robust, or because we don't consider the impact: we are not certain of all the interactions between our infrastructure, applications, cloud providers, social media and so on.
DevOps is the great movement that tries to get people to think differently about how technology can be used to enable employees and customers, by empowering collaboration, communication and cooperation. What would happen if you called all incidents defects (or vice versa) anywhere in the lifecycle from idea to realisation? What would happen if the processes you follow when things go wrong in live were also followed when things go wrong anywhere in that lifecycle?
Think of the improved service. Think of changing your technology culture from “ship then fix” to “fix then ship”. It would help you build in smaller chunks so that you could fix faster. In fact, because you would be working in smaller chunks, you would learn more, and earlier, so that if you did have to deploy a defect, you could at least be ready to deal with it and with its impact on employees and customers.
Defects and incidents: set a target of none in production. No defects and no incidents. Treat every part of your lifecycle as affecting someone. Introduce monitoring and alerting across that lifecycle, and give people the ability to collaborate to resolve issues before they have a major impact. Then you can truly achieve continuous delivery of great services.
If you think this sounds difficult, ask Ranger4 to perform a Value Stream Mapping or Assessment exercise on your lifecycle or critical support processes. Let’s do better, faster, safer together.