What Makes DevOps Different?
We used to spend a lot of time with people trying to define DevOps, and we've concluded that it means different things to different people - you can see the results of a year's worth of answers to the question "What's your preferred definition of DevOps?" here. At its core, DevOps is about delivering more, higher-quality IT innovation to the end user, faster - but that's a goal many IT organisations have had for many years. So perhaps the more useful questions are: "What's different about DevOps?" and "Why does DevOps have so much currency in today's climate?" We think the answer to these questions is threefold:
1) The increased strategic importance of digitisation has created a mismatch: developers are under pressure to deliver more innovation to end users than ever before, while IT operations teams are tasked with keeping infrastructure stable - infrastructure that has evolved over several decades to become very complex, and therefore fragile. Add to this that many organisations have arranged these two teams to operate in isolation, as silos, and the resulting poor communication, weak collaboration and conflict manifest themselves in a number of ways.
2) The prevalence of Agile and ITIL. Agile changed the way we develop from waterfall to iterative and enabled small changes to reach the end user at a vastly improved cadence. DevOps extends this approach to IT Ops with a focus on building processes and systems that support the release of these iterations in a controlled and predictable manner - i.e. not compromising the ITIL controls that protect us from system failure. Automation tools offer capabilities that support controls and compliance by providing secure, role-based access, authorisation workflows and records of system actions.
3) These tools are often new technologies that have only become available in the last 4-5 years - around the time DevOps was 'born'. New tools for Continuous Integration (such as Jenkins), orchestration (such as Puppet, Chef and Ansible) and application release automation (such as UrbanCode) enable us to release predictably on demand, whilst other tools like Green Hat allow us to virtualise services to accelerate integration testing. The newest application performance management solutions, like AppDynamics, provide monitoring at the business transaction level, allowing us to predict failure and report on its impact in a way we haven't been able to before.
What People Want to Do When They Do DevOps and How:
Patterns are fabulous things - they allow us to make things repeatable, learn from our mistakes and create best practices. Over the past few years, we've noticed a handful of DevOps goals that we keep seeing over and over again and here they are:
- Release on demand
- Eliminate technical debt and unplanned work
- Fail smart/fast/safe
- Look "outside-in"
- Measure feature value
They sound great, yeah? But how do people do any of this?
1) Release on demand
There was a time when we talked with people about their current and future release cadences in terms of: "So, if you perform 4 releases a month now, what would life look like if you could do 4 a week?!" Now, though, we're focussed on Continuous Delivery: the capability of having software always in a releasable state, so that when a new feature is ready to go live, it can go live, regardless of where an organisation might be in a release cycle. This approach was featured in the 2014 State of DevOps report from Puppet Labs, and you can read more about how one of our customers, Hiscox, implemented a CD pipeline that enabled them to push 19 releases in their first week after going live, here. Implementing automation for release management means that organisations can:
- Release when they like
- Predictably, consistently and very, very fast
- Instantly roll back (or redeploy) to the last known good state in the event of failure
- Provide a full audit trail for compliance purposes
- Incorporate NFRs around performance and security early
- Stop having 'release weekends' - the process becomes like breathing
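The "roll back to the last known good state" behaviour above can be sketched in a few lines. This is a minimal, hypothetical illustration - the function names and the idea of passing in a health check are ours, not taken from any specific release-automation product:

```python
# Minimal sketch of a release step with automatic rollback to the
# last known good (LKG) version. All names here are illustrative.

def deploy(version, apply_release, health_check, lkg=None):
    """Apply a release; if the health check fails, roll back to lkg.

    apply_release(version) performs the deployment;
    health_check() returns True when the system is healthy.
    Returns the version left running.
    """
    apply_release(version)
    if health_check():
        return version          # new version becomes the next LKG
    if lkg is not None:
        apply_release(lkg)      # instant rollback to last known good
    return lkg
```

Real release-automation tooling adds the audit trail, access control and approval workflows around this core loop, but the essential contract - deploy, verify, revert - is the same.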
2) Eliminate technical debt and unplanned work
Many readers of this blog will have read The Phoenix Project, the seminal novel about DevOps, which centres on the theory of constraints and managing unplanned work. Technical debt is often the root cause of unplanned work - whether it's a system performance issue or failure that has pulled attention (and people) away from the new feature they had been assigned to, or refactoring that falls outside the project's original time and budget constraints.

The first step to understanding the impact of technical debt on an organisation is to measure the unplanned work (metrics being a fundamental DevOps concept). Many organisations don't do this today, but it's not difficult. Most organisations have timesheet systems (and if they don't, they should), and adding a code for unplanned work (or refactoring) allows a baseline to be drawn against which future states can be compared.

It's arguable whether it's ever possible to fully eliminate technical debt - particularly in Agile, where we often work towards models in which the developed code is 'Just Good Enough'. We might want code that is excellent, or perfect, but time and budget are constraints too - as is the need to get innovation to market fast enough to win or compete. But anything we can do to build quality into code is useful, and this is where we often talk about shifting left. We know that defects found earlier in the cycle are quicker and easier to fix - and DevOps provides a platform to include Operations earlier in the development cycle and to include non-functional requirements early in the design phase.
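The timesheet baseline is simple arithmetic once a booking code exists. A toy sketch, assuming hypothetical codes such as "UNPLANNED" and "REFACTOR" and a flat list of (code, hours) entries:

```python
# Illustrative sketch: deriving an unplanned-work baseline from
# timesheet entries. The booking codes and record shape are
# hypothetical, not from any particular timesheet system.

def unplanned_ratio(entries, unplanned_codes=frozenset({"UNPLANNED", "REFACTOR"})):
    """entries: iterable of (code, hours) pairs.
    Returns the fraction of hours booked to unplanned work,
    or 0.0 if no hours were recorded."""
    entries = list(entries)
    total = sum(hours for _, hours in entries)
    if total == 0:
        return 0.0
    unplanned = sum(hours for code, hours in entries if code in unplanned_codes)
    return unplanned / total
```

Running this monthly gives the baseline against which the effect of paying down technical debt can be tracked.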
And requirements really are key - we speak with a lot of organisations who are concerned about the volume of defects they are experiencing and jump to the conclusion that they would benefit from automated testing. Often we find, on closer inspection of the processes involved, that there are issues around requirements elicitation, and that by making improvements here the volume of defects falls: the root cause of the defects was poorly defined requirements.
That's not to say automated testing isn't important - it's essential to streamlining the software development process and to building quality into code and a CD pipeline. The important thing is to ensure requirements are integrated into the testing process... and beyond. Tools can really help with this.
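One concrete way to integrate requirements into testing is traceability: knowing which tests exercise which requirements, and which requirements have no test at all. A hypothetical sketch (the IDs and record shapes are invented for illustration):

```python
# Hypothetical requirements-to-test traceability check: given a
# mapping from test names to the requirement IDs they cover, report
# the requirements that no test covers.

def uncovered_requirements(requirements, coverage):
    """requirements: set of requirement IDs.
    coverage: dict of test name -> set of requirement IDs it exercises.
    Returns the sorted list of requirements with no covering test."""
    covered = set().union(*coverage.values()) if coverage else set()
    return sorted(requirements - covered)
```

A report like this, run as part of the pipeline, surfaces requirements gaps long before they become production defects.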
3) Fail smart/fast/safe
Two core cultural concepts in DevOps are around blame and failure:
Blame is mostly a negative and unhealthy emotion. Lots of organisations have 'war rooms' (sometimes dedicated spaces) that spring into action when systems fail, and a great deal of time can be wasted - and bad feeling created - while individuals and teams point fingers at each other. Agile's retrospectives at the end of each sprint go some way to familiarising individuals and teams with continuous improvement and promote healthier approaches - but catastrophic failures can still happen. So how can organisations limit the pain during these 'hair on fire' moments and get back on track fast? Effective application performance management tools not only give pre-emptive warning of impending doom (and therefore the opportunity to remedy the issue before systems fail) - they also give vital information about what went wrong that can be shared with the team, making the problem much quicker and easier to fix and reducing MTTR (Mean Time to Recover/Resolve/Repair - whichever term you prefer). I particularly enjoy this video from AppDynamics that explains how this happens.
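MTTR itself is a straightforward metric to compute once incident start and end times are recorded. A minimal sketch, assuming incidents are captured as (detected, resolved) pairs - the record shape is ours, not from any monitoring product:

```python
# Simple sketch of computing MTTR from incident records. Each incident
# is a (detected, resolved) pair; the values can be hours, days or
# datetime objects - anything subtractable and averageable.

def mean_time_to_recover(incidents):
    """incidents: list of (detected, resolved) pairs.
    Returns the mean recovery duration, or None if there were
    no incidents in the period."""
    if not incidents:
        return None
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations) / len(durations)
```

Tracking this number per month or per quarter is one of the simplest ways to show whether 'fail fast' practices are actually working.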
A phrase we often use when talking about DevOps cultures is the sense of 'constantly dancing around failure'. Although on the face of it this may look like a bad thing (it sounds risky), what we're actually doing is creating an environment where people feel empowered and safe to innovate, rather than scared of trying new things. This requires processes and systems that remove or mitigate as much risk as possible by doing things like:
- Automated testing
- Templating deployments
- Being able to instantly redeploy the last known good state
- Having early warning systems for failure
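An early warning system, at its simplest, compares recent behaviour against a historical baseline and raises a flag before a hard failure. The sketch below is a toy illustration of that idea - the 'twice the baseline' threshold is an arbitrary example policy, not a recommendation:

```python
# Illustrative early-warning check: flag when recent response times
# drift well above the historical baseline, before a hard failure.
# The baseline * factor threshold is an arbitrary example policy.

def early_warning(baseline_ms, recent_ms, factor=2.0):
    """Return True if the average of recent samples exceeds the
    baseline average by the given factor."""
    if not baseline_ms or not recent_ms:
        return False
    baseline = sum(baseline_ms) / len(baseline_ms)
    recent = sum(recent_ms) / len(recent_ms)
    return recent > baseline * factor
```

Commercial APM tools do far more sophisticated anomaly detection than a fixed multiplier, but the principle - detect the drift, not just the outage - is the same.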
4) Look "outside-in"
This means thinking about everything from your users' perspectives. I mean REALLY (a lot of people have tried to do this for a while, but it's hard to look from the OUTSIDE when you're IN the company). Not just when you are eliciting, prioritising, developing and testing requirements, but also when monitoring users' overall experience. Again, some of the newer application performance management tooling is really strong here and can give you information not just on response times across different mobile browsers and geographies, but also at the business transaction level - where people are abandoning online applications or shopping carts, for example.
A/B testing is well established in the marketing world - sending an email in two different formats and analysing the results helps marketers make better decisions. But we can do more than test whether a red button works better than a blue one on a webpage: if we have integrated our requirements all the way through to release, and are releasing iteratively, we can measure the impact and take-up (or not) of a new feature - how much our users like what we just did.
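The core arithmetic of that comparison is simple: take-up rate per variant, and the relative lift of one over the other. A sketch, with invented event counts:

```python
# Sketch of comparing feature take-up between two release variants,
# as in a basic A/B test. Counts and names are made up for
# illustration; a real test would also check statistical significance.

def conversion_rate(used_feature, total_users):
    """Fraction of users who used the feature."""
    return used_feature / total_users if total_users else 0.0

def ab_lift(a_used, a_total, b_used, b_total):
    """Relative lift of variant B over variant A
    (positive means B's take-up is higher)."""
    a = conversion_rate(a_used, a_total)
    b = conversion_rate(b_used, b_total)
    return (b - a) / a if a else 0.0
```

For example, if 50 of 200 users on variant A used the feature against 60 of 200 on variant B, B shows a 20% relative lift - the kind of number product owners can act on.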
Closer integration and analytics also enable us to perform sentiment analysis. Some of the companies we work with, for example in insurance, are concerned about their traditional business being eaten up by 'new pretenders'. Classic retail outlets now offer insurance (look at the pamphlets next time you're at the till in the supermarket), and they have a real advantage over traditional insurers: they have been collecting much more detailed customer data for some time and can make more targeted offerings. By tapping into social media and collecting data on sentiment about the brand, the product or particular features, companies can increase their competitiveness.
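To show the shape of the signal sentiment analysis produces, here is a deliberately toy lexicon-based scorer. Real sentiment analysis uses trained models; the word lists below are invented for illustration only:

```python
# Toy lexicon-based sentiment score for brand mentions. The word
# lists are illustrative; production systems use trained models.

POSITIVE = {"love", "great", "fast", "easy"}
NEGATIVE = {"hate", "slow", "broken", "confusing"}

def sentiment_score(text):
    """Return (#positive - #negative) words in the text:
    > 0 leans positive, < 0 leans negative."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
```

Aggregated over thousands of social media mentions, even a crude score like this starts to show whether sentiment about a brand or feature is trending up or down.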
Although analytics isn't always an obvious answer to DevOps' goals, by understanding more about the end user, innovation can be optimised for success and better investment decisions can be made. Cognitive computing (like IBM's Watson) takes this to a whole new level - mining unstructured and structured data collected within and from outside data sources to offer insights to prioritise investment and provide new ways for end users to interact with an organisation's applications.
5) Measure feature value
Moving to iterative, Agile methodologies enables us to track a feature from inception all the way through to live and measure its cost of travel through the process. Features should be specified by end users and the business, and to ensure operational efficiency it's imperative both to prioritise features based on their anticipated value to the end user (and therefore the business) and to report back on the actual value (or not) received over time. Integrating the software manufacturing pipeline, and having an effective application monitoring tool that can identify changes at the business transaction level, is essential to reporting back to your stakeholders on the impact of that new piece of code you have put into production.
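The "cost of travel" measurement reduces to lead time once stage timestamps are recorded for each feature. A sketch with hypothetical stage names and day-counts:

```python
# Sketch of measuring a feature's cost of travel through the pipeline:
# lead time from inception to live, given per-stage timestamps.
# Stage names and the record shape are hypothetical.

def lead_time(stages):
    """stages: dict of stage name -> timestamp (e.g. days since a
    reference date). Returns time from 'inception' to 'live'."""
    return stages["live"] - stages["inception"]

def average_lead_time(features):
    """Mean lead time across a list of feature stage records,
    or None if the list is empty."""
    if not features:
        return None
    times = [lead_time(stages) for stages in features]
    return sum(times) / len(times)
```

Plotting average lead time release over release gives stakeholders a direct view of whether the pipeline is actually getting faster.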
One of the new questions we're asking in our DevOpsFriday5 initiative is: "What does DevOps look like when it's 'done'?" - maybe next year we'll be asking: "What's after DevOps?". What do you think? What's your key DevOps goal and how many of the 5 we have described here could you describe as 'done' in your organisation?