DevOps is all about improving the time to market of innovation by improving collaboration between development and operations teams. To date, much of the focus of DevOps discussions has been on automating processes, particularly deployment processes. The payoff from this approach depends on how often an organisation releases: if a new application release only occurs once every six months, say, then a manual process may suffice, but if deployments are more frequent, even monthly, automation will deliver savings. Automation also delivers additional benefits: fewer errors, better auditing and compliance capabilities, and an improved ability to troubleshoot and triage problems.
Automating the application release process is a good start, but issues can still happen in production that might not be related to the deployment of code (perhaps there is an unauthorised change, or a conflict between systems that was impossible to test in pre-production). Unplanned work is the bane of IT Operations organisations worldwide. It disrupts the schedule defined by management in line with the business's objectives and creates high-pressure situations where individuals are forced to perform heroic tasks, often out of hours and at the expense of their personal lives.
Resolution is dependent on the speed of discovery, diagnosis, and fix. If a critical issue is found quickly, an immediate roll back to the previous version may be recommended.
The key to avoiding this type of high-risk scenario is to quickly identify the problem in production and focus the development team on the issue as soon as possible. Automation is the answer. Production issues must be identified immediately and automatically. The team needs a swift diagnosis in order to quickly fix the issue. The last thing businesses want is for their customers to tell them their systems are broken.
So, the new challenge to DevOps is to minimise the time to discover and diagnose issues in production and effectively use data to continuously improve an application’s value. "Fast Feedback" will separate the high performers from the low performers.
Application Performance Management (APM) tooling provides real-time feedback from an application in production to a central location, where it can be aggregated for alerting and/or reporting. APM, along with automation, further reduces the time it takes for DevOps teams to get their innovation in front of their customers. Teams that release frequently benefit greatly from automated, actionable information that reduces the mean time to fix and improve an application.
Production Incident Detection
Reduce the Mean Time to Fix Applications – Achieve Continuous Quality
It's not always possible to fully simulate a production environment, so many organisations sensibly assume that some incidents will appear in production. The highest performing companies are good at discovering and diagnosing production incidents at high speed. The response time requirements are typically measured in days/weeks. Let’s step through what this would look like with APM for a few scenarios:
1) An application is released in production. Very rapidly, failures come back and they are surfaced as work items in an ALM system. The development team has all the information to fix the problem, so they fix it, test it, close the work item and re-release the application to production.
2) An automated deployment tool does the first stage of a multi-phase release into production. Very rapidly, failures come back and one of them is very serious. Based on that issue, the deployment tool halts the deployments. Operations identifies the issue as serious, and redeploys the previous version while development works on a fix.
3) After an application is released into production, an infrequent, hard-to-diagnose issue crops up. Based on rules (frequency, people/customers affected, criticality, etc.), the issue is surfaced as a work item in an ALM system and appropriately triaged and assigned by the team lead.
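The rule-based handling in scenarios 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the `Incident` fields, the threshold values and the action names are assumptions made for the example, not part of any particular APM or ALM product.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would come from the team's own policy.
MIN_OCCURRENCES = 5
MIN_CUSTOMERS_AFFECTED = 10


@dataclass
class Incident:
    signature: str           # e.g. an exception type plus code location
    occurrences: int         # times seen in the reporting window
    customers_affected: int  # distinct customers who hit the issue
    criticality: str         # "low", "medium", "high" or "critical"


def triage(incident: Incident) -> str:
    """Return the action to take: 'halt_rollout', 'create_work_item' or 'log_only'."""
    # Scenario 2: a very serious failure stops the multi-phase release.
    if incident.criticality == "critical":
        return "halt_rollout"
    # Scenario 3: frequent or widely felt issues become work items.
    if (incident.occurrences >= MIN_OCCURRENCES
            or incident.customers_affected >= MIN_CUSTOMERS_AFFECTED):
        return "create_work_item"
    # Everything else is recorded but does not interrupt the team.
    return "log_only"
```

In practice the returned action would be handed to the deployment tool or the ALM system; keeping the rules in one small function makes them easy to review and adjust as the team learns which issues matter.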
Reduce the Mean Time to Improve Applications – Achieve Continuous Value
Sharp insight into how users interact with applications in production allows teams to hit quality goals, lower maintenance costs and focus on the issues that matter most to their customers. The response time requirements are typically measured in weeks/months. The scenarios might include:
1) The development team needs to prioritize which items in the backlog will get done for the next release. Gaining insight from analytics, they can validate which features would have the greatest positive impact on their customers and prioritize appropriately.
2) The development team would like to deprecate an older feature that is difficult to maintain. Using analytics, they are able to verify that almost none of their customers are still using the older feature and they can safely stop supporting it in future releases.
3) Issues that are not themselves fatal can sometimes set up the conditions that later cause an application to fail. Because of the insight gained from analytics, the development team is able to observe usage patterns that eventually precede failures and fix a problem that was otherwise undiagnosed.
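As a rough illustration of the deprecation check in scenario 2, the sketch below computes what share of customers used each feature and flags the rarely used ones. The event format (pairs of customer ID and feature name), the function names and the 1% default threshold are all assumptions made for this example.

```python
from collections import defaultdict


def feature_usage_share(events):
    """Map each feature to the fraction of active customers who used it.

    `events` is an iterable of (customer_id, feature_name) pairs, a
    hypothetical export from an analytics tool.
    """
    users_by_feature = defaultdict(set)
    all_customers = set()
    for customer_id, feature in events:
        users_by_feature[feature].add(customer_id)
        all_customers.add(customer_id)
    if not all_customers:
        return {}
    total = len(all_customers)
    return {feature: len(users) / total
            for feature, users in users_by_feature.items()}


def deprecation_candidates(events, max_share=0.01):
    """List features used by at most `max_share` of active customers."""
    shares = feature_usage_share(events)
    return sorted(f for f, share in shares.items() if share <= max_share)
```

Note that a feature nobody used never appears in the events at all, so a real check would also compare the result against the full list of shipped features.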
A quick build-measure-learn cycle is not possible if the “measure” portion is not automated. This feedback loop can help your team efficiently validate your backlog priorities against real world usage patterns and prioritize bug fixes in an optimal manner.
DevOps originated with a focus on continuous integration and delivery, where development and operations teams work together with the single goal of agile delivery to production. But development and operations teams need to keep doing more. They need to hit quality goals, lower maintenance costs and focus on the issues that matter most to their customers, and analytics are a foundational piece of that. Efficient, actionable, fast feedback (provided in part by analytics) will separate the high performers from the low performers. High performing teams are investing heavily in analytics. The next big thing is here.