Ranger4 DevOps Blog

Betfair Won't Gamble with Database Performance

Posted by George Price on Fri, Sep 25, 2015 @ 16:09 PM

Betfair is the world's leading online betting exchange, a concept it pioneered. Driven by cutting-edge technology, Betfair enables customers to choose their own odds and bet against each other, even after an event has started. Betfair processes 7 million transactions a day and 99.9% of these transactions are completed in under a second.

Key Benefits:

  • Captured brief errors other solutions couldn't detect
  • Occupied less than 1% of CPU resource in already highly-stressed servers
  • Automated testing previously done manually

Why AppDynamics:
  • Production-safe monitoring technology with low overhead
  • Agentless technology required no installation or changes on the monitored database platform
  • Intuitive web GUI for easy information sharing between teams

Challenge: Required fine detail to catch fleeting errors

“Improving visibility into very short duration performance problems was critical for Betfair. In the past, we've seen performance problems in production that affect our customers for no longer than 15 seconds and then go away. But they always return at a later date because we haven't been able to see why they happened or do anything about them,” said Nigel Noble, Sr. Performance DBA at Betfair. “Having a problem that only lasts 15 seconds may not sound that serious but, if it's the wrong 15 seconds, it could seriously impact both our customers and the business.”

“Over the years we have reviewed a number of database monitoring tools but each time have been disappointed to find that the best granularity they could provide in our busy production environment was a 15 minute time slice,” said Noble. These solutions completely missed the information Betfair needed and would “effectively leave us blind,” said Noble.
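To see why a 15 minute time slice can hide a 15 second problem, the short Python sketch below averages the same per-second measurements over one coarse window and over many fine windows. The numbers are made up for illustration (a 20 ms baseline with a single 2000 ms spike) and are not Betfair data; the function name and the 10 second window are assumptions, and this is not how any particular monitoring product computes its figures.

```python
def window_averages(samples, window):
    """Average per-second samples over consecutive windows of `window` seconds."""
    averages = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical per-second response times (ms) across 15 minutes (900 seconds):
# a steady baseline with one 15-second spike starting at second 300.
BASELINE_MS = 20
SPIKE_MS = 2000
samples = [BASELINE_MS] * 900
for second in range(300, 315):
    samples[second] = SPIKE_MS

coarse = window_averages(samples, 900)  # a single 15-minute slice
fine = window_averages(samples, 10)     # ninety 10-second slices

print("Worst 15-minute average (ms):", max(coarse))
print("Worst 10-second average (ms):", max(fine))
```

With these illustrative numbers, the single 15 minute average comes out at 53 ms, which looks like a mild slowdown, while the worst 10 second slice shows the full 2000 ms spike.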

Noble and his team searched for a database monitoring tool that could deliver on four key attributes: “First, it needed to be able to provide us with a very fine level of detail, so that we could capture very short duration performance issues. Second, it needed to be able to cope with monitoring our huge transaction volumes on various Oracle platforms, without adding significant overhead. Third, it needed to be good at profiling performance during load-tests, allowing us to quickly and easily highlight bottlenecks, and compare differences between multiple tests. And finally, it needed to support not just Oracle, but SQL Server and MySQL.”

Found complete visibility and low overhead in AppDynamics Database Monitoring

“The last thing you want when trying to improve database performance is for your performance monitoring tool to impose a significant overhead, particularly when being implemented on production or highly-stressed load-testing servers,” said Noble. “For this reason, we tested the AppDynamics Database Monitoring tool exhaustively.” To get the precise level of performance detail Betfair required, the AppDynamics monitoring time slice was routinely set to 10 seconds. “Even when capturing information at the finest level of detail, total overhead would still be less than 1% of CPU resource,” Noble noted. “This overhead was well within acceptable limits, and has enabled us to deploy AppDynamics on even our most heavily loaded Oracle servers, which are among the busiest in the world,” added Noble. AppDynamics Database Monitoring is used throughout all stages of application development as well as in production at Betfair, helping everyone to communicate internally about database performance issues.

AppDynamics proved a good bet

Enabled faster software releases with load-testing

“Betfair has seen customer usage almost doubling for four years and now needs to deal with more than 25,000 dynamic page impressions per second,” said Oliver Cook, Engineering Services Manager at Betfair. “It is essential that we load test all of our applications thoroughly prior to release, because if we get it wrong, even a seemingly innocuous change can have a significant impact on the customer experience.” “AppDynamics has helped us to significantly reduce the time it takes to isolate and resolve performance problems during development and pre-production load testing. The result is that we can release new functionality faster without having to compromise on quality,” added Cook.

Saved time with intuitive database comparisons

Comparing any two database loads in a changing environment can be complicated. This is especially true in a load testing environment where many different scenarios are evaluated for performance and scalability. Being able to quickly see what has changed between the different scenarios is vital, and the database engineering team at Betfair saw the AppDynamics Database Monitoring solution excel in this area. Its load test comparison report immediately highlighted where performance changed, either positively or negatively, saving Betfair valuable analysis time. Betfair also found that the comparison reports were not limited to load testing and could simplify the task of comparing any two scenarios. For example, the Betfair team started using this functionality to compare a QA load with production, two nodes of a cluster, or performance before and after a production change such as the addition of a new index.
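The idea behind such a comparison can be sketched in a few lines of Python. This is only an illustration of comparing two scenarios, not the AppDynamics report itself: the function name, the 10% threshold, the query names and the timings are all hypothetical.

```python
def compare_loads(baseline, candidate, threshold_pct=10.0):
    """Compare average query times (ms) between two scenarios.

    Returns (query, baseline_ms, candidate_ms, change_pct) rows for queries
    present in both scenarios whose change meets the threshold, largest
    change first (ties broken by query name).
    """
    rows = []
    for query in baseline.keys() & candidate.keys():
        before = baseline[query]
        after = candidate[query]
        if before == 0:
            continue  # percentage change is undefined for a zero baseline
        change_pct = (after - before) / before * 100
        if abs(change_pct) >= threshold_pct:
            rows.append((query, before, after, change_pct))
    rows.sort(key=lambda row: (-abs(row[3]), row[0]))
    return rows


# Hypothetical average query times (ms) before and after adding an index.
before_change = {"place_bet": 12.0, "get_market": 8.0, "settle": 40.0}
after_change = {"place_bet": 12.5, "get_market": 4.0, "settle": 60.0}

report = compare_loads(before_change, after_change)
for query, before, after, change_pct in report:
    direction = "slower" if change_pct > 0 else "faster"
    print(f"{query}: {before} ms -> {after} ms ({abs(change_pct):.0f}% {direction})")
```

Filtering by a threshold keeps small, noisy differences out of the report so that only the queries that got clearly faster or slower are listed.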
