There are many discussions about performance, but they often concentrate on a specific facet of it. The main problem is that performance is the result of every design and implementation detail, so you can’t ensure performance by approaching it from one side only.
There are different approaches and techniques to alleviate performance risks, such as:
- Single-User Performance Engineering: Profiling, tracking, and optimization of single-user performance, Web Performance Optimization (WPO), etc. Everything that helps to ensure that single-user response times, the critical performance path, match our expectations.
- Software Performance Engineering (SPE): Performance patterns and anti-patterns, scalable architectures, modeling, etc. Everything that helps in selecting an appropriate architecture and design and proving that it will scale according to our needs.
- Instrumentation / Application Performance Management (APM) / Monitoring: Everything that provides insight into what is going on inside the working system and tracks down performance issues and trends.
- Capacity Planning / Management: Everything that ensures that we will have enough resources for the system, including, for example, auto-scaling as a way to provide the needed capacity.
- Load Testing: “Load testing” is used here as an umbrella term for testing the system under any multi-user load (including all other variations of multi-user testing, such as performance, concurrency, stress, endurance, longevity, and scalability testing).
- Continuous Integration / Deployment: Everything that allows changes to be deployed and rolled back quickly, decreasing the impact of performance issues.
And, of course, all of this exists not in a vacuum, but on top of high-priority functional requirements and resource constraints (including time, money, skills, etc.).
Every approach or technique mentioned above mitigates performance risks to some degree and improves the chances that the system will perform up to expectations. However, none of them guarantees that. Moreover, none completely replaces the others: each addresses different facets of performance.
Let’s look, for example, at load testing. Recent trends of agile development, DevOps, lean startup, and web operations somewhat call the importance of load testing into question. Some (not many) openly say that they don’t need load testing; some still pay lip service to it but just never get to it. In the more traditional corporate world we still see performance testing groups, and important systems usually get load tested before deployment.
Yes, the other ways to mitigate performance risks mentioned above definitely decrease risk compared with doing nothing about performance at all until the last moment before rolling the system into production without any instrumentation. But they still leave the risk of crashes and performance degradation under multi-user load. And if the cost of such failures is high, you should do load testing (how exactly is another large topic – there is much more to it than the stereotypical waterfall-like, last-moment, record-and-replay approach).
There are always risks of crashes or performance issues under heavy load – and the only way to mitigate them is to actually test the system under load. Even stellar performance in production and a highly scalable architecture don’t guarantee that it won’t crash with a slightly higher load. Even load testing doesn’t completely guarantee it (for example, the real-life workload may differ from what you have tested), but it significantly decreases the risk.
Another important value of load testing is making sure that changes don’t degrade multi-user performance. Unfortunately, better single-user performance doesn’t guarantee better multi-user performance. In many cases it improves multi-user performance too, but not always. And the more complex the system, the more likely it is to harbor exotic multi-user performance issues no one even thought of. Load testing is the way to ensure that you don’t have such issues.
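As a toy illustration of why single-user numbers can mislead (hypothetical code, not from the article): a coarse-grained lock leaves single-user response time unchanged while silently serializing concurrent users, so the problem only appears under multi-user load.

```python
import threading
import time

lock = threading.Lock()  # shared resource, e.g., a coarse-grained cache lock

def handle_request():
    # Fast on its own (~10 ms), but every request serializes on the lock.
    with lock:
        time.sleep(0.01)

def measure(num_users):
    """Run num_users concurrent requests and return total elapsed time."""
    start = time.perf_counter()
    threads = [threading.Thread(target=handle_request) for _ in range(num_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

single = measure(1)
twenty = measure(20)
# Single-user profiling sees a fast request; 20 "parallel" users take
# roughly 20x as long because the lock serializes them.
print(f"1 user: {single:.3f}s, 20 users: {twenty:.3f}s")
```

A single-user profiler would report the same fast request either way; only driving concurrent load exposes the contention.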
And when you do performance optimization, you need a reproducible way to evaluate the impact of changes on multi-user performance. That impact probably won’t be proportional to what you see with single-user performance (even if the two are somewhat correlated), and the actual effect is difficult to quantify without multi-user testing. The same holds for issues that happen only in specific cases and are difficult to troubleshoot and verify in production – load testing can significantly simplify the process.
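A reproducible multi-user evaluation can be as simple as driving the same transaction mix at fixed concurrency levels before and after a change and comparing response-time percentiles. A minimal sketch, assuming a callable `transaction()` that stands in for one user transaction against your system (all names here are hypothetical, not a specific tool’s API):

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def transaction():
    """Stand-in for one user transaction against the system under test."""
    time.sleep(0.005)  # simulate ~5 ms of work

def timed_transaction():
    start = time.perf_counter()
    transaction()
    return time.perf_counter() - start

def run_load_test(concurrency, total_requests=100):
    """Drive total_requests transactions at the given concurrency;
    return median and 95th-percentile response times in seconds."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: timed_transaction(),
                                  range(total_requests)))
    latencies.sort()
    p50 = statistics.median(latencies)
    p95 = latencies[int(0.95 * len(latencies)) - 1]
    return p50, p95

# Fixed workload and concurrency levels make runs comparable across changes.
for users in (1, 10, 50):
    p50, p95 = run_load_test(users)
    print(f"{users:>3} users: p50={p50*1000:.1f} ms, p95={p95*1000:.1f} ms")
```

Keeping the workload, concurrency levels, and reported percentiles fixed between runs is what makes a before/after comparison of a change meaningful.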
Yes, with other ways of mitigating performance risks and a relatively low cost of performance issues and downtime, it may be possible to survive without load testing: use your customers to test the system and address only those issues that pop up. However, that sounds like a risky strategy as soon as performance and downtime start to matter.
Summarizing, I don’t see the need for load testing going away. Even in the case of web operations, we will probably see load testing coming back as systems become more complex and performance issues start to hurt the business. Maybe there will be less need for “performance testers” than there was at the field’s heyday, due to better instrumentation, APM tools, continuous integration, resource availability, etc. – but I’d expect more need for performance experts who can see the whole picture using all available tools and techniques (although I don’t see that yet).
For the last fifteen years Alex Podelko has worked as a performance engineer and architect for several companies. Currently he is Consulting Member of Technical Staff at Oracle, responsible for performance testing and optimization of Hyperion products. Alex serves as a director for the Computer Measurement Group (CMG) http://cmg.org, a volunteer organization of performance and capacity planning professionals. He blogs at http://alexanderpodelko.com/blog and can be found on Twitter as @apodelko.