Performance: See a Bigger Picture
Posted in: Performance Testing   -   August 20, 2012

There are many discussions about performance, but they often concentrate on a single facet of it. The main problem is that performance is the result of every design and implementation detail; you can’t ensure performance by approaching it from one side only.

There are different approaches and techniques to alleviate performance risks, such as:

  • Single-User Performance Engineering: Profiling, tracking, and optimization of single-user performance, Web Performance Optimization (WPO), etc. Everything that helps to ensure that single-user response times, the critical performance path, match our expectations.
  • Software Performance Engineering (SPE): Performance patterns and anti-patterns, scalable architectures, modeling, etc. Everything that helps in selecting an appropriate architecture and design and proving that it will scale according to our needs.
  • Instrumentation / Application Performance Management / Monitoring: Everything that provides insight into what is going on inside the working system and tracks down performance issues and trends.
  • Capacity Planning / Management: Everything that ensures that we will have enough resources for the system, including, for example, auto-scaling as a way to provide needed capacity.
  • Load Testing: ‘Load testing’ is used here as an umbrella term for testing the system under any multi-user load (including all other variations of multi-user testing, such as performance, concurrency, stress, endurance, longevity, and scalability testing).
  • Continuous Integration / Deployment: Everything that allows us to quickly deploy and roll back changes, decreasing the impact of performance issues.

And, of course, all of this exists not in a vacuum, but on top of high-priority functional requirements and resource constraints (including time, money, skills, etc.).

Every approach or technique mentioned above mitigates performance risks to some degree and improves the chances that the system will perform up to expectations. However, none of them guarantees that, and none completely replaces the others; each addresses different facets of performance.

Let’s look, for example, at load testing. Recent trends such as agile development, DevOps, lean startup, and web operations have somewhat called the importance of load testing into question. Some (not many) openly say that they don’t need load testing; others still pay lip service to it – but just never get to it. In the more traditional corporate world, we still see performance testing groups, and important systems usually get load tested before deployment.

Yes, the other ways to mitigate performance risks mentioned above definitely decrease performance risk compared with the situation where nothing is done about performance at all until the last moment before rolling the system out into production without any instrumentation. But they still leave the risks of crashes and performance degradation under multi-user load, and if the cost of such failures is high, you should do load testing (how exactly is another large topic – there is much more to it than the stereotypical waterfall-like, last-moment, record-and-replay approach).

There are always risks of crashes or performance issues under heavy load, and the only way to mitigate them is to actually test for them. Even stellar performance in production and a highly scalable architecture don’t guarantee that the system won’t crash under a slightly higher load. Even load testing doesn’t completely guarantee it (for example, the real-life workload may differ from what you tested), but it significantly decreases the risk.
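
To make this concrete, here is a minimal sketch of what “actually testing it” can look like: a few dozen concurrent virtual users hitting an endpoint while we record response-time percentiles. The URL, user count, and request count below are invented placeholders; a real load test would model the production workload much more carefully.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles

TARGET_URL = "http://localhost:8080/"  # hypothetical endpoint under test
USERS = 50               # number of concurrent virtual users (assumption)
REQUESTS_PER_USER = 20   # requests each virtual user issues

def one_user(_):
    """Issue a series of requests and record each response time."""
    latencies = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        with urllib.request.urlopen(TARGET_URL) as resp:
            resp.read()
        latencies.append(time.perf_counter() - start)
    return latencies

with ThreadPoolExecutor(max_workers=USERS) as pool:
    all_latencies = [t for run in pool.map(one_user, range(USERS)) for t in run]

# Report percentiles rather than averages: the tail is what
# degrades first under multi-user load.
cuts = quantiles(all_latencies, n=100)
print(f"p50: {cuts[49] * 1000:.1f} ms, p95: {cuts[94] * 1000:.1f} ms")
```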

Another important value of load testing is making sure that changes don’t degrade multi-user performance. Unfortunately, better single-user performance doesn’t guarantee better multi-user performance. In many cases it improves multi-user performance too, but not always. And the more complex the system, the more likely it is to have exotic multi-user performance issues that no one even thought of. Load testing is the way to ensure that you don’t have such issues.
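
A classic example of such an issue is lock contention. The toy sketch below (with invented timings) is fast for a single user but serializes all users behind one shared lock, so only a multi-user run reveals the problem:

```python
import threading
import time

lock = threading.Lock()

def handle_request():
    # The shared lock is invisible in single-user profiling:
    # one user never waits, but fifty users queue behind each other.
    with lock:
        time.sleep(0.01)  # 10 ms of work inside the critical section

def timed_run(users):
    """Run `users` concurrent requests and return total wall time."""
    threads = [threading.Thread(target=handle_request) for _ in range(users)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print(f"1 user:   {timed_run(1) * 1000:.0f} ms")   # roughly 10 ms
print(f"50 users: {timed_run(50) * 1000:.0f} ms")  # roughly 500 ms: serialized
```

Single-user profiling of handle_request would show nothing wrong; the fifty-user run takes roughly fifty times longer.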

And when you do performance optimization, you need a reproducible way to evaluate the impact of changes on multi-user performance. That impact probably won’t be proportional to what you see with single-user performance (even if the two are somewhat correlated), and the actual effect is difficult to quantify without multi-user testing. The same applies to issues that happen only in specific cases and are difficult to troubleshoot and verify in production: load testing can significantly simplify the process.
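
As an illustration of that reproducibility, one might compare the latency distributions of two identical load runs made before and after a change. The file names and the 10% threshold below are assumptions for the sketch, not a prescription:

```python
import json
from statistics import quantiles

def p95(latencies):
    """95th-percentile response time of one load-test run."""
    return quantiles(latencies, n=100)[94]

# Hypothetical files: each holds a JSON list of response times (in seconds)
# captured by running the same scripted workload before and after a change.
with open("baseline_run.json") as f:
    baseline = json.load(f)
with open("candidate_run.json") as f:
    candidate = json.load(f)

change = (p95(candidate) - p95(baseline)) / p95(baseline)
print(f"p95 change: {change:+.1%}")
if change > 0.10:  # example threshold: flag regressions of more than 10%
    raise SystemExit("Multi-user performance regression detected")
```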

Yes, with other ways of mitigating performance risks in place and a relatively low cost of performance issues and downtime, it may be possible to survive without load testing: use customers to test your system and address only those issues that pop up. However, that sounds like a risky strategy as soon as performance and downtime start to matter.

Summarizing, I don’t see the need for load testing going away. Even in the case of web operations, we will probably see load testing coming back as soon as systems become more complex and performance issues start to hurt the business. Maybe there will be less need for “performance testers” than there was at the heyday, due to better instrumentation, APM tools, continuous integration, resource availability, etc. – but I’d expect more need for performance experts who can see the whole picture using all available tools and techniques (although I don’t see that yet).

 


For the last sixteen years Alex Podelko has worked as a performance engineer and architect for several companies. Currently he is Consulting Member of Technical Staff at Oracle, responsible for performance testing and optimization of Enterprise Performance Management and Business Intelligence (a.k.a. Hyperion) products. Alex periodically talks and writes about performance-related topics, advocating tearing down silo walls between different groups of performance professionals. He blogs at http://alexanderpodelko.com/blog and can be found on Twitter as @apodelko.


  • jdorfman (http://blog.justindorfman.com)

    +1

  • William Louth

    Alex, following on from my remark on Twitter:

    With regard to your comment on DevOps perceiving load testing as less important: I think this would be a serious mistake, though probably an all too common occurrence during an initial transition. In my opinion, the success of DevOps at scale will largely depend on the degree and quality of automation that is embedded in components, applications, services, and platforms. The automation here will largely be based on self-observation and self-regulation (adaptation) in the form of controllers, sensors, feedback loops, signals, etc.

    Once we go down this route of dynamic self-regulation, the only mechanism left for us to reason about such systems (which become complex adaptive systems by their very nature) is to probe them via inputs and disturbances and observe their reactions (adjustments and performance) as they try to maintain their service reliability (resilience). This requires testing – functional testing – but it’s not functional in terms of the application itself; it’s functional in terms of the application’s regulation, which is driven by load. So load testing becomes the new form of functional testing of such systems.

  • Brad Johnson

    Nice summary of the state of Performance and load testing, Alex.

    @William – great point. As we enter a world dominated by mobile, the line between what is considered functionality and what is considered performance is increasingly blurred. End users couldn’t care less what “we” call it – they just know their experience is bad (or good).

    Software engineering in a world of “endless” scalability, and the influence of traditionally “performance” characteristics like battery life and carrier speed/quality, requires the ability to both monitor user experience with metrics from every point – front to back – and measure the “does it work right” experience under all circumstances. Oh, and faster and more completely than ever. Discrete tests lose validity, and tests that combine objectives, or at least include both performance and functional elements, become necessary from both a speed and a coverage perspective.

    When complexity increases along with user quality expectations, the only answer is more efficient automation…from dev to test to ops.

    Brad Johnson
    @bradjohnsonsv