Breaking Performance Silos
Posted in: Monitoring, Performance Testing, Speed Up Tips   -   August 30, 2013

It is amazing that, while performance is the result of every aspect of the system, performance as a discipline consists of multiple silos, and the different groups of performance specialists communicate very little with one another.

You can hardly find a book or community covering performance end-to-end. Performance-related responsibilities are usually spread across several groups, and rarely is there a person responsible for end-to-end performance throughout the whole lifecycle of the system. Ian Molyneaux’s post The Case for the CPO makes the point – but how far it is from reality. Many performance experts I have talked to do believe that there should be a person responsible for performance – but you hardly see one anywhere, not to mention a CPO.

Moreover, the current devops trend should amplify the need for a unified view of performance – devops, in essence, is about breaking down silos. Performance should be the first area where silos are broken down – but it looks like performance was somehow left out. Not much has happened with performance silos, and the devops movement seems to be proceeding in parallel.

Development of performance as a discipline definitely has cycles – and it is a pity that each new cycle starts from scratch and forgets what was done before. Yes, traditionally performance engineering concentrated mainly on the back-end, simply because the front-end was trivial and didn’t impact scalability. Yes, now the front-end is far from trivial and deserves the attention it gets. But back-end and scalability issues haven’t disappeared, and they deserve their share of attention too. I like Andy Hawkes’ post When 80/20 Becomes 20/80, which brings back some balance. Every performance specialist should understand Performance vs. Scalability and, even when specializing in a specific area, know about the other performance-related areas and the main issues and methods associated with them.

Application Performance Management (or Monitoring, APM) tools are, in a way, supposed to break these silos down by providing end-to-end visibility into the system. Even leaving aside the discussion of how well these tools live up to their promise of full visibility across all tiers with minimal overhead, most silos still remain. We may still have different environments (development, test, production), and the APM views of each may not be completely compatible due to differences in configuration. APM also assumes that we have people who can work with the information the tools provide across all tiers, and with all teams, to ensure the system’s performance. Whatever issues are identified by (or with the help of) APM, with the exception of trivial cases that may be handled automatically (for example, by auto-scaling), need to be addressed by professionals as part of performance engineering efforts (even if they are not officially named so). Meanwhile, we have practically no across-the-silos performance professionals, nor positions for them in organization charts.

Moreover, with APM we base our decisions on current information. While it is theoretically possible to do some modeling and what-if scenarios based on the current situation (although, to my surprise, it doesn’t look like any APM vendor offers this beyond the trivial or has much interest in it), modeling has its limitations and tends to give the best-case scenario. Only load testing can verify that the system will handle the expected load (assuming, of course, that it is done properly).
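To illustrate what a simple what-if model might look like, and why it tends to give the best case, here is a minimal sketch in Python. It assumes a single-server M/M/1 queue, where mean response time is R = S / (1 - U), with service time S and utilization U = arrival rate × S. The function name, the 50 ms service time, and the load levels are hypothetical and chosen only for illustration; this is not something any particular APM product is known to provide.

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time (seconds) for an M/M/1 queue.

    arrival_rate: requests per second; service_time: seconds per request.
    Returns float('inf') when the server is saturated (utilization >= 1).
    """
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        return float("inf")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    service_time = 0.05  # hypothetical: 50 ms per request, measured at current load
    for rate in (5, 10, 15, 19):  # hypothetical what-if loads, requests/second
        r = mm1_response_time(rate, service_time)
        print(f"{rate:>3} req/s: utilization {rate * service_time:.0%}, "
              f"response time {r * 1000:.0f} ms")
```

The model assumes the service time stays constant as load grows. In a real system, contention for locks, memory, or connections usually makes things worse than such a model predicts, which is why its answer is a best case and why it cannot replace a load test.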

Velocity is probably the most popular performance-related conference at the moment, but it actually limits itself to a few performance areas. It is centered on front-end and web operations. Some scalable architecture patterns are discussed (mainly where back-end processing can be parallelized). Other topics, such as load testing, APM, and capacity planning, are mostly ignored, although they all still look relevant to web operations, especially for e-commerce. And it looks as if traditional corporations, with all their performance problems, don’t exist anymore. Well, I suppose the focus on web operations is intentional – but it isn’t a comprehensive view of performance. My concern is that, given Velocity’s popularity, it may create the impression that the topics it doesn’t cover are not important – and I believe that may harm the industry.

For those who are really interested in performance, I’d suggest attending the Performance and Capacity 2013 conference by CMG, which is probably the only vendor-neutral conference covering most aspects of performance; this year’s program looks great. Unfortunately, it is rather small now and its impact on the performance community is rather limited. That is unfortunate for the performance community – for attendees it is probably an advantage, since you can easily reach out to renowned performance gurus in a cozy environment. I don’t position it as an alternative to Velocity – its front-end coverage is rather light for those who specialize in it – but rather as a perfect complement, and perhaps a better choice for those who do not specialize in front-end.


For the last sixteen years Alex Podelko has worked as a performance engineer and architect for several companies. Currently he is Consulting Member of Technical Staff at Oracle, responsible for performance testing and optimization of Enterprise Performance Management and Business Intelligence (a.k.a. Hyperion) products. Alex periodically talks and writes about performance-related topics, advocating tearing down silo walls between different groups of performance professionals. He blogs at http://alexanderpodelko.com/blog and can be found on Twitter as @apodelko.


  • williamlouth

    The client side is indeed important, but let’s be honest here: the vast majority of problems that are solely client related can be found without testing in production and without testing across every single user of the client. This is the biggest con I have seen over the years in the APM industry.

    When mobile/clients exhibit performance-related issues of such a nature that they warrant tweeting or calling support, what you find is that these are invariably backend related. Clients admittedly can be influential in causing such backend issues, but as a whole and not as individuals, which is how most client-side performance monitoring solutions approach reporting.

    Now the real problem here in the APM field is that we are creating performance silos that run parallel to the software communication channels and that involve humans. To eliminate silos we simply use the communication that is already in place within the software itself to pass back and forth signals that drive (or stimulate) adaption and regulatory routines.

    http://www.jinspired.com/site/introducing-signals-the-next-big-thing-in-application-management

    The industry has created these silos by not engineering a solution but marketing a software product (for each and every one who wants his own “thing”).

    • Alexander Podelko

      William,

      I don’t quite agree with either of your statements.

      While I agree that the back-end causes more serious performance issues and that front-end performance issues are easier to troubleshoot, the front-end is complex enough today to have its own sophisticated performance issues, and they may affect only one or a few groups of users. Not to mention the complex Internet infrastructure involved, beyond the usual scope of the back-end. I see huge value in end-user monitoring.

      While I agree that software self-management and self-adaptation are important and probably will be used more in the future (and definitely should be), I don’t see them coming even close to replacing humans / APM in the foreseeable future. Today’s systems are just too complicated, and involve too diverse a mix of technologies, to be completely self-managed – and I don’t see it getting better with time. We keep getting newer technologies and products, while many companies are still running old mainframe technologies in the back-end.

      And I don’t see performance silos as product-oriented – rather role-oriented. At least new-generation APM tools, as I mentioned, are crossing several traditional performance silos.

      Alex

      • williamlouth

        My point is that many of the issues on the client are not under your control.

        Recently I attended a “mid-summers” night talk at Joyent on system management and performance analysis. One of the talks was given by someone from AppDynamics who tried to demonstrate client-side monitoring. Unfortunately, his (web) client showing the client monitoring data had performance problems, and when he tried to log back into the AppDynamics management system he was kicked straight back out with some irrelevant user credential error message, which I suspect was in fact a red herring. The system had crashed and died. This was hilarious considering all the noise generated by AppDynamics when it announced it had selected its own monitoring solution to monitor its SaaS monitoring solution.

        Getting back to client-side issues. When asked what kind of problems can happen on the mobile client side that can impact performance, he mentioned tunnels. Physical tunnels causing poor reception temporarily. Then he proceeded to show us the slow response time someone was getting in Peru. When asked why, compared to others around the world, I could only openly speculate it was a tunnel.

        Now the question is how much tunnel noise of sorts is present in all this data. Why do I need to monitor every single client and incur the overhead cost and decreased user privacy in doing so? Why can’t many of the issues be found in a small set of mobile client tests I have UNDER MY CONTROL? The sad response I got back was… “we can’t test all these mobile devices out there in a test lab… (even a live one)”. Sure, there are many devices, but I expect 3-5 configurations represent the lion’s share.

        Mobile clients are slow. JavaScript is slow. Data latency is big. If we truly want to address this and not spend hours looking at sugar-laden management dashboards, then a rethink is in order to bring more of the execution back under the control of the providers to some degree, especially in dealing with third-party script extensions.

        From a performance perspective, I still can’t get over the fact that we download code into a browser only for it to go back many times for its data. There has got to be a better way that does not involve deceit, ignorance and waste, which is what I see with many client performance efforts.

        http://www.jinspired.com/site/visions-of-cloud-computing-paas-everywhere