I am often asked by folks “What should we be monitoring?” This is a billion dollar question and it seems like everyone has their own answers. I have seen different approaches, some better than others.
In this blog post I want to share what I think is a “good” methodology for monitoring. To illustrate it I will use a simple webpage, but you can easily apply it to any web-based application.
Before we talk about what to monitor, let's quickly cover why you should monitor in the first place:
- Availability – Is your site or application up and running?
- Speed – Is the site or application operating at the desired speed? (This is not about optimization; it is about whether it runs as fast as it is supposed to.)
- Reliability or Integrity – Great; we accessed it, and it was fast. Now, is it giving me what is intended, and working as it is supposed to?
In a webpage there are multiple components that determine availability, speed and integrity. If we were to dissect a webpage we would see the following:
- Primary URL of the page – the URL the user has to type/click to access the webpage.
- HTML response from the Server – this is essentially what the browser will render.
- External Objects – images, ads, beacons and widgets. All different web technologies, but each could impact your webpage.
All of the above rely on HTTP requests to one or more hosts, and the browser executing the HTTP responses properly. If we were to analyze the loading of the webpage we would see all these requests being issued, answered, and executed – and some of them will have a major impact while others might be very limited. So how do we go about monitoring this rich ecosystem?
It is obviously important to monitor your webpage from an actual browser to get a clear picture of the availability, speed, and integrity of a webpage. This will help you answer questions like:
- How long did it take to download the page?
- How long does it take for the webpage to start rendering? This metric is important as it affects the end user's perception of how fast your page loads.
- How long did the document take to complete? Another very important metric, as it maps to what most users think of as "page finished loading", and most importantly, page interactivity could depend on this event firing.
However, while monitoring the webpage on a browser is important, it will not be sufficient. The main reason is that the complexity of the webpage, with its different hosts or requests, brings complexity to troubleshooting. If you rely on third-party vendors and partners then you will want to monitor them independently of each other – to avoid finger-pointing between them.
Therefore we recommend that you not only monitor the webpage itself in a browser, but also individual requests and hosts that have an impact on your webpage. Thus we suggest the following three-step process to identify what to monitor:
- Identify which hosts and requests have an impact on availability, speed, and integrity. Obviously the primary URL has the biggest impact on your webpage; if it is slow by one second – your webpage will be slow by one second. If the URL delivers a 500 error, the webpage is unavailable, and so on. Other hosts or requests that have an impact are:
- Any requests that deliver key content or functionality to the webpage. For example if your webpage is delivering Stock Market prices via Ajax and JSON calls, any failures of those requests would result in not displaying the data.
- Identify any sources of impact on these requests. Sometimes these additional hosts or requests rely on sources not visible to end users, or the browser. For example if you are relying on a CDN, you might have configured the CDN to rely on an Origin Server – that would be a server you own and maintain that is the original source of the content the CDN caches and serves from the edge. A misconfiguration of the Origin Server can have a major impact on CDN performance! Also any slowness or availability issue with the origin server could result in performance and availability issues affecting the CDN requests.
- Monitor each identified host and request, and most importantly the webpage itself. The key here is to avoid duplicate monitoring. So if you have asset1.site.com and asset2.site.com both pointing to the same CDN, with their DNS configured on the same DNS provider, you only need to monitor one of them.
- If you have multiple datacenters all serving www.xyz.com, make sure you monitor each one of them too.
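To make the three steps a bit more concrete, here is a minimal Python sketch of the bookkeeping involved. It assumes you already have the list of request URLs a page makes (from a browser waterfall, for example) and a hand-maintained map of which CDN and DNS provider each host uses. The function names, the `quotes.example.net` host, and the `cdn-a`/`dns-a` labels are all hypothetical and only there for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_requests_by_host(request_urls):
    """Step 1: group the requests a page makes by hostname."""
    grouped = defaultdict(list)
    for url in request_urls:
        host = urlsplit(url).hostname
        if host:
            grouped[host].append(url)
    return dict(grouped)


def dedupe_targets(hosts, infrastructure):
    """Step 3: keep one host per distinct (CDN, DNS provider) pair.

    `infrastructure` maps host -> (cdn, dns_provider). Hosts missing from
    the map are always kept, since we cannot show they are duplicates.
    """
    seen = set()
    targets = []
    for host in sorted(hosts):
        key = infrastructure.get(host)
        if key is None:
            targets.append(host)
        elif key not in seen:
            seen.add(key)
            targets.append(host)
    return targets


# Hypothetical requests observed while loading a page.
requests_seen = [
    "https://www.site.com/",
    "https://asset1.site.com/css/main.css",
    "https://asset2.site.com/img/logo.png",
    "https://asset1.site.com/js/app.js",
    "https://quotes.example.net/api/prices.json",
]

# Hypothetical inventory (step 2): what sits behind each host.
infrastructure = {
    "asset1.site.com": ("cdn-a", "dns-a"),
    "asset2.site.com": ("cdn-a", "dns-a"),
}

grouped = group_requests_by_host(requests_seen)
targets = dedupe_targets(grouped, infrastructure)
print(targets)
```

Here asset2.site.com is dropped because it shares both the CDN and the DNS provider with asset1.site.com, while the hosts with no inventory entry stay on the list. Multiple datacenters behind one hostname would still need to be added separately, since they do not show up as distinct hosts in the request list.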
So far we have described why to monitor and where to monitor. Next we describe what to monitor for each of the requests.
Obviously you have to monitor the HTTP requests, and the webpage or webpage-like content (widget or Ad) performance on the browser.
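As a rough illustration of what a single HTTP check could look at, the sketch below judges one response on availability, speed, and integrity using only the Python standard library. The names `CheckResult`, `evaluate`, and `check_url`, the two-second threshold, and the "expected text" integrity test are my assumptions for the example, not a description of how any particular monitoring product works.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class CheckResult:
    available: bool    # did we get a successful HTTP response?
    fast_enough: bool  # did it arrive within the allowed time?
    intact: bool       # does the body contain what we expect?


def evaluate(status, elapsed_seconds, body, max_seconds, expected_text):
    """Judge one HTTP response on availability, speed, and integrity."""
    available = status is not None and 200 <= status < 400
    return CheckResult(
        available=available,
        fast_enough=available and elapsed_seconds <= max_seconds,
        intact=available and expected_text in body,
    )


def check_url(url, max_seconds, expected_text, timeout=10.0):
    """Fetch `url` once and evaluate it. Performs a real network request."""
    start = time.monotonic()
    status, body = None, ""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with a 4xx/5xx
    except OSError:
        pass  # DNS failure, refused connection, timeout: status stays None
    elapsed = time.monotonic() - start
    return evaluate(status, elapsed, body, max_seconds, expected_text)


# Offline examples of the evaluation step (no network involved).
print(evaluate(200, 0.8, "<h1>Stock prices</h1>", 2.0, "Stock prices"))
print(evaluate(500, 0.1, "Internal Server Error", 2.0, "Stock prices"))
```

Keeping `evaluate` separate from the fetch makes the pass/fail rules easy to test without touching the network. A plain HTTP fetch like this says nothing about rendering or document complete, which is why it complements browser-based monitoring rather than replacing it.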
Additionally, we recommend you monitor your DNS servers or DNS providers. DNS is often forgotten, but it is the one thing that can make a huge difference to users in different geographic locations, or to the availability of the requests for your webpage. It is best to monitor the DNS servers directly (this is why we have a separate DNS monitoring solution) rather than relying on the browser/HTTP monitors. The main reason is that DNS can be resolved by any one of multiple servers. If you have two DNS servers resolving your domain, one responding in 100ms and the other in 500ms, and queries are split evenly between them, there is a 1 in 2 chance that a given lookup takes 500ms. Because of DNS TTLs and caching, you might have an even harder time discovering DNS performance problems.
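To show what "monitoring the DNS servers directly" can mean in practice, here is a small standard-library Python sketch that sends a hand-built A-record query to one specific server over UDP and times the answer. It is deliberately minimal and makes several assumptions: the function names are mine, it only handles UDP on port 53, and it checks the reply's id and response code without parsing the answer records.

```python
import socket
import struct
import time


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of `hostname`."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Each label is encoded as a length byte followed by its characters.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    )
    # Zero byte ends the name, then QTYPE=1 (A) and QCLASS=1 (IN).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def time_dns_server(server_ip, hostname, timeout=2.0):
    """Return the seconds `server_ip` took to answer, or None on failure."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            return None  # timeout or network error
        elapsed = time.monotonic() - start
    # Accept the reply only if it echoes our id and reports no error (RCODE 0).
    if len(reply) < 4 or reply[:2] != query[:2] or reply[3] & 0x0F != 0:
        return None
    return elapsed


# Example usage (performs real network requests; replace the placeholder
# addresses with the IPs of your own authoritative DNS servers):
# for ip in ["192.0.2.1", "192.0.2.2"]:
#     print(ip, time_dns_server(ip, "www.xyz.com"))
```

Timing each server separately is the point: with the 100ms and 500ms servers from the example above, an evenly split average would come out around 300ms and hide the fact that one server is five times slower than the other.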
Monitor not only the webpage but also the key hosts and requests that impact the performance of your webpage. Don’t be limited to HTTP monitoring; expand to DNS monitoring for the key domains to ensure speed and availability. Don’t monitor just to keep baselines; look for places where you can act and save some milliseconds or bytes. Reducing delay by 100 milliseconds every release can be very empowering: you are making end users happy and making your company money.
I used the word monitor a lot. Monitoring allows you to create baselines and catch issues before your customers do. I also recommend you put in place ways to measure and watch your site via Google Analytics, various RUM solutions, or by just looking at your web server logs. The combination of all these solutions will allow you to keep an eye on your uptime, performance, and thus the bottom line!
Mehdi – Catchpoint
Mehdi (@mdaoudi) is the founder and CEO of Catchpoint. The Catchpoint vision is to provide the best IT analytics & intelligence to help companies improve their end user performance, and keep an eye on their infrastructure and vendors.