HAR stands for HTTP Archive, a common format for recording HTTP tracing information. The file can contain a variety of information, but for our purposes it holds a record of each object loaded by a browser, along with the timings of each object's load.
The HAR file format is still an evolving standard, and the information contained within is both flexible and extensible. You should expect the HAR file to include a breakdown of timings including:
- how long it takes to fetch the DNS information
- how long each object takes to be requested
- how long it takes to connect to the server
- how long each object takes to transfer from the server to the browser
- whether the object is blocked or not
The data is stored as a JSON document, and extracting meaning from the low-level data is not always easy. With practice, though, a HAR file can quickly help you identify the key performance problems with a web page, which in turn will help you efficiently target your development towards the areas that will deliver the greatest return on your efforts.
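To give a feel for that low-level data, the sketch below parses the `log.entries` array that the HAR format defines and prints the per-phase timings for each object. The tiny inline HAR document is an invented example; a real file would come from your browser, proxy or monitoring service.

```python
import json

# A minimal, hand-written HAR document (invented example data).
har_text = """
{
  "log": {
    "entries": [
      {
        "startedDateTime": "2013-07-01T12:00:00.000Z",
        "time": 90,
        "request": {"url": "http://example.com/index.html"},
        "timings": {"blocked": 5, "dns": 10, "connect": 15,
                    "send": 1, "wait": 40, "receive": 19}
      }
    ]
  }
}
"""

har = json.loads(har_text)

for entry in har["log"]["entries"]:
    timings = entry["timings"]
    print(entry["request"]["url"])
    # Each phase is reported in milliseconds; -1 means "not applicable".
    for phase in ("blocked", "dns", "connect", "send", "wait", "receive"):
        print(f"  {phase:>8}: {timings.get(phase, -1)} ms")
```

The phases printed here correspond directly to the bullet list above: `dns` is the DNS lookup, `connect` the server connection, `receive` the transfer to the browser, and `blocked` the time the request spent queued.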
How to get a HAR file?
As HAR files are still an evolving format, support can be patchy and it may take some work to get your HAR file. High-quality monitoring services will collect the HAR data for each sample they take, and automated testing tools can be tailored to produce HAR files; at Neustar WPM we use Selenium and the BrowserMob Proxy. It is also possible to use Firefox, Firebug and NetExport together to generate a HAR file for a specific URL. That said, any modern browser can generate a waterfall diagram directly, without capturing a HAR file as an interim representation, and this provides most of the information you will need in an instantly available view.
Visualizing A HAR file
As the file is in a JSON format, this is relatively easy to process with software, but it can be difficult for us humans to visualize. Here are three options that will create visualizations of HAR files:
- softwareishard.com – With this site you can simply paste your HAR file into the field and it will generate a report.
- Your monitoring service – Your monitoring service should provide a visualization of your page loads for each sample they take. Compare your load times from different geographic locations to get a broader view of your customers’ experience.
- Developer tools for browsers – I develop my first pass of a web page in Chrome, where the Network tab of the developer tools shows the webpage waterfall. Its easy availability makes it a great tool to work with.
What to look for
Once you have a clear visual layout of your HAR file, you need to know what to look for. The visualization shows, for every object, when it starts loading and how long each step of the object's load process takes.
Next, identify the critical path. By definition, reducing the load time of any item on the critical path will reduce your page load time; reducing the load time of items not on the critical path will not. For now, the easiest way to determine your critical path is to start at the last step to complete in the HAR file, which is always on the critical path, and then work backwards, looking for the item that prevented each item from loading sooner. Doing this systematically can be time consuming, although as a rule of thumb, any time an object is the only item in progress, it is very likely to be on the critical path. At the Velocity Conference, Google demonstrated a forthcoming version of PageSpeed Insights that will show the critical path of a web page, and this looks like a great tool and a potentially big time saver. For each item on the critical path, look at how you can reduce its load time, as this will reduce the page load time.
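The backward walk just described can be sketched in a few lines. This is a heuristic, not a complete algorithm: at each step it simply picks, from the entries that had finished by the time the current entry started, the one that finished last, on the assumption that it was the blocker. The entry list is invented example data in a simplified `(url, start_ms, end_ms)` shape.

```python
# Heuristic critical-path walk over simplified HAR entries.
# Invented example data: (url, start_ms, end_ms) per object.
entries = [
    ("index.html", 0, 100),
    ("style.css", 100, 180),
    ("app.js", 100, 250),
    ("hero.png", 250, 400),
]

def critical_path(entries):
    # Start at the last item to finish -- always on the critical path.
    current = max(entries, key=lambda e: e[2])
    path = [current]
    while True:
        # Find the item that kept `current` from starting sooner: the
        # latest-finishing entry that had completed by `current`'s start.
        blockers = [e for e in entries if e is not current and e[2] <= current[1]]
        if not blockers:
            break
        current = max(blockers, key=lambda e: e[2])
        path.append(current)
    return [url for url, _, _ in reversed(path)]

print(critical_path(entries))  # → ['index.html', 'app.js', 'hero.png']
```

Note that `style.css` is correctly left off the path: it finished well before anything that depended on it was waiting, so shrinking it would not move the page load time.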
After identifying the critical path, it is important to identify the items causing load time issues in the path. Here are some things to look for:
- Longest load time items: The page items with the longest load times offer the largest potential for speeding up your load time. The strategy for speeding up an object will depend on its nature. Load times of static files may be reduced through aggregation, compression and judicious removal of unused elements. And while most developers like to focus on improving the efficiency of the server-side code, for most websites the payback is typically low. If your website is a candidate for improving server-side efficiency, the evidence will be in your HAR file.
- Items not loading in parallel: HTTP/1.1 recommends that browsers load no more than two items in parallel from each domain; modern browsers support more parallelism.
- Tip: Rearranging the order of CSS, image and JS page items can unblock the browser’s choke points.
- Items not being cached: If you reload your web page, your static objects should not be reloaded.
- Tip: If the static items on your page are reloading, your web server is not setting the headers correctly.
- The HAR view above shows a page that is well along the path to optimization, loaded first with a fresh cache and then a second time with the cache primed. In the initial load the page has two CSS files, two JS files, some images and analytics trackers. Note that for the second page load, the page requires only the HTML document, a JSON document with the user’s customized data, and some analytics trackers, which are loaded asynchronously.
- DNS lookups: DNS lookup times can vary greatly among ISPs, geographic locations and DNS managers. If your page loads objects from a large number of domains, this can add significant overhead.
- Tip: If possible, consider moving items to your server or CDNs rather than loading from third parties. Also consider loading these items asynchronously, which may remove them from the critical path.
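A quick way to check for the caching problem above is to scan each response in the HAR for a `Cache-Control` or `Expires` header; static objects with neither will be re-fetched on every reload. The HAR format stores response headers as a list of name/value pairs, and the entries below are an invented example in that shape.

```python
# Flag HAR entries whose responses carry no caching headers.
# Invented example entries in the shape the HAR format defines.
entries = [
    {"request": {"url": "http://example.com/logo.png"},
     "response": {"headers": [
         {"name": "Content-Type", "value": "image/png"}]}},
    {"request": {"url": "http://example.com/site.css"},
     "response": {"headers": [
         {"name": "Cache-Control", "value": "max-age=86400"}]}},
]

def uncached(entries):
    flagged = []
    for entry in entries:
        names = {h["name"].lower() for h in entry["response"]["headers"]}
        if not names & {"cache-control", "expires"}:
            flagged.append(entry["request"]["url"])
    return flagged

print(uncached(entries))  # → ['http://example.com/logo.png']
```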
Armed with your HAR File, visualization and these tips, you’re now ready to optimize your website speed.
Other uses for a HAR file
YSlow can be run from the command line and given a HAR file. You can include this as part of your automated regression testing to ensure your most recent changes are not degrading your website’s performance.
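If you would rather not depend on an external tool, the same idea works with a few lines of your own: load the HAR, compute a load-time figure, and fail the test when it regresses past a budget. The budget and the inline HAR data here are invented for illustration.

```python
# Invented example HAR; a real regression test would load the file your
# automated run produced, e.g. with json.load(open("latest.har")).
har = {"log": {"entries": [
    {"request": {"url": "http://example.com/"}, "time": 120},
    {"request": {"url": "http://example.com/app.js"}, "time": 340},
]}}

BUDGET_MS = 2000  # invented performance budget

total = sum(e["time"] for e in har["log"]["entries"])
slowest = max(har["log"]["entries"], key=lambda e: e["time"])

# Fail the build when the budget is exceeded.
assert total <= BUDGET_MS, f"page regressed: {total} ms > {BUDGET_MS} ms"
print(f"total {total} ms; slowest object: {slowest['request']['url']}")
```

Summing per-object times is a crude proxy, since objects load in parallel; a stricter check could use the page-level `onLoad` value from the HAR's `pageTimings` section instead.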
Alan Dyke has a Ph.D. from the University of Wales, and over 20 years experience working as a software engineer. He has worked in academia, in the defense industry, and in manufacturing. For most of his career, Dr Dyke has been developing for the web. He is currently a senior engineer at Neustar in the Web Performance Management Group.