What is a HAR File and what do I use it for?
Posted in: Monitoring   -   August 29, 2012

HAR stands for HTTP Archive, a common format for recording HTTP tracing information. The file contains a variety of information, but for our purposes the important part is a record of each object loaded by the browser, along with that object’s timings.

The HAR file format is still an evolving standard, and the information contained within is both flexible and extensible. You should expect the HAR file to include a breakdown of timings including:

  • how long it takes to fetch the DNS information
  • how long each object takes to be requested
  • how long it takes to connect to the server
  • how long each object takes to transfer from the server to the browser
  • how long each object spends blocked (queued, waiting for a connection)

The data is stored as a JSON document, and extracting meaning from the low-level data is not always easy. With practice, though, a HAR file can quickly help you identify the key performance problems with a web page, which in turn helps you target your development at the areas that will deliver the greatest return on your effort.
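To make the timing breakdown concrete, here is a minimal sketch that reads the per-object timings out of a HAR document. It assumes the HAR 1.2 layout (`log.entries[]`, each with a `timings` object in milliseconds, where `-1` marks a phase that did not apply). The sample data and the function name `timing_breakdown` are made up for illustration.

```python
import json

# Tiny hand-written HAR fragment (illustrative values, not real measurements).
SAMPLE_HAR = json.loads("""
{"log": {"entries": [
  {"request": {"url": "http://example.com/"},
   "timings": {"blocked": 0, "dns": 40, "connect": 30,
               "send": 1, "wait": 89, "receive": 20}},
  {"request": {"url": "http://example.com/app.js"},
   "timings": {"blocked": 5, "dns": -1, "connect": -1,
               "send": 1, "wait": 39, "receive": 20}}
]}}
""")

PHASES = ("blocked", "dns", "connect", "send", "wait", "receive")


def timing_breakdown(har):
    """Return one dict per entry with its URL, per-phase times and total (ms)."""
    rows = []
    for entry in har["log"]["entries"]:
        timings = entry.get("timings", {})
        phases = {}
        for phase in PHASES:
            value = timings.get(phase)
            # Treat -1 (not applicable) and missing/null values as zero time.
            phases[phase] = value if value is not None and value > 0 else 0
        row = {"url": entry["request"]["url"], "total_ms": sum(phases.values())}
        row.update(phases)
        rows.append(row)
    return rows


for row in timing_breakdown(SAMPLE_HAR):
    detail = ", ".join(f"{p}={row[p]}" for p in PHASES)
    print(f"{row['url']}: {row['total_ms']} ms ({detail})")
```

For a real capture you would replace `SAMPLE_HAR` with `json.load(open(path))` on your own file.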

 

How to get a HAR file?

As HAR is still an evolving format, support can be patchy and it may take some work to get your HAR file. High-quality monitoring services will collect the HAR data for each sample they take. Automated testing tools can be tailored to produce HAR files; at Neustar WPM we use Selenium and the Browsermob Proxy. It is also possible to use Firefox, Firebug and NetExport together to generate a HAR file for a specific URL. That said, a modern browser can generate a waterfall diagram directly, without capturing a HAR file as an interim representation, and this provides most of the information you will need in an instantly available view.

 

Visualizing A HAR file

Because the file is JSON, it is relatively easy to process with software, but it can be difficult for us humans to visualize. Here are three options that will create visualizations of HAR files:

  • softwareishard.com – With this site you can simply paste your HAR file into the field and it will generate a report.
  • Your monitoring service – Your monitoring service should provide a visualization of your page loads for each sample they take. Compare your load times from different geographic locations to get a broader view of your customers’ experience.
  • Developer tools for browsers – I developed my first pass of web pages in Chrome. The Network Tab of the developer tools shows the webpage waterfall. Its easy availability makes it a great tool to work with.

What to look for

Once you have a clear visual layout of your HAR file, you need to know what to look for. The visualization shows, for every object, when it starts loading and how long each step of the load process takes.

The first and most obvious thing you will notice is how many items are loading. In the page shown above, the web page is loading a large number of JavaScript files, CSS files, mustache templates and images. Best-practice guidelines recommend aggregating each of these groups of objects into a single file to reduce the number of round trips to the server, and in this case that would significantly reduce the page’s 9.3-second load time. Many tools can aggregate your JavaScript and CSS files, and a relatively small change to your deployment scripts can have a dramatic effect. Image files can be combined using a technique known as spriting, which is a more involved process and is not easily automated for the general case. Analysis of your HAR file will help you decide whether the payback is worth the effort for your site.
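One quick way to see how many objects of each kind a page loads is to count the HAR entries by MIME type. This is a minimal sketch, assuming the HAR 1.2 layout where each entry carries `response.content.mimeType`. The sample data and the function name `requests_by_type` are made up for illustration.

```python
from collections import Counter

# Hand-written sample entries (illustrative only, not taken from a real page).
SAMPLE_HAR = {"log": {"entries": [
    {"request": {"url": "http://example.com/"},
     "response": {"content": {"mimeType": "text/html; charset=utf-8"}}},
    {"request": {"url": "http://example.com/a.js"},
     "response": {"content": {"mimeType": "application/javascript"}}},
    {"request": {"url": "http://example.com/b.js"},
     "response": {"content": {"mimeType": "application/javascript"}}},
    {"request": {"url": "http://example.com/c.js"},
     "response": {"content": {"mimeType": "application/javascript"}}},
    {"request": {"url": "http://example.com/site.css"},
     "response": {"content": {"mimeType": "text/css"}}},
    {"request": {"url": "http://example.com/print.css"},
     "response": {"content": {"mimeType": "text/css"}}},
    {"request": {"url": "http://example.com/logo.png"},
     "response": {"content": {"mimeType": "image/png"}}},
]}}


def requests_by_type(har):
    """Count entries per MIME type, ignoring parameters such as charset."""
    counts = Counter()
    for entry in har["log"]["entries"]:
        mime = entry["response"]["content"].get("mimeType") or "unknown"
        counts[mime.split(";")[0].strip().lower()] += 1
    return counts


for mime, count in requests_by_type(SAMPLE_HAR).most_common():
    print(f"{count:3d}  {mime}")
```

A type with a high count is a candidate for aggregation: each extra file is a request that could potentially be folded into one.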

Next, identify the critical path. By definition, reducing the load time of any item on the critical path will reduce your page load time, while reducing the load time of items off the critical path will not. At the Velocity Conference, Google demonstrated a forthcoming version of PageSpeed Insights that will show the critical path of a web site; this looks like it will be a great tool and a potentially big time saver. In the interim, the easiest way to determine your critical path is to start at the last item to complete in the HAR file, which is always on the critical path, and then work backwards, looking for the item that prevented it from loading sooner. Doing this systematically can be time consuming, although as a rule of thumb, any time an object is the only item in progress, it is very likely to be on the critical path. For each item on the critical path, look for ways to reduce its load time.
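The work-backwards procedure can be roughly approximated in code. The sketch below is only a heuristic, not a true dependency analysis: it starts from the last entry to finish and repeatedly steps back to the entry that finished latest before the current one started. The tolerance `slack_ms`, the function name `guess_critical_path` and the sample entries are assumptions made for illustration; the field names `startedDateTime` and `time` follow HAR 1.2.

```python
from datetime import datetime

# Hand-written sample entries (illustrative timings only).
SAMPLE_ENTRIES = [
    {"startedDateTime": "2012-08-29T10:00:00.000Z", "time": 200,
     "request": {"url": "http://example.com/"}},
    {"startedDateTime": "2012-08-29T10:00:00.210Z", "time": 100,
     "request": {"url": "http://example.com/app.css"}},
    {"startedDateTime": "2012-08-29T10:00:00.210Z", "time": 300,
     "request": {"url": "http://example.com/app.js"}},
    {"startedDateTime": "2012-08-29T10:00:00.210Z", "time": 120,
     "request": {"url": "http://example.com/logo.png"}},
    {"startedDateTime": "2012-08-29T10:00:00.520Z", "time": 150,
     "request": {"url": "http://example.com/data.json"}},
]


def _span(entry):
    """Return (start_ms, end_ms) for an entry as absolute epoch milliseconds."""
    started = datetime.strptime(entry["startedDateTime"], "%Y-%m-%dT%H:%M:%S.%f%z")
    start_ms = started.timestamp() * 1000.0
    return start_ms, start_ms + entry["time"]


def guess_critical_path(entries, slack_ms=50.0):
    """Heuristic: walk back from the last finisher to its likely blockers."""
    spans = [(e["request"]["url"],) + _span(e) for e in entries]
    if not spans:
        return []
    current = max(spans, key=lambda s: s[2])  # last entry to complete
    path = [current[0]]
    while True:
        # Entries that started earlier and finished before (or within
        # slack_ms of) the moment the current entry started.
        earlier = [s for s in spans
                   if s[1] < current[1] and s[2] <= current[1] + slack_ms]
        if not earlier:
            break
        current = max(earlier, key=lambda s: s[2])
        path.append(current[0])
    path.reverse()
    return path


for url in guess_critical_path(SAMPLE_ENTRIES):
    print(url)
```

Because it looks only at timings, this guess can be wrong when two objects happen to finish close together; treat the result as a starting point for inspecting the waterfall by hand.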

After identifying the critical path, it is important to identify the items causing load time issues in the path. Here are some things to look for:

  • Longest load time items: The page items with the longest load time offer the largest potential for speeding up your load time. The strategy for speeding up an object will depend on its nature. Load time of static files may be reduced through aggregation, compression and judicious removal of unused elements. And while most developers like to focus on improving the efficiency of the server side code, typically the payback for most websites is low. If your website is a candidate for improving server side efficiency, the evidence will be in your HAR file.
  • Items not loading in parallel: HTTP/1.1 (RFC 2616) recommended that a client keep no more than two simultaneous connections to any one server, so older browsers load only two items at a time from each domain. Modern browsers support more parallelism.
    • Tip: Rearranging the order of items in your webpage can increase the parallelism and reduce page load time. Also, where possible, load your JavaScript asynchronously. We use head.js to load all our JavaScript, but there are many other excellent equivalent JavaScript libraries.
  • Periods when nothing is loading: Browsers will block while waiting for CSS to load or while executing JavaScript. Keep in mind that even items such as a print style sheet, which your browser doesn’t need to render the page, will block your page from displaying.
    • Tip: Rearranging the order of CSS, images and JS page items can unblock the browser’s choke points.
  • Items not being cached: If you reload your web page, your static objects should not be reloaded.
    • Tip: If the static items on your page are reloading, your web server is not setting the caching headers correctly.
    • The HAR view above shows a page that is well down the path to optimization being loaded first with a fresh cache and then a second time with the cache primed. In the initial load the page has two CSS files, two JS files, some images and analytics trackers. Note that for the second page load, the page requires only the HTML document, a JSON document with the user’s customized data, and some analytics trackers, which are loaded asynchronously.
  • DNS Lookups: DNS lookup times can vary greatly among ISPs, geographic locations and DNS managers. If your page is loading objects from a large number of domains, the lookups can add significant overhead.
    • Tip: If possible, consider moving items to your server or CDNs rather than loading from third parties. Also consider loading these items asynchronously, which may remove them from the critical path.
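Several of the checks above can be scripted against the HAR entries. The following is a minimal sketch, assuming HAR 1.2 field names (`time`, `request.url`, `response.headers`). The helper names and sample entries are made up for illustration, and the caching check is deliberately crude: it only tests whether a `Cache-Control` or `Expires` header is present, not whether its value actually permits caching.

```python
from urllib.parse import urlsplit

# Hand-written sample entries (illustrative only).
SAMPLE_ENTRIES = [
    {"time": 420, "request": {"url": "http://example.com/"},
     "response": {"headers": [{"name": "Cache-Control", "value": "no-cache"}]}},
    {"time": 950, "request": {"url": "http://cdn.example.net/app.js"},
     "response": {"headers": [
         {"name": "Expires", "value": "Thu, 29 Aug 2013 10:00:00 GMT"}]}},
    {"time": 130, "request": {"url": "http://example.com/logo.png"},
     "response": {"headers": [{"name": "Content-Type", "value": "image/png"}]}},
]


def slowest(entries, n=3):
    """Return the n entries with the longest total load time."""
    return sorted(entries, key=lambda e: e["time"], reverse=True)[:n]


def distinct_hosts(entries):
    """Return the set of hostnames the page loads from (one DNS lookup each)."""
    return {urlsplit(e["request"]["url"]).hostname for e in entries}


def lacks_cache_headers(entry):
    """True if the response has neither a Cache-Control nor an Expires header."""
    names = {h["name"].lower() for h in entry["response"].get("headers", [])}
    return not ({"cache-control", "expires"} & names)


print("Slowest:", [e["request"]["url"] for e in slowest(SAMPLE_ENTRIES, 2)])
print("Hosts:", sorted(distinct_hosts(SAMPLE_ENTRIES)))
print("No cache headers:",
      [e["request"]["url"] for e in SAMPLE_ENTRIES if lacks_cache_headers(e)])
```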

Armed with your HAR file, a visualization and these tips, you’re now ready to optimize your website speed.

 

Other uses for a HAR file

YSlow can be run from the command line and given a HAR file. You can include this as part of your automated regression testing to ensure your most recent changes are not degrading your website’s behavior.


Alan Dyke has a Ph.D. from the University of Wales and over 20 years’ experience working as a software engineer. He has worked in academia, in the defense industry, and in manufacturing. For most of his career, Dr Dyke has been developing for the web. He is currently a senior engineer at Neustar in the Web Performance Management Group.


  • Kareem

    Nice blog post Lane, very insightful.

    • http://twitter.com/LaneJoplin lane joplin

      Thanks Kareem!

  • http://www.igorware.com/ Igor

    Umm, you can also save/copy HAR data from Chrome Dev Tools, just right click on waterfall.

  • Abhishek

    Really helpful post..:)..Thanks

  • http://www.justaprogrammer.net Justin Dearing

    Its good to see there is a multi-platform standard for this. I use fiddler a lot that has its own standard called SAZ, but google just told me fiddler supports HAR format

  • http://www.facebook.com/muhali786 Muhammad Ali

    Excellent source….would really appreciate if you keep posting stuff on Pagespeed.
    Thanks