A tool written to compare two web pages.


This is a set of scripts and a tool to compare two web pages. Essentially, it takes screenshots of web pages and compares them, reporting the normalized root-mean-square (NRMS) error of the pixel differences between the two images. It is intended for QA of content management systems, where code changes should not radically alter how pages render. In that setting, the tool can verify that a code change did not introduce substantial visual differences.
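As a rough illustration of the metric, here is a minimal Python sketch of an NRMS computation. It assumes both screenshots have already been decoded into flat sequences of 8-bit channel values of equal length, and it normalizes by the maximum channel value (255). The function name `nrms` and this particular normalization are assumptions for illustration; the tool's own normalization is not documented here.

```python
import math
from typing import Sequence

MAX_CHANNEL = 255  # assumption: 8-bit channels


def nrms(a: Sequence[int], b: Sequence[int], max_value: int = MAX_CHANNEL) -> float:
    """Normalized root-mean-square difference between two equally sized images.

    Each image is a flat sequence of channel values (e.g. R, G, B, R, G, B, ...).
    The RMS of the per-channel differences is divided by max_value, so the
    result lies in [0, 1]: 0 means identical, 1 means maximally different.
    """
    if len(a) != len(b):
        raise ValueError("images must have the same dimensions")
    if not a:
        raise ValueError("images must not be empty")
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(squared / len(a)) / max_value
```

For example, two identical images score 0.0, while an all-black image against an all-white one scores 1.0.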

The complexity lies in how the tool does this:

  1. It is designed to work on a large number of web pages (hundreds of thousands), so it is highly concurrent. At one point, I had some metrics on how quickly it churned through 500k web pages, but these are lost in the mists of time. It was pretty fast.

  2. It uses PhantomJS to render the pages.

  3. It provides a number of metrics and statistics, along with features such as threshold limits, that make it easier to find the needles (the differences). Thresholds let users ignore small NRMS values, which may be caused by, e.g., changed dates in page footers.
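To show how concurrency and threshold limits fit together, here is a minimal Python sketch that scores many before/after screenshot pairs in parallel and keeps only the pages whose score exceeds a threshold. The function `flag_changed_pages`, its parameters, and the in-memory `pairs` mapping are hypothetical. The comparison function is passed in (for instance an NRMS function), and the real tool's pipeline, including the PhantomJS rendering step, is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Image = Sequence[int]  # flat sequence of channel values
Compare = Callable[[Image, Image], float]


def flag_changed_pages(
    pairs: Dict[str, Tuple[Image, Image]],
    compare: Compare,
    threshold: float,
    workers: int = 8,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every (before, after) pair concurrently.

    Returns (url, score) for pages whose score is strictly above threshold,
    sorted with the largest difference first.
    """
    urls = list(pairs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so scores line up with urls.
        scores = list(pool.map(lambda url: compare(*pairs[url]), urls))
    flagged = [(url, score) for url, score in zip(urls, scores) if score > threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged
```

Sorting by score puts the largest regressions at the top of the report. A thread pool suits work that releases the GIL, such as waiting on rendering subprocesses; for pure-Python, CPU-bound comparisons, a `ProcessPoolExecutor` with a module-level comparison function would likely scale better.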

Building and running

Sorry. You’ll have to figure this out. It was a tool developed for internal use, and by the time I resurrected the repository, I’d been out of the project for 3 years. Good luck!