The Road to HTML5: Automating visual testing with Sikuli

By Han Lee on Jul 12, 2012 in Let's Talk Tech

Rebuilding the Gliffy editor in HTML5 has forced us to take another look at how we do QA. There’s more variation between browsers in HTML5 than there is with Flash, so we need to run every test in more than a dozen browser/OS combinations. For us, that means thousands of test cases per release.

With so many tests to run, GUI test automation is an appealing option, but HTML5’s canvas makes it difficult. In a lot of ways, it’s similar to testing a Flash application: We care about the graphical results of doing a bunch of (mostly) drag-and-drop operations. Tools like Selenium are good at scripting text-based interactions, but not great at scripting “drag a rectangle to the middle of the screen and make it bigger.” Then there’s the problem of checking to see if we have a grey rectangle 100×250 with a 2px red border and a lower-right drop shadow. On the other hand, image-based testing is notoriously sensitive to differences of just a few pixels, which can make automated tests very prone to breaking.

Sikuli is an interesting tool that uses image recognition to drive a GUI. It’s natively Jython, but there’s also a Java API. There’s a great demo on the Sikuli blog showing a short script that plays Angry Birds.

Sikuli is great because it lets us adjust the sensitivity of its image recognition using an IDE. We can ask it to look for either an exact image match or a less exact match. The IDE allows us to easily develop a test, then replay that test on a number of different platforms, tuning the sensitivity as we go. It does this using a comparison screen and a simple slider (see screenshot). Once the test has been developed, we check it in. So far, we’ve got a small smoke test that has survived being an early user of the HTML5 app. It’s needed some tweaking as the HTML5 editor has evolved, but it has been fairly resilient to small styling changes.
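To make the exact-versus-less-exact idea concrete, here is a toy sketch in plain Python. It is not Sikuli's actual matching algorithm and does not use the Sikuli API; the "screen" and "template" are small grids of numbers standing in for pixels, and `similarity`, `find`, and `min_similarity` are names made up for this illustration. The threshold plays roughly the role of the slider: at 1.0 only a pixel-perfect match counts, while a lower value tolerates a few differing pixels.

```python
def similarity(region, template):
    """Fraction of pixels in `region` equal to the matching pixel in `template`."""
    total = 0
    same = 0
    for region_row, template_row in zip(region, template):
        for a, b in zip(region_row, template_row):
            total += 1
            if a == b:
                same += 1
    return same / total


def find(screen, template, min_similarity=1.0):
    """Slide `template` over `screen` and return (x, y, score) for the best
    position scoring at least `min_similarity`, or None if nothing qualifies."""
    t_height, t_width = len(template), len(template[0])
    best = None
    for y in range(len(screen) - t_height + 1):
        for x in range(len(screen[0]) - t_width + 1):
            region = [row[x:x + t_width] for row in screen[y:y + t_height]]
            score = similarity(region, template)
            if score >= min_similarity and (best is None or score > best[2]):
                best = (x, y, score)
    return best


# A 4x4 "screen" containing a 2x2 shape with one off-colour pixel (the 2).
screen = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 2, 0],
    [0, 0, 0, 0],
]
template = [
    [1, 1],
    [1, 1],
]

exact = find(screen, template, min_similarity=1.0)   # pixel-perfect only
fuzzy = find(screen, template, min_similarity=0.7)   # tolerate small differences
print(exact)
print(fuzzy)
```

With the strict threshold the one off-colour pixel makes the search fail, while the looser threshold still locates the shape. That is the trade-off we tune per platform: too strict and small rendering differences break the test, too loose and the test may match the wrong thing.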

Our next problem will be how to scale. We already use Atlassian’s excellent Bamboo build server, and we’re setting it up with Elastic Bamboo clients to fire up AWS test execution machines on demand. Since Sikuli works at a keyboard-and-mouse level, we can use any remote-control technology (VNC, RDP, etc.). We’re still trying to find the best solution for supporting a multitude of browsers and operating systems, but a service like BrowserStack looks promising. We’re aiming to run tens of parallel tests from a single AWS Sikuli machine, and also scale by increasing the number of AWS Sikuli machines. We’ll have a much better idea of the long-term maintainability of a Sikuli-based solution after we’ve built out more tests and have them running every time our Bamboo builds execute.
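As a rough sketch of what fanning one test out over many browser/OS combinations could look like, here is a minimal Python example using a thread pool. Everything in it is an assumption made for illustration: the `COMBOS` list, the `run_smoke_test` placeholder (a real version would connect to a remote desktop and drive the Sikuli script there), and the worker count. It does not reflect how Bamboo, AWS, or BrowserStack are actually wired together.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical browser/OS combinations; a real list would be much longer.
COMBOS = [
    ("chrome", "windows"),
    ("firefox", "windows"),
    ("chrome", "mac"),
    ("safari", "mac"),
    ("firefox", "linux"),
]


def run_smoke_test(browser, platform):
    """Placeholder for one remote test run.

    A real implementation would open a remote session (VNC, RDP, etc.),
    run the image-based script against it, and report the outcome.
    """
    return {"browser": browser, "platform": platform, "passed": True}


def run_matrix(combos, max_workers=10):
    """Run the smoke test for every combination in parallel.

    Results come back in the same order as `combos`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda combo: run_smoke_test(*combo), combos))


results = run_matrix(COMBOS)
failures = [r for r in results if not r["passed"]]
print(f"{len(results)} runs, {len(failures)} failures")
```

Threads fit here because each run would spend most of its time waiting on a remote machine rather than using local CPU. Scaling beyond one machine would mean splitting the combination list across several such runners.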

Early prototypes have been encouraging. This type of software test automation will help catch bugs earlier by giving developers feedback across a large set of browsers right after they check code in. And more efficient bug-catching will let us release faster, fix bugs faster, and develop great new features faster.