8 June 2019 - 9 minute read

So, one thing I've been working with at my job recently is something called "UI Automation". Basically, it's a means of searching a GUI for buttons and entry boxes and interacting with them programmatically. To really explain how hilariously stupid I find this even as a concept, I'm going to talk through what the problem is, what the solution to that problem is, and then how UI Automation solves the wrong problem with a "solution" that's a whole new problem in itself.


So, say you have some system you've built. What the system does is irrelevant; what you want to do is test it. Say your company has a testing team that have planned out some tests, and they carry out these tests each morning to make sure the system is running as it should be - and to raise the alarm if something is broken. That's human labour being spent on a repetitive daily task, so - being programmers (and people who hopefully care about our testing team's sanity) - we want to automate it.

Ideally, we want some means of going through the same process as the testers, but programmatically, so we can reduce this massive workload to almost nothing by having a script running in the background. The added advantage is that rather than running once each morning and being susceptible to human error, this thing can run constantly, testing as often as every few minutes. It can carry on outside working hours too, so alarms about any issues are raised far sooner and the support team can get things fixed much faster. Overall, it's an obvious problem to solve.


What's the solution to this problem, then? Well, there are a number of ways to do it. If you're a good developer, your application will have some kind of CLI, meaning you can write these tests in an afternoon as simple shell scripts: carry out basic tasks with the CLI and parse the output, and if anything's out of order, raise an alarm. If, for instance, your application runs as a web server, then you, being a good developer, will have built in health checks, meaning you can use a similarly small shell script to make requests to the server (with curl, for instance) and parse the plain-text output just as before. All still very simple. If for whatever reason your application is so tediously large that a CLI isn't feasible, you should at least have built in a very simple scripting API on top of a common embeddable scripting language such as Lua. With that, you can write your test as a small Lua script and pass the path to the script as an argument to your application for it to execute. The test is carried out and the results are printed in plain text to - again - parse with ease.
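To show how small such a test can be, here's a minimal sketch in Python (standing in for the shell-and-curl version) that fetches a health check page and raises an alarm if any component doesn't report OK. The URL, the `component: status` line format, and the `OK` keyword are assumptions for illustration, not any particular application's real interface.

```python
import urllib.request

# Hypothetical health check URL; substitute your own application's endpoint.
HEALTH_URL = "http://localhost:8080/health"


def parse_health(text):
    """Parse assumed 'component: status' lines and return the failing components."""
    failures = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, sep, status = line.partition(":")
        if not sep:
            # A line we can't parse counts as a failure rather than being ignored.
            failures.append(line)
        elif status.strip() != "OK":
            failures.append(name.strip())
    return failures


def fetch_health(url, timeout=10.0):
    """Fetch the health check page as text (the equivalent of the curl step)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def main():
    """Run one check and return a process exit status (0 = healthy, 1 = alarm)."""
    try:
        failures = parse_health(fetch_health(HEALTH_URL))
    except OSError as exc:
        # Connection refused, timeouts and HTTP errors are all OSError subclasses.
        print(f"ALARM: health check unreachable: {exc}")
        return 1
    if failures:
        print("ALARM: failing components: " + ", ".join(failures))
        return 1
    print("all components OK")
    return 0
```

Wrap `main()` in the usual `sys.exit(main())` entry point and run it from cron or a systemd timer every few minutes, treating a non-zero exit status as the alarm. Unparseable lines are deliberately counted as failures, so a changed output format gets noticed instead of silently passing.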

Overall, this means testing the application or system can be done very easily with simple scripts relying on existing parts of the application. I say "existing" because you reaaally should be building in a means of testing or automating your application, as it makes a whole variety of related tasks significantly easier, both during development and after deployment. Not having such systems not only severely harms the flexibility of your application, but also contributes to the growing XY problem I'm talking about in this post.
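To make "building in a means of testing" concrete, here's a minimal sketch (in Python, purely for illustration) of an application exposing a `selftest` subcommand that prints one plain-text `name: OK` or `name: FAIL` line per check and exits non-zero on any failure. The program name `myapp`, the `selftest` command, and the `check_config`/`check_storage` functions are all hypothetical stand-ins for whatever your application actually needs to verify.

```python
import argparse
import sys


# Hypothetical internal checks; a real application would call into its own code here.
def check_config():
    return True


def check_storage():
    return True


CHECKS = {"config": check_config, "storage": check_storage}


def selftest(out=None):
    """Run every check, print one 'name: OK|FAIL' line each, return an exit status."""
    if out is None:
        out = sys.stdout
    ok = True
    for name, check in CHECKS.items():
        try:
            passed = bool(check())
        except Exception:
            # A check that crashes is a failed check, not a crashed test run.
            passed = False
        print(f"{name}: {'OK' if passed else 'FAIL'}", file=out)
        ok = ok and passed
    return 0 if ok else 1


def main(argv=None):
    parser = argparse.ArgumentParser(prog="myapp")  # 'myapp' is a placeholder name
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("selftest", help="run internal checks and print plain-text results")
    args = parser.parse_args(argv)
    if args.command == "selftest":
        return selftest()
    return 2
```

A test script then only needs to run `myapp selftest`, check the exit status, and look for `FAIL` lines, with no GUI involved at all. Hook `main()` up as the entry point with `sys.exit(main())` under the usual `if __name__ == "__main__":` guard.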


So, say your dev team has failed to add such a system to your application, or say the application comes from some other company that has foolishly made it proprietary software, so your own (much better) dev team can't fix it. What do you do then? You have no means of automating the application, so you have no means of automating the testing, so your testing team is stuck doing this monotonous task every morning. The actual solution is to pester the developers of the application and demand they fix this giant issue. If it's free software, your own dev team can contribute the fix themselves and distribute it to anyone else who happens to be interested, leading to a better ecosystem overall for the application in question. If, however, it's proprietary software, then you have no choice but to whine and shout at the fools writing it until they fix it for you - which not only isn't guaranteed to work, but will almost certainly take far longer than it needs to, meaning your testers are just suffering needlessly.

Enter "UI Automation". Someone saw this horrible problem in software and, rather than bring attention to it in an attempt to get it fixed, thought "hey, I could make shit-loads of money out of this". Brand new applications and libraries were built that use a combination of image recognition, OCR, and reading the memory of other applications to try to identify windows, popups, buttons, selection boxes, text inputs, and labels. Combine that with some simple keyboard and mouse control and you have it. You then use this library to try to recreate what a human would otherwise be doing.

That sounds all well and good until you look at the problems:

  1. It's slow
  2. It's temperamental and in some cases non-deterministic
  3. There are orders of magnitude more variables and events to deal with

It creates a horrendous amount of extra work for the developers, because rather than fix the application and then use a nice, convenient built-in automation system, your dev team now has to write a program using this black-box library to try to programmatically recreate the entire examine-process-act cycle of a living, breathing human. You have to look at the GUI of your program, find specific components, and carry out specific actions. Most tedious of all, you also have to anticipate, prepare for, and handle every single thing that could possibly go wrong: every error message, unpredictable window manager action, extra blocking event in the GUI, temporary window that needs to be interacted with before it disappears, and so on ad nauseam. It makes the problem so horrendously complicated that in a lot of cases it's infeasible to script these actions reliably and correctly identify specific problems. This can be due to the unpredictability of the system, bad UX, or just heavily user-centric design that was never intended for a machine to work with... you know... like every GUI in existence.

The real problem, though, is that these kinds of libraries have earned a lot of money by exploiting bad application design. This is a business, with whole companies dedicated to it, writing shiny proprietary systems with flashy buzzwords and extortionate licensing fees. Consider this a warning to any developer working on graphical programs that are likely to play a part in the testing of something: if you care about your users/customers, you'll build proper automation into your application, such as a proper CLI or a simple internal scripting API, because without it your users/customers will be forced either to spend excessive human effort doing the tests manually, or to pay the steep prices for this ridiculous software that'll never completely solve their problem. Go for it: add these systems, make a point of it, market it, and show people how useful it is. You're benefiting everyone who'll ever need to test or use your program.




Copyright Oliver Ayre 2019. Site licensed under the GNU Affero General Public Licence version 3 (AGPLv3).