08 June 2019 - 9 minute read
So, one thing I've been working with at my job recently is a thing called "UI
Automation". Basically, it's a means of searching a GUI for buttons and
entry boxes and interacting with them programmatically. To really explain how
hilariously stupid I find this even just as a concept, I'm going to talk through
what the problem is, what the solution to that problem is, and then how this
solves the wrong problem with a "solution" that's just a whole new problem in
itself.


So, say you have some system you've built. What the system does is irrelevant,
but what you want to do is test it. Say your company has a testing team that
has planned out some tests and carries them out each morning to make
sure the system is all running as it should be - and to raise alarms if
something is broken. This is human labour being spent on a repetitive daily
task, so - being programmers (and people that hopefully care about our testing
team's sanity) - we want to automate this.

Ideally, we want some means by which to go through the same process as the
testers, but programmatically so we can reduce this massive workload to near
nothing by having a script running in the background. The added advantage of
this is that rather than run once each morning and be susceptible to human
error, this thing can be running constantly, testing as often as every few
minutes. It can carry on even out of working hours, so alarms about any issues
are raised far sooner and the support team can get things fixed and working
well much faster. Overall, it's an obvious problem to solve.
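
To make that a bit more concrete, here's a rough sketch in Python of the kind of
background script I mean. Everything in it is made up for illustration: the
names run_checks and monitor, the demo checks, and the five minute default
interval are all my own assumptions, and a real version would plug in checks
that actually exercise your system.

```python
import time


def run_checks(checks):
    """Run every check once and return the names of those that failed."""
    failures = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            # A check that crashes counts as a failure rather than stopping the run.
            ok = False
        if not ok:
            failures.append(name)
    return failures


def monitor(checks, raise_alarm, interval_seconds=300, rounds=None):
    """Run the checks every `interval_seconds`; `rounds=None` means run forever."""
    completed = 0
    while rounds is None or completed < rounds:
        for name in run_checks(checks):
            raise_alarm(name)
        completed += 1
        # Only sleep if another round is still to come.
        if rounds is None or completed < rounds:
            time.sleep(interval_seconds)


if __name__ == "__main__":
    # Hypothetical checks; real ones would exercise the system under test.
    demo_checks = {"always_passes": lambda: True, "always_fails": lambda: False}
    monitor(demo_checks, lambda name: print(f"ALARM: {name} failed"), rounds=1)
```

Each check is just a function that returns True or False, so the loop doesn't
care whether a check talks to a CLI, a web server, or anything else.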


What's the solution to this problem then? Well, there are a number of ways to do
it. If you're a good developer, then your application will have some kind of
CLI, meaning you can write these tests in an afternoon with simple shell
scripts. Just carry out basic tasks with the CLI and parse the output. If
anything's out of order, then raise an alarm. If, for instance, your application
runs as a web server, then you, being a good developer, will have built in
health checks, meaning you can use a similarly small shell script to make
requests to the server (using curl, for instance), then parse the plain text
output just as before. All still very simple. If for whatever reason your
application is tediously large and a CLI isn't feasible, then you can at least
build in a very simple scripting API on top of a common embeddable scripting
language such as Lua. With this, you can write your test for the application as
a small Lua script and pass the path to the script as an argument to your
application for it to execute. The test is carried out and the results are
printed in plain text to - again - parse with ease.
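
As a minimal sketch of how little code the first two options need, here's the
same idea in Python rather than shell. The function names cli_check and
http_check are my own, the "status: ok" text is an assumed output format, and
the "CLI" in the demo is just the Python interpreter standing in for a real
application.

```python
import subprocess
import sys
import urllib.request


def cli_check(command, expected, timeout=30):
    """Run a command and report whether it exited cleanly and printed `expected`."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing executable or a hung command is a failed check.
        return False
    return result.returncode == 0 and expected in result.stdout


def http_check(url, expected, timeout=10):
    """Fetch a plain-text health endpoint and look for `expected` in the body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200 and expected in body
    except OSError:
        # URLError and socket timeouts are both subclasses of OSError.
        return False


if __name__ == "__main__":
    # Stand-in for a real application CLI (an assumption for this demo).
    fake_cli = [sys.executable, "-c", "print('status: ok')"]
    if not cli_check(fake_cli, "status: ok"):
        print("ALARM: CLI check failed", file=sys.stderr)
        sys.exit(1)
    print("all checks passed")
```

The whole test is "run it, read the text, compare", which is exactly why a CLI
or a health check endpoint makes this an afternoon's work.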

This overall means that testing the application or system can be done very
easily with only simple scripting systems relying on existing parts of the
application. I say "existing" because you reaaally should be building in a
means of testing or automating your application as it makes a whole variety of
tasks relating to it significantly easier - both during development and
post-deployment. Not having such systems not only severely harms the flexibility of
your application, but contributes to the growing issue of the gaping XY problem
that I'm talking about in this post.


So, say your developer team has failed to add such a system to your application,
or say the application is coming from some other company that are so foolishly
writing proprietary software that your own much better dev team cannot fix. What
do you do then? You have no means of automating the application, so you have no
means of automating testing, so your testing team is stuck doing this monotonous
task every morning. The actual solution is to pester the developers of the
application and demand they fix the giant issue. If it's free software then your
own dev team can contribute the fix themselves and distribute the fix to anyone
else that happens to be interested, leading to an overall better ecosystem for
the application in question. If however it's proprietary software, then you have
no choice but to whine and shout at the fools writing the proprietary software
until they fix it for you - which not only isn't guaranteed to work, but will
almost certainly take far longer than it needs to, meaning your testers are just
suffering needlessly.

Enter "UI Automation". Someone saw this horrible problem in software, and rather
than bring attention to the problem in an attempt to fix it, thought "hey, I
could make shit-loads of money out of this". Brand new applications and
libraries were then built that would use a combination of image-recognition,
OCR, and reading the memory of other applications to try and identify windows,
popups, buttons, selection boxes, text inputs, and labels. Combine that with
some simple keyboard and mouse control systems and you have it. You would then
use this library to try and recreate what a human would otherwise be doing.

That sounds all well and good until you look at the problems:
  1. It's slow
  2. It's temperamental and in some cases non-deterministic
  3. There are orders of magnitude more variables and events to deal with

It creates a horrendous amount of extra work for the developers, because
rather than fix the application and then use the nice convenient
built-in automation system, your dev team is now having to write a
program using this black box library to try and programmatically
recreate the entire examine-process-act cycle of a living breathing
human. You're having to look at the GUI of your program, find specific
components, and carry out specific actions. What's most tedious though
is that you're also having to take into account, prepare for, and handle every
single possible thing that could go wrong. Every error message, unpredictable
window manager action, extra blocking events in the GUI, temporary windows that
need to be interacted with before they disappear, and so on ad nauseam. It makes
the problem so horrendously complicated that in a lot of cases, it's infeasible
to effectively script these actions and correctly identify specific problems.
This can be due to unpredictability of the system, bad UX, or just heavily
user-centric design that isn't intended for a machine to work with... you
know... like every GUI in existence.

The real problem though is that these kinds of libraries have earned a lot of
money through this exploitation of bad application design. This is a business
with whole companies dedicated to it, writing their shiny proprietary systems
with flashy buzzwords and extortive licensing fees. Consider this a warning to
any developer that happens to be working on graphical programs and anticipates
that they will play a part in the testing of something: if you care about your
users/customers, then you'll build proper automation systems into your
application, such as a proper CLI or a simple internal scripting API, because
without them, your users/customers will be forced to either expend excessive
human effort doing the tests manually, or pay these expensive prices for this
ridiculous software that'll never completely solve their problem. Go for it,
add these systems, make a point about it, market it and show people how useful
it is. You're benefiting everyone who'll ever need to test or use your
program.


Copyright Oliver Ayre 2019. Site licensed under the GNU Affero General Public
Licence version 3 (AGPLv3).