Testing graphical interfaces (and Dear ImGui)

Posted on Aug. 7, 2021 by Ben Dickson.


The stability of an interface such as a programming API is widely understood to be a "good thing". Code you write today which works will, to the best ability of everyone involved, continue working tomorrow. If something needs to change in a jarring fashion (such as removing a method or changing an argument), it will most likely be done with the following considerations:

  1. The change will be signalled with something like a major version increment. Users expect (either by explicit agreement or by convention) that a major version bump means "maybe things will break"
  2. The breaking change will be made in a way which breaks old code very clearly. Say there is a method with a confusingly double-negative argument, and you want to change delete_files(dont_prompt_user: bool) to delete_files(prompt_user: bool). Removing the method entirely and introducing delete_with_prompt() and delete_silently() instead would leave much less room for surprises.

Graphical interfaces are an interface too

These requirements extend quite consistently to changing graphical interfaces: If you remove a button, you better not add a completely different button in the same place with only slightly different text. If you change a default behaviour, it should be clearly indicated to the user with the same care as a breaking programming API change.

However I suspect UIs aren't often held to the same standard. Partly this is due to the inescapable fact that humans adapt to changing interfaces more easily than code does, so the cost of a change is lower. For the most part, keeping a UI consistent is just "less important" 1

1

Although not always: in industrial applications, changing an interface in a way where the user could unintentionally trigger a potentially dangerous action is just as catastrophic as a system failure due to a coding bug.

However the tooling and approaches for maintaining compatibility are not as well developed for graphical interfaces, as the following examples show.

Approaches for testing interface stability

I suspect a lot of graphical interfaces are approached with a "don't make mistakes" policy - just be careful when changing interface code, compare before and after, and make sure it looks about right. In simple interfaces this is probably fine, but as soon as an interface starts accumulating state this becomes increasingly impractical.

As with programming APIs, ensuring interfaces remain consistent almost certainly requires some kind of automated testing.

With web-based applications there is some good tooling to help catch visual changes. Tools like Percy take a pull request and run a test script which navigates to a URL, fills in some inputs, clicks some buttons, and makes assertions. Percy also screenshots the state of the browser and flags whether it still looks the same.

This is pretty excellent, as it checks that a given series of user actions still produces the same result.

Applying the same approach to desktop applications is doable, but harder.

I feel a lot of the complexity is due to:

  1. Just running a graphical application in a CI context is harder than running a web page. "Headless browsers" are quite common, while running a graphical application in CI requires dedicated hardware with a graphics card, something like Xvfb, and so on.
  2. The means to, say, click a button and check whether it makes a popup window appear are host-OS-dependent and GUI-framework-dependent. OS-level accessibility APIs potentially make this easier, but these are often an even more abandoned topic than testing interfaces.
  3. Running the application and extracting internal state to make test assertions often requires additional work (e.g. the internal state of a window might be hidden behind several layers of private data types). A web-based application, by contrast, is most probably written in Javascript, running inside the same process as a test runner which is also written in Javascript - so the test runner can more easily introspect the application, share application code to set up tests, etc.

There are some projects which somewhat help with this, e.g. Sikuli X, which uses a nice IDE and fuzzy-matched screenshots to automate clicking on widgets. However, being a completely "external" tool, it lacks easy access to the internal application state.

Dear ImGui

I'm currently developing a complex desktop application using the Dear ImGui library for its interface. It is a somewhat unusual approach, but it made perfect sense for this particular application.

Since starting to use it, I've had a "feeling" that the approach it takes makes it very amenable to testing.

I'm using the Rust bindings to the library. With these, you do some setup steps and then create the all-important ui object. With this object, you create your interface for the current frame.

For example, to make a UI with a button which, when clicked, makes some text change, your application would look roughly like this pseudo-code:

let platform = imgui_winit_platform::init();

let mut ctx = imgui::Context::create();

let mut button_clicked: bool = false;

loop {
    // Make new `Ui` object for this frame
    let ui = ctx.frame();

    // Draws a button, returning true if clicked in this frame
    if ui.button("Example button!") {
        // Update our state
        button_clicked = true;
    }

    if button_clicked {
        ui.text("Button has been clicked");
    } else {
        ui.text("Please, click the button");
    }

    let draw_data = ui.render();
    
    imgui_glium_renderer.render(draw_data);
}

The interesting parts to note here are:

  1. The inputs from the OS, like "which keys were pressed" and "where is the mouse cursor", are provided by the "platform" (in this example denoted by the common imgui_winit_platform)
  2. All your UI code happens for that frame - it reacts to button clicks and so on, updates your application state.
  3. You get a DrawData structure, and pass it to the "renderer" which takes this draw data (basically a bunch of polygons and texture information) and displays it in an OpenGL context or something similar

The exciting part is that all of this is exposed very directly, with the intent that both the "inputs" (keyboard/mouse/etc.) and the outputs (polygons representing your UI) are designed to be interfaced with different systems (e.g. different input devices, game engines, windowing systems)

Why this is exciting for testing an interface might already be evident, but basically: you can replace that main loop with something "less infinite", and you have something closely resembling the typical testing pattern of "setup, code, assertion".

let platform = mock_platform::init();

let mut ctx = imgui::Context::create();

let mut button_clicked: bool = false;

// Make new `Ui` object for this frame
let ui = ctx.frame();

// Draws a button, returning true if clicked in this frame
if ui.button("Example button!") {
    // Update our state
    button_clicked = true;
}

if button_clicked {
    ui.text("Button has been clicked");
} else {
    ui.text("Please, click the button");
}

let draw_data = ui.render();
    
let new_image = render_draw_data_to_image(draw_data);
diff_image(new_image, load_image("./test_data/example.png"));

This, at least in theory, has a lot of the good aspects of what makes testing web applications less hard:

  1. The code under test and the test code are in the same language, and run in the same process
  2. You can provide mock inputs in exactly the same interface the application gets them (via the "Platform" abstraction)
  3. Your output does not depend on any graphics hardware, so it can easily run anywhere (including the all-important random free-tier CI runner)