More than once, I have evaluated a large body of automated test code. The usual customer complaint was lack of reliability: the tests sometimes ran, sometimes reported (non-existent) defects and sometimes crashed. So much for their sizable investment in automated testing… Most commonly, after examining a lot of code, I concluded that lack of attention to the details of test structure caused most of the problems: the tiniest steps in the tests were the least reliable.
An automated test is just like any other coding project; one must understand the fundamentals, which in this case are the fundamentals of building a test, before using code to solve a problem.
Let’s start with the simplest component of testing which can declare a test status of PASS or FAIL. I call it the Probe and Verify pair.
The Probe operation instigates some predictable behavior on the part of the Software Under Test (SUT). The Verify operation examines the available state and artifacts to confirm that the SUT behaved correctly. If the Verify operation does not find the state or artifacts that it expects, a FAIL or ERROR is declared. Otherwise, a PASS is declared.
It’s simple, right? Well, yes and no. A ‘real’ test has many Probe and Verify steps strung together in an order determined by the author of the test, with intervening actions that move the SUT through the operations targeted by the test. The hard part of writing a test is all in those intervening actions; in fact, they make up the bulk of most tests. Since they are such a large part of the test, making them reliable, robust, repeatable and predictable contributes in equal measure to making the test as a whole reliable, robust, repeatable and predictable.
When a test moves from one Probe and Verify to the next, there are intervening activities that prepare the SUT for that next Probe operation. We can call those intervening activities Probe Setup, or just setup. The idea of setup for a test is not going to surprise you, but here we’re looking at the finest-grained setup: the individual actions that set up an individual probe, and how they contribute to reliability, repeatability, predictability and robustness.
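To make the vocabulary concrete, here is a minimal sketch of setup followed by two Probe and Verify pairs. It is written in Python purely for illustration; the `Counter` class standing in for the SUT, the `Status` enum and the `probe_and_verify` helper are all hypothetical names of my own.

```python
from enum import Enum


class Status(Enum):
    PASS = "PASS"
    FAIL = "FAIL"


class Counter:
    """Hypothetical stand-in for the Software Under Test."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1


def probe_and_verify(sut, expected):
    # Probe: instigate one predictable behavior in the SUT.
    sut.increment()
    # Verify: examine the state that the probe should have produced.
    return Status.PASS if sut.value == expected else Status.FAIL


sut = Counter()                    # setup: put the SUT in a known state
first = probe_and_verify(sut, 1)   # first pair
second = probe_and_verify(sut, 2)  # second pair builds on the first
```

Even in this toy, the second pair only means something if the first one left the SUT in the state the test expects, which is exactly why the steps in between deserve attention.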
Here’s an example. You want to test the newly fixed Change Password Page for its handling of valid and invalid passwords (I know, there are better ways, but bear with me). You must enter the user’s current account information, including the current password, click the check box “I Accept Terms and Conditions”, then fill in the new password (twice).
It sounds simple. This example is imagined, slightly contrived and very much condensed. Let’s take a look at the test steps:
The test fails with the “Password Not Changed” dialog. That seems appropriate until you get to the part where you supply conforming new passwords and the “Password Not Changed” dialog appears again. It must be a defect. A defect report was filed indicating that the Change Password Page did not accept valid passwords.
Full disclosure: the problem was that the “Terms and Conditions” check box was read-only when the page came up and its starting state was unchecked. The Click() method could not change it to checked. The “Password Not Changed” dialog box was presented because the “Terms and Conditions” were not accepted. The new contents of the New Password text box were irrelevant.
The result is that we have posted a defect report for the product, the SUT, instead of realizing that the test was doing something wrong.
The correctness of the test was never challenged because it was used in the past. What went wrong?
This is what happened: The latest changes to the Change Password Page included a fix for a defect where a user could change their password without accepting the terms and conditions. That was fixed by requiring the “Terms and Conditions” check box to be checked before a password can be changed. However, there was a secondary ‘improvement’ by the developer where the check box was disabled until the user name and current password were filled in. After that, the check box was enabled and could be checked. However, the order of operations in the test code checked the check box first then filled in the other fields. The test no longer matched the behavior of the SUT.
We could have avoided all that pain with the principle of “Know, don’t Assume”. The test writer assumed that a simple Click() operation would be successful and never checked the return code, which would have indicated the error. Had it been checked, the test could have stopped at step (2). Instead, on the assumption that the Click() was successful, the rest of the test was executed and the error dialog appeared when a valid password was presented.
If the test writer had used the principle of “Know, don’t Assume”, they would have checked the result of the Click(), and the test would have flagged the failed Click() of the check box instead of a defect in the password validation. Further, the discrepancy between the changed product code and the test code would have been immediately apparent as a test problem, not a product problem, and would have been fixed at the first execution of the test with very little trouble.
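A minimal sketch of what checking that one click could look like, in Python. Everything here is assumed for illustration: `CheckBox` is a hypothetical stand-in for a page control (not a real UI library), `click()` is assumed to return a success flag, and `TestInternalError` is a hypothetical exception for failures of the test itself.

```python
class TestInternalError(Exception):
    """A test step itself failed; this says nothing about the SUT."""


class CheckBox:
    """Hypothetical stand-in for a page control, not a real UI library."""

    def __init__(self, enabled):
        self.enabled = enabled
        self.checked = False

    def click(self):
        # Returns True only if the click actually changed the state.
        if not self.enabled:
            return False
        self.checked = not self.checked
        return True


def accept_terms(box):
    # Know, don't assume: check the result code AND confirm the new state.
    if not box.click() or not box.checked:
        raise TestInternalError("could not check 'I Accept Terms and Conditions'")


accept_terms(CheckBox(enabled=True))       # enabled box: proceeds quietly

try:
    accept_terms(CheckBox(enabled=False))  # the disabled box from the example
    outcome = "continued"
except TestInternalError as err:
    outcome = f"stopped: {err}"
```

With the disabled check box, the test stops at the click and names the real problem, long before any password is entered.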
Now let’s look at an example from real code written by an experienced software engineer rather recently. I’ll use pseudo code with simplified operations. This test writes to a field in an interactive web page and verifies that the SUT works by verifying that the data written to the field (eventually) got to a database.
This all seems fairly trim and direct: if you enter data in one place, you should see the predicted result, in this case a particular value in a database record. If not, declare that the Data Entry Page has a defect.
However, what if step (5) hits a transient infrastructure error (perhaps network, storage or configuration) and the database access is unsuccessful? The test logic will provide an empty or defective result to step (6) and the test will declare the Data Entry Page defective. The worst thing a test can do is to declare an SUT failure when the problem was an internal error in the test. Transient or not, likely or not, a database access problem will trigger the defect cycle for the Data Entry Page.
If the whole process is automated, a defect would be filed in the defect tool, a Developer would be assigned, and work would be done to figure out that nothing is wrong with the SUT. The Developer would close the defect report with No Problem Found (NPF). After that, the testing would have to be restarted, this time making sure that there are no failures that are not SUT failures, usually by having a tester manually invoke and manually monitor the test. The tester would have to manually inspect the logs and results and manually post the PASS or FAIL as required. This defeats the goals of automated testing and costs more in time and people than straight manual testing.
Certainly we would hope that test internal errors are few, but I used this example to show that the test developer assumed that the database access would always work and need not be checked. That assumption was invalid. If the database operation had been checked, the test writer would know whether the database access was successful and would declare a FAIL in the SUT only if there was truly a failure in the SUT. Even if the database access throws its own specialized exception, the test loses control when that exception is not handled, which in automated environments is almost as bad as declaring a false SUT defect because it forces human intervention and stops or interferes with subsequent automated tests.
In testing, the difference between assuming and knowing can cause a lot of problems, reduce the value and effectiveness of automated testing, and increase the overall time and cost of testing.
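Here is a minimal sketch of a checked database read, in Python. The names are hypothetical: `fetch_record`, `verify_saved_value`, the `TestInternalError` exception and the `FlakyDb` stub are mine, and I am assuming the database client signals infrastructure trouble with an `OSError` subclass such as `TimeoutError` or `ConnectionError`.

```python
class TestInternalError(Exception):
    """The test or its environment failed; the SUT was not evaluated."""


def fetch_record(db, key):
    # Keep control if the lookup itself fails (network, storage, config).
    try:
        return db.get(key)
    except OSError as err:  # TimeoutError and ConnectionError are OSErrors
        raise TestInternalError(f"database read failed: {err}") from err


def verify_saved_value(db, key, expected):
    record = fetch_record(db, key)
    # Only reached when the database access is known to have succeeded.
    return "PASS" if record == expected else "FAIL"


class FlakyDb:
    """Hypothetical database client that always times out."""

    def get(self, key):
        raise TimeoutError("connection timed out")


healthy = {"field": "hello"}  # a plain dict stands in for a working database
good = verify_saved_value(healthy, "field", "hello")
bad = verify_saved_value(healthy, "field", "goodbye")

try:
    verify_saved_value(FlakyDb(), "field", "hello")
    flaky = "no error"
except TestInternalError:
    flaky = "TEST_INTERNAL_ERROR"
```

The point of the sketch is the separation: a FAIL is only ever produced from a comparison that really took place, and a failed read is reported as a problem with the test environment.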
The most common reason that test writers don’t always check their assumptions is that they don’t know what to do if some internal failure happens. The problem looks like this: tests are expected to return SUT PASS or SUT FAIL, but neither of those is true when there is an internal test error. In fact, you don’t actually know anything about the SUT for that test. The result of that test activity is void, because the test lost control and can’t really declare that the SUT either PASSed or FAILed. If your test environment accepts only PASS or FAIL, as happens with many CI environments, then you need to recognize and deal with test FAIL codes that may not be SUT problems, but internal test failures.
For this particular situation, I like to use a third return code called TEST_INTERNAL_ERROR, and test writers need to be able to return it. The TEST_INTERNAL_ERROR can be accommodated by adding that third result code directly, or by qualifying a FAIL with extra conditional information. A FAIL with a “test_internal_error” result modifier could be used to avoid triggering any unwanted defect cycle activities and instead trigger a review of that test.
It’s actually quite simple once you realize that when an internal error isn’t handled or retried, your test has lost control of the state of the test activity and simply cannot proceed. That’s it: the test is broken and must stop (or let the next test run after reporting this internal failure).
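Both options can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Result` enum, the `run_test` wrapper, the `TestInternalError` exception and the `(verdict, modifier)` tuple format are hypothetical, not part of any particular test framework.

```python
from enum import Enum


class Result(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    TEST_INTERNAL_ERROR = "TEST_INTERNAL_ERROR"


class TestInternalError(Exception):
    """The test or its environment failed; the SUT was not evaluated."""


def run_test(test_fn):
    """Run one test function and classify the outcome three ways."""
    try:
        return Result.PASS if test_fn() else Result.FAIL
    except TestInternalError:
        return Result.TEST_INTERNAL_ERROR


def report_for_binary_runner(result):
    # For runners that only know PASS/FAIL: qualify the FAIL with a modifier.
    if result is Result.TEST_INTERNAL_ERROR:
        return ("FAIL", "test_internal_error")
    return (result.value, None)


def broken_test():
    raise TestInternalError("could not reach the database")


verdict = report_for_binary_runner(run_test(broken_test))
```

Whatever consumes the verdict can then route a plain FAIL into the defect cycle and a qualified FAIL to whoever owns the test.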
Below is a snippet from real code which, for obvious reasons, has been retired. It shows a utility method which sets up a search operation with a variable search term. Note carefully that userClicks() can fail for any number of page-rendering problems where the target object is not available or not ready or otherwise just wrong. userClicks() returns a success/fail result code to the caller but, as you can see, the result is not checked. Even the selectValueFromDropDown(…) method relies on userClicks(), so it too can fail silently. In this code, every line could fail silently and the caller would never know. The test writer assumed that every call, every time, would work correctly. That assumption was not true in practice.
If userClicks() threw a TestInternalErrorException, then this code would be put under control and we would be able to do the right thing when the SUT works but the test environment doesn’t:
Here I’ve put all of the formerly silent failures under control with a minimum of extra code by having userClicks() emit an exception on error. The try/catch/finally allows the test writer to cluster a group of statements in a meaningful way but keep the whole operation under control. This construct lets the test writer identify the true cause of the failure and avoid incorrectly declaring a FAIL for the SUT.
The catch block then logs the error. The best place to log the problem is where it occurred, because that’s where the reason for the true failure and the information about it are. In any case, the exception allows the code to be interrupted at the true point of failure and the problem recognized at that point, not where some subsequent verification of a SUT failure happens. Then, the ‘finally’ block gives you a place to assert control over the situation whether there is an exception or not and clean up any system changes that may have occurred prior to the failure.
It is highly recommended that you incorporate the ‘finally’ block into your thinking and into your code. In testing, knowing the state of things is critical, and the ‘finally’ block is where you re-establish a known state whether or not an exception occurred.
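The shape of this pattern can be sketched in Python, where `except` plays the role of the catch block. This is an illustration under my own assumptions, not the retired code: `FakePage`, `user_clicks`, `set_up_search`, the element names and the `TestInternalError` exception are all hypothetical.

```python
import logging

log = logging.getLogger("search_setup")


class TestInternalError(Exception):
    """The test or its environment failed; the SUT was not evaluated."""


class FakePage:
    """Hypothetical page object; `ready` lists the clickable elements."""

    def __init__(self, ready):
        self.ready = set(ready)
        self.clicked = []
        self.dropdown_open = False


def user_clicks(page, element):
    # Raise instead of returning a result code that callers can ignore.
    if element not in page.ready:
        raise TestInternalError(f"cannot click {element!r}: not available")
    page.clicked.append(element)


def set_up_search(page, term):
    try:
        user_clicks(page, "search_type_dropdown")
        page.dropdown_open = True
        user_clicks(page, f"option:{term}")
        user_clicks(page, "search_button")
    except TestInternalError as err:
        # Log at the point of failure, then let the caller end the test.
        log.error("TEST_INTERNAL_ERROR during search setup: %s", err)
        raise
    finally:
        # Runs on success and on failure: never leave a half-open dropdown.
        page.dropdown_open = False


page = FakePage(ready=["search_type_dropdown", "search_button"])  # option missing
try:
    set_up_search(page, "invoices")
    outcome = "setup ok"
except TestInternalError:
    outcome = "TEST_INTERNAL_ERROR"
```

In this run the second click fails, the problem is logged where it happened, the page is returned to a known state, and the caller learns that the setup, not the SUT, is at fault.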
The test writer is no longer assuming that all of this test probe setup code always works flawlessly. The test writer can know when this sequence happens flawlessly as hoped, and when it doesn’t. This mechanism puts the test code under control and allows the test writer the opportunity to end the test cleanly with a clear indication that there was a problem not in the SUT, but in the test or its environment. There will be no false defect reports for the SUT, but there may be a valid defect report against the test code or an investigation into the reliability of the test environment.
This is the hard part! Many test running environments accept only PASS or NOT PASS as a binary condition. Such an environment can make no distinction between a product SUT failure and a test failure. In this case, failures need a post-processing step that examines some artifact of the testing, usually a log or console output, and recognizes a TestInternalErrorException and/or its standard log line. Slight post-processing is pretty common in test environments, and this can really raise the reliability and predictability of hands-off, lights-out, no-human-involved automated testing.
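A post-process of this kind can be very small. The Python sketch below assumes, purely for illustration, that the test code writes either the marker `TEST_INTERNAL_ERROR` or the exception name into its log; the `classify_run` function and the sample log lines are hypothetical.

```python
import re

# Assumed markers: the standard log line or the exception name.
MARKER = re.compile(r"TEST_INTERNAL_ERROR|TestInternalError")


def classify_run(runner_passed, log_text):
    """Turn a binary runner verdict plus its log into a three-way result."""
    if runner_passed:
        return "PASS"
    if MARKER.search(log_text):
        # Review the test or its environment; do not file a SUT defect.
        return "TEST_INTERNAL_ERROR"
    return "FAIL"


sut_failure_log = "ERROR expected value 'hello' but record held 'goodbye'"
internal_log = "ERROR TEST_INTERNAL_ERROR during search setup: cannot click"

sut_result = classify_run(False, sut_failure_log)
internal_result = classify_run(False, internal_log)
```

Only runs classified as FAIL would go on to the defect cycle; the others would be sent for review of the test and its environment.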
After all, that is the goal: have your automated tests so reliable, robust, repeatable and predictable that you can run them anytime without human intervention and get the right result, even if the test environment fails you. This technique of “Knowing, not Assuming” for your tiniest test steps will go a long way to helping you get there.