regression test | bill abbott's weblog

Quiz questions about software test. My answers are probably longer than was hoped for, but specific, and most important, true and demonstrable.

1) What is the difference between functional testing and system testing?

2) What are the different testing methodologies?

1) System test is the equivalent of actual customers/users using the product. Carried out as if in the real world, with a range of detailed configurations, simulation of typical users working in typical way. It is one level of abstraction above Functional testing. Functional Test verifies that the product will do functions which it is intended to do. Play, rewind, stop, pause, fast forward. +, -, x, /, =. Functional Tests must be drawn from the Requirements documents. System Test checks that a product which meets those requirements can be operated in the real world to solve real problems. Put another way, System test proves that the requirements selected for the product are correct.

This makes one wonder why engineers don’t do system test on the requirements before creating the design and code… mostly because its hard to do, and they’re sure they understand what the requirements should be, I suppose. I’ve never seen it done in depth.

2) “the different testing methodologies” seems over-determined. The following are ‘some’ different testing methods. There may be others.

Perhaps the intent of the question is to expose a world divided into White Box and Black Box testing, which are different from each other. But there are other dichotomies, in addition to White Box and Black Box.

Software testing methods divide into two large classes, Static and Dynamic. Static testing looks at source code, dynamic testing requires executable programs and runs them. Another division is between Using a Tool that evaluates source code and and Checking Program Output. Within either set of large groups are smaller divisions, Black Box and White Box (and Clear Box and Gray Box) are all divisions of Dynamic or Checking Output methods. Specific methods within the large groups include

running source code through a compiler
running a stress test that consumes all of a given resource on the host
running a tool that looks for memory allocation and access errors
doing a clean install on a customer-like system and then running customer-like activities and checking their output for correctness.

Orthagonal to all of the above, Manual Test and Automated Test are infastructure-based distinctions, Automated tests may be Black Box, Unit, running a tool, checking output, or any other methodology. Manual and Automated are meta-methods.

Static Software Test Methods: Similar to, but not exactly the same as Tool Using Methods, to find problems in software source code.

2.1) Compile successfully, no errors or warnings. This is the first step before inspection, since nothing is better or cheaper at finding compiler problems than the compiler.

2.2) Inspection and code review, to see if the code is written to the standards that the organization enforces. I like and use code reviews, the formal Fagan system, and less formal “extreme programming” techniques like having a second person review all diffs or do a walk through with two people at the workstation. They work. The standards inspected for are usually helpful in preventing bugs or making them visible. Just looking usually improves product quality – the Western Electric effect if nothing else.

There may be some insight into product requirements and how the code meets them in a review. But the reviewers would need to know the requirements and the design of the software in some detail. Its difficult enough to get the code itself to be read. In Engineering Paradise, I suppose the requirements are formally linked to design features, and features to data and code that operates on that data, to create the feature.

2.3) Static analysis. Besides passing compiler checks without errors or warnings, there are static analysis tools, “lint” for example, that can inspect code for consistency with best practices and deterministic operation. Coverity, and others, have commercial products that do static test on source code.

2.4) Linking, loading. The final static events are linking the code and libraries required to complete the application, and writing a usable file for the executable, which the loader will load.

Dynamic Software Test Methods:

2.5) Memory access / leakage software test. Rational/IBM’s Purify, like ValGrind and BoundsChecker, run an instrumented copy of the source code under test to see memory problems in a dynamic environment. Its run and the results should be checked and responded to before a large investment in further Dynamic testing should happen.

2.6) Performance test. Measuring resources consumed, obviously time, possibly others, during repeatable, usually large-scale, operations, similar to System or Load tests. Generic data, from development testing, is necessary and may be shipped as an installation test to users. Proprietary data, under a NDA (non-disclosure agreement), may also be needed, for complex problems ans/or important customers. In normal operation, the actual outputs are not looked at, at most, spot-checked, and the tool(s) keeping track of resources are the basis of pass/fail.

2.7) Installation Test. Typically a subset of in-house performance tests, with optional, generic, data. The performance recorded is comparable between releases, instances, configurations, sites, customers, and the software maker’s own in-house performance tests. Customers can use Installation tests to verify their hardware/software environment, benchmark it, evaluate new purchases for their environment, etc.

Checking Program Output Methods:

After tool based dynamic testing, the rest of Dynamic software test is based on running the product with specific inputs and checking the outputs, in detail.

Checking can be done with with exit status, stack traces,”assert()”, exceptions, diffing large output files against ‘gold’ references, log searches, directory listings, searching for keywords in output streams indicating failure or incorrect operation, checking for expected output and no other, etc. No test failures are acceptable. Each test must be deterministic, sequence independant, and (ideally) can run automatically. No judgement required for results. All require running the program.

2.8) Unit tests of pieces of the a product, in isolation, with fake/simulated/mock resources. A great bottom-up tool for verifying software. At the unit test level is where knowledge of the code is most important to testing. It is white box/clear box, with full insight into the code under test. One explicit goal of unit test should be forcing all branches in the code to be executed. That can’t be done without allowing visibility into the code.

2.9) Integration Test. The next level above unit test, the tests of code which calls code which calls code… and the code above that! The point is that integration is where code from different groups, different companies, different points in time, certainly different engineers, comes together. Misunderstanding is always possible. Here’s one place it shows up. Visibility into the code is getting dimmer here. Some tests are more functional, if a subsystem contains complete, requirement-satisfying functions.

2.10) Functional Test. Verifying that the product will do functions which it is intended to do. Play, rewind, stop, pause, fast forward. +, -, x, /, =. Tests here should be drawn from the Requirements documents. Things that should be tested here should start in the Requirements docs. Each requirement has to be demonstrated to have been met. Its black-box testing, run from the interface customers use, on a representative host, with no insight into the internals of the product. Unless the requirements specify low level actions.

Its not particularly combinatorial- a short program, a long program, 2+2, 1/-37. Pat head. Rub belly. Walk, Not all 3 at once.

If a word-processor has no stated limit for document size, you need to load or make a really big file, but, truly, that’s a bad spec. A practical limit of ‘n’ characters has to be agreed as the maximum size tested-to. Then you stop.

All these Tests should be drawn from the Requirements documents. Things that should be tested here should start in the Requirements docs.

All that Verification is good, but what about Validation?

Unit test, Integration test, or Functional Test, is where Validation, proving correctness of the design, might happen. Validation test is where deep algorithms are fully exercised, broad ranges of input are fully exercised, Tests that include all possible numerals, all possible characters, all defined whitespace, read in or written out. Numbers from MinInt to MaxInt, 0 to MaxUnsigned, the full range of Unicode characters, etc., etc., are exercised.

(Errors in input numbers should be seen in System test anyway, but accepting a wide range goes here.) This is not always done very formally, because most modern code environments don’t need it. But someone ought to look at least once.

L10n (Localization) and I18n (Internationalization) that need to be selected at link time or run time can be checked here too.
This is also where path-length limits, IPv-6 addresses, etc. should be checked.

2.11) User interface test verifies the controls and indicators that users at various levels see, hear, touch, operate and respond to. This is separate from any actual work the program may do in response. This is a high-value target for automation, since it can be complex and tedious to do UI testing in great detail by hand.

2.12) System Test. Full up use of the system. Training, white-paper and demo/marketing examples. Real-world situations reproduced from bugs or solutions provided for customers. Unless requirements included complexity, this is where the complex tests start. Huge data. Complex operations. The range of supported host configurations, min to max, gets tested here too.

We’ll want to see all the error messages, created every possible way. We’ll want to have canned setups on file, just like a customer would, and we pour them into the product, run it, and collect the output. The set pass/fail on the output.

Somewhere between System Test and Acceptance test, the scale of pass/fail goes up another level of abstraction. Software test pass/fail results are one in the same with the product pass / fail. If data and setup are good, it should run and pass. Ship the result. If the data and/or setup have a problem, it should run and fail. The failure should propagate out to be stored in detail, but in the end this is a trinary result. Pass, Fail, Not Proven

2.13) Load test, Stress test. Load tests go to the point that all of a resource is consumed, and adding more activity produces no more output in real time. Resources include CPU, memory, local storage, networked storage, video memory, USB ports, maximum number of users, maximum number of jobs, maximum instances of product, etc. Stress test adds data, jobs, etc, clearly (110% or more) above load test maximum.

2.14) Stability test. Long term test. Stability test and long-term test are where a server or set of servers are started and left running, doing real work, for days, weeks, months. Some of the tests must repeat inputs and expect identical outputs each time. Resource consumption should be checked. Its fair for the application or tool to have the node to itself, but adding other applications and unrelated users here and in the Load/Stress tests is meaningful, to avoid surprises from the field.

2.15) Acceptance test. Customer sets-up their run-time world use of the system and uses it. Everything they would normally do. If its a repeat sale, they may just clone the previous installation. Run the previous and the new system, release, patch, etc, and compare output to installed software on machines that customer likes and trusts. If the product is a new one, acceptance means judging pass-fail from the output produced.

Many other kinds of test are mentioned in conversation and literature. A web search will turn up dozens. Regression test, stability test, in the sense that a new code branch is stable, sanity test and smoke test are all forms of testing but usually, in my experience, consist of subsets of the test levels/methods listed above.

A Smoke test (run the product, make sure it loads and runs, like a hardware smoke test where you apply power, turn it on and see if any smoke comes out…) can be made from the first steps of several different methods/levels named above. If the Smoke test is more than simply running the program once, then it should probably be some part of one of the other methods/levels. Or to put it another way, the work that goes into setting up the smoke test should be shared/captured. There might be a ..test/smoke/… directory, but the contents should be copied from somewhere else.

A Sanity test, a Stability test and Regression tests are successively larger swaths, at lower and lower levels, of the System, Performance, User Interface, Functional, etc. tests. They should be specified and are not embarrassing, but their content should be drawn from or reflected by those larger level-based tests. The should not be original and alone.

What do you think?

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Tag Archives: regression test

Software Test Methods, Levels, quiz question answers

Archives

Pages