
Software Test Methods, Levels, quiz question answers

Quiz questions about software testing. My answers are probably longer than was hoped for, but they are specific and, most important, true and demonstrable.

1) What is the difference between functional testing and system testing?

2) What are the different testing methodologies?

1) System test is the equivalent of actual customers/users using the product. It is carried out as if in the real world, with a range of detailed configurations and a simulation of typical users working in typical ways. It is one level of abstraction above Functional test. Functional test verifies that the product will perform the functions it is intended to perform: play, rewind, stop, pause, fast forward; +, -, x, /, =. Functional tests must be drawn from the Requirements documents. System test checks that a product which meets those requirements can be operated in the real world to solve real problems. Put another way, System test proves that the requirements selected for the product are correct.

This makes one wonder why engineers don’t do system test on the requirements before creating the design and code. Mostly, I suppose, because it’s hard to do, and they’re sure they understand what the requirements should be. I’ve never seen it done in depth.


2) “the different testing methodologies” seems over-determined. The following are ‘some’ different testing methods. There may be others.

Perhaps the intent of the question is to expose a world divided into White Box and Black Box testing, which are different from each other. But there are other dichotomies, in addition to White Box and Black Box.

Software testing methods divide into two large classes, Static and Dynamic. Static testing looks at source code; dynamic testing requires executable programs and runs them. Another division is between Using a Tool that evaluates source code and Checking Program Output. Within these large groups are smaller divisions: Black Box and White Box (and Clear Box and Gray Box) are all divisions of the Dynamic or Checking Output methods. Specific methods within the large groups include

  • running source code through a compiler
  • running a stress test that consumes all of a given resource on the host
  • running a tool that looks for memory allocation and access errors
  • doing a clean install on a customer-like system and then running customer-like activities and checking their output for correctness.

Orthogonal to all of the above, Manual Test and Automated Test are infrastructure-based distinctions. Automated tests may be Black Box, Unit, running a tool, checking output, or any other methodology. Manual and Automated are meta-methods.


Static Software Test Methods: these find problems in software source code without running it. They are similar to, but not exactly the same as, the Tool Using Methods.

2.1) Compile successfully, with no errors or warnings. This is the first step before inspection, since nothing is better or cheaper at finding the problems a compiler can detect than the compiler itself.

2.2) Inspection and code review, to see whether the code is written to the standards the organization enforces. I like and use code reviews: the formal Fagan system, and less formal “extreme programming” techniques like having a second person review all diffs, or doing a walk-through with two people at the workstation. They work. The standards inspected for usually help prevent bugs or make them visible. Just looking usually improves product quality, through the Hawthorne effect (first observed at Western Electric’s Hawthorne Works) if nothing else.

A review may give some insight into the product requirements and how the code meets them. But the reviewers would need to know the requirements and the design of the software in some detail, and it’s difficult enough to get the code itself read. In Engineering Paradise, I suppose, requirements are formally linked to design features, and features to the data and the code that operates on that data to create the feature.

2.3) Static analysis. Beyond passing compiler checks without errors or warnings, there are static analysis tools, “lint” for example, that inspect code for consistency with best practices and deterministic operation. Coverity and others have commercial products that do static testing on source code.

2.4) Linking, loading. The final static events are linking the code and libraries required to complete the application, and writing a usable file for the executable, which the loader will load.

Dynamic Software Test Methods:

2.5) Memory access / leakage test. Rational/IBM’s Purify, like Valgrind and BoundsChecker, runs an instrumented copy of the program under test to find memory problems in a dynamic environment. Run it, and check and respond to its results, before making a large investment in further dynamic testing.

2.6) Performance test. Measuring the resources consumed (obviously time, possibly others) during repeatable, usually large-scale operations, similar to System or Load tests. Generic data from development testing is necessary and may be shipped to users as an installation test. Proprietary data under an NDA (non-disclosure agreement) may also be needed, for complex problems and/or important customers. In normal operation the actual outputs are not looked at, or at most spot-checked; the tool(s) keeping track of resources are the basis of pass/fail.

2.7) Installation Test. Typically a subset of in-house performance tests, with optional, generic, data. The performance recorded is comparable between releases, instances, configurations, sites, customers, and the software maker’s own in-house performance tests. Customers can use Installation tests to verify their hardware/software environment, benchmark it, evaluate new purchases for their environment, etc.


Checking Program Output Methods:

After tool based dynamic testing, the rest of Dynamic software test is based on running the product with specific inputs and checking the outputs, in detail.

Checking can be done with exit status, stack traces, “assert()”, exceptions, diffing large output files against ‘gold’ references, log searches, directory listings, searching output streams for keywords that indicate failure or incorrect operation, checking for the expected output and no other, etc. No test failures are acceptable. Each test must be deterministic and sequence-independent, and (ideally) able to run automatically. No judgement should be required to read the results. All of these require running the program.
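A minimal Java sketch of two of those checks: diffing output against a ‘gold’ reference and searching the output for failure keywords, combined with the exit status into one deterministic pass/fail. The class name and the keyword list are assumptions for illustration, not any particular tool’s behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: deterministic pass/fail checks on captured program output.
final class OutputChecks {
    // Assumed failure markers; a real suite would use the ones its product actually emits.
    static final List<String> FAILURE_KEYWORDS =
        List.of("ERROR", "FATAL", "Exception", "Segmentation fault");

    // Compare actual output with a 'gold' reference, line by line; return every difference.
    static List<String> diffAgainstGold(List<String> gold, List<String> actual) {
        List<String> diffs = new ArrayList<>();
        int n = Math.max(gold.size(), actual.size());
        for (int i = 0; i < n; i++) {
            if (i >= gold.size()) {
                diffs.add("line " + (i + 1) + ": unexpected extra output [" + actual.get(i) + "]");
            } else if (i >= actual.size()) {
                diffs.add("line " + (i + 1) + ": missing expected [" + gold.get(i) + "]");
            } else if (!gold.get(i).equals(actual.get(i))) {
                diffs.add("line " + (i + 1) + ": expected [" + gold.get(i) + "] got [" + actual.get(i) + "]");
            }
        }
        return diffs;
    }

    // Return every output line containing one of the failure keywords.
    static List<String> failureLines(List<String> output) {
        List<String> hits = new ArrayList<>();
        for (String line : output) {
            for (String keyword : FAILURE_KEYWORDS) {
                if (line.contains(keyword)) {
                    hits.add(line);
                    break;
                }
            }
        }
        return hits;
    }

    // Pass only with a zero exit status, an exact gold match, and no failure keywords,
    // so no judgement is needed to read the result.
    static boolean passes(int exitStatus, List<String> gold, List<String> actual) {
        return exitStatus == 0
            && diffAgainstGold(gold, actual).isEmpty()
            && failureLines(actual).isEmpty();
    }
}
```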

2.8) Unit tests of pieces of a product, in isolation, with fake/simulated/mock resources. A great bottom-up tool for verifying software. The unit test level is where knowledge of the code matters most to testing. It is white box/clear box, with full insight into the code under test. One explicit goal of unit testing should be forcing every branch in the code to execute, and that can’t be done without visibility into the code.
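A sketch of a unit test with a fake resource, in plain Java with no test framework assumed. `Thermostat`, `TemperatureSensor` and `FakeSensor` are hypothetical; the point is that the fake lets the test force each branch of `command()`, and both boundaries between branches, in isolation.

```java
// Hypothetical resource the unit depends on.
interface TemperatureSensor {
    double readCelsius();
}

// Hypothetical unit under test, with three branches to cover.
final class Thermostat {
    private final TemperatureSensor sensor;
    private final double target;

    Thermostat(TemperatureSensor sensor, double target) {
        this.sensor = sensor;
        this.target = target;
    }

    String command() {
        double t = sensor.readCelsius();
        if (t < target - 1.0) return "HEAT";
        if (t > target + 1.0) return "COOL";
        return "IDLE";
    }
}

// Fake sensor: returns whatever the test sets, so every branch is reachable on demand.
final class FakeSensor implements TemperatureSensor {
    double value;
    FakeSensor(double value) { this.value = value; }
    @Override public double readCelsius() { return value; }
}

final class ThermostatUnitTest {
    static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError(what);
    }

    // One case per branch, plus the two boundaries between branches.
    static void runAll() {
        FakeSensor fake = new FakeSensor(0);
        Thermostat th = new Thermostat(fake, 20.0);
        fake.value = 15.0; check(th.command().equals("HEAT"), "cold -> HEAT");
        fake.value = 25.0; check(th.command().equals("COOL"), "hot -> COOL");
        fake.value = 20.0; check(th.command().equals("IDLE"), "on target -> IDLE");
        fake.value = 19.0; check(th.command().equals("IDLE"), "lower boundary -> IDLE");
        fake.value = 21.0; check(th.command().equals("IDLE"), "upper boundary -> IDLE");
    }

    public static void main(String[] args) {
        runAll();
        System.out.println("all Thermostat branches exercised");
    }
}
```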

2.9) Integration Test. The next level above unit test, the tests of code which calls code which calls code… and the code above that! The point is that integration is where code from different groups, different companies, different points in time, certainly different engineers, comes together. Misunderstanding is always possible. Here’s one place it shows up. Visibility into the code is getting dimmer here. Some tests are more functional, if a subsystem contains complete, requirement-satisfying functions.

2.10) Functional Test. Verifying that the product will perform the functions it is intended to perform: play, rewind, stop, pause, fast forward; +, -, x, /, =. Tests here should be drawn from the Requirements documents; each requirement has to be demonstrated to have been met. It’s black-box testing, run from the interface customers use, on a representative host, with no insight into the internals of the product, unless the requirements specify low-level actions.

It’s not particularly combinatorial: a short program, a long program, 2+2, 1/-37. Pat head. Rub belly. Walk. Not all three at once.
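Functional cases can be kept as a table, one row per requirement, driven only through the product’s public interface. In this sketch `Calculator` and the requirement IDs are hypothetical stand-ins; the values echo the examples above (2+2, 1/-37).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical product interface: the only thing a black-box functional test may touch.
final class Calculator {
    static double apply(char op, double a, double b) {
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case 'x': return a * b;
            case '/': return a / b;
            default: throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }
}

final class FunctionalSuite {
    // One row per requirement: ID, operator, operands, expected result.
    static final class Case {
        final String reqId;
        final char op;
        final double a, b, expected;
        Case(String reqId, char op, double a, double b, double expected) {
            this.reqId = reqId; this.op = op; this.a = a; this.b = b; this.expected = expected;
        }
    }

    // Each function exercised on its own, not in combination.
    static final List<Case> CASES = List.of(
        new Case("REQ-ADD", '+', 2, 2, 4),
        new Case("REQ-SUB", '-', 2, 5, -3),
        new Case("REQ-MUL", 'x', 6, 7, 42),
        new Case("REQ-DIV", '/', 1, -37, 1.0 / -37));

    // Requirement IDs whose case failed; an empty list means every listed requirement was met.
    static List<String> failures() {
        return CASES.stream()
            .filter(c -> Math.abs(Calculator.apply(c.op, c.a, c.b) - c.expected) > 1e-12)
            .map(c -> c.reqId)
            .collect(Collectors.toList());
    }
}
```

Each row exercises one function by itself: the coverage comes from the list of requirements, not from combinations of them.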

If a word-processor has no stated limit for document size, you need to load or make a really big file, but, truly, that’s a bad spec. A practical limit of ‘n’ characters has to be agreed as the maximum size tested-to. Then you stop.

All of these tests should be traceable to the Requirements documents: anything tested here should start in the Requirements docs.

All that Verification is good, but what about Validation?

Unit test, Integration test, or Functional test is where Validation, proving the correctness of the design, might happen. Validation test is where deep algorithms are fully exercised and broad ranges of input are fully exercised: tests that read in or write out all possible numerals, all possible characters, all defined whitespace; numbers from MinInt to MaxInt and from 0 to MaxUnsigned; the full range of Unicode characters; etc.
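As one concrete way for someone to ‘look at least once’, this sketch walks every Unicode code point through a UTF-8 round trip and checks the integer extremes, including 0 to MaxUnsigned via Java’s unsigned helpers. The round trip is an assumed stand-in for whatever read-in/write-out path the product actually has.

```java
import java.nio.charset.StandardCharsets;

final class RangeValidation {
    // Encode one code point to UTF-8 and decode it again; true if nothing was lost.
    static boolean roundTrips(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).equals(s);
    }

    // Walk the full Unicode range, skipping surrogate code points, which are not
    // characters on their own and cannot be encoded in UTF-8.
    static int countFailures() {
        int failures = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) continue;
            if (!roundTrips(cp)) failures++;
        }
        return failures;
    }

    // parse(toString(x)) must give back x at MinInt, MaxInt and their neighbors,
    // and the unsigned form must survive at MaxUnsigned (bit pattern -1).
    static boolean intExtremesRoundTrip() {
        int[] edges = { Integer.MIN_VALUE, Integer.MIN_VALUE + 1, -1, 0, 1,
                        Integer.MAX_VALUE - 1, Integer.MAX_VALUE };
        for (int x : edges) {
            if (Integer.parseInt(Integer.toString(x)) != x) return false;
        }
        return Integer.parseUnsignedInt(Integer.toUnsignedString(-1)) == -1;
    }
}
```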

(Errors in input numbers should be seen in System test anyway, but accepting a wide range goes here.) This is not always done very formally, because most modern code environments don’t need it. But someone ought to look at least once.

L10n (Localization) and I18n (Internationalization) that need to be selected at link time or run time can be checked here too.
This is also where path-length limits, IPv-6 addresses, etc. should be checked.

2.11) User interface test verifies the controls and indicators that users at various levels see, hear, touch, operate and respond to. This is separate from any actual work the program may do in response. This is a high-value target for automation, since it can be complex and tedious to do UI testing in great detail by hand.

2.12) System Test. Full-up use of the system: training, white-paper and demo/marketing examples; real-world situations reproduced from bugs or solutions provided for customers. Unless the requirements included complexity, this is where the complex tests start: huge data, complex operations. The range of supported host configurations, min to max, gets tested here too.

We’ll want to see all the error messages, created every possible way. We’ll want to have canned setups on file, just as a customer would, and pour them into the product, run it, and collect the output. Then set pass/fail on the output.

Somewhere between System Test and Acceptance Test, the scale of pass/fail goes up another level of abstraction: software test pass/fail results become one and the same with product pass/fail. If the data and setup are good, it should run and pass; ship the result. If the data and/or setup have a problem, it should run and fail. The failure should propagate out to be stored in detail, but in the end this is a three-valued result: Pass, Fail, Not Proven.
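The three-way result can be modeled directly. A sketch, assuming a detailed result is NOT_PROVEN when a check could not be carried out (bad data or setup, say) and that any FAIL outranks everything else; the names are hypothetical.

```java
import java.util.List;

enum Verdict { PASS, FAIL, NOT_PROVEN }

final class VerdictRollup {
    // Roll many detailed results up to one product-level verdict.
    // Assumed policy: any FAIL fails the product; otherwise any NOT_PROVEN
    // (or no results at all) leaves it unproven; only an all-PASS run passes.
    static Verdict rollUp(List<Verdict> results) {
        if (results.isEmpty()) return Verdict.NOT_PROVEN;
        if (results.contains(Verdict.FAIL)) return Verdict.FAIL;
        if (results.contains(Verdict.NOT_PROVEN)) return Verdict.NOT_PROVEN;
        return Verdict.PASS;
    }
}
```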

2.13) Load test, Stress test. Load tests go to the point that all of a resource is consumed, and adding more activity produces no more output in real time. Resources include CPU, memory, local storage, networked storage, video memory, USB ports, maximum number of users, maximum number of jobs, maximum instances of the product, etc. Stress test adds data, jobs, etc., clearly (110% or more) above the load test maximum.

2.14) Stability test. Long-term test. Stability and long-term tests are where a server or set of servers is started and left running, doing real work, for days, weeks, months. Some of the tests must repeat inputs and expect identical outputs each time. Resource consumption should be checked. It’s fair for the application or tool to have the node to itself, but adding other applications and unrelated users, here and in the Load/Stress tests, is meaningful, to avoid surprises from the field.
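A sketch of the ‘repeat inputs, expect identical outputs’ part of a stability test. It runs a job repeatedly, counts outputs that differ from the first, and samples heap in use after each run so resource consumption can be checked. The `Supplier` job is a hypothetical stand-in for real work; a real stability run uses the product and lasts days, not a loop of milliseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class StabilityCheck {
    // Result of a repeated run: outputs that differed from the first, plus heap samples.
    static final class Report {
        final int mismatches;
        final List<Long> heapUsedBytes;
        Report(int mismatches, List<Long> heapUsedBytes) {
            this.mismatches = mismatches;
            this.heapUsedBytes = heapUsedBytes;
        }
    }

    static Report repeat(Supplier<String> job, int runs) {
        Runtime rt = Runtime.getRuntime();
        String first = job.get();                         // reference output
        int mismatches = 0;
        List<Long> heap = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            if (!job.get().equals(first)) mismatches++;   // same input must give same output
            heap.add(rt.totalMemory() - rt.freeMemory()); // approximate heap in use
        }
        return new Report(mismatches, heap);
    }
}
```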

2.15) Acceptance test. The customer sets up their run-time world use of the system and uses it: everything they would normally do. If it’s a repeat sale, they may just clone the previous installation, run the previous and the new system, release, patch, etc., and compare output against the installed software on machines the customer likes and trusts. If the product is a new one, acceptance means judging pass/fail from the output produced.


Many other kinds of test are mentioned in conversation and literature; a web search will turn up dozens. Regression test, stability test (in the sense that a new code branch is stable), sanity test and smoke test are all forms of testing, but in my experience they usually consist of subsets of the test levels/methods listed above.

A Smoke test (run the product, make sure it loads and runs, like a hardware smoke test where you apply power, turn it on and see if any smoke comes out…) can be made from the first steps of several different methods/levels named above. If the Smoke test is more than simply running the program once, then it should probably be some part of one of the other methods/levels. Or to put it another way, the work that goes into setting up the smoke test should be shared/captured. There might be a ..test/smoke/… directory, but the contents should be copied from somewhere else.

A Sanity test, a Stability test and Regression tests are successively larger swaths, at lower and lower levels, of the System, Performance, User Interface, Functional, etc. tests. They should be specified and are not embarrassing, but their content should be drawn from or reflected by those larger level-based tests. They should not be original and alone.

What do you think?

“Testing – How does one learn QA?” – An answer I posted on the Stack Exchange “Programmers” forum

Ziv, the questioner, asks: “… how would one proceed if he wants to learn QA?

More specifically, a programmer who wants to learn about the QA process and how to manage a good QA methodology. I was given the role of jumpstarting a QA process in our company and I’m a bit lost, what are the different types of testing (system, integration, white box, black box) and which are most important to implement first? How would one implement them?”

I wrote:

There are simple rules of thumb.

Try what the manual says. Install and run on a clean target, user license, the works. Does it work? Did you have to add anything not covered in the manual?

Are all the default control values usable? Or is there something that’s wrong, or blank, by default and always has to be changed?

Set every value in the user interface to something other than its default. Can you detect a difference caused by the change? Is it correct? Do them one at a time, or in the smallest sets possible, to make the results clear.

Set every value in the user interface to a second, non-default, value. Change everything at once. Can you detect the difference? Is it correct?

One by one, do something to cause every error message to be generated. Do something similar, but correctly, so that no error message is generated.

All of the above depend on changing a condition, between an “A” case and a “B” case, and that change having a detectable result. Then the “C” case produces another change, another result, and so forth. For 10 tests, you need 11 conditions. Using defaults as much as possible is a good first condition.
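The A/B/C chain can be generated and checked mechanically. In this sketch each condition adds one change to the condition before it, so 10 changes give 11 conditions, and any change that produces no detectable difference is reported. Modeling the product as a function from settings to output is an assumption, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ConditionChain {
    // Condition 0 is the defaults; condition i adds the i-th change to condition i-1.
    // N changes (iterated in the map's order) produce N + 1 conditions.
    static List<Map<String, String>> cumulative(Map<String, String> defaults,
                                                Map<String, String> changes) {
        List<Map<String, String>> out = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>(defaults);
        out.add(new LinkedHashMap<>(current));
        for (Map.Entry<String, String> e : changes.entrySet()) {
            current.put(e.getKey(), e.getValue());
            out.add(new LinkedHashMap<>(current));
        }
        return out;
    }

    // Run the product on each condition; report the indexes of conditions whose
    // output did NOT differ from the previous one (the change was undetectable).
    static List<Integer> undetectedChanges(List<Map<String, String>> conditions,
                                           Function<Map<String, String>, String> product) {
        List<Integer> undetected = new ArrayList<>();
        String previous = product.apply(conditions.get(0));
        for (int i = 1; i < conditions.size(); i++) {
            String current = product.apply(conditions.get(i));
            if (current.equals(previous)) undetected.add(i);
            previous = current;
        }
        return undetected;
    }
}
```

Using a `LinkedHashMap` for the changes keeps the order of the chain repeatable from run to run, which the deterministic, sequence-independent rule above asks for.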

By now you’ve got a list of things to test, which you recorded, and results, which you recorded, and maybe some new bugs. Throw something big and complicated at the solution. Give it a file of 173000 words to sort, paste in a Jane Austen novel or some 100-page telecommunications standard, a 50MB bitmap graphic, 3 hours of streaming video. Open the performance monitor and get CPU-bound or I/O-bound. For an hour. Check memory use: always increasing? Rises and falls?
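For the memory question (always increasing, or rises and falls?), a sketch that classifies a series of memory samples. Where the samples come from (a performance monitor, `Runtime`, `ps`) is left open, and treating a series that rises but never drops as a leak suspect is an assumed rule of thumb.

```java
import java.util.List;

enum MemoryTrend { ALWAYS_INCREASING, RISES_AND_FALLS, FLAT_OR_FALLING }

final class MemoryTrendCheck {
    // Classify memory samples taken at regular intervals during a long run.
    static MemoryTrend classify(List<Long> samples) {
        boolean rose = false, fell = false;
        for (int i = 1; i < samples.size(); i++) {
            long delta = samples.get(i) - samples.get(i - 1);
            if (delta > 0) rose = true;
            if (delta < 0) fell = true;
        }
        if (rose && !fell) return MemoryTrend.ALWAYS_INCREASING; // never drops: leak suspect
        if (rose && fell) return MemoryTrend.RISES_AND_FALLS;    // allocate/free cycle
        return MemoryTrend.FLAT_OR_FALLING;
    }
}
```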

Take the list of bugs closed in the last week, month, sprint, etc. Check them. All. Are they really fixed?

Keeping track of what to do, how it worked on which version/release/build/configuration, open and closed bugs, what controls have been set or changed, what data, test files or examples have been used, etc. is all part of the Quality world. Keep results as tables in a spreadsheet, and make version-controlled backups/saves.

Someone writing software, or any one creating anything, has an idea of what they’re trying to make. The quality process starts with expectations. Requirements, specifications, rules, or another articulation of what’s expected. Then there’s the solution, the thing offered to perform, assist, enable or automate what’s expected. Then there are tests, operations, examples, inspections, measurements, questionnaires, etc., to relate one or more particular solution(s) to (relevant) expectations. Finally, there’s an adjustment, compensation, tuning, correction or other positive action that is hoped to affect the solution(s).

When one writes software, one has a goal of it doing something, and to the extent that goal is expressed, the behavior can be checked. Hello.exe displays “Hello World” on a screen. “2**150” in the Python 2 interpreter displays “1427247692705959881058285969449495136382746624L”. Etc. For small problems and small solutions, it’s possible to test exhaustively for expected results. But you wouldn’t test a word processor just by typing in some words, or even whole documents. There are limits of do-ability and reason. If you did type in all of “Emma” by Jane Austen, would you have to try her other five novels? “Don Quixote” in Spanish?
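The same check in Java, using `BigInteger`. The trailing `L` in the Python 2 output is the long-integer suffix, not part of the value. The second method shows what ‘exhaustive’ can mean for a small problem: every power of two up to a bound, computed two independent ways. The class and method names are hypothetical.

```java
import java.math.BigInteger;

final class PowerCheck {
    static final String EXPECTED_2_POW_150 = "1427247692705959881058285969449495136382746624";

    // The single expected-result check from the text.
    static boolean twoPow150Matches() {
        return BigInteger.valueOf(2).pow(150).toString().equals(EXPECTED_2_POW_150);
    }

    // Exhaustive over a small range: repeated doubling must agree with pow() for every n.
    static boolean allPowersAgree(int maxN) {
        BigInteger doubled = BigInteger.ONE;
        for (int n = 0; n <= maxN; n++) {
            if (!doubled.equals(BigInteger.valueOf(2).pow(n))) return false;
            doubled = doubled.shiftLeft(1);
        }
        return true;
    }
}
```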

Hence an emphasis on expectations. Meeting expectations tells you when the solution is complete. My web search for “Learn Quality Assurance” just returned 46 million potential links, so there’s no shortage of opinions. Classic books on the subject (my opinion, worth what you paid for it:) include

  • “Quality is Free” by Philip Crosby
  • “Zen and the Art of Motorcycle Maintenance” by Robert Pirsig
  • “Managing the Software Process” by Watts Humphrey
  • “The Mythical Man-Month” by Fred Brooks
  • “Code Complete” by Steve McConnell

Take 5 minutes to read some of the Amazon reviews of those books and you’ll be on your way. Get one or more and read them; they’re not boring. Browse ASQ, Dr. Dobb’s, Stack Overflow. Above all, just like writing software: DO it. Consider the quality of some software under your control. Does it meet expectations? If so, firm handshake and twinkle in the eye. Excellent! If not, can it be corrected? Move on to the next candidate.

I like the Do-Test-Evaluate-Correct loop, but it’s not a Universal Truth. Pick a process and follow it consciously. Have people try the testing, verification and validation steps described in the language manual they use most frequently. It’s right there on their desk, or in their phone’s browser.

Look at your expectations. Are they captured in a publicly known place? With revision control? Does anyone use them? Is there any point where the solutions being produced are checked against the expectations they are supposed to be meeting?

Look at your past and current bug reports. (You need a bug tracking system. If you don’t have one, start there.) What’s the most common catastrophic bug that stops shipment or requires an immediate patch? What’s the most commonly reported customer bug? What’s the most common bug that doesn’t get fixed?

Take a look at ISO 9000 process rules. Reflect on value to your customers/users. Is there a “customer value statement” that explains how some change affects the customer’s perception of the value of the solution? How about in the requirements?

By “the QA process”, do you mean “Quality Assurance”, as opposed to “QC”, “Quality Control”? You might start with the http://www.ASQ.org web site, where the “American Society for Quality” dodges the question by specifying neither “Control” (their old name was “ASQC”) nor “Assurance”.

Quality, alone, “assured” or “controlled”, is a big idea with multiple, overlapping definitions and usages. Some will tell you it cannot be measured in degrees: it’s present or not, with no “high quality” or “low quality” for them. Another famous claim is that no definition is satisfactory, so it’s good to talk about it but to avoid being pinned down to a precise definition. How do you feel about it?


The original posting is at http://programmers.stackexchange.com/questions/255583/how-does-one-learn-qa/255595#255595

An example that pleased me: The difference between an abstract class and an interface, in Java:

Here’s the punch line:

In Java, Prussia can extend (“be a”) one of the superclasses, Holy, Roman or Empire, but only one. Prussia can implement the other two as interfaces, but only with methods and fields uniquely its own. If Prussia is to be Holy, be Roman and be an Empire, the strictly hierarchical relationship of those three superclasses has to be worked out separately and in detail, in advance. I can only imagine Herr von Bismarck would approve.


And the whole magilla:
1) What is the difference between an interface and an abstract class?

An abstract class defines data (fields) and member functions but may not itself be instantiated. Usually some of the methods of an abstract class are abstract and expected to be supplied by a subclass, but some of the methods are defined. Unless they are final, they can be overridden, and they can always be overloaded. Private parts of an abstract superclass, for example data, are not available to a subclass, so access methods (public or protected) must be used by the subclass. An abstract superclass is “extended” by a subclass. A given subclass may extend only one superclass, but a superclass may itself extend another superclass, in a hierarchy. (This avoids the complexities/difficulties of multiple inheritance in C++.)
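A small Java illustration of the points above: a private field reached through protected accessors, a `final` method that cannot be overridden, an abstract method the subclass must supply, an override, and an overload. `Account` and `Savings` are hypothetical names; `new Account()` would not compile, because the class is abstract.

```java
abstract class Account {
    private long balanceCents;                               // private: invisible to subclasses

    protected long balance() { return balanceCents; }        // accessors for subclasses
    protected void adjust(long cents) { balanceCents += cents; }

    final void deposit(long cents) { adjust(cents); }        // final: cannot be overridden

    abstract String kind();                                  // must be supplied by a subclass

    String describe() { return kind() + ": " + balance(); }  // may be overridden
}

final class Savings extends Account {                        // extends exactly one superclass
    @Override String kind() { return "savings"; }

    @Override String describe() { return "[" + super.describe() + "]"; } // override

    String describe(String prefix) { return prefix + describe(); }        // overload
}
```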

An interface is a proper subset of an abstract class, but has a different scope and use. Through Java 7, an interface has ONLY abstract member functions and static final fields, aka constants. (Java 8 added default and static methods to interfaces; what follows describes the abstract methods, which are still the core of an interface.) Any implementing class has to provide all the variable fields and code that implement an interface. The implementing class cannot change the interface’s member signatures; the signatures are what the interface *is*. It is possible to overload an interface’s method names, adding or removing parameters or changing their types (a different return type alone does not make an overload), but the overloads do not satisfy the requirements of the interface. The implementing class(es) must contain actual member functions to satisfy all of the abstract signatures in the interface, because there is no default, no code, for them in the interface. As used above, a given class ‘implements’ an interface; it does not ‘extend’ it. These limitations allow a given class to implement more than one interface, which retains most of the utility of multiple inheritance without, as it were, opening Plethora’s bag. (grin)

For example: In Java, Prussia can extend (“be a”) one of the superclasses, Holy, Roman or Empire, but only one. Prussia can implement the other two as interfaces, with methods and fields uniquely its own. If Prussia is to be Holy, be Roman and be an Empire, the strictly hierarchical relationship of those three superclasses has to be worked out separately and in detail, in advance. I can only imagine Herr von Bismarck would approve.
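The Prussia example as Java, written in the style the text describes, with interfaces holding only abstract methods. Which of Holy, Roman and Empire becomes the superclass is an arbitrary choice here, and the method names are invented for the joke.

```java
// The one superclass Prussia is allowed to extend; it may carry code.
abstract class Holy {
    String blessing() { return "blessed"; }
}

// The other two can only be interfaces: signatures, no code.
interface Roman {
    String law();
}

interface Empire {
    String rule();
}

// Prussia "is a" Holy, and must supply every Roman and Empire method itself.
final class Prussia extends Holy implements Roman, Empire {
    @Override public String law() { return "Prussia's own law"; }
    @Override public String rule() { return "Prussia's own rule"; }
}
```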

White Box QA Test of a 3 text-field and 2 pushbutton web panel…

In an interview, I was asked how I would construct a White Box QA test for a screen with

a Name field,
a password field,
a passwordConfirm field,
an OK button and
a Cancel button.

I set down my best effort, below, on paper, and then checked it against a software testing book I have, “How to Break Software”. I was fairly pleased with what I’d come up with, since the book says to concentrate on generating each and every error message, which is more or less what I had come up with:

name Password Password Screen

Name: [ ______________________]
Password: [_______________]
Password conf: [_______________]
[ OK ] [ Cancel ]

(Implied outputs:
Error Message: [==============];
Dialog box: msg [=============], [ OK ];
Status message: [============]; )

Note that the “Name”, “Password” and “Password conf” lines are explicitly outputs as well as inputs: they are set to something when the screen comes up. The passwords might or might not be replaced with “*”s or big black dots. [Cancel] might or might not clear the strings.

Inputs: Name, Password, PasswordConf, Ok_button, Cancel_button

(I’m conflicted on one point. Password1 and Password2 are useful names *inside* the code, where keeping track of two very similar pieces of information is important. For a user, “Password” and “Password Confirm” are better names. I have to pick one nomenclature and use it consistently.)

In addition to text in the Name, Password and Password Confirm fields, at least one or two of the following should be available:

Error Message: I assume that we have some space to output a message on error. We have to remember to blank it once we’re out of the error state.

Dialog box: If there’s a message that requires acknowledgement, then a dialog box is really what we’re talking about (modal use of the buttons in the ‘normal’ interface would be evil). Of necessity, a dialog box includes both an output line (which may be an error message OR something else) and an acknowledgement pushbutton.

Status Message: (blank), (ok), (error), (reset), ? – a complex screen, interface or underlying program might have an internal state, and for testing, it would be nice to show it.

By “White Box” I presume we mean we know something about how it works inside, so we know whether the screen is well defined and not subject to sequence effects… or we know we DO have to address sequence effects.

For each input, there are relevant lengths, because Name and Password typically have minimums and maximums, and there are error messages relating to them:
For any text input: Name, Password, PasswordConfirm, etc., the 5 possible lengths are:
(1 character,
minimum number of characters – 1,
minimum number of characters,
maximum number of characters,
maximum number of characters + 1)
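The five lengths can be generated from a field’s limits rather than typed by hand. A sketch, assuming the minimum and maximum are known; the `BoundaryLength` names and the fill character are illustrative.

```java
import java.util.EnumMap;
import java.util.Map;

// The five boundary lengths listed above.
enum BoundaryLength { A_CHAR, MIN_M1, MIN, MAX, MAX_P1 }

final class BoundaryLengths {
    // One test string per boundary length for a field whose limits are min..max.
    static Map<BoundaryLength, String> forField(int min, int max, char fill) {
        if (min < 1 || max < min) throw new IllegalArgumentException("need 1 <= min <= max");
        String one = String.valueOf(fill);
        Map<BoundaryLength, String> out = new EnumMap<>(BoundaryLength.class);
        out.put(BoundaryLength.A_CHAR, one);
        out.put(BoundaryLength.MIN_M1, one.repeat(min - 1)); // empty string when min == 1
        out.put(BoundaryLength.MIN, one.repeat(min));
        out.put(BoundaryLength.MAX, one.repeat(max));
        out.put(BoundaryLength.MAX_P1, one.repeat(max + 1));
        return out;
    }
}
```

When the minimum is 1, the ‘1 character’ and ‘minimum’ strings coincide and ‘minimum - 1’ is the empty string, which is itself a useful case.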

For each input, there is an acceptable character set and characters which are not allowed:
Alpha: [A-Z,a-z]
numeric: [0-9]
Alpha no case: [Aa]-[Zz]
Shift-numeric: [!@#$%^&*()]
Special chars: [`-=[]\;',./], [~_+{}|:"?].

Out of these categories, or a literal list of values, we can make acceptedNameChar and acceptedPasswordChar sets, and also unAcceptedNameChar and unAcceptedPasswordChar sets. That is obviously overkill for testing the little screen this started with, but worth considering when algorithmically generating a lot of non-identical test cases for a large set of screens or other objects under test.
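A sketch of building the accepted and unaccepted sets from the categories above and drawing test strings from them. The rule that names accept letters and digits while passwords accept every listed category is an assumption; the real rules come from the requirements. The categories as listed leave out `<` and `>`, so they land in the unaccepted password set.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class CharSets {
    static final String ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    static final String NUMERIC = "0123456789";
    static final String SHIFT_NUMERIC = "!@#$%^&*()";
    static final String SPECIAL = "`-=[]\\;',./~_+{}|:\"?";

    // Union of character groups, in a repeatable order.
    static Set<Character> of(String... groups) {
        Set<Character> s = new LinkedHashSet<>();
        for (String g : groups) for (char c : g.toCharArray()) s.add(c);
        return s;
    }

    // Assumed rules: names take letters and digits; passwords take every listed category.
    static final Set<Character> ACCEPTED_NAME = of(ALPHA, NUMERIC);
    static final Set<Character> ACCEPTED_PASSWORD = of(ALPHA, NUMERIC, SHIFT_NUMERIC, SPECIAL);

    // Everything in printable ASCII (space through '~') that the field does not accept.
    static Set<Character> unaccepted(Set<Character> accepted) {
        Set<Character> s = new LinkedHashSet<>();
        for (char c = ' '; c <= '~'; c++) if (!accepted.contains(c)) s.add(c);
        return s;
    }

    // A string of the requested length cycling through the set, so one test case
    // exercises many different characters instead of one repeated character.
    static String sample(Set<Character> chars, int length) {
        Character[] pool = chars.toArray(new Character[0]);
        if (pool.length == 0) throw new IllegalArgumentException("empty character set");
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) sb.append(pool[i % pool.length]);
        return sb.toString();
    }
}
```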

The name, passWd and passWdConf inputs are sets of strings of the 5 possible lengths, made of acceptable characters, or of acceptable + unAcceptable characters.

PRESUMING sequence doesn’t matter, before the Ok_Button is clicked, the test input and output vectors would look like:

Name,       PwdA,       PwdB,       Click   ErrorOut
---------   ---------   ---------   -----   ------------
"a",        "b",        "c",                expect errors: Less than minimum, b & c don’t match…
minName-1,  minPwdA,    minPwdA,            expect error: Less than minimum name
minName,    minPwdA-1,  minPwdA,            expect error: Less than minimum passwd1
minName,    minPwdB,    minPwdB-1,          expect error: Less than minimum passwd2
minName,    minPwdA,    minPwdB,    Ok      expect error: passwds don’t match
minName,    minPwdA,    minPwdA,    Ok      expect success
minName,    minPwdB,    minPwdB,    Ok      expect error: name used already with different passwd? Or does this just reset the password to the new one?
maxName,    maxPwdB,    maxPwdB,    Ok      expect success
maxName,    maxPwdB,    minPwdB,    Ok      expect error: name used already
maxName,    maxPwdA,    maxPwdA,    Ok      expect success
maxName+1,  maxPwdB,    maxPwdB,            error: More than maximum
maxName,    maxPwdB+1,  maxPwdB,            error: More than maximum, pwds don’t match
maxName,    maxPwdA,    maxPwdA+1,          error: More than maximum, pwds don’t match

The vectors of input values and single outputs might expand to walking a blank through valid data, walking valid data through blanks, etc.

I prefer using arrays, arrays of structs, and enums as indexes to handle this kind of thing, because I find it easier to inspect and see whether it’s correct. Since the same values are used over and over again, it seems better controlled with each repeated value coming from a single variable.

enum Lengths { aChar, minM1, min, max, maxP1, numLengths }

String[] name = { "a", "minim", "minimu",
                  "minimum_minimum_minimum…minimum_minimum_minimumX" };

String[] passWd = { "a", "minim", "minimu",
                    "minimum_minimum_minimum…minimum_minimum_minimumX" };

String[] passWdB = { "b", "binim", "binimu",
                     "binimum_minimum_minimum…minimum_minimum_minimumX" };

Name, PwdA, PwdB, ErrorOut?
---------------------------
name[ aChar ], passWd[ aChar ], passWd[ aChar ], errors: Less than minimum.
name[ aChar ], passWd[ aChar ], passWdB[ aChar ], errors: Less than minimum , Password and PasswordConf don’t match…
name[ minM1 ], passWd[ min ], passWd[ min ], error: Less than minimum name error
name[ min ], passWd[ minM1 ], passWd[ min ], errors: Less than minimum passwd1, passwords don’t match
name[ min ], passWd[ min ], passWd[ minM1 ], errors: Less than minimum passwd2, passwds don’t match
name[ min ], passWd[ minM1 ], passWd[ minM1 ], errors: Less than min. chars, both pwds.
name[ min ], passWd[ minM1 ], passWdB[ minM1 ], error: passwds don’t match
name[ min ], passWd[ min ], passWd[ min ], expect success
name[ min ], passWdB[ min ], passWdB[ min ], error, name used already with different passwd- or does this just reset the password to the new one?
name[ max ], passWd[ max ], passWd[ max ], expect success
name[ max ], passWdB[ max ], passWdB[ max ], error, name used already?
name[ maxP1 ], passWd[ max ], passWd[ max ], error More than maximum name error.
name[ max ], passWd[ maxP1 ], passWd[ max ], errors More than max pwd, pwds don’t match.
name[ max ], passWd[ max ], passWd[ maxP1 ], errors More than max pwd, pwds don’t match.
name[ max ], passWd[ max ], passWdB[ max ], error, pwds don’t match.
name[ max ], passWd[ maxP1 ], passWd[ maxP1 ], error More than max pwd.
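A runnable sketch of the enum-indexed approach. Everything about the panel is assumed: the limits (name 6 to 12 characters, password 6 to 10; the “minim”/“minimu” strings above suggest a minimum of 6), the error strings, and `validate`, which stands in for the real screen. The boundary strings are generated rather than typed, and the ‘name used already’ rows are left out because they depend on stored state.

```java
import java.util.ArrayList;
import java.util.List;

// Index names follow the enum above: aChar, minM1, min, max, maxP1.
enum Lengths { aChar, minM1, min, max, maxP1 }

final class PanelVectors {
    static final int MIN = 6, MAX_NAME = 12, MAX_PWD = 10;   // assumed limits
    static final String NAME_SHORT = "name too short", NAME_LONG = "name too long",
        PWD_SHORT = "password too short", PWD_LONG = "password too long",
        CONF_SHORT = "confirm too short", CONF_LONG = "confirm too long",
        MISMATCH = "passwords don't match";

    // Five boundary strings for one field, indexed by Lengths.ordinal().
    static String[] strings(char first, int max) {
        String[] s = new String[Lengths.values().length];
        s[Lengths.aChar.ordinal()] = String.valueOf(first);
        s[Lengths.minM1.ordinal()] = first + "x".repeat(MIN - 2);
        s[Lengths.min.ordinal()] = first + "x".repeat(MIN - 1);
        s[Lengths.max.ordinal()] = first + "x".repeat(max - 1);
        s[Lengths.maxP1.ordinal()] = first + "x".repeat(max);
        return s;
    }

    static final String[] name = strings('n', MAX_NAME);
    static final String[] passWd = strings('a', MAX_PWD);
    static final String[] passWdB = strings('b', MAX_PWD);  // same lengths, never equal to passWd

    static String nameAt(Lengths l) { return name[l.ordinal()]; }
    static String pwdAt(Lengths l) { return passWd[l.ordinal()]; }
    static String pwdBAt(Lengths l) { return passWdB[l.ordinal()]; }

    // Hypothetical stand-in for the panel's OK handler: every error, empty on success.
    static List<String> validate(String n, String p1, String p2) {
        List<String> errs = new ArrayList<>();
        if (n.length() < MIN) errs.add(NAME_SHORT);
        if (n.length() > MAX_NAME) errs.add(NAME_LONG);
        if (p1.length() < MIN) errs.add(PWD_SHORT);
        if (p1.length() > MAX_PWD) errs.add(PWD_LONG);
        if (p2.length() < MIN) errs.add(CONF_SHORT);
        if (p2.length() > MAX_PWD) errs.add(CONF_LONG);
        if (!p1.equals(p2)) errs.add(MISMATCH);
        return errs;
    }

    // One row of the vector table: three inputs and the errors expected, in order.
    static final class Row {
        final String n, p1, p2;
        final List<String> expected;
        Row(String n, String p1, String p2, String... expected) {
            this.n = n; this.p1 = p1; this.p2 = p2; this.expected = List.of(expected);
        }
    }

    static final List<Row> ROWS = List.of(
        new Row(nameAt(Lengths.aChar), pwdAt(Lengths.aChar), pwdAt(Lengths.aChar), NAME_SHORT, PWD_SHORT, CONF_SHORT),
        new Row(nameAt(Lengths.minM1), pwdAt(Lengths.min), pwdAt(Lengths.min), NAME_SHORT),
        new Row(nameAt(Lengths.min), pwdAt(Lengths.minM1), pwdAt(Lengths.min), PWD_SHORT, MISMATCH),
        new Row(nameAt(Lengths.min), pwdAt(Lengths.min), pwdAt(Lengths.minM1), CONF_SHORT, MISMATCH),
        new Row(nameAt(Lengths.min), pwdAt(Lengths.min), pwdAt(Lengths.min)),            // success
        new Row(nameAt(Lengths.max), pwdAt(Lengths.max), pwdAt(Lengths.max)),            // success
        new Row(nameAt(Lengths.maxP1), pwdAt(Lengths.max), pwdAt(Lengths.max), NAME_LONG),
        new Row(nameAt(Lengths.max), pwdAt(Lengths.maxP1), pwdAt(Lengths.max), PWD_LONG, MISMATCH),
        new Row(nameAt(Lengths.max), pwdAt(Lengths.max), pwdBAt(Lengths.max), MISMATCH),
        new Row(nameAt(Lengths.max), pwdAt(Lengths.maxP1), pwdAt(Lengths.maxP1), PWD_LONG, CONF_LONG));

    // 1-based numbers of the rows whose actual errors differ from the expected ones.
    static List<Integer> failingRows() {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < ROWS.size(); i++) {
            Row r = ROWS.get(i);
            if (!validate(r.n, r.p1, r.p2).equals(r.expected)) bad.add(i + 1);
        }
        return bad;
    }
}
```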

For symmetry, I suppose name, passWd and passWdB could include a second set of strings that incorporate unacceptable characters, with 5 more enums to pick them (or an “un” value to add…), or there should be three more arrays named unName, unPassWd and unPassWdB.

Point is, there are a lot of different error messages implied by this relatively simple panel, and we need to be certain that it behaves correctly in each and every build.

Generally, I think it’s a good thing to get each error by itself from each possible position, but not to get too worked up about combinatorials. Walking a too-short or too-long string through the three inputs is fine, but having to wrangle TWO too-long or too-short strings, or one of each, is probably overkill. It’s worth having ALL the inputs be errors at once, to be sure that arbitrary collections of errors work together, and then you’re done, unless there’s some reason to think errors interact.
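The policy in the last paragraph can be generated mechanically: each error alone in each position, then every input bad at once, and no pairwise combinations. A sketch, with the good and bad values passed in; walking a blank through valid data is the case where the bad value is the empty string. `ErrorWalk` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ErrorWalk {
    // For each position, one vector per bad value with only that input bad,
    // then one vector with every input bad (using each position's first bad value).
    // Each position needs at least one bad value.
    static List<String[]> vectors(String[] good, String[][] badPerPosition) {
        List<String[]> out = new ArrayList<>();
        for (int pos = 0; pos < good.length; pos++) {
            for (String bad : badPerPosition[pos]) {
                String[] v = Arrays.copyOf(good, good.length);
                v[pos] = bad;
                out.add(v);
            }
        }
        String[] allBad = new String[good.length];
        for (int pos = 0; pos < good.length; pos++) allBad[pos] = badPerPosition[pos][0];
        out.add(allBad);
        return out;
    }
}
```

With three inputs and two bad values each (too short, too long), that is 3 × 2 + 1 = 7 vectors, instead of the dozens a full combination would need.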