Software Test Methods, Levels, quiz question answers


Quiz questions about software test. My answers are probably longer than was hoped for, but specific, and most important, true and demonstrable.

1) What is the difference between functional testing and system testing?

2) What are the different testing methodologies?

1) System test is the equivalent of actual customers/users using the product. Carried out as if in the real world, with a range of detailed configurations, simulation of typical users working in typical way. It is one level of abstraction above Functional testing. Functional Test verifies that the product will do functions which it is intended to do. Play, rewind, stop, pause, fast forward. +, -, x, /, =.  Functional Tests must be drawn from the Requirements documents. System Test checks that a product which meets those requirements can be operated in the real world to solve real problems. Put another way, System test proves that the requirements selected for the product are correct.

This makes one wonder why engineers don’t do system test on the requirements before creating the design and code… mostly because its hard to do, and they’re sure they understand what the requirements should be, I suppose. I’ve never seen it done in depth.

 

2) “the different testing methodologies” seems over-determined. The following are ‘some’ different testing methods. There may be others.

Perhaps the intent of the question is to expose a world divided into White Box and Black Box testing, which are different from each other. But there are other dichotomies, in addition to White Box and Black Box.

Software testing methods divide into two large classes, Static and Dynamic. Static testing looks at source code, dynamic testing requires executable programs and runs them. Another division is between Using a Tool that evaluates source code and and Checking Program Output. Within either set of large groups are smaller divisions, Black Box and White Box (and Clear Box and Gray Box) are all divisions of Dynamic or Checking Output methods.  Specific methods within the large groups include

  • running source code through a compiler
  • running a stress test that consumes all of a given resource on the host
  • running a tool that looks for memory allocation and access errors
  • doing a clean install on a customer-like system and then running customer-like activities and checking their output for correctness.

Orthagonal to all of the above, Manual Test and Automated Test are infastructure-based distinctions, Automated tests may be Black Box, Unit, running a tool, checking output, or any other methodology. Manual and Automated are meta-methods.

 

Static Software Test Methods: Similar to, but not exactly the same as Tool Using Methods, to find problems in software source code.

2.1) Compile successfully, no errors or warnings. This is the first step before inspection, since nothing is better or cheaper at finding compiler problems than the compiler.

2.2) Inspection and code review, to see if the code is written to the standards that the organization enforces. I like and use code reviews, the formal Fagan system, and less formal “extreme programming” techniques like having a second person review all diffs or do a walk through with two people at the workstation. They work. The standards inspected for are usually helpful in preventing bugs or making them visible. Just looking usually improves product quality – the Western Electric effect if nothing else.

There may be some insight into product requirements and how the code meets them in a review. But the reviewers would need to know the requirements and the design of the software in some detail. Its difficult enough to get the code itself to be read. In Engineering Paradise, I suppose the requirements are formally linked to design features, and features to data and code that operates on that data, to create the feature.

2.3) Static analysis. Besides passing compiler checks without errors or warnings, there are static analysis tools, “lint” for example, that can inspect code for consistency with best practices and deterministic operation. Coverity, and others, have commercial products that do static test on source code.

2.4) Linking, loading. The final static events are linking the code and libraries required to complete the application, and writing a usable file for the executable, which the loader will load.

Dynamic Software Test Methods:

2.5) Memory access / leakage software test. Rational/IBM’s Purify, like ValGrind and BoundsChecker, run an instrumented copy of the source code under test to see memory problems in a dynamic environment. Its run and the results should be checked and responded to before a large investment in further  Dynamic testing should happen.

2.6) Performance test. Measuring resources consumed, obviously time, possibly others, during repeatable, usually large-scale, operations, similar to System or Load tests. Generic data, from development testing, is necessary and may be shipped as an installation test to users. Proprietary data, under a NDA (non-disclosure agreement), may also be needed, for complex problems ans/or important customers. In normal operation, the actual outputs are not looked at, at most, spot-checked, and the tool(s) keeping track of resources are the basis of pass/fail.

2.7) Installation Test. Typically a subset of in-house performance tests, with optional, generic, data. The performance recorded is comparable between releases, instances, configurations, sites, customers, and the software maker’s own in-house performance tests. Customers can use Installation tests to verify their hardware/software environment, benchmark it, evaluate new purchases for their environment, etc.

 

Checking Program Output Methods:

After tool based dynamic testing, the rest of Dynamic software test is based on running the product with specific inputs and checking the outputs, in detail.

Checking can be done with with exit status, stack traces,”assert()”, exceptions, diffing large output files against ‘gold’ references, log searches, directory listings, searching for keywords in output streams indicating failure or incorrect operation, checking for expected output and no other, etc. No test failures are acceptable. Each test must be deterministic, sequence independant, and (ideally) can run automatically. No judgement required for results. All require running the program.

2.8) Unit tests of pieces of the a product, in isolation, with fake/simulated/mock resources. A great bottom-up tool for verifying software. At the unit test level is where knowledge of the code is most important to testing. It is white box/clear box, with full insight into the code under test. One explicit goal of unit test should be forcing all branches in the code to be executed. That can’t be done without allowing visibility into the code.

2.9) Integration Test. The next level above unit test, the tests of code which calls code which calls code… and the code above that! The point is that integration is where code from different groups, different companies, different points in time, certainly different engineers, comes together. Misunderstanding is always possible. Here’s one place it shows up. Visibility into the code is getting dimmer here. Some tests are more functional, if a subsystem contains complete, requirement-satisfying functions.

2.10) Functional Test. Verifying that the product will do functions which it is intended to do. Play, rewind, stop, pause, fast forward. +, -, x, /, =.  Tests here should be drawn from the Requirements documents. Things that should be tested here should start in the Requirements docs. Each requirement has to be demonstrated to have been met. Its black-box testing, run from the interface customers use, on a representative host, with no insight into the internals of the product. Unless the requirements specify low level actions.

Its not particularly combinatorial- a short program, a long program, 2+2, 1/-37. Pat head. Rub belly. Walk, Not all 3 at once.

If a word-processor has no stated limit for document size, you need to load or make a really big file, but, truly, that’s a bad spec. A practical limit of ‘n’ characters has to be agreed as the maximum size tested-to. Then you stop.

All these Tests should be drawn from the Requirements documents. Things that should be tested here should start in the Requirements docs.

All that Verification is good, but what about Validation?

Unit test,  Integration test, or Functional Test, is where Validation, proving correctness of the design, might happen. Validation test is where deep algorithms are fully exercised, broad ranges of input are fully exercised, Tests that include all possible numerals, all possible characters, all defined whitespace, read in or written out. Numbers from MinInt to MaxInt, 0 to MaxUnsigned, the full range of Unicode characters, etc., etc., are exercised.

(Errors in input numbers should be seen in System test anyway, but accepting a wide range goes here.) This is not always done very formally, because most modern code environments don’t need it. But someone ought to look at least once.

L10n (Localization) and I18n (Internationalization) that need to be selected at link time or run time can be checked here too.
This is also where path-length limits, IPv-6 addresses, etc. should be checked.

2.11) User interface test verifies the controls and indicators that users at various levels see, hear, touch, operate and respond to. This is separate from any actual work the program may do in response. This is a high-value target for automation, since it can be complex and tedious to do UI testing in great detail by hand.

2.12) System Test. Full up use of the system. Training, white-paper and demo/marketing examples. Real-world situations reproduced from bugs or solutions provided for customers. Unless requirements included complexity, this is where the complex tests start. Huge data. Complex operations.  The range of supported host configurations, min to max, gets tested here too.

We’ll want to see all the error messages, created every possible way. We’ll want to have canned setups on file, just like a customer would, and we pour them into the product, run it, and collect the output. The set pass/fail on the output.

Somewhere between System Test and Acceptance test, the scale of pass/fail goes up another level of abstraction. Software test pass/fail results are one in the same with the product pass / fail. If data and setup are good, it should run and pass. Ship the result. If the data and/or setup have a problem, it should run and fail. The failure should propagate out to be stored in detail, but in the end this is a trinary result. Pass, Fail, Not Proven

2.13) Load test, Stress test.  Load tests go to the point that all of a resource is consumed, and adding  more activity produces no more output in real time. Resources include CPU, memory, local storage, networked storage, video memory, USB ports, maximum number of users, maximum number of jobs, maximum instances of product, etc. Stress test adds data, jobs, etc, clearly (110% or more) above load test maximum.

2.14) Stability test. Long term test. Stability test and long-term test are where a server or set of servers are started and left running, doing real work, for days, weeks, months. Some of the tests must repeat inputs and expect identical outputs each time.  Resource consumption should be checked. Its fair for the application or tool to have the node to itself, but adding other applications and unrelated users here and in the Load/Stress tests is meaningful, to avoid surprises from the field.

2.15) Acceptance test.  Customer sets-up their run-time world use of the system and uses it. Everything they would normally do. If its a repeat sale, they may just clone the previous installation. Run the previous and the new system, release, patch, etc, and compare output to installed software on machines that customer likes and trusts. If the product is a new one, acceptance means judging pass-fail from the output produced.

 

Many other kinds of test are mentioned in conversation and literature. A web search will turn up dozens. Regression test, stability test, in the sense that a new code branch is stable, sanity test and smoke test are all forms of testing but usually, in my experience, consist of subsets of the test levels/methods listed above.

A Smoke test (run the product, make sure it loads and runs, like a hardware smoke test where you apply power, turn it on and see if any smoke comes out…) can be made from the first steps of several different methods/levels named above. If the Smoke test is more than simply running the program once, then it should probably be some part of one of the other methods/levels. Or to put it another way, the work that goes into setting up the smoke test should be shared/captured. There might be a ..test/smoke/… directory, but the contents should be copied from somewhere else.

A Sanity test, a Stability test and Regression tests are successively larger swaths, at lower and lower levels, of the System, Performance, User Interface, Functional, etc. tests. They should be specified and are not embarrassing, but their content should be drawn from or reflected by those larger level-based tests. The should not be original and alone.

What do you think?

“Testing – How does one learn QA?” – An answer I posted on the StackOverflow “Programers” forum


Ziv, the questioner asks: ” … how would one proceed if he wants to learn QA?

More specifically, a programmer who wants to learn about the QA process and how to manage a good QA methodology. I was given the role of jumpstarting a QA process in our company and I’m a bit lost, what are the different types of testing (system, integration, white box, black box) and which are most important to implement first? How would one implement them?

I wrote:

There are simple rules of thumb.

Try what the manual says. Install and run on a clean target, user license, the works. Does it work? Did you have to add anything not covered in the manual?

Are all the default control values usable? Or is there something that’s wrong, or blank, by default and always has to be changed?

Set every value in the user interface to something other than its default. Can you detect a difference caused by the change? Is it correct? Do them one at a time, or in the smallest sets possible, to make the results clear.

Set every value in the user interface to a second, non-default, value. Change everything at once. Can you detect the difference? Is it correct?

One by one, do something to cause every error message to be generated. Do something similar, but correctly, so that no error message is generated.

All of the above depend on changing a condition, between an “A” case and a “B” case, and that change having a detectable result. Then the “C” case produces another change, another result, and so forth. For 10 tests, you need 11 conditions. Using defaults as much as possible is a good first condition.

By now you’ve got a list of things to test, that you recorded, and results, that you recorded, and maybe some new bugs. Throw something big and complicated at the solution. Give it a file of 173000 words to sort, paste a Jane Austin novel or some telecommunications standard 100 pages long, a 50MB bitmap graphic, 3 hours of streaming video. Open the performance monitor and get CPU-bound, or I/O bound. For an hour. Check memory use: always increasing? Rises and falls?

Take the list of bugs closed in the last week, month, sprint, etc. Check them. All. Are they really fixed?

Keep track of what to do, how it worked on what version/release/build/configuration, open and closed bugs, what controls have been set or changed, what data, test files or examples have been used, etc. is all part of Quality world. Keep results as tables in a spread sheet, make version controlled backups / saves.

Someone writing software, or any one creating anything, has an idea of what they’re trying to make. The quality process starts with expectations. Requirements, specifications, rules, or another articulation of what’s expected. Then there’s the solution, the thing offered to perform, assist, enable or automate what’s expected. Then there are tests, operations, examples, inspections, measurements, questionnaires, etc., to relate one or more particular solution(s) to (relevant) expectations. Finally, there’s an adjustment, compensation, tuning, correction or other positive action that is hoped to affect the solution(s).

When one writes software, one has a goal of it doing something, and to the extent that’s expressed, the behavior can be checked. Hello.exe displays “Hello World” on a screen. “2**150″ in the Python interpreter displays, “1427247692705959881058285969449495136382746624L”. Etc. For small problems and small solutions, its possible to exhaustively test for expected results. But you wouldn’t test a word processor just by typing in some words, or even whole documents. There are limits of do-ability and reason. If you did type in all of “Emma” by Jane Austin, would you have to try her other four novels? “Don Quixote” in Spanish?

Hence an emphasis on expectations. Meeting expectations tells you when the solution is complete. My web search for “Learn Quality Assurance” just returned 46 million potential links, so there’s no shortage of opinions. Classic books on the subject (my opinion, worth what you paid for it:) include

  • Quality is Free” by Philip Crosby,
  • Zen and the Art of Motorcycle Maintenance” by Robert Pursig
  • Managing the Software Process” by Watts Humphrey
  • “The Mythical Man Month” by Fred Brooks
  • Code Complete” by Steve McConnell

Take 5 minutes to read some of the Amazon reviews of those books and you’ll be on your way. Get one or more and read them. They’re not boring. Browse ASQ, Dr. Dobbs, Stack Overflow. Above all, just like writing software. DO it. Consider the quality of some software under your control. Does it meet expectation? If so, firm hand-shake and twinkle in the eye. Excellent!. If not, can it be corrected? Move to the next candidate.

I like the Do-Test-Evaluate-Correct loop, but its not a Universal Truth. Pick a process and follow it consciously. Have people try the testing, verification and validation steps described in the language manual they use most frequently. Its right there on their desk, or in their phone’s browser.

Look at your expectations. Are they captured in a publicly known place? With revision control? Does anyone use them? Is there any point where the solutions being produced are checked against the expectations they are supposed to be meeting?

Look at your past and current bug reports. (You need a bug tracking system. If you don’t have one, start there.) What’s the most common catastrophic bug that stops shipment or requires an immediate patch? Whats the most commonly reported customer bug? What’s the most common bug that doesn’t get fixed?

Take a look at ISO 9000 process rules. Reflect on value to your customers/users. Is there’s a “customer value statement” that explains how some change affects the customer’s perception of the value of the solution? How about in the requirements?

By “the QA process”, you could mean “Quality Assurance”, versus “QC”, “Quality Control”? You might start with the http://www.ASQ.org web site, where the “American Society for Quality” dodges the question by not specifying “Control” (their old name was “ASQC”) or “Assurance”.

Quality; alone, “assured” or “controlled”, is a big idea with multiple, overlapping definitions and usages. Some will tell you it cannot be measured in degrees- its present or not, no “high quality” or “low quality” for them. Another famous claim is that no definition is satisfactory, so its good to talk about it, but avoid being pinned down in a precise definition. How do you feel about it?

 

The original posting is at http://programmers.stackexchange.com/questions/255583/how-does-one-learn-qa/255595#255595

A sad, harsh, somewhat funny (ironically) reminder that all is not well, and never will be. Sheesh.


A reader offered the following comment, on a small technical article, here. I checked out his site and its pretty consistent. I choose to regard it as a cry for help, but that could be overgenerous. Count your blessings, friends. The power of youth and confusion are always around the corner…  and yet, I neither wanted to just delete this or pass it along as sent, you know?

[Edited (###### added) by me]

 

New comment waiting approval on bill abbott’s weblog

###########  commented on SPOILER ALERT! More about the NetBeans Anagram game

If you want to read about the NetBeans Anagram.java program but do NOT want a hint, just scroll down past this post to the one …

f##k you bill, you suck, monkey

Approve  Trash | Mark as Spam

More information about ########

IP: 83.244.###.### 83-244-###-###-cust-83.exponential-e.net
E-mail: #############@gmail.com
URL: http://sosickofthislife.blogspot.com
Whois: http://whois.arin.net/rest/ip/83.244.###.###

Image

Father’s day tides at Moss Beach:


Here’s the tide table for this coming weekend at Moss Beach, just north of Princeton By The Sea, at the north edge of Half Moon Bay. High tide, +6 feet, at Midnight between Friday and Saturday, 1:00am between Saturday and Sunday. Low, low, tides at 7:00am, -1.5 feet!! on Saturday, -1.25 feet, at 7:48am, Sunday.
So, by crackie, we’ll be there as early as we an on Sunday. Sunrise is before 6:00am, so no shortage of light. Do a web search and you’ll discover this place has the best tidepools that ever existed- perhaps 1/4 mile or more along the coast, as much as 200 yards off shore of the normal high tide mark. A huge shelf of very low quality rock, normally around or perhaps a bit below the 0 foot level, that will be a good foot above sea level on Sunday Morning.

Ha! I now have a trivial python program the works interactively! momdad.py:


“””
“””

import os

print “os.listdir(os.getcwd())”
print os.listdir(os.getcwd())

for fname in os.listdir(os.getcwd()):
print fname
text=open(fname).read()
print “text.count( hi mom )”
print text.count(‘hi mom’)

Link

A philosophic reason to sets pointers to NULL after free’ing them.


A philosophic reason to sets pointers to NULL after free’ing them.

On reflection, I think I got this one right, and I kinda like it! I have experienced other people’s double-free errors, surprisingly, but the value of crashing-on-access-via-null is persuasive to me. Making double-free happen silently and without error is a small cost for positively crashing on a rogue access through a freed pointer.

Hawker Hurricane Mk I, 4/27/1939-6/5/1940 dH 2-position prop, “B” pattern camo, starboard profile, 1/2 white underside, v.10


<Hawker Hurricane Mk I, 4/27/1939-6/5/1940 dH 2-position prop, "B" pattern camo, starboard profile, 1/2 white underside, v.10

Mid-production Hawker Hurricane, Mk I, with Rotol constant-speed prop and a Spitfire spinner (the dedicated Hurricane spinner arrived in late 1940). This is what one would look like if it was painted all together, and they stopped before applying the national and service markings. Just “B” pattern camouflage on top, Dark Green and Dark Earth. Starboard side white, port side black, underneath, meeting at the centerline. Nothing of the original “Aluminum” finish remains outside, but the wheel wells and inside of the undercarriage doors might well be “Aluminum”, still.

The “A” and “B” camouflage patterns were mirror images, so this starboard, “B” scheme, is the same as a port-side “A” scheme, except reversed left to right
The black/white underside recognition features was ordered in August 1938. While the requirement for black under the port wing was made known fairly readily, how to treat the starboard wing, horizontal stabilizers and fuselage underside was somewhat to very unclear between 8/38 and 4/39. The intent was to have the port side black up to the center of the fuselage, and the starboard side white, up to the center of the fuselage, as this drawing and its companion show.