Gauging Software Readiness With Defect Tracking
In the competitive commercial software market, software companies feel compelled to release software the moment it is ready. Their task is treacherous, treading the line between releasing poor-quality software early and high-quality software late. A good answer to the question, "Is the software good enough to release now?" can be critical to a company’s survival. The answer is sometimes based on gut instinct, but several techniques can put this judgment on a firmer footing.
One of the easiest ways to judge whether a program is ready to release is to measure its defect density—the number of defects per line of code. Suppose that the first version of your product, GigaTron 1.0, consisted of 100,000 lines of code, that you detected 650 defects prior to the software’s release, and that 50 more defects were reported after the software was released. The software therefore had a lifetime defect count of 700 defects, and a defect density of 7 defects per thousand lines of code (KLOC).
Suppose that GigaTron 2.0 consisted of 50,000 additional lines of code, that you detected 400 defects prior to release, and another 75 after release. The total defect density of that release would be 475 total defects divided by 50,000 new lines of code, or 9.5 defects per KLOC.
Now suppose that you’re trying to decide whether GigaTron 3.0 is reliable enough to ship. It consists of 100,000 new lines of code, and you’ve detected 600 defects so far, or 6 defects per KLOC. Unless you have a good reason to think that your development process has improved with this project, your experience should lead you to expect between 7 and 10 defects per KLOC, or roughly 700 to 1000 lifetime defects for 100,000 lines of code. The number of defects you should attempt to find will vary depending on the level of quality you’re aiming for. If you want to remove 95 percent of all defects before shipping, you would need to detect somewhere between about 650 and 950 pre-release defects. With only 600 defects detected so far, this technique suggests that the product is not quite ready to ship.
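The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration of the calculation described above; the function names are made up for the example, and the inputs are the GigaTron numbers from the text.

```python
def defect_density(total_defects, lines_of_code):
    """Return lifetime defects per thousand lines of code (KLOC)."""
    return total_defects / (lines_of_code / 1000)


def prerelease_target(density_per_kloc, lines_of_code, removal_rate):
    """Return how many defects to find before release for a given removal rate."""
    expected_lifetime_defects = density_per_kloc * lines_of_code / 1000
    return expected_lifetime_defects * removal_rate


# Historical releases: pre-release plus post-release defects, and new lines of code.
density_v1 = defect_density(650 + 50, 100_000)
density_v2 = defect_density(400 + 75, 50_000)

# GigaTron 3.0: 100,000 new lines, 7 to 10 defects per KLOC expected, 95 percent removal.
target_low = prerelease_target(7, 100_000, 0.95)
target_high = prerelease_target(10, 100_000, 0.95)

print(f"GigaTron 1.0: {density_v1:.1f} defects per KLOC")
print(f"GigaTron 2.0: {density_v2:.1f} defects per KLOC")
print(f"GigaTron 3.0 pre-release target: {target_low:.0f} to {target_high:.0f} defects")
```

The low end of the target works out to 665, which the text rounds to about 650. Either way, 600 detected defects falls short of the range.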
The more historical project data you have, the more confident you can be in your pre-release defect density targets. If you have data from only two projects and the range is as broad as 7 to 10 defects per KLOC, that leaves a lot of wiggle room for an expert judgment about whether the third project will be more like the first or second. But if you’ve tracked defect data for 10 projects and found that their average lifetime defect rate is 7.4 defects per KLOC with a standard deviation of 0.4 defects per KLOC, you have a great deal of guidance indeed.
Another simple defect prediction technique is to separate defect reports into two pools. Call them Pool A and Pool B. You then track the defects in these two pools separately. The distinction between the two pools is arbitrary. You could put all the defects discovered on Mondays, Wednesdays, and weekends into Pool A, and the rest of the defects into Pool B. Or you could split your test team down the middle and put half of their reported defects into one pool, half into the other. It doesn’t really matter how you make the division as long as both reporting pools operate independently and both test the full scope of the product.
Once you create a distinction between the two pools, you track the number of defects reported in Pool A, the number in Pool B, and—here’s the important part—the number of defects that are reported in both Pool A and Pool B. The number of unique defects reported at any given time is:
DefectsUnique = DefectsA + DefectsB - DefectsA&B
The number of total defects can then be approximated by the simple formula:
DefectsTotal = ( DefectsA * DefectsB ) / DefectsA&B
If the GigaTron 3.0 project has 400 defects in Pool A, 350 defects in Pool B, and 150 of the defects in both pools, the number of unique defects detected would be 400 + 350 - 150 = 600. The approximate number of total defects would be 400 * 350 / 150 = 933. This technique suggests that there are approximately 333 defects yet to be detected (about a third of the estimated total defects); quality assurance on this project still has a long way to go.
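The two pooling formulas are easy to script. The sketch below is one way to do it; the function name and the decision to reject a zero overlap are assumptions made for the example, not part of the technique as described.

```python
def pooled_estimate(defects_a, defects_b, defects_both):
    """Return (unique, estimated_total, estimated_remaining) for two defect pools."""
    if defects_both == 0:
        # With no overlap, the total-defects formula would divide by zero.
        raise ValueError("the pools share no defects, so no estimate is possible")
    unique = defects_a + defects_b - defects_both
    estimated_total = defects_a * defects_b / defects_both
    return unique, estimated_total, estimated_total - unique


# GigaTron 3.0: 400 defects in Pool A, 350 in Pool B, 150 reported in both.
unique, estimated_total, estimated_remaining = pooled_estimate(400, 350, 150)

print(f"Unique defects detected: {unique}")
print(f"Estimated total defects: {estimated_total:.0f}")
print(f"Estimated defects remaining: {estimated_remaining:.0f}")
```

Note how sensitive the estimate is to the overlap: the smaller the number of defects found by both pools, the larger the estimated total becomes.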
Defect seeding is a practice in which defects are intentionally inserted into a program by one group for detection by another group. The ratio of the number of seeded defects detected to the total number of defects seeded provides a rough idea of the proportion of unseeded (indigenous) defects that have been detected.
Suppose on GigaTron 3.0 that you intentionally seeded the program with 50 errors. For best effect, the seeded errors should cover the full breadth of the product’s functionality and the full range of severities—ranging from crashing errors to cosmetic errors.
Suppose that, at a point in the project when you believe testing to be almost complete, you look at the seeded defect report. You find that 31 seeded defects and 600 indigenous defects have been reported. You can estimate the total number of indigenous defects with the formula:
IndigenousDefectsTotal = ( SeededDefectsPlanted / SeededDefectsFound ) * IndigenousDefectsFound
This technique suggests that GigaTron 3.0 has approximately 50 / 31 * 600 = 968 total defects.
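As a quick check on the seeding arithmetic, here is the same formula as a small Python function. The function name and the guard against finding zero seeded defects are assumptions added for the example.

```python
def seeding_estimate(seeded_planted, seeded_found, indigenous_found):
    """Estimate total indigenous defects from the seeded-defect detection ratio."""
    if seeded_found == 0:
        # No seeded defects found means the detection ratio cannot be computed.
        raise ValueError("no seeded defects found, so no estimate is possible")
    return seeded_planted / seeded_found * indigenous_found


# GigaTron 3.0: 50 defects seeded, 31 of them found, 600 indigenous defects found.
indigenous_total = seeding_estimate(50, 31, 600)
indigenous_remaining = indigenous_total - 600

print(f"Estimated total indigenous defects: {indigenous_total:.1f}")
print(f"Estimated indigenous defects remaining: {indigenous_remaining:.1f}")
```

The unrounded estimate is about 967.7 defects, which leaves roughly 368 indigenous defects still undetected.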
To use defect seeding, you must seed the defects prior to the beginning of the tests whose effectiveness you want to ascertain. If your testing uses manual methods and has no systematic way of covering the same testing ground twice, you should seed defects before that testing begins. If your testing uses fully automated regression tests, you can seed defects virtually any time to ascertain the effectiveness of the automated tests.
A common problem with defect seeding programs is forgetting to remove the seeded defects. Another common problem is that removing the seeded defects introduces new errors. To prevent these problems, be sure to remove all seeded defects prior to final system testing and product release. A useful implementation standard for seeded errors is to require them to be implemented only by adding one or two lines of code that create the error; this standard ensures that you can remove the seeded errors safely by simply removing the erroneous lines of code.
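One way to follow that standard is to tag every seeded line with a marker so that removal is mechanical. The sketch below assumes a made-up marker comment, `# SEEDED-DEFECT`, and a hypothetical `apply_discount` function; neither comes from any particular tool, and a real project would pick its own convention.

```python
SEED_MARKER = "# SEEDED-DEFECT"  # hypothetical tag placed on every seeded line


def strip_seeded_defects(source):
    """Return the source text with every line carrying the seed marker removed."""
    kept = [line for line in source.splitlines() if SEED_MARKER not in line]
    return "\n".join(kept) + "\n"


# The seeded error is a single added line, so deleting that line restores the code.
seeded_source = '''\
def apply_discount(price, percent):
    discounted = price * (1 - percent / 100)
    discounted = price  # SEEDED-DEFECT 017: discount silently ignored
    return discounted
'''

clean_source = strip_seeded_defects(seeded_source)
print(clean_source)
```

Because each seeded error only adds lines and never modifies existing ones, stripping the tagged lines cannot disturb the original code. A final search for the marker before release guards against forgetting one.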
A colleague of mine recently added several hundred lines of code to an existing program in one sitting. The first time he compiled the code, he got a clean compile with no errors. His initial coding appeared to be flawless. When he tried to test the new functionality, however, he found that it didn’t exist. When he reexamined his new code, he found that his work had been embedded within the scope of a preprocessor macro that deactivated the new code. When he moved the new code outside the scope of the macro, it produced the usual number of compiler errors.
With software defects, no news is usually bad news. If the project has reached a late stage with few defects reported, there is a natural tendency to think, "We finally got it right and created a program with almost no defects!" In reality, no news is more often the result of insufficient testing than of superlative development practices.
Some of the more sophisticated software project estimation and control tools contain defect modeling functionality that can predict the number of defects you should expect to find at each point in a project. By comparing defects actually detected to the number predicted, you can assess whether your project is keeping up with defect detection or lagging behind.
Evaluating the combination of defect density, defect pools, and defect seeding will give you more confidence than you could have in any of the techniques individually. Examining defect density alone on GigaTron 3.0 suggested that you should expect 700 to 1000 total lifetime defects, and that you should remove 650 to 950 before product release to achieve 95 percent pre-release defect removal. If you had detected 600 defects, the defect density information alone might lead you to declare the product "almost ready to ship." But defect pooling analysis estimates that GigaTron 3.0 will produce approximately 933 total defects. Comparing the results of those two techniques suggests that you should expect a total defect count toward the high end of the defect density range instead of the low end. Because the defect seeding technique also estimates a total number of defects in the 900s, GigaTron 3.0 appears to be a relatively buggy program.
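To see the three techniques side by side, you can compute what fraction of each estimated total the 600 detected defects represent. The sketch below uses only the GigaTron 3.0 figures from this article; the function name and the dictionary labels are illustrative.

```python
def removal_so_far(detected, estimated_total):
    """Return the fraction of the estimated total defects detected so far."""
    return detected / estimated_total


detected = 600

# Estimated lifetime defect totals for GigaTron 3.0 from each technique.
estimates = {
    "defect density (low)": 7 * 100,      # 7 defects per KLOC * 100 KLOC
    "defect density (high)": 10 * 100,    # 10 defects per KLOC * 100 KLOC
    "defect pooling": 400 * 350 / 150,
    "defect seeding": 50 / 31 * 600,
}

for technique, estimated_total in estimates.items():
    fraction = removal_so_far(detected, estimated_total)
    print(f"{technique:22s} total ~{estimated_total:5.0f}  detected so far {fraction:.0%}")
```

Under the pooling and seeding estimates, the project has found only a little over 60 percent of its defects, well short of a 95 percent removal goal.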
The popular literature doesn’t have much to say about defect prediction. A notable exception is Glenford Myers’ 1976 book, Software Reliability (Wiley). Lawrence H. Putnam and Ware Myers discuss the specific topic of defect modeling at some length in Measures for Excellence (Yourdon Press 1992).