Code Coverage: A misleading metric

Code coverage measures how much of your code is executed while running tests. You could of course also include manual integration tests in the measurement, but it usually refers to automated unit tests.
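
To make this concrete, here is a minimal sketch (the function and figures are invented for illustration). Run under a coverage tool such as coverage.py, the test below executes only the first branch, so the surcharge line shows up as uncovered:

```python
import unittest

def shipping_cost(weight_kg: float) -> float:
    """Flat rate up to 10 kg, surcharge per extra kg above that."""
    if weight_kg <= 10:
        return 5.0
    return 5.0 + (weight_kg - 10) * 0.5  # never executed by the test below

class ShippingCostTest(unittest.TestCase):
    def test_light_parcel(self):
        # Only the first branch runs, so a coverage tool would
        # report the surcharge line as uncovered.
        self.assertEqual(shipping_cost(4), 5.0)

if __name__ == "__main__":
    unittest.main()
```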

So this is actually a pretty useful metric. I’ve worked in projects with close to 100% code coverage and in projects close to 0%. Writing code in a project with high code coverage is great because you have a safety net when changing the code. This doesn’t mean you shouldn’t still analyze the impact of your changes, but it gives you more confidence to perform required code changes.

Test coverage is useful for finding untested code. Analyzing coverage helps you find which pieces of code aren’t being tested. Since every piece of code not covered by tests is a potential source of errors that go undetected, and since it is well known that the earlier a bug is found the cheaper it is to fix, having 100% coverage really does sound great!

In a new project without tons of legacy code, I do expect a high code coverage. The problem with code coverage comes from the fact that your management may not only expect high coverage but actually require a certain level of it. And whenever a certain level of coverage becomes a target, developers try to reach it. The trouble is that this encourages them to bend over backwards, writing poor tests just to meet the goal. It leads to writing tests for their own sake.

It is actually easy to write tests that cover code without checking its correctness: just write unit tests that run the code but don’t contain any asserts. Obviously, code coverage does not tell you what code was tested, only what code was run. So high coverage numbers don’t necessarily mean much. High code coverage tells you that you have a lot of tests, but it doesn’t tell you how good your code is. Your code can be full of bugs and still have 100% test coverage. Pure coverage figures don’t tell you how important the code is, how complex it is, or how good the tests are. High code coverage can only lead to good code if the tests behind it are good. With good tests, reaching high coverage means the errors the tests reveal get fixed along the way. But with poor tests, even crappy code can make it to 100% coverage.
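
Here is a deliberately silly sketch (the function and test are invented) of a test that achieves full coverage while verifying nothing:

```python
import unittest

def apply_discount(price: float, rate: float) -> float:
    # Bug: returns the discount amount instead of the discounted price.
    return price * rate

class ApplyDiscountTest(unittest.TestCase):
    def test_runs_without_asserting(self):
        # Executes every line of apply_discount -> 100% coverage,
        # but checks nothing, so the bug goes unnoticed.
        apply_discount(100.0, 0.2)

if __name__ == "__main__":
    unittest.main()
```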

So having 100% test coverage (or anything close to it) gives a false sense of confidence and of the robustness of a project. That’s why you should not make 100% test coverage the focus. Mandating a minimum coverage below which your automated tests would not really be helpful is probably fine, but defining a high target coverage threshold is useless.

Another issue with a forced march to high code coverage is that writing many tests for a poorly designed code base will likely increase the cost of changing the design later on. If simple changes to the code cause disproportionately large changes to the tests, i.e. if you need to fix numerous tests for each change you make in the code, it’s either a sign that there’s a problem with the tests or that your whole design is shaky.
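
As a contrived example (all names invented), compare a test that pins down how the code works internally with one that only checks the outcome. The first breaks on every refactoring; the second survives it:

```python
import unittest
from unittest import mock

class OrderService:
    def __init__(self, repository):
        self.repository = repository

    def total(self, order_id):
        items = self.repository.find_items(order_id)
        return sum(item["price"] for item in items)

class OverSpecifiedTest(unittest.TestCase):
    def test_total_pins_down_internals(self):
        repo = mock.Mock()
        repo.find_items.return_value = [{"price": 3.0}, {"price": 4.0}]
        OrderService(repo).total(42)
        # Asserts *how* the result is obtained: renaming find_items or
        # batching the lookup breaks this test even if total() still
        # returns the right amount.
        repo.find_items.assert_called_once_with(42)

class BehavioralTest(unittest.TestCase):
    def test_total_checks_the_outcome(self):
        repo = mock.Mock()
        repo.find_items.return_value = [{"price": 3.0}, {"price": 4.0}]
        # Asserts *what* the service does; internal refactorings
        # don't touch this test.
        self.assertEqual(OrderService(repo).total(42), 7.0)

if __name__ == "__main__":
    unittest.main()
```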

In such cases, having more tests is not always better; having better tests would be. Keep in mind that every test you create will eventually have to be maintained. Rather than simply writing more tests to increase code coverage, evaluate whether the maintenance cost of these additional tests justifies writing them, and whether the additional coverage really leads to more quality. Too many tests can slow you down, not because the tests take time to run (you can always move slow tests to a later stage or only run them periodically), but because excess tests written only to satisfy a coverage target drive up the cost of changes, in both money and time.

On the other hand, even though low coverage does not automatically mean your code is buggy, low coverage numbers (e.g. way below 50%) are a sign of trouble. They are not proof of bad quality, but a smell of it. In particular, a combination of low coverage and high complexity is not a good sign. Likewise, a loss of code coverage over time is a sign that code modifications are not properly reflected in the tests. But keep in mind that low automated test coverage does not imply that the software is untested: there are other ways to test software, and automated tests alone are never enough. Product testers not only test the code against formulated requirements but also probe the product for requirements that were never explicitly formulated (you have more unidentified functionality than you think), usability issues, and so on.

Many people focus on code coverage numbers because they expect to derive from them whether they are testing enough. But in many cases, you should not worry about code coverage; you should worry about writing good tests. Code coverage is great, but your end product will be measured by its functionality and its reliability, so functionality coverage is even better. Unit tests are meant to test functionality anyway: they are low-level tests, but they should still test from the perspective of the required functionality. And if your code and your functionality are not equivalent to a large extent, then you have bigger problems than what level of code coverage you’ve reached. So the goal should not be 100% code coverage, but unit tests that cover the required functionality as completely and extensively as possible.
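
Continuing the invented discount example from above, testing from the perspective of the required functionality might look like this: each test names a requirement, and full coverage of the (now fixed) function falls out as a byproduct:

```python
import unittest

def apply_discount(price: float, rate: float) -> float:
    """Return the price after applying a discount rate between 0 and 1."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return price * (1.0 - rate)

class ApplyDiscountFunctionalityTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertAlmostEqual(apply_discount(100.0, 0.2), 80.0)

    def test_no_discount_leaves_price_unchanged(self):
        self.assertAlmostEqual(apply_discount(100.0, 0.0), 100.0)

    def test_full_discount_is_free(self):
        self.assertAlmostEqual(apply_discount(100.0, 1.0), 0.0)

    def test_invalid_rate_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 1.5)

if __name__ == "__main__":
    unittest.main()
```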

Despite all the points above against requiring 100% (or very high) code coverage, the code coverage metric can actually be useful when used properly. The widespread negativity towards these metrics is often due to their misuse. 100% code coverage is meaningless without other habits and practices that ensure the quality of both code and tests. So high code coverage is just the beginning: a good starting point towards covering the actual functionality, but not a goal in itself.

Actually, you should see code coverage as a byproduct of well designed and well written tests, not as a metric indicating that the tests are well designed or well written. On the other hand, good code with a clear design is not only easier to read and less buggy but also easier to cover. So quality in both code and tests will lead to higher coverage. But requiring high code coverage will not lead to more quality in code and tests.

A way to improve the expressiveness of code coverage is to combine its measurements with other measurements such as complexity, to correlate it with information about the importance of certain parts of the code, or to incorporate information about bugs reported after release.
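
As a toy illustration (the module names, figures, and weighting are made up, not an established formula), you could rank modules by how much untested complexity they contain:

```python
# Hypothetical per-module coverage (0..1) and cyclomatic complexity
# figures, as exported by your coverage and complexity tooling.
coverage = {"billing.py": 0.35, "reports.py": 0.90, "utils.py": 0.60}
complexity = {"billing.py": 42, "reports.py": 38, "utils.py": 7}

def untested_complexity(module: str) -> float:
    # Weight each module's complexity by the share of it left uncovered.
    return complexity[module] * (1.0 - coverage[module])

# Modules with lots of complex, untested code float to the top of the
# review list; a simple, mostly-untested helper ranks low.
for module in sorted(coverage, key=untested_complexity, reverse=True):
    print(f"{module}: risk score {untested_complexity(module):.1f}")
```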
