Metrics & Measuring Performance in QA

I’ve always thought of metrics in QA as a bit of a tricky subject, as I find it difficult to identify and attach meaningful numbers to performance in a role based around providing information.
Technically, there is stuff we can quantify, but I’m dead against keeping track of statistics like personal bug counts, numbers of tests executed and so on. They bring about pointless bugs, endless raising of non-issues, and underhanded tactics all over the place, so they don’t give any true measure of an individual’s performance in the QA field. As I mentioned, the real measure of QA’s effectiveness and value is in the information they provide to their customers: the rest of their development team, the product and business teams they work with, and indeed anyone else who is a stakeholder in the work the team carries out.
Even when trying to compare the performance of one person against another, the nature of QA means that, due to different pressures, time constraints, relative state of the system under test and so on, you will never get to see different folks running the same test in exactly the same set of circumstances. So it’s unfair to use that sort of thing as a measure or comparison of performance either.
But I do understand the need to monitor performance, particularly for new hires or new additions to a team, and there are a few things I use to measure the performance, throughput and relative value of folks in QA. While many of these metrics are geared towards the performance of new members of a team, they could easily be adapted to track the progress and performance of established team members too.
Bug Quality
The general quality of bugs raised should be spot checked, with closer attention being paid to bugs raised due to issues missed in testing (indicative of areas where testing and detection methods should be improved) and bugs returned as ‘Will Not Fix’ (indicative of areas where priorities and understanding of requirements / product needs / customer needs should be improved). For new hires, I’d expect numbers of such issues to decrease over time as the QA ramps up in their new domain. Also, keep an eye open for any bug reports which fail to identify or describe incorrect system behaviour, have been assigned an inappropriately low priority, or otherwise understate the significance of a problem. These will highlight areas where coaching is required to improve understanding of the system under test.
Critical Bugs in Test vs Production
Keep an eye on the ratio of critical bugs (P2 and above) raised in Test vs Production. Customer satisfaction is the true north of quality, and if more than a handful of critical bugs are being identified post-sprint, this could be indicative of a coaching need.
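As a rough illustration of how that ratio might be pulled together, here is a minimal Python sketch. The `Bug` record, the `"test"` / `"production"` environment labels and the function names are all my own assumptions for the example, not the fields of any particular bug tracker. It also assumes priorities run from P1 (most severe) downwards, so that ‘critical’ means P1 or P2.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: P1 is the most severe priority, so "critical" means P1 or P2.
CRITICAL_PRIORITIES = {"P1", "P2"}


@dataclass(frozen=True)
class Bug:
    priority: str     # e.g. "P1", "P2", "P3"
    environment: str  # "test" or "production" (hypothetical labels)


def critical_bug_counts(bugs):
    """Return (found in test, found in production) for critical bugs only."""
    counts = Counter(
        bug.environment for bug in bugs if bug.priority in CRITICAL_PRIORITIES
    )
    return counts["test"], counts["production"]


def production_escape_rate(bugs):
    """Share of critical bugs first found in production, from 0.0 to 1.0."""
    in_test, in_production = critical_bug_counts(bugs)
    total = in_test + in_production
    return in_production / total if total else 0.0


sprint_bugs = [
    Bug("P1", "test"),
    Bug("P2", "test"),
    Bug("P2", "test"),
    Bug("P1", "production"),
    Bug("P3", "production"),  # not critical, so it is ignored
]
print(critical_bug_counts(sprint_bugs))     # (3, 1)
print(production_escape_rate(sprint_bugs))  # 0.25
```

Like every number in this article, the result is a prompt for a conversation rather than a verdict on anyone’s performance.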
Test Coverage for Applications
Whenever a new hire fills a vacancy, I’d expect to see an increase in test coverage over time. Establish the current baseline as the areas the team currently cover, and track this for increases. But, importantly, you must track for increases in areas where increases are expected. Don’t forget that, particularly with automation, there are upper limits for test coverage, so don’t make the mistake of setting a coverage target without first discussing and identifying the areas it is actually possible to cover. And of course, as well as the expected coverage levels, any timescale implemented must also be realistic, or you risk setting an unachievable target.
Load Shift
When a new QA comes on board, overall team output should increase as the new member of the team ramps up and takes on more of the testing load. This one is a bit arbitrary, and not entirely dependent on the new QA, but it’s still worth monitoring as an identifier for potential issues and bottlenecks in your team’s workflow, as well as the performance of QA.
Overall increase in Story Turnaround & Completion
Keep track of the team’s commitments for each sprint, and of how many of those commitments were delivered with a high standard of quality. Again, this isn’t always going to be directly in the hands of QA, but where a team has recently filled a vacancy, I’d expect month-on-month increases in the number of stories committed to and the percentage of commitments met, and an increase in the speed with which stories are completed. Take the current averages as a baseline, and monitor for the expected increases.
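To make the ‘baseline, then monitor’ idea concrete, here is a small Python sketch. It assumes each sprint is recorded as a simple pair of (stories committed, stories delivered); the function names and the sample figures are invented purely for illustration.

```python
from statistics import mean


def completion_rate(committed, delivered):
    """Fraction of a sprint's commitments that were delivered."""
    return delivered / committed if committed else 0.0


def change_against_baseline(baseline_sprints, recent_sprints):
    """Average completion rate of recent sprints minus the baseline average.

    Each sprint is a (committed, delivered) pair. A positive result means
    the team is meeting a larger share of its commitments than before.
    """
    baseline = mean(completion_rate(c, d) for c, d in baseline_sprints)
    recent = mean(completion_rate(c, d) for c, d in recent_sprints)
    return recent - baseline


# Invented figures: two sprints before the vacancy was filled, two after.
before = [(10, 5), (10, 7)]
after = [(10, 8), (12, 9)]
print(round(change_against_baseline(before, after), 3))  # 0.175
```

The same comparison could be run on stories committed or on story turnaround time; the point is simply to compare like with like against an agreed baseline.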
Team engagement is not a ‘numbers’ metric, but it is arguably the most important one. Are the QA team making meaningful contributions to Retrospectives? Planning & Estimation? How are they communicating the information they’re finding during the course of their work? For new team members, I’d look for their engagement to increase as they ramp up in their new domain and adapt to the team and company culture. But as QA professionals bringing a fresh pair of eyes to the team, I’d expect there to be some level of insight and engagement from the very beginning. I’d also expect the QAs to be actively involved in the solutions to any bugs / issues they raise: conferring with developers working on solutions, discussing how the fix should be retested and so on, to improve their knowledge of the system under test and its workings.

Regardless of how you decide to measure performance in QA, it is worth remembering that any metric should be used as an informational tool rather than any kind of absolute measure. The reality is that there is no substitute for getting to know what your folks are doing, the problems they encounter, how they handle those problems, and how they communicate with the people around them. These are the things that the team and your customers will be assessing their performance on, and the truest measure of success is also the simplest: ‘Is the customer happy?’

Exploratory Testing ≠ Random. Exploratory Testing = Chaos.

If you’re describing exploratory testing as anything like ‘Randomly poking about in the corners of the system’, you’re using the wrong language.

I first published this article on my LinkedIn profile about a year ago, but I’m so proud of it, I thought it was worth republishing on my own site.

Many times, I’ve heard exploratory testing described using terms alluding to randomness. I’ve almost certainly been guilty of it myself. But on reflection, this is a stance that I wholeheartedly disagree with.

I’ve mentioned on a few occasions that I feel exploratory testing is a highly specialised skill. It should be the largest and most important part of a manual testing role, and it dovetails with standardised automated checks of the system under test to provide a true assurance of quality. So if you’re describing exploratory testing as anything akin to ‘Randomly poking about in the corners of the system’, you’re using the wrong language. You’re giving the impression that you’re only going to find issues by dumb luck, and you’re doing both yourself and your profession a great disservice.

And really, any randomness in testing renders that test unreliable. There’s nothing worse than finding a big, nasty bug that seriously compromises the quality of your system and not knowing exactly how you got there. If you can’t recreate what you did, then you can’t fully prove the issue exists, you can’t systematically check the area around the bug to see if it occurs in just the way you’ve discovered or if there are other ways to trigger it, and worst of all, you can’t get anything done about it. Having a bug returned with a ‘Cannot Reproduce’ status is highly frustrating.

Software development, and especially exploratory testing, includes many elements of chaos in the mathematical sense. But it is very important not to incorrectly use chaos as a synonym for random. Edward Lorenz described chaos as: “When the present determines the future, but the approximate present does not approximately determine the future.”

In our industry’s terms, this means that the user journey through a single path of the system may be a complicated one with a large number of variables along the way. But it is still a deterministic practice where the pattern for each variable can only evolve within a limited scope, making it possible to predict the end state of the system, rather than a stochastic practice where outcomes are truly random. Therefore, with good understanding of the system under test, well written acceptance criteria, and proper insight into the path the developers will take to meet those criteria, a tester should be able to predict how the system will be affected by a change, precisely determine the automated checks they’ll need to run, and identify the areas they will need to focus on during exploratory testing.

One of the most often used examples of chaos theory is that of a butterfly flapping its wings in Brazil causing tornadoes in Texas. This is very poetic, and is almost entirely incorrect.

Chaos theory actually states that a butterfly flapping its wings can change the course, accelerate, decelerate, or otherwise greatly affect the outcome of a tornado already in effect. That man Lorenz again: “A very small change in initial conditions had created a significantly different outcome.” The butterfly does not create or power the tornado, but it causes a tiny change to the initial state of the system being examined, which cascades to incrementally bigger changes in subsequent events. It’s the theoretical flap that makes the difference between a breeze blowing itself out over the Gulf of Mexico, and an F5 tornado levelling Dallas.
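If you’d like to see that sensitivity for yourself, here is a minimal Python sketch. It uses the logistic map, a standard textbook example of a chaotic system, rather than anything from Lorenz’s own weather model. The function name, the starting value of 0.2 and the size of the nudge are arbitrary choices of mine.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return every value visited."""
    values = [x0]
    for _ in range(steps):
        values.append(r * values[-1] * (1.0 - values[-1]))
    return values


baseline = logistic_trajectory(0.2, 100)
repeat = logistic_trajectory(0.2, 100)
nudged = logistic_trajectory(0.2 + 1e-10, 100)  # the butterfly flap

# Deterministic: the same starting state always gives the same path.
print(baseline == repeat)  # True

# Chaotic: a tiny change to the starting state grows until the two
# paths bear no resemblance to each other.
gaps = [abs(a - b) for a, b in zip(baseline, nudged)]
print(gaps[1])    # still tiny after one step
print(max(gaps))  # comparable in size to the values themselves
```

Nothing in that code is random. Run it a thousand times and you’ll get the same numbers, which is exactly the distinction between chaos and randomness that matters for testing.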

It is therefore vitally important to understand that we work with the actual definition of chaos rather than the wider perception. The butterfly flap of a change to the search mechanics of a retail site won’t directly cause tornadoes in the payment handling system (unless the two systems are somehow intrinsically linked, but I bet you a penny that they’re not), so there’d be no point in exploring the payment system during testing of the change to the search mechanics.

But, for example, if that change is not explored properly and issues are not pinpointed, a quirk in the requests and responses made as a result of the search could go unnoticed, leading to the service returning a search results page with incomplete or incorrect information, which will in turn lead to a downturn in traffic going to the payment handling system and a loss of revenue. A tornado, which could have been predicted and prevented, has hit.

So with chaos theory stating that even a small change to an initial state can yield catastrophic results down the line, it makes sense to explore the area where the cause of any issues will originate first, and move on from there. Examine the movement of the butterfly’s wings to determine the actual path and end state of the event, rather than finding yourself picking through the rubble that used to be Cowboys Stadium, cursing all Lepidoptera.