Why might a product test show a negative result?

Tens of thousands of new products are tested each year, as part of concept screening, NPD, and volumetric testing. Some products produce a positive result, and everybody is pretty happy, but many produce a negative result. A negative result might be that a product has a low stated intention to purchase or it might be that it fails to create the attitude or belief scores that were being sought.

Assuming that the research was conducted with a relatively accepted technique, what might the negative result mean?

A bad product/idea
One possibility is that the product is simply not good enough. This means that if the product is launched, as currently envisaged, it is very likely to fail. In statistical terms this is the true negative.

The false negative
The second possibility is that the result is a Type II error, i.e. a false negative. The product is good, but the test has not shown this. Designers and creatives seem to believe this happens in a large proportion of cases, and there are many ways that a false negative result can occur.

The test was unlucky
If a test is based on a procedure with a true positive rate (statistical power) of 80%, then one time in five a genuine success will be recorded as a failure. A recent article in New Scientist (19 October 2013) pointed out that, since most tests focus on minimising Type I errors (false positives), the typical true positive rate is often much less than 80%, meaning unlucky results will be more common.
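As a rough illustration of how often a good product can be unlucky, here is a minimal sketch that estimates the chance of detecting a genuinely better product. It assumes a one-sided z-test of a purchase-intention proportion against a benchmark, using the normal approximation. The function name, the 30% benchmark, and the 35% true score are hypothetical values chosen for illustration, not figures from any real study.

```python
import math
from statistics import NormalDist


def power_vs_benchmark(p_true, p_benchmark, n, alpha=0.05):
    """Approximate chance that a one-sided z-test detects p_true > p_benchmark.

    Assumes a simple random sample of size n and the normal approximation.
    """
    z = NormalDist()
    # Standard errors under the benchmark (null) and under the true score
    se_null = math.sqrt(p_benchmark * (1 - p_benchmark) / n)
    se_true = math.sqrt(p_true * (1 - p_true) / n)
    # Smallest observed proportion that would count as a "pass"
    cutoff = p_benchmark + z.inv_cdf(1 - alpha) * se_null
    # Probability that the observed score clears the cutoff
    return 1 - z.cdf((cutoff - p_true) / se_true)


# Hypothetical product: true purchase intent 35% against a 30% benchmark
for n in (200, 1000):
    power = power_vs_benchmark(0.35, 0.30, n)
    print(f"n={n}: detected {power:.0%} of the time, missed {1 - power:.0%}")
```

Under these assumptions, a product that really is five points better than the benchmark is missed more often than not with 200 respondents, while 1000 respondents catch it most of the time.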

The sample size was too small
If a stimulus produces a large effect, it will be obvious even with a small sample, but if the effect is small, a large sample is needed to detect it, and if it is not detected, the product will typically be called a failure. For example, if a sample of 1000 (randomly selected) people is used, the result is normally taken to be accurate to +/- 3%, which means relatively small benefits can be identified. However, if a sample size of 100 is used, the same assumptions would imply +/- 10%, which means effects have to be much larger to be likely to be found.
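The +/- figures above come from the standard margin-of-error formula for a proportion. A minimal sketch, assuming a simple random sample, a 95% confidence level (z = 1.96), and the worst case of a 50% score (the function name is my own):

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p from a sample of n."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 1000):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} points")
# n=100: +/- 9.8 points
# n=1000: +/- 3.1 points
```

Because the margin shrinks with the square root of the sample size, a tenfold increase in sample only narrows it by a factor of about three.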

The description does not adequately describe the product
If the product description is not good enough, then the result of the test is going to be unreliable, which could result in a good idea getting a bad result.

The product or its use can’t be envisaged
Some products only become appealing once they are used. Apps and software often fall into this category, but so do products as varied as balsamic vinegar, comfy socks, and travel cards (such as the London Oyster). Some products only become appealing when other people start to use them, as Mark Earls has shown in “I’ll have what she’s having”. Generally, copying behaviour is hard to predict from market research tests, producing a large number of false positives and false negatives. In these cases, the purchase intention scale (and alternatives such as prediction markets) can be a very poor indicator of likely success.

In many cases people may be able to identify that they like a product, but are unable to reliably forecast whether they will actually buy and use it, i.e. they can’t envisage how the product will fit in their lives. For example, I have lost count of the number of holiday locations and restaurants I have been convinced that I would re-visit, only to be proved wrong. This is another situation where the researcher’s trusty purchase intention scale can be a very poor indicator.

The wrong people were researched
If the people who might buy the product were not researched, then the result of the test is unlikely to forecast their behaviour. For example, in the UK, energy drinks are less about sports people than office workers looking for a boost. Range Rovers are less for country folk than they are for Londoners.

So, how should a bad result be dealt with?
This is where science becomes art, and sometimes it will be wrong (but the science is also wrong some of the time). So, here are a few thoughts/suggestions.

• If you expected the product/concept to fail, the test has probably told you what you already knew, so it is most likely safe to accept the negative finding.
• If you have tested several similar products, and this is one of the weaker results, it is probably a good indication the product is weak.

In both of these cases, the role of the modern market researcher is not just to deliver bad news; it should also include recommending what might work, whether modifications to the product/concept or alternative ideas.

If you can’t see why it failed
If you can’t see why a product failed, try to find ways of understanding why. Look at the open-ended comments to see if they provide clues. Try to assess whether the idea was communicated. For example, did people understand the benefits, and reject them, or not understand the benefits?

Is the product/concept one where people are likely to be able to envisage how the product would fit in their life? If not, you might want to suggest qualitative testing, in-home use tests, or virtual reality testing.

Some additional questions
To help understand why products produce a failing score, I find it useful to include the following in screening studies:

• What sort of people might use this product?
• Why might they like it/use it?
• What changes would improve it for these people?
