Posted by Ray Poynter, 30 January 2020
Today, I read two separate posts about what nonsense rating scales are. In one of them the author talked about how they recently bought some hiking boots. They wrote about how they compared the weight of the shoes, looked at the tread on the sole, considered the attractiveness of the shoes, tried some boots on and ended up buying a pair. They made the point that at no stage did they rate any of the shoes on a scale of 1-5 or 0-10 in terms of weight, attractiveness or quality of tread.
However, they actually described exactly why researchers use rating scales and why they sometimes work. As a shopper, this person looked at the weight of several shoes: some were much too heavy, some a bit too heavy and some about right. As researchers we want to know which shoes were too heavy, a bit too heavy and about right. We want some sort of guidance as to what too heavy means.
We could ask the person to use complex language, via a discourse, to describe what they saw, and in a qualitative interview with a small number of people that is viable. But, with a large number of people we would get a lot of don’t knows and conflated/confused language which at present we can’t turn back into meaningful metrics.
We could ask the person to rank the boots, but that creates two problems. Firstly, if two shoes were exactly the same weight, or had exactly the same tread on the sole, ranking would require them to rank one above the other. Secondly, imagine there are 4 shoes: two of them weigh way too much, one is a bit heavy and one is about right. If we ask the participant to rank them we end up with an inference that the gap between the two much too heavy shoes is of the same importance as the gap between the lighter of those two and the slightly heavy shoe, and that this gap is in turn the same as the gap between the slightly heavy shoe and the about right shoe. The perceptions the buyer has about the boots are more than ordinal; there is a sense of some of these differences having different magnitudes.
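The second problem can be made concrete with a small sketch. The boot names and the 0-10 "perceived heaviness" ratings below are hypothetical, made up purely to mirror the four-shoe example above; they are not data from any study.

```python
# Hypothetical data: perceived heaviness on a 0-10 scale (10 = far too heavy).
# Two boots are much too heavy, one is a bit heavy, one is about right.
ratings = {"boot_a": 9, "boot_b": 9, "boot_c": 6, "boot_d": 5}

# A forced ranking must order every boot, so the tie between boot_a and
# boot_b is broken arbitrarily (here, alphabetically).
ordered = sorted(ratings, key=lambda boot: (-ratings[boot], boot))
ranks = {boot: position for position, boot in enumerate(ordered, start=1)}


def adjacent_gaps(scores, order):
    """Absolute difference in score between each neighbouring pair."""
    return [abs(scores[a] - scores[b]) for a, b in zip(order, order[1:])]


rank_gaps = adjacent_gaps(ranks, ordered)      # every gap looks identical
rating_gaps = adjacent_gaps(ratings, ordered)  # gaps keep their magnitudes

print("rank gaps:  ", rank_gaps)    # [1, 1, 1]
print("rating gaps:", rating_gaps)  # [0, 3, 1]
```

The ranking reports three equal steps, including a step between two boots the participant could not tell apart, while the ratings retain the tie and the larger jump between "much too heavy" and "a bit heavy".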
So, back in the 1930s psychometricians like Rensis Likert realised we could ask participants to approximate their inner feelings and judgements by using scales. Nobody believed people had scales inside their head, or that they explicitly used scales when making decisions. The scales are a measurement instrument which allows researchers to get an approximation of how much people feel, for example, the different shoes weigh. When we sum these approximations, we get data that allows us to relate the perceptions of weight, tread, attractiveness, price etc. to the purchases made.
Manifest scales and latent variables
The traditional way of looking at the rating scale process is to think in terms of latent variables. A latent variable is something that exists, but which cannot be directly measured. A good example is intelligence: we are pretty sure it exists, but it cannot be directly measured. In order to gain an approximate measure of intelligence we measure other things, which can be measured, and combine these together. In the case of intelligence, these other things tend to be a combination of scales that look at things such as verbal comprehension, working memory, processing speed etc. The things we can measure we call manifest variables.
As another example, consider somebody trying to work out what sort of animals visit their garden during the night. They can’t see during the night, but during the day they might look for footprints, fur or feathers, droppings, scratch marks etc. The visible signs are the manifest variables and the animals that caused them are the latent variables.
When researchers use rating scales to understand why and how people make decisions, the scales are the manifest variables (they are used because they can be measured) and the underlying perceptions and motivations are the latent variables. We are only interested in the scales because they help us approximate the things we are seeking to study, and because the things we wish to study can’t be measured directly.
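The relationship between a latent variable and its manifest scales can be illustrated with a small simulation. This is only a sketch under stated assumptions: the participants are simulated, the noise model (normally distributed error, rounded to a 1-5 scale) is an assumption chosen for simplicity, and the function names `rate` and `pearson` are illustrative rather than taken from any library.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def rate(latent_value, noise_sd=1.0):
    """Turn a latent feeling into a noisy rating on a 1-5 scale (assumed model)."""
    raw = 3 + latent_value + random.gauss(0, noise_sd)
    return min(5, max(1, round(raw)))


# The latent perception of 500 simulated participants. In real research
# this is exactly the thing that cannot be observed directly.
latent = [random.gauss(0, 1) for _ in range(500)]

# Three manifest scales, each a noisy reading of the same latent perception.
items = [[rate(value) for value in latent] for _ in range(3)]

# Summing the scales for each participant gives a composite score.
composite = [sum(scores) for scores in zip(*items)]

single_r = pearson(latent, items[0])
composite_r = pearson(latent, composite)
print(f"one scale vs latent:      r = {single_r:.2f}")
print(f"summed scales vs latent:  r = {composite_r:.2f}")
```

In this simulated setting the summed score should track the latent perception more closely than any single scale does, because the independent noise in each scale partly cancels out. Real data will be messier than this toy model.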
Why ask me about attributes I don’t even consider?
Some people complain that rating scales measure things that don’t matter. For example, a participant who is not interested in the weight of boots is asked to assess them on a scale that runs, for example, from Light to Heavy. That is because some people are interested in factors that correlate highly with weight. In order to measure the different, common underlying, latent factors we need to include all of the scales that are relevant. This means that for any given participant there will be some scales that are not relevant, something which the analysis will reveal. If a key factor is ignored (i.e. if rating scales that can be used to approximate the latent factor are not asked), the results will be error prone.
When does scaling work?
- When the scales have been properly designed and researched.
- When participants can convert their ‘feelings’ into values – for example assigning a boot to a position on a scale that runs from Very Light to Very Heavy.
- When the purchase is related to the participant’s feelings/perceptions about the product.
When does scaling work less well?
- When there are scales missing.
- When the scales are not worded in ways that make sense to the participants.
- When we ask people for views about products that they have no views about (i.e. we collect noise).
- Where the purchase does not relate to the buyer’s perceptions. For example, I looked at all the boots, but in the end they only had one pair in my size. I did not like them, I preferred others, but had to buy them as I needed them that day.
The link between rating scales and Behavioural Economics
The key message is that we do not use rating scales to measure brand, product or service perceptions because people use rating scales. We use rating scales because they are a psychometric research instrument that can, under the right circumstances, allow us to glimpse the feelings and heuristics that people use to make decisions.
Behavioural Economics, through people like Daniel Kahneman, has shown us that most decisions are emotional not ‘rational’. These decisions are made by people’s feelings, which in turn are manifested through heuristics. When done well, the rating scale process allows researchers to measure System 1 (the fast, intuitive mode of thinking in Kahneman’s terms) at work. Remember, the value that somebody ascribes to a boot when answering a question about ‘Heavy’ is not necessarily based on how many grams it weighs; it will be about how heavy the boot looks, how it feels if it is picked up, how it compares to others that were picked up, and other aspects of weight.