Posted by Ray Poynter, 4 May 2019
This question has become a hot topic in a forum that I am a member of, and this post is based on my comment posted in that group. Most of the comments in that forum relate to the practical problems of screening potential qual participants via an extensive set of questions which then have to be interpreted by a computer. Many researchers would prefer to talk to fresh participants, but from a practical point of view, the algorithm route tends to favour the use of databases and the recruitment of professional qual participants. However, I have a second, and I think equally strong, reason for distrusting the algorithms when recruiting for depths, groups, ethnography etc.
I spend quite a bit of my time creating algorithms to help people identify which segment somebody is in. When done well, they are great for their proper purpose, which is to divide a large number of people into segments so that 70%-90% are in the right segment. This is the basis of Golden Questions that are used with trackers, large quant studies, and real applications like service delivery (where the large number might be millions rather than thousands). The key need in these cases is to have something that can be automated.
However, the algorithm is not the cluster and the algorithm is not a way of finding people who are typical of a cluster. As well as the 10%-30% who are allocated to the wrong cluster (usually to an adjacent cluster), there are many people who are only marginally in that cluster. Remember, when dealing with attitudes and behaviour, the clusters we create do not really exist; they are useful groupings of people, but there are thousands of alternative groupings that would be equally ‘valid’. Some people lie near the middle of a cluster and are quite different from the centre of the other clusters. But some people lie on the border between one cluster and another; they will be closer to people in another group than they are to the middle of their own group (although they are still closer to the centre of their own group than to the centre of any of the other groups).
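The border effect can be illustrated with a toy example. This is only a sketch: the two cluster centres, the single attitude scale and the three people below are made-up values, not taken from any real segmentation.

```python
from math import dist

# Hypothetical cluster centres on a simple two-dimensional attitude space.
centre_a = (0.0, 0.0)
centre_b = (10.0, 0.0)

# Three hypothetical people (illustrative positions only).
core_a = (0.5, 0.0)    # near the middle of cluster A
border_a = (4.5, 0.0)  # in cluster A, but close to the boundary
border_b = (5.5, 0.0)  # in cluster B, just across the boundary


def nearest_centre(person):
    """Allocate a person to whichever centre is closer."""
    return "A" if dist(person, centre_a) <= dist(person, centre_b) else "B"


# The border person is correctly allocated to A ...
print(nearest_centre(border_a), nearest_centre(border_b))

# ... yet is much closer to a member of B than to the middle of A.
print(dist(border_a, border_b), dist(border_a, centre_a))
```

Both border people are allocated correctly, but they resemble each other far more than either resembles the centre of their own cluster.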
In technical terms, cluster membership is a latent structure (i.e. it cannot be directly measured) so we approximate it by using manifest variables (things we can measure). For example, intelligence is a latent structure; we can’t properly define or measure intelligence, so we use IQ tests as a method of approximating how intelligent somebody is. If you have an algorithm that is 80% accurate and you apply it to 100,000 people, it will put about 20,000 of them in the wrong group, and that will normally NOT be a problem. Most people will get the right mailing or service option, and most people you offer the service to will be relevant.
However, if you have an algorithm that is 80% accurate and you use it for qual, then you can get problems. For example, in a group of 8 people you might get 8 that are actually in the cluster (by chance), but on average you will get 6 or 7 that are from that segment, and 1 or 2 that do not match it. This matters even more if you are doing 10 depth interviews, because on average 2 of them will be with the ‘wrong’ people. However, it can be much worse: occasionally, purely through random variation, you will get only 1 or 2 people who are actually in the cluster, and they could be outliers in the cluster.
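The arithmetic can be sketched with a simple binomial model. This is a simplifying assumption rather than a description of how recruitment errors really behave: it treats every recruit as having the same, independent 80% chance of being in the right segment. The function names are just for illustration.

```python
from math import comb


def prob_exactly(k, n, accuracy):
    """Probability that exactly k of n recruits are in the right segment."""
    return comb(n, k) * accuracy**k * (1 - accuracy) ** (n - k)


def prob_at_most(k, n, accuracy):
    """Probability that k or fewer of n recruits are in the right segment."""
    return sum(prob_exactly(i, n, accuracy) for i in range(k + 1))


ACCURACY = 0.8  # the 80% figure used in the text

# Focus group of 8: expected number of 'right' people and the spread around it.
print("Expected right in a group of 8:", 8 * ACCURACY)
for k in range(9):
    print(k, "right:", round(prob_exactly(k, 8, ACCURACY), 4))

# 10 depth interviews: expected number with the 'wrong' people.
print("Expected wrong in 10 depths:", 10 * (1 - ACCURACY))

# How often a group of 8 contains 2 or fewer people from the segment.
print("P(2 or fewer right of 8):", round(prob_at_most(2, 8, ACCURACY), 5))
```

Under these assumptions a fully correct group of 8 comes up roughly one time in six, while a group with 2 or fewer correct people is rare (about 0.1%). Real recruitment errors are unlikely to be independent, and accuracy may be nearer 70% than 80%, so the picture in practice could be worse.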
For most qual purposes, the client would be better off describing in words the sort of people who typify a cluster, rather than trying to apply algorithms which are hard to apply and will not necessarily generate participants who are typical of the segment. For example: young couples looking to buy a first home, both working, some savings but still taking holidays abroad at least once a year, already looking at properties and absorbing home-buying/improving media.
You might also want to push these verbal descriptions to get people who are ultras in this group (e.g. young couples looking to buy a first home, both working, have a deposit, have spoken to a mortgage advisor, have viewed at least one property, have had an overseas holiday and subscribe to some sort of home-buying/improving media). From a recruiter’s point of view, instead of applying an algorithm to hundreds of people to find 8 people, they can identify that they need to contact young couples, looking for a home, who have had at least one meeting with a mortgage provider as their first cut, before refining the recruitment using other criteria.
One partial exception relates to online communities. If your client has a community of, say, 10,000 people, all of whom have answered the Golden Questions, then this can provide a good start point for recruiting people for qual – but even then I would want to check that the people found by the algorithm were sufficiently typical. Remember, the algorithm will normally allocate everybody to one of the clusters, but for a qual interview you only want people who are typical of the cluster or of a sub-group within the cluster.
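One possible way to run that check is sketched below. It is an assumption on my part rather than a standard method: the segment centres, the member scores and the 0.5 cut-off are all hypothetical, and it presumes you have each member's scores on the dimensions used to build the segments. The idea is to keep only members who are much closer to their own segment centre than to the next nearest one.

```python
from math import dist

# Hypothetical segment centres on two attitude scales (illustrative values only).
CENTRES = {"A": (2.0, 2.0), "B": (8.0, 2.0), "C": (5.0, 8.0)}

# Hypothetical community members and their scores on the same two scales.
MEMBERS = {
    "m1": (2.5, 2.2),
    "m2": (4.8, 2.1),
    "m3": (5.2, 7.5),
    "m4": (6.4, 5.0),
}

MAX_RATIO = 0.5  # hypothetical cut-off; lower means stricter


def typicality(point, centres=CENTRES):
    """Return (assigned segment, nearest distance / second-nearest distance).

    A ratio near 0 means the person sits close to their segment centre;
    a ratio near 1 means they sit on the border with another segment.
    """
    ranked = sorted(centres, key=lambda name: dist(point, centres[name]))
    nearest, runner_up = ranked[0], ranked[1]
    runner_up_distance = dist(point, centres[runner_up])
    if runner_up_distance == 0:  # only possible if two centres coincide
        return nearest, 1.0
    return nearest, dist(point, centres[nearest]) / runner_up_distance


# Everybody gets a segment, but only the typical members make the shortlist.
shortlist = [name for name, scores in MEMBERS.items()
             if typicality(scores)[1] <= MAX_RATIO]
print(shortlist)
```

In this toy data every member is allocated to a segment, but m2 and m4 sit close to a border and are dropped, leaving m1 and m3 as candidates to approach for qual.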