Published by Ray Poynter, 8 August 2022
“What is the right sample size?” is a question I get asked quite often. In this article, I will address the question in the context of quantitative surveys. In a separate post, I will review the same question in the context of qualitative research.
The assumptions underpinning the question
The question “What is the right sample size for a survey?” is based on two assumptions.
- If the sample size is too small, the results will be unreliable.
- If enough people are sampled, the right answer will be reliably found.
The first assumption is generally a good one: if the sample is too small, the results are likely to be unreliable. If the results are unreliable, there is a good chance they will not be valid.
The second assumption is not a safe assumption. There are lots of reasons why a survey might generate the wrong results, and the size of the sample is just one of them. In this post, I will talk about why this is not a safe assumption and then share some rules of thumb that are based on my forty-four years of experience in research.
Why a bigger sample does not always help
There are broadly three situations where a larger sample size is not as helpful as people might expect it to be:
- Where the sample is too unrepresentative of the population. For example, a sample of viewers of Fox TV would not be predictive of the political views of the US population, even if you were to take a sample of one million Fox TV viewers.
- Where the technique being used is not sufficiently accurate or reliable. For example, if, in a grocery shopping study, we ask people to rate the importance of factors such as cost, environment, ethics, convenience, and frequency of advertising, we will tend to get the wrong answers. Research has shown that a) people can’t access their own motivational structure and b) social desirability bias changes the answers from those the participant might believe to be their preferences. Making the sample larger does not change the error.
- Where the answer is unknowable. For example, if we show ten marketing concepts to participants and try to assess what will happen the following year when one of these concepts is used, we are likely to be wrong. The concept that will be launched next year will differ from the concept tested (it will be more finished and tangible), the world may have changed (rival campaigns, pandemics etc), the impact will be determined by distribution and marketing spend in addition to the content of the campaign, and we humans struggle to forecast what we will do in the future. A larger sample will not change the error.
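The first of these situations can be illustrated with a small simulation. This is only a sketch under invented assumptions: the 50% population figure, the 70% figure for the unrepresentative source, and the function name `survey_estimate` are all hypothetical and are not taken from any real study.

```python
import random


def survey_estimate(share_in_source, n, rng):
    """Share of 'yes' answers in a random sample of size n drawn from the source."""
    yes = sum(rng.random() < share_in_source for _ in range(n))
    return yes / n


# Hypothetical figures: 50% of the whole population agree with a statement,
# but 70% of the unrepresentative source we are sampling from agree.
POPULATION_SHARE = 0.50
SOURCE_SHARE = 0.70

rng = random.Random(2022)  # fixed seed so the run is repeatable
for n in (100, 10_000, 100_000):
    estimate = survey_estimate(SOURCE_SHARE, n, rng)
    gap = estimate - POPULATION_SHARE
    print(f"n={n:>7}: estimate={estimate:.3f}, gap from population={gap:+.3f}")
```

As n grows, the estimate settles ever more tightly around the source's 70%, so the gap from the population figure stays at roughly 20 points. The larger sample is more consistent, but it is no closer to the truth.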
For a fuller understanding of the situations where a larger sample size may not help, look at the literature on Total Survey Error (TSE). TSE shows that in a survey project, there are seven key types of error (Validity, Measurement, Processing, Coverage, Sampling, Non-response and Adjustment). Increasing the sample size only deals with Sampling Error; it does not help with the other six categories of error.
When does a larger sample size help?
A bigger sample size helps when sampling error is a problem that needs tackling and where other problems (such as those mentioned above) have been accounted for.
Sampling error refers to the random errors that occur when we take a random probability sample from a population. The larger the sample, the more likely it is that the random sampling error is small. Statisticians use the term reliable to mean consistent (they do not use it to mean valid). A larger sample from the same source will be more statistically reliable than a smaller sample but not necessarily more valid.
If we are asking the right questions, to the right people, in the right way, if we process the answers correctly, and if the sampling approximates a random probability process, then a larger sample size will help. It will help because it will probably make the sampling error smaller.
The formal method for selecting a sample size
Research literature has a standard description of how the sample size should be selected. The researcher estimates what the standard deviation is going to be. The client is asked to select the size of error they are willing to accept (for example, plus or minus 3% or plus or minus 10%). The client is also asked how confident they want to be that the error is not going to be bigger than the level they have just selected (for example, 90% or 95% confident).
This gives us the formula $n = z^2 \sigma^2 / e^2$, where $e$ is the error the client is willing to accept, $z$ is the value from the normal distribution that matches the chosen confidence level (for example, 1.96 for 95%, reflecting the probability of the error being even worse than $e$), and $\sigma$ is the standard deviation of the population.
Rather than use the formula manually, you can use one of the many free online sample size calculators (which you can find via Google).
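For readers who would rather see the arithmetic than trust a calculator, here is a minimal sketch of the formula. The function name `required_sample_size` is made up for this illustration, and the example assumes a percentage question with the worst-case standard deviation of 0.5 (a 50/50 split).

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, error, confidence=0.95):
    """Return n = z^2 * sigma^2 / e^2, rounded up to a whole respondent."""
    # Two-sided z value: about 1.96 for 95% confidence, about 1.645 for 90%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z ** 2) * (sigma ** 2) / (error ** 2))


# Assumed worst case for a percentage: sigma = 0.5.
print(required_sample_size(sigma=0.5, error=0.03))                   # 1068
print(required_sample_size(sigma=0.5, error=0.10, confidence=0.90))  # 68
```

Like the formula itself, this assumes a true random probability sample, so treat the result as a guide to sampling error only.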
Two problems with the formal approach are 1) most clients don’t think in terms of acceptable error bounds and confidence levels, and 2) most samples are not true random probability samples.
The link between heterogeneity and sample size
While we are thinking about the formal theory, it is worth paying attention to the role of differences. If most members of a population are relatively similar, then a small sample has a good chance of representing that population. But if the population is quite varied, then a small sample is less likely to be accurate. To use maths for a moment, the standard error of the sample is given by $\sigma / \sqrt{n}$. What this means is that as $n$ gets bigger, the standard error gets smaller (because we are dividing by a bigger number). As the standard deviation gets bigger, the standard error gets bigger.
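A few lines of code make the two relationships concrete. This is a sketch only: the standard deviations of 0.5 and 0.3 are arbitrary illustrative values, and `standard_error` is a name chosen for this example.

```python
import math


def standard_error(sigma, n):
    """Standard error of a sample mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


# A bigger sample shrinks the standard error, but only with the square root of n:
# quadrupling the sample size halves the standard error.
for n in (100, 400, 1600):
    print(f"sigma=0.5, n={n:>4}: standard error = {standard_error(0.5, n):.4f}")

# A more similar (less varied) population has a smaller sigma,
# so the same sample size gives a smaller standard error.
for sigma in (0.5, 0.3):
    print(f"sigma={sigma}, n= 400: standard error = {standard_error(sigma, 400):.4f}")
```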
One way to make your research more reliable (consistent) is to use a more homogenous sample if possible. Depending on what you need to measure, you might find your study is more reliable if you select your population as ‘regular customers’ as opposed to ‘ever buy’, or vegans rather than combining vegans, vegetarians and pescatarians.
General steps for selecting a sample size
The researcher and the end user of the research need to work their way through the following.
- How accurate does the estimate need to be? For example, if we are looking at 10 new marketing concepts and we want to know which ones seem good and which seem rubbish, then we do not need a high degree of accuracy. If, on the other hand, we want to predict the number of customers who will churn if we increase the price by 10%, we might need to be much more accurate.
- How accurate can we be? My suggestion for thinking about this question is to imagine we are going to talk to a census, i.e. everybody in the target population. If we were to survey everybody, how accurate could we be? Can we ask questions that extract meaning, can people answer our questions in ways that are true to their actions, and can we process the answers in ways that generate the required insight? There is no point in trying to be more accurate than it is possible to be.
- How homogenous is the population? Can we make the population more homogenous by redefining the target group?
- What is the quality of the available sample? If the sample approximates to a random probability sample, then the larger the better. If the sample does not approximate to a random probability sample, then a larger sample will be more reliable (consistent) but not necessarily more valid. With non-probability samples, we tend to estimate the likely validity empirically, i.e. has it worked in the past (even though we know this is risky).
Rules of Thumb
I have just spent some time talking about why sample size is often not the key issue, I have shown where sample size fits in the picture, and I have talked about how to calculate the sample size in terms of sampling error. However, people still like to have some rules of thumb to help them know whether they are in the right area. This can be useful. If you work out your sample size and find it is close to the rules of thumb, that is helpful triangulation. Similarly, if the proposed sample size for a project is quite different from the rules of thumb, it might prompt you to investigate why.
- Try to get 400 survey responses. And really try to avoid having fewer than 200.
- Try to get 100 responses for each cell you want to look at and compare. And really try to avoid looking at cells with fewer than 50 people.
- Apply this logic to things like segmentation. If you think there might be seven groups and you want every group to have at least 100 people, you will need about 1000 people for your sample size (since the groups will not all be the same size).
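The segmentation rule can be sketched as a quick calculation. The segment percentages below are invented for illustration (seven groups, the smallest at 10% of the sample), and `total_sample_for_segments` is a hypothetical helper name.

```python
import math


def total_sample_for_segments(segment_percents, min_per_segment=100):
    """Total sample needed so the smallest segment is expected to reach the minimum."""
    if sum(segment_percents) != 100:
        raise ValueError("segment percentages must add up to 100")
    # The smallest segment drives the total: n * (smallest % / 100) >= minimum.
    return math.ceil(min_per_segment * 100 / min(segment_percents))


# Hypothetical seven-group segmentation where the smallest groups are 10% each.
guessed_percents = [22, 18, 16, 14, 10, 10, 10]
print(total_sample_for_segments(guessed_percents))  # 1000
```

If all seven groups were exactly equal, 700 people would be enough. It is the smaller groups that push the total towards 1000. Note that this works on expected group sizes, so an individual sample can still leave a small group slightly short.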
The Golden Rule? Listen to the client
The golden rule is to understand what your audience, stakeholders or decision makers consider to be a robust sample size.
Over the last forty years, I have worked with clients who insist on a sample of 1000 people, and I have worked with clients who are comfortable with as few as 100 people. What the client believes is not the only factor, but it is an important one.
If the client thinks a sample of 100 people for a project is OK and you think it needs a minimum of 400, then you have to decide whether the client understands the consequences of a smaller sample or not. If you are sure the client is making an informed decision, fine. If you don’t think the client is making an informed decision, you need to make your case.
If the client thinks you need a sample of 1000 people and you think that a sample of 400 is fine, then I would still make the case to the client, but I would not be as forceful. The downsides of a larger sample than is necessary tend to be cost and time.
If the sample is too small, it is unreliable and therefore not valid. If the sample is big, it will tend to be more reliable (consistent) but not necessarily valid.
Your thoughts and suggestions?
Here are my thoughts. What would you add, remove, change or question?
Budget and Time
Dr Amr Mansour, in the comments below, and T.S. Lim, in a LinkedIn discussion, have brought up the very important issue of budget, to which I will add time. A better sample and/or a bigger sample may cost more than you can afford, or take longer than can be permitted. These are key considerations and I appreciate them being highlighted.
Some people have suggested that the rules for some healthcare research are different. The reasons put forward include that healthcare professionals are more homogenous, interviews with healthcare professionals are more difficult to get and more expensive, and the population may be smaller. There is some truth in this, but not as much as people sometimes think.
If a population is more homogenous in terms of the topic being researched, then the standard deviation is smaller. Consequently, the sampling error is likely to be smaller, and so the sample size can be smaller.
However, imagine we have conducted a study with 40 thoracic surgeons from the USA population of about 4000 thoracic surgeons. We find that 50% of these surgeons say they would use the drug we have asked them to evaluate. Statistics tells us that the 95% confidence interval is plus or minus 16%. That is, we are 95% sure that if we had asked this question to all 4000 doctors, the answer would be between 34% and 66%.
This is the same as if we had asked 40 Coca-Cola drinkers (from the millions in the USA) to evaluate a new container and if we had found that 50% said they would use it. The homogeneity applies to the thing being measured (i.e. to the 50% saying they would use/buy it) not to their general homogeneity.
Because doctors are more difficult to interview and much more expensive to interview, the cost of being more accurate is higher. In a health study, a client may have to accept a confidence interval of plus or minus 16%, because they can’t justify spending enough money to make the estimate more accurate. By comparison, Coca-Cola might feel that it was worth taking the sample size to 400 and having a confidence interval of plus or minus 5%.
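The two confidence intervals quoted above can be checked with a short sketch. It uses the usual normal approximation for a proportion and assumes a random probability sample. The function name `margin_of_error` is chosen for this example.

```python
import math
from statistics import NormalDist


def margin_of_error(p, n, confidence=0.95):
    """Half-width of the confidence interval for a proportion p from n responses."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 at 95%
    return z * math.sqrt(p * (1 - p) / n)


# 50% say they would use the product; compare 40 and 400 respondents.
for n in (40, 400):
    points = 100 * margin_of_error(0.5, n)
    print(f"n={n:>3}: plus or minus {points:.1f} percentage points")
# n= 40: plus or minus 15.5 percentage points
# n=400: plus or minus 4.9 percentage points
```

The unrounded figures are about 15.5 and 4.9 points, which correspond to the 16% and 5% quoted in the text. Note that the size of the population (4000 surgeons or millions of drinkers) does not appear in the calculation at all.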
What about the small population of health care professionals? If we sample more than 5% of the population, the rules change (a bit). We apply the Finite Population Correction Factor (Google it if you want to know more). If we interviewed 40 thoracic surgeons from a niche where there were just 400 such doctors, we would be interviewing 10% of the population, so we would apply the correction factor. Instead of plus or minus 16%, our confidence interval would be closer to plus or minus 15%.
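As a sketch of how the correction works, the snippet below multiplies the usual margin of error by the standard finite population correction, the square root of (N - n) / (N - 1). The function names are invented for this example, and the normal approximation is assumed throughout.

```python
import math
from statistics import NormalDist


def margin_of_error(p, n, confidence=0.95):
    """Half-width of the confidence interval for a proportion, ignoring population size."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


def finite_population_correction(n, population):
    """Shrink factor applied when the sample is a sizeable share of the population."""
    return math.sqrt((population - n) / (population - 1))


base = 100 * margin_of_error(0.5, 40)
for population in (4000, 400):
    corrected = base * finite_population_correction(40, population)
    print(f"N={population:>4}: plus or minus {corrected:.1f} percentage points")
# N=4000: plus or minus 15.4 percentage points
# N= 400: plus or minus 14.7 percentage points
```

With 4000 surgeons, the correction barely changes anything. With 400, it trims the interval by less than one point, which is why the rules change only "a bit".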
Another way to be flexible with smaller sample sizes is to use a lower level of confidence. I have worked with healthcare clients who use the 80% confidence level instead of 95%. If you do that in the case of 40 doctors, from a population of 4000, where 50% say they will use the drug, the confidence interval becomes plus or minus 10%.
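To see the trade-off between confidence level and interval width for the 40-doctor example, here is a brief sketch. It assumes the normal approximation and leaves out the finite population correction, which makes little difference at 40 out of 4000. The 90% row is added purely for comparison.

```python
import math
from statistics import NormalDist

n, p = 40, 0.5  # 40 doctors, 50% say they would use the drug

for confidence in (0.80, 0.90, 0.95):
    # Two-sided z value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    points = 100 * z * math.sqrt(p * (1 - p) / n)
    print(f"{confidence:.0%} confidence: plus or minus {points:.1f} percentage points")
# 80% confidence: plus or minus 10.1 percentage points
# 90% confidence: plus or minus 13.0 percentage points
# 95% confidence: plus or minus 15.5 percentage points
```

The narrower interval is not free: at 80% confidence, there is a one in five chance that the true figure lies outside the stated range.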
The statistical rules do not change when we do healthcare research, but the cost-benefit analysis of more accuracy changes.
Also, we must remember not to get too focused on the sampling error. The sampling error describes the probable difference between sampling 40 doctors versus the whole population of relevant doctors. But it does not tell us whether doctors can predict what they would do in the future if this drug were available. If we think they are not very good at predicting what they would do, there is little point in spending lots of money to reduce the sampling error.