Researchers should be aware of the problems with observational data
Posted by Ray Poynter, 18 May, 2018
The world is shifting from asking questions to utilising observational data (mostly for very good reasons), and this shift is creating a new set of problems that researchers need to recognise and address.
What is observational data?
Observational data refers to information gathered without the subject of the research (for example, an individual customer, patient, or employee) having to be explicitly involved in recording what they are doing – for example, data collected without people having to respond to a questionnaire, take part in a depth interview, or maintain a research diary.
Most big data is observational data – for example, the transaction records from a bank, people’s viewing habits on a video streaming service, or posts in social media. But observational data can also be small data (based on just a few people). For example, participant ethnography, used to study people in their everyday lives, collects observational data that is clearly not ‘big data’.
Observational data can be based on a census or on a sample. For example, a few years ago a leading mobile phone company was able to sell very detailed data about the movements of its contract customers (over ten million people), but it could not provide this information for its millions of ‘pay as you go’ customers. In this case the mobile phone company was (depending on your view) offering a census of its contract customers, or offering a large sample of its total customer base (or a sample of all mobile phone users in the country). In contrast, a food delivery company from a small town may have data on all of its one thousand customers. The data might comprise: what was purchased, when it was purchased (date and time), the price, the delivery time, and perhaps background variables such as the weather. This observational data would be a census, even though it was based on just 1,000 customers.
Observational data can be relatively objective or more subjective. If, for example, the data comprises a digital record of all bank transactions, it would be considered objective and numeric. If the data were ethnographic notes from a researcher observing the customers of a coffee shop, the data would be more subjective and (in all likelihood) less numerical.
Observational data can be numbers, images, videos – indeed anything that can be recorded. Observational data can be recorded without people actively doing anything (for example, by monitoring their mobile phone’s connections to network cells), or it can be the result of actions they take as part of their everyday life (for example, things they post to social media).
Observational data can be mixed with question-type data. For example, a food delivery company may have numerous observational data points about each customer and each purchase, but they might also ask for a satisfaction score and a satisfaction comment – these two pieces of data are not observational data, but they can be used to help interpret observational data.
There are also some nuanced techniques that blend questions and observations – for example, an ad testing system where a sample of people watch one or more ads and answer some traditional questions, but are also observed using techniques such as eye tracking, facial coding, and perhaps some form of brain scanning. This is observational data, but it is not based on observing people in their natural environment, going about their everyday lives.
Why the shift to observational data?
There has been a major shift towards observational data in terms of gathering data to inform insights about people and the actions they take. This has been the result of several trends that have tended to pull in the same direction:
- The growing realisation of the limitations of questions. For example, the extent to which people are poor witnesses to their own motivations and plans. These issues have been highlighted by neuroscientists (e.g. Antonio Damasio) and behavioural economists (e.g. Daniel Kahneman and Dan Ariely) – but researchers have been aware of these issues for decades, and have sought to mitigate them.
- The increased difficulty in conducting question-based research. For example, declining response rates and the problems of accessing representative samples.
- Observational data is increasingly available. This change in availability is largely because of the shift to a digital world. The internet and smart devices (smartcards, smartphones, smart homes etc) mean that people create a digital wake of information behind them that can be used to create observational data sets. Not only is this observational data widely available, it is often much cheaper than data collected via researchers asking questions.
- Increased processing power. In the past, one of the reasons to focus on small amounts of qualitative data or on the responses to highly structured questionnaires was the challenge of processing them. As computers and algorithms have become more powerful, the range of options has expanded.
- From a sample to a census. In many cases, observational data allows researchers to work with a census rather than a sample – for example studying the purchase/travel choices of every customer of a specific airline. This sometimes has genuine benefits (e.g. eliminating sampling error and potentially sampling bias) and frequently has ‘face value’ benefits.
What are the main problems with observational data?
Despite the attractiveness of real data, from real customers, living real, everyday lives, observational data creates its own problems. Researchers need to be aware of these problems and seek to address them. The problems include the following ten issues:
- Where the observational data tells you the wrong thing. For example, when HRT was first assessed using observational data, the conclusion was that it reduced heart problems in women, which led to it being widely prescribed. Later, a ‘proper’ randomised controlled trial indicated that HRT was slightly bad for women’s hearts. The observational data had not accounted for the fact that wealthier/healthier women were more likely to be prescribed HRT. The leading data scientist/commentator Nate Silver has said that as big data grows, the number of spurious correlations will grow much faster than the number of useful, meaningful findings (the first sketch after this list illustrates why). Within this category of problems are selection bias, survivorship bias, the post hoc ergo propter hoc fallacy, and random variation producing spurious correlations.
- Confusing cause and effect. There may be a real relationship in the observational data, but the direction of causality may be wrongly determined. A rooster crows before dawn, but it does not cause the dawn; the impending dawn is the trigger for the rooster to crow. In marketing terms, consider the case where somebody searches on Google and, because of what they find, decides to buy a specific smartwatch. Alternatively, they might have decided to buy that watch because a friend recommended it, and then used Google to find out which stores near them stock it. With observational data, identifying cause and effect can be difficult.
- Ignoring the true driver. Consider the case where the head of social marketing shows the company’s Chief Marketing Officer that the sales of the company’s ice cream appear to be driven by social media advertising. When the advertising spend goes up, the sales of ice cream go up, and when the advertising spend goes down, the sales of ice cream go down. The CMO may (if she or he is savvy) point out that sales go up in the summer and down in the winter, and that the social media spend follows that pattern too (to maximise share of market). The second sketch after this list shows how a lurking seasonal driver can manufacture exactly this pattern.
- Where multicollinearity means that the individual factors can’t be measured. If all the brands in a market move their prices up and down together, it will not be possible to model the linkage between price and brands from observational data, because there is not sufficient variation. If a complex, multi-channel advertising campaign is launched across all the channels at the same time, it will often be impossible to accurately measure the impact of one element of the campaign in isolation – for example, how much did adding that famous TV personality contribute to the change in sales? (The third sketch after this list shows the problem in miniature.)
- The relationship is too complex and/or chaotic to measure. Some relationships can be approximated with relatively simple models (models made up of linear components); other relationships are more complex. Nate Silver contrasts weather forecasting (which has improved dramatically over the last thirty years) with the prediction of earthquakes (which has not really changed at all during that time – and some experts in the field fear earthquakes may never be predictable). Nassim Taleb has written (in his book ‘The Black Swan’) about rare events that are impossible to predict with traditional statistics.
- Because of feedback loops between cause and effect. Economists use observational data to try to understand markets and to make predictions about things like recessions. However, economists have a very bad record at predicting recessions, and this is largely for two reasons. Firstly, there are many more variables than recessions, which means that there is an infinite number of solutions (if you think back to high school mathematics, you will remember you need more observations than variables). Secondly, economists look at previous recessions and deduce that when the currency does X, governments do Y, and investors and companies do Z, the result is A, B & C. However, governments, investors, and companies also look at what they did last time, and at what the economists have published, and the next time the currency does X they change their behaviour because of what was learned from the last iteration. When smart meters are installed in people’s homes, they provide great observational data about how energy is consumed, but they also highlight this to the consumer, who may then adjust what they do (for example, to reduce costs). Similarly, data on the effectiveness of digital campaigns is often used in real time to adjust the campaign; this can lead to better results, but it can confound the statistician’s ability to measure overall relationships.
- The measurement effect. In many situations, when you measure something you change it. For example, if you put a thermometer in a glass of water to measure the temperature of the water, you will (very slightly) change the temperature of the water. Researchers have shown that just painting a pair of eyes above an honesty box used to pay for donuts can change human behaviour. As more people become aware that their behaviour is being measured, the behaviour we are seeking to measure may change.
- Confusing influence and homophily. Sinan Aral has shown that researchers often confuse influence (the extent to which we copy somebody else) with homophily (the extent to which we hang out with people who choose the same things as us). For example, do smokers smoke because their friends do (influence), or do smokers hang out together because they smoke (homophily)? Aral’s research, based on observational data generated by experiments, has shown that observational data based on naturally occurring phenomena can be misleading if the wrong model is assumed (for example, if the model assumes that behaviour is driven by influence rather than homophily). One example of this effect arises when examining the impact of campaigns that use free samples: simple research can often show the ROI of the samples given away, but experiments may show that many of the people who were given the free samples would have bought anyway, changing the ROI.
- Not explaining the why. Analysis of observational data may tell us that a specific pattern is happening, but it may not tell us why it is happening – and to utilise the pattern we may need to know ‘the why’. For example, observational studies show that when rain is forecast, fewer people walk or cycle to work and more people use private or public transport – in this case the ‘why’ seems relatively straightforward. A nice example of where observational data does not provide the why is given by Ben Wellington in his 2014 New York TEDx talk. From New York City data he identifies which fire hydrants in Manhattan generate the highest revenue. Two hydrants in particular generate many more fines than any other in the city – over $55,000 a year. But Wellington can’t intuit what is causing this from the data alone. So, he visits the location, looks at it, photographs it, and identifies that the cause is unclear signage and a specific road layout. In research terms, what he has done is use qualitative research to understand the why behind a big data analysis.
- Inability to research things that have not happened (yet). A large proportion of market research relates to things that do not yet exist, for example advertising and concept pre-testing. No amount of listening to social media or analysing purchase behaviour data is going to tell you whether the next ad for your airline is going to ‘work’. Purely observational data will not tell you which new flavours you should add to your drink range. In both of these cases, observational data can provide some useful input, but it can’t solve the problem on its own.
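To make the scale of the spurious-correlation problem concrete, here is a minimal Python sketch (all the data is simulated noise, and the sample sizes and thresholds are purely illustrative). With 10,000 pure-noise variables and 100 observations, chance alone produces hundreds of correlations that look ‘statistically significant’:

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_vars = 100, 10_000

# One random "outcome" and thousands of unrelated random "predictors".
y = rng.normal(size=n_obs)
X = rng.normal(size=(n_obs, n_vars))

# Pearson correlation of every predictor with the outcome.
y_c = (y - y.mean()) / y.std()
X_c = (X - X.mean(axis=0)) / X.std(axis=0)
r = (X_c * y_c[:, None]).mean(axis=0)

# At n = 100, |r| > ~0.2 corresponds to p < 0.05 (two-tailed).
print(f"{(np.abs(r) > 0.2).sum()} of {n_vars} pure-noise variables "
      f"look 'significantly' correlated with the outcome")
```

None of those relationships would replicate on fresh data – which is exactly Silver’s point.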
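The ice cream example can be simulated in a few lines (again, every number here is invented for illustration). In this simulation sales are driven purely by the season, yet a naive regression credits social media spend with a large effect; adding the seasonal driver to the model makes the effect vanish:

```python
import numpy as np

rng = np.random.default_rng(7)
months = np.arange(36)                    # three years of monthly data
season = np.sin(2 * np.pi * months / 12)  # peaks in summer, dips in winter

# Spend is scheduled to follow the season; sales are driven by the season
# alone, so the true effect of ad spend on sales is exactly zero.
ad_spend = 100 + 50 * season + rng.normal(0, 5, size=36)
sales = 1000 + 400 * season + rng.normal(0, 20, size=36)

def ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols(ad_spend, sales)[1]                                # spend only
adjusted = ols(np.column_stack([ad_spend, season]), sales)[1]  # spend + season
print(f"naive ad-spend coefficient:   {naive:.2f}")    # large and positive
print(f"season-adjusted coefficient:  {adjusted:.2f}")  # close to zero
```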
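And here is the multicollinearity problem in miniature (simulated data once more): two brands whose prices move in near lockstep. Refitting the model on bootstrap resamples shows that the individual price coefficients are essentially unrecoverable, even though their combined effect is stable:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 104  # two years of weekly data
market = rng.normal(0, 1, size=n)

# Both brands move their prices almost in lockstep with the market.
price_a = 5 + market + rng.normal(0, 0.05, size=n)
price_b = 5 + market + rng.normal(0, 0.05, size=n)
sales = 200 - 10 * price_a - 10 * price_b + rng.normal(0, 5, size=n)

# Refit on bootstrap resamples: the individual coefficients swing wildly,
# while their sum stays close to the true combined effect (-20).
for trial in range(3):
    idx = rng.choice(n, size=n, replace=True)
    X = np.column_stack([np.ones(n), price_a[idx], price_b[idx]])
    b = np.linalg.lstsq(X, sales[idx], rcond=None)[0]
    print(f"resample {trial}: brand A {b[1]:+7.1f}, brand B {b[2]:+7.1f}, "
          f"sum {b[1] + b[2]:+6.1f}")
```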
What steps should researchers be taking to minimise the problems with observational data?
There are a wide variety of things that researchers can and should do to improve their use of observational data, including:
- Consider the counterfactual. What would have happened if we had not done X? For example, if we had not used social media to promote our ice cream in the summer, what would the sales have been? The counterfactual is likely to be an approximation (for example, assuming the sales for June-August would have been the average of the last three years) – see the first sketch after this list.
- Make predictions in advance. There are so many ways of analysing observational data that if the analysis starts after the project has finished, it is likely that positive news can be found – but this news may not be valid or sufficiently robust. Best practice is to say in advance that this activity (e.g. a new advertising campaign) is supposed to work by reaching this group, that it is supposed to make them see the brand as more ‘edgy’, and that it is supposed to increase trial by 10% and sales by 8%. Armed with these predictions, it is much easier to assess whether the campaign had the desired effect.
- Try to model the data to mitigate various biases. Techniques used to mitigate the problems with observational data include weighting the data to make it better match the population (see the second sketch after this list), using matching to find similar people in the population who were not exposed to the stimuli, and utilising Bayesian statistics (based on the probability of X given that Y has happened). These models can be very complex, but complexity will not necessarily make them correct. If a systematic underlying bias has been missed, the modelling will make the data more plausible, but not necessarily more accurate.
- Build experiments into the observational process. If you want to measure the impact of a complex multi-channel advertising campaign, do not launch with a big bang. Try to vary the sequence and spend across groups that can be measured (in the old days this would be done by geography; with digital it can be achieved by creating groups). At the recent ESOMAR APAC conference in Bangkok, Thailand (13-15 May, 2018), Brent Smart, Chief Marketing Officer of IAG (an Australian insurance company), showed a case where he conducted an experiment on a specific campaign – one group of people saw it, one group did not. The experiment showed the campaign delivered no additional sales, yet the attribution analyses that were conducted (two versions) both showed substantial incremental sales. The third sketch after this list shows the skeleton of such a holdout test.
- Use surveys and qual to explain the missing part of the picture. When you need to know ‘the why’, or when you want to know how to change things, traditional research can be the extra ingredient that observational data (especially big data) needs. In many cases, like the Ben Wellington example above, the research that will unlock quant observational data will be qualitative.
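As a toy illustration of the counterfactual baseline mentioned above (the sales figures are entirely hypothetical):

```python
# Hypothetical monthly ice cream sales (thousands of units); 2018 is the
# first summer supported by the social media campaign.
history = {
    2015: [120, 135, 128],  # Jun, Jul, Aug
    2016: [125, 140, 133],
    2017: [131, 147, 138],
}
actual_2018 = [150, 168, 159]

# Counterfactual: June-August sales would have been the three-year average.
baseline = [sum(year[m] for year in history.values()) / len(history)
            for m in range(3)]
uplift = [round(a - b, 1) for a, b in zip(actual_2018, baseline)]
print("counterfactual baseline:", [round(b, 1) for b in baseline])
print("estimated uplift:       ", uplift)
```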
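A minimal sketch of the simplest of the modelling techniques – weighting the sample to match the population – assuming hypothetical age-band proportions:

```python
# Post-stratification weighting: each observed record is weighted so the
# sample's age profile matches the population's (all proportions invented).
sample_share     = {"18-34": 0.50, "35-54": 0.35, "55+": 0.15}  # who we observed
population_share = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}  # census figures

weights = {band: population_share[band] / sample_share[band]
           for band in sample_share}
print(weights)  # 18-34 down-weighted to 0.6, 55+ up-weighted to 2.0
```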
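Finally, the skeleton of the kind of holdout experiment described in the IAG example, using simulated data in which the campaign truly has no effect. The test correctly reports no detectable lift – something an attribution model fitted to purely observational data might miss:

```python
import numpy as np

rng = np.random.default_rng(11)

# Simulated holdout test in which the campaign adds nothing:
# both groups convert at the same underlying 5% rate.
exposed = rng.binomial(1, 0.05, size=20_000)  # saw the campaign
holdout = rng.binomial(1, 0.05, size=20_000)  # deliberately not shown it

p1, p0 = exposed.mean(), holdout.mean()
se = np.sqrt(p1 * (1 - p1) / exposed.size + p0 * (1 - p0) / holdout.size)
z = (p1 - p0) / se  # |z| below ~1.96 means no detectable lift at the 5% level
print(f"exposed {p1:.2%} vs holdout {p0:.2%}; lift {p1 - p0:+.2%} (z = {z:.2f})")
```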
The ethics of observational data collection
There are two main ethical issues relating to observational data collection. The first is pretty obvious: do you have permission to collect and use the data? The debacle over Facebook and Cambridge Analytica, and Europe’s new GDPR requirements, show the importance of ensuring that you have permission to process individual data, including observational data.
The second problem relates to information you might uncover and the actions you should take. This ethical problem is clearest in social and medical research. Two different treatments or plans are put into use – one group receives option A and the other group option B. However, if the results of the trial show that one of the treatments is less good (or actually harmful), then it may not be ethical to keep the trial going, which means the information gained may be compromised.
In a commercial situation, in many countries, if the analysis of observational data shows that people are on the wrong mobile phone plan, if they are paying too much for their energy, or if they could have bought a cheaper ticket, then it may be unethical not to intervene (indeed some governments/courts may also decide it is illegal mis-selling).
Observational research should be reviewed for its ethical implications – and the nature of what is ethical is likely to continuously evolve for the foreseeable future.
Want to find out more about this topic?
I am going to be broadcasting a webinar presentation on this topic as part of the NewMR Future of Data Collection event on Thursday, 24 May. You can find out more about, or sign up to, this free webinar by clicking here.
I will be presenting a paper on this topic at the BigSurv18 conference in Barcelona, 25-27 October. To find out more about this conference, click here.
I am also preparing a paper to submit to the IJMR (i.e. lots more references and less Ray Poynter opinion) – if you have material or ideas that you think I should address, please reach out to me.