Most good research depends on synthetic analysis!
One of the things that has caught my eye in the recent discussion about synthetic data has been the range of reactions to the word synthetic. Whilst most of the debate has focused on the underlying methods, there have been numerous sideswipes at the word “synthetic”. At first, this struck me as odd, so I did some introspection and reading to find out why my reaction differed from that of others. I quickly realised that, in many cases, the reaction reflected people’s arts/science background along with various life experiences, cultural phenomena, and interests.
So, let’s start with my deliberately provocative (weekend) statement that ‘Most good research depends on synthetic analysis!’ This flows from the widely held view that good results tend to be based on synthesis. We take different data, different perspectives, and different starting points and draw conclusions through a process of synthesising the information. The alternative to synthesis is to work from a single source or from our existing knowledge alone. By the grammar of English, a process of synthesis is a synthetic process: synthesis is the noun, synthesising is the present participle, and synthetic is the adjective. When we approach the word synthetic from the perspective of logic and analysis, it stands on a pedestal.
However, if we consider the word synthetic from an arts perspective, it is often negative. If art, music, movies, or books are described as synthetic, it is usually a bad sign. In the field of human interactions, most people would not want synthetic love, empathy, or encouragement.
This contrasts with the field of science, where synthetic is often a positive. For example, before the advent of synthetic insulin, insulin was derived from the pancreases of pigs and cows. This ‘real’ insulin was less pure and sometimes caused allergic reactions. Indeed, from the 19th Century onwards, organic chemistry (focusing on creating synthetic products such as dyes, drugs, and carbon fibre) was the driving force behind the chemical and pharmaceutical industries.
There are also some very interesting specific niches. For example, consider two cases relating to clothing. Synthetic fur means fake fur to most people. People who love fur tend to despise synthetic fur, and people who hate fur tend not to like synthetic fur either. By contrast, in my sport of running ultra races in cold and rainy conditions, there are many races where the organisers have banned natural products in the insulation garments (i.e. down jackets are banned) and base layers (i.e. t-shirts), because they increase the risk of hypothermia if the conditions are too wet and cold. In these events, I must wear and carry clothing made of (suitable) synthetic materials.
Does any of this matter?
Not much, but I thought it was interesting; perhaps others might, too. The real decision lies in how to assess the process of creating data versus using primary data. It is interesting to note the different reactions to the word synthetic, but that is not an important issue, IMHO.
Should we avoid the term Synthetic Data?
I would probably use a different term if I were selling a product based on Synthetic Data. This is because the term has negative connotations for some people, and I do not want my marketing to push water uphill.
However, for regulators (such as ESOMAR, MRS, IA, and TRS) and lawmakers, I would stick to synthetic data. Synthetic data clearly describes the process of combining inputs to create something that did not exist before (as opposed to extracting rubber from trees, iron from iron ore, or meaning from interviews with real people).
I agree with Ray. Synthetic products and processes are often better, purer, more stable, and more predictable than the things they are replacing. The question is not whether we should go synthetic. It is rather whether we have put enough checking/auditing/verifying processes in place to ensure we know what we are getting and, more importantly, that we know we are still in control, or at least have sufficient control to predict when we will eventually lose it, prompting us early to find yet another better way of doing it.
When “synthetic survey data” was recently announced, I had a poor reaction and then realized that I have been touting, predicting, and using something similar for almost 20 years. I have used and advocated for agent-based models to understand marketing behavior for a long time. I’ve built agent-based grocery stores where agent shoppers shop and pay for products just like you and me, so it was the word “synthetic” that got to me, not the practice. I prefer “simulated”. Synthetic unfortunately has a connotation of “false” or at least “made up”, despite its favorable acceptance in the contexts you mentioned. But “simulated” has a long-standing acceptance in market research. We understand the concept of fit and measure it for models. We understand, even conceptually, the errors of prediction. “Synthetic” implies no such quality control. If it triggers a negative reaction even in someone who has been advocating this approach for so long, best find another word.