The Long Survey Will Be Replaced By the Sparse Survey

Guest post by Jeffrey Henning. Jeffrey Henning, PRC, is president of Researchscape International. He is a Director at Large on the Marketing Research Institute International’s Board of Directors. You can follow him on Twitter @jhenning.


Ray Poynter argues that the long survey will be replaced by ongoing discussions, in-the-moment research, and observational data. I think it is far more likely that the long survey will be replaced by another type of survey. This is much less of a change for corporate researchers and research agencies than shifting to radically different methodologies. Researchers today are very comfortable with developing long and complex surveys. And of course an entire industry exists providing access to panelists who will take surveys.

Long surveys, however, produce lower-quality results than shorter surveys, as Ray points out. Respondents tire and begin satisficing, providing less accurate answers as survey length increases. Respondents who stick with a long survey often differ in key ways from the target demographic; this bias is exacerbated when researchers throw out incomplete responses. Respondents are increasingly taking surveys on mobile devices, where long surveys get even longer as compound questions, such as matrix or grid questions, are rendered as a series of single rating questions across multiple screens. (According to my own research on research, paid panelists provide similar-quality results on the same questionnaire whether taken at a desktop or on mobile. However, unpaid panelists, such as customers, do a much poorer job on the mobile version of such questionnaires.)

But what kind of survey might supplant the long survey?

In computer science, an array is a sequence of data accessible by an index: looking up U.S. population by year, for instance. Most arrays are comparatively short, and nearly every position holds a value. Sparse arrays, on the other hand, are very long, and most positions are empty: a list of U.S. citizens indexed by Social Security Number, for instance. (As a result, sparse arrays are implemented differently from conventional arrays.)
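To make the analogy concrete, here is a minimal Python sketch; the values and the `lookup` helper are illustrative, not drawn from the post:

```python
# Dense array: short index space, nearly every slot holds a value.
population_by_year = [226_546_000, 248_710_000, 281_422_000]  # 1980, 1990, 2000 (illustrative)

# Sparse array: the index space is huge (SSNs run 000-00-0000 to 999-99-9999),
# but almost every slot is empty, so store only the occupied ones in a dict.
citizens_by_ssn = {
    123_45_6789: "Jane Doe",  # hypothetical record
}

def lookup(sparse: dict, index: int, default=None):
    """O(1) lookup without allocating a billion-slot list."""
    return sparse.get(index, default)

print(lookup(citizens_by_ssn, 123_45_6789))  # "Jane Doe"
print(lookup(citizens_by_ssn, 999_99_9999))  # None — the slot is empty
```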

Similarly, a sparse survey is one where most questions are never asked of any individual respondent. To keep the survey short from the respondent’s perspective, each respondent is routed past most of the questions using different techniques:

  • Verbatim cascades – Open-ended or verbatim questions provide great color commentary about the topics being researched. But respondents quickly tire of answering verbatims, and verbatim quality (length and relevance of response) degrades with each subsequent verbatim question. One way to handle this is to ask each respondent only the one verbatim most appropriate to them, selected from a bank of 3 to 10 questions (see the routing sketch after this list).
  • Iceberg matrices – We often set up matrix questions with 20 to 40 rows, but then program the matrix to show only the most relevant rows to the respondent, sometimes prioritizing the rows and capping how many are shown. (We call these “iceberg matrices” because most of the matrix is never seen by a respondent; it’s below the water line, as it were.)
  • Slices – Chadwick Martin Bailey has presented a series of case studies, including one where a survey was divided into different slices, with full results developed using imputation. “For this particular experiment [taking a purchase journey study], ‘contiguous’ slices flowed better for the respondent,” according to Chris Neal, one of the study leads. “But contiguous slices can be much more challenging than ‘random’ slices for certain modeling tools if you want to impute missing data with confidence. Think through your upfront master design very carefully.”
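To show how little machinery these techniques need, here is a minimal routing sketch in Python. It assumes each respondent’s answers so far are available as a dict; the question IDs, predicates, and helper names are all hypothetical, and a real survey platform would express the same logic as display and skip rules:

```python
# Verbatim cascade: a prioritized bank of open-ends; each respondent
# gets only the single most relevant one (hypothetical question IDs).
VERBATIM_BANK = [
    ("why_switched",  lambda r: r.get("switched_brands") == "yes"),
    ("why_stayed",    lambda r: r.get("switched_brands") == "no"),
    ("anything_else", lambda r: True),  # fallback: everyone gets exactly one verbatim
]

def pick_verbatim(responses: dict) -> str:
    for question_id, applies in VERBATIM_BANK:
        if applies(responses):
            return question_id

# Iceberg matrix: define all 40 rows, but show only the rows the
# respondent already flagged as relevant, capped at a fixed number.
ALL_ROWS = [f"brand_{i}" for i in range(1, 41)]

def iceberg_rows(responses: dict, cap: int = 8) -> list[str]:
    relevant = set(responses.get("brands_aware", []))
    return [row for row in ALL_ROWS if row in relevant][:cap]

# Slices: deterministically route each respondent to one slice of the
# questionnaire; imputation later fills in the slices they never saw.
def assign_slice(respondent_id: int, n_slices: int = 4) -> int:
    return respondent_id % n_slices
```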

Another way to hide the true length of the survey is to hide questions and populate them from past surveys or from databases. Why ask demographics if you already have those on file? No respondent ever sees hidden questions, yet they are analyzed just like conventional survey questions. This is becoming increasingly popular. ResearchNow can append information about registered voters or household credit information. Lightspeed GMI has one panel where panelists have opted in to provide access to their credit card data, which is appended to each survey response. Lightspeed GMI also has a separate panel where people have provided their Facebook profile; you can append to your survey which brands from a competitive set respondents have “liked” on Facebook. For one customer satisfaction survey, we appended 15 fields representing categories of purchases made using a retail loyalty card, integrating actual behavioral data into the survey. For our own survey of our customers, we append data about the attributes of the project we just completed on behalf of that customer.
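Mechanically, appending hidden questions is just a left join on a respondent key. A minimal sketch with pandas, using hypothetical file and column names:

```python
import pandas as pd

# Survey responses and a behavioral file, both keyed by respondent ID.
responses = pd.read_csv("responses.csv")    # respondent_id, q1, q2, ...
loyalty = pd.read_csv("loyalty_card.csv")   # respondent_id, spend_grocery, ...

# "Hidden questions": append the behavioral fields so they can be
# analyzed exactly like answered questions; respondents never see them.
augmented = responses.merge(loyalty, on="respondent_id", how="left")

# Crosstab an asked question against an appended field, as usual.
print(pd.crosstab(augmented["q1"], augmented["spend_grocery"] > 500))
```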

As these examples show, long surveys are already evolving into sparse surveys. Of course, there are many barriers to adoption:

  • Many survey tools today do not support the types of skip patterns necessary for sparse surveys, so costly custom scripting is needed. Two recent examples from our work:
    • In one survey we had two sets of “verbatim cascades.” Respondents who had selected from a set of choices were asked one follow-up verbatim about the item we cared about most. Once we had 50 responses to that verbatim, we hid it so that lower-priority verbatim questions could collect results. Since the user interface didn’t have an operator for “count of responses so far,” this required custom scripting (a minimal quota sketch follows this list).
    • In another questionnaire, discussing over 20 competitors, we asked respondents a series of follow-up questions for each of the 4 competitors for which we had so far collected the fewest responses. This required a fairly elaborate prioritized list and added about $4,000 to the price of the project.
  • More respondents are required for a sparse survey than for its equivalent long survey:
    • This can be cost prohibitive. For instance, I was recently asked by a client to double the length of a B2B survey from 20 questions to 40 questions. The panel company offered me the same rate for the longer survey, so fortunately there was no added cost. Hypothetically reversing the process, splitting a 40-question survey into two 20-question slices, would have required doubling the sample and doubling the cost.
    • A small available sample and declining response rates may mean there are no additional respondents to draw from. The survey ends up being long because this is the one opportunity to get responses from this audience.
  • Broad crosstabs aren’t possible. You can’t crosstabulate every question against every other question, since many respondents will not have answered both, and may not have answered either. Crosstabs in such work are based on the questions everyone sees – questions that specify the brands, products, or attributes that matter to them, and demographic and firmographic questions.
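As referenced above, the custom scripting in the two examples amounts to keeping a running counter per question. A minimal sketch, with hypothetical names, of the logic a platform would need to persist server-side across respondents:

```python
from collections import Counter

response_counts = Counter()  # question_id -> responses collected so far

def record_response(question_id: str) -> None:
    response_counts[question_id] += 1

def should_show(question_id: str, quota: int = 50) -> bool:
    """Hide a verbatim once its quota is met, so lower-priority
    verbatims in the cascade can start collecting responses."""
    return response_counts[question_id] < quota

def least_filled(competitors: list[str], k: int = 4) -> list[str]:
    """For the second example: pick the k competitors with the
    fewest follow-up responses collected so far."""
    return sorted(competitors, key=lambda c: response_counts[c])[:k]
```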

Sparse surveys do not make sense as a replacement for every long survey. But surveys with long lists of brands, products, or attributes are often a good fit for this technique. Expect to see more of them, even as respondents see less of such surveys.