Finding and crafting compelling data stories (fast)
Presented by Tim Bock, CEO, Displayr

Being able to tell a good data story isn’t just nice; it’s a necessity. Do you struggle to work out which results are interesting and which to leave out? Are you ever guilty of presenting data-dump-style reports filled with ‘just in case’ insights?
Automated Transcript via HappyScribe
Note: the automated transcript tends to be only about 70% accurate, but it will help give a bit more detail about what is in the video.
“Thank you for joining this webinar. This webinar is focused on how to find and craft compelling data stories, quickly. I’m going to take you through a five-step plan. It starts with creating and implementing an analysis plan, then data reduction, pyramid building, ordering the narrative, and techniques for propelling the story forward. We’ll start by musing, though, on what makes telling data stories hard.
We’ve all seen a bad data story. Slide after slide of boring results. The client’s bored, and the presenter is getting more and more anxious. McKinsey have a special term for such a data story: they call it the anxious parade of knowledge, where the presenter is sharing data in the hope the client gets something from it. We don’t want this. We want this. To understand how to get there, we need to dig a bit deeper into what makes data storytelling so challenging.
The first hard bit is that facts need to be found in the data; we can’t just create them. The data can be confusing. The facts can be ambiguous and weak: is this a smudge or a monster? And our audience needs to use these facts to plan. It follows from these last three points that the audience isn’t just there to listen; they need to stress test any results. They need to make sure that the data is robust enough to rely upon when doing their planning.
So how do we do it? As I said, there’s a five-step process, and the very first step is to create an analysis plan. It’s a natural thing when you’ve collected data to be all excited and just want to jump in and play with the data. However, as a pretty general rule, this fails: you end up getting lost and finding no story. We’ll start by looking at an analysis plan for a study investigating the appeal of this product, the airlock.
I’ll give you a moment to read the description. This is an example of an analysis plan. Don’t worry if some of the jargon is new to you; we’ll cover some of it in this webinar, and at the end I’ll share some links to other resources. I’ll give you a quick chance to digest it, though. Now, a lot of this is fairly pro forma stuff you just have to think about, check and be careful about, but the real key bits are the business goals and the planned tables and visualisations.
The most important thing is that you want to spend time carefully thinking about the goals of the study and working out the analyses that will best achieve those goals. These are the key planned analyses, and they’re subject-specific: there are no general rules about how to do them. Then the second key bit is the exploratory analyses. These are the additional analyses that you undertake in the hope you’re going to find something interesting, and there is a bit of a formula for how you do that.
You find the key questions in the study. Here, the key question is how likely people said they’d be to buy the product, and you just cross that by everything else. Now, the planned analyses generally won’t be that many in a study; here I’ve just got two slides of them. But the exploratory analyses can be quite voluminous (a very difficult word to pronounce).
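(To make that ‘cross the key question by everything else’ step concrete for readers who work in code rather than Displayr, here is a minimal pandas sketch. The data frame and column names, such as purchase_intent, are hypothetical, invented purely for illustration.)

```python
import pandas as pd

# Hypothetical survey data: one row per respondent.
df = pd.DataFrame({
    "purchase_intent": ["Definitely", "Probably", "Might", "Probably", "Definitely"],
    "age_band":        ["18 to 24", "25 to 34", "35 to 44", "18 to 24", "65+"],
    "gender":          ["F", "M", "F", "F", "M"],
})

# Exploratory analysis: cross the key question by every other variable.
key_question = "purchase_intent"
exploratory_tables = {
    col: pd.crosstab(df[col], df[key_question])
    for col in df.columns
    if col != key_question
}

for name, table in exploratory_tables.items():
    print(f"\n{key_question} by {name}")
    print(table)
```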
And that creates one of the biggest challenges, which is: how do you read through them all? The solution is to employ a technique known as data reduction, or I should say a group of techniques. When you’ve got lots of tables, you need to reduce the number of tables. The basic idea of data reduction is that we chuck away a lot of data in the hope that we’re chucking away noise, so that all that’s left is the story.
The pope of the day asked Michelangelo how he’d carved this most famous of all his statues. He said, it’s simple: I just removed everything that’s not David. And this is precisely how we find the story in our data.
We remove everything that’s not interesting to our stakeholders. By reducing the quantity of data, we distill and concentrate it, so we end up with a much stronger flavor, much like fish sauce for those of you who like Vietnamese and Thai food.
So how do we actually do it? Well, there are eight techniques, and we tend to apply them in any order, and keep applying and reapplying them until we’ve simplified the data. Let’s do it. Let’s look at all of these exploratory tables, starting with the first one. This table compares how long it took people to complete the questionnaire by their stated likelihood of buying the product, that is, the purchase intent. It’s hard to imagine a less interesting table. Now, we could just delete it, but with exploratory analyses we can entirely automate the process.
Statistics has invented a nice little tool that summarizes whether exploratory crosstabs are interesting. It’s called stat testing. We can just delete all exploratory tables that are not statistically significant. In Displayr, I just type the word delete in the little search box at the top, and it’ll show us the various ways of deleting. I’ve got various options here for deleting tables at different levels of significance. I’m going to take the one with the smallest level, which means we get the most tables deleted, because I’m lazy and don’t want to read too much.
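(The ‘delete everything that is not statistically significant’ step can be approximated outside Displayr with a chi-square test on each exploratory crosstab. This sketch carries on from the pandas example above and reuses its exploratory_tables dictionary; the 0.05 cutoff is just a placeholder.)

```python
from scipy.stats import chi2_contingency

alpha = 0.05  # hypothetical cutoff; a smaller alpha deletes more tables

# `exploratory_tables` is the dictionary of crosstabs built in the sketch above.
significant_tables = {}
for name, table in exploratory_tables.items():
    # chi2_contingency tests whether the rows and columns are independent.
    chi2, p_value, dof, expected = chi2_contingency(table)
    if p_value < alpha:
        significant_tables[name] = table  # keep only the interesting crosstabs

print(f"Kept {len(significant_tables)} of {len(exploratory_tables)} exploratory tables")
```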
It’s telling me everything that’s going to get deleted, and you can see we’ve got many fewer tables left to read, so we’ve already reduced the data pretty comprehensively. Let’s do more data reduction. Here I’ve got a table which is showing age by purchase intent. I’ve got 18 rows of numbers and eight columns, and if we multiply them together, that means this table of ours has quite a lot of little numbers I’ve got to look at, about one hundred and forty-four.
So I want to reduce them again. This first column is less than 18 years. These are people we’ve got no data for; there aren’t actually any people in it. So I’m just going to get rid of it. The second thing I’ve got on this table is that it’s showing column per cent, row per cent and count.
Now, if you’re new to market research you might not know this, but most of the time we’re only interested in the column per cent, so we can save a lot of time by just turning off the other statistics.
All right, we’re doing well. We’ve already got rid of more than two thirds of the numbers on this table. What else can we do? If you look at the columns, you can see that the 18 to 24 and the 25 to 34 columns have both got relatively high proportions of people, around 20 to 25 per cent, saying they’d definitely buy the product. So we might merge them together, because this difference, 25 versus 21, is not so interesting; it’s not that big.
And again, by merging, we can reduce the data further. If I look at the next two columns, I can see that they’re also broadly similar numbers, so merge them again. The last two get merged as well. This process of merging is very important: each time we merge, we reduce the number of numbers, distilling the story.
Now, the NET column just shows me the overall percentages of people. As I’m interested in looking at percentage differences by age, the overall column is redundant information here, so I’m just going to get rid of it. This bottom row, where they’re all 100 per cent, is definitely not interesting; get rid of that. Now, at the moment we’re showing all of the different percentages, but only one of these statistics is really relevant, which is the top one; the others kind of follow from the top one.
So I’m going to get rid of them. Now, the significance tests were shown with little arrows and colors to help me find the key patterns in the data, but now the patterns are really obvious.
And so they’re just a bit of visual clutter. Let’s get rid of them.
Percentage signs: well, I can tell these are percentages, so I can get rid of them as well. As for clutter, look at the table styling: a lot of intense, dark blue. Let’s simplify it with pale colors. So we’ve really greatly simplified this table and effectively reduced the data quite massively here. Nice work. Now, I talked about different data reduction techniques: we’ve just deleted a whole lot of uninteresting things, removed clutter, and merged similar things, and that approach of just using the top ‘definitely will buy’ box is a case of replacing data with a summary statistic.
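(Here is a rough pandas equivalent of the reduction moves just shown in Displayr: keep only column percentages, merge similar age bands, and replace the whole distribution with the ‘definitely will buy’ top box. The counts and band labels are invented for illustration.)

```python
import pandas as pd

# Hypothetical age-by-purchase-intent table of counts (columns are age bands).
counts = pd.DataFrame(
    {"18 to 24": [25, 40, 35], "25 to 34": [21, 45, 34],
     "35 to 44": [12, 50, 38], "65+": [8, 42, 50]},
    index=["Definitely will buy", "Probably will buy", "Will not buy"],
)

# Keep only column percentages (drop row per cent and counts).
col_pct = counts.div(counts.sum(axis=0), axis=1) * 100

# Merge similar columns (18 to 24 with 25 to 34) by summing the underlying counts.
merges = {"18 to 24": "18 to 34", "25 to 34": "18 to 34"}
merged = counts.rename(columns=merges).T.groupby(level=0).sum().T
merged_pct = merged.div(merged.sum(axis=0), axis=1) * 100

# Replace the whole distribution with one summary statistic: the top box.
top_box = merged_pct.loc["Definitely will buy"].round(0)
print(top_box)
```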
We’re going to do that again with a more complicated example. I’ll do my best to explain this one, and hopefully it’s not going to go too quickly, but I’ve got time constraints. I’m going to start by creating a really big crosstab.
In this study, I’ve got data on how often people watch different episodes of different TV shows. Now, data like this can be interesting in its own right, but usually we’re interested in how watching one TV show relates to watching other TV shows, which is known as duplication. And the reason we care is it helps us plan advertising and also helps us plan what TV shows we should create. So to create that data, I can create a huge crosstab, crosstabbing all of the viewing data by itself.
And the resulting crosstab I’m going to create contains 1,622 numbers. That’s an awful lot of numbers, isn’t it? Too many to read in this webinar. If you look at the scroll bars here, you can see, wow, I’ve really got a lot of data, and it keeps going to the right.
So what I want to do is use the technique I described before: replacing numbers with a summary statistic. What I’m going to do is replace all of these percentages with correlations instead. That’s a pretty easy process. I’m just going to change the data and tell it that, rather than being categorical data, it’s numeric data, and Displayr can then automatically calculate the correlations for me. And it’s doing some heavy thinking there, a lot of heavy thinking.
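(Outside Displayr, that ‘replace the percentages with a summary statistic’ move might look like this: recode each viewing-frequency question from categories to numbers and compute a correlation matrix. The shows and the category-to-number mapping are made up for illustration.)

```python
import pandas as pd

# Hypothetical viewing data: how often each respondent watches each show.
viewing = pd.DataFrame({
    "Star Trek: Discovery": ["Every episode", "Most", "Never", "Most"],
    "Picard":               ["Every episode", "Most", "Never", "Some"],
    "The Crown":            ["Never", "Some", "Every episode", "Never"],
})

# Recode the categories as numbers so correlations can be computed.
scale = {"Never": 0, "Some": 1, "Most": 2, "Every episode": 3}
numeric = viewing.apply(lambda column: column.map(scale))

# One number per pair of shows instead of a whole crosstab of percentages.
correlations = numeric.corr()
print(correlations.round(2))
```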
And so now I’ve got a table; let me just resize it. Now, I’ve got a little SUM category that’s been crammed into this table, which is just like a NET; I’m not that interested in it, so get rid of that. OK, so now we’ve reduced all the data down to one hundred and twenty-one numbers. We still need to make it a bit easier to read; it’s a bit hard to see.
There are just so many things that are significant. Usually a great idea is to sort the table, so I’m going to sort by values, descending. Right, and so even though it’s one hundred and twenty-one numbers, which is a massive data reduction from the one thousand six hundred and twenty we had before, it’s still a bit too busy. So I’m going to use a heat map to summarize it, and I’ll hook that up to the episodes-watched-by-episodes-watched table. Now, the way the heat map works is, the darker the shading, the stronger the correlation between the different shows.
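(And a rough equivalent of the sort-and-heat-map step, assuming the correlations matrix from the previous sketch; seaborn is used here purely as a stand-in for Displayr’s heat map, and the ordering rule is a simplification.)

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Order the shows so that strongly correlated ones sit together, then shade by strength.
order = correlations.mean().sort_values(ascending=False).index
sorted_corr = correlations.loc[order, order]

sns.heatmap(sorted_corr, cmap="Blues", annot=True, fmt=".2f")
plt.title("Correlation between shows (darker = stronger)")
plt.tight_layout()
plt.show()
```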
What we can see here is it kind of forms a little box here and a box down here in the bottom right. This is telling us that the first five shows are actually relatively strongly correlated with each other. And there’s a theme here: these are all science fiction shows. So it’s telling us that viewing of the science fiction shows is most correlated with viewing of the other science fiction shows, and the more dramatic shows are more correlated with the other dramatic shows.
Now, The Handmaid’s Tale, for those of you not familiar with it, is actually science fiction, but it’s kind of adult science fiction, more sophisticated literary science fiction, and so it’s behaving as a drama show. We can see that we’ve got very strong correlations between Star Trek: Discovery and Picard, which makes sense as they’re both in the Star Trek universe, and they also correlate with Doctor Who, which kind of makes sense as this is the more classic, shall we say, nerdy science fiction.
Now, we’ve done a great job of reducing 1,620 numbers down to just a little pattern here. We can get a little bit further using correspondence analysis. Now, correspondence analysis is a specific technique; it’s great for market research, and it’s part of a more general strategy of doing decompositions. I’m not actually going to go into that much detail in this webinar, just illustrate the main one, which is correspondence analysis. How do we do it? Again, if you can’t find it in the menus, you can type correspondence and it’ll show you where it is.
We’ll hook this up to the data. And this now groups together the TV shows that have got the strongest patterns between them, and we can see here that we’ve got science fiction on the left-hand side and drama on the right-hand side. That’s the biggest pattern. And we can see that The Mandalorian stands out from the other, more classic science fiction kind of shows, reflecting the fact that it’s a little bit different to the others as well.
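(Displayr runs the correspondence analysis for you, but for the curious, here is a bare-bones numpy sketch of the standard computation, a singular value decomposition of the standardised residuals of a contingency table. The counts are invented purely so the example runs.)

```python
import numpy as np

def correspondence_analysis(table):
    """Simple correspondence analysis of a contingency table (rows by columns)."""
    P = table / table.sum()                       # correspondence matrix
    r = P.sum(axis=1)                             # row masses
    c = P.sum(axis=0)                             # column masses
    # Standardised residuals: how far each cell is from independence.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: positions of rows and columns on the map.
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords[:, :2], col_coords[:, :2]

# Hypothetical duplication counts: respondents watching each pair of shows.
duplication = np.array([
    [120, 30, 10],
    [ 25, 90, 15],
    [ 10, 20, 80],
])
rows_xy, cols_xy = correspondence_analysis(duplication)
print(rows_xy.round(2))
```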
So that’s even greater compression of the data.
Now, I’ve gone through all of these different techniques except changing the scale, which is just recoding; most market researchers are pretty familiar with that, so I’m not going to cover an example here, but I’ve got some materials linked at the end of the webinar where you can find out more about it. The last one is just using common sense: don’t show results that don’t make sense, if they’re inconsistent with everything else, if they smell off, or if there are alternative plausible explanations for the results.
All right. Step three: build pyramids. Remember earlier I talked about the need for people to stress test results. Now, this is actually an example of a pyramid that I’ve created in the document itself. And it’s a general technique: in a pyramid, we just put the key conclusion or result at the top and the supporting information underneath. And you can make pyramids really, really big; you can keep going. So here I’ve got recommendations, supporting information, supporting conclusions, and the data that leads to those conclusions.
These pyramids can get bigger and bigger and bigger. Now, there are a few key aspects of pyramids that make them particularly useful for market research. The first is that they chunk together the information, which makes it easier for people to remember results, and they also link together the information, which again makes it easier for people to remember things because it taps into the associative way that our brains work. Now, the best kind of pyramid is what’s called a recommendation pyramid.
We tell the client specifically what we think they should do, and the evidence that led to that recommendation, with all of the data underneath. And in your analysis stage, you should be grouping together all of your analyses into these kinds of pyramid structures to try and figure out what the patterns are. You can do that with, you know, Post-it notes on the wall, or in Displayr we’ve actually built it so you can drag and drop individual results around.
And you can see here I’m just dragging pages, and when I do that, it actually forms pyramids directly in the document itself, which is because I like pyramids. Anyway, building pyramids is very, very important. Let me just scroll down and get back to where we were.
Now, once we’ve got our recommendation pyramid (I’ll give you a quick chance to read this), we then need to create the actual narrative. So the pyramid is how we organize our results to create meaning that we, as researchers, understand; ordering the narrative is how we present those results so that our client can understand them. One of the classic ways of doing that is the story spine. Now, thinking in terms of the pyramid, you could see this as being: show the key results, then the findings, and then the conclusion, and many market research presentations are done that way.
But this is what I call a mousetrap, because we want people, as I said before, to stress test results, and they can only do that if they know the key conclusions. And so we actually need to always present the key finding at the beginning. It’s the only way people can stress test results: show them the key finding and the supporting information, then give them the data, and they can go through it, stress testing as they go.
Once you’ve got your narrative structure worked out, you then need to find the best way to propel the story forward. Those great wizards of podcasting, the This American Life team, say we’ve basically got 60 seconds before people get bored, so we’ve got to come up with a technique every 60 seconds. Here are some techniques; you can pause this later and read them if you want to. The first, and a key one I think, is to make sure that each of your pages follows the pyramid structure.
The key conclusion at the top, then the key supporting data, a little more detail, then the data, but written in such a way that people only need to read the data if they want to dig into it; they can just read the heading and move on. Here’s another example: the key conclusion at the top, and the supporting data shown underneath. Another technique to use is the hero’s journey.
This is the classic hero’s journey, and we want to focus particularly on the story gap, which is the difference between the abyss, where it’s all going bad, and where the client, the hero, wants to get to.
In Star Wars, Luke thought he was going to be squashed; that’s the abyss on the Death Star. The bigger that gap, the more the excitement. I led off this webinar showing the anxious parade of knowledge; this is what the hero wanted, again creating that excitement. In your case, you generally want to contrast the consumer’s current state with where the consumer wants to get to. Another great technique is to promote interactivity: allow users to create their own stories. When I say users, I mean the clients, the users of your research.
They want to click on options and figure out interactively what the data means, because by participating in the process, they get engaged. Here’s a summary of the techniques we’ve been through, and here are some links to other resources; there are lots of great resources on the website as well.”