How to map a hashtag: Identify leaders, groups, and topics with network analysis

Click here to access the slides.

Transcript of recording with Marc Smith, generated automatically by HappyScribe, which means it will be about 80% accurate. If you spot confusing errors, please let us know by email. The timestamps are included to help you jump directly to a point of interest.


[00:00:12.470] – Marc Smith

Good morning, good afternoon, good evening. My name is Marc Smith. I’m a sociologist with the Social Media Research Foundation. I’m here today to talk about our tools for mapping hashtags and other kinds of collections of annotated connections, the kinds of things you find in social media, using network analysis as a way to quickly identify the leaders, the groups and the topics common in a social media topic stream. I work with the NodeXL team. We make tools that attempt to make it easier for non-programmers to collect and analyse social media data, to visualize it using network analysis methods, and then to generate interfaces that allow for the discovery and presentation of insights into collections of connections.


[00:01:12.800] – Marc Smith

When many of us think of a collection of tweets, the metaphor that probably comes to mind is a flock. And indeed, there are some really positive aspects of this metaphor: there’s no formal authority, all sorts of structures and shapes form and then dissolve, and it involves a lot of individuals. But it’s also a little misleading, in the sense that birds, or at least these birds, starlings, are not the same as humans.


[00:01:51.560] – Marc Smith

And in fact, if humans were birds, we would probably be more like chickens in the sense that we have a pecking order, that there is a hierarchy. There doesn’t seem to be a hierarchy here. No one bird is more important than others. And what network theory is going to show us is that it’s always the case that some birds are, well, more influential than others. So imagine some of these birds having a lot more influence over the flock than the average bird.


[00:02:22.420] – Marc Smith

So what we are eager to explore are the ways in which we might actually visualize the pattern of connections in the populations found in things like hashtags, discussion groups, email lists and web boards, particularly those that have replies as a structure, where one person can reply to another. That isn’t universal: some things, like wikis, don’t have replies, or parts of them don’t. So what we are exploring is ways to come up with pictures of collections of connections.


[00:03:05.860] – Marc Smith

And sometimes the results aren’t that great. But critiques suggesting that network visualizations are bad because there are bad network visualizations are somewhat limited; after all, that would be like saying that photography is bad because there are bad photographs. It is true that there are bad photographs, but it’s also true that there are good photographs, and there are better and worse ways of making photographs. So one of the things we’re interested in exploring is methodologies and techniques for visualizing collections of connections, of networks, in a way that improves our ability to visually perceive structures like clusters, leaders, divisions and, for example, the presence of a fragmented population, like the people over here who have not connected to each other or to anybody else.


[00:04:06.040] – Marc Smith

And no one is connected to them. They’re isolates. So there are many different ways to imagine drawing these pictures and we’re open to hearing about others, and if you follow that link, you can play with this sort of 3D spinning version of the network visualization. But I think what we’re trying to do is just make the core concept of network analysis much more graspable by essentially putting it into Excel and making certain kinds of patterns and structures of connections more visible.
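Degree and isolates are simple enough to compute by hand. The Python sketch below is not NodeXL’s implementation (NodeXL does this inside Excel); it assumes the network is stored as an undirected edge list, and the account names are invented. Degree is the number of lines touching each dot, and an isolate is a dot whose degree is zero.

```python
def degree(nodes, edges):
    """Count how many edges touch each node (undirected degree)."""
    deg = {n: 0 for n in nodes}  # start every node at zero so isolates are kept
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


# Hypothetical accounts and "who connected to whom" links
nodes = ["ann", "bob", "cat", "dan", "eve"]
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]

deg = degree(nodes, edges)
isolates = [n for n, d in deg.items() if d == 0]  # nobody connects to them
print(deg)       # {'ann': 3, 'bob': 2, 'cat': 2, 'dan': 1, 'eve': 0}
print(isolates)  # ['eve']
```

A visualization could then scale each dot’s size in proportion to its degree value.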


[00:04:40.880] – Marc Smith

Here, for example, the dots are sized and coloured based on how many connections they have, which is referred to as degree. If you count up the number of lines for every dot, you’ll see that its colour and its size are proportionate to that number, and that number is also listed in the spreadsheet to the side. And if we decide to size them by a slightly different attribute, like betweenness centrality, the picture changes.


[00:05:16.100] – Marc Smith

A node that was large before is no longer large. It’s become small, while these other nodes are big, because if they were removed, the network would be fragmented, whereas if you removed that node, the network would pretty much stay the same. We’re going to come back to this notion of betweenness centrality as an indicator of importance throughout the rest of the discussion today. So when we look at tweets, I think the big distinction between a network approach and a lot of other approaches is that where you may see just a collection of words and messages, we see the encoding of links.
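The contrast between degree and betweenness centrality can be reproduced with a short Python sketch of Brandes’ algorithm for unweighted, undirected graphs. This is a standard way to compute betweenness, not necessarily what NodeXL does internally, and the toy network (two triangles joined only through a bridge account, "x") is invented for illustration.

```python
from collections import deque


def to_adj(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Brandes' algorithm: how often each node sits on shortest paths between others."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair was counted from both ends
    return {v: b / 2 for v, b in bc.items()}


# Two tight triangles joined only through the bridge account "x"
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "x"),
         ("x", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = to_adj(edges)
deg = {v: len(nbrs) for v, nbrs in adj.items()}
bc = betweenness(adj)
print(deg["x"], bc["x"])  # 2 9.0: only two links, yet every cross-triangle path uses x
print(deg["c"], bc["c"])  # 3 8.0
```

Removing "x" would split this network into two pieces, which is what its high betweenness score signals despite its low degree.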


[00:05:55.640] – Marc Smith

And the same message may encode more than one link. Over time, the pattern of linkages reveals structures and highlights certain people as being in positions that are distinct from others. Layering these networks together reveals a picture of who the important people are and where they fit within the discussion. So network analysis, I think, adds a middle tier, the group, the meso layer: we have the micro, which is the individual, and the macro, which is the group as a whole or some collection of content.
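A single tweet can be read as a bundle of links. The Python sketch below assumes just two link types, a reply and an @mention; the tuple layout and the `replies_to`/`mentions` labels are hypothetical, not a real Twitter or NodeXL schema, and the tweets are invented.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def tweet_edges(author, text, reply_to=None):
    """Turn one tweet into zero or more (source, target, kind) links."""
    author = author.lower()
    reply_to = reply_to.lower() if reply_to else None
    edges = []
    if reply_to:
        edges.append((author, reply_to, "replies_to"))
    # dict.fromkeys removes repeated mentions while keeping their order
    for name in dict.fromkeys(m.lower() for m in MENTION.findall(text)):
        if name not in (author, reply_to):  # skip self-mentions and the reply target
            edges.append((author, name, "mentions"))
    return edges


# Hypothetical tweets: (author, text, account being replied to)
tweets = [
    ("Ann", "@bob @cat great panel on #mrx", None),
    ("bob", "@ann thanks! #mrx", "ann"),
    ("cat", "Reading about #NewMR", None),
]

all_edges = [e for t in tweets for e in tweet_edges(*t)]
# Layering: count how often each pair is linked across all tweets
weights = Counter((src, dst) for src, dst, _ in all_edges)
print(all_edges)
```

The first tweet yields two links from one message; the third yields none, so "cat" would only appear in the network through Ann’s mention.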


[00:06:37.740] – Marc Smith

But network analysis allows us to create these clusters at the middle tier based on patterns of connection rather than patterns of content use. So two groups of people might use the same terms but not connect to each other, and they would therefore form separate clusters even though they use the same language. Typically, though, different clusters do use different language. So here are our conversations, or I hope conversations that matter to you: the hashtags mrx, NewMR and marketing.
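Grouping by connection rather than by content can be illustrated with connected components, the simplest connection-based grouping. Tools like NodeXL typically use dedicated community-detection algorithms instead; this Python sketch swaps in plain connected components for brevity, and the accounts and vocabulary are invented. Two groups that use identical words still come out as separate clusters because no link joins them.

```python
from collections import deque


def components(nodes, edges):
    """Group nodes by who is linked to whom, ignoring what they say."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        group, queue = [], deque([start])
        while queue:  # breadth-first walk over everyone reachable from start
            v = queue.popleft()
            group.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        groups.append(sorted(group))
    return groups


nodes = ["ann", "bob", "cat", "dan", "eve", "fay"]
edges = [("ann", "bob"), ("bob", "cat"), ("dan", "eve"), ("eve", "fay")]
vocab = {n: {"survey", "insights"} for n in nodes}  # everyone uses the same words
groups = components(nodes, edges)
print(groups)  # [['ann', 'bob', 'cat'], ['dan', 'eve', 'fay']]
```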


[00:07:13.470] – Marc Smith

You can find these maps and reports, and others, at the NodeXL Graph Gallery, NodeXLGraphGallery.org, where you can search for the word marketing, and I think you’ll see quite a number of these. The point of the Graph Gallery is to be a kind of Instagram for graphs; we might not call it that, but you might. The hope is that it works like Instagram, which provides lots of images from lots of places: if you have an interest in a topic, going to Instagram and doing a search is certainly going to reveal lots of photographs of a place or a person you might be interested in.


[00:07:53.480] – Marc Smith

Similarly, the NodeXL Graph Gallery is a place you can go to ask a question about conversations and to see photographs of groups of people talking about different topics. When we see these images, what we see is that there are a variety of shapes to the crowds that form when people form groups and networks in social media. They don’t all do it the same way; there are variations, and these variations are not infinite. There is a limited set of structures that we find in this data, revealing things like the fragmented structure, which is the group over here where nobody is talking to anybody.


[00:08:38.570] – Marc Smith

The wagon wheel structure, where there is one person in the middle and an audience forms around them. And in some rare cases, there will be the community structure, where lots of people are densely interconnected. But looking at these images does bring to mind the question: just how social is social media? If social is defined by the presence of reciprocity, people replying to each other, and density, groups of people interconnected rather than a hub-and-spoke pattern or lots of fragmentation, if those are the necessary conditions for social,


[00:09:17.570] – Marc Smith

then the answer to the question of just how social social media is might very well be: not very much. That’s not to say it’s impossible, but when you find it, it’s usually rare and therefore precious. Zooming in on our conversation networks, we can see who occupies the position known as betweenness centrality in our discussion communities, and many of these people are no doubt familiar to you. They show up because they are in positions like the centre of a wagon wheel structure.
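Reciprocity and density, the two conditions for “social” mentioned above, can be measured directly. The Python sketch below uses common textbook definitions for a directed network without self-loops (density is links present divided by n(n-1) possible links; reciprocity is the share of links whose reverse also exists). These may differ in detail from NodeXL’s own metrics, and both example networks are invented.

```python
def density(nodes, edges):
    """Directed density: observed links divided by all possible n*(n-1) links."""
    n = len(nodes)
    links = {(u, v) for u, v in edges if u != v}
    return len(links) / (n * (n - 1)) if n > 1 else 0.0


def reciprocity(edges):
    """Share of directed links that are returned (a->b with b->a present)."""
    links = {(u, v) for u, v in edges if u != v}
    if not links:
        return 0.0
    mutual = sum(1 for u, v in links if (v, u) in links)
    return mutual / len(links)


# Wagon wheel: an audience all pointing at one hub, nobody replying back
wheel_nodes = ["hub", "s1", "s2", "s3", "s4"]
wheel_edges = [(s, "hub") for s in wheel_nodes[1:]]

# Community: four people who all reply to each other
club = ["a", "b", "c", "d"]
club_edges = [(u, v) for u in club for v in club if u != v]

print(density(wheel_nodes, wheel_edges), reciprocity(wheel_edges))  # 0.2 0.0
print(density(club, club_edges), reciprocity(club_edges))           # 1.0 1.0
```

By these measures the wagon wheel is sparse and one-directional, while the community structure scores at the maximum on both.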


[00:09:56.680] – Marc Smith

And once these clusters are discovered in the data, we can find out that different clusters have different topics of interest. They have a different pattern of hashtags, for example, and it’s worth noting that the pattern of any one group might be very different from the pattern used by the group as a whole. The group as a whole has one set, but each subset or subgroup has its own vector of hashtags, and similarly a vector of URLs, words and other kinds of information tokens.
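The idea that each cluster has its own hashtag vector, distinct from the whole conversation’s, can be sketched in Python by counting hashtags overall and per group. The group assignments and tweets here are invented; in practice the groups would come from a clustering step, and the same counting would work for URLs or words.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(texts):
    """Count lower-cased hashtags across a collection of tweet texts."""
    return Counter(tag.lower() for t in texts for tag in HASHTAG.findall(t))


# Hypothetical tweets already assigned to clusters 1 and 2
tweets_by_group = {
    1: ["Great panel #mrx #analytics", "More on #analytics #mrx"],
    2: ["New deck #mrx #qual", "#qual tips"],
}

overall = hashtag_counts(t for texts in tweets_by_group.values() for t in texts)
per_group = {g: hashtag_counts(texts) for g, texts in tweets_by_group.items()}
print(overall.most_common(1))       # [('mrx', 3)]
print(per_group[2].most_common(1))  # [('qual', 2)]
```

The whole conversation is led by #mrx, while cluster 2’s own vector is led by #qual.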


[00:10:31.340] – Marc Smith

This is the hashtag mrx, and it’s a larger population, and yet I think at least four of the top 10 people from NewMR are the same people who are prominent here: the high-betweenness-centrality people at the centre of the discussion. The mrx discussion has a similar set of hashtags, but again, each cluster can have a different set of hashtags than the group as a whole. We can compare this further with, let’s say, just marketing, which is a much larger topic that attracts an enormous amount of content in a very short amount of time. The people associated with the #marketing discussion are very different from those in topics like mrx, and the topics have changed.


[00:11:21.150] – Marc Smith

Analytics is not a topic when it comes to marketing at the moment, whereas it is a big topic in mrx. So of course we recommend articles, some of which we wrote; we wrote a few of these. We do suggest you go to Pew Research and type in the word nerd, and you’ll get the article “Mapping Twitter Topic Networks: From Polarized Crowds to Community Clusters.” I would say the main takeaway is that observing all of these networks over time has led to the recognition of about six basic patterns. That’s not to say there can’t be a seventh; we simply look for evidence. We would like to see naturally occurring networks from social platforms that demonstrate such a structure.


[00:12:12.140] – Marc Smith

And then we would expand this table. These are examples of what those schematic structures look like when we look at real data. So this is the schematic, that was the schematic structure. This is the actual real data. So going back, we can now look at our discussions and ask questions like, well, which of the six patterns do we most resemble and who are the people at the centre of those patterns and what are the topics that are being discussed?


[00:12:45.380] – Marc Smith

And the hope, the goal, is this: some of these discussions can be high volume, fast moving and demanding to keep up with, and they come with their own opportunity costs, because if you’re paying attention to one topic, you can’t pay attention to another. It would be nice if we could find some solution to this situational awareness problem. How do you keep your finger on the pulse of many, many conversations? I like to think that the machines got us into this problem.


[00:13:16.580] – Marc Smith

They made it possible for humans to create more content than humans can consume, and so it’s their fault. So if the machines got us into this problem, the machines will just have to get us out of it. To that end, we imagine a social media network analysis data flow that acts as a pre-processor for human attention, a way to deal with the fact that human beings have a relatively limited budget for the tweets they can read.


[00:13:48.800] – Marc Smith

How many tweets can you read in a day? If you can read ten tweets in a minute, that’s 600 tweets in an hour. And how many hours can you spend reading tweets? Yet most topics of interest could easily exceed that attention capacity, at a thousand tweets a minute or ten thousand tweets an hour; it could easily get to that level of volume. You need a microscope to see really small things, and you need a telescope to see really far things.
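The attention-budget arithmetic can be written out in a few lines of Python. The reading rate of ten tweets a minute comes from the talk; the stream of 1,000 tweets a minute is the illustrative volume mentioned above, converted here to an hourly figure.

```python
READ_PER_MINUTE = 10      # tweets one person can read per minute (from the talk)
STREAM_PER_MINUTE = 1000  # illustrative volume for a busy topic

read_per_hour = READ_PER_MINUTE * 60      # 600 tweets an hour
stream_per_hour = STREAM_PER_MINUTE * 60  # 60,000 tweets an hour
share_read = read_per_hour / stream_per_hour

print(f"{read_per_hour} of {stream_per_hour} tweets an hour ({share_read:.0%})")
# 600 of 60000 tweets an hour (1%)
```

At that volume, a single reader keeps up with roughly one tweet in a hundred.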


[00:14:23.030] – Marc Smith

You may need a connect scope to see really connected things. Our goal would be to free up the observer and the analyst from reading thousands of tweets, so they can come away with a view of not just the trees, but the forest. So you could now ask questions like: what does your network look like? And what about your competitor, or your aspirational competitor, the people you feel are doing the job better than you?


[00:14:55.790] – Marc Smith

You could use them as a benchmark to see in what ways your interventions actually change your network to generate the results that you want. There are many different shapes that we’ve seen, and not all of them are equally good for all purposes. What we’ve discovered is that delivering the spreadsheets and the web reports has been useful, but what people want is to noodle around in the data, to engage in the sixth and seventh tasks that we’re trying to support.


[00:15:30.630] – Marc Smith

The tasks are: collect the data, store it, analyse it, visualize it and build a report about it. We thought that was a lot. It turns out that there are two more tasks after that: discovery and presentation. So the goal now is to allow people to engage in discovery, to look at, let’s say, the history of NewMR in a way that’s a lot friendlier than Excel. VLOOKUP is a very difficult formula to work with in Excel, and now pivoting everything on everything is possible.
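The lookup-then-pivot step that VLOOKUP makes painful in Excel can be sketched in Python with a dictionary lookup and a counter. Both tables here, an account-to-group table and a tweet table with client apps, are hypothetical stand-ins for the kind of data a NodeXL workbook holds.

```python
from collections import Counter

# Vertex table: which group each account was assigned to (hypothetical)
group_of = {"ann": 1, "bob": 1, "cat": 2}

# Tweet table: (author, client app) rows (hypothetical)
tweets = [("ann", "iPhone"), ("bob", "Android"), ("ann", "iPhone"), ("cat", "Web")]

# The dict lookup plays the role of VLOOKUP; the Counter is the pivot table
pivot = Counter((group_of[author], client) for author, client in tweets)
for (group, client), n in sorted(pivot.items()):
    print(group, client, n)
```

Swapping the key tuple, for example to `(client, group_of[author])`, pivots the same rows along a different dimension.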


[00:16:06.060] – Marc Smith

So we can see where the spike is, and we can zoom in on it. We can explore the population of contributors in a way that really reveals how rare a certain kind of social influence is: the people in the upper right-hand quadrant. There aren’t that many of them compared to the total population, and when we see those names, I think many of them are going to appear as people who are prominent in our community. The advantage of this kind of interface is that you can click pretty much anywhere and say, yes, but I only want to look at, for example, the people who used an iPhone.


[00:16:43.320] – Marc Smith

The client app view here is the breakdown of Android, iPhone, web client and all the other ways people can get into Twitter. Similarly, we can look at how they broke into separate groups, because it’s not just one big network; it’s actually lots of little networks connected. We can look at which content was top by various dimensions, like which items got both many retweets and many favourites. Some things get favourited and some things get retweeted, but which got both?
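One way to ask “which got both?” is to take the top items by retweets and by favourites separately and keep only those that appear in both lists. This Python sketch uses made-up tweet IDs and counts, and the cut-off `k` is an illustrative parameter; it is not how NodeXL ranks content.

```python
def top_by_both(items, k):
    """IDs that are in the top k by retweets AND in the top k by favourites."""
    by_rt = sorted(items, key=lambda i: i["retweets"], reverse=True)[:k]
    by_fav = sorted(items, key=lambda i: i["favourites"], reverse=True)[:k]
    return sorted({i["id"] for i in by_rt} & {i["id"] for i in by_fav})


# Hypothetical engagement counts
items = [
    {"id": "t1", "retweets": 50, "favourites": 2},
    {"id": "t2", "retweets": 40, "favourites": 45},
    {"id": "t3", "retweets": 1, "favourites": 60},
    {"id": "t4", "retweets": 5, "favourites": 3},
]
print(top_by_both(items, 2))  # ['t2']
```

Here "t1" is only retweeted and "t3" is only favourited, so "t2" is the one item that does well on both.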


[00:17:17.070] – Marc Smith

And, of course, there’s the ever-popular word cloud for hashtags, which gives us a sense of the topical focus of our conversations at a glance, rather than consuming many thousands of messages. The hope is that things like sentiment can reveal topic areas of concern to the population, or perhaps to subpopulations. So if we were to come over to the Groups tool, we could pick a subgroup and see just the negative words used by that group rather than by the conversation as a whole.
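Comparing negative words per subgroup against the whole conversation could be sketched as below in Python. The tiny `NEGATIVE` word list is hypothetical and stands in for a real sentiment lexicon; the group assignments and tweets are invented.

```python
import re
from collections import Counter

NEGATIVE = {"bad", "slow", "broken", "expensive"}  # hypothetical toy lexicon


def negative_words(texts):
    """Count words from the negative list across a collection of texts."""
    words = (w for t in texts for w in re.findall(r"[a-z']+", t.lower()))
    return Counter(w for w in words if w in NEGATIVE)


# Hypothetical tweets already assigned to two subgroups
groups = {
    "g1": ["Survey tool is slow", "So slow and expensive"],
    "g2": ["Dashboard looks broken", "Great talk today"],
}

per_group = {g: negative_words(texts) for g, texts in groups.items()}
whole = negative_words(t for texts in groups.values() for t in texts)
print(per_group["g1"])  # Counter({'slow': 2, 'expensive': 1})
print(per_group["g2"])  # Counter({'broken': 1})
```

Each subgroup’s complaints are distinct, which the combined count for the whole conversation blends together.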


[00:17:52.440] – Marc Smith

And of course, it’s then very interesting to look at where the conversation is coming from. Where are the NewMR people, and how does that compare to the marketing conversation? We can look at marketing, go back and look at NewMR, or even compare that further with mrx. Each of these is pulling from slightly different geographies. There may be a lot of overlap, but certain topics reach out to certain geographies. And of course, we could zoom into that and figure out who those people are and what those messages are.


[00:18:24.040] – Marc Smith

It took a group of people to achieve these goals, and there’s a lot of academic material out there if you go to Google Scholar and search for NodeXL. I think you’ll find a lot of papers that are essentially trying to chart an atlas of social cyberspace: in the same way that geographic information system people map and chart physical landscapes, we’re trying to map and chart the landscape of social media. So those are our goals and some of our results. I hope you’ll find them useful.


[00:18:57.490] – Marc Smith

I’m open to hearing any questions, queries, comments, concerns. And I thank you for your time today.