Open data, what does it mean and why do we need it?

The post below is a guest post from Will Poynter, lead engineer at CLOSER Discovery, based in the UK.

There is a common misconception that open data means making data public. This is one, very narrow, way of opening up data.

What is open data?
I prefer to talk about opening up data, an action, rather than open data, a noun. This is because open data suggests an absolute state, while openness is relative to the environment and the user. That is, data is not simply open or closed; it sits along a spectrum that depends on who you are, what you would like to do with the data and where we are in the timeline of the data.

Before we get too abstract let me set out an example. Let’s use a teacher’s notebook. This notebook contains everything for our teacher, including comments on pupils, marking, ideas for lesson plans and personal notes. Currently our teacher keeps this notebook to himself and shares it with no one; definitely not open data.

Now suppose our teacher would like to begin opening up the data inside his notebook so that he can share ideas for lesson plans with all the teachers in the school. Even though the data is being opened up, it doesn’t mean the information will be made public or even shared with pupils. Currently the infrastructure the teacher uses is insufficient to allow him to share his lesson plans with his colleagues, due to the personal information inside. But thankfully there is a simple solution. He creates a second notebook, which is just for lesson plans. The new notebook is clearly labelled with
“Lesson Plan Ideas”
“For Staff Only”
“By Mr Pennyfeather”
and is kept in the departmental office.

To summarise, our teacher has now created new data governance that opens up some of his data to a specific audience, which he then enabled by implementing new infrastructure. Our teacher has retained the privacy of his personal notes and marking by ‘decoupling’ the different datasets.
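For readers who think in code, the notebook example can be sketched roughly as below. This is only an illustration of decoupling datasets and attaching an audience to each one; the `Notebook` class, the role names ("owner", "staff", "pupil") and the sample entries are assumptions made for the sketch, not part of any real system.

```python
from dataclasses import dataclass, field


@dataclass
class Notebook:
    """One dataset with an explicit label, owner and audience (hypothetical model)."""
    title: str
    owner: str
    audience: set                      # roles allowed to read this notebook
    entries: list = field(default_factory=list)

    def read(self, role):
        # Governance rule: only roles named in the audience may read.
        if role not in self.audience:
            raise PermissionError(f"{role!r} may not read {self.title!r}")
        return list(self.entries)      # return a copy, not the original list


# The original notebook stays private to its owner.
private_notebook = Notebook(
    title="Personal notebook",
    owner="Mr Pennyfeather",
    audience={"owner"},
    entries=["comments on pupils", "marking", "personal notes"],
)

# The second notebook holds only lesson plans and is opened up to staff.
lesson_plans = Notebook(
    title="Lesson Plan Ideas",
    owner="Mr Pennyfeather",
    audience={"owner", "staff"},
    entries=["hypothetical lesson plan idea"],
)
```

Here `lesson_plans.read("staff")` succeeds, while `private_notebook.read("staff")` and `lesson_plans.read("pupil")` both raise `PermissionError`: the same teacher's data is open to one audience and closed to another.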

After the success of sharing his lesson plans, he also wishes to share his marking with other teachers in the same department. Ideally, he would like to see the marking of the other teachers in the department as well. There are many possible solutions to meet this need, but the decision is made that all three of the teachers in the department will create two additional copies of their marking and share them with their colleagues.

Technically speaking, this solution does meet the basic requirement of sharing marking with other teachers within the department, but it is riddled with problems. How will the teachers store their colleagues’ copies? How often will the marking be shared? How will amendments to marking be handled? And in the interest of data privacy, who is responsible if a copy of the marking is disclosed? It does not take long to see that this is a bad model, prone to error and misuse.

In this example, we have seen two cases of opening up data, one good and one bad. Both had good intentions and sensible requirements; the difference lies in the implementation. Our teacher should be sharing his marking with other teachers in his department, but he perhaps needs outside help to design infrastructure that allows him to do it in a secure and manageable way (e.g. a computerised database).
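As a rough idea of what such infrastructure might look like, here is a minimal sketch of a single shared mark book in place of circulating copies. Everything in it is hypothetical: the `MarkBook` class, the teacher, pupil and assignment names, and the rule that any department member may read and amend. A real system would need proper authentication, an audit trail and persistent storage.

```python
class MarkBook:
    """A single shared store of marks for one department (hypothetical model)."""

    def __init__(self, department):
        self.department = set(department)   # teachers allowed to read and write
        self.marks = {}                     # (pupil, assignment) -> (mark, marked_by)

    def _check(self, teacher):
        # Only members of the department may touch the mark book.
        if teacher not in self.department:
            raise PermissionError(f"{teacher!r} is not in this department")

    def record(self, teacher, pupil, assignment, mark):
        # Recording the same pupil and assignment again amends the mark in place,
        # so there are no stale copies to chase.
        self._check(teacher)
        self.marks[(pupil, assignment)] = (mark, teacher)

    def view(self, teacher):
        # Every department member sees the same, current set of marks.
        self._check(teacher)
        return dict(self.marks)


book = MarkBook({"Mr Pennyfeather", "Teacher B", "Teacher C"})
book.record("Mr Pennyfeather", "Pupil 1", "Essay 1", 62)
book.record("Mr Pennyfeather", "Pupil 1", "Essay 1", 65)   # an amendment
```

Because there is one store, the questions raised by the three-copies approach mostly disappear: `book.view("Teacher B")` always returns the amended mark, and anyone outside the department is refused.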

Why do we need open data?
As I have discussed in a previous post, the volume of data being collected is growing rapidly, and it can cost a lot of money to store and manage. Opening up data allows for much more effective and efficient use of it. For example, when teachers share lesson ideas, duplicate lesson plans can be avoided and existing lesson plans can be improved.

Finally, a side effect, or necessary requirement, of opening up data is that an organisation must have documentation and data governance. This provides a healthy level of transparency for everyone.

7 May 2017, Will Poynter