Discovery: a new tool for accessing longitudinal cohort studies

Are you interested in longitudinal data? For example, the Hertfordshire Cohort Study (following 3000 men and women since they were born in the 1930s), or the British Cohort Study (17,000 people born in a single week in 1970). If so, you will be aware of problems such as knowing what data are available, what questions were asked, and where the data are stored.

Discovery is a new tool from CLOSER (still at a relatively early stage of development) that makes eight longitudinal studies more accessible. Please try it.

Where are the variables I am interested in?

Perhaps the best way to understand the usefulness and power of Discovery is through an example: finding variables of interest.

The image below shows the Discovery home page, which reports that the eight studies include over 55,000 variables. With so many variables, nobody wants to scroll through them all to find the items that are useful to their project or query.

Page from Discovery

By clicking on Variables, we can filter the list by Study, Life Stage and Topic, as shown in the image below.

Discovery Page

In this example, let’s assume we are interested in alcohol consumption. In the Topic selection we pick ‘Health behaviour’, and we pick ‘Mid-adulthood’ as the target Life Stage. This selection reduces the list to 250 variables, which is still a large number.

At this stage we could refine our selection by specifying which Studies we are interested in, and which Sweeps (sometimes called Waves). Alternatively, we can use search: in this example we enter ‘Alcohol’ and sort by ‘Relevance’, which produces a list of 7 variables.

Discovery Page

If we click on the first variable, B9SCQ32, we are shown a summary of the data. The data come from the 1970 British Cohort Study, from the 2012 Sweep (when the participants were 42 years old), and were collected through a paper self-completion exercise. A key benefit of Discovery is that it lets you find relevant data across multiple studies, each with multiple sweeps (waves).

Discovery Page

The summary in the image above shows that the dataset contains 8,623 valid responses to the question about how often the respondent has an alcoholic drink. The links on the page show the exact wording of the question and the details of the study, and give access to the data itself. To access the data you will need an account with the UK Data Service (not usually a problem if you are an academic; if you are not, the process takes longer).

So, Please Try It, Please Leave Feedback

If you want to know more, click here to visit Discovery.

You will find a few things that are not as smooth or intuitive as they could be, so please leave your feedback.

Want to Learn How to Replicate Discovery?

The CLOSER Discovery team have documented how they created this great resource in a Wiki. If you want to build something similar (or simply learn about the project), you will find the Wiki invaluable.

A great example of the level of thinking that underpins Discovery is the page How To / Document a Questionnaire / Principles, which uses images from real studies to illustrate principles such as “Do not correct the questionnaire” and to show how to work with any mistakes you find.


Quick declaration of interest: one of my family members is a key member of the team that created Discovery 🙂
