Big data: what is it, and is it here to stay?

Although the term “big data” has been in use since the 1990s, its popularity has grown massively over the last five years.

Technological terms come in and out of fashion all the time. For example, we still have websites, but we no longer call them the ‘World Wide Web’. Terminology also evolves over time: ‘social network’ has largely been replaced by ‘social media’.

So, why has “big data” become common parlance, and will it be fleeting or is it going to stay? Big data has been used to describe datasets of all shapes and sizes, but the key theme is datasets too large to be handled using conventional methods. The dataset has become too large to be collected, analysed or managed efficiently, or some combination of the three. Big data therefore signals the need for new tools and methods for handling large quantities of data.

One key area that generates “big data” is passively collected data, for example from accelerometers. Accelerometers are being used to collect very useful information about how active a person is, how much they sleep, how much time they spend sedentary, and so on. But when data is collected passively like this, it is possible to generate datasets tens or hundreds of times bigger than is typical. New automated processes are therefore needed to make the dataset more digestible, e.g. algorithms to estimate sleeping times.
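To give a flavour of what such an algorithm might look like, here is a minimal sketch in Python. It is not a validated sleep-detection method. It assumes the accelerometer output has already been summarised into one activity value per minute, and it simply looks for the longest sustained run of low activity. The function name `estimate_sleep_window`, the `threshold` of 10 and the `min_minutes` of 60 are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple


def estimate_sleep_window(
    activity_per_minute: List[float],
    threshold: float = 10.0,  # assumed cut-off below which a minute counts as "still"
    min_minutes: int = 60,    # assumed minimum length for a run to count as sleep
) -> Optional[Tuple[int, int]]:
    """Return (start, end) minute indices of the longest low-activity run.

    The end index is exclusive. Returns None if no run is long enough.
    """
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None

    # A trailing infinite value closes any run that reaches the end of the data.
    for i, value in enumerate(list(activity_per_minute) + [float("inf")]):
        if value < threshold:
            if run_start is None:
                run_start = i  # a new low-activity run begins here
        elif run_start is not None:
            length = i - run_start
            if length >= min_minutes and (best is None or length > best[1] - best[0]):
                best = (run_start, i)
            run_start = None

    return best


# Made-up data: 30 active minutes, 120 still, 10 active, 70 still, 5 active.
activity = [50] * 30 + [2] * 120 + [40] * 10 + [1] * 70 + [60] * 5
window = estimate_sleep_window(activity)
```

The point is less the rule itself than the reduction: thousands of raw readings collapse into a single start and end time that a researcher can actually work with.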

Big Data and Scaling

Ultimately, will the “big data” phrase be fleeting? I think so. Data is growing very rapidly at the moment (over 2,500,000,000 gigabytes per day) and will continue to do so for the foreseeable future, but the lag in our methods for dealing with it will shrink. The hardware to handle large datasets already exists. What we need now is forward-thinking infrastructure, such as international standards, the open-data initiative and the practical application of artificial intelligence.
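To put the daily figure quoted above into more familiar units, the short Python sketch below converts it to exabytes per day and terabytes per second. It assumes decimal (SI) units, so one gigabyte is taken as 10^9 bytes; the function name `daily_volume` is just an illustrative choice.

```python
GIGABYTE = 10**9    # bytes, decimal (SI) definition (an assumption)
TERABYTE = 10**12
EXABYTE = 10**18
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume(gigabytes_per_day: int):
    """Convert a daily volume in gigabytes to (exabytes per day, terabytes per second)."""
    total_bytes = gigabytes_per_day * GIGABYTE
    return total_bytes / EXABYTE, total_bytes / TERABYTE / SECONDS_PER_DAY


exabytes, terabytes_per_second = daily_volume(2_500_000_000)
print(f"{exabytes:.1f} EB per day, about {terabytes_per_second:.0f} TB per second")
```

Under those assumptions the figure works out at 2.5 exabytes per day, or roughly 29 terabytes every second.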

The term “big data” will likely be outlived and superseded by a much longer-running term: “scaling”. The methods and tools being produced today are rigorously tested and examined so that they can scale up as the data does.

2 May 2017, Will Poynter

