Big Data

When Google decides which websites you should see and Amazon is set to send out goods you have not yet ordered, then you can be assured that Big Data is involved in some way. But where does this ‘Big Data’ come from, what does it mean for us and how can it be used in a meaningful way?

Author: Jana Light, 11.11.14

The newest methods of analysis, combined with innovative technology, allow for the storage of massive amounts of data, far beyond the abilities of the human brain. Whether we use the internet to find information or to do our shopping, many of our daily online actions leave traces without us being aware of it, and these traces are saved in the form of data. These tiny individual actions are then aggregated into one ocean of Big Data. Big Data (or Smart Data) is changing the way our public and private lives are represented. Business strategies and customised advertising, as well as social causes such as crisis management and environmental protection, profit from this data.

What exactly is this data, why is it being saved and who has access to it?

From Big Data to Smart Data

Behind the term ‘Big Data’ lies much more than just a large amount of information. Experts have defined this new social resource using three elements – the three ‘V’s: Volume, Velocity and Variety. Volume describes the ever-growing amount of data, velocity the increasing speed at which data is organised and utilised, and variety the growing range of data types and sources. In 2011, the worldwide data pool contained 1.8 zettabytes of data (one zettabyte is a 1 followed by 21 zeros, counted in bytes), with an average daily growth of 2.5 exabytes.
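To put the two figures above side by side, here is a small sketch of the arithmetic. It assumes decimal SI prefixes (1 exabyte = 10^18 bytes, 1 zettabyte = 10^21 bytes) and a constant daily growth rate; the article states neither, so the result is an illustration rather than a measurement.

```python
# Decimal SI prefixes (an assumption; the article does not define its units).
EXABYTE = 10**18
ZETTABYTE = 10**21

data_pool_2011 = 18 * ZETTABYTE // 10   # 1.8 zettabytes, in bytes
daily_growth = 25 * EXABYTE // 10       # 2.5 exabytes per day, in bytes

# Days of growth at the quoted rate needed to add as much data
# as the whole 2011 pool contained.
days_to_match_pool = data_pool_2011 // daily_growth

print(f"Data pool in bytes: {data_pool_2011:,}")
print(f"Days of growth to add the same amount again: {days_to_match_pool}")
```

Under these assumptions, the quoted growth rate would add another 1.8 zettabytes in roughly two years.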

One reason for this impressive growth is the fact that more and more devices are able to access the internet. Washing machines, digital cameras and even parking ticket machines collect and store data on their location, the time and date, and user activity, amongst other things. Together, these connected devices form the “Internet of Things”. However, the vast majority of the data is created by our daily online actions. We tweet and blog, post and search, bank and shop – all of this feeds the internet with information. The ‘volume’ of data is growing exponentially: from the beginning of the internet until 2003, approximately 5 billion GB of data were collected. Today, this same volume is acquired in just 15 minutes!

However, simply collecting data is pointless – the true value comes when it is ordered and analysed, or, in other words, turned into Smart Data. There is so much information floating around the internet that no patterns are visible without computer analysis. In order to find links between pieces of information, the data must be spread across multiple processors that all work on the same problem at the same time. The machines find the suitable data by themselves and link it, revealing connections which humans might never have noticed.
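The idea of several workers tackling the same problem at once can be sketched in a few lines. The example below is only an illustration, not the method of any company named in this article: it splits a handful of invented shopping baskets into chunks, lets each worker count which items appear together, and then merges the partial counts. The function names and the data are hypothetical, and threads stand in for the separate machines a real system would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_pairs(chunk):
    """Map step: count how often two items appear in the same record."""
    counts = Counter()
    for record in chunk:
        for pair in combinations(sorted(set(record)), 2):
            counts[pair] += 1
    return counts


def split(records, parts):
    """Split the records into roughly equal chunks, one per worker."""
    size = max(1, -(-len(records) // parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def linked_items(records, workers=4):
    """Reduce step: merge the partial counts produced by all workers."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_pairs, split(records, workers)):
            total.update(partial)
    return total


# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    ["tent", "torch", "map"],
    ["tent", "torch"],
    ["map", "compass"],
    ["tent", "torch", "compass"],
]
pairs = linked_items(baskets, workers=2)
print(pairs.most_common(1))
```

No single worker sees all the baskets, yet the merged result still shows which items are most often bought together. That is the kind of link the paragraph above describes.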

Who Has Access to Smart Data?

Not everyone has the means to analyse Big Data. You need the relevant know-how and the right software to profit from it. One of the greatest hurdles is the enormous amount of storage space required – roughly 250,000 times the space available on a tablet. This excludes individuals as well as most companies from performing such an analysis; very large companies, however, may have the capacity. These include Facebook and Amazon, which analyse the customer-specific datasets created when people view and use their websites and order goods. Using this data, they can find out more about their users and tailor advertisements and deals to each individual.

There are also some commercial analysis providers, such as IBM, which analyse data to create forecasts and sell these to paying customers. Who exactly has access to our data, and what it is used for, often remains a mystery.

The Open Data movement advocates the transparency of Smart Data and wants the analysed data to be available to everyone. Its vision is a society where everyone has access to all data so that everyone can profit from it. Where exclusive contracts with governments were previously needed to access data about, for example, education, traffic or finances, now everyone can access this information, often for free. This is thanks to initiatives such as Open Government, and NGOs and social entrepreneurs profit immensely from it. More about this can be found at: Open Knowledge Foundation International.

This openly accessible data often comes from governments, especially in the areas of geo-information, culture, science, finances, consumer protection, statistics, weather, environment, transport, and politics and administration (Source: Open Data Handbook). Open Data only contains information that is not tied to any individual, which is especially important with regard to privacy and data protection, and to the movement as a whole. This is seen as what differentiates the Open Data movement from the commercial side of Big Data.

But regardless of whether Smart Data is seen as a commercial product or not – what is it actually used for?

The Potential of Big Data

The new opportunities for data usage have great potential for reducing inequality and injustice on many levels, as well as for increasing efficiency in areas such as energy and resources.

Environmental Sector

Many large, influential companies are frequently criticised for destroying the environment, not acting sustainably and, through this behaviour, contributing greatly to climate change and pollution. At the same time, in some parts of the world, society is preparing for an energy revolution through which renewable energies will (hopefully) take the lead. In order to reach the set goals and to keep up with the competition, it is very important for companies to be efficient, to eliminate deficits and to have a long-term game plan. Previously, companies had to wait for relevant studies and then adjust their strategy accordingly. Now, however, Smart Data can offer the same information much more quickly. Smart Meters, for example, aggregate data about energy and water usage in real time and then present it in a way that end users can understand.

IBM offers real-time analysis for the energy sector, showing how much energy needs to be fed into power lines. This allows energy production to be adjusted to actual demand. Although obviously useful for everyone, this data is not accessible to all, but limited to those who pay for the service.

Combining Big Data with the newest civilian drone and satellite technology allows us to receive real-time images of areas hit by natural disasters. Fed with data, the drones can determine their flight path by themselves and transmit images and other data, which can be used to create a map of the affected area and to identify the locations of injured people or logistical problems. This map is then published online and is open source.

The app Wildlife Witness shows another way of creating environmental maps, this time through crowdsourcing and open-source maps. Using the app, people can report illegal wildlife trade by uploading photos, videos and locations. This information is compiled into a map which NGOs and governments can use to police and prevent these crimes more effectively. Every user can contribute to the protection of wildlife in a simple and quick way.

Economic Sector

Especially in poorer countries, Big Data can have a big impact on the economic situation of the population. The speedy acquisition of socio-economic data via avenues such as social networks can support an immediate response to problems. Social media analysis allows the UN initiative Global Pulse to make accurate predictions about impending unemployment or changing market prices. The analysis of tweets has helped Global Pulse warn the citizens of Indonesia about impending food price increases. Whether tweets on Twitter belong to the domain of personal data, and whether they should be encrypted, is difficult to determine: the legal grey zone around intelligent data use is too large.

Social Sector

It is fairly obvious that a quick exchange of information leads to more transparency and knowledge, which can be very helpful in the social sector. Political mishaps, problems with aid projects and illnesses can be prevented ahead of time.

  • Politics

In politics, the quick analysis and opening up of data can make political measures more efficient. Problems within the population can be linked directly to the actions of politicians. This means that the effect of new laws can be demonstrated and their usefulness evaluated. Those in charge can then be held responsible. Additionally, the publication of data can increase transparency in the political sector, which can help prevent corruption. (Source: Universitas)

The governments of Germany, the USA and the UK have made information accessible to the public in various areas such as healthcare, population, demographics and education through their own GovData portals. The insights gained from this are published and maps are created. The general population profits from this sharing of information.

Another example is the German portal MP-watch, which collects election results and campaign promises, as well as how the elected candidates actually vote. By comparing the two, it becomes apparent how strongly the MPs really represent the causes they stood for.

  • Society

There is often a lack of communication between NGOs, aid organisations, donors and recipients, which leads to projects not being realised despite good intentions. Some initiatives focusing on data aggregation and transfer are keen to change this situation.

The aim of Markets for Good is to create an information infrastructure within the social sector which gives an overview of all reports, studies and observations in the non-profit area. Additionally, an online overview shows which organisations collect data and are willing to share it, thus creating a network between the organisations and improving communication. Organisations that are struggling to analyse their data can receive help from DataKind, which connects statistics experts with NGOs in order to get the most out of their datasets.

International Aid Transparency, a British initiative from the Department for International Development, and the organisation Washfunders focus on publicly disclosing the expenses of aid organisations, in an effort to make development aid more transparent and effective. The goal is for donors and politicians to be better informed about what their money goes towards.

In our article on digital and online activism you can find more examples about how data gathering can be used in crisis areas worldwide.

  • Health

Open Data is also of use to healthcare systems worldwide, as the more people know about an illness, the better they can protect themselves against it. The Malaria Atlas Project has recruited a team of experts to help create malaria maps, which show the distribution and spread of malaria in various areas. These maps are open source.

Tendai is a project which aims to improve healthcare in eight developing countries by gathering data. Health advisors in the different countries learn how to record the availability and market prices of medication using their smartphones, and how to conduct interviews. This data is then analysed at the project's headquarters and published, which helps cushion fluctuations in medicine prices.

When looking at all these different projects that use Big Data, the potential of our newest resource seems infinite. When everyone knows everything, everyone is better off – but is there also information which shouldn’t be shared? Personal or otherwise?

Risks and Challenges

Since the NSA scandal in 2013, public discussion about the risks and problems associated with Big Data has intensified. The problem lies in the use of private data: in most cases, we do not have a say over whether data is collected, by whom, and how it is used and stored. As laws about privacy stem from an era before the internet was invented, the vast majority of countries do not have laws regulating Big Data. This gives cause to view our new resource with ambivalence, especially as Big Data cannot be stopped once it has started.

As long as the data is treated responsibly, it can contribute to the solution of many social and economic challenges, as well as bring transparency into decision-making processes. Projects like Smartcitizen are a positive example of how Big Data could help each individual form a more objective view of the world and improve it. However, when personal data is sold or used for commercial purposes (such as personalised advertisements), there is no added value for society or the environment. Furthermore, companies are then in possession of our private information.

The collection and analysis of Big Data can be a huge benefit to our society, as long as the process is as transparent as possible and the results are available to as many people as possible.
