Saturday, 14 September 2013

BIG data: its gains, losses and absurdities

In my very first post on this blog I talked about something called the 'knowledge economy'. This is one of the most common memes of our time (if you have no idea what a 'meme' is, it is an idea that spreads from person to person within a culture). Like most things in this world, memes tend to cluster together, and one of the memes that clusters around the 'knowledge economy' is something called 'big data'.

Big data is just what it sounds like: enormous amounts of raw, unprocessed information that cannot be sorted or dealt with by traditional, hands-on means. No human can look at big data and derive any real meaning from it, so it is usually handled with the help of computers. Examples of big data in use abound (meteorological data, for instance, is used to forecast the weather; the richer the data set, the more accurate the forecast tends to be), and the problem of handling it has also given birth to many companies, the most commonly cited being Google, whose very name is derived from 'googol', the term for 10^100 (not to be confused with a googolplex, which is 10^(10^100)). The company's computers crawl through the web, indexing trillions of pages to make them searchable for all of us. Despite that, the internet remains far from fully indexed for two reasons: primo, it's growing every day, and secundo, it's so damn big. As a side note, check out this nifty website to see how big the internet probably is (I say probably because the map is obviously incomplete).
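To make the 'indexing' idea a little more concrete, here is a toy inverted index in Python: a map from each word to the set of pages containing it, which is what lets a search skip reading every page at query time. This is a minimal sketch under my own assumptions, not how Google actually does it. The page URLs and text are made up, and real search engines layer crawling, ranking and distribution across many machines on top of this basic idea.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of letters/digits as words
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return pages containing every word in the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first word's page set, then intersect with the rest
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, invented purely for illustration
pages = {
    "example.com/weather": "Big data helps forecast the weather",
    "example.com/maps": "Live traffic maps use big data from phones",
    "example.com/privacy": "Privacy laws lag behind technology",
}
index = build_index(pages)
print(sorted(search(index, "big data")))  # ['example.com/maps', 'example.com/weather']
```

The expensive work (reading every page) happens once, when the index is built; each search afterwards is just a few set lookups and intersections, which is why indexing pays off at web scale.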

Big data has existed for as long as anything has existed in this universe, and the universe and all its contents have existed for roughly 13.8 billion years. The trick lies in capturing that data, storing it and analyzing it for useful patterns. The tools for these tasks have arrived only in the last two centuries, culminating in cheap, compact storage devices, powerful processors and networking technologies, and their development has been driven, perhaps most importantly, by necessity.

To see how necessity spurred the development of tools for handling big data, take a look at this TED-Ed video:

Since IT became a ubiquitous part of our lives, big data has become relatively easy to collect. Mapping has enjoyed the fruits of big data: the maps of today are no longer the paper maps my parents and I relied on during our excursions around London in 2002. Nowadays people have compact electronic maps on their smartphones that are rich in social data, are constantly updated and can even feature live updates on important things like traffic (Waze, which Google bought in June 2013, lets you access social traffic information on GPS-enabled phones).

However, the same technology that has given us so much can also be used against us. Politics aside, big data technology has also allowed security agencies to look into our activities with unprecedented impunity. Privacy policies and laws may forever be left in the dust as the pace of technological development outstrips the ability of lawmakers to protect our privacy online (if you believe in such a thing anyway). Though the aim is (I think) purely to catch the bad guys, we are in real danger of being wrongly accused by overzealous or over-empowered security agencies, or of being caught by nervous, tyrannical regimes seeking to protect their illegitimate hold on power. All thanks to big data and the technologies it has spawned.

But the ultimate problem with big data (especially the social media variety) is its desperate need to be verified or curated so that we don't end up turning noise into conclusions. That's why crowd-sourced projects like Wikipedia still need editors; without them there would be chaos. Taking unverified data as real information is the biggest absurdity of the internet today. Perhaps that is why teachers hate it when students turn to Wikipedia for information. Though I disagree with avoiding Wikipedia altogether, CITING it in presentations and essays is another thing entirely. It's just not there yet. Give the technology time to mature and maybe...

Some also say that social media (think Twitter and Facebook, which also deal with big data produced by millions of narcissistic humans) could become the source for ALL news. That may work if your friends love tweeting the news and do so accurately, but you could just as easily end up with a skewed picture, so you really should verify each raw piece of data against multiple sources. Big data is worthless without a useful way of deriving valuable, realistic knowledge from it, assuming there is anything useful to be derived from it at all, which is another problem entirely.

What I find most amazing about big data is how much of it we humans have produced: more than 90% of all the data we have ever created was produced in just the last few years. By the time my grandchildren are born, we will have a real problem storing all that data, let alone scrutinizing it for useful patterns. But as usual, necessity will mother an invention for the job.