The big deal about big data
It's hard to avoid big data. The words are thrown at us in news reports and from documentaries all the time. But we've lived in an information age for decades. What has changed?
Take a look at a success story of the big data age: Netflix. Once a DVD rental service, the company has transformed itself as a result of big data and the change is far more than simply moving from DVDs to the internet. Providing an on-demand video service inevitably involves handling large amounts of data. But so did renting DVDs. All a DVD does is store gigabytes of data on an optical disc. In either case we're dealing with data processing on a large scale. But big data means far more than this. It's about making use of the whole spectrum of data that is available to transform a service or organisation.
Netflix demonstrates how an on-demand video company can put big data at its heart. Services like Netflix involve more two-way communication than a conventional broadcast. The company knows who is watching what, when and where. Its systems can cross-index measures of a viewer's interests, along with their feedback. We as viewers see the outcome of this analysis in the recommendations Netflix makes, and sometimes they seem odd, because the system is attempting to predict the likes and dislikes of a single individual. But from the Netflix viewpoint, there is a much greater and more effective benefit in matching preferences across large populations: it can transform the process by which new series are commissioned.
Take, for instance, the first Netflix commission to break through as a major series: House of Cards. Had this been a project for a conventional network, the broadcaster would have produced a pilot, tried it out on various audiences, perhaps risked funding a short season (which could be cancelled part way through) and only then committed to the series wholeheartedly. Netflix short-circuited this process thanks to big data.
The producers behind the series, Mordecai Wiczyk and Asif Satchu, had toured the US networks in 2011, trying to get funding to produce a pilot. However, there hadn't been a successful political drama since The West Wing finished in 2006 and the people controlling the money felt that House of Cards was too high risk. Netflix, though, knew from their mass of customer data that they had a large customer base who appreciated the humour and darkness of the original BBC drama the show was based on, which was already in the Netflix library. Equally, Netflix had a lot of customers who liked the work of director David Fincher and actor Kevin Spacey, who became central to the making of the series.
With strong evidence that they had a ready audience, Netflix did not commission a pilot but put $100 million up front for the first two series, totalling 26 episodes. This meant that the makers of House of Cards could confidently paint on a much larger canvas and give the series far more depth than it might otherwise have had. And the outcome was a huge success. Not every Netflix drama can be as successful as House of Cards. But many have paid off, and even when the take-up is slower, as with the 2016 Netflix drama The Crown, given a similar high-cost two-season start, shows have far longer to succeed than when conventionally broadcast. The model has already delivered several major triumphs, with decisions driven by big data rather than the gut feel of industry executives, infamous for getting it wrong far more frequently than they get it right.
The ability to understand the potential audience for a new series was not the only way that big data helped make House of Cards a success. Clever use of data meant, for instance, that different trailers for the series could be made available to different segments of the Netflix audience. And crucially, rather than release the series episode by episode, a week at a time as a conventional network would, Netflix made the whole season available at once. With no advertising to require an audience to be spread across time, Netflix could put viewing control in the hands of the audience. This has since become the most common release strategy for streaming series, and it's a model that is only possible because of the big data approach.
Big data is not all about business, though. Among other things, it has the potential to transform policing by predicting likely crime locations; to animate a still photograph; to provide the first ever vehicle for genuine democracy; to predict the next New York Times bestseller; to give us an understanding of the fundamental structure of nature; and to revolutionise medicine.
Less attractively, it means that corporations and governments have the potential to know far more about you, whether to sell to you or to attempt to control you. Don't doubt it: big data is here to stay, making it essential to understand both the benefits and the risks.
The key
Just as happened with Netflix's analysis of the potential House of Cards audience, the power of big data derives from collecting vast quantities of information and analysing it in ways that humans could never achieve without computers, in an attempt to perform the apparently impossible.
Data has been with us a long time. We are going to reach back 6,000 years to the beginnings of agricultural societies to see the concept of data being introduced. Over time, through accounting and the written word, data became the backbone of civilisation. We will see how data evolved in the seventeenth and eighteenth centuries to be a tool to attempt to open a window on the future. But the attempt was always restricted by the narrow scope of the data available and by the limitations of our ability to analyse it. Now, for the first time, big data is opening up a new world. Sometimes it's in a flashy way with computers like Amazon's Echo that we interact with using only speech. Sometimes it's under the surface, as happened with supermarket loyalty cards. What's clear is that the applications of big data are multiplying rapidly and possess huge potential to impact us for better or worse.
How can there be so much latent power in something so basic as data? To answer that we need to get a better feel for what big data really is and how it can be used. Let's start with that 'd' word.
Data is
According to the dictionary, 'data' derives from the plural of the Latin 'datum', meaning 'the thing that's given'. Most scientists pretend that we speak Latin, and tell us that 'data' should be a plural, saying 'the data are convincing' rather than 'the data is convincing'. However, the usually conservative