10 Things You Will Learn In This Book
1. It may be trendy but it's not new
Data journalism has been around as long as there's been data: certainly at least since Florence Nightingale's famous graphics and her 1858 report into the conditions faced by British soldiers. The news coverage in the very first edition of the Guardian was dominated by a large (leaked) table listing every school in Manchester, with its costs and pupil numbers.
The big difference? Back then, data was published in books, very expensive books in which graphics were referred to as figures. Now we have spreadsheets and files formatted for computers, which means we can make the computers ask the questions.
2. Open data means open data journalism
Now statistics have become democratised: no longer the preserve of the few but of everyone who has a spreadsheet package on their laptop, desktop or even their mobile or tablet. Anyone can take on a fearsome set of data and wrangle it into shape. Of course, they may not get it right, but now they can easily find someone to help. We are not wandering alone any more.
Data journalism is all about diverse sources. At the Guardian, being part of the news process means that we're part of the news desk (news organisations are obsessed with internal geography), we go to the key news meetings and we try to make sure that data is part of editorial debate.
3. Has data journalism become curation?
Sometimes. There's now so much data out there in the world that we try to provide the key facts for each story, and finding the right information can be as lengthy a journalistic task as finding the right interviewee for an article. We've started providing searches into world government data and international development data.
4. Bigger datasets, smaller things
The datasets are getting massive: 391,000 records for the WikiLeaks Iraq release, millions for the Treasury's COINS database. The Indices of Multiple Deprivation, which is how the government measures poverty across England, have 32,482 records. Increasingly, government data comes in big packages about tiny things. Making that data more accessible and easier to work with has become part of the data journalism process.
5. Data journalism is 80% perspiration, 10% great idea, 10% output.
It just is. We spend hours making datasets work, reformatting PDFs and mashing datasets together. Mostly, we act as the bridge between the data (and those who are pretty much hopeless at explaining it) and the people out there in the real world who want to understand what the story is really about.
6. Long and short-form
Traditionally, some of the worst data journalism involved spending weeks on a single dataset, noodling around and eventually producing something mildly diverting. Some of the best involves weeks of investigative data management before coming up with incredible scoops. But increasingly there's a new short form of data journalism, which is about swiftly finding the key data, analysing it and guiding readers through it while the story is still in the news. The trick is to produce these news data analyses, using the tech we have, as quickly as we can. And still get it right.
7. Anyone can do it
Especially with the free tools we use, such as Google Fusion Tables, Many Eyes, Google Charts or Timetric. You can see some of the stuff our users have produced and posted on our Flickr group.
8. But looks can be everything
Good design still really matters. Projects such as our guide to the senior civil service, who knows who in the News of the World phone-hacking affair or even "what happened when" work because they're designed not by machine but by humans who understand the issues involved.
9. You don't have to be a programmer
You can become a top coder if you want. But the bigger task is to think about the data like a journalist, rather than an analyst. What's interesting about these numbers? What's new? What would happen if I mashed it up with something else? Answering those questions is more important than anything else.
This stuff works best when it's a combination of both. Our guide to Nato operations in Libya is dynamically fed from a spreadsheet, which updates from the Nato daily action briefing. It looks good because it's been well designed; it works because it's easy to update every day.
10. It's (still) all about stories
Data journalism is not graphics and visualisations. It's about telling the story in the best way possible. Sometimes that will be a visualisation or a map. But sometimes it's a news story. Sometimes, just publishing the number is enough.
If data journalism is about anything, it's the flexibility to search for new ways of storytelling. And more and more reporters are realising that. Suddenly we have company and competition. So being a data journalist is no longer unusual.
It's just journalism.
IN PRACTICE: THE SIZE OF A BILLION
How big is a billion? Billions are everywhere. The US has a budget deficit of around $100bn a month; the UK's government spends nearly £700bn a year; the world now has over seven billion people in it. In a lot of the stories we do, a billion is where a number really starts to matter and have an impact.
Everyone knows that, right? You'd be surprised. For a number that is bandied around so readily, very few people really understand what it is.
It doesn't help that the meaning of the word "billion" depends on where you live. The US system, which is used by the government and the Bank of England in the UK, goes like this:
1,000 thousand
1,000,000 million
1,000,000,000 billion
1,000,000,000,000 trillion
1,000,000,000,000,000 quadrillion
So it basically goes up in thousands. A thousand times a thousand is a million, a thousand times a million is a billion and so on.
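The pattern in the table, where each name is a thousand times the one before it, can be checked with a short sketch. It assumes Python and covers only the five names listed above; the helper `short_scale_value` is hypothetical, written for illustration. It simply computes 1,000 raised to the name's position in the list, counting "thousand" as position one.

```python
# Short-scale ("US") names, in order: each step up multiplies by 1,000.
SHORT_SCALE = ["thousand", "million", "billion", "trillion", "quadrillion"]


def short_scale_value(name: str) -> int:
    """Return the value of a short-scale name, e.g. 'billion' -> 10**9.

    Hypothetical helper: 'thousand' is 1000**1, 'million' is 1000**2, and so on.
    """
    position = SHORT_SCALE.index(name) + 1  # 1 for thousand, 2 for million, ...
    return 1000 ** position


# Print the table with thousands separators, largest name last.
for name in SHORT_SCALE:
    print(f"{short_scale_value(name):>25,} {name}")
```

Seen this way, a billion is just a thousand cubed: 1,000 × 1,000 × 1,000.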
But if you're reading this from France or Germany, 1,000,000,000 is actually a milliard, a number that has not featured in a Guardian news story since 2004, except in the corrections column. The European billion is a million times a million, and this used to be called the British system. Confused yet? There's also the inexorable logic of inflation: a trillion is becoming common too. Now get your head around the fact that a US trillion is a European billion. And a European trillion? That's a US quintillion.
Mathematicians will tell you that the European system is more logical, but in a sense that is now academic. The nine-zero billion is in the ascendancy.
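To see why a European billion is a US trillion, it helps to put the two systems side by side. The sketch below is an illustration in Python, not a standard library: the dictionaries only hold the handful of names mentioned here, and `long_to_short` is a hypothetical helper. In the European (long-scale) system each new "-illion" is a million times the last, with the milliard filling the gap at a thousand million; the helper looks up the long-scale value and finds the US name with the same value.

```python
# Long scale ("European"): each -illion is a million times the previous one,
# with "milliard" covering a thousand million.
LONG_SCALE = {
    "million": 10**6,
    "milliard": 10**9,
    "billion": 10**12,   # a million times a million
    "trillion": 10**18,  # a million times a billion
}

# Short scale ("US"): each name is a thousand times the previous one.
SHORT_SCALE = {
    "million": 10**6,
    "billion": 10**9,
    "trillion": 10**12,
    "quadrillion": 10**15,
    "quintillion": 10**18,
}


def long_to_short(name: str) -> str:
    """Return the US name for a European quantity (hypothetical helper)."""
    value = LONG_SCALE[name]
    for short_name, short_value in SHORT_SCALE.items():
        if short_value == value:
            return short_name
    raise KeyError(f"no US name for {name!r} in this table")


print(long_to_short("milliard"))  # billion
print(long_to_short("billion"))   # trillion
print(long_to_short("trillion"))  # quintillion
```

The lookup confirms the text: the milliard is the nine-zero US billion, and the European trillion, with 18 zeros, is a US quintillion.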
1. Data Everywhere
Data journalism or computer-assisted reporting? What is it? How do you describe it? Is it even real journalism?
These are just two terms for the latest trend, a field combining spreadsheets, graphics, data analysis and the biggest news stories to dominate reporting in the last two years.
The WikiLeaks releases on Afghanistan, Iraq and the US embassy cables; the UK MPs' expenses scandal; the global recession; even the swine flu panic: reporting on all of those events was arguably only made possible, and irrevocably changed, by the existence of reporters who are not afraid of maths, know how to use a spreadsheet, work with the latest web visualisation tools and, crucially, know what questions to ask.
What is data journalism? It reflects the new transparency movement spreading across the globe, from Washington DC to Sydney, via California, London, Paris and Spain.
It's hard to know what came first: the data or the demand for it. Or maybe the two have grown symbiotically. But it seems there was a tipping point where a number of factors combined to form an unstoppable movement. I would argue they were:
- the widespread availability of data via the internet