All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the publisher, except in the case of brief quotations embodied in critical reviews and certain other non-commercial uses permitted by copyright law.
THE COLLECTION OF DATA
With its aptness to describe the past and predict our future, data has fast become a currency to store, trade, and profit from, and yet its application exceeds even that. Our universe is increasingly documented by large-scale empirical observation and data is the chosen format to chronicle the constant flux of information.
The upsurge of data-related technology and mass digitalization means that most data circulates as electronically stored information. But it would be wrong to view data collection as a new phenomenon or a recent uptick in the new Information Age. While the equipment installed to store, manage, and mine data consists of cutting-edge advancements in computer processing, today's servers and smart devices are merely the latest apparatus in a long line of evolutionary development.
In the Upper Palaeolithic Period (25,000 to 30,000 years ago), "cutting-edge" was a genuine description of data collection methods, as tally marks were etched into animal bones as a way to store information. Later civilizations expanded record-keeping, including the invention of symbolic writing systems by the Ancient Egyptians and the Sumerians of Ancient Mesopotamia.
The peoples of Mesopotamia, in the region of modern-day Iraq, continued to make consistent headway, first in 3500 BC with large-scale surveys of their lands and people, and again in 1700 BC with the creation of a counting device composed of stones on a wooden board. Archaeologists have even uncovered clay tablets that show the marks of cryptography, further emphasizing the region's superior grasp of information processing in the ancient world. Similar to how companies encrypt their sensitive data today, artisans in Mesopotamia used cryptography to protect their secret recipe for pottery glaze.
Advancements in record-keeping, surveys, and coding systems came in the centuries that followed, but it would be some 2,000 years before a precise term for observing and recording information was officially devised. Historians say the word "data" entered the vernacular at the high-water mark of the Scientific Revolution, in the 1640s.
In the same decade, Blaise Pascal invented an automatic calculating device that used an arrangement of cogs to quickly add and subtract large numbers. The Frenchman's calculating device proved an important innovation in automation and built on a number of breakthroughs before him, including the Indo-Arabic numeral system, negative numbers, and the place value system.
The place value system was devised in Ancient Egypt in 3500 BC as a method for representing larger units and simplifying arithmetic. Instead of using 151 individual symbols or marks, the value of 151, for example, could be expressed using seven symbols (1 hundred, 5 tens, and 1 unit). More than 3,000 years later, in 200 BC, commercial transactions in China began processing negative numbers using red and black rods representing payments and debts respectively. Numerals for their part originated in India in 100 BC and were promptly adopted by scholars in the Arab world. Europe's adoption of Indo-Arabic numbers came in the Middle Ages, just in time for Blaise Pascal and the Scientific Revolution.
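The arithmetic behind the 151 example can be sketched in a few lines of Python. This is only an illustration, assuming a decimal scheme in which each power of ten has its own symbol and that symbol is repeated as many times as needed; the function names `decompose` and `symbols_needed` are hypothetical and chosen for this sketch.

```python
def decompose(n: int) -> list[tuple[int, int]]:
    """Split n into (count, unit) pairs, e.g. 151 -> 1 hundred, 5 tens, 1 unit."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = str(n)
    return [
        (int(d), 10 ** (len(digits) - i - 1))  # count of symbols, value of each symbol
        for i, d in enumerate(digits)
    ]


def symbols_needed(n: int) -> int:
    """Total symbols written when each unit symbol is repeated 'count' times."""
    return sum(count for count, _unit in decompose(n))


if __name__ == "__main__":
    value = 151
    print(decompose(value))       # [(1, 100), (5, 10), (1, 1)]
    print(symbols_needed(value))  # 7 symbols
    print(value)                  # 151 marks if every unit is tallied individually
```

Under these assumptions, the saving grows quickly with the size of the number: a value in the thousands still needs at most a few dozen symbols, whereas a tally needs one mark per unit.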
Starting with the Copernican Revolution in 1543 and culminating in 1687 with Isaac Newton's "grand synthesis", the Scientific Revolution helped to lay the conceptual, methodological, and institutional foundations of modern science. The era is notable for an acceleration of scientific discoveries that began with Nicolaus Copernicus dismissing Earth as the stationary center of the universe. This was the first breakthrough in a long line of scientific discoveries and inventions from this period.
By the time Newton published his laws of motion in the Principia (considered one of the most important works in the history of science), Europe was tinkering with electricity, the telescope, the microscope, calculus and logarithms, air pressure, and Pascal's mechanical calculator. Against this energetic backdrop of discovery, the word "data" started appearing in books and intellectual conversation. However, it wasn't new in the same sense as other intellectual discoveries from this period. Information stored in a format suitable for processing and analysis existed long before Sir Francis Bacon championed the Scientific Revolution. Even the name itself was derived from an existing Latin word, datum, or "that which is given". But what the new moniker lacked in originality, it made up for in enthusiasm, for data served as a vital tool and a zeitgeist of the Scientific Revolution.
There were far-reaching changes concerning how the physical world was studied, analyzed, and represented during the Scientific Revolution, writes historian John Henry. According to Henry, detailed observations and investigatory experiments came to replace reliance on ancient authority as the supreme source of knowledge for interpreting the natural world. New theories from eminent figures of this period, including Copernicus, Galileo, Pascal, and Newton, were formed and backed by data.
Lawrence Principe, in The Scientific Revolution: A Very Short Introduction, calls the era a busy laboratory of experimentation in all areas of thought and practice.
Evidence of the new emphasis on data for interpreting outside phenomena can be found in John Graunt's 1662 study, Natural and Political Observations Made upon the Bills of Mortality. In response to a public health crisis in Europe, Graunt developed the first life table, which estimated the probability of survival for a range of age groups. By analyzing the weekly bills of mortality (deaths), Graunt attempted to create a warning system to offset the spread of an epidemic plague in London. While the system was never implemented, Graunt's experiment in data processing yielded several interesting findings and served as a useful estimate of London's unstable population.
Mechanical inventions over the next two centuries brought the arrival of automated machines for data processing and basic statistical analysis, which gradually supplanted the heavy reliance on manual data collection. Following the acceleration of new technology in the 20th century, the word "data" became increasingly associated with computers, and in 1946 its definition expanded to include "transmittable and storable computer information." "Data processing" appeared in the 1950s, and, emanating from prolific developments in database storage, "big data" joined the lexicon in the 1990s.
While data has expanded beyond scientific circles to become a daily ritual for the rank-and-file knowledge worker, true understanding, noted in this book as data literacy, lies in knowing what's behind the data. Everything, from the data's source to the choice of input variables and visual representation, affects the accuracy, relevance, and value of the data and marks its journey from raw data to business insight. Like two identical blocks of stone, one delivered to a workshop in Florence and the other to a stonemason in Istanbul, raw data can produce different perspectives. After chiseling at the data, data scientists at separate workstations can arrive at two distinct viewpoints regarding the meaning and value of the same original data.