ISBN 9781547417957
e-ISBN (PDF) 9781547401567
e-ISBN (EPUB) 9781547401581
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.
© 2020 Peter Ghavami, published by Walter de Gruyter Inc., Boston/Berlin
To my beautiful wife Massi,
whose unwavering love and support make these accomplishments possible and worth pursuing.
Acknowledgments
This book was only possible as a result of my collaboration with many world-renowned data scientists, researchers, CIOs and leading technology innovators who have taught me a great deal about scientific research, innovation and, more importantly, the value of collaboration. To all of them I owe a huge debt of gratitude.
Peter Ghavami
March 2019
About the Author
Peter Ghavami, Ph.D., is a world-renowned consultant and best-selling author of several IT books. He has been a consultant and advisor to many Fortune 500 companies around the world on IT strategy, big data analytics, innovation and new technology development. His book on clinical data analytics, titled Clinical Intelligence, has been a best-seller among data analytics books.
His career began as a software engineer and progressed to technology leadership roles such as director of engineering, chief scientist, and VP of engineering and product management at various high-technology firms. He has held leadership roles in data analytics, including Group Vice President of Data Analytics at Gartner and VP of Informatics.
His first book, titled Lean, Agile and Six Sigma IT Management, is still widely used by IT professionals and universities around the world. His books have been selected as textbooks by several universities. Dr. Ghavami has over 25 years of experience in technology development, IT leadership, data analytics, supercomputing, software engineering and innovation.
Peter K. Ghavami received his BA from Oregon State University in Mathematics with emphasis in Computer Science. He received his M.S. in Engineering Management from Portland State University. He completed his Ph.D. in industrial and systems engineering at the University of Washington, specializing in prognostics, the application of analytics to predict failures in systems.
Dr. Ghavami has been on the advisory boards of several analytics companies and is often invited to lecture and speak on this topic. He is a member of the IEEE Reliability Society, the IEEE Life Sciences Initiative and HIMSS. He can be reached at .
Introduction
Data is the fingerprint of creation. And analytics is the new Queen of Sciences. There is hardly any human activity, business decision, strategy or physical entity that does not either produce data or involve data analytics to inform it. Data analytics has become core to our endeavors, from business to medicine, research, management and product development, to all facets of life.
From a business perspective, data is now viewed as the new gold, and data analytics as the machinery that mines, molds and mints it. Data analytics is a set of computer-enabled analytic methods, processes and disciplines for extracting and transforming raw data into meaningful insight, new discoveries and knowledge that help make more effective decisions. Another definition describes it as the discipline of extracting and analyzing data to deliver new insight about past performance and current operations, and to predict future events.
Data analytics is gaining significant prominence not just for improving business outcomes and operational processes, where it is certainly the new tool to improve quality, reduce costs and improve customer satisfaction. It's also fast becoming a necessity for operational, administrative and even legal reasons.
We can trace the first use of data analytics to the mid-1850s, to a celebrated English social reformer, statistician and founder of modern nursing, Florence Nightingale. She gained prominence for her bravery and caring during the Crimean War, tending to wounded soldiers. But her contributions to statistics, and her use of statistics to improve healthcare, were just as impressive. She was one of the first to use statistical methods and reasoning to show that better hygiene reduces wound infections and, consequently, soldier fatalities.
At one point during the Crimean War, her advocacy for better hygiene reduced the number of fatalities due to infection tenfold. She was a prodigy who helped popularize the graphical representation of statistical data and is credited with inventing a form of pie chart that we now call the polar area diagram. She is often quoted as saying: "To understand God's thoughts we must study statistics, for these are the measure of His purpose." Florence Nightingale is arguably the first data scientist in history.
Data analytics has come a long way since then and is now gaining popularity thanks to the emergence of four new technologies known by the acronym SMAC: social media, mobility, analytics and cloud computing. You might add another letter to the acronym for sensors and the internet of things (IoT). Each of these technologies is significant both in how it transforms business and in the amount of data it generates.
Portrait of Florence Nightingale, the First Data Scientist.
In 2001, META (now Gartner) reported a substantial increase in the size of data, the rate at which data is produced and the range of data formats. They termed this shift big data. Big data is characterized by three key attributes, known as the three Vs: volume, velocity and variety. Four more Vs are often added to the list: veracity, variability, value and visualization.
The world's data storage volume is increasing at a rapid pace and is estimated to double every year. The velocity at which this data is generated is rising, fueled by the advent of mobile devices and social networking. In medicine and healthcare, the cost and size of sensors have shrunk, making continuous patient monitoring and data acquisition from a multitude of human physiological systems an accepted practice. In the near future, the internet of things (IoT), in which smart devices interact with each other, will generate the vast majority of data, known as machine data.
Currently, 90% of big data is estimated to have accumulated in the last two years. Pundits estimate that by 2020 we will have 50 times the amount of data we had in 2011. It is expected that self-driving cars will generate 2 petabytes of data every year. Cisco predicts that by 2022 mobile data traffic will reach 1 zettabyte.
With the advent of smaller, inexpensive sensors and the growing volume of data collected from customers, smart devices and applications, we're challenged with making increasingly analytical decisions from large sets of data collected in the moment. This trend is only growing, giving rise to what's known in the industry as the big data problem: the rate of data accumulation is rising faster than our cognitive capacity to analyze increasingly large data sets to make decisions. The big data problem offers an opportunity for improved predictive analytics and prognostics.
The variety of data is also increasing. The adoption of digital transformation across industries and businesses is generating large and diverse data sets. Consider medical data, which was confined to paper for far too long. As governments such as that of the United States push medical institutions to transform their practice into electronic and digital formats, patient data can take diverse forms. It's now common to think of the electronic medical record (EMR) as including diverse forms of data such as audio recordings, MRI, ultrasound, computed tomography (CT) and other diagnostic images, videos captured during surgery or directly from patients, color images of burns and wounds, digital images of dental x-rays, waveforms of brain scans, electrocardiograms (EKG), genetic sequence information, and the list goes on.