How Data Science Is Transforming Health Care
Tim O'Reilly
Mike Loukides
Julie Steele
Colin Hill
Beijing · Cambridge · Farnham · Köln · Sebastopol · Tokyo
Chapter 1. Introduction
The best minds of my generation are thinking about how to make people click ads.
Jeff Hammerbacher, early Facebook employee
Work on stuff that matters.
Tim O'Reilly
In the early days of the 20th century, department store magnate John Wanamaker famously said, "I know that half of my advertising doesn't work. The problem is that I don't know which half."
The consumer Internet revolution was fueled by a search for the answer to Wanamaker's question. Google AdWords and the pay-per-click model began the transformation of a business in which advertisers paid for ad impressions into one in which they pay for results. Cost per thousand impressions (CPM) was outperformed by cost per click (CPC), and a new industry was born. It's important to understand why CPC outperformed CPM, though. Superficially, it's because Google was able to track when a user clicked on a link, and was therefore able to bill based on success. But billing based on success doesn't fundamentally change anything unless you can also change the success rate, and that's what Google was able to do. By using data to understand each user's behavior, Google was able to place advertisements that an individual was likely to click. They knew which half of their advertising was more likely to be effective, and didn't bother with the rest.
Since then, data and predictive analytics have driven ever deeper insight into user behavior, to the point that companies like Google, Facebook, Twitter, and LinkedIn are fundamentally data companies. And data isn't just transforming the consumer Internet. It is transforming finance, design, and manufacturing, and perhaps most importantly, health care.
How is data science transforming health care? There are many ways in which health care is changing, and needs to change. We're focusing on one particular issue: the problem Wanamaker described when talking about his advertising. How do you make sure you're spending money effectively? Is it possible to know what will work in advance?
Too often, when doctors order a treatment, whether it's surgery or an over-the-counter medication, they are applying a "standard of care" treatment or some variation that is based on their own intuition, effectively hoping for the best. The sad truth of medicine is that we don't always understand the relationship between treatments and outcomes. We have studies to show that various treatments will work more often than placebos; but, like Wanamaker, we know that much of our medicine doesn't work for half of our patients. We just don't know which half, at least not in advance. One of data science's many promises is that, if we can collect enough data about medical treatments and use that data effectively, we'll be able to predict more accurately which treatments will be effective for which patient, and which treatments won't.
A better understanding of the relationship between treatments, outcomes, and patients will have a huge impact on the practice of medicine in the United States. Health care is expensive. The U.S. spends over $2.6 trillion on health care every year, an amount that constitutes a serious fiscal burden for government, businesses, and our society as a whole. These costs include over $600 billion of unexplained variations in treatments: treatments that cause no differences in outcomes, or even make the patient's condition worse. We have reached a point at which our need to understand treatment effectiveness has become vital, both to the health care system and to the health and sustainability of the economy overall.
Why do we believe that data science has the potential to revolutionize health care? After all, the medical industry has had data for generations: clinical studies, insurance data, hospital records. But the health care industry is now awash in data in a way that it has never been before: from biological data such as gene expression, next-generation DNA sequence data, proteomics, and metabolomics, to clinical data and health outcomes data contained in ever more prevalent electronic health records (EHRs) and longitudinal drug and medical claims. We have entered a new era in which we can work on massive datasets effectively, combining data from clinical trials and direct observation by practicing physicians (the records generated by our $2.6 trillion of medical expense). When we combine data with the resources needed to work on the data, we can start asking the important questions, the Wanamaker questions, about what treatments work and for whom.
The opportunities are huge: for entrepreneurs and data scientists looking to put their skills to work disrupting a large market, for researchers trying to make sense out of the flood of data they are now generating, and for existing companies (including health insurance companies, biotech, pharmaceutical, and medical device companies, hospitals and other care providers) that are looking to remake their businesses for the coming world of outcome-based payment models.
Chapter 2. Making Health Care More Effective
What, specifically, does data allow us to do that we couldn't do before? For the past 60 or so years of medical history, we've treated patients as some sort of an average. A doctor would diagnose a condition and recommend a treatment based on what worked for most people, as reflected in large clinical studies. Over the years, we've become more sophisticated about what that "average patient" means, but that same statistical approach didn't allow for differences between patients. A treatment was deemed effective or ineffective, safe or unsafe, based on double-blind studies that rarely took into account the differences between patients. The exceptions to this are relatively recent and have been dominated by cancer treatments, the first being Herceptin for breast cancer in women who over-express the HER2 receptor. With the data that's now available, we can go much further for a broad range of diseases and interventions that are not just drugs but include surgery, disease management programs, medical devices, patient adherence, and care delivery.
For a long time, we thought that Tamoxifen was roughly 80% effective for breast cancer patients. But now we know much more: we know that it's 100% effective in 70% to 80% of the patients, and ineffective in the rest. That's not word games, because we can now use genetic markers to tell whether it's likely to be effective or ineffective for any given patient, and we can tell in advance whether to treat with Tamoxifen or to try something else.
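The difference between "80% effective" and "100% effective in 80% of patients" can be illustrated with a small Python sketch. The cohort below is synthetic, and the `marker_positive` field is a hypothetical stand-in for a genetic test result; the numbers are chosen only to mirror the example above and are not drawn from any study.

```python
from collections import defaultdict

def response_rates(patients):
    """Return (overall_rate, {marker_value: rate}) for a list of patient dicts."""
    totals = defaultdict(int)
    responders = defaultdict(int)
    for p in patients:
        totals[p["marker_positive"]] += 1
        responders[p["marker_positive"]] += int(p["responded"])
    overall = sum(responders.values()) / sum(totals.values())
    by_marker = {m: responders[m] / totals[m] for m in totals}
    return overall, by_marker

# Synthetic cohort (an assumption for illustration only): 80 marker-positive
# patients who all respond, and 20 marker-negative patients who do not.
cohort = (
    [{"marker_positive": True, "responded": True}] * 80
    + [{"marker_positive": False, "responded": False}] * 20
)

overall, by_marker = response_rates(cohort)
print(f"overall: {overall:.0%}")
for marker, rate in sorted(by_marker.items(), reverse=True):
    print(f"marker_positive={marker}: {rate:.0%}")
```

Looking only at the pooled figure hides the structure: the same data that averages to 80% splits cleanly into a group for whom the drug works and a group for whom it does not.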
Two factors lie behind this new approach to medicine: a different way of using data, and the availability of new kinds of data. It's not just stating that the drug is effective on most patients, based on trials (indeed, 80% is an enviable success rate); it's using artificial intelligence techniques to divide the patients into groups and then determine the difference between those groups. We're not asking whether the drug is effective; we're asking a fundamentally different question: for which patients is this drug effective? We're asking about the patients, not just the treatments. A drug that's only effective on 1% of patients might be very valuable if we can tell who that 1% is, though it would certainly be rejected by any traditional clinical trial.
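As a rough illustration of asking "for which patients?", the Python sketch below splits a synthetic cohort by each of several candidate markers and reports which split separates responders from non-responders most sharply. This is a deliberately simple stand-in, not the artificial intelligence techniques the text refers to; the marker names and the data are hypothetical, and a real analysis would need proper statistical testing and safeguards against chance findings.

```python
def response_rate(group):
    """Fraction of patients in the group who responded (0.0 for an empty group)."""
    return sum(p["responded"] for p in group) / len(group) if group else 0.0

def best_separating_marker(patients, marker_names):
    """Return the marker whose positive and negative groups differ most in response rate."""
    gaps = {}
    for name in marker_names:
        positive = [p for p in patients if p[name]]
        negative = [p for p in patients if not p[name]]
        gaps[name] = response_rate(positive) - response_rate(negative)
    best = max(gaps, key=lambda name: abs(gaps[name]))
    return best, gaps

# Synthetic cohort (an assumption for illustration only): 1,000 patients, of whom
# the 1% carrying the hypothetical "marker_a" respond. The other two markers are
# unrelated to response by construction.
marker_names = ["marker_a", "marker_b", "marker_c"]
patients = []
for i in range(1000):
    patient = {
        "marker_a": i % 100 == 0,
        "marker_b": i % 2 == 0,
        "marker_c": i % 3 == 0,
    }
    patient["responded"] = patient["marker_a"]
    patients.append(patient)

overall = response_rate(patients)
best, gaps = best_separating_marker(patients, marker_names)
print(f"overall response rate: {overall:.1%}")
print(f"most informative marker: {best}")
for name in marker_names:
    print(f"{name}: gap in response rate = {gaps[name]:.3f}")
```

Judged on the pooled response rate alone, this hypothetical drug looks like a failure; once the cohort is split on the right marker, the small group it helps becomes visible.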