Geraldine Van der Auwera - Genomics in the Cloud: Using Docker, GATK, and WDL in Terra

Year: 2020. Publisher: O'Reilly Media.


Data in the genomics field is booming. In just a few years, organizations such as the National Institutes of Health (NIH) will host 50+ petabytes, or over 50 million gigabytes, of genomic data, and they're turning to cloud infrastructure to make that data available to the research community. How do you adapt analysis tools and protocols to access and analyze that volume of data in the cloud?

With this practical book, researchers will learn how to work with genomics algorithms using open source tools including the Genome Analysis Toolkit (GATK), Docker, WDL, and Terra. Geraldine Van der Auwera, longtime custodian of the GATK user community, and Brian O'Connor of the UC Santa Cruz Genomics Institute guide you through the process. You'll learn by working with real data and genomics algorithms from the field.

This book covers:

  • Essential genomics and computing technology background
  • Basic cloud computing operations
  • Getting started with GATK, plus three major GATK Best Practices pipelines
  • Automating analysis with scripted workflows using WDL and Cromwell
  • Scaling up workflow execution in the cloud, including parallelization and cost optimization
  • Interactive analysis in the cloud using Jupyter notebooks
  • Secure collaboration and computational reproducibility using Terra

Praise for Genomics in the Cloud

This book captures the essence of what's been learned about bringing genomics to the cloud. And it lays out an accessible path for newcomers to join this exciting and important ecosystem.

Eric S. Lander, Founding Director, The Broad Institute of MIT and Harvard

This book is a fantastic introduction to modern genome analysis using state-of-the-art tools and practices. It covers everything a reader needs to get their own analyses running in an open, repeatable way. This is the quintessential primer on the GATK and cloud-based analysis with Terra.

Jonathan Smith, Principal Software Engineer, The Broad Institute of MIT and Harvard

This is a great primer about reproducible bioinformatics in the cloud. Geraldine and Brian are at the forefront of this field, so we are learning from the best. And for those who have yet to work with Terra, look no further for an excellent introduction to it!

Jessica Maia, Data Scientist, BD

Transferring from physics to cancer research as I did, I learned genomics, sequencing, statistics piecemeal. I could have used a book like this back then, because no matter how much time you've spent in the field or if it's your first contact, there's something new to learn and an appreciation for the bigger picture to be gained.

Aaron Chevalier, PhD Candidate, Boston University

Genomics in the Cloud covers everything from the science of genomic analysis to the computing technologies used to process this data at massive scale; presented in a way that lets you jump right in and run the same tools in the cloud that are used by biologists, researchers, and clinicians worldwide.

Andrew Moschetti, Senior Solutions Architect, Google Cloud Life Sciences

As the volume of genomic data increases, implementing analysis using best practice cloud patterns becomes more and more important. In this book, you'll learn these patterns via practical examples that you can try out using your own data and research questions.

Lynn Langit, Cloud Architect, Google Developer Expert and AWS Community Hero

Genomics in the Cloud is an excellent introduction both to genomics and cloud-based research, perfect for those who wish to capitalize on the cloud environment to move their research forward and for those who wish to better understand this space.

David E. Mohs, Software Engineer, The Broad Institute of MIT and Harvard

Genomics in the Cloud

by Geraldine A. Van der Auwera and Brian D. O'Connor

Copyright © 2020 The Broad Institute, Inc. and Brian O'Connor. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

  • Acquisitions Editor: Rachel Novak
  • Development Editor: Michele Cronin
  • Production Editor: Katherine Tozer
  • Copyeditor: Octal Publishing, LLC
  • Proofreader: Sharon Wilkey
  • Indexer: Ellen Troutman-Zaig
  • Interior Designer: David Futato
  • Cover Designer: Karen Montgomery
  • Illustrator: Rebecca Demarest
  • April 2020: First Edition
Revision History for the First Edition
  • 2020-04-02: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491975190 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Genomics in the Cloud, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-97519-0

[LSI]

Foreword

I migrated from mathematics into the field of genomics in 1985, roughly a year before the field officially came into existence. The word "genomics" was coined in 1986, which also saw the first public debate, at the Cold Spring Harbor Laboratory, about the notion of mounting a Human Genome Project.

It's hard to imagine how much has changed since then. Computers hardly figured in biomedicine: the initial design for the Whitehead Institute for Biomedical Research, founded in the early 1980s, included no provision for a computer. Large amounts of data were seen as a nuisance, not an asset. In a Nature article reporting on the Human Genome Project debate, the journal's biology editor wrote, "If the skill and ingenuity of modern biology are already stretched to interpret sequences of known importance, such as those of the DMD and CGD genes, what possible use could be made of more sequences?"

Despite such doubts, biologists eventually decided to press on, launching the Human Genome Project, their first major data gathering effort, in 1990. One of the important motivations was the prospect of deploying systematic methods, rather than guesswork, to discover the genes responsible for human diseases. In 1980, a brilliant biologist, David Botstein, had conceived how to find the location of genes for rare monogenic diseases by tracing their inheritance in families relative to a genetic map of DNA variants across the human genome. Realizing the full power of the idea, though, would require mapping, and eventually sequencing, the entire human genome.

The Human Genome Project was an extraordinary collaboration that spanned six countries and twenty institutions, took thirteen years, and cost $3 billion. When the dust settled, the world had the three billion nucleotide-long DNA sequence of a single human genome.

With this project completed, many biologists thought that business would return to usual. But what happened next was even more remarkable. Over the next 15 years, biology became an information science, in which the generation of massive amounts of data reshaped the field. For example:

  • Genetic mapping in families revealed the genes responsible for more than 5,000 serious rare monogenic disorders.

  • New kinds of genetic mapping in populations led to the discovery of ~100,000 robust associations of specific genetic regions with common diseases and traits.

  • Genetic analysis of thousands of tumors uncovered hundreds of new genes in which mutations propelled cancer.

Remarkably, the cost of sequencing a human genome fell by a factor of five million, from $3 billion to $600, and the cost is likely to reach $100 in the coming years. More than one million genomes have been sequenced so far. Overall, genomic data of all kinds is doubling roughly every eight months.

None of this would have been possible without the development of powerful new computational methods and tools to work with the many new types of data that were being generated. A good example is the Genome Analysis Toolkit, developed by colleagues at the Broad Institute, which you'll read a lot more about in this book.
