Valliappa Lakshmanan - Data Science on the Google Cloud Platform
Here you can read online Valliappa Lakshmanan - Data Science on the Google Cloud Platform full text of the book (entire story) in english for free. Download pdf and epub, get meaning, cover and reviews about this ebook. year: 2018, publisher: O’Reilly Media, genre: Computer. Description of the work, (preface) as well as reviews are available. Best literature library LitArk.com created for fans of good reading and offers a wide selection of genres:
Romance novel
Science fiction
Adventure
Detective
Science
History
Home and family
Prose
Art
Politics
Computer
Non-fiction
Religion
Business
Children
Humor
Choose a favorite category and find really read worthwhile books. Enjoy immersion in the world of imagination, feel the emotions of the characters or learn something new for yourself, make an fascinating discovery.
- Book:Data Science on the Google Cloud Platform
- Author:
- Publisher:O’Reilly Media
- Genre:
- Year:2018
- Rating:5 / 5
- Favourites:Add to favourites
- Your mark:
- 100
- 1
- 2
- 3
- 4
- 5
Data Science on the Google Cloud Platform: summary, description and annotation
We offer to read an annotation, description, summary or preface (depends on what the author of the book "Data Science on the Google Cloud Platform" wrote himself). If you haven't found the necessary information about the book — write in the comments, we will try to find it.
Data Science on the Google Cloud Platform — read online for free the complete book (whole text) full work
Below is the text of the book, divided by pages. System saving the place of the last page read, allows you to conveniently read the book "Data Science on the Google Cloud Platform" online for free, without having to search again every time where you left off. Put a bookmark, and you can go to the page where you finished reading at any time.
Font size:
Interval:
Bookmark:
by Valliappa Lakshmanan
Copyright 2018 Google Inc. All rights reserved.
Printed in the United States of America.
Published by OReilly Media, Inc. , 1005 Gravenstein Highway North, Sebastopol, CA 95472.
OReilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com .
- Editor: Tim McGovern
- Production Editor: Kristen Brown
- Copyeditor: Octal Publishing, Inc.
- Proofreader: Rachel Monaghan
- Indexer: Judith McConville
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Rebecca Demarest
- January 2018: First Edition
- 2017-12-12: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781491974568 for release details.
The OReilly logo is a registered trademark of OReilly Media, Inc. Data Science on the Google Cloud Platform, the cover image, and related trade dress are trademarks of OReilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-491-97456-8
[LSI]
In my current role at Google, I get to work alongside data scientists and data engineers in a variety of industries as they move their data processing and analysis methods to the public cloud. Some try to do the same things they do on-premises, the same way they do them, just on rented computing resources. The visionary users, though, rethink their systems, transform how they work with data, and thereby are able to innovate faster.
As early as 2011, an article in Harvard Business Review recognized that some of cloud computings greatest successes come from allowing groups and communities to work together in ways that were not previously possible. This is now much more widely recognizedan MIT survey in 2017 found that more respondents (45%) cited increased agility rather than cost savings (34%) as the reason to move to the public cloud.
In this book, we walk through an example of this new transformative, more collaborative way of doing data science. You will learn how to implement an end-to-end data pipelinewe will begin with ingesting the data in a serverless way and work our way through data exploration, dashboards, relational databases, and streaming data all the way to training and making operational a machine learning model. I cover all these aspects of data-based services because data engineers will be involved in designing the services, developing the statistical and machine learning models and implementing them in large-scale production and in real time.
If you use computers to work with data, this book is for you. You might go by the title of data analyst, database administrator, data engineer, data scientist, or systems programmer today. Although your role might be narrower today (perhaps you do only data analysis, or only model building, or only DevOps), you want to stretch your wings a bityou want to learn how to create data science models as well as how to implement them at scale in production systems.
Google Cloud Platform is designed to make you forget about infrastructure. The marquee data servicesGoogle BigQuery, Cloud Dataflow, Cloud Pub/Sub, and Cloud ML Engineare all serverless and autoscaling. When you submit a query to BigQuery, it is run on thousands of nodes, and you get your result back; you dont spin up a cluster or install any software. Similarly, in Cloud Dataflow, when you submit a data pipeline, and in Cloud Machine Learning Engine, when you submit a machine learning job, you can process data at scale and train models at scale without worrying about cluster management or failure recovery. Cloud Pub/Sub is a global messaging service that autoscales to the throughput and number of subscribers and publishers without any work on your part. Even when youre running open source software like Apache Spark thats designed to operate on a cluster, Google Cloud Platform makes it easy. Leave your data on Google Cloud Storage, not in HDFS, and spin up a job-specific cluster to run the Spark job. After the job completes, you can safely delete the cluster. Because of this job-specific infrastructure, theres no need to fear overprovisioning hardware or running out of capacity to run a job when you need it. Plus, data is encrypted, both at rest and in transit, and kept secure. As a data scientist, not having to manage infrastructure is incredibly liberating.
The reason that you can afford to forget about virtual machines and clusters when running on Google Cloud Platform comes down to networking. The network bisection bandwidth within a Google Cloud Platform datacenter is 1 PBps, and so sustained reads off Cloud Storage are extremely fast. What this means is that you dont need to shard your data as you would with traditional MapReduce jobs. Instead, Google Cloud Platform can autoscale your compute jobs by shuffling the data onto new compute nodes as needed. Hence, youre liberated from cluster management when doing data science on Google Cloud Platform.
These autoscaled, fully managed services make it easier to implement data science models at scalewhich is why data scientists no longer need to hand off their models to data engineers. Instead, they can write a data science workload, submit it to the cloud, and have that workload executed automatically in an autoscaled manner. At the same time, data science packages are becoming simpler and simpler. So, it has become extremely easy for an engineer to slurp in data and use a canned model to get an initial (and often very good) model up and running. With well-designed packages and easy-to-consume APIs, you dont need to know the esoteric details of data science algorithmsonly what each algorithm does, and how to link algorithms together to solve realistic problems. This convergence between data science and data engineering is why you can stretch your wings beyond your current role.
Rather than simply read this book cover-to-cover, I strongly encourage you to follow along with me by also trying out the code. The file in each folder of the GitHub repository .
The following typographical conventions are used in this book:
ItalicIndicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
This element signifies a tip or suggestion.
Font size:
Interval:
Bookmark:
Similar books «Data Science on the Google Cloud Platform»
Look at similar books to Data Science on the Google Cloud Platform. We have selected literature similar in name and meaning in the hope of providing readers with more options to find new, interesting, not yet read works.
Discussion, reviews of the book Data Science on the Google Cloud Platform and just readers' own opinions. Leave your comments, write what you think about the work, its meaning or the main characters. Specify what exactly you liked and what you didn't like, and why you think so.