
Jan Kunigk - Architecting Modern Data Platforms

Architecting Modern Data Platforms

by Jan Kunigk, Ian Buss, Paul Wilkinson, and Lars George

Copyright 2019 Jan Kunigk, Lars George, Ian Buss, and Paul Wilkinson. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

  • Editors: Nicole Tache and Michele Cronin
  • Production Editors: Nicholas Adams and Kristen Brown
  • Copyeditor: Shannon Wright
  • Proofreader: Rachel Head
  • Indexer: Ellen Troutman-Zaig
  • Interior Designer: David Futato
  • Cover Designer: Karen Montgomery
  • Illustrator: Rebecca Demarest
  • December 2018: First Edition
Revision History for the First Edition
  • 2018-12-05: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491969274 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Architecting Modern Data Platforms, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-96927-4

[LSI]

Foreword

Many of the ideas that underpin the Apache Hadoop project are decades old. Academia and industry have been exploring distributed storage and computation since the 1960s. The entire tech industry grew out of government and business demand for data processing, and at every step along that path, the data seemed big to the people in the moment. Even some of the most advanced and interesting applications go way back: machine learning, a capability that's new to many enterprises, traces its origins to academic research in the 1950s and to practical systems work in the 1960s and 1970s.

But real, practical, useful, massively scalable, and reliable systems simply could not be found, at least not cheaply, until Google confronted the problem of the internet in the late 1990s and early 2000s. Collecting, indexing, and analyzing the entire web was impossible using commercially available technology of the time.

Google dusted off the decades of research in large-scale systems. Its architects realized that, for the first time ever, the computers and networking they required could be had, at reasonable cost.

Its work, on the Google File System (GFS) for storage and on the MapReduce framework for computation, created the big data industry.

This work led to the creation of the open source Hadoop project in 2005 by Mike Cafarella and Doug Cutting. The fact that the software was easy to get, and could be improved and extended by a global developer community, made it attractive to a wide audience. At first, other consumer internet companies used the software to follow Google's lead. Quickly, though, traditional enterprises noticed that something was happening and looked for ways to get involved.

In the decade-plus since the Hadoop project began, the ecosystem has exploded. Once, the only storage system was the Hadoop Distributed File System (HDFS), based on GFS. Today, HDFS is thriving, but there are plenty of other choices: Amazon S3 or Microsoft Azure Data Lake Store (ADLS) for cloud storage, for example, or Apache Kudu for IoT and analytic data. Similarly, MapReduce was originally the only option for analyzing data. Now, users can choose among MapReduce, Apache Spark for stream processing and machine learning workloads, SQL engines like Apache Impala and Apache Hive, and more.

All of these new projects have adopted the fundamental architecture of Hadoop: large-scale, distributed, shared-nothing systems, connected by a good network, working together to solve the same problem. Hadoop is the open source progenitor, but the big data ecosystem built on it is vastly more powerful, and more useful, than the original Hadoop project.

That explosion of innovation means big data is more valuable than ever before. Enterprises are eager to adopt the technology. They want to predict customer behavior, foresee failure of machines on their factory floors or trucks in their fleets, spot fraud in their transaction flows, and deliver targeted care, and better outcomes, to patients in hospitals.

But that innovation, so valuable, also confounds them. How can they keep up with the pace of improvement, and the flurry of new projects, in the open source ecosystem? How can they deploy and operate these systems in their own datacenters, meeting the reliability and stability expectations of users and the requirements of the business? How can they secure their data and enforce the policies that protect private information from cyberattacks?

Mastering the platform in an enterprise context raises new challenges that run deep in the data. We have been able to store and search a month's worth of data, or a quarter's, for a very long time. Now, we can store and search a decade's worth, or a century's. That large quantitative difference turns into a qualitative difference: what new applications can we build when we can think about a century?

The book before you is your guide to answering those questions as you build your enterprise big data platform.

Jan, Ian, Lars, and Paul, this book's authors, are hands-on practitioners in the field, with many years of experience helping enterprises get real value from big data. They are not only users of Hadoop, Impala, Hive, and Spark, but they are also active participants in the open source community, helping to shape those projects, and their capabilities, for enterprise adoption. They are experts in the analytic, data processing, and machine learning capabilities that the ecosystem offers.

When technology moves quickly, it's important to focus on techniques and ideas that stand the test of time. The advice here works for the software (Hadoop and its many associated services) that exists today. The thinking and design, though, are tied not to specific projects but to the fundamental architecture that made Hadoop successful: large-scale, distributed, shared-nothing software requires a new approach to operations, to security, and to governance.

You will learn those techniques and those ideas here.

Mike Olson

Founder and Chief Strategy Officer at Cloudera

Preface

If you're reading this book, it will come as no surprise that we are in the middle of a revolution in the way data is stored and processed in the enterprise. As anyone who has been in IT for any length of time knows, the technologies and approaches behind data processing and storage are always evolving. However, in the past 10 to 15 years, the pace of change has been remarkable. We have moved from a world where almost all enterprise data was processed and analyzed using variants of SQL and was contained in some form of relational database to one in which an enterprise's data may be found in a variety of so-called NoSQL storage engines. Each of these engines sacrifices some constraint of the relational model to achieve superior performance and scalability for a certain use case. The modern data landscape includes nonrelational key-value stores, distributed filesystems, distributed columnar databases, log stores, and document stores, in addition to traditional relational databases. The data in these systems is exploited in a multitude of ways and is processed using distributed batch-processing algorithms, stream processing, massively parallel processing query engines, free-text searches, and machine learning pipelines.
