Nathan Marz - Big Data: Principles and best practices of scalable realtime data systems

Year: 2015. Publisher: Manning Publications. Genre: Computer.

Summary

Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Book

Web-scale applications like social networks, real-time analytics, or e-commerce sites deal with a lot of data, whose volume and velocity exceed the limits of traditional database systems. These applications require architectures built around clusters of machines to store and process data of any size or speed. Fortunately, scale and simplicity are not mutually exclusive.

Big Data teaches you to build big data systems using an architecture designed specifically to capture and analyze web-scale data. This book presents the Lambda Architecture, a scalable, easy-to-understand approach that can be built and run by a small team. You'll explore the theory of big data systems and how to implement them in practice. In addition to discovering a general framework for processing big data, you'll learn specific technologies like Hadoop, Storm, and NoSQL databases.

This book requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful.

What's Inside

  • Introduction to big data systems
  • Real-time processing of web-scale data
  • Tools like Hadoop, Cassandra, and Storm
  • Extensions to traditional database skills

About the Authors

Nathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture for big data systems. James Warren is an analytics architect with a background in machine learning and scientific computing.

Table of Contents

  1. A new paradigm for Big Data

PART 1 BATCH LAYER

  2. Data model for Big Data
  3. Data model for Big Data: Illustration
  4. Data storage on the batch layer
  5. Data storage on the batch layer: Illustration
  6. Batch layer
  7. Batch layer: Illustration
  8. An example batch layer: Architecture and algorithms
  9. An example batch layer: Implementation

PART 2 SERVING LAYER

  10. Serving layer
  11. Serving layer: Illustration

PART 3 SPEED LAYER

  12. Realtime views
  13. Realtime views: Illustration
  14. Queuing and stream processing
  15. Queuing and stream processing: Illustration
  16. Micro-batch stream processing
  17. Micro-batch stream processing: Illustration
  18. Lambda Architecture in depth

Big Data: Principles and best practices of scalable realtime data systems
Nathan Marz with James Warren

Copyright

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

© 2015 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editors: Renae Gregoire, Jennifer Stout
Technical development editor: Jerry Gaines
Copyeditor: Andy Carroll
Proofreader: Katie Tennant
Technical proofreader: Jerry Kuch
Typesetter: Gordan Salinovic
Cover designer: Marija Tudor

ISBN 9781617290343

Printed in the United States of America

1 2 3 4 5 6 7 8 9 10 EBM 20 19 18 17 16 15

Preface

When I first entered the world of Big Data, it felt like the Wild West of software development. Many were abandoning the relational database and its familiar comforts for NoSQL databases with highly restricted data models designed to scale to thousands of machines. The number of NoSQL databases, many of them with only minor differences between them, became overwhelming. A new project called Hadoop began to make waves, promising the ability to do deep analyses on huge amounts of data. Making sense of how to use these new tools was bewildering.

At the time, I was trying to handle the scaling problems we were faced with at the company at which I worked. The architecture was intimidatingly complex: a web of sharded relational databases, queues, workers, masters, and slaves. Corruption had worked its way into the databases, and special code existed in the application to handle the corruption. Slaves were always behind. I decided to explore alternative Big Data technologies to see if there was a better design for our data architecture.

One experience from my early software-engineering career deeply shaped my view of how systems should be architected. A coworker of mine had spent a few weeks collecting data from the internet onto a shared filesystem. He was waiting to collect enough data so that he could perform an analysis on it. One day while doing some routine maintenance, I accidentally deleted all of my coworker's data, setting him behind weeks on his project.

I knew I had made a big mistake, but as a new software engineer I didn't know what the consequences would be. Was I going to get fired for being so careless? I sent out an email to the team apologizing profusely, and to my great surprise, everyone was very sympathetic. I'll never forget when a coworker came to my desk, patted my back, and said, "Congratulations. You're now a professional software engineer."

In his joking statement lay a deep unspoken truism in software development: we don't know how to make perfect software. Bugs can and do get deployed to production. If the application can write to the database, a bug can write to the database as well. When I set about redesigning our data architecture, this experience profoundly affected me. I knew our new architecture not only had to be scalable, tolerant to machine failure, and easy to reason about, but tolerant of human mistakes as well.

My experience re-architecting that system led me down a path that caused me to question everything I thought was true about databases and data management. I came up with an architecture based on immutable data and batch computation, and I was astonished by how much simpler the new system was compared to one based solely on incremental computation. Everything became easier, including operations, evolving the system to support new features, recovering from human mistakes, and doing performance optimization. The approach was so generic that it seemed like it could be used for any data system.

Something confused me though. When I looked at the rest of the industry, I saw that hardly anyone was using similar techniques. Instead, daunting amounts of complexity were embraced in the use of architectures based on huge clusters of incrementally updated databases. So many of the complexities in those architectures were either completely avoided or greatly softened by the approach I had developed.

Over the next few years, I expanded on the approach and formalized it into what I dubbed the Lambda Architecture. When working on a startup called BackType, our team of five built a social media analytics product that provided a diverse set of realtime analytics on over 100 TB of data. Our small team also managed deployment, operations, and monitoring of the system on a cluster of hundreds of machines. When we showed people our product, they were astonished that we were a team of only five people. They would often ask, "How can so few people do so much?" My answer was simple: "It's not what we're doing, but what we're not doing." By using the Lambda Architecture, we avoided the complexities that plague traditional architectures. By avoiding those complexities, we became dramatically more productive.

The Big Data movement has only magnified the complexities that have existed in data architectures for decades. Any architecture based primarily on large databases that are updated incrementally will suffer from these complexities, causing bugs, burdensome operations, and hampered productivity. Although SQL and NoSQL databases are often painted as opposites or as duals of each other, at a fundamental level they are really the same. They encourage this same architecture with its inevitable complexities. Complexity is a vicious beast, and it will bite you regardless of whether you acknowledge it or not.

This book is the result of my desire to spread the knowledge of the Lambda Architecture and how it avoids the complexities of traditional architectures. It is the book I wish I had when I started working with Big Data. I hope you treat this book as a journey: a journey to challenge what you thought you knew about data systems, and to discover that working with Big Data can be elegant, simple, and fun.

NATHAN MARZ

Acknowledgments

This book would not have been possible without the help and support of so many individuals around the world. I must start with my parents, who instilled in me from a young age a love of learning and exploring the world around me. They always encouraged me in all my career pursuits.
