Narkhede Neha - Kafka: Real-Time Data and Stream Processing at Scale

Table of Contents; Foreword; Preface; Who Should Read This Book; Conventions Used in This Book; Using Code Examples; O'Reilly Safari; How to Contact Us; Acknowledgments; Chapter 1. Meet Kafka; Publish/Subscribe Messaging; How It Starts; Individual Queue Systems; Enter Kafka; Messages and Batches; Schemas; Topics and Partitions; Producers and Consumers; Brokers and Clusters; Multiple Clusters; Why Kafka?; Multiple Producers; Multiple Consumers; Disk-Based Retention; Scalable; High Performance; The Data Ecosystem; Use Cases; Kafka's Origin; LinkedIn's Problem; The Birth of Kafka; Open Source.

Kafka: The Definitive Guide

by Neha Narkhede, Gwen Shapira, and Todd Palino

Copyright 2017 Neha Narkhede, Gwen Shapira, Todd Palino. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

  • Editor: Shannon Cutt
  • Production Editor: Shiny Kalapurakkel
  • Copyeditor: Christina Edwards
  • Proofreader: Amanda Kersey
  • Indexer: WordCo Indexing Services, Inc.
  • Interior Designer: David Futato
  • Cover Designer: Karen Montgomery
  • Illustrator: Rebecca Demarest
  • July 2017: First Edition
Revision History for the First Edition
  • 2017-07-07: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491936160 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Kafka: The Definitive Guide, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-93616-0

[M]

Foreword

It's an exciting time for Apache Kafka. Kafka is being used by tens of thousands of organizations, including over a third of the Fortune 500 companies. It's among the fastest growing open source projects and has spawned an immense ecosystem around it. It's at the heart of a movement towards managing and processing streams of data.

So where did Kafka come from? Why did we build it? And what exactly is it?

Kafka got its start as an internal infrastructure system we built at LinkedIn. Our observation was really simple: there were lots of databases and other systems built to store data, but what was missing in our architecture was something that would help us handle the continuous flow of data. Prior to building Kafka, we experimented with all kinds of off-the-shelf options, from messaging systems to log aggregation and ETL tools, but none of them gave us what we wanted.

We eventually decided to build something from scratch. Our idea was that instead of focusing on holding piles of data like our relational databases, key-value stores, search indexes, or caches, we would focus on treating data as a continually evolving and ever-growing stream, and build a data system (and indeed a data architecture) oriented around that idea.

This idea turned out to be even more broadly applicable than we expected. Though Kafka got its start powering real-time applications and data flow behind the scenes of a social network, you can now see it at the heart of next-generation architectures in every industry imaginable. Big retailers are re-working their fundamental business processes around continuous data streams; car companies are collecting and processing real-time data streams from internet-connected cars; and banks are rethinking their fundamental processes and systems around Kafka as well.

So what is this Kafka thing all about? How does it compare to the systems you already know and use?

We've come to think of Kafka as a streaming platform: a system that lets you publish and subscribe to streams of data, store them, and process them, and that is exactly what Apache Kafka is built to be. This way of thinking about data might be a little different from what you're used to, but it turns out to be an incredibly powerful abstraction for building applications and architectures. Kafka is often compared to a couple of existing technology categories: enterprise messaging systems, big data systems like Hadoop, and data integration or ETL tools. Each of these comparisons has some validity but also falls a little short.

Kafka is like a messaging system in that it lets you publish and subscribe to streams of messages. In this way, it is similar to products like ActiveMQ, RabbitMQ, IBM's MQSeries, and others. But even with these similarities, Kafka has a number of core differences from traditional messaging systems that make it another kind of animal entirely. Here are the big three differences. First, it works as a modern distributed system that runs as a cluster and can scale to handle all the applications in even the most massive of companies. Rather than running dozens of individual messaging brokers, hand-wired to different apps, this lets you have a central platform that can scale elastically to handle all the streams of data in a company. Second, Kafka is a true storage system built to store data for as long as you might like. This has huge advantages in using it as a connecting layer, as it provides real delivery guarantees: its data is replicated, persistent, and can be kept around as long as you like. Finally, the world of stream processing raises the level of abstraction quite significantly. Messaging systems mostly just hand out messages. The stream processing capabilities in Kafka let you compute derived streams and datasets dynamically off of your streams with far less code. These differences make Kafka enough of its own thing that it doesn't really make sense to think of it as yet another queue.
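To make the second and third differences concrete, here is a minimal sketch of the idea in Python. It is a toy in-memory model, not Kafka's API: the names `ToyLog`, `ToyReader`, and `running_total` are hypothetical and exist only for this illustration. It assumes a single process, with no replication or persistence. What it shows is that reading from a log does not remove data, that each reader keeps its own position, and that a derived stream can be computed from the stored one.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Hypothetical append-only log: reading never removes a record."""
    records: list = field(default_factory=list)

    def publish(self, message):
        # Append the message and return its position in the log.
        self.records.append(message)
        return len(self.records) - 1

    def read_from(self, position):
        # Return a copy of every record at or after the given position.
        return self.records[position:]


@dataclass
class ToyReader:
    """Hypothetical reader that tracks its own position in the log."""
    log: ToyLog
    position: int = 0

    def poll(self):
        # Fetch records this reader has not seen yet, then advance.
        batch = self.log.read_from(self.position)
        self.position += len(batch)
        return batch


def running_total(amounts):
    """A derived stream: the cumulative sum of the input stream."""
    total = 0
    out = []
    for amount in amounts:
        total += amount
        out.append(total)
    return out


log = ToyLog()
for amount in (5, 7, 3):
    log.publish(amount)

# Two independent readers of the same stored stream.
billing = ToyReader(log)
audit = ToyReader(log)

billing_seen = billing.poll()
audit_seen = audit.poll()          # unaffected by billing's reads

log.publish(10)
billing_next = billing.poll()      # only the record published since the last poll

replayed = log.read_from(0)        # the full history is still available
totals = running_total(replayed)   # derived stream computed from the stored one
```

In a traditional queue, the first reader to take a message typically removes it for everyone else. In this sketch both readers see every record, and the history can be replayed from position 0 at any time.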

Another view on Kafka, and one of our motivating lenses in designing and building it, was to think of it as a kind of real-time version of Hadoop. Hadoop lets you store and periodically process file data at a very large scale. Kafka lets you store and continuously process streams of data, also at a large scale. At a technical level, there are definitely similarities, and many people see the emerging area of stream processing as a superset of the kind of batch processing people have done with Hadoop and its various processing layers. What this comparison misses is that the use cases that continuous, low-latency processing opens up are quite different from those that naturally fall on a batch processing system. Whereas Hadoop and big data targeted analytics applications, often in the data warehousing space, the low-latency nature of Kafka makes it applicable for the kind of core applications that directly power a business. This makes sense: events in a business are happening all the time, and the ability to react to them as they occur makes it much easier to build services that directly power the operation of the business, feed back into customer experiences, and so on.

The final area Kafka gets compared to is ETL or data integration tools. After all, these tools move data around, and Kafka moves data around. There is some validity to this as well, but I think the core difference is that Kafka has inverted the problem. Rather than a tool for scraping data out of one system and inserting it into another, Kafka is a platform oriented around real-time streams of events. This means that not only can it connect off-the-shelf applications and data systems, it can power custom applications built to trigger off of these same data streams. We think this architecture centered around streams of events is a really important thing. In some ways these flows of data are the most central aspect of a modern digital company, as important as the cash flows you'd see in a financial statement.
