Using Flume: Flexible, Scalable, and Reliable Data Streaming

  • Book: Using Flume: Flexible, Scalable, and Reliable Data Streaming
  • Author: Hari Shreedharan
  • Publisher: O'Reilly Media
  • Genre: Computer
  • Year: 2014


How can you get your data from frontend servers to Hadoop in near real time? With this complete reference guide, you'll learn Flume's rich set of features for collecting, aggregating, and writing large amounts of streaming data to the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, Elasticsearch, and other systems.

Using Flume shows operations engineers how to configure, deploy, and monitor a Flume cluster, and teaches developers how to write Flume plugins and custom components for their specific use cases. You'll learn about Flume's design and implementation, as well as various features that make it highly scalable, flexible, and reliable. Code examples and exercises are available on GitHub.

  • Learn how Flume provides a steady rate of flow by acting as a buffer between data producers and consumers
  • Dive into key Flume components, including sources that accept data and sinks that write and deliver it
  • Write custom plugins to customize the way Flume receives, modifies, formats, and writes data
  • Explore APIs for sending data to Flume agents from your own applications (see the sketch after this list)
  • Plan and deploy Flume in a scalable and flexible way, and monitor your cluster once it's running
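
As a taste of that client API, here is a minimal sketch of sending a single event to a Flume agent using the Flume 1.x RPC client SDK. The class name, host, and port are placeholders, not from the book; the sketch assumes an agent with an Avro source listening on localhost:41414 and the flume-ng-sdk jar on the classpath.

    // Minimal sketch: send one event to a Flume agent's Avro source.
    // Host and port below are placeholders for your own agent.
    import java.nio.charset.StandardCharsets;

    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class FlumeSender {
      public static void main(String[] args) throws EventDeliveryException {
        RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
        try {
          // Build an event from a UTF-8 string body and ship it to the agent.
          Event event = EventBuilder.withBody("hello flume", StandardCharsets.UTF_8);
          client.append(event);
        } finally {
          client.close();
        }
      }
    }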

Using Flume

by Hari Shreedharan

Copyright © 2015 Hari Shreedharan. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (https://www.safaribooksonline.com/). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

  • Editor: Ann Spencer
  • Production Editor: Kara Ebrahim
  • Copyeditor: Charles Roumeliotis
  • Proofreader: Rachel Head
  • Indexer: Meghan Jones
  • Interior Designer: David Futato
  • Cover Designer: Ellie Volckhausen
  • Illustrator: Rebecca Demarest
  • October 2014: First Edition
Revision History for the First Edition
  • 2014-09-15: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781449368302 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Using Flume, the cover image of a burbot, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-449-36830-2

[LSI]

Foreword

The past few years have seen tremendous growth in the development and adoption of Big Data technologies. Hadoop and related platforms are powering the next wave of data analytics over increasingly large amounts of data. The data produced today will be dwarfed by what is expected tomorrow, growing at an ever-increasing rate as the digital revolution engulfs all aspects of our existence. The barrier to entry in this new age of massive data volumes is of course the obvious one: how do you get all this data into your cluster to begin with? Clearly, this data is produced across a wide spectrum of sources spread across the enterprise, and comprises an interesting mix of interaction, machine, sensor, and social data, among others. Any operator who has dealt with similar challenges would no doubt agree that it is nontrivial, if not downright hard, to build a system that can route this data into your clusters in a cost-effective manner.

Apache Flume is built exactly to handle this challenge.

Back in 2011, when Flume went into incubation at The Apache Software Foundation, it was a project built by Cloudera engineers to address large-scale log data aggregation on Hadoop. A popular project from the beginning, it saw a large number of new requirements, ranging from event ordering to guaranteed-delivery semantics, come up over its initial releases. Given its popularity and the demand for complex new capabilities, we decided to refactor the project entirely to make it simpler, more powerful in its applicability and manageability, and easy to extend where necessary. Hari and I were on the Incubator project along with a handful of other engineers who were working around the clock with the Flume community to drive this vision and implementation forward. From that time until now, Flume has graduated into its own top-level Apache project, made several stable releases, and grown significantly richer in functionality.

Today, Flume is actively deployed and in use across the world in large numbers of data centers, sometimes spanning continental boundaries. It continues to effectively provide a super-resilient, fault-tolerant, reliable, fast, and efficient mechanism to move massive amounts of data from a variety of sources over to destination systems such as HBase, HDFS, etc. A well-planned Flume topology operates with minimal or no intervention, practically running itself indefinitely. It provides contextual routing and is able to work through downtimes, network outages, and other unpredictable/unplanned interruptions by providing the capacity to reliably store and retransmit messages when connectivity is restored. It does all of this out of the box, and yet provides the flexibility to customize any component within its implementation using fairly stable and intuitive interfaces that are widely in use.
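
To make the contextual-routing point concrete, here is a small configuration sketch; the agent, source, channel, and header names are hypothetical, not from the book. A multiplexing channel selector inspects a header on each event and routes the event to a different channel, and therefore toward a different destination:

    # Hypothetical names; routes each event by its "datacenter" header.
    agent.sources.src.selector.type = multiplexing
    agent.sources.src.selector.header = datacenter
    agent.sources.src.selector.mapping.us-east = ch-east
    agent.sources.src.selector.mapping.us-west = ch-west
    agent.sources.src.selector.default = ch-other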

In Using Flume, Hari provides an overview of the various components within Flume, diving into details where necessary. Operators will find this book immensely valuable for understanding how to easily set up and deploy Flume pipelines. Developers will find it a handy reference for building or customizing components within Flume, and for better understanding its architecture and component designs. Above all, this book will give you the necessary insights for setting up continuous ingestion for HDFS and HBase, the two most popular storage systems today.

With Flume deployed, you can be sure that data, no matter where it is produced in your enterprise or how large its volume is, will make it safely and in a timely manner into your Big Data platforms. And you can then focus your energy on getting the right insights out of your data. Good luck!

Arvind Prabhakar, CTO, StreamSets

Preface

Today, developers are able to write and deploy applications on a large number of servers in the cloud very easily. These applications are producing more data than ever, which, when stored and analyzed, gives valuable insights that can improve the applications themselves and the businesses that the applications are a part of. The data generated by such applications is often analyzed using systems like Hadoop and HBase.

Analyzing this data is possible only if you can get it into these systems from your frontend servers, and such analysis often loses value as the data gets older. To get the data into the processing system in near real time, systems like Apache Flume are used. Apache Flume is a system for moving large amounts of data from large numbers of data producers to systems that store, index, or analyze that data. Such systems also decouple the producers from the consumers of the data, making it easy to change either side without the other knowing about it. In addition to decoupling, they provide failure isolation and an added buffer between the producers and the storage system. The data producers will not know that the storage or indexing system is inaccessible until all of the Flume buffers fill up as well; this extra buffering might be enough for the storage system to come back online and clear up the backlog of events in the Flume buffers.
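
As a minimal illustration of that buffering role, here is a sketch of a single-agent configuration; the agent name, directories, and paths are placeholders, not taken from the book. A spooling-directory source feeds a durable file channel, which holds events on disk until the HDFS sink drains them:

    # Placeholder names and paths; the file channel buffers between
    # the data producer and HDFS.
    agent1.sources = src1
    agent1.channels = ch1
    agent1.sinks = sink1

    agent1.sources.src1.type = spooldir
    agent1.sources.src1.spoolDir = /var/log/incoming
    agent1.sources.src1.channels = ch1

    # The file channel persists events to disk, so they accumulate safely
    # while the storage system is unreachable.
    agent1.channels.ch1.type = file
    agent1.channels.ch1.checkpointDir = /flume/checkpoint
    agent1.channels.ch1.dataDirs = /flume/data

    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.channel = ch1
    agent1.sinks.sink1.hdfs.path = /flume/events/%Y-%m-%d
    agent1.sinks.sink1.hdfs.useLocalTimeStamp = true

If HDFS goes down, events simply pile up in the file channel and are drained once the sink can write again.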

In this book, we will discuss in detail why systems like Flume are needed, the internals of a Flume agent, and how to configure and deploy Flume agents. We will also discuss the various ways in which Flume deployments can be customized and how to write plug-ins for Flume.
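
As a hedged preview of what such a plug-in looks like (the class and header names here are illustrative, not from the book), a custom interceptor implements Flume's Interceptor interface plus a nested Builder that the agent configuration names:

    // Illustrative sketch of a custom interceptor that tags each event
    // with the local hostname in a "host" header.
    import java.net.InetAddress;
    import java.util.List;

    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;

    public class HostTagInterceptor implements Interceptor {
      private String host;

      @Override
      public void initialize() {
        try {
          host = InetAddress.getLocalHost().getHostName();
        } catch (Exception e) {
          host = "unknown";
        }
      }

      @Override
      public Event intercept(Event event) {
        // Stamp every event passing through with the agent's hostname.
        event.getHeaders().put("host", host);
        return event;
      }

      @Override
      public List<Event> intercept(List<Event> events) {
        for (Event e : events) {
          intercept(e);
        }
        return events;
      }

      @Override
      public void close() {
        // Nothing to release.
      }

      // Flume instantiates interceptors via a Builder named in the agent config.
      public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
          return new HostTagInterceptor();
        }

        @Override
        public void configure(Context context) {
          // This sketch takes no configuration.
        }
      }
    }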

The first chapter gives a basic introduction to Apache Hadoop and Apache HBase. It is only meant to introduce the reader to Hadoop and HBase and give some details of their internals, and it can be skipped if the reader is already familiar with both systems.

Next page
Light

Font size:

Reset

Interval:

Bookmark:

Make

Similar books «Using Flume: Flexible, Scalable, and Reliable Data Streaming»

Look at similar books to Using Flume: Flexible, Scalable, and Reliable Data Streaming. We have selected literature similar in name and meaning in the hope of providing readers with more options to find new, interesting, not yet read works.


Reviews about «Using Flume: Flexible, Scalable, and Reliable Data Streaming»

Discussion, reviews of the book Using Flume: Flexible, Scalable, and Reliable Data Streaming and just readers' own opinions. Leave your comments, write what you think about the work, its meaning or the main characters. Specify what exactly you liked and what you didn't like, and why you think so.