
Steve Hoffman - Apache Flume: Distributed Log Collection for Hadoop


  • Book:
    Apache Flume: Distributed Log Collection for Hadoop
  • Author:
    Steve Hoffman
  • Publisher:
    Packt Publishing
  • Genre:
    Computer
  • Year:
    2013

Apache Flume: Distributed Log Collection for Hadoop: summary, description and annotation


Stream data to Hadoop using Apache Flume

Overview

  • Integrate Flume with your data sources
  • Transcode your data en-route in Flume
  • Route and separate your data using regular expression matching
  • Configure failover paths and load-balancing to remove single points of failure
  • Utilize Gzip Compression for files written to HDFS
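As a rough sketch of the kind of configuration the book works through, a minimal single-agent pipeline might look like the following. The agent, source, channel, and sink names here are arbitrary, and the port and HDFS path are illustrative assumptions; the property keys themselves (netcat source, memory channel, HDFS sink with Gzip compression) come from standard Flume configuration.

```properties
# Hypothetical agent "a1": one source, one channel, one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: accept newline-terminated events on a TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write Gzip-compressed files to date-partitioned HDFS paths
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = gzip
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```

Such a file would typically be passed to the agent launcher, for example `flume-ng agent --name a1 --conf-file example.conf`.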

In Detail

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with many failover and recovery mechanisms.

Apache Flume: Distributed Log Collection for Hadoop covers problems with HDFS and streaming data/logs, and how Flume can resolve them. The book explains the generalized architecture of Flume, including moving data to and from databases and NoSQL data stores, as well as optimizing performance, and includes real-world scenarios on Flume implementation.

Apache Flume: Distributed Log Collection for Hadoop starts with an architectural overview of Flume and then discusses each component in detail. It guides you through the complete installation process and compilation of Flume.

It introduces channels and channel selectors, and for each architectural component (sources, channels, sinks, channel processors, sink groups, and so on) covers the various implementations in detail, along with their configuration options, so you can tailor Flume to your specific needs. Pointers on writing custom implementations are also provided to help you learn and implement them.

By the end, you should be able to construct a series of Flume agents to transport your streaming data and logs from your systems into Hadoop in near real time.

What you will learn from this book

    • Understand the Flume architecture
    • Download and install open source Flume from Apache
    • Discover when to use a memory or file-backed channel
    • Understand and configure the Hadoop File System (HDFS) sink
    • Learn how to use sink groups to create redundant data flows
    • Configure and use various sources for ingesting data
    • Inspect data records and route to different or multiple destinations based on payload content
    • Transform data en-route to Hadoop
    • Monitor your data flows

    Approach

    A starter guide that covers Apache Flume in detail.

    Who this book is written for

    Apache Flume: Distributed Log Collection for Hadoop is intended for people who are responsible for moving datasets into Hadoop in a timely and reliable manner, such as software engineers, database administrators, and data warehouse administrators.



    Apache Flume: Distributed Log Collection for Hadoop

    Copyright 2013 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: July 2013

    Production Reference: 1090713

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78216-791-4

    www.packtpub.com

    Cover Image by Abhishek Pandey

    Credits

    Author

    Steve Hoffman

    Reviewers

    Subash D'Souza

    Stefan Will

    Acquisition Editor

    Kunal Parikh

    Commissioning Editor

    Sharvari Tawde

    Technical Editors

    Jalasha D'costa

    Mausam Kothari

    Project Coordinator

    Sherin Padayatty

    Proofreader

    Aaron Nash

    Indexer

    Monica Ajmera Mehta

    Graphics

    Valentina D'silva

    Abhinash Sahu

    Production Coordinator

    Kirtee Shingan

    Cover Work

    Kirtee Shingan

    About the Author

    Steve Hoffman has 30 years of software development experience and holds a B.S. in computer engineering from the University of Illinois Urbana-Champaign and an M.S. in computer science from DePaul University. He is currently a Principal Engineer at Orbitz Worldwide.

    More information on Steve can be found at http://bit.ly/bacoboy or on Twitter @bacoboy.

    This is Steve's first book.

    I'd like to dedicate this book to my loving wife Tracy. Her dedication to pursuing what you love is unmatched, and it inspires me to follow her excellent lead in all things.

    I'd also like to thank Packt Publishing for the opportunity to write this book and my reviewers and editors for their hard work in making it a reality.

    Finally, I want to wish a fond farewell to my brother Richard who passed away recently. No book has enough pages to describe in detail just how much we will all miss him. Good travels brother.

    About the Reviewers

    Subash D'Souza is a professional software developer with strong expertise in crunching big data using Hadoop/HBase with Hive/Pig. For several years before moving into Hadoop full time, he worked with Perl/PHP/Python for coding, with MySQL/Oracle as the backend. He has worked on scaling for load, code development, and optimization for speed, and also has experience optimizing SQL queries for database interactions. His specialties include Hadoop, HBase, Hive, Pig, Sqoop, Flume, Oozie, scaling, web data mining, PHP, Perl, Python, Oracle, SQL Server, and MySQL replication/clustering.

    I would like to thank my wife, Theresa, for her kind words of support and encouragement.

    Stefan Will is a computer scientist with a degree in machine learning and pattern recognition from the University of Bonn. For over a decade, he has worked for several startup companies in Silicon Valley and Raleigh, North Carolina, in the area of search and analytics. Presently, he leads the development of the search backend and the Hadoop-based product analytics platform at Zendesk, the customer service software provider.

    www.PacktPub.com
    Support files, eBooks, discount offers and more

    You might want to visit www.PacktPub.com for support files and downloads related to your book.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? See www.PacktPub.com for details on upgrading to the eBook version.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.


    http://PacktLib.PacktPub.com

    Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

    Why Subscribe?
    • Fully searchable across every book published by Packt
    • Copy and paste, print and bookmark content
    • On demand and accessible via web browser
    Free Access for Packt account holders

    If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

    Preface

    Hadoop is a great open source tool for sifting tons of unstructured data into something manageable, so that your business can gain better insight into your customers' needs. It is cheap (can be mostly free), scales horizontally as long as you have space and power in your data center, and can handle problems your traditional data warehouse would be crushed under. That said, a little-known secret is that your Hadoop cluster requires you to feed it with data; otherwise, you just have a very expensive heat generator. You will quickly find, once you get past the playing-around phase with Hadoop, that you will need a tool to automatically feed data into your cluster. In the past, you had to come up with a solution for this problem, but no more! Flume started as a project out of Cloudera when their integration engineers had to keep writing tools over and over again for their customers to import data automatically. Today the project lives with the Apache Foundation, is under active development, and boasts users who have been using it in their production environments for years.

    In this book I hope to get you up and running quickly with an architectural overview of Flume and a quick start guide. After that we'll deep-dive into the details of many of the more useful Flume components, including the very important File Channel for persistence of in-flight data records and the HDFS Sink for buffering and writing data into HDFS, the Hadoop Distributed File System. Since Flume comes with a wide variety of modules, chances are that the only tool you'll need to get started is a text editor for the configuration file.
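As a small taste of the two components called out above, a durable File Channel paired with an HDFS Sink might be configured along these lines. The directories, paths, and roll sizes are illustrative assumptions; the property keys are standard Flume configuration for the file channel and HDFS sink.

```properties
# Hypothetical durable pipeline: file-backed channel feeding an HDFS sink
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/logs/%Y-%m-%d
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
```

Because the channel spools events to local disk, in-flight records survive an agent restart, at the cost of some throughput compared with the memory channel.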

    By the end of the book, you should know enough to build out a highly available, fault tolerant, streaming data pipeline feeding your Hadoop cluster.

    What this book covers

    Overview and Architecture introduces the reader to Flume and the problem space that it is trying to address (specifically with regard to Hadoop). An architectural overview is given of the various components to be covered in the later chapters.

