Steve Hoffman - Apache Flume: Distributed Log Collection for Hadoop
- Book: Apache Flume: Distributed Log Collection for Hadoop
- Author: Steve Hoffman
- Publisher: Packt Publishing
- Genre: Computer
- Year: 2013
- Rating: 3 / 5
Apache Flume: Distributed Log Collection for Hadoop: summary, description and annotation
Stream data to Hadoop using Apache Flume
Overview
- Integrate Flume with your data sources
- Transcode your data en route in Flume
- Route and separate your data using regular expression matching
- Configure failover paths and load-balancing to remove single points of failure
- Use gzip compression for files written to HDFS
In Detail
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with many failover and recovery mechanisms.
Apache Flume: Distributed Log Collection for Hadoop covers the problems with HDFS and streaming data/logs, and how Flume can resolve them. The book explains Flume's generalized architecture, including moving data to and from databases and NoSQL-style data stores, as well as optimizing performance. It also includes real-world scenarios of Flume implementations.
Apache Flume: Distributed Log Collection for Hadoop starts with an architectural overview of Flume and then discusses each component in detail. It guides you through the complete installation process and compilation of Flume.
It also introduces channels and channel selectors. For each architectural component (Sources, Channels, Sinks, Channel Processors, Sink Groups, and so on), the book covers the available implementations and their configuration options in detail, so you can customize Flume to your specific needs. It also gives pointers on writing custom implementations of your own.
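To make the component roles concrete, here is a minimal Python sketch of the source, channel, and sink relationship described above. It is a toy model for illustration only, not Flume's actual API: the names `MemoryChannel`, `run_source`, and `run_sink` are hypothetical, and the event layout (a dict with `headers` and `body`) is an assumption made for this example.

```python
from collections import deque


class MemoryChannel:
    """Toy bounded in-memory buffer sitting between a source and a sink."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._events = deque()

    def put(self, event):
        # Refuse new events once the buffer is full.
        if len(self._events) >= self.capacity:
            raise OverflowError("channel full")
        self._events.append(event)

    def take(self, batch_size):
        # Remove and return up to batch_size events, oldest first.
        batch = []
        while self._events and len(batch) < batch_size:
            batch.append(self._events.popleft())
        return batch


def run_source(lines, channel):
    """Source: wrap each input line in an event and put it on the channel."""
    for line in lines:
        channel.put({"headers": {}, "body": line})


def run_sink(channel, destination, batch_size=2):
    """Sink: drain the channel in batches and append bodies to destination."""
    while True:
        batch = channel.take(batch_size)
        if not batch:
            break
        destination.extend(event["body"] for event in batch)


channel = MemoryChannel(capacity=10)
written = []  # stands in for a real destination such as an HDFS file
run_source(["GET /a", "GET /b", "POST /c"], channel)
run_sink(channel, written)
print(written)  # ['GET /a', 'GET /b', 'POST /c']
```

The point of the sketch is the decoupling: the source and sink never call each other and only share the channel, which is why the choice of channel (memory or file-backed) can be made independently of either end.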
What you will learn from this book
- Understand the Flume architecture
- Download and install open source Flume from Apache
- Discover when to use a memory or file-backed channel
- Understand and configure the Hadoop File System (HDFS) sink
- Learn how to use sink groups to create redundant data flows
- Configure and use various sources for ingesting data
- Inspect data records and route to different or multiple destinations based on payload content
- Transform data en route to Hadoop
- Monitor your data flows
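As an illustration of the content-based routing mentioned in the list above, the following Python sketch inspects a record's payload with a regular expression and picks one or more destinations from the captured value. It only shows the general idea and is not Flume configuration or Flume code: the pattern, the `ROUTES` table, the destination names, and the `route` function are all hypothetical.

```python
import re

# Hypothetical rule: capture a log level at the start of the payload.
LEVEL_PATTERN = re.compile(r"^(ERROR|WARN|INFO)\b")

# Hypothetical mapping from the captured value to destination names.
# A value may map to several destinations (fan-out).
ROUTES = {
    "ERROR": ["alerts", "archive"],
    "WARN": ["archive"],
}
DEFAULT_ROUTE = ["misc"]


def route(body):
    """Return the destinations for a payload, or the default if unmapped."""
    match = LEVEL_PATTERN.match(body)
    level = match.group(1) if match else None
    return ROUTES.get(level, DEFAULT_ROUTE)


print(route("ERROR disk full"))   # ['alerts', 'archive']
print(route("WARN slow reply"))   # ['archive']
print(route("INFO started"))      # ['misc']
print(route("no level here"))     # ['misc']
```

Records that match no rule, or match a level with no mapping, fall through to the default destination so that nothing is silently dropped.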
Approach
A starter guide that covers Apache Flume in detail.
Who this book is written for
Apache Flume: Distributed Log Collection for Hadoop is intended for people who are responsible for moving datasets into Hadoop in a timely and reliable manner, such as software engineers, database administrators, and data warehouse administrators.