Garry Turkington - Learning Hadoop 2

Year: 2014, publisher: Packt Publishing, genre: Computer.

Learning Hadoop 2: summary, description and annotation

Design and implement data processing, lifecycle management, and analytic workflows with the cutting-edge toolbox of Hadoop 2

About This Book
  • Construct state-of-the-art applications using higher-level interfaces and tools beyond the traditional MapReduce approach
  • Use the unique features of Hadoop 2 to model and analyze Twitter's global stream of user-generated data
  • Develop a prototype on a local cluster and deploy to the cloud (Amazon Web Services)
Who This Book Is For

If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. You are expected to be familiar with the Unix/Linux command-line interface and have some experience with the Java programming language. Familiarity with Hadoop would be a plus.

In Detail

This book introduces you to the world of building data-processing applications with the wide variety of tools supported by Hadoop 2. Starting with the core components of the framework, HDFS and YARN, this book will guide you through how to build applications using a variety of approaches.

You will learn how YARN completely changes the relationship between MapReduce and Hadoop and allows the latter to support more varied processing approaches and a broader array of applications. These include real-time processing with Apache Samza and iterative computation with Apache Spark. Next up, we discuss Apache Pig and the dataflow data model it provides. You will discover how to use Pig to analyze a Twitter dataset.

With this book, you will be able to make your life easier by using tools such as Apache Hive, Apache Oozie, Hadoop Streaming, Apache Crunch, and Kite SDK. The last part of this book discusses the likely future direction of major Hadoop components and how to get involved with the Hadoop community.

Learning Hadoop 2

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: February 2015

Production reference: 1060215

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78328-551-8

www.packtpub.com

Credits

Authors

Garry Turkington

Gabriele Modena

Reviewers

Atdhe Buja

Amit Gurdasani

Jakob Homan

James Lampton

Davide Setti

Valerie Parham-Thompson

Commissioning Editor

Edward Gordon

Acquisition Editor

Joanne Fitzpatrick

Content Development Editor

Vaibhav Pawar

Technical Editors

Indrajit A. Das

Menza Mathew

Copy Editors

Roshni Banerjee

Sarang Chari

Pranjali Chury

Project Coordinator

Kranti Berde

Proofreaders

Simran Bhogal

Martin Diver

Lawrence A. Herman

Paul Hindle

Indexer

Hemangini Bari

Graphics

Abhinash Sahu

Production Coordinator

Nitesh Thakur

Cover Work

Nitesh Thakur

About the Authors

Garry Turkington has over 15 years of industry experience, most of which has been focused on the design and implementation of large-scale distributed systems. In his current role as the CTO at Improve Digital, he is primarily responsible for the realization of systems that store, process, and extract value from the company's large data volumes. Before joining Improve Digital, he spent time at Amazon.co.uk, where he led several software development teams, building systems that process the Amazon catalog data for every item worldwide. Prior to this, he spent a decade in various government positions in both the UK and the USA.

He has BSc and PhD degrees in Computer Science from Queen's University Belfast in Northern Ireland, and a Master's degree in Engineering in Systems Engineering from Stevens Institute of Technology in the USA. He is the author of Hadoop Beginner's Guide, published by Packt Publishing in 2013, and is a committer on the Apache Samza project.

I would like to thank my wife Lea and mother Sarah for their support and patience through the writing of another book and my daughter Maya for frequently cheering me up and asking me hard questions. I would also like to thank Gabriele for being such an amazing co-author on this project.

Gabriele Modena is a data scientist at Improve Digital. In his current position, he uses Hadoop to manage, process, and analyze behavioral and machine-generated data. Gabriele enjoys using statistical and computational methods to look for patterns in large amounts of data. Prior to his current job in ad tech, he held a number of positions in academia and industry, where he did research in machine learning and artificial intelligence.

He holds a BSc degree in Computer Science from the University of Trento, Italy and a Research MSc degree in Artificial Intelligence: Learning Systems, from the University of Amsterdam in the Netherlands.

First and foremost, I want to thank Laura for her support, constant encouragement and endless patience putting up with far too many "can't do, I'm working on the Hadoop book". She is my rock and I dedicate this book to her.

A special thank you goes to Amit, Atdhe, Davide, Jakob, James and Valerie, whose invaluable feedback and commentary made this work possible.

Finally, I'd like to thank my co-author, Garry, for bringing me on board with this project; it has been a pleasure working together.

About the Reviewers

Atdhe Buja is a certified ethical hacker, DBA (MCITP, OCA11g), and developer with good management skills. He is a DBA at the Agency for Information Society / Ministry of Public Administration, where he also manages several e-governance projects, and he has more than 10 years' experience working with SQL Server.

Atdhe is a regular columnist for UBT News. Currently, he holds an MSc degree in computer science and engineering and has a bachelor's degree in management and information. He specializes in and is certified in many technologies, such as SQL Server (all versions), Oracle 11g, CEH, Windows Server, MS Project, SCOM 2012 R2, BizTalk, and integration business processes.

He was the reviewer of the book, Microsoft SQL Server 2012 with Hadoop, published by Packt Publishing. His capabilities go beyond the aforementioned knowledge!

I thank Donika and my family for all the encouragement and support.

Amit Gurdasani is a software engineer at Amazon. He architects distributed systems to process product catalogue data. Prior to building high-throughput systems at Amazon, he was working on the entire software stack, both as a systems-level developer at Ericsson and IBM as well as an application developer at Manhattan Associates. He maintains a strong interest in bulk data processing, data streaming, and service-oriented software architectures.

Jakob Homan has been involved with big data and the Apache Hadoop ecosystem for more than 5 years. He is a Hadoop committer as well as a committer for the Apache Giraph, Spark, Kafka, and Tajo projects, and is a PMC member. He has worked in bringing all these systems to scale at Yahoo! and LinkedIn.

James Lampton is a seasoned practitioner of all things data (big or small) with 10 years of hands-on experience in building and using large-scale data storage and processing platforms. He is a believer in holistic approaches to solving problems using the right tool for the right job. His favorite tools include Python, Java, Hadoop, Pig, Storm, and SQL (which he sometimes likes and sometimes doesn't). He has recently completed his PhD at the University of Maryland with the release of Pig Squeal: a mechanism for running Pig scripts on Storm.

I would like to thank my spouse, Andrea, and my son, Henry, for giving me time to read work-related things at home. I would also like to thank Garry, Gabriele, and the folks at Packt Publishing for the opportunity to review this manuscript and for their patience and understanding, as my free time was consumed when writing my dissertation.

Davide Setti, after graduating in physics from the University of Trento, joined the SoNet research unit at the Fondazione Bruno Kessler in Trento, where he applied large-scale data analysis techniques to understand people's behaviors in social networks and large collaborative projects such as Wikipedia.
