
Khaled Tannir - Optimizing Hadoop for MapReduce


Book:
Optimizing Hadoop for MapReduce
Author:
Khaled Tannir
Publisher:
Packt Publishing
Genre:
Books / Computer
Year:
2014

Optimizing Hadoop for MapReduce: summary, description and annotation


Learn how to configure your Hadoop cluster to run optimal MapReduce jobs

Overview

Optimize your MapReduce job performance
Identify your Hadoop cluster's weaknesses
Tune your MapReduce configuration

In Detail

MapReduce is the programming model that the Hadoop MapReduce engine uses to distribute work around a cluster by processing smaller data sets in parallel. It is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, and statistical machine translation.
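To make the model concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce steps applied to inverted index construction, one of the applications listed above. It is not Hadoop code and is not taken from the book; the function names (map_phase, shuffle, reduce_phase, inverted_index) and the sample documents are assumptions chosen for illustration only.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (word, doc_id) pair per word of a single input document."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    """Collapse the grouped values into a sorted list of distinct document ids."""
    return word, sorted(set(doc_ids))


def inverted_index(documents):
    """Run map, shuffle, and reduce in sequence over a dict of documents."""
    pairs = [pair
             for doc_id, text in documents.items()
             for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


# Hypothetical sample input: two tiny documents keyed by id.
docs = {"d1": "hadoop runs mapreduce", "d2": "mapreduce sorts data"}
index = inverted_index(docs)
```

On a real cluster each call to the map function would run on a different node against its own input split, and the grouping step would move data across the network; the sketch keeps everything in one process so the data flow is easy to follow.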

This book introduces you to advanced MapReduce concepts and teaches you everything from identifying the factors that affect MapReduce job performance to tuning the MapReduce configuration. Based on real-world experience, this book will help you to fully utilize your cluster's node resources to run MapReduce jobs optimally.

This book details the Hadoop MapReduce job performance optimization process. Through a number of clear and practical steps, it will help you to fully utilize your cluster's node resources.

Starting with how MapReduce works and the factors that affect MapReduce performance, you will be given an overview of Hadoop metrics and several performance monitoring tools. Further on, you will explore performance counters that help you identify resource bottlenecks, check cluster health, and size your Hadoop cluster. You will also learn about optimizing map and reduce tasks by using Combiners and compression.
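As a rough illustration of why Combiners help, the following sketch counts words with and without a combiner step and compares how many records would have to be shuffled to the reducer. It is a plain Python simulation, not Hadoop code and not an example from the book; the helper names (map_words, combine, reduce_counts) and the two input splits are hypothetical.

```python
from collections import Counter


def map_words(split):
    """Mapper: emit one (word, 1) record per word in an input split."""
    return [(word, 1) for line in split for word in line.split()]


def combine(records):
    """Combiner: pre-aggregate one mapper's output before it is shuffled."""
    totals = Counter()
    for word, count in records:
        totals[word] += count
    return list(totals.items())


def reduce_counts(records):
    """Reducer: sum the counts received for each word from all mappers."""
    totals = Counter()
    for word, count in records:
        totals[word] += count
    return dict(totals)


# Hypothetical input: two splits, each handled by its own mapper.
splits = [["a b a", "a c"], ["b b a"]]
mapped = [map_words(split) for split in splits]

# Records sent to the reducer without and with a combiner.
plain = [record for output in mapped for record in output]
combined = [record for output in mapped for record in combine(output)]
```

Both record lists reduce to the same word counts, but the combined list is shorter, which is the saving a combiner is meant to provide on the network between map and reduce tasks. This only works because summing is associative and commutative; an operation without those properties could not be pre-aggregated this way.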

The book ends with best practices and recommendations on how to use your Hadoop cluster optimally.

What you will learn from this book

Learn about the factors that affect MapReduce performance
Utilize the Hadoop MapReduce performance counters to identify resource bottlenecks
Size your Hadoop cluster's nodes
Set the number of mappers and reducers correctly
Optimize mapper and reducer task throughput and code size using compression and Combiners
Understand the various tuning properties and best practices to optimize clusters

Approach

This book is an example-based tutorial that deals with optimizing MapReduce job performance.

Who this book is written for

If you are a Hadoop administrator, developer, MapReduce user, or beginner, this book is the best choice available if you wish to optimize your clusters and applications. Having prior knowledge of creating MapReduce applications is not necessary, but will help you better understand the concepts and snippets of MapReduce class template code.


Optimizing Hadoop for MapReduce


All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either expressed or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: February 2014

Production Reference: 1140214

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78328-565-5

www.packtpub.com

Cover Image by Khaled Tannir

Credits

Author

Khaled Tannir

Reviewers

Włodzimierz Bzyl

Craig Henderson

Mark Kerzner

Acquisition Editor

Joanne Fitzpatrick

Commissioning Editor

Manasi Pandire

Technical Editors

Mario D'Souza

Rosmy George

Pramod Kumavat

Arwa Manasawala

Adrian Raposo

Copy Editors

Kirti Pai

Laxmi Subramanian

Project Coordinator

Aboli Ambardekar

Proofreaders

Simran Bhogal

Ameesha Green

Indexer

Rekha Nair

Graphics

Yuvraj Mannari

Production Coordinators

Manu Joseph

Alwin Roy

Cover Work

Alwin Roy

About the Author

Khaled Tannir has been working with computers since 1980. He began programming with the legendary Sinclair ZX81 and later with Commodore home computer products (VIC-20, Commodore 64, Commodore 128D, and Amiga 500).

He has a Bachelor's degree in Electronics, a Master's degree in System Information Architectures, in which he graduated with a professional thesis, and completed his education with a Master of Research degree.

He is a Microsoft Certified Solution Developer (MCSD) and has more than 20 years of technical experience leading the development and implementation of software solutions and giving technical presentations. He now works as an independent IT consultant and has worked as an infrastructure engineer, senior developer, and enterprise/solution architect for many companies in France and Canada.

With significant experience in Microsoft .NET, Microsoft Server Systems, and Oracle Java technologies, he has extensive skills in online/offline application design, system conversions, and multilingual applications for both the Internet and the desktop.

He is always researching new technologies, learning about them, and looking for new adventures in France, North America, and the Middle East. He owns an IT and electronics laboratory with many servers, monitors, open electronic boards such as Arduino, Netduino, Raspberry Pi, and .NET Gadgeteer, and some smartphone devices based on Windows Phone, Android, and iOS operating systems.

In 2012, he contributed to the EGC 2012 (International Complex Data Mining forum at Bordeaux University, France) and presented, in a workshop session, his work on "how to optimize data distribution in a cloud computing environment". This work aims to define an approach to optimize the use of data mining algorithms such as k-means and Apriori in a cloud computing environment.

He is the author of RavenDB 2.x Beginner's Guide, Packt Publishing.

He aims to get a PhD in Cloud Computing and Big Data and wants to learn more and more about these technologies.

He enjoys taking landscape and nighttime photos, travelling, playing video games, creating funny electronic gadgets with Arduino/.NET Gadgeteer, and of course, spending time with his wife and family.

You can reach him at <>.

Acknowledgments

All praise is due to Allah, the Lord of the Worlds. First, I must thank Allah for giving me the ability to think and write.

Next, I would like to thank my wife, Laila, for her big support, encouragement, and patience throughout this project. Also, I would like to thank my family in Canada and Lebanon for their support during the writing of this book.

I would like to thank everyone at Packt Publishing for their help and guidance, and for giving me the opportunity to share my experience and knowledge in technology with others in the Hadoop and MapReduce community.

Thank you as well to the technical reviewers, who provided great feedback to ensure that every tiny technical detail was accurate and rich in content.

About the Reviewers

Włodzimierz Bzyl works at the University of Gdańsk, Poland. His current interests include web-related technologies and NoSQL databases. He has a passion for new technologies and introduces his students to them. He enjoys contributing to open source software and spending time trekking in the Tatra mountains.

Craig Henderson graduated in 1995 with a degree in Computing for Real-time Systems and has spent his career working on large-scale data processing and distributed systems. He is the author of an open source C++ MapReduce library for single server application scalability, which is available at https://github.com/cdmh/mapreduce, and he currently researches image and video processing techniques for person identification.

Mark Kerzner holds degrees in Law, Mathematics, and Computer Science. He has been designing software for many years and Hadoop-based systems since 2008. He is the President of SHMsoft, a provider of Hadoop applications for various verticals, a co-founder of the Hadoop Illuminated training and consulting, and also the co-author of the open source book, Hadoop Illuminated. He has also authored and co-authored other books and patents.

I would like to acknowledge the help of my colleagues, in particular Sujee Maniyam, and last but not least, my multitalented family.

www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at > for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt
