To my wife, Barbara, and our boys, Adam and Joel. Their support, encouragement, and sacrificed Saturdays have made this book possible.
To my wife, Jenny, my older son, Ethan, and my younger son, Charlie, who was born during the writing of this book.
Preface
Data is addictive. Our ability to collect and store it has grown massively in the last several decades, yet our appetite for ever more data shows no sign of being satiated. Scientists want to be able to store more data in order to build better mathematical models of the world. Marketers want better data to understand their customers' desires and buying habits. Financial analysts want to better understand the workings of their markets. And everybody wants to keep all their digital photographs, movies, emails, etc.
Before the computer and Internet revolutions, the US Library of Congress was one of the largest collections of data in the world. It is estimated that its printed collections contain approximately 10 terabytes (TB) of information. Today, large Internet companies collect that much data on a daily basis. And it is not just Internet applications that are producing data at prodigious rates. For example, the Large Synoptic Survey Telescope (LSST) under construction in Chile is expected to produce 15 TB of data every day.
Part of the reason for the massive growth in available data is our ability to collect much more of it. Every time someone clicks a link on a website, the web server can record information about what page the user was on and which link they clicked. Every time a car drives over a sensor in the highway, its speed can be recorded. But much of the reason is also our ability to store that data. Ten years ago, telescopes took pictures of the sky every night, but they could not store the collected data at the same level of detail that will be possible when the LSST is operational. The extra data was thrown away because there was nowhere to put it. The ability to collect and store vast quantities of data only feeds our data addiction.
One of the most commonly used tools for storing and processing data in computer systems over the last few decades has been the relational database management system (RDBMS). But as datasets have grown large, only the more sophisticated (and hence more expensive) RDBMSs have been able to reach the scale many users now desire. At the same time, many engineers and scientists involved in processing the data have realized that they do not need everything offered by an RDBMS. These systems are powerful and have many features, but many data owners who need to process terabytes or petabytes of data need only a subset of those features.
The high cost and unneeded features of RDBMSs have led to the development of many alternative data-processing systems. One such alternative system is Apache Hadoop. Hadoop is an open source project started by Doug Cutting. Over the past several years, Yahoo! and a number of other web companies have driven the development of Hadoop, which was based on papers published by Google describing how its engineers were dealing with the challenge of storing and processing the massive amounts of data they were collecting. Hadoop is installed on a cluster of machines and provides a means to tie together storage and processing in that cluster. For a history of the project, see Hadoop: The Definitive Guide, by Tom White (O'Reilly).
The development of new data-processing systems such as Hadoop has spurred the porting of existing tools and languages and the construction of new tools, such as Apache Pig. Tools like Pig provide a higher level of abstraction for data users, giving them access to the power and flexibility of Hadoop without requiring them to write extensive data-processing applications in low-level Java code.
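To give a feel for that abstraction, here is a minimal Pig Latin sketch (the input file, field names, and output path are hypothetical) that groups click records by user and counts them, a job that would otherwise require a substantial amount of Java MapReduce code:

-- Load click records from a hypothetical input file.
clicks = LOAD 'clicks.txt' AS (user:chararray, url:chararray);
-- Group the records by user.
by_user = GROUP clicks BY user;
-- Count the clicks each user made.
counts = FOREACH by_user GENERATE group AS user, COUNT(clicks) AS cnt;
-- Write the results out.
STORE counts INTO 'click_counts';

Pig compiles a script like this into one or more Hadoop jobs, so the script's author never has to write map and reduce functions directly.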
Who Should Read This Book
This book is intended for Pig programmers, new and old. Those who have never used Pig will find introductory material on how to run Pig and how to get started writing Pig Latin scripts. For seasoned Pig users, this book covers almost every feature of Pig: the different modes in which it can be run, complete coverage of the Pig Latin language, and how to extend Pig with your own user-defined functions (UDFs). Even those who have been using Pig for a long time are likely to discover features they have not used before.
Some knowledge of Hadoop will be useful for readers and Pig users. If you're not already familiar with it or want a quick refresher, the book walks through a very simple example of a Hadoop job.
Small snippets of Java, Python, and SQL are used in parts of this book. Knowledge of these languages is not required to use Pig, but knowledge of Python and Java will be necessary for some of the more advanced features. Those with a SQL background may find the book's comparison of Pig Latin and SQL to be a helpful starting point in understanding the similarities and differences between the two languages.
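For instance (the relation and field names below are invented for illustration), a SQL join such as SELECT u.name, c.url FROM users u JOIN clicks c ON u.id = c.user_id corresponds to an explicit sequence of steps in Pig Latin:

-- SQL: SELECT u.name, c.url FROM users u JOIN clicks c ON u.id = c.user_id;
-- Load the two hypothetical inputs with their schemas.
users  = LOAD 'users.txt'  AS (id:int, name:chararray);
clicks = LOAD 'clicks.txt' AS (user_id:int, url:chararray);
-- Join the two relations, as the SQL ON clause does.
joined = JOIN users BY id, clicks BY user_id;
-- Project the columns named in the SQL SELECT list.
result = FOREACH joined GENERATE users::name, clicks::url;

Where SQL declares the desired result in a single statement, Pig Latin spells out the data flow one step at a time.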
Whats New in This Edition
The second edition covers Pig 0.10 through Pig 0.16, the latest version at the time of writing. For features introduced before 0.10, we do not call out the initial version of the feature. For features introduced after 0.10, we point out the version in which the feature was introduced.
Pig runs on both Hadoop 1 and Hadoop 2 for all of the versions covered in this book. To simplify the discussion, we assume Hadoop 2 is the target platform and point out the differences for Hadoop 1 wherever applicable.
The second edition has two new chapters, including one on Pig on Tez. Other chapters have also been updated with the latest additions to Pig and with information on existing features not covered in the first edition. These include, but are not limited to: