Using R to Unlock the Value of Big Data
Big Data Analytics with Oracle R Enterprise and Oracle R Connector for Hadoop
Mark Hornick
Tom Plunkett
New York Chicago San Francisco
Athens London Madrid Mexico City
Milan New Delhi Singapore Sydney Toronto
Copyright 2013 by McGraw-Hill Education (Publisher). All rights reserved. Printed in the United States of America. Except as permitted under the Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of publisher, with the exception that the program listings may be entered, stored, and executed in a computer system, but they may not be reproduced for publication.
ISBN: 978-0-07-182627-3
MHID: 0-07-182627-0
e-book conversion by Cenveo Publisher Services
Version 1.0
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. All other trademarks are the property of their respective owners, and McGraw-Hill Education makes no claim of ownership by the mention of products that contain these marks.
Screen displays of copyrighted Oracle software programs have been reproduced herein with the permission of Oracle Corporation and/or its affiliates.
Information has been obtained by McGraw-Hill Education from sources believed to be reliable. However, because of the possibility of human or mechanical error by our sources, McGraw-Hill Education, or others, McGraw-Hill Education does not guarantee the accuracy, adequacy, or completeness of any information and is not responsible for any errors or omissions or the results obtained from the use of such information.
Oracle Corporation does not make any representations or warranties as to the accuracy, adequacy, or completeness of any information contained in this Work, and is not responsible for any errors or omissions.
TERMS OF USE
This is a copyrighted work and McGraw-Hill Education ("McGraw-Hill") and its licensors reserve all rights in and to the work. Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill's prior consent. You may use the work for your own noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if you fail to comply with these terms.
THE WORK IS PROVIDED "AS IS." McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill and its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be uninterrupted or error free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inaccuracy, error or omission, regardless of cause, in the work or for any damages resulting therefrom. McGraw-Hill has no responsibility for the content of any information accessed through the work. Under no circumstances shall McGraw-Hill and/or its licensors be liable for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inability to use the work, even if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or otherwise.
About the Authors
Mark Hornick is a Director in the Oracle Database Advanced Analytics group focusing on Oracle R Enterprise (ORE), Oracle R Connector for Hadoop (ORCH), and Oracle R Distribution (ORD). He also works with internal and external customers in the application of R for scalable applications in Oracle Database, Exadata, and the Big Data Appliance, and engages in SAS-to-R conversion and performance benchmarking. Mark is co-author of Java Data Mining: Strategy, Standard, and Practice. He joined Oracle's Data Mining Technologies group in 1999 through the acquisition of Thinking Machines Corp. Mark was a founding member of and currently serves as an Oracle Advisor to the IOUG Business Intelligence Warehousing and Analytics (BIWA) SIG. He has conducted training sessions on R, ORE, and ORCH in the US, EMEA, and APAC, and has presented at conferences including Oracle OpenWorld, Collaborate, BIWA Summit, and the R user conference useR! Mark holds a bachelor's degree from Rutgers University and a master's degree from Brown University, both in computer science.
Tom Plunkett is a Senior Sales Consultant with Oracle. Tom also teaches graduate-level computer science courses for Virginia Tech as an adjunct instructor and distance learning instructor. Tom helped win several industry awards for a big data project on which Oracle and the Frederick National Laboratory for Cancer Research collaborated to analyze relationships between genomes and cancer subtypes. The project won the 2012 Government Big Data Solution Award, was an ACT-IAC finalist for best pilot/start-up project, and was nominated for a 2013 Computer World Honor Award for Innovation. Tom has spoken internationally at over forty conferences on the subject of Big Data since leading a team that won a Big Data project from the Office of the Secretary of Defense in 2009. Tom is the lead author of several books, including Oracle Big Data Handbook and Oracle Exalogic Elastic Cloud Handbook. Previously, Tom worked for IBM and practiced patent law for Fliesler Meyer. Tom has a BA and a JD from George Mason University, and an MS in computer science from Virginia Tech.
Thanks are due to Jean-Pierre Dijcks and Dan McClary for their technical editing and input during the writing of this book.
Using R to Unlock the Value of Big Data
Big Data Analytics with Oracle R Enterprise and Oracle R Connector for Hadoop
The focus of this book is on analyzing data with R while making it scalable using Oracle's R technologies. Initial sections provide an introduction to open source R and issues with traditional R and database interaction. Subsequent sections cover Oracle's strategic R offerings: Oracle R Enterprise 1.3, Oracle R Distribution, ROracle, and Oracle R Connector for Hadoop 2.0.
Oracle's R product offerings complement Oracle's other products in the Big Data space. This book is based on an expanded and updated chapter from the companion book, Oracle Big Data Handbook, which provides comprehensive details on Oracle's Big Data strategy and product offerings. Among other changes, this work includes a section of exercises that is not contained within