Understanding Big Data
About the Authors
Paul C. Zikopoulos, B.A., M.B.A., is the Director of Technical Professionals for IBM Software Group's Information Management division and additionally leads the World Wide Database Competitive and Big Data SWAT teams. Paul is an internationally recognized award-winning writer and speaker with more than 18 years of experience in Information Management. Paul has written more than 350 magazine articles and 14 books on database technologies, including DB2 pureScale: Risk Free Agile Scaling (McGraw-Hill, 2010); Break Free with DB2 9.7: A Tour of Cost-Slashing New Features (McGraw-Hill, 2010); Information on Demand: Introduction to DB2 9.5 New Features (McGraw-Hill, 2007); DB2 Fundamentals Certification for Dummies (For Dummies, 2001); DB2 for Windows for Dummies (For Dummies, 2001); and more. Paul is a DB2 Certified Advanced Technical Expert (DRDA and Clusters) and a DB2 Certified Solutions Expert (BI and DBA). In his spare time, he enjoys all sorts of sporting activities, including running with his dog, Chachi; avoiding punches in his MMA training; trying to figure out why his golf handicap has unexplainably decided to go up; and trying to understand the world according to Chlo, his daughter. You can reach him at paulz_ibm@msn.com. Also, keep up with Paul's take on Big Data by following him on Twitter @BigData_paulz.
Chris Eaton, B.Sc., is a worldwide technical specialist for IBM's Information Management products focused on Database Technology, Big Data, and Workload Optimization. Chris has been working with DB2 on the Linux, UNIX, and Windows platform for more than 19 years in numerous roles, from support, to development, to product management. Chris has spent his career listening to clients and working to make DB2 a better product. He is the author of several books in the data management space, including The High Availability Guide to DB2 (IBM Press, 2004), IBM DB2 9 New Features (McGraw-Hill, 2007), and Break Free with DB2 9.7: A Tour of Cost-Slashing New Features (McGraw-Hill, 2010). Chris is also an international award-winning speaker, having presented at data management conferences across the globe, and he has one of the most popular DB2 blogs located on IT Toolbox at http://it.toolbox.com/blogs/db2luw.
Dirk deRoos, B.Sc., B.A., is a member of the IBM World-Wide Technical Sales Team, specializing in the IBM Big Data Platform. Dirk joined IBM 11 years ago and previously worked in the Toronto DB2 Development lab as its Information Architect. Dirk has a Bachelor's degree in Computer Science and a Bachelor of Arts degree (Honors in English) from the University of New Brunswick.
Thomas Deutsch, B.A., M.B.A., serves as a Program Director in IBM's Big Data business. Tom has spent the last couple of years helping customers with Apache Hadoop, identifying architecture fit, and managing early-stage projects in multiple customer engagements. He played a formative role in the transition of Hadoop-based technologies from IBM Research to IBM Software Group, and he continues to be involved with IBM Research Big Data activities and the transition of research to commercial products. Prior to this role, Tom worked in the CTO office's Information Management division. In that role, Tom worked with a team focused on emerging technologies and helped customers adopt IBM's innovative Enterprise Mashups and Cloud offerings. Tom came to IBM through the FileNet acquisition, where he had responsibility for FileNet's flagship Content Management product and spearheaded FileNet product initiatives with other IBM software segments, including the Lotus and InfoSphere segments. With more than 20 years in the industry and a veteran of two startups, Tom is an expert on the technical, strategic, and business information management issues facing the enterprise today. Tom earned a Bachelor's degree from Fordham University in New York and an MBA from the Maryland University College.
George Lapis, MS CS, is a Big Data Solutions Architect at IBM's Silicon Valley Research and Development Lab. He has worked in the database software area for more than 30 years. He was a founding member of the R* and Starburst research projects at IBM's Almaden Research Center in Silicon Valley, as well as a member of the compiler development team for several releases of DB2. His expertise lies mostly in compiler technology and implementation. About ten years ago, George moved from research to development, where he led the compiler development team in his current lab location, specifically working on the development of DB2's SQL/XML and XQuery capabilities. George also spent several years in a customer enablement role for the Optim Database toolset and more recently in IBM's Big Data business. In his current role, George is leading the tools development team for IBM's InfoSphere BigInsights platform. George has co-authored several database patents and has contributed to numerous papers. He's a certified DB2 DBA and Hadoop Administrator.
About the Technical Editor
Steven Sit, B.Sc., MS, is a Program Director in IBM's Silicon Valley Research and Development Lab, where IBM's Big Data platform is developed and engineered. Steven and his team help IBM's customers and partners evaluate, prototype, and implement Big Data solutions as well as build Big Data deployment patterns. For the past 17 years, Steven has held key positions in a number of IBM projects, including business intelligence, database tooling, and text search. Steven holds a Bachelor's degree in Computer Science (University of Western Ontario) and a Master's degree in Computer Science (Syracuse University).
Understanding Big Data
Analytics for Enterprise Class Hadoop and Streaming Data
Paul C. Zikopoulos
Chris Eaton
Dirk deRoos
Thomas Deutsch
George Lapis
Copyright 2012 by The McGraw-Hill Companies, Inc. All rights reserved. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher.
ISBN: 978-0-07-179054-3
MHID: 0-07-179054-3
The material in this eBook also appears in the print version of this title: ISBN: 978-0-07-179053-6, MHID: 0-07-179053-5.
All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. Where such designations appear in this book, they have been printed with initial caps.
McGraw-Hill eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. To contact a representative, please e-mail us at bulksales@mcgraw-hill.com.
Information has been obtained by McGraw-Hill from sources believed to be reliable. However, because of the possibility of human or mechanical error by our sources, McGraw-Hill, or others, McGraw-Hill does not guarantee the accuracy, adequacy, or completeness of any information and is not responsible for any errors or omissions or the results obtained from the use of such information.