Copyright
Acquiring Editor: Andrea Dierna
Development Editor: Heather Scherer
Project Manager: Punithavathy Govindaradjane
Designer: Mark Rogers
Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA
Copyright © 2014 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
Celko, Joe.
Joe Celko's complete guide to NoSQL : what every SQL professional needs to know about nonrelational databases / Joe Celko.
pages cm
Includes bibliographical references and index.
ISBN 978-0-12-407192-6 (alk. paper)
1. Non-relational databases. 2. NoSQL. 3. SQL (Computer program language) I. Title. II. Title: Complete guide to NoSQL.
QA76.9.D32C44 2014
005.75--dc23
2013028696
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
ISBN: 978-0-12-407192-6
Printed and bound in the United States of America
14 15 16 17 18 10 9 8 7 6 5 4 3 2 1
For information on all MK publications visit our website at www.mkp.com
Dedication
In praise of Joe Celko's Complete Guide to NoSQL: What Every SQL Professional Needs to Know about Nonrelational Databases
For those of you who have problems that just don't fit the SQL mold, or who want to simply increase your knowledge of data management in general, you can do worse than Joe Celko's books in general, and NoSQL in particular.
Jeff Garbus, Owner, Soaring Eagle Consulting
About the Author
Joe Celko served 10 years on the ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 standards.
Mr. Celko is the author of a series of books on SQL and RDBMS for Elsevier/Morgan Kaufmann. He is an independent consultant based in Austin, TX. He has written over 1,200 columns in the computer trade and academic presses, mostly dealing with data and databases.
Introduction
"Nothing is more difficult than to introduce a new order, because the innovator has for enemies all those who have done well under the old conditions and lukewarm defenders in those who may do well under the new."
-- Niccolo Machiavelli
I have done a series of books for the Elsevier/Morgan Kaufmann imprint over the last few decades. They have almost all been about SQL and RDBMS. This book is an overview of what is being called "Big Data," "new SQL," or "NoSQL" in the trade press; we geeks love buzzwords! The first columnist or blogger to invent a meme that catches on will have a place in Wikipedia and might even get a book deal out of it.
Since SQL is the de facto dominant database model on Earth, anything different has to be positioned as a challenger. But what buzzwords can we use? We have had petabytes of data in SQL for years, so "Big Data" does not seem right. SQL has been evolving, with a new ANSI/ISO standard issued every five or so years, rather than the old SQL suddenly changing into "new SQL" overnight. That last meme makes me think of "New Coke" and does not inspire confidence and success.
Among the current crop of buzzwords, I like "NoSQL" the best because I read it as "N. O. SQL," shorthand for "not only SQL," rather than "no SQL," as it is often read. The "no SQL" reading implies that the last 40-plus years of database technology have done no good. Not true! Too often SQL people, me especially, become the proverbial kid with a hammer who thinks every problem is a nail when we are doing IT. But it takes more than a hammer to build a house.
Some of the database tools we can use have been around for decades and even predate RDBMS. Some of the tools are new because technology made them possible. When you open your toolbox, consider all of the options and how they fit the task.
This survey book takes a quick look at the old technologies that you might not know or have forgotten. Then we get to the new stuff and why it exists. I am not so interested in hardware or in going into particular software in depth. For one thing, I do not have the space, and you can find a narrowly focused book for yourself and your projects. Think of this book as a department-store catalog where you can go to get ideas and learn a few new buzzwords.
Please send corrections and comments to jcelko212@earthlink.net and look for feedback on the companion website ().
The following is a quick breakdown of what you can expect to find in this book:
A queue of jobs being read into a mainframe computer is still how the bulk of commercial data processing is done. Even transaction processing models finish with a batch job to load the databases with their new ETL tools. We need to understand both of these models and how they can be used with new technologies.
Columnar databases use traditional structured data and often run some version of SQL; the difference is in how they store the data. The traditional row-oriented approach is replaced by putting data in columns that can be assembled back into the familiar rows of an RDBMS model. Since each column draws its values from one and only one data type and domain, columns can be compressed and distributed over storage systems, such as RAID.
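To make the row-versus-column idea concrete, here is a minimal sketch in Python. It is not how any particular columnar product stores data; the sample rows, the column names, and the helper functions (`to_columns`, `rle_encode`, `rle_decode`, `to_rows`) are all invented for illustration, and run-length encoding is just one simple compression scheme that benefits from a column holding values of a single type and domain.

```python
from itertools import groupby

# Hypothetical sample data: each tuple is one row of (region, status, amount).
rows = [
    ("TX", "open", 100),
    ("TX", "open", 250),
    ("TX", "closed", 75),
    ("CA", "closed", 300),
]
column_names = ("region", "status", "amount")


def to_columns(rows, names):
    """Pivot row tuples into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the full column."""
    return [value for value, count in pairs for _ in range(count)]


def to_rows(columns, names):
    """Reassemble the familiar rows by reading the columns in parallel."""
    return list(zip(*(columns[name] for name in names)))


columns = to_columns(rows, column_names)
compressed = {name: rle_encode(values) for name, values in columns.items()}
restored = to_rows(
    {name: rle_decode(pairs) for name, pairs in compressed.items()},
    column_names,
)
```

In this toy data the `region` column shrinks from four values to two runs, while the `amount` column does not compress at all, which hints at why the choice of compression depends on the column's domain.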
Graph databases are based on graph theory, a branch of discrete mathematics. They model relationships among entities, rather than computing and aggregating attribute values or retrieving data based on those values.
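As a rough illustration of a relationship-centered question, the following Python sketch stores a few "knows" relationships and asks how two people are connected. The names, the edge list, and the functions `build_adjacency` and `shortest_path` are hypothetical; a real graph database would use its own storage and query language, and breadth-first search is simply one standard way to find a shortest chain of relationships.

```python
from collections import deque

# Hypothetical "knows" relationships between people (undirected edges).
edges = [
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Carol", "Dave"),
    ("Eve", "Frank"),
]


def build_adjacency(edges):
    """Map each node to the set of nodes it is directly related to."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a shortest path as a list, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    previous = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the back-pointers to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


adjacency = build_adjacency(edges)
path = shortest_path(adjacency, "Alice", "Dave")
unreachable = shortest_path(adjacency, "Alice", "Eve")
```

Note that the question is answered entirely from the relationships; no attribute values are summed, averaged, or filtered along the way.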