Andreas Meier and Michael Kaufmann
SQL & NoSQL Databases: Models, Languages, Consistency Options and Architectures for Big Data Management
Andreas Meier
Department für Informatik, Universität Fribourg, Fribourg, Switzerland
Michael Kaufmann
Departement für Informatik, Hochschule Luzern, Rotkreuz, Switzerland
ISBN 978-3-658-24548-1 e-ISBN 978-3-658-24549-8
https://doi.org/10.1007/978-3-658-24549-8
Library of Congress Control Number: 2019935851
© Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer Vieweg imprint is published by the registered company Springer Fachmedien Wiesbaden GmbH part of Springer Nature
The registered company address is: Abraham-Lincoln-Str. 46, 65189 Wiesbaden, Germany
Foreword
The term "database" has long since become part of people's everyday vocabulary, for managers and clerks as well as students of most subjects. They use it to describe a logically organized collection of electronically stored data that can be directly searched and viewed. However, they are generally more than happy to leave the whys and hows of its inner workings to the experts.
Users of databases are rarely aware of the immaterial and concrete business values contained in any individual database. This applies as much to a car importer's spare parts inventory as to the IT solution containing all customer depots at a bank or the patient information system of a hospital. Yet failure of these systems, or even cumulative errors, can threaten the very existence of the respective company or institution. For that reason, it is important for a much larger audience than just the database specialists to be well-informed about what is going on. Anyone involved with databases should understand what these tools are effectively able to do and which conditions must be created and maintained for them to do so.
Probably the most important aspect concerning databases involves (a) the distinction between their administration and the data stored in them (user data) and (b) the economic magnitude of these two areas. Database administration consists of various technical and administrative factors, from computers, database systems, and additional storage to the experts setting up and maintaining all these components, that is, the aforementioned database specialists. It is crucial to keep in mind that the administration is by far the smaller part of standard database operation, constituting only about a quarter of the entire effort.
Most of the work and expenses concerning databases lie in gathering, maintaining, and utilizing the user data. This includes the labor costs for all employees who enter data into the database, revise it, retrieve information from the database, or create files using this information. In the above examples, this means warehouse employees, bank tellers, or hospital personnel in a wide variety of fields, usually for several years.
In order to be able to properly evaluate the importance of the tasks connected with data maintenance and utilization on the one hand and database administration on the other hand, it is vital to understand and internalize this difference in the effort required for each of them. Database administration starts with the design of the database, which already touches on many specialized topics such as determining the consistency checks for data manipulation or regulating data redundancies, which are as undesirable on the logical level as they are essential on the storage level. The development of database solutions is always targeted at their later use, so ill-considered decisions in the development process may have a permanent impact on everyday operations. Finding ideal solutions, such as the golden mean between too strict and too flexible when determining consistency conditions, may require some experience. Unduly strict conditions will interfere with regular operations, while excessively lax rules will entail a need for repeated expensive data repairs.
To avoid such issues, it is invaluable that anyone concerned with database development and operation, whether in management or as a database specialist, gain systematic insight into this field of computer science. The table of contents gives an overview of the wide variety of topics covered in this book. The title already shows that, in addition to an in-depth explanation of the field of conventional databases (relational model, SQL), the book also provides highly educational information about current advancements and related fields, the keywords being "NoSQL" or "post-relational" and "Big Data." I am confident that the newest edition of this book will, once again, be well received by both students and professionals; its authors are quite familiar with both groups.
Carl August Zehnder
Preface
It is remarkable how stable some concepts are in the field of databases. Information technology is generally known to be subject to rapid development, bringing forth new technologies at an unbelievable pace. However, this is only superficially the case. Many aspects of computer science do not essentially change at all. This includes not only the basics, such as the functional principles of universal computing machines, processors, compilers, operating systems, databases and information systems, and distributed systems, but also computer language technologies such as C, TCP/IP, or HTML, which are decades old but in many ways provide a stable foundation for the global, earth-spanning information system known as the World Wide Web. Likewise, the SQL language has been in use for over four decades and will remain so for the foreseeable future. The theory of relational database systems was initiated in the 1970s by Codd (relational model and normal forms), Chen (entity and relationship model), and Chamberlin and Boyce (SEQUEL). Despite their age, these technologies still have a major impact on the practice of data management today. In particular, with the Big Data revolution and the widespread use of data science methods for decision support, relational databases and the use of SQL for data analysis are actually becoming more important. Even though sophisticated statistics and machine learning are enhancing the possibilities for knowledge extraction from data, many if not most data analyses for decision support rely on descriptive statistics using SQL for grouped aggregation. In that sense, although SQL database technology is quite mature, it is more relevant today than ever.