Preface
Pethuru Raj, Ph.D.; Ganesh C. Deka
In the IoT-driven Bigdata world, the traditional SQL database is poorly suited to accommodating multistructured data. As the data size becomes massive, the common interactive processing of data becomes difficult. Hence the concept of NoSQL databases is drawing huge support from both practitioners and academicians these days. NoSQL databases are capable of tackling massive volumes of polystructured data. These specially crafted databases are intrinsically designed to meet the huge data storage, processing, and analysis requirements of Bigdata. This edited book, titled A Deep Dive Into NoSQL Databases: The Use Cases and Applications, is a collection of highly curated chapters on diverse topics in NoSQL databases, contributed by accomplished academicians, researchers, and database professionals from IT companies.
The introductory chapter of the book, titled A Detailed Analysis of NoSQL and NewSQL Databases for Bigdata Analytics and Distributed Computing, surveys the various NoSQL and NewSQL databases and shows how they come in handy in augmenting, accelerating, and automating the highly complicated phenomenon of next-generation data analytics.
The chapter titled NewSQL Databases and Scalable In-Memory Analytics deliberates upon the prospects of in-memory NewSQL databases in Bigdata analytics.
Indexing a huge number of webpages requires a cluster with several petabytes of disk space. Since NoSQL databases are highly scalable, their use for storing web crawler data is increasing along with their surging popularity. The chapter titled NoSQL Web Crawler Application discusses the prospects and application of NoSQL databases in web crawler applications to store and analyze the data collected by the web crawler.
The chapter titled NoSQL Security discusses the various security threats to NoSQL databases, the security architecture of NoSQL databases, and the steps that can be taken to secure a NoSQL database.
Amazon, Facebook, Google, and other reputed IT corporate houses use the World Wide Web as a large distributed data repository. This large data repository on the web cannot be processed with traditional RDBMSs. The chapter titled Comparative Study of Different In-Memory (No/New) SQL Databases discusses the application of in-memory databases with advanced data processing techniques for distributed data repositories.
The chapter titled NoSQL Hands On explains how to download, install, and use some of the most widely used open-source NoSQL databases.
Trillions of digitized objects and connected devices and millions of polyglot-persistent software services are interacting with one another locally as well as over a variety of networks. Apache Hadoop Ecosystem technologies and tools offer a promising way forward to squeeze out the relevant knowledge from these interconnected devices and software services. The chapter titled The Hadoop Ecosystem Technologies and Tools details the emerging technologies and platforms under the Apache Hadoop Ecosystem.
The chapter titled Biological Big Data Analytics presents various Bigdata analytics tools for bioinformatics systems and some use cases of healthcare information systems (HIS).
The chapter titled NoSQL Polyglot Persistence is about the application of polyglot persistence in e-commerce and healthcare. This chapter also discusses the prospects of NoSQL databases in polyglot-persistent software development and the research trends in NoSQL polyglot persistence.
The salient features of the book also include many references for advanced reading for researchers, use cases for practitioners, and hands-on exercises for students and practitioners. In a nutshell, this book is stuffed with a number of elegantly written chapters describing the ways and means of surmounting the above-mentioned challenges. This book will be useful for researchers, postgraduate (PG) as well as undergraduate (UG) students of Computer Science, and practitioners.
Chapter One
A Detailed Analysis of NoSQL and NewSQL Databases for Bigdata Analytics and Distributed Computing
Pethuru Raj, Site Reliability Engineering (SRE) Division, Reliance Jio Infocomm. Ltd. (RJIL), Bangalore, India
Abstract
In the recent past, different database management systems have been emerging and evolving fast in order to systematically and spontaneously tackle the growing varieties and vagaries of data structures, schemas, sizes, speeds, and scopes. That is, all kinds of data have to be carefully and consciously collected, cleansed, and crunched in order to squeeze actionable and timely insights out of exponentially exploding data heaps. Data is turning out to be a strategic asset for any growing and glowing organization across the world in order to devise perfect and precise strategies and roll out correct activities in time with all the clarity, continuity, and confidence. The noteworthy point here is that you cannot throw out any data (internal as well as external), as data can potentially emit viable and valuable information and insights when subjected to decisive and deeper investigations. Not only for data transformation, ingestion, mining, processing, and analytics but also for effective data engineering, management, governance, representation, exchange, persistence, and science, we need efficient technologies, tools, and tips. As we are heading toward the long-dreamt knowledge era, the role and responsibility of new-generation database systems are bound to escalate in the days ahead. This chapter is primarily prepared to survey the various NoSQL and NewSQL databases and to show how they come in handy in augmenting, accelerating, and automating the highly complicated phenomenon of next-generation data analytics.
Keywords
Big data analytics; Hadoop; Spark; In-memory computing; NoSQL databases; NewSQL databases; HBase
1 The Emergence of the Digital Era
The aspect of automation is productively pervasive and penetrative these days. With IT being recognized as the most prominent and dominant enabler of every business vertical, it is being made to consciously embark on the highly fruitful automation spree. For example, the cloud space is being continuously stuffed and sandwiched with a bevy of automation technologies and tools to make elastic, programmable, and workload-aware IT infrastructures. There are competent techniques and tools for realizing virtualized IT environments quickly and easily. Similarly, there are advanced software solutions for beneficially integrating different and distributed cloud environments. Cloud orchestration tools are very popular for setting up cloud environments in an automated and augmented manner. Cloud broker solutions are fast emerging and evolving in order to pinpoint perfect cloud solutions and services in the increasingly software-defined multicloud era. In addition, there are other abstractions and articulations for autonomic, self-servicing, and federated clouds.