Summary of changes
This section describes the technical changes made in this edition of the paper and in previous editions. This edition might also include minor corrections and editorial changes that are not identified.
Summary of Changes for Cloudera Data Platform Private Cloud Base with IBM Spectrum Scale, as created or updated on August 27, 2021.
August 2021, Minor updates
This revision includes the following new and changed information.
New and Changed information highlights
Updated CES protocol support in the Hadoop environment. See .
Updated links from IBM Knowledge Center to IBM Documentation.
April 2021, Minor updates
This revision includes the following new and changed information.
New and Changed information highlights
Updated support for Data encryption at rest and in transit:
.
.
March 2021, Minor updates
This revision includes the following new and changed information.
New and Changed information highlights
Added support for non-HA NameNode and collocation of Hadoop services on the DataNode. Refer to the following sections:
Updated the following figures:
Minor updates denoted with change bars
Cloudera Data Platform Private Cloud Base with IBM Spectrum Scale
This IBM Redpaper publication provides guidance on building an enterprise-grade data lake by using IBM Spectrum Scale and Cloudera Data Platform (CDP) Private Cloud Base for performing in-place Cloudera Hadoop or Cloudera Spark-based analytics. It also covers the benefits of the integrated solution and gives guidance about the types of deployment models and considerations during the implementation of these models.
Note: In January 2019, the Cloudera and Hortonworks merger was completed. In June 2019, IBM and Cloudera expanded their partnership to include the entire Cloudera portfolio. CDP Private Cloud Base combines the best of Cloudera Distribution Hadoop (CDH) and Hortonworks Data Platform (HDP) functions and services.
Cloudera Data Platform Private Cloud Base
CDP Private Cloud Base is the on-premises version of CDP. This new product combines the best of Cloudera Enterprise Data Hub and Hortonworks Data Platform Enterprise along with new features and enhancements across the stack. This unified distribution is a scalable and customizable platform where you can securely run many types of workloads.
CDP Private Cloud Base supports various hybrid solutions where compute tasks are separated from data storage and where data can be accessed from remote clusters, including workloads that are created by using CDP Private Cloud Experiences. This hybrid approach provides a foundation for containerized applications by managing storage, table schema, authentication, authorization, and governance.
CDP Private Cloud Base consists of various components, such as Apache Spark, Apache Hive 3, and Apache HBase, along with many other components for specialized workloads. You can select any combination of these services to create clusters that address your business requirements and workloads. Several pre-configured packages of services are also available for common workloads.
Because CDP Private Cloud Base supports a design that separates compute from storage, integrating it with IBM Spectrum Scale provides an end-to-end solution that supports high-demand workloads across different protocols. The integration also makes it possible to scale compute and storage independently while running analytics and AI in the same namespace.
IBM Spectrum Scale and Elastic Storage System
IBM Spectrum Scale is industry-leading software for file and object storage. It can be deployed as a software-defined storage management solution that effectively meets the demands of AI, big data, analytics, and high-performance computing workloads. It offers market-leading performance and scalability, and a wealth of sophisticated data management capabilities.
IBM Elastic Storage System (ESS) is a fully integrated and tested Spectrum Scale storage building block that provides superb enterprise performance, reliability, availability, and serviceability. ESS is an optimum way to deploy Spectrum Scale storage for most Spectrum Scale use cases.
Integrated solution overview
CDP Private Cloud extends cloud-native speed, simplicity, and economics for the connected data lifecycle to the data center. It enables IT to respond to business needs faster and deliver rock-solid service levels so that users can be more productive with data.
CDP Private Cloud Base brings business value to enterprises by analyzing their disparate data sources and deriving actionable insights from them. This analytics journey typically starts with the consolidation of different data silos to form an Active Archive. The Active Archive is then used to get a single view of the customer and to perform further predictive analytics on the consolidated data.
With IBM Spectrum Scale, clients can build highly scalable and globally distributed data lakes to form their Active Archives. IBM Spectrum Scale becomes the storage layer for your CDP Private Cloud Base environment as an alternative to the native Hadoop Distributed File System (HDFS). It supports data access through HDFS Remote Procedure Calls (RPC), so the change of storage layer is not apparent to the applications that use CDP Private Cloud Base. With IBM Spectrum Scale, you get more flexible deployment models for your storage system that help you optimize infrastructure costs.
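Because data access goes through the HDFS RPC interface, standard Hadoop client tooling is expected to work against the IBM Spectrum Scale storage layer without modification. The following minimal Python sketch illustrates the idea. It assumes that the `hdfs` command from a configured Hadoop client is on the PATH. The helper names and the path `/datalake/active_archive` are hypothetical and are not taken from this paper.

```python
import shlex
import subprocess
from typing import List


def build_hdfs_ls_command(path: str, recursive: bool = False) -> List[str]:
    """Build the argv for listing a path with the standard Hadoop shell.

    The command is the same whether the storage layer behind the HDFS RPC
    interface is native HDFS or another implementation.
    """
    cmd = ["hdfs", "dfs", "-ls"]
    if recursive:
        cmd.append("-R")  # recursive listing
    cmd.append(path)
    return cmd


def list_path(path: str, recursive: bool = False) -> str:
    """Run the listing; requires a configured Hadoop client on this host."""
    result = subprocess.run(
        build_hdfs_ls_command(path, recursive),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical path, used only for illustration.
    print(shlex.join(build_hdfs_ls_command("/datalake/active_archive")))
```

The sketch separates building the command from running it, so the command line can be inspected on a host that has no Hadoop client installed.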
IBM Spectrum Scale and CDP Private Cloud Base were first certified with IBM Spectrum Scale V5.1 and CDP 7. Since the first certification, Cloudera and IBM signed an agreement to certify both products on an ongoing basis for their new releases and to keep the certification current. (For more information about certified software levels, see .) This certification is for IBM Spectrum Scale software and applies to all deployment models of IBM Spectrum Scale, including IBM Elastic Storage System.
Benefits of integration
The following top benefits are realized by using IBM Spectrum Scale with CDP Private Cloud Base:
Extreme scalability with parallel file system architecture
IBM Spectrum Scale has a parallel architecture, in which no single metadata node can become a bottleneck. Every node in the cluster can serve both data and metadata, which enables a single IBM Spectrum Scale file system to store billions of files. This architecture enables clients to grow their CDP Private Cloud Base environments seamlessly as the data grows. Also, one of the key value propositions of IBM Spectrum Scale, especially with IBM Elastic Storage System (ESS), is the ability to run diverse and demanding workloads and to tier data down to an Active Archive.
A global namespace that can span multiple Hadoop clusters and geographical areas
By using the IBM Spectrum Scale global namespace, clients can create active, remote data copies and enable real-time, global collaboration. This namespace enables global organizations to form data lakes that span the globe and to host their distributed data under one namespace.
IBM Spectrum Scale also enables multiple Hadoop clusters to access a single file system while still providing all the required data isolation semantics.
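The scalability benefit described earlier in this list rests on the claim that no single metadata node becomes a bottleneck. The following toy model in Python illustrates that claim only in general terms. It is not how IBM Spectrum Scale places or serves metadata. It hashes hypothetical file paths across a configurable number of metadata-serving nodes and reports the load on the busiest node. The function name, the path pattern, and the hashing scheme are all assumptions made for illustration.

```python
import hashlib
from collections import Counter
from typing import Iterable


def busiest_node_load(paths: Iterable[str], node_count: int) -> int:
    """Return the number of metadata entries on the most heavily loaded node.

    Toy model: each path is assigned to a node by hashing the path. With one
    node, that node holds every entry; with more nodes, the load spreads out.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    load: Counter = Counter()
    for path in paths:
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        node = int.from_bytes(digest[:8], "big") % node_count
        load[node] += 1
    return max(load.values(), default=0)


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    sample_paths = [f"/datalake/file_{i}" for i in range(10_000)]
    for nodes in (1, 4, 16):
        print(f"{nodes:>2} node(s): busiest node holds "
              f"{busiest_node_load(sample_paths, nodes)} entries")
```

In this model, a single node holds every entry, while the busiest node's share shrinks as nodes are added. That mirrors the qualitative argument in the text and makes no claim about actual IBM Spectrum Scale performance.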