1. Next-Generation Big Data
Despite all the excitement around big data, the large majority of mission-critical data is still stored in relational database management systems. Recent industry studies support this observation, as does my own professional experience working on numerous big data and business intelligence projects. Despite widespread interest in unstructured and semi-structured data, structured data still represents a significant percentage of the data under management for most organizations, from the largest corporations and government agencies to small businesses and technology start-ups. Use cases that deal with unstructured and semi-structured data, while valuable and interesting, are few and far between. Unless you work for a company that does a lot of unstructured data processing, such as Google, Facebook, or Apple, you are most likely working with structured data.
Big data has matured since the introduction of Hadoop more than 10 years ago. Take away all the hype, and it is evident that structured data processing and analysis has become the next-generation killer use case for big data. Most big data, business intelligence, and advanced analytic use cases deal with structured data. In fact, some of the most popular advances in big data, such as Apache Impala, Apache Phoenix, and Apache Kudu, as well as Apache Spark's recent emphasis on Spark SQL and the DataFrames API, are all about providing capabilities for structured data processing and analysis. This is largely due to big data finally being accepted as part of the enterprise. As big data platforms have improved and gained new capabilities, they have become suitable alternatives to expensive data warehouse platforms and relational database management systems for storing, processing, and analyzing mission-critical structured data.
About This Book
This book is for business intelligence and data warehouse professionals who are interested in gaining practical and real-world insight into next-generation big data processing and analytics using Apache Kudu, Apache Impala, and Apache Spark. Experienced big data professionals who would like to learn more about Kudu and other advanced enterprise topics such as real-time data ingestion and complex event processing, Internet of Things (IoT), distributed in-memory computing, big data in the cloud, big data governance and management, real-time data visualization, data wrangling, data warehouse optimization, and big data warehousing will also benefit from this book.
I assume readers will have basic knowledge of the various components of Hadoop. Some knowledge of relational database management systems, business intelligence, and data warehousing is also helpful. Some programming experience is required if you want to run the sample code provided. I focus on three main Hadoop components: Apache Spark, Apache Impala, and Apache Kudu.
Apache Spark
Apache Spark is a next-generation data processing framework with advanced in-memory capabilities and a directed acyclic graph (DAG) engine. It can handle interactive, real-time, and batch workloads with built-in machine learning, graph processing, streaming, and SQL support. Spark was developed to address the limitations of MapReduce and can be 10 to 100 times faster than MapReduce for most data processing tasks. It has APIs for Scala, Java, Python, and R. Spark is one of the most popular Apache projects and is currently used by some of the largest and most innovative companies in the world. I discuss Apache Spark in Chapter 5.
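To give a quick taste of the DataFrame and Spark SQL capabilities mentioned above, here is a minimal Scala sketch. It assumes a local Spark installation; the file path, table name, and column names are purely illustrative, not from this book's examples.

```scala
import org.apache.spark.sql.SparkSession

object SparkSketch {
  def main(args: Array[String]): Unit = {
    // Start a local Spark session; on a cluster you would typically
    // use spark-shell or spark-submit instead of master("local[*]").
    val spark = SparkSession.builder()
      .appName("spark-sketch")
      .master("local[*]")
      .getOrCreate()

    // Read a CSV file into a DataFrame (path and schema are illustrative).
    val sales = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/tmp/sales.csv")

    // The same aggregation expressed two ways:
    // the DataFrame API and Spark SQL.
    sales.groupBy("region").sum("amount").show()

    sales.createOrReplaceTempView("sales")
    spark.sql("SELECT region, SUM(amount) FROM sales GROUP BY region").show()

    spark.stop()
  }
}
```

Both forms compile to the same execution plan; which one you use is largely a matter of taste and tooling.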
Apache Impala
Apache Impala is a massively parallel processing (MPP) SQL engine designed to run on Hadoop platforms. The project was started by Cloudera and eventually donated to the Apache Software Foundation. Impala rivals traditional data warehouse platforms in terms of performance and scalability and was designed for business intelligence and OLAP workloads. Impala is compatible with some of the most popular BI and data visualization tools, such as Tableau, Qlik, Zoomdata, Power BI, and MicroStrategy. I cover Apache Impala in Chapter 3.
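To illustrate the kind of BI-style workload Impala targets, here is a hypothetical OLAP-style query as you might run it from impala-shell. The table and column names are made up for illustration.

```sql
-- A typical BI aggregation: top revenue-generating regions.
-- Table and column names are illustrative.
SELECT   c.region,
         SUM(o.total_amount) AS revenue
FROM     orders o
JOIN     customers c ON o.customer_id = c.customer_id
WHERE    o.order_date >= '2017-01-01'
GROUP BY c.region
ORDER BY revenue DESC
LIMIT    10;
```

Because Impala executes queries like this in parallel across the cluster rather than translating them to MapReduce jobs, response times are interactive rather than batch-oriented.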
Apache Kudu
Apache Kudu is a new mutable columnar storage engine designed to handle fast inserts and updates as well as efficient table scans, enabling real-time data processing and analytic workloads. When used together with Impala, Kudu is ideal for big data warehousing, EDW modernization, Internet of Things (IoT), real-time visualization, complex event processing, and machine learning feature stores. As a storage engine, Kudu's performance and scalability rival those of other columnar storage formats such as Parquet and ORC. It also performs significantly faster than Apache Phoenix with HBase. I discuss Kudu in Chapter 2.
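As a sketch of what Kudu's mutability means in practice when paired with Impala, the following SQL creates a Kudu-backed table and then updates a single row in place. The table and column names are illustrative; exact syntax details can vary by Impala version.

```sql
-- Create a Kudu-backed table through Impala (names are illustrative).
CREATE TABLE sensor_readings (
  sensor_id   BIGINT,
  read_time   TIMESTAMP,
  temperature DOUBLE,
  PRIMARY KEY (sensor_id, read_time)
)
PARTITION BY HASH (sensor_id) PARTITIONS 4
STORED AS KUDU;

-- Unlike HDFS-backed formats such as Parquet, Kudu supports
-- single-row UPDATE and DELETE statements.
UPDATE sensor_readings
SET    temperature = 21.5
WHERE  sensor_id = 100 AND read_time = '2017-06-01 00:00:00';
```

This ability to update and delete individual rows, while still scanning columns efficiently, is what makes Kudu a good fit for the real-time analytic workloads described above.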
Navigating This Book
This book is structured in easy-to-digest chapters that focus on one or two key concepts at a time. Chapters can be read in any order depending on your interest. The chapters are filled with practical examples and step-by-step instructions. Along the way, you'll find plenty of practical information on best practices and advice that will steer you in the right direction in your big data journey.
Chapter 1, Next-Generation Big Data, provides a brief introduction to the contents of this book.
Chapter 2, Introduction to Kudu, provides an introduction to Apache Kudu, starting with a discussion of Kudu's architecture. I talk about various topics such as how to access Kudu from Impala and Spark, and from Python, C++, and Java using the client APIs. I provide details on how to administer, configure, and monitor Kudu, including backup and recovery and high availability options. I also discuss Kudu's strengths and limitations, including practical workarounds and advice.
Chapter 3, Introduction to Impala, provides an introduction to Apache Impala. I discuss Impala's technical architecture and capabilities with easy-to-follow examples. I cover details on how to perform system administration, monitoring, and performance tuning.
Chapter 4, High Performance Data Analysis with Impala and Kudu, covers Impala and Kudu integration, with practical examples and real-world advice on how to leverage both components to deliver a high-performance environment for data analysis. I discuss the strengths and limitations of Impala and Kudu, including practical workarounds and advice.
Chapter 5, Introduction to Spark, provides an introduction to Apache Spark. I cover Spark's architecture and capabilities, with practical explanations and easy-to-follow examples to help you get started with Spark development right away.
Chapter 6, High Performance Data Processing with Spark and Kudu, covers Spark and Kudu integration, with practical examples and real-world advice on how to use both components for large-scale data processing and analysis.
Chapter 7, Batch and Real-Time Data Ingestion and Processing, covers batch and real-time data ingestion and processing using native and third-party commercial tools such as Flume, Kafka, Spark Streaming, StreamSets, Talend, Pentaho, and Cask. I provide step-by-step examples on how to implement complex event processing and the Internet of Things (IoT).
Chapter 8, Big Data Warehousing, covers designing and implementing star and snowflake dimensional models with Impala and Kudu. I talk about how to utilize Impala and Kudu for data warehousing, including their strengths and limitations. I also discuss EDW modernization use cases such as data consolidation, data archiving, and analytics and ETL offloading.
Chapter 9, Big Data Visualization and Data Wrangling, discusses real-time data visualization and data wrangling tools designed for extremely large data sets, with easy-to-follow examples and advice.