Samiya Khan - Big Data and Analytics

City: Chennai, year: 2021, publisher: Notion Press, genre: Computer.

  • Book:
    Big Data and Analytics
  • Author:
    Samiya Khan
  • Publisher:
    Notion Press
  • Genre:
    Computer
  • Year:
    2021
  • City:
    Chennai
  • Rating:
    5 / 5

Big Data and Analytics: summary, description and annotation

Big data is a state-of-the-art technology that revolutionizes system design and decision-making, and Hadoop is a distributed framework that allows the effective management of big data. This book combines theoretical and practical facets of big data technology. The first few chapters provide a theoretical introduction to big data and Hadoop, with individual chapters covering different components of the Hadoop ecosystem. The rest of the book provides lab tutorials, giving basic working knowledge of the different components and how they can be used together to develop a big data application.

Key features of the book:
  • It provides a background of the big data problem and introduces Hadoop in light of how it solves it.
  • It covers all the processes of the big data lifecycle and the different components of Hadoop that serve these processes.
  • It offers dedicated lab tutorials for installation and demonstration of the different components of the Hadoop ecosystem.

Notion Press Media Pvt Ltd
No. 50, Chettiyar Agaram Main Road,
Vanagaram, Chennai, Tamil Nadu 600 095

First Published by Notion Press 2022
Copyright © Samiya Khan 2022
All Rights Reserved.

eISBN 979-8-88530-488-7

This book has been published with all efforts taken to make the material error-free after the consent of the author. However, the author and the publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause. While every effort has been made to avoid any mistake or omission, this publication is being sold on the condition and understanding that neither the author nor the publishers or printers would be liable in any manner to any person by reason of any mistake or omission in this publication or for any action taken or omitted to be taken or advice rendered or accepted on the basis of this work. For any defect in printing or binding the publishers will be liable only to replace the defective copy by another copy of this work then available.

PREFACE

"Big data is at the foundation of all of the megatrends that are happening today, from social to mobile to the cloud to gaming."
Chris Lynch

Data deluge has been one of the biggest concerns for the scientific community in recent times. As a result of mass digitization across domains and borders, the data pool became huge. Thus, the traditional systems for storage and processing were overwhelmed by this humongous data reservoir. Organizations were forced to decide whether to discard this data or archive it. A decision in favor of maintaining a data reserve can be attributed to the fact that this data is capable of steering many organization-level decisions and revolutionizing how problem-solving and decision-making are performed.

In a popular article, Chris Anderson, Editor of Wired magazine, argued that the scientific method itself has become obsolete in the big data era. The scientific method rests on proposing a hypothetical solution to a problem and then collecting data to test that solution. Today, however, we have vast amounts of data, and instead of looking for solutions to problems, we look for problems that can be solved using the data already available to us. Big data has changed the equation remarkably. Since data is the heart and soul of this technology, its applicability is extensive, and big data is being used in fields as varied as geostatistics and bioinformatics.

Personalized medicine and smart cities are among the most widely accepted real-life applications of big data technologies. Throughout the journey of big data, from the inception of the concept to its present-day position in the market, Hadoop has been a constant on the list of big data technologies. Hadoop has also evolved with changing demands and market understanding, with solutions such as Spark expanding its capabilities into a complete big data framework for application development. This book is a beginner's guide to big data and Hadoop. It is organized to first describe the big data problem and the need for Hadoop. Thereafter, the components of the Hadoop ecosystem are covered individually.

Chapters 1 to 9 provide theory lessons, while Chapters 10 to 20 are lab tutorials that can be aligned with the theory chapters to better understand the subject. Together, the theory and practical coverage should help the reader connect the dots between the different processes of the big data lifecycle, facilitating the development of comprehensive solutions for complex big data problems.

Chapter 1 introduces the big data problem and gives an eagle's-eye view of Hadoop, a distributed programming framework. It also compares Hadoop with RDBMS.

Chapter 2 delves deeper into the Hadoop ecosystem and the HDFS architecture. It points out the differences between the two Hadoop versions it considers, Hadoop 1.x and Hadoop 2.x, and describes the key components of the Hadoop ecosystem. Storage and processing are the two main functions of Hadoop. To show how Hadoop performs them, the cluster architecture and YARN are explained in detail. Basic Hadoop operations such as data placement, reading, and writing are explained with the help of sequence diagrams. Finally, the chapter mentions the cluster modes in which Hadoop operates and the configuration files that can be changed to modify the operational characteristics of the system.

Hadoop implements the MapReduce programming paradigm. Chapter 3 first describes the basics of this paradigm and compares it with conventional programming. Then, the workflows for application execution and job submission are explained diagrammatically. Finally, a working example of MapReduce code is provided to make the concept concrete. The chapter aims to help the reader understand how the MapReduce paradigm can be implemented in Hadoop using the Java programming language.

Chapter 4 builds on the foundations of MapReduce programming laid in the previous chapter. Advanced programming concepts such as Input Splits, Partitioner, Combiner, Counters, and Input Formats are discussed in this chapter.
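The map, shuffle, and reduce steps that these chapters describe can be sketched in a few lines. What follows is a minimal illustration in plain Java, not the book's own example and not code written against the Hadoop API: the class name `WordCountSketch`, its method names, and the sample input are all assumptions made for illustration. The `partition` method uses a hash-modulo scheme similar to the one Hadoop's default hash partitioner is understood to apply, and the combiner step is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce flow; it does not use the Hadoop API.
// All names here are hypothetical and chosen only for this sketch.
final class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Partitioner: pick the reducer for a key by hashing it (hash-modulo scheme).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce phase: sum all values that were grouped under one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, shuffle pairs to reducers by key, then reduce.
    static Map<String, Integer> wordCount(List<String> lines, int numReducers) {
        // One sorted key -> values table per reducer stands in for the shuffle.
        List<Map<String, List<Integer>>> shuffled = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            shuffled.add(new TreeMap<>());
        }
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                int r = partition(pair.getKey(), numReducers);
                shuffled.get(r)
                        .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                        .add(pair.getValue());
            }
        }
        // Each reducer processes only the keys routed to it; results are merged.
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, List<Integer>> reducerInput : shuffled) {
            reducerInput.forEach((key, values) -> result.put(key, reduce(key, values)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data", "Big Data and Hadoop");
        System.out.println(wordCount(lines, 2)); // prints {and=1, big=2, data=2, hadoop=1}
    }
}
```

In a real Hadoop job the framework, not a driver loop, performs the shuffle and runs mappers and reducers on different nodes. The sketch only shows how the pieces fit together.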

In addition, Map-side and Reduce-side Joins are explained, along with a discussion of which type of join suits which programming scenario. Lastly, this chapter demonstrates the need for and use of the MRUnit testing framework, a unit-testing tool commonly used for MapReduce code.

Companies like Yahoo and Facebook use Hadoop at the backend to analyze their data. However, their data analysts are often not well versed in NoSQL and usually have little to no expert knowledge of advanced programming languages. As a result, abstraction tools like Pig and Hive have been developed to reduce the programming effort required for common data analysis tasks. Chapter 5 is dedicated to Pig, providing a complete theoretical background of the tool. Topics such as key characteristics, performance issues, limitations, and applications are covered, and the basics of Pig scripting are explained.

Although Pig and Hive serve the same purpose, they are dissimilar solutions developed by different companies. The differences between them are elaborated upon in Chapter 6. Hive is also compared with traditional RDBMS to give the reader a holistic view. Finally, Hive's architecture, components, limitations, and scripting are explained to give the reader enough knowledge of the tool to get started with its practical aspects.

Chapter 7 is a detailed description of NoSQL databases and their classification. HBase is the NoSQL solution available as part of the Hadoop ecosystem. However, users can integrate other NoSQL solutions with Hadoop to achieve better performance. For this reason, NoSQL is discussed as a complete topic. The chapter focuses on HBase, covering its basic concepts, uses, components, and storage architecture. ZooKeeper is also covered in this chapter because of its inseparable association with HBase processes. Lastly, basic working knowledge of HBase is provided to help the reader get started with its practical facets.

Oozie is the topmost component of the Hadoop ecosystem. It allows developers to specify workflows, that is, sets of Hadoop tasks that need to be performed sequentially to accomplish a specific job. If a task requires repeated execution, specifying the workflow and instructing Oozie to schedule its execution as per development requirements can be highly effective. Chapter 8 explains Oozie and its functional components, highlighting how jobs can be scheduled using this component of Hadoop.

Hadoop is an effective framework for distributed data processing. However, it lacks the statistical and analytical capabilities required to address complex big data problems.
