Building Big Data Applications
Copyright
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2020 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-12-815746-6
For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals
Publisher: Mara Conner
Acquisition Editor: Mara Conner
Editorial Project Manager: Joanna Collett
Production Project Manager: Punithavathy Govindaradjane
Cover Designer: Mark Rogers
Typeset by TNQ Technologies
Dedication
Dedicated to all my teachers
Preface
In the world we live in today, it is very easy to manifest and analyze data at any given instant. Insightful analytics is worth every executive's time when making decisions that impact the organization today and tomorrow. This analytics is what we have called Big Data analytics since 2010, and our teams have been struggling to understand how to integrate data with the right metadata and master data in order to produce a meaningful platform that can deliver these insightful analytics.
Not only is the commercial space interested in this; scientific research and engineering teams also very much want to study the data and build applications on top of it. The efforts taken to produce Big Data applications have been sporadic when measured in terms of success. Why is that? It is a question being asked by folks across the industry. In my experience of working in this specific space, what I have realized is that we are still working with data that is vast in terms of volume and is produced very fast, on demand, by any consumer, leading to metadata integration issues. This metadata integration issue can be handled if we make it an enterprise solution, so that vendors in the space need not necessarily worry about their integration with a Big Data platform. This integration is handled through tools that have been built for data integration and transformation.
Another interesting perspective is that while the data is voluminous and produced very fast, it can be integrated and harvested like any enterprise data segment. To be successful in building a foundation platform, we require the new data architecture to be flexible and scalable enough to accommodate new additions, updates, and integrations. This data architecture will differ from the third normal form and star schema forms from which we built the data warehouse. The new architecture will require more integration and just-in-time additions, which are better represented by NoSQL database architectures and Hadoop architectures. How do we arrive at this go-to success factor? And how do we make the enterprise realize that new approaches are needed to ensure success and reach the tipping point of a successful implementation?
Our executives are known for always asking questions about the lineage of data and its traceability. These questions can be handled today in the data architecture and engineering, provided we as an enterprise take a few minutes to step back and analyze why our past journeys were not successful enough and how we can be impactful in the future journey of delivering the Big Data application. The hidden secret here rests in the form of governance within the enterprise. Governance is not about measuring people; it is about ensuring that all processes have been followed and completed as required and that all specifics are in place for delivering on-demand lineage and traceability.
In writing this book, I have discussed specific points about the architecture and governance required to ensure success in Big Data applications. The goal of the book is to share the secrets that different segments of people have leveraged in their Big Data application projects and the risks that they had to overcome to become successful.
The chapters in the book present different types of scenarios that we all encounter, and in the process they demonstrate the goals of reproducibility and repeatability for ensuring experimental success. If you have ever wondered what the foundational difference in building a Big Data application is, it is that the datasets can be harvested and an experimental stage can be repeated if all of the steps are documented and implemented as specified in the requirements. Any team that wants to become successful in the new world needs to remember that we have to follow and implement governance in order to become measurable. Measuring process completion is mandatory for success; as you read the book, revisit this point and draw the highlights from it.
In developing this book, I have had several discussions with teams from both commercial enterprises and research organizations, and I thank all contributors for their time, their insights, and for sharing their endeavors. It did take time to ensure that all the relevant people across these teams were sought out and that the tipping points of failure were discussed, in order to understand the risks that could be identified and avoided in the journey. Several reference points have been added to the chapters. While the book is not all-encompassing by any means, it does offer any team that wants to understand how to build a Big Data application choices for how success can be accomplished, as well as case studies that vendors have shared showcasing how companies have implemented technologies to build the final solution.