Foundations of Data Intensive Applications
Large Scale Data Analytics under the Hood
Supun Kamburugamuve
Saliya Ekanayake
Copyright 2021 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
ISBN: 978-1-119-71302-9
ISBN: 978-1-119-71303-6 (ebk)
ISBN: 978-1-119-71301-2 (ebk)
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Control Number: 2021942305
Trademarks: WILEY and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
Cover images: Makstorm/Getty Images
Cover design: Wiley
To my wife Chathuri, son Seth, daughter Nethuki, and our parents.
Supun Kamburugamuve
To my wife Kalani, and two sons, Neth and Senuth, and our parents.
Saliya Ekanayake
About the Authors
Supun Kamburugamuve has a PhD in computer science from Indiana University Bloomington. For his thesis, he researched improving the performance of data-intensive applications with Professor Geoffrey C. Fox. Supun created the Twister2 project and co-created the Cylon project, both of which are aimed at high-performance data-intensive applications. His research work is published in recognized conferences and journals. Supun is an elected member of the Apache Software Foundation and has contributed to many open source projects, including Apache Web Services projects and Apache Heron. Before joining Indiana University, Supun worked on middleware systems and was a key member of the WSO2 ESB project, which is a widely used open source enterprise integration solution. Supun has presented his ideas and findings at research and technical conferences, including Strata NY, Big Data Conference, and ApacheCon.
Saliya Ekanayake is a senior software engineer at Microsoft. He is part of the Cloud Accelerated Systems & Technologies (CAST) group that is developing high-performance machine learning systems. Before joining Microsoft, Saliya was a postdoctoral fellow at Berkeley Lab, specializing in improving the performance of large-scale machine learning systems. He holds a PhD in computer science from Indiana University Bloomington, where his research contributed to the development of SPIDAL, a scalable, parallel, and interoperable data analytics library that outperformed existing big data systems on several machine learning applications. After his PhD, Saliya also worked on designing large-scale graph analytics systems and algorithms at Virginia Tech. His work has been published in recognized conferences and journals, with more than 20 publications to his name. Saliya is also an Apache committer for the Apache Synapse project.
About the Editor
Thomas Wiggins is a freelance proofreader and editor. He holds a BA in fine arts and theatre/drama from Indiana University, as well as an MS in media arts and science from Indiana University/Purdue University Indianapolis. For the past nine years, Mr. Wiggins has done proofreading work on scientific papers submitted to conferences and journals around the world, as well as offering his services pro bono for amateur writers. In 2011, he helped in the creation of e-humanity.org, a federal grant-funded online repository for the Native Tribal collections of several museums, including the Smithsonian. He currently is an employee of Cook Inc.
Acknowledgments
This book presents the ideas and work of countless software engineers and researchers over many years. We thank them for their hard work that helped us to write this book. The open source software community has made data-intensive applications popular and easily accessible to the public. We would like to thank the Apache Software Foundation for producing some of the best open source communities that have built wonderful frameworks for data-intensive applications. Many other open source communities are building these amazing products; some notable ones are Pandas, NumPy, PyTorch, TensorFlow, OpenMPI, and Kubernetes.
Our thanks also go out to members of the Digital Science Center at Indiana University, whose work has influenced the content of this research. We both had the privilege to work with our thesis advisor, Distinguished Professor Geoffrey C. Fox at Indiana University Bloomington, who has been a key driving force behind high-performance data-intensive computing. The work we did with him was a great inspiration for the book.
We would like to thank Chathura Widanage, Niranda Perera, Pulasthi Wickramasinghe, Ahmet Uyar, Gurhan Gundez, Kannan Govindarajan, and Selahattin Akkas at the Digital Science Center of Indiana University. The work we did and the software we developed together were a great motivation for the book. We would like to thank Thejaka Kanewala for the wonderful conversations we had on data-intensive applications and Jaliya Ekanayake for the feedback on the book.