Felipe Gutierrez
Spring Cloud Data Flow
Native Cloud Orchestration Services for Microservice Applications on Modern Runtimes
1st ed.
Felipe Gutierrez
Albuquerque, NM, USA
Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book's product page, located at www.apress.com/9781484212400. For more detailed information, please visit http://www.apress.com/source-code.
ISBN 978-1-4842-1240-0 e-ISBN 978-1-4842-1239-4
https://doi.org/10.1007/978-1-4842-1239-4
© Felipe Gutierrez 2021
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Distributed to the book trade worldwide by Apress Media, LLC, 1 New York Plaza, New York, NY 10004, U.S.A. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.
In memory of my grandparents: Juan Gutierrez and Baciliza Rivero and Simon Cruz and Elvira Galindo. I love you and miss you guys!
Acknowledgments
I would like to express all my gratitude to the Apress team: Steve Anglin for accepting my proposal, Mark Powers for keeping me on track and for his patience, and the rest of the Apress team involved in this project. Thanks to everybody for making this possible.
Thanks to my technical reviewer, Manuel Jordan, for all the details and effort in his reviews, and the entire Spring team for creating this amazing technology.
I want to dedicate this book to my grandparents: Juan Gutierrez and Baciliza Rivero on my dad's side, and Simon Cruz and Elvira Galindo on my mom's side. Thanks for being part of my life. I miss you so much.
Felipe Gutierrez
Table of Contents
Part I: Introductions
Part II: Spring Cloud Data Flow: Internals
About the Author
Felipe Gutierrez
is a cloud solutions software architect, with a bachelor's degree and a master's degree in computer science from Instituto Tecnológico y de Estudios Superiores de Monterrey, Ciudad de Mexico. With more than 25 years of IT experience, he has developed programs for companies in multiple vertical industries, including government, retail, health care, education, and banking. Currently, he works as a senior cloud application architect for IBM, specializing in Red Hat OpenShift, IBM Cloud, app modernization, Cloud Foundry, Spring Framework, Spring Cloud Native applications, Groovy, and RabbitMQ, among other technologies. He has worked as a solutions architect for companies like VMware, Nokia, Apple, Redbox, and Qualcomm. Felipe is the author of Introducing Spring Framework (Apress, 2014), Pro Spring Boot (Apress, 2016), and Spring Boot Messaging (Apress, 2017).
About the Technical Reviewer
Manuel Jordan Elera
is an autodidactic developer and researcher who enjoys learning new technologies for his own experiments and creating new integrations. Manuel won the Springy Award – Community Champion and Spring Champion 2013. In his little free time, he reads the Bible and composes music on his guitar. Manuel is known as dr_pompeii. He has tech-reviewed numerous books for Apress, including Pro Spring, 4th Edition (2014), Practical Spring LDAP (2013), Pro JPA 2, Second Edition (2013), and Pro Spring Security (2013). You can read his 13 detailed tutorials about many Spring technologies and contact him through his blog at www.manueljordanelera.blogspot.com, and follow him on his Twitter account, @dr_pompeii.
Part I Introductions
F. Gutierrez Spring Cloud Data Flow https://doi.org/10.1007/978-1-4842-1239-4_1
1. Cloud and Big Data
The digital universe consists of an estimated 44 zettabytes of data. A zettabyte is 1 million petabytes, or 1 billion terabytes, or 1 trillion gigabytes. In 2019, every 60 seconds, Google processed approximately 3.7 million queries, YouTube recorded 4.5 million viewed videos, and Facebook registered 1 million logins. Imagine the computing power needed to handle all these requests and to ingest and manipulate all that data. Common sense tells us that the big IT companies use a lot of hardware to store data, and they must keep adding storage to avoid hitting capacity limits.
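The unit ladder and the per-minute figures above are easy to check with a few lines of arithmetic. The following Java sketch has three assumptions. First, it uses decimal (SI) units, where each prefix step is a factor of 1,000. Binary units such as the zebibyte would give different numbers. Second, it treats the quoted counts as per-minute totals. Third, the class and method names (`DataScale`, `zettabytesIn`, `perSecond`) are illustrative only.

```java
import java.math.BigInteger;

/**
 * Worked arithmetic for the data-scale figures, assuming decimal (SI) units
 * where each prefix step (giga, tera, peta, exa, zetta) is a factor of 1,000.
 */
public class DataScale {

    static final BigInteger THOUSAND = BigInteger.valueOf(1_000);
    static final BigInteger GIGABYTE = THOUSAND.pow(3);  // 10^9 bytes
    static final BigInteger TERABYTE = THOUSAND.pow(4);  // 10^12 bytes
    static final BigInteger PETABYTE = THOUSAND.pow(5);  // 10^15 bytes
    static final BigInteger ZETTABYTE = THOUSAND.pow(7); // 10^21 bytes

    /** How many units of the given size fit in the given number of zettabytes. */
    static BigInteger zettabytesIn(long zettabytes, BigInteger unit) {
        return ZETTABYTE.multiply(BigInteger.valueOf(zettabytes)).divide(unit);
    }

    /** Converts a per-minute count into an average per-second rate. */
    static double perSecond(long perMinute) {
        return perMinute / 60.0;
    }

    public static void main(String[] args) {
        System.out.println("1 ZB in PB: " + zettabytesIn(1, PETABYTE));   // 1000000
        System.out.println("1 ZB in TB: " + zettabytesIn(1, TERABYTE));   // 1000000000
        System.out.println("1 ZB in GB: " + zettabytesIn(1, GIGABYTE));   // 1000000000000
        System.out.println("44 ZB in GB: " + zettabytesIn(44, GIGABYTE)); // 44000000000000
        System.out.printf("Google queries/s: %.0f%n", perSecond(3_700_000)); // 61667
        System.out.printf("YouTube views/s: %.0f%n", perSecond(4_500_000));  // 75000
        System.out.printf("Facebook logins/s: %.0f%n", perSecond(1_000_000)); // 16667
    }
}
```

Using `BigInteger` avoids any overflow concerns: 10^21 bytes does not fit in a Java `long`, whose maximum is roughly 9.2 × 10^18.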
How does an IT company deal with challenges like data overload, rising costs, or skill gaps? In recent years, big IT companies have invested heavily in strategies that use enterprise data warehouses (EDWs) as central data systems. These systems support reporting and run extract, transform, and load (ETL) processes that pull data from different sources. Today, both users and devices (thermostats, light bulbs, security cameras, coffee machines, doorbells, seat sensors, etc.) generate data that must be ingested.
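To make the three ETL stages concrete, here is a toy pipeline in Java. It is a simplified sketch, not a description of how a real EDW is built, and several parts of it are hypothetical:

- The two "sources" are in-memory lists of `device,reading` lines.
- The transform step normalizes device names and drops malformed rows.
- The "warehouse" is a sorted map that sums readings per device.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * A toy extract-transform-load (ETL) pipeline. Sources and warehouse are
 * hypothetical in-memory stand-ins for real systems.
 */
public class MiniEtl {

    /** Extract: gather the raw lines from every source into one list. */
    static List<String> extract(List<List<String>> sources) {
        List<String> raw = new ArrayList<>();
        sources.forEach(raw::addAll);
        return raw;
    }

    /** Transform: parse "device,reading" lines, lower-case names, skip bad rows. */
    static List<Map.Entry<String, Double>> transform(List<String> raw) {
        List<Map.Entry<String, Double>> rows = new ArrayList<>();
        for (String line : raw) {
            String[] parts = line.split(",");
            if (parts.length != 2) {
                continue; // drop rows that do not have exactly two fields
            }
            try {
                String device = parts[0].trim().toLowerCase();
                double reading = Double.parseDouble(parts[1].trim());
                rows.add(Map.entry(device, reading));
            } catch (NumberFormatException e) {
                // drop rows whose reading is not a number
            }
        }
        return rows;
    }

    /** Load: aggregate the cleaned rows into the warehouse, summing per device. */
    static Map<String, Double> load(List<Map.Entry<String, Double>> rows,
                                    Map<String, Double> warehouse) {
        for (Map.Entry<String, Double> row : rows) {
            warehouse.merge(row.getKey(), row.getValue(), Double::sum);
        }
        return warehouse;
    }

    public static void main(String[] args) {
        List<String> thermostats = List.of("Thermostat,21.5", "thermostat, 22.5");
        List<String> cameras = List.of("Camera,3", "bad-row");
        Map<String, Double> warehouse =
                load(transform(extract(List.of(thermostats, cameras))), new TreeMap<>());
        System.out.println(warehouse); // {camera=3.0, thermostat=44.0}
    }
}
```

Keeping each stage as a separate function mirrors the ETL vocabulary: any one stage can be swapped out independently. For example, a real database could replace the map in `load`.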
Companies such as Dell, Intel, and Cloudera, to name a few, work together to create hardware and storage solutions that help other companies grow and become faster and more scalable.
A Little Data Science
When we talk about data science, a team of scientists with PhD degrees comes to mind. They probably earn big bucks, and they don't rest because companies depend on them. What is a data scientist's actual educational experience?
A few years ago, computing journals reported that demand for Spark and Scala skyrocketed at companies that wanted to apply data science, together with tools such as Hadoop, Kafka, Hive, Pig, Cassandra, D3, and Tableau.
Python has become one of the main programming languages for machine learning techniques, alongside R, Scala, and Java.