Hands-On Machine Learning Recommender Systems with Apache Spark
Build a real Artificial Intelligence solution with real data
Change the world with Machine Learning
Nesto.TV and ConsultantsNetwork.com
Ernesto Lee, MS
http://www.Nesto.TV
http://www.LearningVoyage.com
Fort Lauderdale, Florida
Panama City, Panama
Copyright 2020 Consultants Network
All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the publisher, except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted by copyright law. For permission requests, write to the publisher, addressed Attention: Permissions Coordinator, at the address below.
Nesto.TV
1918 Harrison Street, Suite 215
Hollywood, FL 33020
www.Nesto.TV
Ordering Information:
Quantity sales. Special discounts are available on quantity purchases by corporations, associations, and others. For details, contact the publisher at the address above.
Orders by U.S. trade bookstores and wholesalers.
Printed in the United States of America
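Hands-On Machine Learning Recommender Systems with Apache Spark : Build a real Artificial Intelligence solution with real data / Ernesto Lee, Nesto.TV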
ISBN [ISBN NUMBER HERE]
1. Software. 2. Programming.
First Edition
http://www.learningvoyage.com
http://www.consultantsnetwork.com
http://www.nesto.tv
Authors
Ernesto Lee
Uzair Syed
Reviewers
Eric Johnson, Addison Jones, Larry Watkins
Project Team Leader
Ernesto Lee
Technical Editor
Ernesto Lee
Editorial Team Leader
Ernesto Lee
FOREWORD
Those with the ability to solve problems and think from a solution-oriented approach will always be able to thrive and grow in our industry. While the products that we work on over the years will inevitably become stale and overcome by newer technologies, the drive to be better today than we were yesterday will always keep us moving in the right direction.
I am sure that you will find this book to be more of a reference than a theoretical text. It is intended to be used as a guide that shows you exactly HOW to perform tasks while also providing context. I hope you find this book useful.
Ernesto Lee
WHO WE ARE
Nesto.TV and Learning Voyage
Ernesto Lee: Holds a Master's degree in Software Systems Engineering from Virginia Tech and a Bachelor's degree in Physics from Old Dominion University. He is presently completing his doctorate at Nova Southeastern University. Ernesto works with organizations to help them realize the full business benefits of artificial intelligence and big data in solving complex business problems.
Ernesto has been involved in several large-scale projects and has consulted for several large companies in domains such as Healthcare, Banking, Manufacturing, and Retail.
Ernesto and his team have extensive experience and expertise in implementing business solutions for customers that leverage the right technology. Aside from being a full-time student, he is the founder of LearningVoyage.com and Nesto.TV, where he focuses on EdTech solutions in Machine Learning, Blockchain, Microservices, and Information Security.
Acknowledgement
Shirley L. Jones, Tyrone V. Lee, and Devita Vanae Evans: gone but never forgotten.
Write for Us
Nesto.TV and Learning Voyage continue to look for authors with both technical expertise and the ability to explain. We are currently looking for authors in the Machine Learning, Blockchain, Microservices, and Cybersecurity space, but we are open to proposals in other interesting verticals.
We are interested in working with you if you are, first and foremost, an expert in your field and you can deliver quality, original work. We will work with you to make your project as successful as it can be, but that all starts with you contacting us at:
support@LearningVoyage.com
Feel free to write with queries but to maximize our interaction, please provide:
- Contact information
- A table of contents (of course)
- Resume
- Why you are qualified to write this book
- Why this book would sell in the market
- A writing sample
Table of Contents
CHAPTER 1: INTRODUCTION TO BIG DATA & AI
Theory
This chapter provides an introduction to recommender systems built with Apache Spark and Machine Learning. Before we begin with recommender systems, let's have a brief overview of Big Data. To better understand Spark, it helps to know a little of the history that preceded it, so we start with a quick introduction to Hadoop and MapReduce before we look at Spark.
An Overview of Big Data
Quick Introduction to Hadoop
Apache Hadoop is an open source distributed framework that allows the storage and processing of large data sets (Big Data) across a cluster of commodity machines. Hadoop overcomes the traditional limitations of storing and computing data by distributing the data over a cluster of commodity machines, which makes it scalable and cost-effective.
The idea for Hadoop originated when Google published a white paper about the Google File System (GFS), a distributed file system built by Google to provide efficient, reliable access to data using large clusters of commodity hardware. Doug Cutting and Mike Cafarella adopted the model for their search engine, Nutch, and then developed Hadoop to support distribution for the Nutch project.
It is often asked what the name Hadoop means. The name has no significance and it is not an acronym either. Hadoop is the name that Doug Cutting's son gave to his yellow stuffed elephant. The name is unique and easy to remember. Hadoop's sub-projects, such as Pig, tend to have animal-based names for the same reasons: they are unique, not used anywhere else, and easy to remember.
Why Hadoop?
Companies today are realizing that there is a lot of information in unstructured documents spread across the network. Much of this data is available as spreadsheets, text files, e-mails, logs, PDFs, and other formats, and it contains valuable information that can help discover new trends, design new products, improve existing products, and understand customers better. Data is growing at a staggering, unprecedented rate, and there are no signs of it slowing down. To deal with such data, we need a reliable and low-cost tool to process it meaningfully, and that is why we use Hadoop. Hadoop helps us process Big Data in a variety of formats reliably, quickly, flexibly, and cost-effectively.
Let us see why Hadoop is so popular and what it has in store for you.
Scalable: Hadoop is scalable, meaning you can start with a single-node server and add more nodes as you need more storage and computing power.
Fault-Tolerant: Hadoop helps prevent the loss of data. All the data stored in the Hadoop Distributed File System (HDFS) is broken into blocks, and each block is stored with a default replication factor of 3. If a node goes offline while data is being processed, the process continues because the data still exists on other nodes.
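To make the idea of blocks and replication concrete, here is a minimal Python sketch. It is a toy model, not Hadoop code: the 128 MB block size, the 1000 MB file, the node names, and the round-robin placement are all assumptions made for illustration (real HDFS uses its own placement policy). It only shows why losing one node does not lose data when every block has three copies.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the size in MB of each block the file would be cut into.

    block_size_mb=128 is an assumed value for this sketch.
    """
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left
    return sizes


def place_replicas(num_blocks, nodes, replication=3):
    """Toy round-robin placement: each block goes to `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + k) % len(nodes)] for k in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in holders)
        for holders in placement.values()
    )


# Hypothetical 1000 MB file on a hypothetical four-node cluster.
block_sizes = split_into_blocks(1000)
placement = place_replicas(len(block_sizes), ["node1", "node2", "node3", "node4"])

print("blocks:", len(block_sizes))
print("raw storage with 3 replicas (MB):", sum(block_sizes) * 3)
print("readable with node2 offline:", readable_after_failure(placement, "node2"))
```

The trade-off is visible in the numbers: replication triples the raw storage the file needs, and in exchange any single node can fail without making a block unreadable.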