Alberto Boschetti - Large Scale Machine Learning with Python

Here you can read Alberto Boschetti's Large Scale Machine Learning with Python online for free (the full text of the book, in English). Download the PDF and EPUB, and find the cover, description, and reviews of this ebook. Year: 2016; publisher: Packt Publishing; genre: Home and family. A description of the work (preface) as well as reviews are available. LitArk.com is a literature library created for fans of good reading, offering a wide selection of genres:

Romance novel Science fiction Adventure Detective Science History Home and family Prose Art Politics Computer Non-fiction Religion Business Children Humor

Choose your favorite category and find books that are really worth reading. Immerse yourself in a world of imagination, feel the emotions of the characters, learn something new, or make a fascinating discovery.

Alberto Boschetti Large Scale Machine Learning with Python

Large Scale Machine Learning with Python: summary, description and annotation

Here you can read the annotation, description, summary, or preface (depending on what the author of "Large Scale Machine Learning with Python" wrote himself). If you haven't found the information you need about the book, write in the comments and we will try to find it.

Cover -- Copyright -- Credits -- About the Authors -- About the Reviewers -- www.PacktPub.com -- Table of Contents -- Preface

Chapter 1: First Steps to Scalability -- Explaining scalability in detail -- Making large scale examples -- Introducing Python -- Scale up with Python -- Scale out with Python -- Python for large scale machine learning -- Choosing between Python 2 and Python 3 -- Installing Python -- Step-by-step installation -- The installation of packages -- Package upgrades -- Scientific distributions -- Introducing Jupyter/IPython -- Python packages -- NumPy -- SciPy -- Pandas -- Scikit-learn -- The matplotlib package -- Gensim -- H2O -- XGBoost -- Theano -- TensorFlow -- The sknn library -- Theanets -- Keras -- Other useful packages to install on your system -- Summary

Chapter 2: Scalable Learning in Scikit-learn -- Out-of-core learning -- Subsampling as a viable option -- Optimizing one instance at a time -- Building an out-of-core learning system -- Streaming data from sources -- Datasets to try the real thing yourself -- The first example - streaming the bike-sharing dataset -- Using pandas I/O tools -- Working with databases -- Paying attention to the ordering of instances -- Stochastic learning -- Batch gradient descent -- Stochastic gradient descent -- The Scikit-learn SGD implementation -- Defining SGD learning parameters -- Feature management with data streams -- Describing the target -- The hashing trick -- Other basic transformations -- Testing and validation in a stream -- Trying SGD in action -- Summary

Chapter 3: Fast SVM Implementations -- Datasets to experiment with on your own -- The bike-sharing dataset -- The covertype dataset -- Support Vector Machines -- Hinge loss and its variants -- Understanding the Scikit-learn SVM implementation -- Pursuing nonlinear SVMs by subsampling -- Achieving SVM at scale with SGD -- Feature selection by regularization -- Including non-linearity in SGD -- Trying explicit high-dimensional mappings -- Hyperparameter tuning -- Other alternatives for SVM fast learning -- Nonlinear and faster with Vowpal Wabbit -- Installing VW -- Understanding the VW data format -- Python integration -- A few examples using reductions for SVM and neural nets -- Faster bike-sharing -- The covertype dataset crunched by VW -- Summary

Chapter 4: Neural Networks and Deep Learning -- The neural network architecture -- What and how neural networks learn -- Choosing the right architecture -- The input layer -- The hidden layer -- The output layer -- Neural networks in action -- Parallelization for sknn -- Neural networks and regularization -- Neural networks and hyperparameter optimization -- Neural networks and decision boundaries -- Deep learning at scale with H2O -- Large scale deep learning with H2O -- Gridsearch on H2O -- Deep learning and unsupervised pretraining -- Deep learning with theanets -- Autoencoders and unsupervised learning -- Autoencoders -- Summary

Chapter 5: Deep Learning with TensorFlow -- TensorFlow installation -- TensorFlow operations -- GPU computing -- Linear regression with SGD -- A neural network from scratch in TensorFlow -- Machine learning on TensorFlow with SkFlow -- Deep learning with large files - incremental learning -- Keras and TensorFlow installation -- Convolutional Neural Networks in TensorFlow through Keras -- The convolution layer -- The pooling layer -- The fully connected layer -- CNNs with an incremental approach -- GPU Computing -- Summary

Chapter 6: Classification and Regression Trees at Scale -- Bootstrap aggregation -- Random forest and extremely randomized forest -- Fast parameter optimization with randomized search -- Extremely randomized trees and large datasets -- CART and boosting -- Gradient Boosting Machines -- max_depth -- learning_rate -- Subsample -- Faster GBM with warm_start -- Training and storing GBM models -- XGBoost -- XGBoost regression -- XGBoost and variable importance -- XGBoost streaming large datasets -- XGBoost model persistence -- Out-of-core CART with H2O -- Random forest and gridsearch on H2O -- Stochastic gradient boosting and gridsearch on H2O -- Summary

Chapter 7: Unsupervised Learning at Scale -- Unsupervised methods -- Feature decomposition - PCA -- Randomized PCA -- Incremental PCA -- Sparse PCA -- PCA with H2O -- Clustering - K-means -- Initialization methods -- K-means assumptions -- Selection of the best K -- Scaling K-means - mini-batch -- K-means with H2O -- LDA -- Scaling LDA - memory, CPUs, and machines -- Summary

Chapter 8: Distributed Environments - Hadoop and Spark -- From a standalone machine to a bunch of nodes -- Why do we need a distributed framework? -- Setting up the VM -- VirtualBox -- Vagrant -- Using the VM -- The Hadoop ecosystem -- Architecture -- HDFS -- MapReduce -- YARN -- Spark -- pySpark -- Summary

Chapter 9: Practical Machine Learning with Spark -- Setting up the VM for this chapter -- Sharing variables across cluster nodes -- Broadcast read-only variables -- Accumulators write-only variables -- Broadcast and accumulators together - an example -- Data preprocessing in Spark -- JSON files and Spark DataFrames -- Dealing with missing data -- Grouping and creating tables in-memory -- Writing the preprocessed DataFrame or RDD to disk -- Working with Spark DataFrames -- Machine learning with Spark -- Spark on the KDD99 dataset -- Reading the dataset -- Feature engineering -- Training a learner -- Evaluating a learner's performance -- The power of the ML pipeline -- Manual tuning -- Cross-validation -- Final cleanup -- Summary

Appendix: Introduction to GPUs and Theano -- GPU computing -- Theano - parallel computing on the GPU -- Installing Theano -- Index

Alberto Boschetti: author's other books


Who wrote Large Scale Machine Learning with Python? Find out the author's full name and a list of all the author's works, organized by series.

Large Scale Machine Learning with Python — read the complete book (the whole text of the work) online for free

Below is the text of the book, divided into pages. The system saves the place of the last page you read, so you can conveniently read the book "Large Scale Machine Learning with Python" online for free without having to search for where you left off every time. Set a bookmark, and you can return to the page where you finished reading at any time.


Large Scale Machine Learning with Python

Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2016

Production reference: 1270716

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78588-721-5

www.packtpub.com

Credits

Authors

Bastiaan Sjardin

Luca Massaron

Alberto Boschetti

Reviewers

Oleg Okun

Kai Londenberg

Commissioning Editor

Akram Hussain

Acquisition Editor

Sonali Vernekar

Content Development Editor

Sumeet Sawant

Technical Editor

Manthan Raja

Copy Editor

Tasneem Fatehi

Project Coordinator

Shweta H Birwatkar

Proofreader

Safis Editing

Indexer

Mariammal Chettiyar

Graphics

Disha Haria

Kirk D'Penha

Production Coordinator

Arvindkumar Gupta

Cover Work

Arvindkumar Gupta

About the Authors

Bastiaan Sjardin is a data scientist and founder with a background in artificial intelligence and mathematics. He holds an MSc degree in cognitive science from the University of Leiden, together with on-campus courses at the Massachusetts Institute of Technology (MIT). In the past 5 years, he has worked on a wide range of data science and artificial intelligence projects. He is a frequent community TA at Coursera for the social network analysis course from the University of Michigan and the practical machine learning course from Johns Hopkins University. His programming languages of choice are Python and R. Currently, he is the cofounder of Quandbee (http://www.quandbee.com/), a company providing machine learning and artificial intelligence applications at scale.

Luca Massaron is a data scientist and marketing research director who is specialized in multivariate statistical analysis, machine learning, and customer insight, with over a decade of experience in solving real-world problems and generating value for stakeholders by applying reasoning, statistics, data mining, and algorithms. From being a pioneer of Web audience analysis in Italy to achieving the rank of a top ten Kaggler, he has always been very passionate about everything regarding data and its analysis, and also about demonstrating the potential of data-driven knowledge discovery to both experts and non-experts. Favoring simplicity over unnecessary sophistication, he believes that a lot can be achieved in data science just by doing the essentials.

I would like to thank Yukiko and Amelia for their continued support, help, and loving patience.

Alberto Boschetti is a data scientist with expertise in signal processing and statistics. He holds a PhD in telecommunication engineering and currently lives and works in London. In his work projects, he faces challenges that span from natural language processing (NLP) and machine learning to distributed processing. He is very passionate about his job and always tries to stay updated about the latest developments in data science technologies, attending meet-ups, conferences, and other events.

About the Reviewer

Oleg Okun is a machine learning expert and an author/editor of four books, numerous journal articles, and conference papers. He has been working in the field for more than a quarter of a century. During this time, Oleg was employed in both academia and industry in his mother country, Belarus, and abroad (Finland, Sweden, and Germany). His work experience includes document image analysis, fingerprint biometrics, bioinformatics, online/offline marketing analytics, and credit-scoring analytics. He is interested in all aspects of distributed machine learning and the Internet of Things. Oleg currently lives and works in Hamburg, Germany, and is about to start a new job as a chief architect of intelligent systems. His favorite programming languages are Python, R, and Scala.

I would like to express my deepest gratitude to my parents for everything that they have done for me.

Kai Londenberg is a data science and big data expert with many years of professional experience. Currently, he is working as a data scientist at the Volkswagen Data Lab. Before that, he had the pleasure of being the lead data scientist at Searchmetrics, where Luca Massaron was a member of his team. Kai enjoys working with cutting-edge technologies, and while he is a pragmatic machine learning practitioner and software developer at work, he always enjoys staying up-to-date with the latest technologies and research in machine learning, AI, and related fields. You can find him on LinkedIn at https://www.linkedin.com/in/kailondenberg.

www.PacktPub.com
eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com; visit the site for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?
  • Fully searchable across every book published by Packt
  • Copy and paste, print, and bookmark content
  • On demand and accessible via a web browser
Preface

"The nice thing about having a brain is that one can learn, that ignorance can be supplanted by knowledge, and that small bits of knowledge can gradually pile up into substantial heaps."

-- Douglas Hofstadter

Machine learning is often referred to as the part of artificial intelligence that actually works. Its aim is to find a function, based on an existing set of data (the training set), that predicts the outcomes of a previously unseen dataset (the test set) with the highest possible correctness. This occurs either in the form of labels and classes (classification problems) or in the form of a continuous value (regression problems). Tangible examples of machine learning in real-life applications range from predicting future stock prices to classifying the gender of an author from a set of documents. Throughout this book, the most important machine learning concepts, together with methods suitable for larger datasets, will be made clear to the reader thanks to practical examples in Python. We will look at supervised learning (classification and regression), as well as unsupervised learning techniques (such as Principal Component Analysis (PCA), clustering, and topic modeling) that have been found to be applicable to larger datasets.
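To make the training set/test set idea concrete, here is a minimal, illustrative sketch (not taken from the book) of the supervised workflow described above, written with scikit-learn; the choice of the built-in digits dataset and of SGDClassifier as the estimator are assumptions made purely for illustration:

    # Minimal sketch of the train/test workflow (illustrative only, not book code).
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import SGDClassifier

    # Load a small labelled dataset: features X and class labels y.
    X, y = load_digits(return_X_y=True)

    # Hold out a quarter of the data as the previously unseen test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # Fit a classifier on the training set only...
    clf = SGDClassifier(random_state=0)
    clf.fit(X_train, y_train)

    # ...and measure how correctly it predicts the held-out test set.
    print("Test accuracy: %.3f" % clf.score(X_test, y_test))

Any other scikit-learn estimator, including the out-of-core and large-scale learners covered later in the book, follows this same fit/predict pattern.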


Similar books «Large Scale Machine Learning with Python»

Take a look at books similar to Large Scale Machine Learning with Python. We have selected literature similar in title and subject in the hope of giving readers more options for finding new, interesting works they have not yet read.


Reviews about «Large Scale Machine Learning with Python»

Discussion and reviews of the book Large Scale Machine Learning with Python, as well as readers' own opinions. Leave your comments and write what you think about the work, its meaning, or its main characters. Explain what exactly you liked and what you did not, and why you think so.