
Antonio Gulli - A collection of Data Science Interview Questions Solved in Python and Spark: Hands-on Big Data and Machine Learning


  • Publisher:
    CreateSpace Independent Publishing Platform
  • Year:
    2015

BigData and Machine Learning in Python and Spark


A collection of Data Science Interview Questions Solved in Python and Spark

Hands-on Big Data

and Machine Learning

(volume I)

Antonio Gulli

Copyright 2015 Antonio Gulli

All rights reserved.

ISBN: 1517216710

ISBN-13: 978-1517216719

Data Science is the sixth in a series of 25 chapters devoted to algorithms, problem solving, machine learning, big data and C++/Python programming.

DEDICATION

To Lorenzo, Leonardo, Aurora and Francesca

I heard there was a secret chord
That David played and it pleased the Lord
But you don't really care for music, do ya?

[Leonard Cohen, 1984]

Your mouth opens to a smile, and your hand to helping others

ACKNOWLEDGMENTS

Thanks to Eric, Francesco, Michele, Dario, Domenico, Carla, Antonio, Ettore, Federica, Laura, Antonella, Susana, and Antonello for their friendship.


What are the most important machine learning techniques?
Solution

In his famous essay "Computing Machinery and Intelligence", Alan Turing asked a fundamental question: "Can machines do what we (as thinking entities) can do?" Machine learning is not about thinking but about a related activity: learning, or better, according to Arthur Samuel, the "field of study that gives computers the ability to learn without being explicitly programmed".

Machine learning techniques are typically classified into two categories:

In supervised learning, pairs of examples made up of (input, desired output) are available, and the computer learns a model that, given an input, predicts the desired output with minimal error. Classification and Regression are both examples of supervised learning, and Neural Networks are commonly trained in this way. All these techniques assume that there is an oracle, or teacher, that shows the computer what to do so that it can apply the learned lessons to new, unseen data.

In unsupervised learning, computers have no teacher and are left alone to search for structures, patterns and anomalies in data. Clustering and Density Estimation are typical examples of unsupervised machine learning.

Let us now review the main machine learning techniques:

In classification, the teacher presents pairs of (input, target class), and the computer learns to assign classes to new, unseen data. Naive Bayes, SVM, Decision Trees and Neural Networks are all classification methods. The first two are discussed in this volume, while the remaining ones will be part of the next volume.
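
To make the (input, target class) idea concrete, here is a minimal sketch of a nearest-centroid classifier in plain Python. It is not one of the methods named above; it is swapped in only because it is short. The model "learns" one average point per class from the labeled pairs and assigns a new point to the class whose average is closest. The helper names fit_centroids and predict and the toy data are hypothetical.

```python
from math import dist


def fit_centroids(X, y):
    """Learn one centroid (mean feature vector) per class from labeled pairs."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical training pairs (input, target class).
X_train = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]]
y_train = ["small", "small", "large", "large"]

model = fit_centroids(X_train, y_train)
print(predict(model, [2.0, 1.5]))  # small
print(predict(model, [7.5, 9.0]))  # large
```

Naive Bayes, SVM and the other methods replace this crude decision rule with a better model, but the fit-then-predict pattern on unseen data is the same.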

In Regression, the teacher presents pairs of (input, continuous target), and the computer learns how to predict continuous values for new, unseen data. Linear and Logistic regression are examples that will be discussed in the present volume (Logistic regression predicts a continuous probability, which is mostly used to separate two classes). Decision Trees, SVM and Neural Networks can also be used for Regression.
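
As a minimal sketch of regression, assuming a single numerical input, ordinary least squares has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the two means. The function fit_line and the data (generated from y = 1 + 2x) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # hypothetical data following y = 1 + 2x
a, b = fit_line(xs, ys)
print(a, b)         # 1.0 2.0
print(a + b * 5.0)  # 11.0, the prediction for the unseen input x = 5
```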

In Associative rule learning, computers are presented with a large set of observations, each made up of multiple variables. The task is then to learn relations between variables such as $A \wedge B \Rightarrow C$ (if A and B happen, then C is also likely to happen).
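
Rules like the one above are usually scored by support (how often the items occur together) and confidence (how often C appears when A and B do). The sketch below computes both for a hypothetical list of shopping baskets. The helper names support and confidence are illustrative, and a full algorithm such as Apriori would also search for the frequent itemsets instead of scoring a single given rule.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical observations: each basket is a set of items bought together.
baskets = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"B", "C"},
    {"A", "C"},
]
print(support(baskets, {"A", "B"}))            # 0.6
print(confidence(baskets, {"A", "B"}, {"C"}))  # about 0.67
```

Here A and B occur together in 3 of 5 baskets, and C accompanies them in 2 of those 3, so the rule $A \wedge B \Rightarrow C$ holds with confidence 2/3.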

In Clustering, computers learn how to partition observations into subsets so that each subset is made up of observations that are similar according to some well-defined metric. Algorithms like K-Means and DBSCAN belong to this class.
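
Here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python, assuming Euclidean distance, a fixed number of iterations and hand-picked starting centroids. Real implementations usually choose starting points randomly (for example with k-means++) and stop when the assignments no longer change. The function name kmeans and the points are hypothetical.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Lloyd's K-Means: alternate an assignment step and an update step."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (a centroid with no points stays where it is).
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D observations forming two groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)  # roughly (1.17, 1.5) and (8.5, 8.5)
print(clusters)
```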

In Density estimation, computers learn the probability distribution that generated the data, for instance the parameters of a mixture of Gaussians. Algorithms like Expectation Maximization belong to this class.
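
As an illustration of Expectation Maximization, here is a minimal sketch that fits a mixture of two 1-D Gaussians, assuming the number of components and rough starting values are known in advance. The E-step computes how strongly each component "owns" each point; the M-step re-estimates weights, means and variances from those soft assignments. The helper names and the data are hypothetical, and a production version would work in log space and test for convergence instead of running a fixed number of iterations.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture model with Expectation Maximization."""
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(data)
    for _ in range(iterations):
        # E-step: resp[i][j] = P(component j | data[i]).
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from its responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return means, variances, weights


# Hypothetical 1-D data drawn around two centres (about 1 and about 5).
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_gmm_1d(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print([round(m, 2) for m in means])    # [1.0, 5.0]
print([round(w, 2) for w in weights])  # [0.5, 0.5]
```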

Why is it important to have a robust set of metrics for machine learning?
Solution

Any machine learning technique should be evaluated with metrics that quantitatively assess the quality of its results. For instance, if we need to categorize objects such as people, movies or songs into different classes, precision and recall might be suitable metrics.

Precision is the ratio $P = \frac{tp}{tp + fp}$, where $tp$ is the number of true positives and $fp$ is the number of false positives. Recall is the ratio $R = \frac{tp}{tp + fn}$, where $tp$ is the number of true positives and $fn$ is the number of false negatives. Whether a prediction counts as true or false is decided by comparing it with manually labeled (ground-truth) data. Precision and Recall are typically plotted in a 2-D graph known as a P/R curve, where different algorithms can be compared by reporting the Precision they achieve at fixed values of Recall.
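
The two ratios can be computed directly from binary labels. This minimal sketch assumes that label 1 marks the positive class and returns 0.0 when a denominator is zero (a convention chosen here; libraries usually let you configure it). The function name precision_recall and the labels are hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Return (tp / (tp + fp), tp / (tp + fn)) for binary labels, 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical ground truth (manually labeled) and classifier output.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(precision_recall(y_true, y_pred))  # precision 2/3, recall 0.5
```

With these labels there are 2 true positives, 1 false positive and 2 false negatives, which gives $P = 2/3$ and $R = 2/4$.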

In addition, F1 is another frequently used metric, which combines Precision and Recall into a single value, their harmonic mean:

$F_1 = 2 \cdot \frac{P \cdot R}{P + R}$
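
Because F1 is a harmonic mean, it stays low unless both precision and recall are high. The minimal sketch below, with a hypothetical helper f1_from_pr, shows that precision 1.0 with recall 0.1 gives an F1 of about 0.18, whereas the arithmetic mean of the two would be 0.55.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # convention for the degenerate case
    return 2 * precision * recall / (precision + recall)


print(f1_from_pr(0.75, 0.75))  # 0.75
print(f1_from_pr(1.0, 0.1))    # about 0.18
```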

Scikit-learn provides a comprehensive set of metrics for classification, clustering, regression, ranking and pairwise comparisons. As an example, the code below uses precision_recall_curve to compute Precision and Recall at different decision thresholds.

Code

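import numpy as np
from sklearn.metrics import precision_recall_curve

# Ground-truth binary labels (1 = positive class) and the classifier's scores.
# NOTE: the label values and the last score were lost in extraction; the
# values marked "placeholder" are hypothetical, for illustration only.
y_true = np.array([0, 1, 0, 1, 1])  # placeholder labels
y_scores = np.array([0.5, 0.6, 0.38, 0.9, 0.2])  # last value is a placeholder

# Precision and recall for each decision threshold on y_scores
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

print(precision)
print(recall)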
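
To show what a P/R curve is built from, the sketch below uses every distinct score as a threshold, predicts "positive" when a score is at least that threshold, and records the resulting precision and recall. It is a plain-Python illustration of the idea, assuming at least one positive label, and not a reimplementation of precision_recall_curve, whose output ordering and extra end point may differ. The helper name pr_points and the data are hypothetical.

```python
def pr_points(y_true, y_scores):
    """(threshold, precision, recall) for each distinct score used as a threshold,
    highest threshold first; a score >= threshold is predicted positive."""
    total_pos = sum(y_true)  # assumes at least one positive label
    points = []
    for t in sorted(set(y_scores), reverse=True):
        predicted_pos = sum(1 for s in y_scores if s >= t)
        tp = sum(1 for y, s in zip(y_true, y_scores) if s >= t and y == 1)
        points.append((t, tp / predicted_pos, tp / total_pos))
    return points


y_true = [0, 0, 1, 1]             # hypothetical labels
y_scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical scores
for threshold, precision, recall in pr_points(y_true, y_scores):
    print(threshold, round(precision, 3), recall)
```

Lowering the threshold can only keep or raise recall, while precision may move in either direction; plotting these pairs gives the P/R curve.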

Why are Features extraction and engineering so important in machine learning?
Solution

Features are the variables selected for making predictions. For instance, suppose you'd like to forecast whether tomorrow will be sunny. You will probably pick features like humidity (a numerical value), wind speed (another numerical value), some historical information (what happened during the last few years), whether or not it is sunny today (a categorical yes/no value) and a few other features. Your choice can dramatically affect the performance of your model, even with the same algorithm, and you need to run multiple experiments to find the right amount of data and the right features to forecast with minimal error. It is not unusual for problems to be represented by thousands of features and combinations of them, and a good feature engineer will use tools that rank features by their contribution to reducing the prediction error.
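
One simple way to rank features, sketched below under the assumption of numerical (or 0/1-encoded) features and a numerical target, is to fit a one-feature least-squares line for each feature and measure the fraction of squared error it removes compared with always predicting the mean (the $R^2$ of that line). This univariate filter ignores interactions between features, so it is only a first pass. The weather data and the helper names single_feature_r2 and rank_features are hypothetical.

```python
def single_feature_r2(xs, ys):
    """R^2 of a one-feature least-squares line: the fraction of the squared
    error of always predicting mean(y) that the feature removes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def rank_features(columns, target):
    """Sort feature names by how much prediction error each removes on its own."""
    scores = {name: single_feature_r2(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: predict tomorrow's hours of sunshine.
features = {
    "humidity": [80, 65, 90, 40, 55],
    "wind_speed": [10, 12, 9, 11, 13],
    "sunny_today": [0, 1, 0, 1, 1],  # categorical yes/no encoded as 1/0
}
sunshine_tomorrow = [2.0, 6.0, 1.0, 9.0, 7.0]
ranked = rank_features(features, sunshine_tomorrow)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```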

Different authors use different names for features, including attributes, variables and predictors. In this book we consistently use the term features.

Features can be categorical, such as marital status, gender, state of residence or place of birth, or numerical, such as age, income, height and weight. This distinction is important because certain algorithms, such as linear regression, work only with numerical attributes, so any categorical features must first be encoded as numerical values.
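
A common way to encode a categorical feature is one-hot encoding: one 0/1 indicator column per category. The sketch below is a minimal plain-Python version with a hypothetical helper one_hot. Unlike mapping categories to arbitrary integers (single = 0, married = 1, ...), it does not imply an order between categories that a linear model would otherwise treat as meaningful.

```python
def one_hot(values):
    """Encode a categorical column as one 0/1 indicator column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


marital_status = ["single", "married", "divorced", "married"]  # hypothetical column
categories, encoded = one_hot(marital_status)
print(categories)  # ['divorced', 'married', 'single']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```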

In other words, feature engineering is the art of extracting, selecting and transforming the essential characteristics that represent data. It is sometimes considered less glamorous than machine learning algorithms, but any experienced Data Scientist knows that a simple algorithm on a well-chosen set of features often performs better than a sophisticated algorithm on a poor set of features.
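
As one example of a transformation step, the sketch below standardizes a numerical feature to zero mean and unit standard deviation (z-scores), which keeps features measured on very different scales, such as income and age, from dominating distance-based algorithms like K-Means. The helper name standardize and the income values are hypothetical.

```python
import statistics


def standardize(column):
    """Rescale a numerical feature to zero mean and unit (population) std deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]  # constant feature: nothing to scale
    return [(x - mean) / std for x in column]


incomes = [30000, 45000, 60000, 75000, 90000]  # hypothetical values
print(standardize(incomes))  # roughly [-1.41, -0.71, 0.0, 0.71, 1.41]
```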
