Unsupervised Machine Learning in Python

Master Data Science and Machine Learning with Cluster Analysis, Gaussian Mixture Models, and Principal Components Analysis

By: The LazyProgrammer (http://lazyprogrammer.me)

Publisher: LazyProgrammer, 2016

Introduction

In a real-world environment, you can imagine that a robot or an artificial intelligence won't always have access to the optimal answer, or maybe there isn't an optimal correct answer. You'd want that robot to be able to explore the world on its own and learn things just by looking for patterns.

Think about the large amounts of data being collected today by the likes of the NSA, Google, and other organizations. No human could possibly sift through all that data manually. It was reported recently in the Washington Post and Wall Street Journal that the National Security Agency collects so much surveillance data that it is no longer effective.

Could automated pattern discovery solve this problem?

Do you ever wonder how we get the data that we use in our supervised machine learning algorithms?

Kaggle always seems to provide us with a nice CSV, complete with Xs and corresponding Ys.

If you haven't been involved in acquiring data yourself, you might not have thought about this, but someone has to make this data!

A lot of the time this involves manual labor. Sometimes you don't have access to the correct information, or it is infeasible or costly to acquire.

You still want to have some idea of the structure of the data.

This is where unsupervised machine learning comes into play.

In this book we are first going to talk about clustering. This is where, instead of training on labels, we try to create our own labels. We'll do this by grouping together data that looks alike.

The 2 methods of clustering we'll talk about are k-means clustering and hierarchical clustering.

Next, because in machine learning we like to talk about probability distributions, we'll go into Gaussian mixture models and kernel density estimation, where we talk about how to learn the probability distribution of a set of data.

One interesting fact is that under certain conditions, Gaussian mixture models and k-means clustering are exactly the same! We'll prove how this is the case.
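A sketch of the standard argument, under assumptions that are ours rather than stated here: every component has the same mixing weight $\pi_k = 1/K$ and the same spherical covariance $\sigma^2 I$, so the Gaussian normalizing constant is shared and cancels. The responsibility $\gamma_k(x)$ of component $k$ for a point $x$ then depends only on squared distances to the means, and as $\sigma \to 0$ it becomes a hard nearest-mean assignment (assuming no ties), which turns the GMM mean update into the k-means update:

```latex
\gamma_k(x)
  = \frac{\pi_k \exp\!\left(-\frac{\lVert x-\mu_k\rVert^2}{2\sigma^2}\right)}
         {\sum_{j=1}^{K} \pi_j \exp\!\left(-\frac{\lVert x-\mu_j\rVert^2}{2\sigma^2}\right)}
  \;\overset{\pi_k = 1/K}{=}\;
    \frac{\exp\!\left(-\frac{\lVert x-\mu_k\rVert^2}{2\sigma^2}\right)}
         {\sum_{j=1}^{K} \exp\!\left(-\frac{\lVert x-\mu_j\rVert^2}{2\sigma^2}\right)}

\lim_{\sigma \to 0} \gamma_k(x) =
\begin{cases}
  1 & \text{if } k = \arg\min_j \lVert x-\mu_j\rVert^2,\\
  0 & \text{otherwise,}
\end{cases}
\qquad
\mu_k = \frac{\sum_{n=1}^{N} \gamma_k(x_n)\, x_n}{\sum_{n=1}^{N} \gamma_k(x_n)}
  \;\longrightarrow\; \text{mean of the points assigned to cluster } k
```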

Lastly, we'll look at the theory behind principal components analysis, or PCA. PCA has many useful applications: visualization, dimensionality reduction, denoising, and de-correlation. You will see how it allows us to take a different perspective on latent variables, which first appear when we talk about k-means clustering and GMMs.

All the algorithms we'll talk about in this book are staples in machine learning and data science, so if you want to know how to automatically find patterns in your data with data mining and pattern extraction, without needing someone to put in manual work to label that data, then this book is for you.

All of the materials required to follow along in this book are free: you just need to be able to download and install Python, NumPy, SciPy, Matplotlib, and scikit-learn.

Chapter 1: What is unsupervised learning used for?

In general, unsupervised learning is for learning the structure or the probability distribution of the data. What does this mean specifically?

In this chapter we'll talk about some specific examples of how you can use unsupervised learning in your data pipeline.

Density Estimation:

You already know that we use the PDF, or probability density function, to describe the distribution of a random variable: for a continuous variable, the probability of landing in a range of values is the area under the PDF over that range. Density estimation is the process of taking samples of a random variable and figuring out its probability density function from them.

Once you learn the distribution of a variable, you can generate your own samples of the variable using that distribution.

At a high level, for example, you could learn the distribution of a Shakespeare play, and then generate text that looks like Shakespeare.
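A much smaller numeric version of the same idea, assuming (purely for illustration) that the data is one-dimensional and Gaussian: estimate the mean and standard deviation from samples, then draw new samples from the fitted distribution. This is parametric density estimation; the helper names are hypothetical, and kernel density estimation, mentioned in the introduction, avoids assuming a Gaussian shape at all.

```python
import numpy as np

def fit_gaussian(samples):
    """Maximum-likelihood estimates of a 1-D Gaussian's mean and standard deviation."""
    samples = np.asarray(samples, dtype=float)
    mu = samples.mean()
    sigma = samples.std()  # ddof=0 gives the maximum-likelihood estimate
    return mu, sigma

def gaussian_pdf(x, mu, sigma):
    """Evaluate the fitted Gaussian density at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# Pretend we only see these samples and don't know the true loc=5, scale=2.
observed = rng.normal(loc=5.0, scale=2.0, size=10_000)

mu_hat, sigma_hat = fit_gaussian(observed)
print(f"estimated mean={mu_hat:.2f}, std={sigma_hat:.2f}")

# Once the distribution is learned, we can generate new samples from it.
generated = rng.normal(loc=mu_hat, scale=sigma_hat, size=5)
print(generated)
```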

Latent Variables:

A lot of the time, we want to know about the hidden or underlying causes of the data we're seeing.

These can be thought of as latent, missing, or hidden variables.

As an example, suppose you're given a set of documents, but you aren't told what they are.

You could do clustering on them and discover that there are a few very distinct groups in that set of documents.

Then, when you actually read some of the documents in your dataset, you see that one set of documents is romance novels, this other one is children's books, and so on.

Some examples of clustering algorithms are: k-means clustering (covered in this book), hierarchical clustering (covered in this book), and affinity propagation. Gaussian mixture models can be thought of as a soft or fuzzy clustering algorithm.
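To make "soft" concrete, here is a sketch assuming a one-dimensional mixture of two equal-weight Gaussian components that share a standard deviation `sigma`, with the means simply given rather than learned (`soft_assignments` is a hypothetical helper). Each point gets a probability of belonging to each cluster instead of a single label; picking the most probable cluster recovers a hard, k-means-style label.

```python
import numpy as np

def soft_assignments(x, means, sigma):
    """Responsibilities of each cluster for each point, for an equal-weight
    1-D Gaussian mixture whose components share standard deviation sigma."""
    x = np.asarray(x, dtype=float)[:, None]          # shape (N, 1)
    means = np.asarray(means, dtype=float)[None, :]  # shape (1, K)
    log_lik = -0.5 * ((x - means) / sigma) ** 2      # log-density up to a shared constant
    log_lik -= log_lik.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(log_lik)
    return weights / weights.sum(axis=1, keepdims=True)

points = [0.0, 1.0, 2.0, 5.0]
means = [0.0, 4.0]
resp = soft_assignments(points, means, sigma=1.0)
hard = resp.argmax(axis=1)  # hard label = most responsible cluster
print(np.round(resp, 3))    # the point at 2.0 is split 50/50 between the clusters
print(hard)
```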

Hierarchical clustering and the dendrogram (the visualization we use for hierarchical clustering output) have been used in biology to construct phylogenetic trees.
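To show what a dendrogram actually draws, here is a naive single-linkage agglomerative clustering sketch: start with every point in its own cluster, repeatedly merge the two closest clusters, and record each merge with its distance. The helper name and the choice of single linkage are assumptions for illustration; in practice SciPy's `scipy.cluster.hierarchy.linkage` and `dendrogram` functions compute this efficiently and draw the plot.

```python
import numpy as np

def single_linkage_merges(X):
    """Naive agglomerative clustering with single linkage.

    Returns (cluster_a, cluster_b, distance) merges in the order they happen;
    a dendrogram is a drawing of this merge history.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = {i: [i] for i in range(len(X))}  # cluster id -> member point indices
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[a] = clusters[a] + clusters.pop(b)  # merge cluster b into a
        merges.append((a, b, float(d)))
    return merges

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.5]])
for a, b, d in single_linkage_merges(X):
    print(f"merge {a} and {b} at distance {d:.2f}")
```

This brute-force version recomputes every cluster-to-cluster distance on each pass, so it only suits small examples.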

A lot of the time, your data is just so big that it's infeasible to look at the entire thing yourself, so you need some way of summarizing the data like this.

One view of this is topic modeling, where the latent variable is the topic and the observed variables are the words.

Dimensionality reduction:

Another way to think of unsupervised machine learning is dimensionality reduction. Some examples of algorithms that do dimensionality reduction are principal components analysis (PCA), singular value decomposition (SVD), t-SNE (t-distributed stochastic neighbor embedding), LLE (locally linear embedding), and more.

A lot of data can be hundreds or thousands of dimensions wide. Think of a 28x28 image: that's 784 dimensions. Humans can't visualize anything past 3 dimensions.

28x28 is a small image. What if we increase the size a little bit, to 32x32, and add color? Color (RGB) has 3 different channels, so that's 3x32x32 = 3072-dimensional data. 32x32 is still a tiny image! A 1080p image is 1920 pixels wide and 1080 pixels high. How many dimensions is that?
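These counts come from multiplying height, width, and number of channels, treating each pixel value as one input dimension. A quick sketch (the helper name is hypothetical), assuming the 1080p image is RGB like the 32x32 example:

```python
def image_dimensions(height, width, channels=1):
    """Number of input dimensions when each pixel value is one feature."""
    return height * width * channels

print(image_dimensions(28, 28))                  # 784 (grayscale)
print(image_dimensions(32, 32, channels=3))      # 3072 (RGB)
print(image_dimensions(1080, 1920, channels=3))  # 6220800 (1080p RGB)
```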

If your algorithm's run time depends on the dimensionality of your input, this could be a problem. A lot of your data is correlated (redundant), so you probably don't need all the dimensions anyway. So how do we reduce dimensionality while retaining information? We will answer this question later in the book.
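One way to see the redundancy, sketched with made-up data: the second feature below is the first one converted from centimeters to inches plus a little noise. The two features are almost perfectly correlated, and the eigenvalues of their covariance matrix show that nearly all the variance lies along a single direction, so one dimension carries almost all the information. The feature names and noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
height_cm = rng.normal(170, 10, size=N)
# Hypothetical redundant feature: the same measurement in inches, plus tiny noise.
height_in = height_cm / 2.54 + rng.normal(0, 0.1, size=N)
X = np.column_stack([height_cm, height_in])  # N x D data matrix with D = 2

corr = np.corrcoef(X, rowvar=False)[0, 1]
print(corr)  # very close to 1: the columns are redundant

cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
print(eigvals / eigvals.sum())     # almost all variance lies along one direction
```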

Visualization:

Another useful reason to do unsupervised learning is visualization. Sometimes you just need a summary picture of your data to give you a sense of the structure. Dimensionality reduction is useful here because it allows you to first reduce your data to 2 dimensions, at which point you can create a scatter plot.
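As a preview, here is one common way to get a 2-D scatter plot from D-dimensional data: center the data and project it onto its top two principal components, computed with NumPy's SVD. This is a sketch on random data, not the book's own PCA implementation; the function name is hypothetical.

```python
import numpy as np

def project_to_2d(X):
    """Project N x D data onto its top two principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA assumes centered data
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T     # N x 2 coordinates for a scatter plot

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))  # hypothetical 10-dimensional data
Z = project_to_2d(X)
print(Z.shape)  # (200, 2)

# With Matplotlib installed, the scatter plot would be:
# import matplotlib.pyplot as plt; plt.scatter(Z[:, 0], Z[:, 1]); plt.show()
```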

As you'll see in this book, other types of algorithms can help us generate useful pictures too, like the dendrogram in hierarchical clustering.

A surprising but useful application of visualization is that it can tell us when an algorithm is not working, as we'll see when we do k-means.

Chapter 2: K-Means Clustering

Basic idea: Take a bunch of unlabeled data (data as in a set of vectors) and group them into K clusters.

The input into K-Means is just a matrix X. We usually organize it so that each row is a different sample, and each column is a different feature, or factor, in statistics terminology.

We usually say there are N samples and D features. So X is an N x D matrix.

There are 2 main steps in the K-Means algorithm.

First, we choose K different cluster centers. Usually we just assign these to random points in the dataset.
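The initialization step can be sketched in NumPy as follows, assuming X is an N x D array as described above and that K is at most N (the function name `init_centers` is hypothetical):

```python
import numpy as np

def init_centers(X, K, rng):
    """Pick K distinct rows of the N x D data matrix X as initial cluster centers."""
    N = X.shape[0]
    idx = rng.choice(N, size=K, replace=False)  # K distinct sample indices
    return X[idx]  # K x D array (fancy indexing returns a copy)

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))  # N = 100 samples, D = 2 features
centers = init_centers(X, K=3, rng=rng)
print(centers.shape)  # (3, 2)
```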

Next, we go into our main loop. The main loop is where the 2 main steps take place.

1) The first step is to decide which cluster each point belongs to. We do that by looking at every sample, and choosing the closest cluster center. Remember, we just assigned these randomly to begin with.
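A sketch of this assignment step, assuming Euclidean distance (the usual choice for k-means) and a hypothetical helper `assign_clusters`. It computes every point-to-center squared distance at once with broadcasting; squaring does not change which center is closest, so the square root can be skipped. Only the assignment step described above is shown here.

```python
import numpy as np

def assign_clusters(X, centers):
    """Return, for each row of X (N x D), the index of the closest center (K x D)."""
    # (N, 1, D) - (1, K, D) broadcasts to (N, K, D); summing over D gives N x K squared distances.
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)

X = np.array([[0.0, 0.0], [0.5, 0.2], [9.0, 9.0], [8.5, 9.5]])
centers = np.array([[0.0, 0.0], [9.0, 9.0]])
labels = assign_clusters(X, centers)
print(labels)  # [0 0 1 1]
```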
