Bhargav Srinivasa-Desikan - Natural Language Processing and Computational Linguistics: A Practical Guide to Text Analysis With Python, Gensim, spaCy, and Keras

Year: 2018. Publisher: Packt Publishing.

Work with Python and powerful open source tools such as Gensim and spaCy to perform modern text analysis, natural language processing, and computational linguistics algorithms.

Key Features
  • Discover the open source Python text analysis ecosystem, using spaCy, Gensim, scikit-learn, and Keras
  • Hands-on text analysis with Python, featuring natural language processing and computational linguistics algorithms
  • Learn deep learning techniques for text analysis
Book Description

Modern text analysis is now very accessible using Python and open source tools, so discover how you can now perform modern text analysis in this era of textual data.

This book shows you how to use natural language processing and computational linguistics algorithms to make inferences and gain insights about the data you have. These algorithms are based on statistical machine learning and artificial intelligence techniques. The tools to work with these algorithms are available to you right now, with Python and tools like Gensim and spaCy.

You'll start by learning about data cleaning, and then how to perform computational linguistics from first concepts. You're then ready to explore the more sophisticated areas of statistical NLP and deep learning using Python, with realistic language and text samples. You'll learn to tag, parse, and model text using the best tools. You'll gain hands-on knowledge of the best frameworks to use, and you'll know when to choose a tool like Gensim for topic models, and when to work with Keras for deep learning.

This book balances theory and practical hands-on examples, so you can learn about and conduct your own natural language processing and computational linguistics projects. You'll discover the rich ecosystem of Python tools you have available to conduct NLP, and enter the interesting world of modern text analysis.

What you will learn
  • Why text analysis is important in our modern age
  • Understand NLP terminology and get to know the Python tools and datasets
  • Learn how to pre-process and clean textual data
  • Convert textual data into vector space representations
  • Use spaCy to process text
  • Train your own NLP models for computational linguistics
  • Use statistical learning and Topic Modeling algorithms for text, using Gensim and scikit-learn
  • Employ deep learning techniques for text analysis using Keras
Who This Book Is For

This book is for you if you want to dive in, hands-first, into the interesting world of text analysis and NLP, and you're ready to work with the rich Python ecosystem of tools and datasets waiting for you!

Table of Contents
  1. What is Text Analysis?
  2. Python Tips for Text Analysis
  3. spaCy's Language Models
  4. Gensim: Vectorizing text and transformations and n-grams
  5. POS-Tagging and its Applications
  6. NER-Tagging and its Applications
  7. Dependency Parsing
  8. Topic Models
  9. Advanced Topic Modelling
  10. Clustering and Classifying Text
  11. Similarity Queries and Summarization
  12. Word2Vec, Doc2Vec and Gensim
  13. Deep Learning for text
  14. Keras and spaCy for Deep Learning
  15. Sentiment Analysis and ChatBots

About the Author

Bhargav Srinivasa-Desikan is a student researcher working for INRIA in Lille, France. He is part of the MODAL (Models of Data Analysis and Learning) team, and he works on metric learning, predictor aggregation, and data visualization. He also contributes to open source machine learning projects, particularly dynamic topic models for Gensim.

To get the most out of this book

Follow the listed steps and commands to prepare the system environment:

  1. Python:
    1. Most, if not all, operating systems come with Python installed. It is already available on Windows, Ubuntu 14.04 onwards, and macOS.
    2. If not, please follow the official wiki documentation: https://wiki.python.org/moin/BeginnersGuide/Download
    This is a good time to start migrating all of your code to Python 3.6 (http://python3statement.org/). By 2020, many scientific computing packages (such as NumPy) will be dropping support for Python 2.
  2. spaCy:
    pip install spacy
  3. Gensim:
    pip install gensim
  4. Keras:
    pip install keras
  5. scikit-learn:
    pip install scikit-learn
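Once the commands above have run, a quick check from Python can confirm that the packages are visible to the interpreter. The following is a minimal sketch rather than part of the book's setup: the helper name check_packages is hypothetical, and it only tests whether each package can be located, not whether its version suits the examples. Note that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Import names of the packages installed above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = ["spacy", "gensim", "keras", "sklearn"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be located.

    Hypothetical helper for illustration; it does not import the packages.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

Any package reported as missing can be installed with the corresponding pip command from the list.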
Word2Vec, Doc2Vec, and Gensim

We have talked about vectors a lot throughout the book. They are used to understand and represent our textual data in a mathematical form, and all the machine learning methods we use rely on these representations. We will take this one step further and use machine learning techniques to generate vector representations of words that better encapsulate the meaning of a word. This technique is generally referred to as word embeddings, and Word2Vec and Doc2Vec are two popular variations of it.

  • Word2Vec
  • Doc2Vec
  • Other word embeddings
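To make the idea of word vectors concrete, here is a small sketch of how similarity between two vectors is commonly measured, using cosine similarity. The three-dimensional vectors are hand-made for illustration only; they were not produced by Word2Vec or any other model, and real embeddings usually have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (an assumption for illustration, not learned embeddings).
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

sim_royal = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
sim_fruit = cosine_similarity(toy_vectors["king"], toy_vectors["apple"])

# With these toy values, "king" is closer to "queen" than to "apple".
print(sim_royal > sim_fruit)
```

A good embedding model aims to place words with related meanings close together in this sense, so that comparisons like the one above reflect meaning rather than hand-picked numbers.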
Topic Models

Until now, we dealt with computational linguistics algorithms and spaCy, and we understood how to use these computational linguistic algorithms to annotate our data, as well as understand sentence structure. While these algorithms helped us understand the finer details of our text, we still didn't get a big picture of our data - what kind of words appear more often than others in our corpus? Can we group our data or find underlying themes? We will be attempting to answer these questions and more in this chapter. Following are the topics we will cover in this chapter:

  • What are topic models?
  • Topic models in Gensim
  • Topic models in scikit-learn
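The first question above, which words appear more often than others in a corpus, can be explored with nothing more than the standard library before any topic model is involved. The sketch below is a simple word count, not a topic model; the three-sentence corpus, the stop word set, and the helper name word_counts are all made up for illustration.

```python
from collections import Counter

# A tiny made-up corpus; a real corpus would be loaded from files or a dataset.
corpus = [
    "the bank approved the loan",
    "the river bank was muddy",
    "the loan had a low interest rate",
]

# A hand-picked stop word set for this example only.
STOPWORDS = {"the", "a", "was", "had"}


def word_counts(documents, stopwords):
    """Count whitespace-separated, lowercased words, skipping stop words."""
    counts = Counter()
    for doc in documents:
        counts.update(w for w in doc.lower().split() if w not in stopwords)
    return counts


counts = word_counts(corpus, STOPWORDS)
print(counts.most_common(3))
```

Raw counts like these say which words dominate, but not which words belong together; grouping co-occurring words into themes is the problem topic models address.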

Why Python?

In Python, we represent text using the str class. Strings are immutable sequences of Unicode code points (characters). It is important to make a careful distinction here, though: in Python 3, all strings are Unicode by default, but in Python 2, the str class holds raw bytes (ASCII by default), and there is a separate unicode class for Unicode text.
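The two properties mentioned here, that a Python 3 string is a sequence of code points and that it is immutable, can be seen in a short sketch. The sample text is arbitrary, and non-ASCII characters are written as escapes so the example does not depend on the file's encoding.

```python
# "naive cafe" with two accented characters, written as escapes.
text = "na\u00efve caf\u00e9"

print(type(text).__name__)  # the class is str
print(len(text))            # length is counted in code points, not bytes

# Strings are immutable: assigning to an index raises TypeError.
try:
    text[0] = "N"
    mutated = True
except TypeError:
    mutated = False

# To "change" a string, build a new one instead.
capitalised = "N" + text[1:]
print(capitalised)
```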

Unicode is a standard that assigns a numeric code point to every character; encodings such as UTF-8 then map those code points to bytes. For example, the Unicode code point for the letter Z is U+005A. There are many encodings, and historically in Python, developers were expected to deal with them on their own, with all the low-level action happening in bytes. In fact, the shift in the way Python handles Unicode has led to a lot of discussion within the community. It also remains an important point of contention when porting code from Python 2 to Python 3.
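The code point for Z can be checked directly in Python 3 with the built-in ord and chr functions. This is a small sketch; the variable names are arbitrary.

```python
letter = "Z"

# ord() gives the code point of a one-character string as an integer.
code_point = ord(letter)

# Format the integer in the conventional U+XXXX notation.
label = f"U+{code_point:04X}"
print(label)

# chr() goes the other way, from code point to character.
round_trip = chr(0x5A)
print(round_trip)
```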

We said earlier that the low-level action happens in bytes. What does this mean? Bytes are numbers, and these numbers are used to represent different characters or symbols. ASCII and Unicode encodings such as UTF-8 are different ways of mapping characters to those numbers. In Python 2, strings are stored as bytes; in Python 3, strings are by default stored as sequences of Unicode code points.
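A short Python 3 sketch shows the same text turning into different byte values under different encodings. The word and the choice of UTF-8 and Latin-1 are only examples.

```python
word = "caf\u00e9"  # four code points; the last one is e with an acute accent

# Encoding turns code points into bytes; the result depends on the encoding.
utf8_bytes = word.encode("utf-8")
latin1_bytes = word.encode("latin-1")

# Bytes are just numbers in the range 0-255.
print(list(utf8_bytes))    # the accented letter takes two bytes in UTF-8
print(list(latin1_bytes))  # and a single byte in Latin-1

# Decoding with the matching encoding recovers the original string.
decoded = utf8_bytes.decode("utf-8")
print(decoded == word)
```

Decoding bytes with the wrong encoding is a common source of garbled text, which is why it helps to know which encoding a data source uses.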

We will not go deep into the technicalities of how text is encoded and the problems we encounter when dealing with these encodings, but we can give the following general advice when dealing with text and Python: use Python 3 and use Unicode! The reason is mainly that we want to stop using Python 2; it is going to be phased out by the scientific computing community, and it makes no sense to keep using Python 2 applications and code. Since Python 3 strings are Unicode by default, we will use Unicode for all text. In Python 2, this means remembering to put u before the opening quote of a string literal, which ensures that it is a Unicode string; in Python 3, the prefix is accepted but not needed.
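In Python 3, a u-prefixed literal and a plain literal produce the same str object type, and "using Unicode" in practice mostly means being explicit about the encoding when text crosses a file boundary. The sketch below illustrates both points; the file name sample.txt is arbitrary and the file lives in a temporary directory.

```python
import os
import tempfile

# In Python 3 the u prefix is optional: both literals are ordinary str objects.
prefixed = u"caf\u00e9"
plain = "caf\u00e9"
same = (prefixed == plain) and (type(prefixed) is str)
print(same)

# When reading or writing files, name the encoding explicitly rather than
# relying on the platform default.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # arbitrary file name
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(plain)
    with open(path, "r", encoding="utf-8") as handle:
        round_trip = handle.read()

print(round_trip == plain)
```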

While most of the text analysis that we will do throughout this book will not feature extensive string manipulation, it is still something we should be comfortable doing. Often we will have troublesome words in our dataset, and we will need to clean things up before starting any kind of text analysis. It may also be important to make our final output pretty, and for these kinds of tasks, it is worth knowing how to manipulate strings.
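As an example of the kind of clean-up this refers to, the sketch below lowercases tokens and strips surrounding punctuation using only built-in string methods. The messy input and the helper name clean_token are made up; real pre-processing usually needs more care, for instance with punctuation inside words.

```python
import string


def clean_token(token):
    """Strip surrounding whitespace and ASCII punctuation, then lowercase."""
    return token.strip().strip(string.punctuation).lower()


# A made-up messy input string.
raw = "  Hello, World!!  This -- is (messy) text...  "

# Split on whitespace, clean each piece, and drop pieces that end up empty.
tokens = [clean_token(t) for t in raw.split()]
tokens = [t for t in tokens if t]

cleaned = " ".join(tokens)
print(cleaned)
```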

The other useful Python knowledge that will help us in text analysis is basic data structures and how to use them - lists remain one of the most used data structures during text analysis and knowing how a dictionary works is also important to us.

The purpose of this chapter is to illustrate some of the functions we can perform with strings, and how we use strings in lists and dictionaries.
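As a small taste of strings working together with lists and dictionaries, the following sketch splits a sentence into a list of words and then builds two dictionaries from it. The sentence and variable names are arbitrary.

```python
sentence = "the quick brown fox jumps over the lazy dog"

# A list keeps the words in order and supports slicing and filtering.
words = sentence.split()
first_three = words[:3]
long_words = [w for w in words if len(w) > 4]

# A dictionary maps each distinct word to some value, here its length.
word_lengths = {w: len(w) for w in words}

# Another dictionary records the positions at which each word occurs.
positions = {}
for index, word in enumerate(words):
    positions.setdefault(word, []).append(index)

print(first_three)
print(long_words)
print(positions["the"])
```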

But we still haven't explained why we chose Python as our language. There are a number of text analysis packages in Java as well, and Perl is another programming language with a reputation for being good with text. What sets Python apart is the community and the open source libraries we have access to.

You will have had a taste of this in the previous chapter as well; we talked about Google using TensorFlow and Apple using scikit-learn, for example. Open source code is reaching the same standards and efficiency as industry code, and spaCy, one of the libraries we will focus on throughout this book, is an example of this. Collecting data is also largely done with Python, using libraries such as tweepy (Twitter), urllib (accessing web pages), and Beautiful Soup (parsing HTML from web pages). More people using a certain ecosystem means it will grow (the Stack Overflow blog post does a good write-up on this), and both researchers and industry are increasingly using it, which means it is a good time to jump on the bandwagon!
