Preface
NLTK is one of the most popular and widely used libraries in the natural language processing (NLP) community. The beauty of NLTK lies in its simplicity: most complex NLP tasks can be implemented in a few lines of code. Start off by learning how to tokenize text into component words. Explore and make use of the WordNet language dictionary. Learn how and when to stem or lemmatize words. Discover various ways to replace words and perform spelling correction. Create your own custom text corpora and corpus readers, including a MongoDB-backed corpus. Use part-of-speech taggers to annotate words with their parts of speech. Create and transform chunked phrase trees using partial parsing. Dig into feature extraction for text classification and sentiment analysis. Learn how to do parallel and distributed text processing, and how to store word distributions in Redis.
This learning path will teach you all that and more, in a hands-on learn-by-doing manner. Become an expert in using NLTK for Natural Language Processing with this useful companion.
What this learning path covers
Module 1, NLTK Essentials, talks about all the preprocessing steps required in any text mining/NLP task. In this module, we discuss tokenization, stemming, stop word removal, and other text cleansing processes in detail, and show how easy it is to implement these in NLTK.
Module 2, Python 3 Text Processing with NLTK 3 Cookbook, explains how to use corpus readers and create custom corpora. It also covers how to use some of the corpora that come with NLTK. It covers the chunking process, also known as partial parsing, which can identify phrases and named entities in a sentence. It also explains how to train your own custom chunker and create specific named entity recognizers.
Module 3, Mastering Natural Language Processing with Python, covers how to calculate word frequencies and perform various language modeling techniques. It also talks about the concept and application of Shallow Semantic Analysis (that is, NER) and word sense disambiguation (WSD) using WordNet. It will help you understand and apply the concepts of Information Retrieval and text summarization.
What you need for this learning path
Module 1:
We need the following software for this module:
| Chapter number | Software required (with version) | Free/Proprietary | Download links to the software | Hardware specifications | OS required |
|---|---|---|---|---|---|
| 1-5 | Python/Anaconda, NLTK | Free | https://www.python.org/ http://continuum.io/downloads http://www.nltk.org/ | Common Unix Printing System | any |
| | scikit-learn and gensim | Free | http://scikit-learn.org/stable/ https://radimrehurek.com/gensim/ | Common Unix Printing System | any |
| | Scrapy | Free | http://scrapy.org/ | Common Unix Printing System | any |
| | NumPy, SciPy, pandas, and matplotlib | Free | http://www.numpy.org/ http://www.scipy.org/ http://pandas.pydata.org/ http://matplotlib.org/ | Common Unix Printing System | any |
| | Twitter Python APIs and Facebook Python APIs | Free | https://dev.twitter.com/overview/api/twitter-libraries https://developers.facebook.com | Common Unix Printing System | any |
Module 2:
You will need Python 3 and the listed Python packages. For this learning path, the author used Python 3.3.5. To install the packages, you can use pip (https://pypi.python.org/pypi/pip/). The following is the list of the packages in requirements format with the version number used while writing this learning path:
- NLTK>=3.0a4
- pyenchant>=1.6.5
- lockfile>=0.9.1
- numpy>=1.8.0
- scipy>=0.13.0
- scikit-learn>=0.14.1
- execnet>=1.1
- pymongo>=2.6.3
- redis>=2.8.0
- lxml>=3.2.3
- beautifulsoup4>=4.3.2
- python-dateutil>=2.0
- charade>=1.0.3
You will also need NLTK-Trainer, which is available at https://github.com/japerk/nltk-trainer.
Beyond Python, there are a couple of recipes that use MongoDB and Redis, both of which are NoSQL databases. These can be downloaded at http://www.mongodb.org/ and http://redis.io/, respectively.
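Before starting Module 2, it can help to confirm which of the listed packages are already present in your environment. The following is only a minimal sketch, not part of the original recipes: the helper name `installed_versions` and the `PACKAGES` list are illustrative, and the script assumes a Python version that ships `importlib.metadata` (3.8 or later), which is newer than the Python 3.3.5 used while writing this learning path. It reports installed versions only and does not compare them against the minimums in the list.

```python
from importlib import metadata  # standard library since Python 3.8

# Distribution names copied from the Module 2 requirements list.
PACKAGES = [
    "nltk", "pyenchant", "lockfile", "numpy", "scipy", "scikit-learn",
    "execnet", "pymongo", "redis", "lxml", "beautifulsoup4",
    "python-dateutil", "charade",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Any package reported as not installed can then be installed with pip, either one at a time or by saving the list to a requirements file and passing it to `pip install -r`.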
Module 3:
For all the chapters, Python 2.7 or 3.2+ is used. NLTK 3.0 must be installed on either a 32-bit or a 64-bit machine. The required operating system is Windows, Mac, or Unix.
Who this learning path is for
If you are an NLP or machine learning enthusiast and an intermediate Python programmer who wants to quickly master NLTK for natural language processing, then this learning path will do you a lot of good. Students of linguistics and semantic/sentiment analysis professionals will find it invaluable.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this course: what you liked or disliked. Reader feedback is important to us, as it helps us develop titles that you will really get the most out of.