David Mertz - Cleaning Data for Effective Data Science: Doing the other 80% of the work with Python, R, and command-line tools
- Book: Cleaning Data for Effective Data Science: Doing the other 80% of the work with Python, R, and command-line tools
- Author: David Mertz
- Publisher: Packt Publishing
- Genre: Computer
- Year: 2021
Cleaning Data for Effective Data Science: Doing the other 80% of the work with Python, R, and command-line tools: summary, description and annotation
A comprehensive guide for data scientists to master effective data cleaning tools and techniques
Key Features
- Master data cleaning techniques in a language-agnostic manner
- Learn from intriguing hands-on examples from numerous domains, such as biology, weather data, demographics, physics, time series, and image processing
- Work with detailed, commented, well-tested code samples in Python and R
It is something of a truism in data science, data analysis, or machine learning that most of the effort needed to achieve your actual purpose lies in cleaning your data. Written in David's signature friendly and humorous style, this book discusses in detail the essential steps performed in every production data science or data analysis pipeline and prepares you for data visualization and modeling results.
The book dives into the practical application of tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It also offers long-form exercises at the end of each chapter to practice the skills acquired.
You will begin by ingesting data from formats such as JSON, CSV, SQL RDBMSes, HDF5, NoSQL databases, image files, and binary serialized data structures. The book also provides numerous example data sets and data files, which are available for download and independent exploration.
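As a rough illustration of what ingesting two of these formats into one common shape can look like, here is a minimal sketch using only the Python standard library. The sample data, the field names (`station`, `temp_c`), and the helper functions are hypothetical and are not taken from the book.

```python
import csv
import io
import json

# Hypothetical sample data; the book's own data sets are not reproduced here.
CSV_TEXT = "station,temp_c\nA,21.5\nB,\nC,19.0\n"
JSON_TEXT = '[{"station": "D", "temp_c": 22.1}, {"station": "E", "temp_c": null}]'


def to_float(value):
    """Convert a raw field to float, mapping empty or null values to None."""
    if value is None or value == "":
        return None
    return float(value)


def read_csv_records(text):
    """Parse CSV text into a list of dicts with numeric temperatures."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"station": row["station"], "temp_c": to_float(row["temp_c"])}
        for row in reader
    ]


def read_json_records(text):
    """Parse a JSON array of objects into the same record shape."""
    return [
        {"station": obj["station"], "temp_c": to_float(obj["temp_c"])}
        for obj in json.loads(text)
    ]


# Both sources end up as one uniform list, with missing values marked as None.
records = read_csv_records(CSV_TEXT) + read_json_records(JSON_TEXT)
```

Normalizing missing values to a single marker at ingestion time (here `None`) is one possible design choice; it keeps later cleaning steps from having to know which format a record came from.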
Moving on from formats, you will impute missing values, detect unreliable data and statistical anomalies, and generate synthetic features that are necessary for successful data analysis and visualization goals.
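To make imputation and anomaly detection concrete, the following is a minimal sketch using only the Python standard library. It fills missing values with the median and flags outliers with a modified z-score based on the median absolute deviation. These particular techniques, the threshold of 3.5, and the sample readings are assumptions chosen for illustration, not the book's own method.

```python
import statistics

# Hypothetical sensor readings; None marks a missing value.
readings = [20.1, 19.8, None, 20.4, 85.0, 20.0, None, 19.9]


def impute_median(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold.

    The score uses the median absolute deviation (MAD), which is less
    sensitive to extreme values than the standard deviation.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # All deviations are zero at the median; nothing can be flagged.
        return [False] * len(values)
    return [abs(0.6745 * (v - median) / mad) > threshold for v in values]


imputed = impute_median(readings)
flags = flag_outliers(imputed)
```

With this sample, only the reading of 85.0 is flagged. The median is used for imputation because a single extreme value would pull a mean-based fill noticeably upward.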
By the end of this book, you will have acquired a firm understanding of the data cleaning process necessary to perform real-world data science and machine learning tasks.
What you will learn
- Identify problem data pertaining to individual data points
- Detect problem data in the systematic shape of the data
- Remediate data integrity and hygiene problems
- Prepare data for analytic and machine learning tasks
- Impute values into missing or unreliable data
- Generate synthetic features that are more amenable to data science, data analysis, or visualization goals
This book is designed to benefit software developers, data scientists, aspiring data scientists, and students who are interested in data analysis or scientific computing.
Basic familiarity with statistics, general concepts in machine learning, knowledge of a programming language (Python or R), and some exposure to data science are helpful. A glossary, references, and friendly asides should help bring all readers up to speed.
The text will also be helpful to intermediate and advanced data scientists who want to improve their rigor in data hygiene and wish for a refresher on data preparation issues.