
Jeroen Janssens - Data Science at the Command Line: Obtain, Scrub, Explore, and Model Data with Unix Power Tools


This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 80 tools, useful whether you work with Windows, macOS, or Linux. You'll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you're comfortable processing data with Python or R, you'll learn how to greatly improve your data science workflow by leveraging the command line's power. This book is ideal for data scientists, analysts, and engineers; software and machine learning engineers; and system administrators.

  • Obtain data from websites, APIs, databases, and spreadsheets
  • Perform scrub operations on text, CSV, HTML, XML, and JSON files
  • Explore data, compute descriptive statistics, and create visualizations
  • Manage your data science workflow
  • Create reusable command-line tools from one-liners and existing Python or R code
  • Parallelize and distribute data-intensive pipelines
  • Model data with dimensionality reduction, clustering, regression, and classification algorithms

Praise for Data Science at the Command Line

Traditional computer and data science curricula all too often mistake the command line as an obsolete relic instead of teaching it as the modern and vital toolset that it is. Only well into my career did I come to grasp the elegance and power of the command line for easily exploring messy datasets and even creating reproducible data pipelines for work. The first edition of Data Science at the Command Line was one of the most comprehensive and clear references when I was a novice in the art, and now with the second edition, I'm again learning new tools and applications from it.

Dan Nguyen, data scientist, former news application developer at ProPublica, and former Lorry I. Lokey Visiting Professor in Professional Journalism at Stanford University

The Unix philosophy of simple tools, each doing one job well, then cleverly piped together, is embodied by the command line. Jeroen expertly discusses how to bring that philosophy into your work in data science, illustrating how the command line is not only the world of file input/output, but also the world of data manipulation, exploration, and even modeling.

Chris H. Wiggins, associate professor in the department of applied physics and applied mathematics at Columbia University, and chief data scientist at The New York Times

This book explains how to integrate common data science tasks into a coherent workflow. It's not just about tactics for breaking down problems, it's also about strategies for assembling the pieces of the solution.

John D. Cook, consultant in applied mathematics, statistics, and technical computing

Despite what you may hear, most practical data science is still focused on interesting visualizations and insights derived from flat files. Jeroen's book leans into this reality, and helps reduce complexity for data practitioners by showing how time-tested command-line tools can be repurposed for data science.

Paige Bailey, principal product manager (code intelligence) at Microsoft, GitHub

It's amazing how fast so much data work can be performed at the command line before ever pulling the data into R, Python, or a database. Older technologies like sed and awk are still incredibly powerful and versatile. Until I read Data Science at the Command Line, I had only heard of these tools but never saw their full power. Thanks to Jeroen, it's like I now have a secret weapon for working with large data.

Jared Lander, chief data scientist at Lander Analytics, organizer of the New York Open Statistical Programming Meetup, and author of R for Everyone

The command line is an essential tool in every data scientist's toolbox, and knowing it well makes it easy to translate questions you have of your data to real-time insights. Jeroen not only explains the basic Unix philosophy of how to chain together single-purpose tools to arrive at simple solutions for complex problems, but also introduces new command-line tools for data cleaning, analysis, visualization, and modeling.

Jake Hofman, senior principal researcher at Microsoft Research, and adjunct assistant professor in the department of applied mathematics at Columbia University

Data Science at the Command Line

by Jeroen Janssens

Copyright 2021 Jeroen Janssens. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

  • Acquisitions Editor: Jessica Haberman
  • Development Editor: Sarah Grey
  • Production Editor: Kate Galloway
  • Copyeditor: Arthur Johnson
  • Proofreader: Shannon Turlington
  • Indexer: nSight, Inc.
  • Interior Designer: David Futato
  • Cover Designer: Karen Montgomery
  • Illustrator: Kate Dullea
  • October 2014: First Edition
  • August 2021: Second Edition
Revision History for the Second Edition
  • 2021-08-17: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781492087915 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Data Science at the Command Line, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

Data Science at the Command Line is available under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. The author maintains an online version at https://github.com/jeroenjanssens/data-science-at-the-command-line.

978-1-492-08791-5

[LSI]

Once again to my wife, Esther. Without her continued encouragement, support,
and patience, this second edition would surely have ended up in /dev/null.

Foreword

It was love at first sight.

It must have been around 1981 or 1982 that I got my first taste of Unix. Its command-line shell, which uses the same language for single commands and complex programs, changed my world, and I never looked back.

I was a writer who had discovered the joys of computing, and regular expressions were my gateway drug. I'd first tried them in the text editor in HP's RTE operating system, but it was only when I came to Unix and its philosophy of small cooperating tools with the command-line shell as the glue that tied them together that I fully understood their power. Regular expressions in ed, ex, vi (now vim), and emacs were powerful, sure, but it wasn't until I saw how ex scripts unbound became sed, the Unix stream editor, and then AWK, which allowed you to bind programmed actions to regular expressions, and how shell scripts let you build pipelines not only out of the existing tools but out of new ones you'd written yourself, that I really got it. Programming is how you speak with computers, how you tell them what you want them to do, not just once, but in ways that persist, in ways that can be varied like human language, with repeatable structure but different verbs and objects.

As a beginner, other forms of programming seemed more like recipes to be followed exactly (careful incantations where you had to get everything right) or like waiting for a teacher to grade an essay you'd written. With shell programming, there was no compilation and waiting. It was more like a conversation with a friend. When the friend didn't understand, you could easily try again. What's more, if you had something simple to say, you could just say it with one word. And there were already words for a whole lot of the things you might want to say. But if there weren't, you could easily make up new words. And you could string together the words you learned and the words you made up into gradually more complex sentences, paragraphs, and eventually get to persuasive essays.
