
Janssens - Data Science at the Command Line: Facing the Future with Time-Tested Tools

Data Science at the Command Line: Facing the Future with Time-Tested Tools, by Jeroen Janssens. City: Sebastopol, CA; year: 2014 (copyright 2015); publisher: O'Reilly Media, Inc.; genre: Computer.


  • Book:
    Data Science at the Command Line: Facing the Future with Time-Tested Tools
  • Author:
    Jeroen Janssens
  • Publisher:
    O'Reilly Media, Inc.
  • Genre:
    Computer
  • Year:
    2014 (copyright 2015)
  • City:
    Sebastopol, CA
  • Rating:
    4 / 5

Data science at the command line facing the future with time-tested tools: summary, description and annotation


In this practical guide, you'll learn how to leverage the power of the command line for doing data science. By combining small, yet powerful, command-line tools, you can quickly obtain, scrub, explore, and model your data. Even if you're already comfortable processing data with R or Python, being able to integrate the command line into your existing workflow will make you a more efficient and productive data scientist.

Data Science at the Command Line

by Jeroen Janssens

Copyright 2015 Jeroen H.M. Janssens. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

  • Editors: Mike Loukides, Ann Spencer,
    and Marie Beaugureau
  • Production Editor: Matthew Hacker
  • Copyeditor: Kiel Van Horn
  • Proofreader: Jasmine Kwityn
  • Indexer: Wendy Catalano
  • Interior Designer: David Futato
  • Cover Designer: Ellie Volckhausen
  • Illustrator: Rebecca Demarest
  • October 2014: First Edition
Revision History for the First Edition
  • 2014-09-23: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491947852 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Data Science at the Command Line, the cover image of a wreathed hornbill, and related trade dress are trademarks of O'Reilly Media, Inc.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-94785-2

[LSI]

To my wife, Esther. Without her encouragement, support,
and patience, this book would surely have ended up in /dev/null.

Preface

Data science is an exciting field to work in. It's also still very young. Unfortunately, many people, and especially companies, believe that you need new technology in order to tackle the problems posed by data science. However, as this book demonstrates, many things can be accomplished by using the command line instead, and sometimes in a much more efficient way.

Around five years ago, during my PhD program, I gradually switched from using Microsoft Windows to GNU/Linux. Because it was a bit scary at first, I started with having both operating systems installed next to each other (known as dual-boot). The urge to switch back and forth between the two faded and at some point I was even tinkering around with Arch Linux, which allows you to build up your own custom operating system from scratch. All you're given is the command line, and it's up to you what you want to make of it. Out of necessity I quickly became comfortable using the command line. Eventually, as spare time got more precious, I settled down with a GNU/Linux distribution known as Ubuntu because of its ease of use and large community. Nevertheless, the command line is still where I'm getting most of my work done.

It actually wasn't too long ago that I realized that the command line is not just for installing software, system configuration, and searching files. I started learning about command-line tools such as cut, sort, and sed. These are examples of command-line tools that take data as input, do something to it, and print the result. Ubuntu comes with quite a few of them. Once I understood the potential of combining these small tools, I was hooked.
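To make the idea of combining small tools concrete, here is a minimal sketch of a pipeline built from the three tools just mentioned. The sample data and the variable names are made up for illustration and do not come from the book; each tool reads text, does one thing to it, and prints the result for the next tool.

```shell
# Hypothetical sample data (name,department,salary); not taken from the book.
data='name,department,salary
alice,engineering,120
bob,marketing,90
carol,engineering,110'

# sed deletes the header line, cut keeps only the second
# comma-separated column, and sort -u orders the values and
# drops duplicates.
departments=$(printf '%s\n' "$data" | sed '1d' | cut -d, -f2 | sort -u)

# Prints the distinct departments, one per line.
printf '%s\n' "$departments"
```

None of the three tools knows anything about the others; the pipe is the only thing connecting them, which is why any stage can be swapped out independently.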

After my PhD, when I became a data scientist, I wanted to use this approach to do data science as much as possible. Thanks to a couple of new, open source command-line tools including scrape, jq, and json2csv, I was even able to use the command line for tasks such as scraping websites and processing lots of JSON data. In September 2013, I decided to write a blog post titled below), I was able to do just that.

I'm sharing this personal story not so much because I think you should know how this book came about, but more because I want you to know that I had to learn about the command line as well. Because the command line is so different from using a graphical user interface, it can be intimidating at first. But if I can learn it, then you can as well. No matter what your current operating system is and no matter how you currently do data science, by the end of this book you will be able to also leverage the power of the command line. If you're already familiar with the command line, or even if you're already dreaming in shell scripts, chances are that you'll still discover a few interesting tricks or command-line tools to use for your next data science project.

What to Expect from This Book

In this book, we're going to obtain, scrub, explore, and model data, a lot of it. This book is not so much about how to become better at those data science tasks. There are already great resources available that discuss, for example, when to apply which statistical test or how data can be best visualized. Instead, this practical book aims to make you more efficient and more productive by teaching you how to perform those data science tasks at the command line.

While this book discusses over 80 command-line tools, it's not the tools themselves that matter most. Some command-line tools have been around for a very long time, while others are fairly new and might eventually be replaced by better ones. There are even command-line tools that are being created as you're reading this. In the past 10 months, I have discovered many amazing command-line tools. Unfortunately, some of them were discovered too late to be included in the book. In short, command-line tools come and go, and that's OK.

What matters most are the underlying ideas of working with tools, pipes, and data. Most of the command-line tools do one thing and do it well. This is part of the Unix philosophy, which makes several appearances throughout the book. Once you become familiar with the command line, and learn how to combine command-line tools, you will have developed an invaluable skill, and if you can create new tools, you'll be a cut above.
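As a rough illustration of what creating a new tool can look like, the sketch below wraps a small pipeline in a shell function so that it can be reused like any other tool. The name count_values and the sample input are hypothetical and not from the book; the function only assumes the standard sort, uniq, and sed utilities.

```shell
# count_values: hypothetical helper, not a tool from the book.
# Reads lines from standard input and prints each distinct line
# preceded by its count, most frequent first.
count_values() {
  sort |                 # group identical lines next to each other
    uniq -c |            # collapse each group into "count value"
    sed 's/^ *//' |      # strip the padding that uniq -c adds
    sort -k1,1nr -k2     # highest count first, ties ordered by value
}

# Usage: feed it any stream of lines through a pipe.
result=$(printf 'b\na\nb\nc\nb\na\n' | count_values)
printf '%s\n' "$result"
```

Because the function reads standard input and writes standard output, it composes with other tools through pipes in the same way the built-in ones do.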

How to Read This Book

In general, you're advised to read this book in a linear fashion. Once a concept or command-line tool has been introduced, chances are that we employ it in a later chapter. For example, in .

Data science is a broad field that intersects with many other fields, such as programming, data visualization, and machine learning. As a result, this book touches on many interesting topics that unfortunately cannot be discussed at full length. Throughout the book, there are suggestions for additional reading. It's not required to read this material in order to follow along with the book, but when you are interested, you can turn to these suggested readings as jumping-off points.

Who This Book Is For

This book makes just one assumption about you: that you work with data. It doesn't matter which programming language or statistical computing environment you're currently using. The book explains all the necessary concepts from the beginning.

It also doesn't matter whether your operating system is Microsoft Windows, Mac OS X, or some other form of Unix. The book comes with the Data Science Toolbox, which is an easy-to-install virtual environment. It allows you to run the command-line tools and follow along with the code examples in the same environment as this book was written. You don't have to waste time figuring out how to install all the command-line tools and their dependencies.
