Appendix A. List of Command-Line Tools
This is an overview of all the command-line tools discussed in this book. This includes binary executables, interpreted scripts, and Bash builtins and keywords. For each command-line tool, the following information, when available and appropriate, is provided:
The actual command to type at the command line
A description
The name of the package it belongs to
The version used in the book
The year that version was released
The primary author(s)
A website to find more information
How to install it
How to obtain help
An example usage
All command-line tools listed here are included in the Data Science Toolbox for Data Science at the Command Line. See for instructions on how to set it up. The install commands assume that you're running Ubuntu 14.04. Please note that citing open source software is not trivial, and that some information may be missing or incorrect.
alias
Define or display aliases. Alias is a Bash builtin.
$ help alias
$ alias ll='ls -alF'
awk
Pattern scanning and text processing language. Mawk (version 1.3.3) by Mike Brennan (1994). http://invisible-island.net/mawk.
$ sudo apt-get install mawk
$ man awk
$ seq 5 | awk '{sum+=$1} END {print sum}'
15
aws
Manage AWS Services such as EC2 and S3 from the command line. AWS Command Line Interface (version 1.3.24) by Amazon Web Services (2014). http://aws.amazon.com/cli.
$ sudo pip install awscli
$ aws help
$ aws ec2 describe-regions | head -n 5
{
    "Regions": [
        {
            "Endpoint": "ec2.eu-west-1.amazonaws.com",
            "RegionName": "eu-west-1"
bash
GNU Bourne-Again SHell. Bash (version 4.3) by Brian Fox and Chet Ramey (2010). http://www.gnu.org/software/bash.
$ sudo apt-get install bash
$ man bash
bc
Evaluate equation from standard input. Bc (version 1.06.95) by Philip A. Nelson (2006). http://www.gnu.org/software/bc.
$ sudo apt-get install bc
$ man bc
$ echo 'e(1)' | bc -l
2.71828182845904523536
bigmler
Access BigML's prediction API. BigMLer (version 1.12.2) by BigML (2014). http://bigmler.readthedocs.org.
$ sudo pip install bigmler
$ bigmler --help
body
Apply an expression to all but the first line. Useful if you want to apply classic command-line tools to CSV files with a header. Body by Jeroen H.M. Janssens (2014). https://github.com/jeroenjanssens/data-science-at-the-command-line.
$ git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
$ echo -e "value\n7\n2\n5\n3" | body sort -n
value
2
3
5
7
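The idea behind body can be illustrated with a small shell function. This is a simplified sketch of the technique, not the actual script from the repository; the function name body_sketch is hypothetical. It reads the first line from standard input, prints it unchanged, and then lets the given command process whatever input remains.

```shell
# Hypothetical sketch: pass the header line through untouched,
# then run the given command on the remaining lines.
body_sketch() {
    IFS= read -r header          # consume only the first line of standard input
    printf '%s\n' "$header"      # print the header as-is
    "$@"                         # the command reads the rest of standard input
}

printf 'value\n7\n2\n5\n3\n' | body_sketch sort -n
# value
# 2
# 3
# 5
# 7
```

Because the header is printed before the command runs, it stays on top regardless of how the command reorders or filters the remaining lines.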
cat
Concatenate files and standard input, and print on standard output. Cat (version 8.21) by Torbjorn Granlund and Richard M. Stallman (2012). http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man cat
$ cat results-01 results-02 results-03 > results-all
cd
Change the shell working directory. Cd is a Bash builtin.
$ help cd
$ cd ~; pwd; cd ..; pwd
/home/vagrant
/home
chmod
Change file mode bits. We use it to make our command-line tools executable. Chmod (version 8.21) by David MacKenzie and Jim Meyering (2012). http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man chmod
$ chmod u+x experiment.sh
cols
Apply a command to a subset of the columns and merge the result back with the remaining columns. Cols by Jeroen H.M. Janssens (2014). https://github.com/jeroenjanssens/data-science-at-the-command-line.
$ git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
$ < iris.csv cols -C species body tapkee --method pca | header -r x,y,species
cowsay
Generate an ASCII picture of a cow with a message. Useful for when building up a particular pipeline is starting to frustrate you a bit too much. Cowsay (version 3.03+dfsg1) by Tony Monroe (1999).
$ sudo apt-get install cowsay
$ man cowsay
$ echo 'The command line is awesome!' | cowsay
 ______________________________
< The command line is awesome! >
 ------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
cp
Copy files and directories. Cp (version 8.21) by Torbjorn Granlund, David MacKenzie, and Jim Meyering (2012). http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man cp
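A short usage sketch for cp follows. The file and directory names (data.csv, results, and so on) are hypothetical and created in a temporary directory purely for illustration.

```shell
# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d)
printf 'a,b\n1,2\n' > "$workdir/data.csv"

# Copy a single file under a new name.
cp "$workdir/data.csv" "$workdir/data-backup.csv"

# Copy a file into a directory, keeping its name.
mkdir "$workdir/results"
cp "$workdir/data.csv" "$workdir/results/"

# Copy a directory and its contents recursively.
cp -r "$workdir/results" "$workdir/results-copy"
ls "$workdir/results-copy"
```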
csvcut
Extract columns from CSV data. Like the cut command-line tool, but for tabular data. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvcut --help
csvgrep
Filter tabular data to only those rows where certain columns contain a given value or match a regular expression. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvgrep --help
csvjoin
Merge two or more CSV tables together using a method analogous to a SQL JOIN operation. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvjoin --help
csvlook
Render a CSV file to the command line in a readable, fixed-width format. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvlook --help
$ echo -e "a,b\n1,2\n3,4" | csvlook
|----+----|
|  a | b  |
|----+----|
|  1 | 2  |
|  3 | 4  |
|----+----|
csvsort
Sort CSV files. Like the sort command-line tool, but for tabular data. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvsort --help
csvsql
Execute SQL queries directly on CSV data or insert CSV into a database. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvsql --help
csvstack
Stack up the rows from multiple CSV files, optionally adding a grouping value to each row. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvstack --help
csvstat
Print descriptive statistics for all columns in a CSV file. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvstat --help
curl
Download data from a URL. cURL (version 7.35.0) by Daniel Stenberg (2012). http://curl.haxx.se.
$ sudo apt-get install curl
$ man curl
curlicue
Perform OAuth dance for curl. Curlicue by Decklin Foster (2014). https://github.com/decklin/curlicue.
$ git clone https://github.com/decklin/curlicue.git
cut
Remove sections from each line of files. Cut (version 8.21) by David M. Ihnat, David MacKenzie, and Jim Meyering (2012). http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man cut
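As a usage sketch, cut can select delimited fields or character positions from each line. The sample data below is made up for illustration. Note that cut splits on every delimiter character and does not understand CSV quoting, so quoted fields that contain commas call for a CSV-aware tool such as csvcut.

```shell
# Keep the first and third comma-separated fields of each line.
printf 'id,name,score\n1,alice,90\n2,bob,85\n' | cut -d, -f1,3
# id,score
# 1,90
# 2,85

# Keep the first seven characters of each line.
echo 'command line' | cut -c1-7
# command
```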
display
Display an image or image sequence on any X server. Can read image data from standard input. Display (version 8:6.7.7.10) by ImageMagick Studio LLC (2009). http://www.imagemagick.org.
$ sudo apt-get install imagemagick
$ man display