Forsyth - Probability and Statistics for Computer Science

Book:
Probability and Statistics for Computer Science
Author:
Forsyth / David
Publisher:
Springer International Publishing : Imprint : Springer
Genre:
Books / Children
Year:
2018
City:
Cham

Probability and Statistics for Computer Science: summary, description and annotation

This textbook is aimed at computer science undergraduates late in sophomore or early in junior year, supplying a comprehensive background in qualitative and quantitative data analysis, probability, random variables, and statistical methods, including machine learning. With careful treatment of topics that fill the curricular needs for the course, Probability and Statistics for Computer Science features:

A treatment of random variables and expectations dealing primarily with the discrete case.

A practical treatment of simulation, showing how many interesting probabilities and expectations can be extracted, with particular emphasis on Markov chains.

A clear but crisp account of simple point inference strategies (maximum likelihood; Bayesian inference) in simple contexts. This is extended to cover some confidence intervals, samples and populations for random sampling with replacement, and the simplest hypothesis testing.

A chapter dealing with classification, explaining why it's useful; how to train SVM classifiers with stochastic gradient descent; and how to use implementations of more advanced methods such as random forests and nearest neighbors.

A chapter dealing with regression, explaining how to set up, use and understand linear regression and nearest neighbors regression in practical problems.

A chapter dealing with principal components analysis, developing intuition carefully, and including numerous practical examples. There is a brief description of multivariate scaling via principal coordinate analysis.

A chapter dealing with clustering via agglomerative methods and k-means, showing how to build vector quantized features for complex signals.

Illustrated throughout, each main chapter includes many worked examples and other pedagogical elements such as boxed Procedures, Definitions, Useful Facts, and Remember This (short tips). Problems and Programming Exercises are at the end of each chapter, with a summary of what the reader should know. Instructor resources include a full set of model solutions for all problems, and an Instructor's Manual with accompanying presentation slides.

Contents: 1 Notation and conventions -- 2 First Tools for Looking at Data -- 3 Looking at Relationships -- 4 Basic ideas in probability -- 5 Random Variables and Expectations -- 6 Useful Probability Distributions -- 7 Samples and Populations -- 8 The Significance of Evidence -- 9 Experiments -- 10 Inferring Probability Models from Data -- 11 Extracting Important Relationships in High Dimensions -- 12 Learning to Classify -- 13 Clustering: Models of High Dimensional Data -- 14 Regression -- 15 Markov Chains and Hidden Markov Models -- 16 Resources.

Part I
Describing Datasets

© Springer International Publishing AG 2018

David Forsyth Probability and Statistics for Computer Science

1. First Tools for Looking at Data

David Forsyth
Computer Science Department, University of Illinois at Urbana-Champaign, Urbana, IL, USA

The single most important question for a working scientist, and perhaps the single most useful question anyone can ask, is: what's going on here? Answering this question requires creative use of different ways to make pictures of datasets, to summarize them, and to expose whatever structure might be there. This activity is sometimes known as Descriptive Statistics. There isn't any fixed recipe for understanding a dataset, but there is a rich variety of tools we can use to get insights.

1.1 Datasets

A dataset is a collection of descriptions of different instances of the same phenomenon. These descriptions could take a variety of forms, but it is important that they are descriptions of the same thing. For example, my grandfather collected the daily rainfall in his garden for many years; we could collect the height of each person in a room; or the number of children in each family on a block; or whether 10 classmates would prefer to be rich or famous. There could be more than one description recorded for each item. For example, when he recorded the contents of the rain gauge each morning, my grandfather could have recorded (say) the temperature and barometric pressure. As another example, one might record the height, weight, blood pressure and body temperature of every patient visiting a doctor's office.

The descriptions in a dataset can take a variety of forms. A description could be categorical, meaning that each data item can take a small set of prescribed values. For example, we might record whether each of 100 passers-by preferred to be "Rich" or "Famous". As another example, we could record whether the passers-by are "Male" or "Female". Categorical data could be ordinal, meaning that we can tell whether one data item is larger than another. For example, a dataset giving the number of children in a family for some set of families is categorical, because it uses only non-negative integers, but it is also ordinal, because we can tell whether one family is larger than another.

Some ordinal categorical data appears not to be numerical, but can be assigned a number in a reasonably sensible fashion. For example, many readers will recall being asked by a doctor to rate their pain on a scale of 1-10, a question that is usually relatively easy to answer, but is quite strange when you think about it carefully. As another example, we could ask a set of users to rate the usability of an interface in a range from "very bad" to "very good", and then record that using -2 for "very bad", -1 for "bad", 0 for "neutral", 1 for "good", and 2 for "very good".
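The usability example can be sketched in a few lines of Python, assuming the scale runs from -2 for "very bad" to 2 for "very good". The dictionary name RATING_SCORES, the helper encode_ratings, and the sample responses are all hypothetical, chosen only for illustration.

```python
# Hypothetical mapping from usability ratings to ordinal scores.
RATING_SCORES = {
    "very bad": -2,
    "bad": -1,
    "neutral": 0,
    "good": 1,
    "very good": 2,
}


def encode_ratings(ratings):
    """Return the ordinal score for each rating label."""
    return [RATING_SCORES[r] for r in ratings]


# Made-up responses, for illustration only.
responses = ["good", "very bad", "neutral", "very good"]
scores = encode_ratings(responses)
print(scores)  # [1, -2, 0, 2]

# Because the scores are ordinal, comparing them is meaningful.
best = max(responses, key=RATING_SCORES.get)
print(best)  # very good
```

The numbers only encode an ordering here; whether the gap between "bad" and "neutral" equals the gap between "good" and "very good" is a separate modelling assumption.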

Many interesting datasets involve continuous variables (like, for example, height or weight or body temperature), where you could reasonably expect to encounter any value in a particular range. For example, we might have the heights of all people in a particular room, or the rainfall at a particular place for each day of the year.

You should think of a dataset as a collection of d-tuples (a d-tuple is an ordered list of d elements). Tuples differ from vectors, because we can always add and subtract vectors, but we cannot necessarily add or subtract tuples. We will always write N for the number of tuples in the dataset, and d for the number of elements in each tuple. The number of elements will be the same for every tuple, though sometimes we may not know the value of some elements in some tuples (which means we must figure out how to predict their values, which we will do much later).

Each element of a tuple has its own type. Some elements might be categorical. For example, one dataset we shall see several times has entries for Gender; Grade; Age; Race; Urban/Rural; School; Goals; Grades; Sports; Looks; and Money for 478 children, so d = 11 and N = 478. In this dataset, each entry is categorical data. Clearly, these tuples are not vectors because one cannot add or subtract (say) Gender, or add Age to Grades.
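As a rough sketch of this view, a dataset can be held as a Python list of N tuples, each with d elements. The Child type below uses only three invented fields rather than the eleven listed above, and every value is made up.

```python
from typing import NamedTuple


class Child(NamedTuple):
    # Hypothetical subset of fields; the values below are invented.
    gender: str
    grade: int
    urban_rural: str


dataset = [
    Child("girl", 5, "Urban"),
    Child("boy", 6, "Rural"),
    Child("girl", 4, "Suburban"),
]

N = len(dataset)     # number of tuples in the dataset
d = len(dataset[0])  # number of elements in each tuple
assert all(len(item) == d for item in dataset)
print(N, d)  # 3 3

# Python's + concatenates tuples instead of adding them element-wise,
# which echoes the point that these items are not vectors.
combined = dataset[0] + dataset[1]
print(len(combined))  # 6
```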

Most of our data will be vectors. We use the same notation for a tuple and for a vector. We write a vector in bold, so x could represent a vector or a tuple (the context will make it obvious which is intended).

The entire data set is {x}. When we need to refer to the i-th data item, we write x_i. Assume we have N data items, and we wish to make a new dataset out of them; we write the dataset made out of these items as {x_i} (the i is to suggest you are taking a set of items and making a dataset out of them).

In this chapter, we will work mainly with continuous data. We will see a variety of methods for plotting and summarizing 1-tuples. We can build these plots from a dataset of d-tuples by extracting the r-th element of each d-tuple. All through the book, we will see many datasets downloaded from various web sources, because people are so generous about publishing interesting datasets on the web. In the next chapter, we will look at two-dimensional data, and we look at high dimensional data in Chap.
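Extracting one element from every tuple might look like the sketch below. The function name extract_element and the patient records are hypothetical, and the index r is 0-based here to follow Python's convention, which is an assumption about numbering rather than something the text specifies.

```python
def extract_element(dataset, r):
    """Return the r-th element (0-based) of every tuple as a list."""
    return [item[r] for item in dataset]


# Invented (height in cm, weight in kg, body temperature in C) records.
patients = [
    (170.0, 65.5, 36.8),
    (182.5, 80.1, 37.0),
    (158.2, 54.3, 36.6),
]

heights = extract_element(patients, 0)
print(heights)  # [170.0, 182.5, 158.2]
```

The resulting list is a dataset of 1-tuples in all but name, ready to be plotted or summarized.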

1.2 What's Happening? Plotting Data

The very simplest way to present or visualize a dataset is to produce a table. Tables can be helpful, but aren't much use for large datasets, because it is difficult to get any sense of what the data means from a table. As a continuous example, Table 1.1 gives a table of the net worth of a set of people you might meet in a bar (I made this data up). You can scan the table and have a rough sense of what is going on; net worths are quite close to $100,000, and there aren't any very big or very small numbers. This sort of information might be useful, for example, in choosing a bar.

Table 1.1

On the left, net worths of people you meet in a bar, in US $; I made this data up, using some information from the US Census. The index column, which tells you which data item is being referred to, is usually not displayed in a table because you can usually assume that the first line is the first item, and so on. On the right, the taste score (I'm not making this up; higher is better) for 20 different cheeses. This data is real (i.e. not made up), and it comes from http://lib.stat.cmu.edu/DASL/Datafiles/Cheese.html

Index	Net worth
1	100,360
2	109,770
3	96,860
4	97,860
5	108,930
6	124,330
7	101,300
8	112,710
9	106,740
10	120,170

Index	Taste score	Index	Taste score
1	12.3	11	34.9
2	20.9	12	57.2
3		13	0.7
4	47.9	14	25.9
5	5.6	15	54.9
6	25.9	16	40.9
7	37.3	17	15.9
8	21.9	18	6.4
9	18.1	19	
10		20	38.9
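The impression you get from scanning the net worth column of Table 1.1 can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative, and the values are the ten net worths from the table.

```python
# Net worths from Table 1.1, in US $.
net_worths = [100360, 109770, 96860, 97860, 108930,
              124330, 101300, 112710, 106740, 120170]

smallest = min(net_worths)
largest = max(net_worths)
mean = sum(net_worths) / len(net_worths)

print(smallest, largest, mean)  # 96860 124330 107903.0
```

Every value lies between roughly $97,000 and $124,000, which agrees with the observation that net worths are quite close to $100,000 with no very big or very small numbers.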
