David Forsyth - Probability and Statistics for Computer Science

Part I
Describing Datasets
Springer International Publishing AG 2018
1. First Tools for Looking at Data
David Forsyth, Computer Science Department, University of Illinois at Urbana Champaign, Urbana, IL, USA
The single most important question for a working scientist (perhaps the single most useful question anyone can ask) is: what's going on here? Answering this question requires creative use of different ways to make pictures of datasets, to summarize them, and to expose whatever structure might be there. This is an activity that is sometimes known as Descriptive Statistics. There isn't any fixed recipe for understanding a dataset, but there is a rich variety of tools we can use to get insights.
1.1 Datasets
A dataset is a collection of descriptions of different instances of the same phenomenon. These descriptions could take a variety of forms, but it is important that they are descriptions of the same thing. For example, my grandfather collected the daily rainfall in his garden for many years; we could collect the height of each person in a room; or the number of children in each family on a block; or whether 10 classmates would prefer to be rich or famous. There could be more than one description recorded for each item. For example, when he recorded the contents of the rain gauge each morning, my grandfather could have recorded (say) the temperature and barometric pressure. As another example, one might record the height, weight, blood pressure and body temperature of every patient visiting a doctor's office.
The descriptions in a dataset can take a variety of forms. A description could be categorical, meaning that each data item can take a small set of prescribed values. For example, we might record whether each of 100 passers-by preferred to be Rich or Famous. As another example, we could record whether the passers-by are Male or Female. Categorical data could be ordinal, meaning that we can tell whether one data item is larger than another. For example, a dataset giving the number of children in a family for some set of families is categorical, because it uses only non-negative integers, but it is also ordinal, because we can tell whether one family is larger than another.
Some ordinal categorical data appears not to be numerical, but can be assigned a number in a reasonably sensible fashion. For example, many readers will recall being asked by a doctor to rate their pain on a scale of 1 to 10, a question that is usually relatively easy to answer, but is quite strange when you think about it carefully. As another example, we could ask a set of users to rate the usability of an interface in a range from very bad to very good, and then record that using -2 for very bad, -1 for bad, 0 for neutral, 1 for good, and 2 for very good.
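The usability example can be sketched in a few lines of Python. This is only an illustration: the dictionary name RATING_SCALE, the helper encode_ratings, and the list of responses are assumptions made up here, not part of the book.

```python
# Hypothetical mapping from text ratings to the integer codes described above.
RATING_SCALE = {
    "very bad": -2,
    "bad": -1,
    "neutral": 0,
    "good": 1,
    "very good": 2,
}


def encode_ratings(responses):
    """Map each text rating to its integer code, keeping the input order."""
    return [RATING_SCALE[r] for r in responses]


# Made-up responses, for illustration only.
responses = ["good", "very bad", "neutral", "very good", "bad"]
codes = encode_ratings(responses)
print(codes)  # [1, -2, 0, 2, -1]

# Because the data is ordinal, comparing two ratings is meaningful.
best = max(responses, key=RATING_SCALE.get)
print(best)  # very good
```

The codes preserve the ordering of the ratings, which is what makes the data ordinal; whether the gap between "bad" and "neutral" really equals the gap between "good" and "very good" is a separate question the encoding does not settle.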
Many interesting datasets involve continuous variables (like, for example, height or weight or body temperature), where you could reasonably expect to encounter any value in a particular range. For example, we might have the heights of all people in a particular room, or the rainfall at a particular place for each day of the year.
You should think of a dataset as a collection of d-tuples (a d-tuple is an ordered list of d elements). Tuples differ from vectors, because we can always add and subtract vectors, but we cannot necessarily add or subtract tuples. We will always write N for the number of tuples in the dataset, and d for the number of elements in each tuple. The number of elements will be the same for every tuple, though sometimes we may not know the value of some elements in some tuples (which means we must figure out how to predict their values, which we will do much later).
Each element of a tuple has its own type. Some elements might be categorical. For example, one dataset we shall see several times has entries for Gender; Grade; Age; Race; Urban/Rural; School; Goals; Grades; Sports; Looks; and Money for 478 children, so d = 11 and N = 478. In this dataset, each entry is categorical data. Clearly, these tuples are not vectors because one cannot add or subtract (say) Gender, or add Age to Grades.
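A small sketch may make N and d concrete. The records below are made up for illustration, and the field order (preference, height in cm, number of children) is an assumption, not the dataset the book describes.

```python
# Each record is a d-tuple; the values are invented for illustration.
# Assumed field order: (preference, height_cm, number_of_children)
dataset = [
    ("Rich", 172.5, 2),
    ("Famous", 160.0, 0),
    ("Rich", 181.2, 3),
    ("Famous", 168.4, 1),
]

N = len(dataset)     # number of tuples in the dataset
d = len(dataset[0])  # number of elements in each tuple

# Every tuple must have the same number of elements.
assert all(len(item) == d for item in dataset)

print(N, d)  # 4 3
```

The first field is categorical, so these tuples are not vectors: adding two records together has no sensible meaning, even though Python would happily concatenate them.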
Most of our data will be vectors. We use the same notation for a tuple and for a vector. We write a vector in bold, so x could represent a vector or a tuple (the context will make it obvious which is intended).
The entire data set is {x}. When we need to refer to the i-th data item, we write x_i. Assume we have N data items, and we wish to make a new dataset out of them; we write the dataset made out of these items as {x_i} (the i is to suggest you are taking a set of items and making a dataset out of them).
In this chapter, we will work mainly with continuous data. We will see a variety of methods for plotting and summarizing 1-tuples. We can build these plots from a dataset of d-tuples by extracting the r-th element of each d-tuple. All through the book, we will see many datasets downloaded from various web sources, because people are so generous about publishing interesting datasets on the web. In the next chapter, we will look at two-dimensional data, and we look at high dimensional data in Chap.
1.2 What's Happening? Plotting Data
The very simplest way to present or visualize a dataset is to produce a table. Tables can be helpful, but aren't much use for large datasets, because it is difficult to get any sense of what the data means from a table. As a continuous example, Table 1.1 gives a table of the net worth of a set of people you might meet in a bar (I made this data up). You can scan the table and have a rough sense of what is going on; net worths are quite close to $100,000, and there aren't any very big or very small numbers. This sort of information might be useful, for example, in choosing a bar.
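Extracting the r-th element of each d-tuple to obtain a dataset of 1-tuples might look like the sketch below. The function name extract_component and the sample records are hypothetical, and the index r is 0-based here, following Python's convention rather than the counting used in the text.

```python
def extract_component(dataset, r):
    """Return the r-th element (0-based) of every tuple in the dataset."""
    return [item[r] for item in dataset]


# Made-up records, for illustration: (height_cm, weight_kg, body_temp_c)
patients = [
    (172.5, 70.1, 36.8),
    (160.0, 55.3, 37.1),
    (181.2, 82.4, 36.6),
]

heights = extract_component(patients, 0)
print(heights)  # [172.5, 160.0, 181.2]
```

The result is a plain list of numbers, which is the form the plotting and summarizing methods for 1-tuples expect.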
Table 1.1
Index   Net worth
1       100,360
2       109,770
3       96,860
4       97,860
5       108,930
6       124,330
7       101,300
8       112,710
9       106,740
10      120,170

Taste score
12.3   34.9   20.9   57.2   0.7   47.9   25.9   5.6   54.9
25.9   40.9   37.3   15.9   21.9   6.4   18.1   38.9

On the left, net worths of people you meet in a bar, in US $; I made this data up, using some information from the US Census. The index column, which tells you which data item is being referred to, is usually not displayed in a table because you can usually assume that the first line is the first item, and so on. On the right, the taste score (I'm not making this up; higher is better) for 20 different cheeses. This data is real (i.e. not made up), and it comes from http://lib.stat.cmu.edu/DASL/Datafiles/Cheese.html
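The impression one gets from scanning the net worth column of Table 1.1 can be checked with a short sketch. The values are those in the table; the 25,000 tolerance used for "quite close to $100,000" is an assumption chosen here for illustration, not a threshold from the text.

```python
# Net worths from Table 1.1, in US $.
net_worths = [
    100360, 109770, 96860, 97860, 108930,
    124330, 101300, 112710, 106740, 120170,
]

smallest = min(net_worths)
largest = max(net_worths)

# Assumed tolerance: call a value "close" if it is within $25,000 of $100,000.
all_close = all(abs(w - 100_000) <= 25_000 for w in net_worths)

print(smallest, largest, all_close)  # 96860 124330 True
```

With only ten items the check adds little over reading the table, but the same lines work unchanged on a dataset far too large to scan by eye.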