Tolosana-Delgado Raimon - Analyzing Compositional Data with R

K. Gerald van den Boogaart and Raimon Tolosana-Delgado. Analyzing Compositional Data with R. Use R! series. Springer-Verlag Berlin Heidelberg, 2013. DOI: 10.1007/978-3-642-36809-7_1
1. Introduction
K. Gerald van den Boogaart (1) and Raimon Tolosana-Delgado (1)
(1) Helmholtz Institute Freiberg for Resources Technology, Freiberg, Germany
Abstract
Data describing amounts of components of specimens are compositional if the size of each specimen is constant or irrelevant. Ideally, compositional data are given as relative portions summing to 1 or 100 %. More often, however, compositional data appear disguised in several ways: different components might be reported in different physical units, different cases might sum to different totals, and the full set of relevant components is almost never reported. Nevertheless, the constraints of constant sum and relative meaning of the portions have important implications for their statistical analysis: they contradict the typical assumptions of the usual uni- and multivariate statistical methods and thus render the direct application of those methods spurious. A comprehensive statistical methodology, based on a vector space structure of the mathematical simplex, has only been developed very recently, and several software packages are now available to treat compositional data within it. This book is at the same time a textbook on compositional data analysis from a modern perspective and a sort of manual on the R package compositions; both R and compositions are available for download as free software. This chapter discusses the need for a dedicated statistical methodology for compositions, the historical background of compositional data analysis, and the software needs for compositional data analysis.
1.1 What Are Compositional Data?
1.1.1 Compositions Are Portions of a Total
Traditionally, a dataset has been called compositional if it provides portions of a total: percentages of workers in different sectors, portions of the chemical elements in a mineral, concentrations of different cell types in a patient's blood, portions of species in an ecosystem or in a trap, concentrations of nutrients in a beverage, portions of working time spent on different tasks, portions of types of failures, percentages of votes for political parties, etc.
The individual parts of the composition are called components. Each component has an amount, representing its importance within the whole. Amounts can be measured as absolute values, in amount-type physical quantities such as money, time, volume, mass, energy, molecules, individuals, and events. They can also be measured as portions with respect to a total, simply by reporting such a quantity divided by a common total.
The sum over the amounts of all components is called the total amount or, for short, the total. Portions are the individual amounts divided by this total amount. Depending on the unit chosen for the amounts, the actual portions of the parts in a total can be different. For instance, different numbers for mass percentage, volume percentage, and molar percentage will be obtained for the same physical system.
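The dependence of portions on the chosen unit can be illustrated with a minimal sketch. Python is used here purely for illustration (the book itself works with R); the helper name `close`, the water/ethanol mixture, and the rounded molar masses are assumptions introduced for this example, not material from the book.

```python
# Illustrative sketch: portions of the same mixture in mass and molar units.
# The mixture and the helper name `close` are assumptions for this example.

def close(amounts):
    """Divide each amount by the total so that the portions sum to 1."""
    total = sum(amounts)
    return [a / total for a in amounts]

# Hypothetical mixture: 50 g of water and 50 g of ethanol.
mass_g = {"water": 50.0, "ethanol": 50.0}
# Approximate molar masses in g/mol.
molar_mass = {"water": 18.015, "ethanol": 46.07}

names = list(mass_g)
mass_portions = close([mass_g[n] for n in names])

# Amount of substance (mol) = mass / molar mass.
moles = [mass_g[n] / molar_mass[n] for n in names]
molar_portions = close(moles)

for n, pm, pn in zip(names, mass_portions, molar_portions):
    print(f"{n}: mass portion {pm:.3f}, molar portion {pn:.3f}")
```

With equal mass portions, water accounts for roughly 72 % of the molecules in this hypothetical mixture, so the same physical system yields two different compositions.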
Most methods from multivariate statistics developed for real valued datasets are misleading or inapplicable for compositional datasets, for various reasons:
  • Independent components mixed together and closed exhibit negative correlations (Chayes).
    This negative bias contradicts the usual interpretations of correlation and covariance, where independence is usually related to zero correlation.
  • Covariance between two components depends on which other components are reported in the dataset.
    This disqualifies classical covariance-based tools as objective descriptions of the dependence between just two variables. This is particularly important whenever there is no single, unique, objective choice of the components to include in the composition, because each analyst could reasonably take a different set of components (including the two for which the covariance is computed) and each would get a different covariance estimate, possibly even with a different sign. The same comment applies to correlation: this is known as the spurious correlation problem.
  • Variance matrices are always singular due to the constant sum constraint.
    Many multivariate statistical methods rely on a full-rank variance matrix, like multivariate linear models, Mahalanobis distances, factor analysis, minimum determinant variance estimators, multivariate Z-transforms, multivariate densities, linear and quadratic discriminant analysis, and Hotelling's T² distribution. None will be directly applicable.
  • Components cannot be normally distributed, due to the bounded range of values.
    Many multivariate methods are at least motivated by multivariate normal distributions: covariance matrices, confidence ellipsoids, and principal component analysis are some examples. The normal model is a bad one for (untransformed) compositions, because it is a model unable to describe bounded data.
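The negative bias and the singular variance matrix described in the list above can be reproduced with a small simulation. This is only a sketch under stated assumptions: Python's standard library is used for illustration, the lognormal distribution of the raw amounts is an arbitrary choice, and the helper names `close`, `covariance`, and `correlation` are hypothetical.

```python
import random

def close(amounts):
    """Divide each amount by the row total so that the portions sum to 1."""
    total = sum(amounts)
    return [a / total for a in amounts]

def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation computed from the sample covariances."""
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5

random.seed(1)
n_cases, n_parts = 2000, 3

# Independent positive amounts (the lognormal law is an assumption).
amounts = [[random.lognormvariate(0.0, 0.5) for _ in range(n_parts)]
           for _ in range(n_cases)]
# Closure: each case is divided by its own total.
portions = [close(row) for row in amounts]

raw_cols = list(zip(*amounts))
por_cols = list(zip(*portions))

# Independent amounts are close to uncorrelated; closed portions are not.
raw_corr = correlation(raw_cols[0], raw_cols[1])
closed_corr = correlation(por_cols[0], por_cols[1])

# Every row of the covariance matrix of the portions sums to zero,
# so the vector of ones lies in its null space and the matrix is singular.
cov_matrix = [[covariance(a, b) for b in por_cols] for a in por_cols]
row_sums = [sum(row) for row in cov_matrix]

print("correlation of raw amounts:", round(raw_corr, 3))
print("correlation after closure: ", round(closed_corr, 3))
print("covariance row sums:       ", row_sums)
```

For three exchangeable parts, the zero row sums force the correlation between any two portions towards -1/2, even though the underlying amounts were generated independently.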
Compositions thus need their own set of statistical methods and should not be treated with statistical methods made for interval scale data. That early realization motivated Aitchison to introduce the field of compositional data analysis.
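The spurious correlation problem, in which the correlation between two components depends on which other components are reported, can also be sketched numerically. The following Python example is an illustration under assumptions: the three simulated amounts, their uniform ranges, and the helper names are hypothetical and chosen only to make the effect visible.

```python
import random

def close(amounts):
    """Divide each amount by the row total so that the portions sum to 1."""
    total = sum(amounts)
    return [a / total for a in amounts]

def correlation(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(2)
# Hypothetical amounts: parts A and B are nearly constant,
# part C varies strongly from case to case.
cases = [(random.uniform(9, 11), random.uniform(9, 11), random.uniform(1, 100))
         for _ in range(1000)]

# Analyst 1 reports all three parts; analyst 2 reports only A and B.
full = [close(row) for row in cases]
sub = [close(row[:2]) for row in cases]

corr_full = correlation([p[0] for p in full], [p[1] for p in full])
corr_sub = correlation([p[0] for p in sub], [p[1] for p in sub])

print("corr(A, B) with C reported:   ", round(corr_full, 3))
print("corr(A, B) without C reported:", round(corr_sub, 3))
```

In this construction the portions of A and B rise and fall together when C is included, because both shrink whenever C grows. Once C is dropped and the data are reclosed, the two remaining portions sum to 1 and are perfectly negatively correlated. The underlying amounts of A and B are identical in both analyses.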
1.1.2 Compositions Are Multivariate by Nature
It is quite evident that our dataset can only be compositional if it has at least two components. Otherwise, we cannot speak of a part in a total. Even in the case that we only report one part, we are in fact implicitly relating it either to a predefined total or to a complementary part. For instance, the votes obtained by the democrats/labor/social party in a district are uninformative without knowing how many votes the republicans/tories/conservatives got or how many votes were counted altogether; the number of grains of feldspar counted in a sediment must be interpreted relative to the grains of quartz or to the total number of grains in the counting procedure; the investment of a state in education is by itself meaningless without knowing either how large the total budget was or how much the state spends on defense.
That implies a substantial difference between compositional data and other multivariate datasets. Most multivariate analyses begin with a univariate analysis of the individual variables (the marginals), whereas each marginal variable of a compositional dataset has no meaning by itself, isolated from the rest.
1.1.3 The Total Sum of a Composition Is Irrelevant
Classically, compositions were defined as vectors of positive components with a constant sum. However, in a more recent perspective (Aitchison), a dataset of amounts of components of a common total can (and should) be modeled as compositional if the total amount does not matter for the problem under consideration. This may occur for several reasons:
  • When compositional data are actually measured, we often have little control over the total amount.
    A sample of a patient's blood, a sample of a geological formation, a national economy, or the workforce of a company: in any of these cases, the total size of each individual sample is either irrelevant (the country population, the size of the rock sample, or the enterprise) or predefined (the syringe volume).
  • Often the amounts are incompletely given and do not sum up to the real total.
    Some species of a biological system cannot be caught in traps; we never quantify all possible chemical elements in a rock; we cannot analyze for all possible ingredients of a beverage.
  • Most typically, the totals are not comparable between the different statistical individuals.
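The irrelevance of the total can be made concrete with a short sketch. It assumes, for illustration only, a hypothetical helper `close` written in Python and two made-up samples whose amounts differ by a constant factor.

```python
def close(amounts):
    """Divide each amount by the total so that the portions sum to 1."""
    total = sum(amounts)
    return [a / total for a in amounts]

# Two hypothetical samples of the same material, one ten times larger.
small_sample = [2.0, 3.0, 5.0]
large_sample = [20.0, 30.0, 50.0]

# After closure both samples describe the same composition.
same_composition = all(
    abs(p - q) < 1e-12
    for p, q in zip(close(small_sample), close(large_sample))
)

# If only the first two parts are reported, their portions change,
# but the ratio between them does not.
reported = close(small_sample[:2])
ratio_reported = reported[0] / reported[1]
ratio_original = small_sample[0] / small_sample[1]

print("same composition:", same_composition)
print("ratio of first two parts:", ratio_reported, ratio_original)
```

The sample size drops out when each case is divided by its own total, and the ratio between two reported parts is unaffected by parts that were never measured.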