Copyright
Elsevier
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands
First edition 2012
Copyright 2012 by Elsevier Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher.
Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: permissions@elsevier.com. Alternatively, you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions and selecting "Obtaining permission to use Elsevier material".
Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
Menke, William.
Environmental data analysis with MatLab / William Menke, Joshua Menke.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-12-391886-4 (alk. paper)
1. Environmental sciences--Mathematical models. 2. Environmental sciences--Data processing. 3. MATLAB. I. Menke, Joshua E. (Joshua Ephraim), 1976- II. Title.
GE45.M37M46 2012
363.7001'5118--dc22
2011014689
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
For information on all Elsevier publications
visit our web site at elsevierdirect.com
Printed and bound in USA
12 13 14 10 9 8 7 6 5 4 3 2 1
ISBN: 978-0-12-391886-4
Preface
The first question that I ask an environmental science student who comes seeking my advice on a data analysis problem is "Have you looked at your data?" Very often, after some beating around the bush, the student answers, "Not really." The student goes on to explain that he or she loaded the data into an analysis package provided by an advisor, or downloaded off the web, and it didn't work. The student tells me, "Something is wrong with my data!" I then ask my second question: "Have you ever used the analysis package with a dataset that did work?" After some further beating around the bush, the student answers, "Not really." At this point, I offer two pieces of advice. The first is to spend time getting familiar with the dataset. Taking into account what the student has been able to tell me, I outline a series of plots, histograms, and tables that will help him or her prise out its general character and some of its nuances. Second, I urge the student to create several simulated datasets with properties similar to those expected of the data and run the data analysis package on them. The student needs to make sure that he or she is operating it correctly and that it returns the right answers. We then make an appointment to review the student's progress in a week or two. Very often the student comes back reporting, "The problem wasn't at all what I thought it was!" Then the real work begins, either to solve the problem or, if the student has already solved it (which often he or she has), to get on with the data analysis.
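The second piece of advice can be made concrete with a short script. What follows is only a minimal sketch, not one of the book's companion scripts: the straight-line form of the simulated data, the variable names, and all numerical values are assumptions made for illustration, and the built-in MatLab function polyfit() stands in for whatever analysis package is being checked.

```matlab
% Simulated dataset with a known answer: a straight line plus random noise.
% All names and numerical values are hypothetical, chosen for illustration.
N = 200;                        % number of simulated observations
t = linspace(0, 10, N)';        % auxiliary variable, for example, time
atrue = 2.0;                    % true intercept
btrue = 0.5;                    % true slope
sd = 0.1;                       % standard deviation of the simulated noise
d = atrue + btrue*t + sd*randn(N,1);   % simulated data

% "Analysis package" under test: here, a straight-line fit with polyfit()
p = polyfit(t, d, 1);           % p(1) is the slope, p(2) is the intercept
best = p(1);
aest = p(2);

% Compare the estimates with the known truth
fprintf('slope:     true %.3f, estimated %.3f\n', btrue, best);
fprintf('intercept: true %.3f, estimated %.3f\n', atrue, aest);
```

If the estimates do not land close to the true values, the trouble lies in how the package is being operated and not in the data, because the simulated data are correct by construction.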
Environmental Data Analysis with MatLab is organized around two principles. The first is that real proficiency in data analysis requires analyzing realistic data on a computer, and not merely working through ultra-simplified examples with pencil and paper. The second is that the skills needed to perform data analysis are best learned in a series of steps that alternate between theory and application and that start simple but rapidly expand as one's toolkit of skills grows. The real world puts many impediments in the way of analyzing data: errors of all sorts, missing information, inconvenient units of measurement, inscrutable data formats, and more. Consequently, real proficiency is as much about confidence and experience as it is about formal knowledge of techniques. This book teaches a core set of techniques that are widely applicable across all of Environmental Science, and it reinforces them by leading the student through a series of case studies on real-world data that has both the richness and the blemishes inherent in real-world things.
Two fundamental themes are used to tie together many different data analysis techniques:
The first is that measurement error is a fundamental aspect of observation and experiment. Error has a profound influence on the way that knowledge is distilled from data. We use probability theory to develop the concept of covariance, the key tool for quantifying error. We then show how covariance propagates through a chain of calculations leading to a result that possesses uncertainty. Dealing with that uncertainty is as important a part of data analysis as arriving at the result itself. From the chapter where it is introduced through the book's end, we are always returning to the idea of the propagation of error.
The second is that many problems are special cases of a linear model linking the observations to the knowledge that we aspire to derive from them. Measurements of the world around us create data, numbers that describe the results of observations and experiments. But measurements, in and of themselves, are of little utility. The purpose of data analysis is to distill them down to a few significant and insightful model parameters. We develop the idea of the linear model and, in subsequent chapters, show that very many seemingly different data analysis techniques are special cases of it. These include curve fitting, Fourier analysis, filtering, factor analysis, empirical function analysis, and interpolation. While their uses are varied, they all share a common structure, which, when recognized, makes understanding them easier. Most important, covariance propagates through them in nearly identical ways.
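The two themes meet in a calculation as small as fitting a straight line. The sketch below is illustrative only and rests on several assumptions that are not taken from the text: the data are simulated, their errors are uncorrelated with equal variance, and the names G, mest, Cd, and Cm are chosen here for convenience. It writes the problem as a linear model d = G*m, estimates the model parameters by least squares, and then propagates the covariance of the data through the same linear operation to obtain the covariance of the estimate.

```matlab
% Linear model d = G*m for a straight line, d_i = m1 + m2*t_i.
% Values are hypothetical; errors are assumed uncorrelated with equal variance.
N = 50;                         % number of observations
t = linspace(0, 10, N)';        % auxiliary variable
G = [ones(N,1), t];             % one column of G per model parameter
mtrue = [1.0; 2.0];             % true intercept and slope
sd = 0.5;                       % standard deviation of the data
d = G*mtrue + sd*randn(N,1);    % simulated observations

% Least-squares estimate of the model parameters
mest = (G'*G) \ (G'*d);

% Error propagation: mest = M*d with M = inv(G'*G)*G', so Cm = M*Cd*M',
% which reduces to sd^2*inv(G'*G) when Cd = sd^2*I
Cd = sd^2 * eye(N);             % covariance of the data
M = (G'*G) \ G';                % matrix that maps data to estimates
Cm = M * Cd * M';               % covariance of the estimated parameters
sm = sqrt(diag(Cm));            % standard deviation of each parameter

fprintf('m1 = %.3f +/- %.3f (2 sigma)\n', mest(1), 2*sm(1));
fprintf('m2 = %.3f +/- %.3f (2 sigma)\n', mest(2), 2*sm(2));
```

Only the matrix G is specific to the straight line. Changing its columns changes the problem being solved, while the estimation and error-propagation steps stay the same.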
As the title of this book implies, it relies very heavily on MatLab to connect the theory of data analysis to its practice in the real world. MatLab, a commercial product of The MathWorks, Inc., is a popular scientific computing environment that fully supports data analysis, data visualization, and data file manipulation. It includes a scripting language through which complicated data analysis procedures can be developed, tested, performed, and archived. Environmental Data Analysis with MatLab makes use of scripts in three ways. First, the text includes many short scripts and excerpts from scripts that illustrate how particular data analysis procedures are actually performed. Second, a set of complete scripts and accompanying datasets is provided as a companion to the book. They implement all of the book's figures and case studies. Third, each chapter includes recommended homework problems that further develop the case studies. They require existing scripts to be modified and new scripts to be written.