
Modern Multivariate Statistical Techniques

Springer Texts in Statistics
Series Editors: G. Casella, S. Fienberg, I. Olkin
For other titles published in this series, go to www.springer.com/series/417
Alan Julian Izenman
Modern Multivariate Statistical Techniques
Regression, Classification, and Manifold Learning
Alan J. Izenman
Department of Statistics
Temple University
Speakman Hall
Philadelphia, PA 19122
USA
alan@temple.edu

Editorial Board

George Casella
Department of Statistics
University of Florida
Gainesville, FL 32611-8545
USA

Stephen Fienberg
Department of Statistics
Carnegie Mellon University
Pittsburgh, PA 15213-3890
USA

Ingram Olkin
Department of Statistics
Stanford University
Stanford, CA 94305
USA

ISSN: 1431-875X
ISBN: 978-0-387-78188-4
e-ISBN: 978-0-387-78189-1
DOI: 10.1007/978-0-387-78189-1
Library of Congress Control Number: 2008928720

© 2008 Springer Science+Business Media, LLC

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

springer.com
This book is dedicated to the memory of my parents, Kitty and Larry, and to my family, Betty-Ann and Kayla
Preface

Not so long ago, multivariate analysis consisted solely of linear methods illustrated on small to medium-sized data sets. Moreover, statistical computing meant primarily batch processing (often using boxes of punched cards) carried out on a mainframe computer at a remote computer facility. During the 1970s, interactive computing was just beginning to raise its head, and exploratory data analysis was a new idea. In the decades since then, we have witnessed a number of remarkable developments in local computing power and data storage. Huge quantities of data are being collected, stored, and efficiently managed, and interactive statistical software packages enable sophisticated data analyses to be carried out effortlessly. These advances enabled new disciplines called data mining and machine learning to be created and developed by researchers in computer science and statistics.

As enormous data sets become the norm rather than the exception, statistics as a scientific discipline is changing to keep up with this development. Instead of the traditional heavy reliance on hypothesis testing, attention is now being focused on information or knowledge discovery. Accordingly, some of the recent advances in multivariate analysis include techniques from computer science, artificial intelligence, and machine learning theory. Many of these new techniques are still in their infancy, waiting for statistical theory to catch up.

The origins of some of these techniques are purely algorithmic, whereas the more traditional techniques were derived through modeling, optimization, or probabilistic reasoning. As such algorithmic techniques mature, it becomes necessary to build a solid statistical framework within which to embed them. In some instances, it may not be at all obvious why a particular technique (such as a complex algorithm) works as well as it does:

When new ideas are being developed, the most fruitful approach is often to let rigor rest for a while, and let intuition reign — at least in the beginning. New methods may require new concepts and new approaches, in extreme cases even a new language, and it may then be impossible to describe such ideas precisely in the old language.
Inge S. Helland, 2000

It is hoped that this book will be enjoyed by those who wish to understand the current state of multivariate statistical analysis in an age of high-speed computation and large data sets. This book mixes new algorithmic techniques for analyzing large multivariate data sets with some of the more classical multivariate techniques. Yet, even the classical methods are not given only standard treatments here; many of them are also derived as special cases of a common theoretical framework (multivariate reduced-rank regression) rather than separately through different approaches. Another major feature of this book is the novel data sets that are used as examples to illustrate the techniques.

I have included as much statistical theory as I believe is necessary to understand the development of ideas, plus details of certain computational algorithms; historical notes on the various topics have also been added wherever possible (usually in the Bibliographical Notes at the end of each chapter) to help the reader gain some perspective on the subject matter. References at the end of the book should be considered as extensive without being exhaustive.
Some common abbreviations used in this book should be noted: “iid” means independently and identically distributed; “wrt” means with respect to; and “lhs” and “rhs” mean left- and right-hand side, respectively.

Audience

This book is directed toward advanced undergraduate students, graduate students, and researchers in statistics, computer science, artificial intelligence, psychology, neural and cognitive sciences, business, medicine, bioinformatics, and engineering. As prerequisites, readers are expected to have had previous knowledge of probability, statistical theory and methods, multivariable calculus, and linear/matrix algebra. Because vectors and matrices play such a major role in multivariate analysis, Chapter 3 gives the matrix notation used in the book and many important advanced concepts in matrix theory. Along with a background in classical statistical theory and methods, it would also be helpful if the reader had some exposure to Bayesian ideas in statistics.

There are various types of courses for which this book can be used, including data mining, machine learning, computational statistics, and a traditional course in multivariate analysis. Sections of this book have been used at Temple University as the basis of lectures in a one-semester course in applied multivariate analysis to statistics and graduate business students (where technical derivations are skipped and emphasis is placed on the examples and computational algorithms) and a two-semester course in advanced topics in statistics given to graduate students from statistics, computer science, and engineering. I am grateful for their feedback (including spotting typos and inconsistencies).

Although there is enough material in this book for a two-semester course, a one-semester course in traditional multivariate analysis can be drawn from the material in Sections 1.1–1.3, 2.1–2.3, 2.5, 2.6, 3.1–3.5, 5.1–5.7, 6.1–6.3, 7.1–7.3, 8.1–8.7, 12.1–12.4, 13.1–13.9, 15.4, and 17.1–17.4; additional parts of the book can be used as appropriate.

Software

Software for computing the techniques described in this book is publicly available either through routines in major computer packages or through download from Internet websites. I have used primarily the R, S-Plus, and Matlab packages in writing this book. In the Software Packages section at the ends of certain chapters, I have listed the relevant R/S-Plus routines for the respective chapter as well as the appropriate toolboxes in Matlab. I have also tried to indicate other major packages wherever relevant.

Data Sets

The many data sets that illustrate the multivariate techniques presented in this book were obtained from a wide variety of sources and disciplines and will be made available through the book’s website.
Disciplines from which the data were obtained include astronomy, bioinformatics, botany, chemometrics, criminology, food science, forensic science, genetics, geoscience, medicine, philately, physical anthropology, psychology, soil science, sports, and steganography. Part of the learning process for the reader is to become familiar with the classic data sets that are associated with each technique. In particular, data sets from popular data repositories are used to compare and contrast methodologies. Examples in the book involve small data sets (if a particular point or computation needs clarifying) and large data sets (to see the power of the techniques in question).

Exercises

At the end of every chapter (except Chapter 1), there are a number of exercises designed to make the reader (a) relate the problem to the text and fill in the technical details omitted in the development of certain techniques,