classification and regression trees.pdf

Dedication
Title Page
Copyright Page
PREFACE
Acknowledgements
Chapter 1 - BACKGROUND
1.1 CLASSIFIERS AS PARTITIONS
1.2 USE OF DATA IN CONSTRUCTING CLASSIFIERS
1.3 THE PURPOSES OF CLASSIFICATION ANALYSIS
1.4 ESTIMATING ACCURACY
1.5 THE BAYES RULE AND CURRENT CLASSIFICATION PROCEDURES
Chapter 2 - INTRODUCTION TO TREE CLASSIFICATION
2.1 THE SHIP CLASSIFICATION PROBLEM
2.2 TREE STRUCTURED CLASSIFIERS
2.3 CONSTRUCTION OF THE TREE CLASSIFIER
2.4 INITIAL TREE GROWING METHODOLOGY
2.5 METHODOLOGICAL DEVELOPMENT
2.6 TWO RUNNING EXAMPLES
2.7 THE ADVANTAGES OF THE TREE STRUCTURED APPROACH
Chapter 3 - RIGHT SIZED TREES AND HONEST ESTIMATES
3.1 INTRODUCTION
3.2 GETTING READY TO PRUNE
3.3 MINIMAL COST-COMPLEXITY PRUNING
3.4 THE BEST PRUNED SUBTREE: AN ESTIMATION PROBLEM
3.5 SOME EXAMPLES
APPENDIX
Chapter 4 - SPLITTING RULES
4.1 REDUCING MISCLASSIFICATION COST
4.2 THE TWO-CLASS PROBLEM
4.3 THE MULTICLASS PROBLEM: UNIT COSTS
4.4 PRIORS AND VARIABLE MISCLASSIFICATION COSTS
4.5 TWO EXAMPLES
4.6 CLASS PROBABILITY TREES VIA GINI
APPENDIX
Chapter 5 - STRENGTHENING AND INTERPRETING
5.1 INTRODUCTION
5.2 VARIABLE COMBINATIONS
5.3 SURROGATE SPLITS AND THEIR USES
5.4 ESTIMATING WITHIN-NODE COST
5.5 INTERPRETATION AND EXPLORATION
5.6 COMPUTATIONAL EFFICIENCY
5.7 COMPARISON OF ACCURACY WITH OTHER METHODS
APPENDIX
Chapter 6 - MEDICAL DIAGNOSIS AND PROGNOSIS
6.1 PROGNOSIS AFTER HEART ATTACK
6.2 DIAGNOSING HEART ATTACKS
6.3 IMMUNOSUPPRESSION AND THE DIAGNOSIS OF CANCER
6.4 GAIT ANALYSIS AND THE DETECTION OF OUTLIERS
6.5 RELATED WORK ON COMPUTER-AIDED DIAGNOSIS
Chapter 7 - MASS SPECTRA CLASSIFICATION
7.1 INTRODUCTION
7.2 GENERALIZED TREE CONSTRUCTION
7.3 THE BROMINE TREE: A NONSTANDARD EXAMPLE
Chapter 8 - REGRESSION TREES
8.1 INTRODUCTION
8.2 AN EXAMPLE
8.3 LEAST SQUARES REGRESSION
8.4 TREE STRUCTURED REGRESSION
8.5 PRUNING AND ESTIMATING
8.6 A SIMULATED EXAMPLE
8.7 TWO CROSS-VALIDATION ISSUES
8.8 STANDARD STRUCTURE TREES
8.9 USING SURROGATE SPLITS
8.10 INTERPRETATION
8.11 LEAST ABSOLUTE DEVIATION REGRESSION
8.12 OVERALL CONCLUSIONS
Chapter 9 - BAYES RULES AND PARTITIONS
9.1 BAYES RULE
9.2 BAYES RULE FOR A PARTITION
9.3 RISK REDUCTION SPLITTING RULE
9.4 CATEGORICAL SPLITS
Chapter 10 - OPTIMAL PRUNING
10.1 TREE TERMINOLOGY
10.2 OPTIMALLY PRUNED SUBTREES
10.3 AN EXPLICIT OPTIMAL PRUNING ALGORITHM
Chapter 11 - CONSTRUCTION OF TREES FROM A LEARNING SAMPLE
11.1 ESTIMATED BAYES RULE FOR A PARTITION
11.2 EMPIRICAL RISK REDUCTION SPLITTING RULE
11.3 OPTIMAL PRUNING
11.4 TEST SAMPLES
11.5 CROSS-VALIDATION
11.6 FINAL TREE SELECTION
11.7 BOOTSTRAP ESTIMATE OF OVERALL RISK
11.8 END-CUT PREFERENCE
Chapter 12 - CONSISTENCY
12.1 EMPIRICAL DISTRIBUTIONS
12.2 REGRESSION
12.3 CLASSIFICATION
12.4 PROOFS FOR SECTION 12.1
12.5 PROOFS FOR SECTION 12.2
12.6 PROOFS FOR SECTION 12.3
BIBLIOGRAPHY
NOTATION INDEX
SUBJECT INDEX
Lovingly dedicated to our children Jessica, Rebecca, Kymm; Melanie; Elyse, Adam, Rachel, Stephen; Daniel and Kevin
Library of Congress Cataloging-in-Publication Data
Main entry under title:
Classification and regression trees.
(The Wadsworth statistics/probability series)
Bibliography: p.
Includes Index.
ISBN 0-412-04841-8
1. Discriminant analysis. 2. Regression analysis. 3. Trees (Graph theory)
I. Breiman, Leo. II. Title: Regression trees. III. Series.
QA278.65.C54 1984 519.5′36—dc20 83-19708 CIP

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

First CRC Press reprint 1998
© 1984, 1993 by Chapman & Hall
No claim to original U.S. Government works
International Standard Book Number 0-412-04841-8
Library of Congress Card Number 83-19708
Printed in the United States of America 7 8 9 0
Printed on acid-free paper
PREFACE

The tree methodology discussed in this book is a child of the computer age. Unlike many other statistical procedures which were moved from pencil and paper to calculators and then to computers, this use of trees was unthinkable before computers.

Binary trees give an interesting and often illuminating way of looking at data in classification or regression problems. They should not be used to the exclusion of other methods. We do not claim that they are always better. They do add a flexible nonparametric tool to the data analyst’s arsenal.

Both practical and theoretical sides have been developed in our study of tree methods. The book reflects these two sides. The first eight chapters are largely expository and cover the use of trees as a data analysis method. These were written by Leo Breiman with the exception of Chapter 6 by Richard Olshen. Jerome Friedman developed the software and ran the examples. Chapters 9 through 12 place trees in a more mathematical context and prove some of their fundamental properties. The first three of these chapters were written by Charles Stone and the last was jointly written by Stone and Olshen.

Trees, as well as many other powerful data analytic tools (factor analysis, nonmetric scaling, and so forth), were originated by social scientists motivated by the need to cope with actual problems and data. Use of trees in regression dates back to the AID (Automatic Interaction Detection) program developed at the Institute for Social Research, University of Michigan, by Morgan and Sonquist in the early 1960s. The ancestor classification program is THAID, developed at the institute in the early 1970s by Morgan and Messenger. The research and developments described in this book are aimed at strengthening and extending these original methods.

Our work on trees began in 1973 when Breiman and Friedman, independently of each other, “reinvented the wheel” and began to use tree methods in classification. Later, they joined forces and were joined in turn by Stone, who contributed significantly to the methodological development. Olshen was an early user of tree methods in medical applications and contributed to their theoretical development.

Our blossoming fascination with trees and the number of ideas passing back and forth and being incorporated by Friedman into CART (Classification and Regression Trees) soon gave birth to the idea of a book on the subject. In 1980 conception occurred. While the pregnancy has been rather prolonged, we hope that the baby appears acceptably healthy to the members of our statistical community.

The layout of the book is [figure not reproduced]

Readers are encouraged to contact Richard Olshen regarding the availability of CART software.