classification and regression trees.pdf

Published: 2022-06-08 | Uploaded by: admin | Category: Manuals | File size: 7.39M | Format: pdf

Dedication

Title Page

PREFACE

Acknowledgements

Chapter 1 - BACKGROUND

1.1 CLASSIFIERS AS PARTITIONS

1.2 USE OF DATA IN CONSTRUCTING CLASSIFIERS

1.3 THE PURPOSES OF CLASSIFICATION ANALYSIS

1.4 ESTIMATING ACCURACY

1.5 THE BAYES RULE AND CURRENT CLASSIFICATION PROCEDURES

Chapter 2 - INTRODUCTION TO TREE CLASSIFICATION

2.1 THE SHIP CLASSIFICATION PROBLEM

2.2 TREE STRUCTURED CLASSIFIERS

2.3 CONSTRUCTION OF THE TREE CLASSIFIER

2.4 INITIAL TREE GROWING METHODOLOGY

2.5 METHODOLOGICAL DEVELOPMENT

2.6 TWO RUNNING EXAMPLES

2.7 THE ADVANTAGES OF THE TREE STRUCTURED APPROACH

Chapter 3 - RIGHT SIZED TREES AND HONEST ESTIMATES

3.1 INTRODUCTION

3.2 GETTING READY TO PRUNE

3.3 MINIMAL COST-COMPLEXITY PRUNING

3.4 THE BEST PRUNED SUBTREE: AN ESTIMATION PROBLEM

3.5 SOME EXAMPLES

APPENDIX

Chapter 4 - SPLITTING RULES

4.1 REDUCING MISCLASSIFICATION COST

4.2 THE TWO-CLASS PROBLEM

4.3 THE MULTICLASS PROBLEM: UNIT COSTS

4.4 PRIORS AND VARIABLE MISCLASSIFICATION COSTS

4.5 TWO EXAMPLES

4.6 CLASS PROBABILITY TREES VIA GINI

APPENDIX

Chapter 5 - STRENGTHENING AND INTERPRETING

5.1 INTRODUCTION

5.2 VARIABLE COMBINATIONS

5.3 SURROGATE SPLITS AND THEIR USES

5.4 ESTIMATING WITHIN-NODE COST

5.5 INTERPRETATION AND EXPLORATION

5.6 COMPUTATIONAL EFFICIENCY

5.7 COMPARISON OF ACCURACY WITH OTHER METHODS

APPENDIX

Chapter 6 - MEDICAL DIAGNOSIS AND PROGNOSIS

6.1 PROGNOSIS AFTER HEART ATTACK

6.2 DIAGNOSING HEART ATTACKS

6.3 IMMUNOSUPPRESSION AND THE DIAGNOSIS OF CANCER

6.4 GAIT ANALYSIS AND THE DETECTION OF OUTLIERS

6.5 RELATED WORK ON COMPUTER-AIDED DIAGNOSIS

Chapter 7 - MASS SPECTRA CLASSIFICATION

7.1 INTRODUCTION

7.2 GENERALIZED TREE CONSTRUCTION

7.3 THE BROMINE TREE: A NONSTANDARD EXAMPLE

Chapter 8 - REGRESSION TREES

8.1 INTRODUCTION

8.2 AN EXAMPLE

8.3 LEAST SQUARES REGRESSION

8.4 TREE STRUCTURED REGRESSION

8.5 PRUNING AND ESTIMATING

8.6 A SIMULATED EXAMPLE

8.7 TWO CROSS-VALIDATION ISSUES

8.8 STANDARD STRUCTURE TREES

8.9 USING SURROGATE SPLITS

8.10 INTERPRETATION

8.11 LEAST ABSOLUTE DEVIATION REGRESSION

8.12 OVERALL CONCLUSIONS

Chapter 9 - BAYES RULES AND PARTITIONS

9.1 BAYES RULE

9.2 BAYES RULE FOR A PARTITION

9.3 RISK REDUCTION SPLITTING RULE

9.4 CATEGORICAL SPLITS

Chapter 10 - OPTIMAL PRUNING

10.1 TREE TERMINOLOGY

10.2 OPTIMALLY PRUNED SUBTREES

10.3 AN EXPLICIT OPTIMAL PRUNING ALGORITHM

Chapter 11 - CONSTRUCTION OF TREES FROM A LEARNING SAMPLE

11.1 ESTIMATED BAYES RULE FOR A PARTITION

11.2 EMPIRICAL RISK REDUCTION SPLITTING RULE

11.3 OPTIMAL PRUNING

11.4 TEST SAMPLES

11.5 CROSS-VALIDATION

11.6 FINAL TREE SELECTION

11.7 BOOTSTRAP ESTIMATE OF OVERALL RISK

11.8 END-CUT PREFERENCE

Chapter 12 - CONSISTENCY

12.1 EMPIRICAL DISTRIBUTIONS

12.2 REGRESSION

12.3 CLASSIFICATION

12.4 PROOFS FOR SECTION 12.1

12.5 PROOFS FOR SECTION 12.2

12.6 PROOFS FOR SECTION 12.3

BIBLIOGRAPHY

NOTATION INDEX

SUBJECT INDEX

Lovingly dedicated to our children Jessica, Rebecca, Kymm; Melanie; Elyse, Adam, Rachel, Stephen; Daniel and Kevin

Library of Congress Cataloging-in-Publication Data

Main entry under title: Classification and regression trees. (The Wadsworth statistics/probability series) Bibliography: p. Includes Index. ISBN 0-412-04841-8 1. Discriminant analysis. 2. Regression analysis. 3. Trees (Graph theory) I. Breiman, Leo. II. Title: Regression trees. III. Series. QA278.65.C54 1984 519.5′36—dc20 83-19708 CIP

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

First CRC Press reprint 1998
© 1984, 1993 by Chapman & Hall
No claim to original U.S. Government works
International Standard Book Number 0-412-04841-8
Library of Congress Card Number 83-19708
Printed in the United States of America 7 8 9 0
Printed on acid-free paper

PREFACE

The tree methodology discussed in this book is a child of the computer age. Unlike many other statistical procedures which were moved from pencil and paper to calculators and then to computers, this use of trees was unthinkable before computers.

Binary trees give an interesting and often illuminating way of looking at data in classification or regression problems. They should not be used to the exclusion of other methods. We do not claim that they are always better. They do add a flexible nonparametric tool to the data analyst’s arsenal.

Both practical and theoretical sides have been developed in our study of tree methods. The book reflects these two sides. The first eight chapters are largely expository and cover the use of trees as a data analysis method. These were written by Leo Breiman with the exception of Chapter 6 by Richard Olshen. Jerome Friedman developed the software and ran the examples. Chapters 9 through 12 place trees in a more mathematical context and prove some of their fundamental properties. The first three of these chapters were written by Charles Stone and the last was jointly written by Stone and Olshen.

Trees, as well as many other powerful data analytic tools (factor analysis, nonmetric scaling, and so forth) were originated by social scientists motivated by the need to cope with actual problems and data. Use of trees in regression dates back to the AID (Automatic Interaction Detection) program developed at the Institute for Social Research, University of Michigan, by Morgan and Sonquist in the early 1960s. The ancestor classification program is THAID, developed at the institute in the early 1970s by Morgan and Messenger. The research and developments described in this book are aimed at strengthening and extending these original methods.

Our work on trees began in 1973 when Breiman and Friedman, independently of each other, “reinvented the wheel” and began to use tree methods in classification. Later, they joined forces and were joined in turn by Stone, who contributed significantly to the methodological development. Olshen was an early user of tree methods in medical applications and contributed to their theoretical development.

Our blossoming fascination with trees and the number of ideas passing back and forth and being incorporated by Friedman into CART (Classification and Regression Trees) soon gave birth to the idea of a book on the subject. In 1980 conception occurred. While the pregnancy has been rather prolonged, we hope that the baby appears acceptably healthy to the members of our statistical community.

The layout of the book is [layout diagram missing from the source]

Readers are encouraged to contact Richard Olshen regarding the availability of CART software.
