Introduction to data mining.pdf

Contents
Introduction
Data
Exploring Data
Classification Basic Concept
Classification Alternative Techniques
Association Analysis Basic
Association Analysis Advanced
Cluster Analysis Basic
Cluster Analysis Additional
Anomaly Detection
Appendixes
Index
PANG-NING TAN
Michigan State University

MICHAEL STEINBACH
University of Minnesota

VIPIN KUMAR
University of Minnesota and Army High Performance Computing Research Center

Boston San Francisco New York
London Toronto Sydney Tokyo Singapore Madrid
Mexico City Munich Paris Cape Town Hong Kong Montreal
If you purchased this book within the United States or Canada you should be aware that it has been wrongfully imported without the approval of the Publisher or the Author.

Acquisitions Editor: Matt Goldstein
Project Editor: Katherine Harutunian
Production Supervisor: Marilyn Lloyd
Production Services: Paul C. Anagnostopoulos of Windfall Software
Marketing Manager: Michelle Brown
Copyeditor: Kathy Smith
Proofreader: Jennifer McClain
Technical Illustration: George Nichols
Cover Design Supervisor: Joyce Cosentino Wells
Cover Design: Night & Day Design
Cover Image: © 2005 Rob Casey/Brand X Pictures
Prepress and Manufacturing: Caroline Fell
Printer: Hamilton Printing

Access the latest information about Addison-Wesley titles from our World Wide Web site: http://www.aw-bc.com/computing

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial caps or all caps.

The programs and applications presented in this book have been included for their instructional value. They have been tested with care, but are not guaranteed for any particular purpose. The publisher does not offer any warranties or representations, nor does it accept any liabilities with respect to the programs or applications.

Copyright © 2006 by Pearson Education, Inc.

For information on obtaining permission for use of material in this work, please submit a written request to Pearson Education, Inc., Rights and Contract Department, 75 Arlington Street, Suite 300, Boston, MA 02116 or fax your request to (617) 848-7047.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or any other media embodiments now known or hereafter to become known, without the prior written permission of the publisher.

Printed in the United States of America.

ISBN 0-321-42052-7
2 3 4 5 6 7 8 9 10-HAM-08 07 06
To our families
Preface

Advances in data generation and collection are producing data sets of massive size in commerce and a variety of scientific disciplines. Data warehouses store details of the sales and operations of businesses, Earth-orbiting satellites beam high-resolution images and sensor data back to Earth, and genomics experiments generate sequence, structural, and functional data for an increasing number of organisms. The ease with which data can now be gathered and stored has created a new attitude toward data analysis: Gather whatever data you can whenever and wherever possible. It has become an article of faith that the gathered data will have value, either for the purpose that initially motivated its collection or for purposes not yet envisioned.

The field of data mining grew out of the limitations of current data analysis techniques in handling the challenges posed by these new types of data sets. Data mining does not replace other areas of data analysis, but rather takes them as the foundation for much of its work. While some areas of data mining, such as association analysis, are unique to the field, other areas, such as clustering, classification, and anomaly detection, build upon a long history of work on these topics in other fields. Indeed, the willingness of data mining researchers to draw upon existing techniques has contributed to the strength and breadth of the field, as well as to its rapid growth.

Another strength of the field has been its emphasis on collaboration with researchers in other areas. The challenges of analyzing new types of data cannot be met by simply applying data analysis techniques in isolation from those who understand the data and the domain in which it resides. Often, skill in building multidisciplinary teams has been as responsible for the success of data mining projects as the creation of new and innovative algorithms.
Just as, historically, many developments in statistics were driven by the needs of agriculture, industry, medicine, and business, many of the developments in data mining are being driven by the needs of those same fields.

This book began as a set of notes and lecture slides for a data mining course that has been offered at the University of Minnesota since Spring 1998 to upper-division undergraduate and graduate students. Presentation slides
and exercises developed in these offerings grew with time and served as a basis for the book. A survey of clustering techniques in data mining, originally written in preparation for research in the area, served as a starting point for one of the chapters in the book. Over time, the clustering chapter was joined by chapters on data, classification, association analysis, and anomaly detection. The book in its current form has been class tested at the home institutions of the authors, the University of Minnesota and Michigan State University, as well as several other universities.

A number of data mining books appeared in the meantime, but were not completely satisfactory for our students, primarily graduate and undergraduate students in computer science, but including students from industry and a wide variety of other disciplines. Their mathematical and computer backgrounds varied considerably, but they shared a common goal: to learn about data mining as directly as possible in order to quickly apply it to problems in their own domains. Thus, texts with extensive mathematical or statistical prerequisites were unappealing to many of them, as were texts that required a substantial database background. The book that evolved in response to these students' needs focuses as directly as possible on the key concepts of data mining by illustrating them with examples, simple descriptions of key algorithms, and exercises.

Overview

Specifically, this book provides a comprehensive introduction to data mining and is designed to be accessible and useful to students, instructors, researchers, and professionals. Areas covered include data preprocessing, visualization, predictive modeling, association analysis, clustering, and anomaly detection. The goal is to present fundamental concepts and algorithms for each topic, thus providing the reader with the necessary background for the application of data mining to real problems.
In addition, this book provides a starting point for those readers who are interested in pursuing research in data mining or related fields.

The book covers five main topics: data, classification, association analysis, clustering, and anomaly detection. Except for anomaly detection, each of these areas is covered in a pair of chapters. For classification, association analysis, and clustering, the introductory chapter covers basic concepts, representative algorithms, and evaluation techniques, while the more advanced chapter discusses advanced concepts and algorithms. The objective is to provide the reader with a sound understanding of the foundations of data mining, while still covering many important advanced topics. Because of this approach, the book is useful both as a learning tool and as a reference.