
Data clustering algorithms and application.pdf

Front Cover
Contents
Preface
Editor Biographies
Contributors
Chapter 1: An Introduction to Cluster Analysis
Chapter 2: Feature Selection for Clustering: A Review
Chapter 3: Probabilistic Models for Clustering
Chapter 4: A Survey of Partitional and Hierarchical Clustering Algorithms
Chapter 5: Density-Based Clustering
Chapter 6: Grid-Based Clustering
Chapter 7: Nonnegative Matrix Factorizations for Clustering: A Survey
Chapter 8: Spectral Clustering
Chapter 9: Clustering High-Dimensional Data
Chapter 10: A Survey of Stream Clustering Algorithms
Chapter 11: Big Data Clustering
Chapter 12: Clustering Categorical Data
Chapter 13: Document Clustering: The Next Frontier
Chapter 14: Clustering Multimedia Data
Chapter 15: Time-Series Data Clustering
Chapter 16: Clustering Biological Data
Chapter 17: Network Clustering
Chapter 18: A Survey of Uncertain Data Clustering Algorithms
Chapter 19: Concepts of Visual and Interactive Clustering
Chapter 20: Semisupervised Clustering
Chapter 21: Alternative Clustering Analysis: A Review
Chapter 22: Cluster Ensembles: Theory and Applications
Chapter 23: Clustering Validation Measures
Chapter 24: Educational and Software Resources for Data Clustering
Color Inserts
Back Cover
Chapman & Hall/CRC Data Mining and Knowledge Discovery Series

DATA CLUSTERING
Algorithms and Applications

Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains.

The book focuses on three primary aspects of data clustering:

• Methods, describing key techniques commonly used for clustering, such as feature selection, agglomerative clustering, partitional clustering, density-based clustering, probabilistic clustering, grid-based clustering, spectral clustering, and nonnegative matrix factorization
• Domains, covering methods used for different domains of data, such as categorical data, text data, multimedia data, graph data, biological data, stream data, uncertain data, time series clustering, high-dimensional clustering, and big data
• Variations and Insights, discussing important variations of the clustering process, such as semisupervised clustering, interactive clustering, multiview clustering, cluster ensembles, and cluster validation

In this book, top researchers from around the world explore the characteristics of clustering problems in a variety of application areas. They also explain how to glean detailed insight from the clustering process (including how to verify the quality of the underlying clusters) through supervision, human intervention, or the automated generation of alternative clusters.

K15510
DATA CLUSTERING
Algorithms and Applications

© 2014 by Taylor & Francis Group, LLC
Chapman & Hall/CRC Data Mining and Knowledge Discovery Series

SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.

AIMS AND SCOPE
This series aims to capture new developments and applications in data mining and knowledge discovery, while summarizing the computational tools and techniques useful in data analysis. This series encourages the integration of mathematical, statistical, and computational methods and techniques through the publication of a broad range of textbooks, reference works, and handbooks. The inclusion of concrete examples and applications is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of data mining and knowledge discovery methods and applications, modeling, algorithms, theory and foundations, data and knowledge visualization, data mining systems and tools, and privacy and security issues.

PUBLISHED TITLES
ADVANCES IN MACHINE LEARNING AND DATA MINING FOR ASTRONOMY
    Michael J. Way, Jeffrey D. Scargle, Kamal M. Ali, and Ashok N. Srivastava
BIOLOGICAL DATA MINING
    Jake Y. Chen and Stefano Lonardi
COMPUTATIONAL INTELLIGENT DATA ANALYSIS FOR SUSTAINABLE DEVELOPMENT
    Ting Yu, Nitesh V. Chawla, and Simeon Simoff
COMPUTATIONAL METHODS OF FEATURE SELECTION
    Huan Liu and Hiroshi Motoda
CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY, AND APPLICATIONS
    Sugato Basu, Ian Davidson, and Kiri L. Wagstaff
CONTRAST DATA MINING: CONCEPTS, ALGORITHMS, AND APPLICATIONS
    Guozhu Dong and James Bailey
DATA CLUSTERING: ALGORITHMS AND APPLICATIONS
    Charu C. Aggarwal and Chandan K. Reddy
DATA CLUSTERING IN C++: AN OBJECT-ORIENTED APPROACH
    Guojun Gan
DATA MINING FOR DESIGN AND MARKETING
    Yukio Ohsawa and Katsutoshi Yada
DATA MINING WITH R: LEARNING WITH CASE STUDIES
    Luís Torgo
FOUNDATIONS OF PREDICTIVE ANALYTICS
    James Wu and Stephen Coggeshall
GEOGRAPHIC DATA MINING AND KNOWLEDGE DISCOVERY, SECOND EDITION
    Harvey J. Miller and Jiawei Han
HANDBOOK OF EDUCATIONAL DATA MINING
    Cristóbal Romero, Sebastian Ventura, Mykola Pechenizkiy, and Ryan S.J.d. Baker
INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS
    Vagelis Hristidis
INTELLIGENT TECHNOLOGIES FOR WEB APPLICATIONS
    Priti Srinivas Sajja and Rajendra Akerkar
INTRODUCTION TO PRIVACY-PRESERVING DATA PUBLISHING: CONCEPTS AND TECHNIQUES
    Benjamin C. M. Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S. Yu
KNOWLEDGE DISCOVERY FOR COUNTERTERRORISM AND LAW ENFORCEMENT
    David Skillicorn
KNOWLEDGE DISCOVERY FROM DATA STREAMS
    João Gama
MACHINE LEARNING AND KNOWLEDGE DISCOVERY FOR ENGINEERING SYSTEMS HEALTH MANAGEMENT
    Ashok N. Srivastava and Jiawei Han
MINING SOFTWARE SPECIFICATIONS: METHODOLOGIES AND APPLICATIONS
    David Lo, Siau-Cheng Khoo, Jiawei Han, and Chao Liu
MULTIMEDIA DATA MINING: A SYSTEMATIC INTRODUCTION TO CONCEPTS AND THEORY
    Zhongfei Zhang and Ruofei Zhang
MUSIC DATA MINING
    Tao Li, Mitsunori Ogihara, and George Tzanetakis
NEXT GENERATION OF DATA MINING
    Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar
PRACTICAL GRAPH MINING WITH R
    Nagiza F. Samatova, William Hendrix, John Jenkins, Kanchana Padmanabhan, and Arpan Chakraborty
RELATIONAL DATA CLUSTERING: MODELS, ALGORITHMS, AND APPLICATIONS
    Bo Long, Zhongfei Zhang, and Philip S. Yu
SERVICE-ORIENTED DISTRIBUTED KNOWLEDGE DISCOVERY
    Domenico Talia and Paolo Trunfio
SPECTRAL FEATURE SELECTION FOR DATA MINING
    Zheng Alan Zhao and Huan Liu
STATISTICAL DATA MINING USING SAS APPLICATIONS, SECOND EDITION
    George Fernandez
SUPPORT VECTOR MACHINES: OPTIMIZATION BASED THEORY, ALGORITHMS, AND EXTENSIONS
    Naiyang Deng, Yingjie Tian, and Chunhua Zhang
TEMPORAL DATA MINING
    Theophano Mitsa
TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS
    Ashok N. Srivastava and Mehran Sahami
THE TOP TEN ALGORITHMS IN DATA MINING
    Xindong Wu and Vipin Kumar
UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX DECOMPOSITIONS
    David Skillicorn
DATA CLUSTERING
Algorithms and Applications

Edited by
Charu C. Aggarwal
Chandan K. Reddy
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2014 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20130508

International Standard Book Number-13: 978-1-4665-5822-9 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
Contents

Preface
Editor Biographies
Contributors

1 An Introduction to Cluster Analysis
Charu C. Aggarwal
    1.1 Introduction
    1.2 Common Techniques Used in Cluster Analysis
        1.2.1 Feature Selection Methods
        1.2.2 Probabilistic and Generative Models
        1.2.3 Distance-Based Algorithms
        1.2.4 Density- and Grid-Based Methods
        1.2.5 Leveraging Dimensionality Reduction Methods
            1.2.5.1 Generative Models for Dimensionality Reduction
            1.2.5.2 Matrix Factorization and Co-Clustering
            1.2.5.3 Spectral Methods
        1.2.6 The High Dimensional Scenario
        1.2.7 Scalable Techniques for Cluster Analysis
            1.2.7.1 I/O Issues in Database Management
            1.2.7.2 Streaming Algorithms
            1.2.7.3 The Big Data Framework
    1.3 Data Types Studied in Cluster Analysis
        1.3.1 Clustering Categorical Data
        1.3.2 Clustering Text Data
        1.3.3 Clustering Multimedia Data
        1.3.4 Clustering Time-Series Data
        1.3.5 Clustering Discrete Sequences
        1.3.6 Clustering Network Data
        1.3.7 Clustering Uncertain Data
    1.4 Insights Gained from Different Variations of Cluster Analysis
        1.4.1 Visual Insights
        1.4.2 Supervised Insights
        1.4.3 Multiview and Ensemble-Based Insights
        1.4.4 Validation-Based Insights
    1.5 Discussion and Conclusions