Mining Text Data
Contents
Preface
Chapter 1: AN INTRODUCTION TO TEXT MINING
Chapter 2: INFORMATION EXTRACTION FROM TEXT
Chapter 3: A SURVEY OF TEXT SUMMARIZATION TECHNIQUES
Chapter 4: A SURVEY OF TEXT CLUSTERING ALGORITHMS
Chapter 5: DIMENSIONALITY REDUCTION AND TOPIC MODELING: FROM LATENT SEMANTIC INDEXING TO LATENT DIRICHLET ALLOCATION AND BEYOND
Chapter 6: A SURVEY OF TEXT CLASSIFICATION ALGORITHMS
Chapter 7: TRANSFER LEARNING FOR TEXT MINING
Chapter 8: PROBABILISTIC MODELS FOR TEXT MINING
Chapter 9: MINING TEXT STREAMS
Chapter 10: TRANSLINGUAL MINING FROM TEXT DATA
Chapter 11: TEXT MINING IN MULTIMEDIA
Chapter 12: TEXT ANALYTICS IN SOCIAL MEDIA
Chapter 13: A SURVEY OF OPINION MINING AND SENTIMENT ANALYSIS
Chapter 14: BIOMEDICAL TEXT MINING: A SURVEY OF RECENT PROGRESS
Index
Charu C. Aggarwal • ChengXiang Zhai, Editors
Editors
Charu C. Aggarwal, IBM T.J. Watson Research Center, Yorktown Heights, NY, USA, charu@us.ibm.com
ChengXiang Zhai, University of Illinois at Urbana-Champaign, Urbana, IL, USA, czhai@cs.uiuc.edu

ISBN 978-1-4614-3222-7
e-ISBN 978-1-4614-3223-4
DOI 10.1007/978-1-4614-3223-4
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2012930923
© Springer Science+Business Media, LLC 2012

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Contents

1 An Introduction to Text Mining
Charu C. Aggarwal and ChengXiang Zhai
    1. Introduction
    2. Algorithms for Text Mining
    3. Future Directions
    References

2 Information Extraction from Text
Jing Jiang
    1. Introduction
    2. Named Entity Recognition
        2.1 Rule-based Approach
        2.2 Statistical Learning Approach
    3. Relation Extraction
        3.1 Feature-based Classification
        3.2 Kernel Methods
        3.3 Weakly Supervised Learning Methods
    4. Unsupervised Information Extraction
        4.1 Relation Discovery and Template Induction
        4.2 Open Information Extraction
    5. Evaluation
    6. Conclusions and Summary
    References

3 A Survey of Text Summarization Techniques
Ani Nenkova and Kathleen McKeown
    1. How do Extractive Summarizers Work?
    2. Topic Representation Approaches
        2.1 Topic Words
        2.2 Frequency-driven Approaches
        2.3 Latent Semantic Analysis
        2.4 Bayesian Topic Models
        2.5 Sentence Clustering and Domain-dependent Topics
    3. Influence of Context
        3.1 Web Summarization
        3.2 Summarization of Scientific Articles
        3.3 Query-focused Summarization
        3.4 Email Summarization
    4. Indicator Representations and Machine Learning for Summarization
        4.1 Graph Methods for Sentence Importance
        4.2 Machine Learning for Summarization
    5. Selecting Summary Sentences
        5.1 Greedy Approaches: Maximal Marginal Relevance
        5.2 Global Summary Selection
    6. Conclusion
    References

4 A Survey of Text Clustering Algorithms
Charu C. Aggarwal and ChengXiang Zhai
    1. Introduction
    2. Feature Selection and Transformation Methods for Text Clustering
        2.1 Feature Selection Methods
        2.2 LSI-based Methods
        2.3 Non-negative Matrix Factorization
    3. Distance-based Clustering Algorithms
        3.1 Agglomerative and Hierarchical Clustering Algorithms
        3.2 Distance-based Partitioning Algorithms
        3.3 A Hybrid Approach: The Scatter-Gather Method
    4. Word and Phrase-based Clustering
        4.1 Clustering with Frequent Word Patterns
        4.2 Leveraging Word Clusters for Document Clusters
        4.3 Co-clustering Words and Documents
        4.4 Clustering with Frequent Phrases
    5. Probabilistic Document Clustering and Topic Models
    6. Online Clustering with Text Streams
    7. Clustering Text in Networks
    8. Semi-Supervised Clustering
    9. Conclusions and Summary
    References

5 Dimensionality Reduction and Topic Modeling
Steven P. Crain, Ke Zhou, Shuang-Hong Yang and Hongyuan Zha
    1. Introduction
        1.1 The Relationship Between Clustering, Dimension Reduction and Topic Modeling
        1.2 Notation and Concepts
    2. Latent Semantic Indexing
        2.1 The Procedure of Latent Semantic Indexing
        2.2 Implementation Issues
        2.3 Analysis
    3. Topic Models and Dimension Reduction
        3.1 Probabilistic Latent Semantic Indexing
        3.2 Latent Dirichlet Allocation
    4. Interpretation and Evaluation
        4.1 Interpretation
        4.2 Evaluation
        4.3 Parameter Selection
        4.4 Dimension Reduction
    5. Beyond Latent Dirichlet Allocation
        5.1 Scalability
        5.2 Dynamic Data
        5.3 Networked Data
        5.4 Adapting Topic Models to Applications
    6. Conclusion
    References

6 A Survey of Text Classification Algorithms
Charu C. Aggarwal and ChengXiang Zhai
    1. Introduction
    2. Feature Selection for Text Classification
        2.1 Gini Index
        2.2 Information Gain
        2.3 Mutual Information
        2.4 χ²-Statistic
        2.5 Feature Transformation Methods: Supervised LSI
        2.6 Supervised Clustering for Dimensionality Reduction
        2.7 Linear Discriminant Analysis
        2.8 Generalized Singular Value Decomposition
        2.9 Interaction of Feature Selection with Classification
    3. Decision Tree Classifiers
    4. Rule-based Classifiers
    5. Probabilistic and Naive Bayes Classifiers
        5.1 Bernoulli Multivariate Model
        5.2 Multinomial Distribution
        5.3 Mixture Modeling for Text Classification
    6. Linear Classifiers
        6.1 SVM Classifiers
        6.2 Regression-Based Classifiers
        6.3 Neural Network Classifiers
        6.4 Some Observations about Linear Classifiers
    7. Proximity-based Classifiers
    8. Classification of Linked and Web Data
    9. Meta-Algorithms for Text Classification
        9.1 Classifier Ensemble Learning
        9.2 Data Centered Methods: Boosting and Bagging
        9.3 Optimizing Specific Measures of Accuracy
    10. Conclusions and Summary
    References

7 Transfer Learning for Text Mining
Weike Pan, Erheng Zhong and Qiang Yang
    1. Introduction
    2. Transfer Learning in Text Classification
        2.1 Cross Domain Text Classification