Contents
Series Foreword
Preface
1 Introduction to Semi-Supervised Learning
1.1 Supervised, Unsupervised, and Semi-Supervised Learning
1.2 When Can Semi-Supervised Learning Work?
1.3 Classes of Algorithms and Organization of This Book
I Generative Models
2 A Taxonomy for Semi-Supervised Learning Methods
Matthias Seeger
2.1 The Semi-Supervised Learning Problem
2.2 Paradigms for Semi-Supervised Learning
2.3 Examples
2.4 Conclusions
3 Semi-Supervised Text Classification Using EM
Kamal Nigam, Andrew McCallum, Tom Mitchell
3.1 Introduction
3.2 A Generative Model for Text
3.3 Experimental Results with Basic EM
3.4 Using a More Expressive Generative Model
3.5 Overcoming the Challenges of Local Maxima
3.6 Conclusions and Summary
4 Risks of Semi-Supervised Learning: How Unlabeled Data Can Degrade Performance of Generative Classifiers
Fabio Cozman, Ira Cohen
4.1 Do Unlabeled Data Improve or Degrade Classification Performance?
4.2 Understanding Unlabeled Data: Asymptotic Bias
4.3 The Asymptotic Analysis of Generative Semi-Supervised Learning
4.4 The Value of Labeled and Unlabeled Data
4.5 Finite Sample Effects
4.6 Model Search and Robustness
4.7 Conclusion
5 Probabilistic Semi-Supervised Clustering with Constraints
Sugato Basu, Mikhail Bilenko, Arindam Banerjee, Raymond Mooney
5.1 Introduction
5.2 HMRF Model for Semi-Supervised Clustering
5.3 HMRF-KMeans Algorithm
5.4 Active Learning for Constraint Acquisition
5.5 Experimental Results
5.6 Related Work
5.7 Conclusions
II Low-Density Separation
6 Transductive Support Vector Machines
Thorsten Joachims
6.1 Introduction
6.2 Transductive Support Vector Machines
6.3 Why Use Margin on the Test Set?
6.4 Experiments and Applications of TSVMs
6.5 Solving the TSVM Optimization Problem
6.6 Connection to Related Approaches
6.7 Summary and Conclusions
7 Semi-Supervised Learning Using Semi-Definite Programming
Tijl De Bie, Nello Cristianini
7.1 Relaxing SVM Transduction
7.2 An Approximation for Speedup
7.3 General Semi-Supervised Learning Settings
7.4 Empirical Results
7.5 Summary and Outlook
Appendix: The Extended Schur Complement Lemma
8 Gaussian Processes and the Null-Category Noise Model
Neil D. Lawrence, Michael I. Jordan
8.1 Introduction
8.2 The Noise Model
8.3 Process Model and Effect of the Null-Category
8.4 Posterior Inference and Prediction
8.5 Results
8.6 Discussion
9 Entropy Regularization
Yves Grandvalet, Yoshua Bengio
9.1 Introduction
9.2 Derivation of the Criterion
9.3 Optimization Algorithms
9.4 Related Methods
9.5 Experiments
9.6 Conclusion
Appendix: Proof of Theorem 9.1
10 Data-Dependent Regularization
Adrian Corduneanu, Tommi Jaakkola
10.1 Introduction
10.2 Information Regularization on Metric Spaces
10.3 Information Regularization and Relational Data
10.4 Discussion
III Graph-Based Methods
11 Label Propagation and Quadratic Criterion
Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux
11.1 Introduction
11.2 Label Propagation on a Similarity Graph
11.3 Quadratic Cost Criterion
11.4 From Transduction to Induction
11.5 Incorporating Class Prior Knowledge
11.6 Curse of Dimensionality for Semi-Supervised Learning
11.7 Discussion
Acknowledgments
12 The Geometric Basis of Semi-Supervised Learning
Vikas Sindhwani, Misha Belkin, Partha Niyogi
12.1 Introduction
12.2 Incorporating Geometry in Regularization
12.3 Algorithms
12.4 Data-Dependent Kernels for Semi-Supervised Learning
12.5 Linear Methods for Large-Scale Semi-Supervised Learning
12.6 Connections to Other Algorithms and Related Work
12.7 Future Directions
13 Discrete Regularization
Dengyong Zhou, Bernhard Schölkopf
13.1 Introduction
13.2 Discrete Analysis
13.3 Discrete Regularization
13.4 Conclusion
14 Semi-Supervised Learning with Conditional Harmonic Mixing
14.1 Introduction
14.2 Conditional Harmonic Mixing
14.3 Learning in CHM Models
14.4 Incorporating Prior Knowledge
14.5 Learning the Conditionals
14.6 Model Averaging
14.7 Experiments
14.8 Conclusions
Acknowledgments
IV Change of Representation
15 Graph Kernels by Spectral Transforms
15.1 The Graph Laplacian
15.2 Kernels by Spectral Transforms
15.3 Kernel Alignment
15.4 Optimizing Alignment Using QCQP for Semi-Supervised Learning
15.5 Semi-Supervised Kernels with Order Constraints
15.6 Experimental Results
15.7 Conclusion
Acknowledgment
16 Spectral Methods for Dimensionality Reduction
16.1 Introduction
16.2 Linear Methods
16.3 Graph-Based Methods
16.4 Kernel Methods
16.5 Discussion
17 Modifying Distances
17.1 Introduction
17.2 Estimating DBD Metrics
17.3 Computing DBD Metrics
17.4 Semi-Supervised Learning Using Density-Based Metrics
17.5 Conclusions and Future Work
Acknowledgements
V Semi-Supervised Learning in Practice
18 Large-Scale Algorithms
18.1 Introduction
18.2 Cost Approximations
18.3 Subset Selection
18.4 Discussion
19 Semi-Supervised Protein Classification Using Cluster Kernels
19.1 Introduction
19.2 Representations and Kernels for Protein Sequences
19.3 Semi-Supervised Kernels for Protein Sequences
19.4 Experiments
19.5 Discussion
20 Prediction of Protein Function from Networks
20.1 Introduction
20.2 Graph-Based Semi-Supervised Learning
20.3 Combining Multiple Graphs
20.4 Experiments on Function Prediction of Proteins
20.5 Conclusion and Outlook
21 Analysis of Benchmarks
21.1 The Benchmark
21.2 Application of SSL Methods
21.3 Results and Discussion
VI Perspectives
22 An Augmented PAC Model for Semi-Supervised Learning
22.1 Introduction
22.2 A Formal Framework
22.3 Sample Complexity Results
22.4 Algorithmic Results
22.5 Related Models and Discussion
23 Metric-Based Approaches for Semi-Supervised Regression and Classification
23.1 Introduction
23.2 Metric Structure of Supervised Learning
23.3 Model Selection
23.4 Regularization
23.5 Classification
23.6 Conclusion
Acknowledgments
24 Transductive Inference and Semi-Supervised Learning
24.1 Problem Settings
24.2 Problem of Generalization in Inductive and Transductive Inference
24.3 Structure of the VC Bounds and Transductive Inference
24.4 The Symmetrization Lemma and Transductive Inference
24.5 Bounds for Transductive Inference
24.6 The Structural Risk Minimization Principle for Induction and Transduction
24.7 Combinatorics in Transductive Inference
24.8 Measures of the Size of Equivalence Classes
24.9 Algorithms for Inductive and Transductive SVMs
24.10 Semi-Supervised Learning
24.11 Conclusion: Transductive Inference and the New Problems of Inference
24.12 Beyond Transduction: Selective Inference
25 A Discussion of Semi-Supervised Learning and Transduction
References
Notation and Symbols
Contributors
Index
Semi-Supervised Learning
edited by Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien
Adaptive Computation and Machine Learning
Thomas Dietterich, Editor
Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns, Associate Editors

Bioinformatics: The Machine Learning Approach, Pierre Baldi and Søren Brunak
Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto
Graphical Models for Machine Learning and Digital Communication, Brendan J. Frey
Learning in Graphical Models, Michael I. Jordan
Causation, Prediction, and Search, second edition, Peter Spirtes, Clark Glymour, and Richard Scheines
Principles of Data Mining, David Hand, Heikki Mannila, and Padhraic Smyth
Bioinformatics: The Machine Learning Approach, second edition, Pierre Baldi and Søren Brunak
Learning Kernel Classifiers: Theory and Algorithms, Ralf Herbrich
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, Bernhard Schölkopf and Alexander J. Smola
Introduction to Machine Learning, Ethem Alpaydin
Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams
Semi-Supervised Learning, Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien
Semi-Supervised Learning

Olivier Chapelle
Bernhard Schölkopf
Alexander Zien

The MIT Press
Cambridge, Massachusetts
London, England
© 2006 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

Typeset by the authors using LaTeX 2ε
Printed and bound in the United States of America

Library of Congress Cataloging-in-Publication Data

Semi-supervised learning / edited by Olivier Chapelle, Bernhard Schölkopf, Alexander Zien.
p. cm. – (Adaptive computation and machine learning)
Includes bibliographical references.
ISBN 978-0-262-03358-9 (alk. paper)
1. Supervised learning (Machine learning) I. Chapelle, Olivier. II. Schölkopf, Bernhard. III. Zien, Alexander. IV. Series.
Q325.75.S42 2006
006.3'1–dc22
2006044448

10 9 8 7 6 5 4 3 2 1