Contents
Series Foreword
Preface
1 Introduction to Semi-Supervised Learning
1.1 Supervised, Unsupervised, and Semi-Supervised Learning
1.2 When Can Semi-Supervised Learning Work?
1.3 Classes of Algorithms and Organization of This Book
I Generative Models
2 A Taxonomy for Semi-Supervised Learning Methods
Matthias Seeger
2.1 The Semi-Supervised Learning Problem
2.2 Paradigms for Semi-Supervised Learning
2.3 Examples
2.4 Conclusions
3 Semi-Supervised Text Classification Using EM
Kamal Nigam, Andrew McCallum, Tom Mitchell
3.1 Introduction
3.2 A Generative Model for Text
3.3 Experimental Results with Basic EM
3.4 Using a More Expressive Generative Model
3.5 Overcoming the Challenges of Local Maxima
3.6 Conclusions and Summary
4 Risks of Semi-Supervised Learning: How Unlabeled Data Can Degrade Performance of Generative Classifiers
Fabio Cozman, Ira Cohen
4.1 Do Unlabeled Data Improve or Degrade Classification Performance?
4.2 Understanding Unlabeled Data: Asymptotic Bias
4.3 The Asymptotic Analysis of Generative Semi-Supervised Learning
4.4 The Value of Labeled and Unlabeled Data
4.5 Finite Sample Effects
4.6 Model Search and Robustness
4.7 Conclusion
5 Probabilistic Semi-Supervised Clustering with Constraints
Sugato Basu, Mikhail Bilenko, Arindam Banerjee, Raymond Mooney
5.1 Introduction
5.2 HMRF Model for Semi-Supervised Clustering
5.3 HMRF-KMeans Algorithm
5.4 Active Learning for Constraint Acquisition
5.5 Experimental Results
5.6 Related Work
5.7 Conclusions
II Low-Density Separation
6 Transductive Support Vector Machines
Thorsten Joachims
6.1 Introduction
6.2 Transductive Support Vector Machines
6.3 Why Use Margin on the Test Set?
6.4 Experiments and Applications of TSVMs
6.5 Solving the TSVM Optimization Problem
6.6 Connection to Related Approaches
6.7 Summary and Conclusions
7 Semi-Supervised Learning Using Semi-Definite Programming
Tijl De Bie, Nello Cristianini
7.1 Relaxing SVM Transduction
7.2 An Approximation for Speedup
7.3 General Semi-Supervised Learning Settings
7.4 Empirical Results
7.5 Summary and Outlook
Appendix: The Extended Schur Complement Lemma
8 Gaussian Processes and the Null-Category Noise Model
Neil D. Lawrence, Michael I. Jordan
8.1 Introduction
8.2 The Noise Model
8.3 Process Model and Effect of the Null-Category
8.4 Posterior Inference and Prediction
8.5 Results
8.6 Discussion
9 Entropy Regularization
Yves Grandvalet, Yoshua Bengio
9.1 Introduction
9.2 Derivation of the Criterion
9.3 Optimization Algorithms
9.4 Related Methods
9.5 Experiments
9.6 Conclusion
Appendix: Proof of Theorem 9.1
10 Data-Dependent Regularization
Adrian Corduneanu, Tommi Jaakkola
10.1 Introduction
10.2 Information Regularization on Metric Spaces
10.3 Information Regularization and Relational Data
10.4 Discussion
III Graph-Based Methods
11 Label Propagation and Quadratic Criterion
Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux
11.1 Introduction
11.2 Label Propagation on a Similarity Graph
11.3 Quadratic Cost Criterion
11.4 From Transduction to Induction
11.5 Incorporating Class Prior Knowledge
11.6 Curse of Dimensionality for Semi-Supervised Learning
11.7 Discussion
Acknowledgments
12 The Geometric Basis of Semi-Supervised Learning
Vikas Sindhwani, Misha Belkin, Partha Niyogi
12.1 Introduction
12.2 Incorporating Geometry in Regularization
12.3 Algorithms
12.4 Data-Dependent Kernels for Semi-Supervised Learning
12.5 Linear Methods for Large-Scale Semi-Supervised Learning
12.6 Connections to Other Algorithms and Related Work
12.7 Future Directions
13 Discrete Regularization
Dengyong Zhou, Bernhard Schölkopf
13.1 Introduction
13.2 Discrete Analysis
13.3 Discrete Regularization
13.4 Conclusion
14 Semi-Supervised Learning with Conditional Harmonic Mixing
14.1 Introduction
14.2 Conditional Harmonic Mixing
14.3 Learning in CHM Models
14.4 Incorporating Prior Knowledge
14.5 Learning the Conditionals
14.6 Model Averaging
14.7 Experiments
14.8 Conclusions
Acknowledgments
IV Change of Representation
15 Graph Kernels by Spectral Transforms
15.1 The Graph Laplacian
15.2 Kernels by Spectral Transforms
15.3 Kernel Alignment
15.4 Optimizing Alignment Using QCQP for Semi-Supervised Learning
15.5 Semi-Supervised Kernels with Order Constraints
15.6 Experimental Results
15.7 Conclusion
Acknowledgment
16 Spectral Methods for Dimensionality Reduction
16.1 Introduction
16.2 Linear Methods
16.3 Graph-Based Methods
16.4 Kernel Methods
16.5 Discussion
17 Modifying Distances
17.1 Introduction
17.2 Estimating DBD Metrics
17.3 Computing DBD Metrics
17.4 Semi-Supervised Learning Using Density-Based Metrics
17.5 Conclusions and Future Work
Acknowledgements
V Semi-Supervised Learning in Practice
18 Large-Scale Algorithms
18.1 Introduction
18.2 Cost Approximations
18.3 Subset Selection
18.4 Discussion
19 Semi-Supervised Protein Classification Using Cluster Kernels
19.1 Introduction
19.2 Representations and Kernels for Protein Sequences
19.3 Semi-Supervised Kernels for Protein Sequences
19.4 Experiments
19.5 Discussion
20 Prediction of Protein Function from Networks
20.1 Introduction
20.2 Graph-Based Semi-Supervised Learning
20.3 Combining Multiple Graphs
20.4 Experiments on Function Prediction of Proteins
20.5 Conclusion and Outlook
21 Analysis of Benchmarks
21.1 The Benchmark
21.2 Application of SSL Methods
21.3 Results and Discussion
VI Perspectives
22 An Augmented PAC Model for Semi-Supervised Learning
22.1 Introduction
22.2 A Formal Framework
22.3 Sample Complexity Results
22.4 Algorithmic Results
22.5 Related Models and Discussion
23 Metric-Based Approaches for Semi-Supervised Regression and Classification
23.1 Introduction
23.2 Metric Structure of Supervised Learning
23.3 Model Selection
23.4 Regularization
23.5 Classification
23.6 Conclusion
Acknowledgments
24 Transductive Inference and Semi-Supervised Learning
24.1 Problem Settings
24.2 Problem of Generalization in Inductive and Transductive Inference
24.3 Structure of the VC Bounds and Transductive Inference
24.4 The Symmetrization Lemma and Transductive Inference
24.5 Bounds for Transductive Inference
24.6 The Structural Risk Minimization Principle for Induction and Transduction
24.7 Combinatorics in Transductive Inference
24.8 Measures of the Size of Equivalence Classes
24.9 Algorithms for Inductive and Transductive SVMs
24.10 Semi-Supervised Learning
24.11 Conclusion: Transductive Inference and the New Problems of Inference
24.12 Beyond Transduction: Selective Inference
25 A Discussion of Semi-Supervised Learning and Transduction
References
Notation and Symbols
Contributors
Index
Semi-Supervised Learning
edited by Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien
Adaptive Computation and Machine Learning
Thomas Dietterich, Editor
Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns, Associate Editors

Bioinformatics: The Machine Learning Approach, Pierre Baldi and Søren Brunak
Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto
Graphical Models for Machine Learning and Digital Communication, Brendan J. Frey
Learning in Graphical Models, Michael I. Jordan
Causation, Prediction, and Search, second edition, Peter Spirtes, Clark Glymour, and Richard Scheines
Principles of Data Mining, David Hand, Heikki Mannila, and Padhraic Smyth
Bioinformatics: The Machine Learning Approach, second edition, Pierre Baldi and Søren Brunak
Learning Kernel Classifiers: Theory and Algorithms, Ralf Herbrich
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, Bernhard Schölkopf and Alexander J. Smola
Introduction to Machine Learning, Ethem Alpaydin
Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams
Semi-Supervised Learning, Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien
Semi-Supervised Learning

Olivier Chapelle
Bernhard Schölkopf
Alexander Zien

The MIT Press
Cambridge, Massachusetts
London, England
© 2006 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

Typeset by the authors using LaTeX 2ε
Printed and bound in the United States of America

Library of Congress Cataloging-in-Publication Data

Semi-supervised learning / edited by Olivier Chapelle, Bernhard Schölkopf, Alexander Zien.
p. cm. – (Adaptive computation and machine learning)
Includes bibliographical references.
ISBN 978-0-262-03358-9 (alk. paper)
1. Supervised learning (Machine learning) I. Chapelle, Olivier. II. Schölkopf, Bernhard. III. Zien, Alexander. IV. Series.
Q325.75.S42 2006
006.3'1–dc22
2006044448

10 9 8 7 6 5 4 3 2 1