Contents
Preface
1 Introduction to Semi-Supervised Learning
1.1 Supervised, Unsupervised, and Semi-Supervised Learning
1.2 When Can Semi-Supervised Learning Work?
1.3 Classes of Algorithms and Organization of This Book
I Generative Models
2 A Taxonomy for Semi-Supervised Learning Methods
2.1 The Semi-Supervised Learning Problem
2.2 Paradigms for Semi-Supervised Learning
2.3 Examples
2.4 Conclusions
3 Semi-Supervised Text Classification Using EM
3.1 Introduction
3.2 A Generative Model for Text
3.3 Experimental Results with Basic EM
3.4 Using a More Expressive Generative Model
3.5 Overcoming the Challenges of Local Maxima
3.6 Conclusions and Summary
4 Risks of Semi-Supervised Learning: How Unlabeled Data Can Degrade Performance of Generative Classifiers
4.1 Do Unlabeled Data Improve or Degrade Classification Performance?
4.2 Understanding Unlabeled Data: Asymptotic Bias
4.3 The Asymptotic Analysis of Generative Semi-Supervised Learning
4.4 The Value of Labeled and Unlabeled Data
4.5 Finite Sample Effects
4.6 Model Search and Robustness
4.7 Conclusion
5 Probabilistic Semi-Supervised Clustering with Constraints
5.1 Introduction
5.2 HMRF Model for Semi-Supervised Clustering
5.3 HMRF-KMeans Algorithm
5.4 Active Learning for Constraint Acquisition
5.5 Experimental Results
5.6 Related Work
5.7 Conclusions
II Low-Density Separation
6 Transductive Support Vector Machines
6.1 Introduction
6.2 Transductive Support Vector Machines
6.3 Why Use Margin on the Test Set?
6.4 Experiments and Applications of TSVMs
6.5 Solving the TSVM Optimization Problem
6.6 Connection to Related Approaches
6.7 Summary and Conclusions
7 Semi-Supervised Learning Using Semi-Definite Programming
7.1 Relaxing SVM Transduction
7.2 An Approximation for Speedup
7.3 General Semi-Supervised Learning Settings
7.4 Empirical Results
7.5 Summary and Outlook
Appendix: The Extended Schur Complement Lemma
8 Gaussian Processes and the Null-Category Noise Model
8.1 Introduction
8.2 The Noise Model
8.3 Process Model and Effect of the Null-Category
8.4 Posterior Inference and Prediction
8.5 Results
8.6 Discussion
9 Entropy Regularization
9.1 Introduction
9.2 Derivation of the Criterion
9.3 Optimization Algorithms
9.4 Related Methods
9.5 Experiments
Appendix: Proof of Theorem 9.1
10 Data-Dependent Regularization
10.1 Introduction
10.2 Information Regularization on Metric Spaces
10.3 Information Regularization and Relational Data
10.4 Discussion
III Graph-Based Methods
11 Label Propagation and Quadratic Criterion
11.1 Introduction
11.2 Label Propagation on a Similarity Graph
11.3 Quadratic Cost Criterion
11.4 From Transduction to Induction
11.5 Incorporating Class Prior Knowledge
11.6 Curse of Dimensionality for Semi-Supervised Learning
11.7 Discussion
Acknowledgments
12 The Geometric Basis of Semi-Supervised Learning
12.1 Introduction
12.2 Incorporating Geometry in Regularization
12.3 Algorithms
12.4 Data-Dependent Kernels for Semi-Supervised Learning
12.5 Linear Methods for Large-Scale Semi-Supervised Learning
12.6 Connections to Other Algorithms and Related Work
12.7 Future Directions
13 Discrete Regularization
13.1 Introduction
13.2 Discrete Analysis
13.3 Discrete Regularization
13.4 Conclusion
14 Semi-Supervised Learning with Conditional Harmonic Mixing
14.1 Introduction
14.2 Conditional Harmonic Mixing
14.3 Learning in CHM Models
14.4 Incorporating Prior Knowledge
14.5 Learning the Conditionals
14.6 Model Averaging
14.7 Experiments
14.8 Conclusions
Acknowledgments
IV Change of Representation
15 Graph Kernels by Spectral Transforms
15.1 The Graph Laplacian
15.2 Kernels by Spectral Transforms
15.3 Kernel Alignment
15.4 Optimizing Alignment Using QCQP for Semi-Supervised Learning
15.5 Semi-Supervised Kernels with Order Constraints
15.6 Experimental Results
15.7 Conclusion
Acknowledgments
16 Spectral Methods for Dimensionality Reduction
16.1 Introduction
16.2 Linear Methods
16.3 Graph-Based Methods
16.4 Kernel Methods
16.5 Discussion
17 Modifying Distances
17.1 Introduction
17.2 Estimating DBD Metrics
17.3 Computing DBD Metrics
17.4 Semi-Supervised Learning Using Density-Based Metrics
17.5 Conclusions and Future Work
Acknowledgments
V Semi-Supervised Learning in Practice
18 Large-Scale Algorithms
18.1 Introduction
18.2 Cost Approximations
18.3 Subset Selection
18.4 Discussion
19 Semi-Supervised Protein Classification Using Cluster Kernels
19.1 Introduction
19.2 Representations and Kernels for Protein Sequences
19.3 Semi-Supervised Kernels for Protein Sequences
19.4 Experiments
19.5 Discussion
20 Prediction of Protein Function from Networks
20.1 Introduction
20.2 Graph-Based Semi-Supervised Learning
20.3 Combining Multiple Graphs
20.4 Experiments on Function Prediction of Proteins
20.5 Conclusion and Outlook
21 Analysis of Benchmarks
21.1 The Benchmark
21.2 Application of SSL Methods
21.3 Results and Discussion
VI Perspectives
22 An Augmented PAC Model for Semi-Supervised Learning
22.1 Introduction
22.2 A Formal Framework
22.3 Sample Complexity Results
22.4 Algorithmic Results
22.5 Related Models and Discussion
23 Metric-Based Approaches for Semi-Supervised Regression and Classification
23.1 Introduction
23.2 Metric Structure of Supervised Learning
23.3 Model Selection
23.4 Regularization
23.5 Classification
23.6 Conclusion
Acknowledgments
24 Transductive Inference and Semi-Supervised Learning
24.1 Problem Settings
24.2 Problem of Generalization in Inductive and Transductive Inference
24.3 Structure of the VC Bounds and Transductive Inference
24.4 The Symmetrization Lemma and Transductive Inference
24.5 Bounds for Transductive Inference
24.6 The Structural Risk Minimization Principle for Induction and Transduction
24.7 Combinatorics in Transductive Inference
24.8 Measures of the Size of Equivalence Classes
24.9 Algorithms for Inductive and Transductive SVMs
24.10 Semi-Supervised Learning
24.11 Conclusion: Transductive Inference and the New Problems of Inference
24.12 Beyond Transduction: Selective Inference
25 A Discussion of Semi-Supervised Learning and Transduction
References
Notation and Symbols
Contributors
Index