I Background
Introduction
The scarcity of labels
Semi-supervised learning
Overview and contributions
Overview
Contributions
Assumptions and approaches that enable semi-supervised learning
Assumptions required for semi-supervised learning
Smoothness assumption
Cluster assumption
Low-density separation
Existence of a discoverable manifold
Classes of semi-supervised learning algorithms
Methods based on generative models
Methods based on low-density separation
Graph-based methods
Methods based on a change of representation
II Related work
Review of semi-supervised learning literature
Semi-supervised learning before the deep learning era
Semi-supervised deep learning
Autoencoder-based approaches
Regularisation- and data-augmentation-based approaches
Other approaches
Semi-supervised generative adversarial networks
Generative adversarial networks
How generative adversarial networks can be used for semi-supervised learning
Model focus: Good Semi-Supervised Learning that Requires a Bad GAN
Shannon's entropy and its relation to decision boundaries
CatGAN
Improved GAN
The Improved GAN SSL model
BadGAN
Implications of the BadGAN model
III Analysis and experiments
Enforcing low-density separation
Approaches to low-density separation based on entropy
Synthetic experiments used in this section
Using entropy to incorporate the low-density separation assumption into our model
Taking advantage of a known prior class distribution
Generating data in low-density regions of the input space
Viewing entropy maximisation as minimising a Kullback-Leibler divergence
Theoretical evaluation of different entropy-related loss functions
Similarity of Improved GAN and CatGAN approaches
The CatGAN and Reverse KL approaches may be `forgetful'
Another approach: removing the (K+1)th class's constraint in the Improved GAN formulation
Summary
Experiments with alternative loss functions on synthetic datasets
Research questions and hypotheses
Which loss function formulation is best?
Can the PixelCNN++ model be replaced by some other density estimator?
Discriminator from a pre-trained generative adversarial network
Pre-trained denoising autoencoder
Do the generated examples actually contribute to feature learning?
Is VAT or InfoReg really complementary to BadGAN?
Experiments
PI-MNIST-100
Experimental setup
Effectiveness of different proxies for entropy
Potential for replacing the PixelCNN++ model with a DAE or the discriminator from a GAN
Extent to which generated images contribute to feature learning
Complementarity of VAT and BadGAN
SVHN-1k
Experimental setup
Effectiveness of different proxies for entropy
Potential for replacing the PixelCNN++ model with a DAE or the discriminator from a GAN
Hypotheses as to why our BadGAN implementation does not perform well on SVHN-1k
Experiments summary
Conclusions, practical recommendations and suggestions for future work
IV Appendices
Information regularisation for neural networks
Derivation
Intuition and experiments on synthetic datasets
FastInfoReg: overcoming InfoReg's speed issues
Performance on PI-MNIST-100
Viewing entropy minimisation as a KL divergence minimisation problem
References