MSc Artificial Intelligence
Master Thesis

Semi-Supervised Learning with Generative Adversarial Networks

by
Liam Schoneveld
11139013

September 1, 2017
36 ECTS
February – September, 2017

Supervisor: Prof. Dr. M. Welling
Daily Supervisor: T. Cohen
MSc Assessor: Dr. E. Gavves

Faculteit der Natuurkunde, Wiskunde en Informatica
Abstract

As society continues to accumulate more and more data, demand for machine learning algorithms that can learn from data with limited human intervention only increases. Semi-supervised learning (SSL) methods, which extend supervised learning algorithms by enabling them to use unlabeled data, play an important role in addressing this challenge. In this thesis, a framework unifying the traditional assumptions and approaches to SSL is defined. A synthesis of the SSL literature then places a range of contemporary approaches into this common framework. Our focus is on methods which use generative adversarial networks (GANs) to perform SSL. We analyse in detail one particular GAN-based SSL approach [Dai et al. (2017)]. This is shown to be closely related to two preceding approaches. Through synthetic experiments we provide an intuitive understanding of, and motivate the formulation of, our focus approach. We then theoretically analyse potential alternative formulations of its loss function. This analysis motivates a number of research questions that centre on possible improvements to, and experiments to better understand, the focus model. While we find support for our hypotheses, our conclusion more broadly is that the focus method is not especially robust.
Acknowledgements

I would like to thank Taco Cohen for supervising my thesis. Despite his busy schedule, Taco was able to provide me with invaluable feedback throughout the course of this project. I am also extremely grateful to Auke Wiggers for his mentoring, discussions and guidance. He really helped me to think about these problems in a more effective way. Tijmen and the rest of the Scyfer team deserve a special mention for providing a working environment that was fun but also set the bar high. My gratitude also goes out to my committee of Max Welling and Efstratios Gavves for agreeing to read and assess my work amid their demanding schedules. I gratefully acknowledge Zihang Dai, author of the paper which is the central focus of this thesis, for his timely and insightful correspondence via email. Finally, I would like to thank my parents, brother, and my grandpa.
Contents

I Background

1 Introduction
1.1 The scarcity of labels
1.2 Semi-supervised learning

2 Overview and contributions
2.1 Overview
2.2 Contributions

3 Assumptions and approaches that enable semi-supervised learning
3.1 Assumptions required for semi-supervised learning
3.1.1 Smoothness assumption
3.1.2 Cluster assumption
3.1.3 Low-density separation
3.1.4 Existence of a discoverable manifold
3.2 Classes of semi-supervised learning algorithms
3.2.1 Methods based on generative models
3.2.2 Methods based on low-density separation
3.2.3 Graph-based methods
3.2.4 Methods based on a change of representation

II Related Work

4 Review of Semi-Supervised Learning Literature
4.1 Semi-supervised learning before the deep learning era
4.2 Semi-supervised deep learning
4.2.1 Autoencoder-based approaches
4.2.2 Regularisation and data augmentation-based approaches
4.2.3 Other approaches
4.3 Semi-supervised generative adversarial networks
4.3.1 Generative adversarial networks
4.3.2 Approaches by which generative adversarial networks can be used for semi-supervised learning

5 Model focus: Good Semi-Supervised Learning that Requires a Bad GAN
5.1 Shannon's entropy and its relation to decision boundaries
5.2 CatGAN
5.3 Improved GAN
5.3.1 The Improved GAN SSL Model
5.4 BadGAN
5.4.1 Implications of the BadGAN model

III Analysis and experiments

6 Enforcing low-density separation
6.1 Approaches to low-density separation based on entropy
6.1.1 Synthetic experiments used in this section
6.1.2 Using entropy to incorporate the low-density separation assumption into our model
6.1.3 Taking advantage of a known prior class distribution
6.1.4 Generating data in low-density regions of the input space
6.2 Viewing entropy maximisation as minimising a Kullback-Leibler divergence
6.3 Theoretical evaluation of different entropy-related loss functions
6.3.1 Similarity of Improved GAN and CatGAN approaches
6.3.2 The CatGAN and Reverse KL approaches may be 'forgetful'
6.3.3 Another approach: removing the K+1th class' constraint in the Improved GAN formulation
6.3.4 Summary
6.4 Experiments with alternative loss functions on synthetic datasets

7 Research questions and hypotheses
RQ7.1 Which loss function formulation is best?
RQ7.2 Can the PixelCNN++ model be replaced by some other density estimate?
7.2.1 Discriminator from a pre-trained generative adversarial network
7.2.2 Pre-trained denoising autoencoder
RQ7.3 Do the generated examples actually contribute to feature learning?
RQ7.4 Is VAT or InfoReg really complementary to BadGAN?

8 Experiments
8.1 PI-MNIST-100
8.1.1 Experimental setup
8.1.2 Effectiveness of different proxies for entropy
8.1.3 Potential for replacing PixelCNN++ model with a DAE or discriminator from a GAN
8.1.4 Extent to which generated images contribute to feature learning
8.1.5 Complementarity of VAT and BadGAN
8.2 SVHN-1k
8.2.1 Experimental setup
8.2.2 Effectiveness of different proxies for entropy
8.2.3 Potential for replacing PixelCNN++ model with a DAE or discriminator from a GAN
8.2.4 Hypotheses as to why our BadGAN implementation does not perform well on SVHN-1k
8.2.5 Experiments summary

9 Conclusions, practical recommendations and suggestions for future work

IV Appendices

A Information regularisation for neural networks
A.1 Derivation
A.2 Intuition and experiments on synthetic datasets
A.3 FastInfoReg: overcoming InfoReg's speed issues
A.4 Performance on PI-MNIST-100

B Viewing entropy minimisation as a KL divergence minimisation problem

References
Part I
Background

1 Introduction

1.1 The scarcity of labels

“If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don’t know how to make the cake.”
Yann LeCun (2016)

Today, data is of ever-increasing abundance. It is estimated that 4.4 zettabytes of digital data existed in 2013, and this is set to rise to 44 zettabytes (i.e., 44 trillion gigabytes) by 2020 [International Data Corporation (2014)]. Such a vast store of information presents substantial opportunities to harness. Machine learning provides us with ways to make the most of these opportunities, through algorithms that essentially enable computers to learn from data.

One significant drawback of many machine learning approaches, however, is that they require annotated data. Annotations can take different forms and be used in different ways. Most commonly though, the annotations required by machine learning algorithms consist of the desired targets or outputs our trained model should produce after being shown the corresponding input.

It is relatively uncommon that data found ‘in the wild’ comes with ready-made annotations. There are some exceptions, such as captions added to images by online content creators, but even these may not be appropriate for the particular objective. Manually annotating data is usually time-consuming and costly. Hence, as Yann LeCun’s quote suggests, there is an ever-growing need for machine learning methods designed to work with a limited stock of annotations. As we explain in the next section, algorithms designed to do so are known as unsupervised or semi-supervised approaches.

1.2 Semi-supervised learning

To define semi-supervised learning (SSL), we begin by defining supervised and unsupervised learning, as SSL lies somewhere in between these two concepts.

Supervised learning algorithms are machine learning approaches which require that every input data point has a corresponding output data point. The goal of these algorithms is often to train a model that can accurately predict the correct outputs for inputs that were not seen during training. That is, the algorithm learns a function from training data that we hope will generalise to unseen data points.

Unsupervised learning algorithms are those which require only input data points – no corresponding outputs are provided or assumed to exist. These algorithms can have a variety of goals: unsupervised generative models are generally tasked with being able to generate new data points from the same distribution as the input data, while a range of other unsupervised methods are focused on learning some new representation of the input data. For instance, one might aim to learn a representation of the data that requires less storage space while retaining most of the input information (i.e., compression).
Figure 1: Illustration showing the general idea behind many SSL classification algorithms, based on the cluster assumption. The known labels in the top panel are propagated to the unlabeled points in the clusters they belong to.

SSL algorithms fall in between these paradigms. Strictly speaking, SSL methods are those designed to be used with datasets comprised of both annotated (or labeled) and unannotated (or unlabeled) subsets. Generally though, these methods assume the number of labeled instances is much smaller than the number of unlabeled instances. This is because unlabeled data tends to be more useful when we have few labeled examples. As explained in Section 3.1, in general these methods rely on some assumption of smoothness or clustering of the input data. The main intuition at the core of most semi-supervised classification methods is illustrated in Figure 1.

In the context of today’s data-rich world as described in the introduction, we believe that SSL methods play a particularly important role. While unsupervised learning is vital, as we cannot expect to annotate even a tiny proportion of the world’s data, we believe SSL might prove equally important, due to its ability to give direction to unsupervised learning methods. That is, the two sides of SSL can assist one another with regards to the practitioner’s task; the supervised side directs the unsupervised side into extracting structure that is more relevant to our particular task, while the unsupervised side provides the supervised side with more usable information.
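To make this setting concrete, the following minimal sketch (not taken from the thesis) trains a classifier on a small labeled subset while adding an auxiliary term computed on unlabeled data. Here that auxiliary term is simple entropy minimisation over the model's own predictions, one of the loss components revisited in Section 6; the toy data, network size and weighting factor `lam` are illustrative assumptions rather than the thesis's experimental setup.

```python
# A minimal sketch of the semi-supervised setting: a small labeled subset,
# a large unlabeled one, and a classifier trained with a supervised loss
# plus an unlabeled-data term (here: entropy minimisation, for illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy data: 10,000 inputs of dimension 784 (MNIST-like), but only 100 labels.
x_all = torch.randn(10_000, 784)
y_all = torch.randint(0, 10, (10_000,))
x_lab, y_lab = x_all[:100], y_all[:100]   # small labeled subset
x_unl = x_all[100:]                       # large unlabeled subset

classifier = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
lam = 0.1  # weight on the unlabeled-data term (illustrative choice)

for step in range(100):
    opt.zero_grad()
    # Supervised part: standard cross-entropy on the labeled subset.
    sup_loss = F.cross_entropy(classifier(x_lab), y_lab)
    # Unsupervised part: encourage confident (low-entropy) predictions on a
    # random unlabeled minibatch, pushing decision boundaries away from the
    # regions where the model is unsure.
    idx = torch.randint(0, x_unl.size(0), (256,))
    probs = F.softmax(classifier(x_unl[idx]), dim=1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()
    loss = sup_loss + lam * entropy
    loss.backward()
    opt.step()
```

The GAN-based models studied later in this thesis replace this naive unlabeled-data term with richer objectives, but the overall structure of a supervised loss plus a term computed on unlabeled data is the same.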
2 Overview and contributions

2.1 Overview

This thesis is structured as follows. In Section 3, we present the assumptions that are required for SSL, and then place the main categories of historic SSL approaches into a contemporary context. In Section 4 we synthesise the SSL literature, focusing mainly on approaches that involve deep learning in some way. Readers who are already well-versed in the basic concepts of SSL and the surrounding literature may wish to skip or skim Sections 3 and 4. In Section 5 we give a detailed background on three related SSL models. These are all based on generative adversarial networks (GANs) and form a central focus of this thesis.

In Section 6 we motivate a number of more basic cost functions used to enable SSL and illustrate their behaviour through synthetic experiments. This exercise gives readers a more intuitive understanding of the approach taken in our focus model. We then undertake a more theoretical analysis of these loss functions, derive some new alternatives, and hypothesise about their potential advantages and disadvantages in the context of our focus model.

Based on the preceding analysis and discussion, in Section 7 we formulate a number of research questions. Then in Section 8 we address each of these questions through larger empirical experiments, and present and discuss the results. Finally, in Section 9 we conclude our study, give some practical recommendations, and suggest promising directions for future research.

2.2 Contributions

The contributions made in this thesis include:

• A broad review of deep learning-based approaches to SSL, placed into a historical context (Sections 3, 4 and 5).

• An intuitive walk-through that motivates and illustrates the behaviour of a number of loss functions commonly found in SSL models (Section 6.1). This is also used to more clearly explain the logic behind our focus model (the BadGAN model, introduced in Section 5.4).

• Theoretical analysis of these loss functions and the introduction of alternative options, alongside a theoretical comparison between the approaches used by the CatGAN (introduced in Section 5.2) and Improved GAN (introduced in Section 5.3) models (remainder of Section 6).

• Larger empirical experiments, which address the following research questions:
  – Which of our analysed loss functions, or approaches to the BadGAN-style model, works best?
  – Can the PixelCNN++ component of the BadGAN model be replaced by something that is faster to train and infer from?
  – Do the generated images actively contribute to higher-order feature learning of the classifier network in some way?
  – Is Virtual Adversarial Training [Miyato et al. (2017)] truly orthogonal to a BadGAN-style approach, as asserted in Dai et al. (2017)?

• The revival of Information Regularisation, an SSL approach from 2006, derivations making it suitable for use with neural networks, and evidence suggesting it is competitive with modern approaches (Appendix A).