MatConvNet
Convolutional Neural Networks for MATLAB

Andrea Vedaldi    Karel Lenc    Ankush Gupta
Abstract

MatConvNet is an implementation of Convolutional Neural Networks (CNNs) for MATLAB. The toolbox is designed with an emphasis on simplicity and flexibility. It exposes the building blocks of CNNs as easy-to-use MATLAB functions, providing routines for computing linear convolutions with filter banks, feature pooling, and many more. In this manner, MatConvNet allows fast prototyping of new CNN architectures; at the same time, it supports efficient computation on CPU and GPU, allowing complex models to be trained on large datasets such as ImageNet ILSVRC. This document provides an overview of CNNs and of how they are implemented in MatConvNet, and gives the technical details of each computational block in the toolbox.
Contents

1 Introduction to MatConvNet
  1.1 Getting started
  1.2 MatConvNet at a glance
  1.3 Documentation and examples
  1.4 Speed
  1.5 Acknowledgments

2 Neural Network Computations
  2.1 Overview
  2.2 Network structures
    2.2.1 Sequences
    2.2.2 Directed acyclic graphs
  2.3 Computing derivatives with backpropagation
    2.3.1 Derivatives of tensor functions
    2.3.2 Derivatives of function compositions
    2.3.3 Backpropagation networks
    2.3.4 Backpropagation in DAGs
    2.3.5 DAG backpropagation networks

3 Wrappers and pre-trained models
  3.1 Wrappers
    3.1.1 SimpleNN
    3.1.2 DagNN
  3.2 Pre-trained models
  3.3 Learning models
  3.4 Running large scale experiments

4 Computational blocks
  4.1 Convolution
  4.2 Convolution transpose (deconvolution)
  4.3 Spatial pooling
  4.4 Activation functions
  4.5 Spatial bilinear resampling
  4.6 Normalization
    4.6.1 Local response normalization (LRN)
    4.6.2 Batch normalization
    4.6.3 Spatial normalization
    4.6.4 Softmax
  4.7 Categorical losses
    4.7.1 Classification losses
    4.7.2 Attribute losses
  4.8 Comparisons
    4.8.1 p-distance

5 Geometry
  5.1 Preliminaries
  5.2 Simple filters
    5.2.1 Pooling in Caffe
  5.3 Convolution transpose
  5.4 Transposing receptive fields
  5.5 Composing receptive fields
  5.6 Overlaying receptive fields

6 Implementation details
  6.1 Convolution
  6.2 Convolution transpose
  6.3 Spatial pooling
  6.4 Activation functions
    6.4.1 ReLU
    6.4.2 Sigmoid
  6.5 Spatial bilinear resampling
  6.6 Normalization
    6.6.1 Local response normalization (LRN)
    6.6.2 Batch normalization
    6.6.3 Spatial normalization
    6.6.4 Softmax
  6.7 Categorical losses
    6.7.1 Classification losses
    6.7.2 Attribute losses
  6.8 Comparisons
    6.8.1 p-distance

Bibliography
Chapter 1

Introduction to MatConvNet

MatConvNet is a MATLAB toolbox implementing Convolutional Neural Networks (CNNs) for computer vision applications. Since the breakthrough work of [7], CNNs have had a major impact in computer vision, and image understanding in particular, essentially replacing traditional image representations such as the ones implemented in our own VLFeat [11] open source library.

While most CNNs are obtained by composing simple linear and non-linear filtering operations such as convolution and rectification, their implementation is far from trivial. The reason is that CNNs need to be learned from vast amounts of data, often millions of images, requiring very efficient implementations. Like most CNN libraries, MatConvNet achieves this by using a variety of optimizations and, chiefly, by supporting computations on GPUs.

Numerous other machine learning, deep learning, and CNN open source libraries exist. To cite some of the most popular ones: CudaConvNet (https://code.google.com/p/cuda-convnet/), Torch (http://cilvr.nyu.edu/doku.php?id=code:start), Theano (http://deeplearning.net/software/theano/), and Caffe (http://caffe.berkeleyvision.org). Many of these libraries are well supported, with dozens of active contributors and large user bases. Why, then, create yet another library?

The key motivation for developing MatConvNet was to provide an environment particularly friendly and efficient for researchers to use in their investigations. (While from a user perspective MatConvNet currently relies on MATLAB, the library is being developed with a clean separation between the MATLAB code and the C++ and CUDA core; in the future the library may therefore be extended to process convolutional networks independently of MATLAB.) MatConvNet achieves this by its deep integration in the MATLAB environment, which is one of the most popular development environments in computer vision research as well as in many other areas. In particular, MatConvNet exposes as simple MATLAB commands CNN building blocks such as convolution, normalisation and pooling (chapter 4); these can then be combined and extended with ease to create CNN architectures. While many of these blocks use optimised CPU and GPU implementations written in C++ and CUDA (section 1.4), MATLAB's native support for GPU computation means that it is often possible to write new blocks in MATLAB directly while maintaining computational efficiency. Compared to writing new CNN components using lower-level languages, this is an important simplification that can significantly accelerate testing new ideas. Using MATLAB also provides a bridge towards
other areas; for instance, MatConvNet was recently used by the University of Arizona in planetary science, as summarised in this NVIDIA blog post: http://devblogs.nvidia.com/parallelforall/deep-learning-image-understanding-planetary-science/.

MatConvNet can learn large CNN models such as AlexNet [7] and the very deep networks of [9] from millions of images. Pre-trained versions of several of these powerful models can be downloaded from the MatConvNet home page (http://www.vlfeat.org/matconvnet/). While powerful, MatConvNet remains simple to use and install. The implementation is fully self-contained, requiring only MATLAB and a compatible C++ compiler (using the GPU code requires the freely-available CUDA DevKit and a suitable NVIDIA GPU). As demonstrated in fig. 1.1 and section 1.1, it is possible to download, compile, and install MatConvNet using three MATLAB commands. Several fully-functional examples demonstrating how small and large networks can be learned are included. Importantly, several standard pre-trained networks can be immediately downloaded and used in applications. A manual with a complete technical description of the toolbox is maintained along with the toolbox (http://www.vlfeat.org/matconvnet/matconvnet-manual.pdf). These features make MatConvNet useful in an educational context too (an example laboratory exercise based on MatConvNet can be downloaded from http://www.robots.ox.ac.uk/~vgg/practicals/cnn/index.html).

MatConvNet is open source, released under a BSD-like license. It can be downloaded from http://www.vlfeat.org/matconvnet as well as from GitHub (http://www.github.com/matconvnet).

1.1 Getting started

MatConvNet is simple to install and use. Figure 1.1 provides a complete example that classifies an image using a latest-generation deep convolutional neural network. The example includes downloading MatConvNet, compiling the package, downloading a pre-trained CNN model, and evaluating the latter on one of MATLAB's stock images.

The key command in this example is vl_simplenn, a wrapper that takes as input the CNN net and the pre-processed image im_ and produces as output a structure res of results. This particular wrapper can be used to model networks that have a simple structure, namely a chain of operations. Examining the code of vl_simplenn (edit vl_simplenn in MatConvNet) we note that the wrapper transforms the data sequentially, applying a number of MATLAB functions as specified by the network configuration. These functions, discussed in detail in chapter 4, are called "building blocks" and constitute the backbone of MatConvNet. While most blocks implement simple operations, what makes them non-trivial is their efficiency (section 1.4) as well as support for backpropagation (section 2.3) to allow learning CNNs.

Next, we demonstrate how to use one of these building blocks directly. For the sake of the example, consider convolving an image with a bank of linear filters. Start by reading an image in MATLAB, say using im = single(imread('peppers.png')), obtaining an H x W x D array im, where D = 3 is the number of colour channels in the image. Then create a bank of K = 16 random filters of size 3 x 3 using f = randn(3,3,3,16,'single'). Finally, convolve the image with the filters using the command y = vl_nnconv(im,f,[]). This results in an array y with K channels, one for each of the K filters in the bank.
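The commands above can be collected into a short script. The following is a minimal sketch, assuming MatConvNet has already been set up (run matlab/vl_setupnn, as in fig. 1.1) so that vl_nnconv is on the MATLAB path:

% convolve an image with a bank of random linear filters (sketch)
im = single(imread('peppers.png')) ;   % H x W x 3 input image
f  = randn(3, 3, 3, 16, 'single') ;    % bank of K = 16 filters of size 3 x 3 x 3
y  = vl_nnconv(im, f, []) ;            % no bias; default stride 1, no padding
size(y)                                % (H-2) x (W-2) x 16

Each of the 16 output channels is the response of the image to one of the random filters; padding and stride can be changed through the corresponding options of vl_nnconv, described in section 4.1.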
% install and compile MatConvNet (run once)
untar(['http://www.vlfeat.org/matconvnet/download/' ...
  'matconvnet-1.0-beta12.tar.gz']) ;
cd matconvnet-1.0-beta12
run matlab/vl_compilenn

% download a pre-trained CNN from the web (run once)
urlwrite(...
  'http://www.vlfeat.org/matconvnet/models/imagenet-vgg-f.mat', ...
  'imagenet-vgg-f.mat') ;

% setup MatConvNet
run matlab/vl_setupnn

% load the pre-trained CNN
net = load('imagenet-vgg-f.mat') ;

% load and preprocess an image
im = imread('peppers.png') ;
im_ = imresize(single(im), net.meta.normalization.imageSize(1:2)) ;
im_ = im_ - net.meta.normalization.averageImage ;

% run the CNN
res = vl_simplenn(net, im_) ;

% show the classification result
scores = squeeze(gather(res(end).x)) ;
[bestScore, best] = max(scores) ;
figure(1) ; clf ; imagesc(im) ;
title(sprintf('%s (%d), score %.3f',...
  net.classes.description{best}, best, bestScore)) ;

Figure 1.1: A complete example including downloading, installing, compiling, and running MatConvNet to classify one of MATLAB's stock images using a large CNN pre-trained on ImageNet. The resulting figure shows the input image with the title "bell pepper (946), score 0.704".
While users are encouraged to make use of the blocks directly to create new architectures, MatConvNet provides wrappers such as vl_simplenn for standard CNN architectures such as AlexNet [7] or Network-in-Network [8]. Furthermore, the library provides numerous examples (in the examples/ subdirectory), including code to learn a variety of models on the MNIST, CIFAR, and ImageNet datasets. All these examples use the examples/cnn_train training code, which is an implementation of stochastic gradient descent (section 3.3). While this training code is perfectly serviceable and quite flexible, it remains in the examples/ subdirectory as it is somewhat problem-specific. Users are welcome to implement their own optimisers.

1.2 MatConvNet at a glance

MatConvNet has a simple design philosophy. Rather than wrapping CNNs around complex layers of software, it exposes simple functions to compute CNN building blocks, such as linear convolution and ReLU operators, directly as MATLAB commands. These building blocks are easy to combine into complete CNNs and can be used to implement sophisticated learning algorithms. While several real-world examples of small and large CNN architectures and training routines are provided, it is always possible to go back to the basics and build your own, using the efficiency of MATLAB in prototyping. Often no C coding is required at all to try new architectures. As such, MatConvNet is an ideal playground for research in computer vision and CNNs.

MatConvNet contains the following elements:

• CNN computational blocks. A set of optimized routines computing fundamental building blocks of a CNN. For example, a convolution block is implemented by y = vl_nnconv(x,f,b), where x is an image, f a filter bank, and b a vector of biases (section 4.1). The derivatives are computed as [dzdx,dzdf,dzdb] = vl_nnconv(x,f,b,dzdy), where dzdy is the derivative of the CNN output w.r.t. y (section 4.1); a sketch of such a forward/backward pair is given after this list. Chapter 4 describes all the blocks in detail.

• CNN wrappers. MatConvNet provides a simple wrapper, suitably invoked by vl_simplenn, that implements a CNN with a linear topology (a chain of blocks). It also provides a much more flexible wrapper supporting networks with arbitrary topologies, encapsulated in the dagnn.DagNN MATLAB class.

• Example applications. MatConvNet provides several examples of learning CNNs with stochastic gradient descent, on CPU or GPU, on the MNIST, CIFAR10, and ImageNet datasets.

• Pre-trained models. MatConvNet provides several state-of-the-art pre-trained CNN models that can be used off-the-shelf, either to classify images or to produce image encodings in the spirit of Caffe or DeCAF.
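To make the first item concrete, here is a minimal sketch of a forward and a backward pass through the convolution block; the tensor sizes are arbitrary and chosen only for illustration:

% forward and backward pass through the convolution block (illustrative sizes)
x = randn(32, 32, 3, 'single') ;                 % input: a 32 x 32 RGB image
f = randn(5, 5, 3, 10, 'single') ;               % bank of 10 filters of size 5 x 5 x 3
b = zeros(1, 10, 'single') ;                     % one bias per filter
y = vl_nnconv(x, f, b) ;                         % forward pass: y is 28 x 28 x 10
dzdy = randn(size(y), 'single') ;                % derivative of a scalar z w.r.t. y
[dzdx, dzdf, dzdb] = vl_nnconv(x, f, b, dzdy) ;  % backward pass: same sizes as x, f, b

In practice dzdy is produced by backpropagation through the blocks above, rather than drawn at random; the pattern is the same for all blocks, whose backward mode is invoked by passing the output derivative as an additional argument (section 2.3 and chapter 4).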