Foundations and Trends® in Computer Graphics and Vision, Vol. 8, No. 2-3 (2012) 85–283
© 2014 J. Mairal, F. Bach and J. Ponce
DOI: 10.1561/0600000058

Sparse Modeling for Image and Vision Processing

Julien Mairal, Inria¹ (julien.mairal@inria.fr)
Francis Bach, Inria² (francis.bach@inria.fr)
Jean Ponce, Ecole Normale Supérieure³ (jean.ponce@ens.fr)

¹LEAR team, laboratoire Jean Kuntzmann, CNRS, Univ. Grenoble Alpes, France.
²SIERRA team, département d'informatique de l'Ecole Normale Supérieure, ENS/CNRS/Inria UMR 8548, France.
³WILLOW team, département d'informatique de l'Ecole Normale Supérieure, ENS/CNRS/Inria UMR 8548, France.
Contents

1 A Short Introduction to Parsimony
  1.1 Early concepts of parsimony in statistics
  1.2 Wavelets in signal processing
  1.3 Modern parsimony: the ℓ1-norm and other variants
  1.4 Dictionary learning
  1.5 Compressed sensing and sparse recovery
  1.6 Theoretical results about dictionary learning

2 Discovering the Structure of Natural Images
  2.1 Pre-processing
  2.2 Principal component analysis
  2.3 Clustering or vector quantization
  2.4 Dictionary learning
  2.5 Structured dictionary learning
  2.6 Other matrix factorization methods
  2.7 Discussion

3 Sparse Models for Image Processing
  3.1 Image denoising
  3.2 Image inpainting
  3.3 Image demosaicking
  3.4 Image up-scaling
  3.5 Inverting nonlinear local transformations
  3.6 Video processing
  3.7 Face compression
  3.8 Other patch modeling approaches

4 Sparse Coding for Visual Recognition
  4.1 A coding and pooling approach to image modeling
  4.2 The botany of sparse feature coding
  4.3 Face recognition
  4.4 Patch classification and edge detection
  4.5 Connections with neural networks
  4.6 Other applications

5 Optimization Algorithms
  5.1 Sparse reconstruction with the ℓ0-penalty
  5.2 Sparse reconstruction with the ℓ1-norm
  5.3 Iterative reweighted-ℓ1 methods
  5.4 Iterative reweighted-ℓ2 methods
  5.5 Optimization for dictionary learning
  5.6 Other optimization techniques

6 Conclusions

Acknowledgments

References
Abstract

In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection—that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, or computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts.

J. Mairal, F. Bach and J. Ponce. Sparse Modeling for Image and Vision Processing. Foundations and Trends® in Computer Graphics and Vision, vol. 8, no. 2-3, pp. 85–283, 2012. DOI: 10.1561/0600000058.
1  A Short Introduction to Parsimony

In its most general definition, the principle of sparsity, or parsimony, consists of representing some phenomenon with as few variables as possible. It appears to be central to many research fields and is often considered to be inspired by an early doctrine formulated by the philosopher and theologian William of Ockham in the 14th century, which essentially favors simple theories over more complex ones. Of course, the link between Ockham and the tools presented in this monograph is rather thin, and more modern views seem to appear later, at the beginning of the 20th century. Discussing the scientific method, Wrinch and Jeffreys [1921] indeed introduce a simplicity principle related to parsimony as follows:

    The existence of simple laws is, then, apparently, to be regarded as a quality of nature; and accordingly we may infer that it is justifiable to prefer a simple law to a more complex one that fits our observations slightly better.

Remarkably, Wrinch and Jeffreys [1921] further discuss statistical modeling of physical observations and relate the concept of "simplicity" to the number of learning parameters; as a matter of fact, this concept is relatively close to the contemporary view of parsimony.
Subsequently, numerous tools have been developed by statisticians to build models of physical phenomena with good predictive power. Models are usually learned from observed data, and their generalization performance is evaluated on test data. Among a collection of plausible models, the simplest one is often preferred, and the number of underlying parameters is used as a criterion to perform model selection [Mallows, 1964, 1966, Akaike, 1973, Hocking, 1976, Barron et al., 1998, Rissanen, 1978, Schwarz, 1978, Tibshirani, 1996].

In signal processing, problems similar to those in statistics arise, but a different terminology is used. Observations, or data vectors, are called "signals", and data modeling is a crucial step for performing various operations such as restoration or compression, or for solving inverse problems. Here also, the sparsity principle plays an important role and has been successful [Mallat and Zhang, 1993, Pati et al., 1993, Donoho and Johnstone, 1994, Cotter et al., 1999, Chen et al., 1999, Donoho, 2006, Candès et al., 2006]. Each signal is approximated by a sparse linear combination of prototypes called dictionary elements, resulting in simple and compact models. However, statistics and signal processing remain two distinct fields with different objectives and methodology; specifically, signals often come from the same data source, e.g., natural images, whereas problems considered in statistics are in general unrelated to each other. Consequently, a long series of works has been devoted to finding appropriate dictionaries for signal classes of interest, leading to various sorts of wavelets [Freeman and Adelson, 1991, Simoncelli et al., 1992, Donoho, 1999, Candès and Donoho, 2002, Do and Vetterli, 2005, Le Pennec and Mallat, 2005, Mallat, 2008].
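To make the notion of a sparse decomposition in a fixed dictionary concrete, here is a minimal sketch (not taken from the monograph) that computes the best k-term approximation of a signal in an orthonormal DCT basis; the use of NumPy/SciPy, the signal length, and the sparsity level k are illustrative assumptions standing in for the fixed wavelet-type dictionaries cited above.

    import numpy as np
    from scipy.fft import dct, idct

    # Illustrative sketch: best k-term approximation of a signal in an
    # orthonormal DCT dictionary, i.e., a sparse linear combination of
    # a few dictionary elements.
    n, k = 64, 3
    rng = np.random.default_rng(0)

    # Synthetic signal built from three DCT atoms plus mild noise.
    alpha_true = np.zeros(n)
    alpha_true[[2, 7, 19]] = [1.0, -0.7, 0.5]
    x = idct(alpha_true, norm="ortho") + 0.01 * rng.standard_normal(n)

    # Sparse coding in an orthonormal basis: keep the k largest coefficients.
    alpha = dct(x, norm="ortho")                     # analysis coefficients
    support = np.argsort(np.abs(alpha))[-k:]         # indices of the k largest entries
    alpha_sparse = np.zeros(n)
    alpha_sparse[support] = alpha[support]
    x_hat = idct(alpha_sparse, norm="ortho")         # reconstruction from k atoms

    print("selected atoms:", sorted(support.tolist()))
    print("relative error: %.3f" % (np.linalg.norm(x - x_hat) / np.linalg.norm(x)))

For over-complete dictionaries, whose atoms are not orthogonal, the same goal is pursued with greedy matching pursuit or ℓ1-penalized formulations, which are reviewed in Section 5.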
Even though statistics and signal processing have devised most of the methodology of sparse modeling, the parsimony principle was also discovered independently in other fields. To some extent, it appears in the work of Markowitz [1952] about portfolio selection in finance, and also in geophysics [Claerbout and Muir, 1973, Taylor et al., 1979]. In neuroscience, Olshausen and Field [1996, 1997] proposed a significantly different approach to sparse modeling than previously established practices. Whereas classical techniques in signal processing used fixed off-the-shelf dictionaries, the method of Olshausen and Field [1996, 1997] consists of learning the dictionary from training data. In a pioneering exploratory experiment, they demonstrated that dictionary learning could easily discover underlying structures in natural image patches; later, their approach found numerous applications in many fields, notably in image and audio processing [Lewicki, 2002, Elad and Aharon, 2006, Mairal et al., 2009, Yang et al., 2010a] and computer vision [Raina et al., 2007, Yang et al., 2009, Zeiler et al., 2011, Mairal et al., 2012, Song et al., 2012, Castrodad and Sapiro, 2012, Elhamifar et al., 2012, Pokrass et al., 2013].

The goal of this monograph is to present basic tools of sparse modeling and their applications to visual recognition and image processing. We aim at offering a self-contained view combining pluri-disciplinary methodology, practical advice, and a large review of the literature. Most of the figures in the paper are produced with the software SPAMS (available at http://spams-devel.gforge.inria.fr/), and the corresponding Matlab code will be provided on the first author's webpage.
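The following sketch illustrates the learned-dictionary idea on image patches. It is only an assumption-laden example using scikit-learn's MiniBatchDictionaryLearning rather than the authors' SPAMS toolbox, and the sample image, patch size, number of atoms, and regularization weight are arbitrary choices made for the illustration.

    import numpy as np
    from sklearn.datasets import load_sample_image
    from sklearn.feature_extraction.image import extract_patches_2d
    from sklearn.decomposition import MiniBatchDictionaryLearning

    # Illustrative sketch (not the SPAMS toolbox): dictionary learning on 8x8
    # grayscale patches, in the spirit of the formulation of Olshausen and Field.
    image = load_sample_image("china.jpg").mean(axis=2) / 255.0   # grayscale in [0, 1]
    patches = extract_patches_2d(image, (8, 8), max_patches=5000, random_state=0)
    X = patches.reshape(len(patches), -1)
    X -= X.mean(axis=1, keepdims=True)                            # remove the mean of each patch

    # Learn 100 dictionary elements with a sparsity-regularized coding step.
    dico = MiniBatchDictionaryLearning(n_components=100, alpha=1.0, random_state=0)
    codes = dico.fit(X).transform(X)                              # sparse codes of the patches
    D = dico.components_                                          # one learned atom per row

    print("dictionary shape:", D.shape)
    print("average nonzeros per patch:", np.mean(np.count_nonzero(codes, axis=1)))

Displaying the rows of D as 8×8 tiles typically reveals the localized, oriented structures discussed in Section 2.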
The monograph is organized as follows: the current introductory section is divided into several parts providing a simple historical view of sparse estimation. In Section 1.1, we start with early concepts of parsimony in statistics and information theory from the 70's and 80's. We present the use of sparse estimation within the wavelet framework in Section 1.2, which was essentially developed in the 90's. Section 1.3 introduces the era of "modern parsimony"—that is, the ℓ1-norm and its variants, which have been heavily used during the last two decades. Section 1.4 is devoted to the dictionary learning formulation originally introduced by Olshausen and Field [1996, 1997], which is a key component of most applications presented later in this monograph. In Sections 1.5 and 1.6, we conclude our introductory tour with some theoretical aspects, such as the concepts of "compressed sensing" and sparse recovery, which have attracted considerable attention in recent years.

With all these parsimonious tools in hand, we discuss the use of sparse coding and related sparse matrix factorization techniques for discovering the underlying structure of natural image patches in Section 2. Even though the task here is subjective and exploratory, it is the first successful instance of dictionary learning; the insight gained from these early experiments forms the basis of concrete applications presented in subsequent sections.

Section 3 covers numerous applications of sparse models of natural image patches in image processing, such as image denoising, super-resolution, inpainting, or demosaicking. This section is concluded with other related patch-modeling approaches.

Section 4 presents recent successes of sparse models for visual recognition, such as codebook learning of visual descriptors, face recognition, or more low-level tasks such as edge detection and classification of textures and digits. We conclude the section with other computer vision applications such as visual tracking and data visualization.

Section 5 is devoted to optimization algorithms. It presents in a concise way efficient algorithms for solving sparse decomposition and dictionary learning problems.

We see our monograph as a good complement to other books and monographs about sparse estimation, which offer different perspectives, such as Mallat [2008] and Elad [2010] in signal and image processing, or Bach et al. [2012a] in optimization and machine learning. We also put the emphasis on the structure of natural image patches learned with dictionary learning, and thus present an alternative view to the book of Hyvärinen et al. [2009], which is focused on independent component analysis.

Notation. In this monograph, vectors are denoted by bold lower-case letters and matrices by upper-case ones. For instance, we consider in the rest of this paragraph a vector x in R^n and a matrix X in R^{m×n}. The columns of X are represented by indexed vectors x_1, …, x_n such that we can write X = [x_1, …, x_n]. The i-th entry of x is denoted by x[i], and the i-th entry of the j-th column of X is represented by X[i, j]. For any subset g of {1, …, n}, we denote by x[g] the vector in R^{|g|} that records the entries of x corresponding to indices in g. For q ≥ 1, we define the ℓq-norm of x as $\|\mathbf{x}\|_q \triangleq \big(\sum_{i=1}^n |\mathbf{x}[i]|^q\big)^{1/q}$, and the ℓ∞-norm as $\|\mathbf{x}\|_\infty \triangleq \lim_{q \to +\infty} \|\mathbf{x}\|_q = \max_{i=1,\ldots,n} |\mathbf{x}[i]|$.
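As a small worked example of these definitions (the vector is an arbitrary choice, not taken from the monograph), consider x = (1, −3, 0, 2) in R^4:

\[
\|\mathbf{x}\|_1 = 1 + 3 + 0 + 2 = 6, \qquad
\|\mathbf{x}\|_2 = \sqrt{1 + 9 + 0 + 4} = \sqrt{14} \approx 3.74, \qquad
\|\mathbf{x}\|_\infty = \max(1, 3, 0, 2) = 3.
\]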