Foundations and Trends® in Computer Graphics and Vision
Vol. 8, No. 2-3 (2012) 85–283
© 2014 J. Mairal, F. Bach and J. Ponce
DOI: 10.1561/0600000058
Sparse Modeling for Image and
Vision Processing
Julien Mairal
Inria¹
julien.mairal@inria.fr

Francis Bach
Inria²
francis.bach@inria.fr

Jean Ponce
Ecole Normale Supérieure³
jean.ponce@ens.fr

¹ LEAR team, laboratoire Jean Kuntzmann, CNRS, Univ. Grenoble Alpes, France.
² SIERRA team, département d’informatique de l’Ecole Normale Supérieure, ENS/CNRS/Inria UMR 8548, France.
³ WILLOW team, département d’informatique de l’Ecole Normale Supérieure, ENS/CNRS/Inria UMR 8548, France.
Contents

1 A Short Introduction to Parsimony                                 2
  1.1 Early concepts of parsimony in statistics                     6
  1.2 Wavelets in signal processing                                 8
  1.3 Modern parsimony: the ℓ1-norm and other variants             14
  1.4 Dictionary learning                                          32
  1.5 Compressed sensing and sparse recovery                       35
  1.6 Theoretical results about dictionary learning                39

2 Discovering the Structure of Natural Images                      44
  2.1 Pre-processing                                                46
  2.2 Principal component analysis                                  52
  2.3 Clustering or vector quantization                             56
  2.4 Dictionary learning                                           59
  2.5 Structured dictionary learning                                60
  2.6 Other matrix factorization methods                            64
  2.7 Discussion                                                    73

3 Sparse Models for Image Processing                               75
  3.1 Image denoising                                               76
  3.2 Image inpainting                                              82
  3.3 Image demosaicking                                            84
  3.4 Image up-scaling                                              87
  3.5 Inverting nonlinear local transformations                     92
  3.6 Video processing                                              94
  3.7 Face compression                                              96
  3.8 Other patch modeling approaches                               99

4 Sparse Coding for Visual Recognition                            106
  4.1 A coding and pooling approach to image modeling              107
  4.2 The botany of sparse feature coding                          115
  4.3 Face recognition                                             122
  4.4 Patch classification and edge detection                      124
  4.5 Connections with neural networks                             130
  4.6 Other applications                                           135

5 Optimization Algorithms                                         140
  5.1 Sparse reconstruction with the ℓ0-penalty                    141
  5.2 Sparse reconstruction with the ℓ1-norm                       148
  5.3 Iterative reweighted-ℓ1 methods                              154
  5.4 Iterative reweighted-ℓ2 methods                              156
  5.5 Optimization for dictionary learning                         158
  5.6 Other optimization techniques                                169

6 Conclusions                                                     170

Acknowledgments                                                   172

References                                                        173
Abstract
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and
machine learning, the sparsity principle is used to perform model
selection—that is, automatically selecting a simple model among a large
collection of them. In signal processing, sparse coding consists of rep-
resenting data with linear combinations of a few dictionary elements.
Subsequently, the corresponding tools have been widely adopted by sev-
eral scientific communities such as neuroscience, bioinformatics, or com-
puter vision. The goal of this monograph is to offer a self-contained view
of sparse modeling for visual recognition and image processing. More
specifically, we focus on applications where the dictionary is learned
and adapted to data, yielding a compact representation that has been
successful in various contexts.
J. Mairal, F. Bach and J. Ponce. Sparse Modeling for Image and
Vision Processing. Foundations and Trends® in Computer Graphics and Vision,
vol. 8, no. 2-3, pp. 85–283, 2012.
DOI: 10.1561/0600000058.
1 A Short Introduction to Parsimony
In its most general definition, the principle of sparsity, or parsimony,
consists of representing some phenomenon with as few variables as
possible. It appears to be central to many research fields and is often
considered to be inspired by an early doctrine formulated by the
philosopher and theologian William of Ockham in the 14th century,
which essentially favors simple theories over more complex ones. Of
course, the link between Ockham and the tools presented in this mono-
graph is rather thin, and more modern views appeared later, at the
beginning of the 20th century. Discussing the scientific method,
Wrinch and Jeffreys [1921] indeed introduce a simplicity principle re-
lated to parsimony as follows:
The existence of simple laws is, then, apparently, to be re-
garded as a quality of nature; and accordingly we may infer
that it is justifiable to prefer a simple law to a more complex
one that fits our observations slightly better.
Remarkably, Wrinch and Jeffreys [1921] further discuss statistical mod-
eling of physical observations and relate the concept of “simplicity” to
the number of learning parameters; as a matter of fact, this concept is
relatively close to the contemporary view of parsimony.
Subsequently, numerous tools have been developed by statisticians
to build models of physical phenomena with good predictive power.
Models are usually learned from observed data, and their generaliza-
tion performance is evaluated on test data. Among a collection of plau-
sible models, the simplest one is often preferred, and the number of
underlying parameters is used as a criterion to perform model selec-
tion [Mallows, 1964, 1966, Akaike, 1973, Hocking, 1976, Barron et al.,
1998, Rissanen, 1978, Schwarz, 1978, Tibshirani, 1996].
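For instance, the criteria of Akaike [1973] and Schwarz [1978] cited above penalize the maximized log-likelihood $\ln \hat{L}$ of a candidate model by its number $k$ of free parameters,
$$\mathrm{AIC} \triangleq 2k - 2\ln\hat{L} \qquad \text{and} \qquad \mathrm{BIC} \triangleq k\ln(n) - 2\ln\hat{L},$$
where $n$ is the number of observations; among candidate models, the one minimizing the chosen criterion is selected.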
In signal processing, problems similar to those in statistics arise, but a
different terminology is used. Observations, or data vectors, are called
“signals”, and data modeling appears to be a crucial step for perform-
ing various operations such as restoration, compression, or for solving
inverse problems. Here also, the sparsity principle plays an important
role and has been successful [Mallat and Zhang, 1993, Pati et al., 1993,
Donoho and Johnstone, 1994, Cotter et al., 1999, Chen et al., 1999,
Donoho, 2006, Candès et al., 2006]. Each signal is approximated by
a sparse linear combination of prototypes called dictionary elements,
resulting in simple and compact models.
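Concretely, a signal x in R^m is approximated as x ≈ Dα, where D = [d_1, . . . , d_p] is the dictionary and α is a sparse coefficient vector; a typical instance of such a decomposition, discussed in Section 1.3, is the ℓ1-penalized least-squares problem
$$\min_{\boldsymbol{\alpha} \in \mathbb{R}^p} \; \frac{1}{2}\|\mathbf{x} - \mathbf{D}\boldsymbol{\alpha}\|_2^2 + \lambda \|\boldsymbol{\alpha}\|_1,$$
where the parameter λ controls the trade-off between approximation quality and sparsity of α.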
However, statistics and signal processing remain two distinct fields
with different objectives and methodology; specifically, signals of-
ten come from the same data source, e.g., natural images, whereas
problems considered in statistics are unrelated to each other in gen-
eral. As a result, a long series of works has been devoted to finding ap-
propriate dictionaries for signal classes of interest, leading to vari-
ous sorts of wavelets [Freeman and Adelson, 1991, Simoncelli et al.,
1992, Donoho, 1999, Candès and Donoho, 2002, Do and Vetterli, 2005,
Le Pennec and Mallat, 2005, Mallat, 2008]. Even though statistics and
signal processing have devised most of the methodology of sparse
modeling, the parsimony principle was also discovered independently
in other fields. To some extent, it indeed appears in the work of
Markowitz [1952] about portfolio selection in finance, and also in
geophysics [Claerbout and Muir, 1973, Taylor et al., 1979].
In neuroscience, Olshausen and Field [1996, 1997] proposed an
approach to sparse modeling significantly different from previously
established practices. Whereas classical techniques in signal
processing used fixed off-the-shelf dictionaries, the method
of Olshausen and Field [1996, 1997] consists of learning the dictionary
from training data. In a pioneering exploratory experiment, they demonstrated that
dictionary learning could easily discover underlying structures in nat-
ural image patches; later, their approach found numerous applica-
tions in many fields, notably in image and audio processing [Lewicki,
2002, Elad and Aharon, 2006, Mairal et al., 2009, Yang et al., 2010a]
and computer vision [Raina et al., 2007, Yang et al., 2009, Zeiler et al.,
2011, Mairal et al., 2012, Song et al., 2012, Castrodad and Sapiro,
2012, Elhamifar et al., 2012, Pokrass et al., 2013].
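To make the idea concrete, the following minimal NumPy sketch alternates between sparse coding (here, a plain ISTA loop for the ℓ1-penalized problem) and a least-squares dictionary update with renormalized atoms. It only illustrates the alternating principle detailed in Sections 1.4 and 5.5; it is not the algorithm of Olshausen and Field [1996, 1997], and the function names, iteration counts, and toy data are our own.

import numpy as np

def soft_threshold(v, t):
    # Entry-wise soft-thresholding: the proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(X, D, lam, n_iter=100):
    # ISTA for min_A 0.5 * ||X - D A||_F^2 + lam * ||A||_1 (one lasso per column of X).
    L = np.linalg.norm(D, ord=2) ** 2      # Lipschitz constant of the quadratic term's gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)
        A = soft_threshold(A - grad / L, lam / L)
    return A

def dictionary_learning(X, p, lam, n_epochs=20, seed=0):
    # Alternate sparse coding and dictionary updates; atoms are kept on the unit sphere.
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], p))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_epochs):
        A = sparse_code(X, D, lam)          # fix D, update the sparse codes
        D = X @ np.linalg.pinv(A)           # fix A, least-squares update of D
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D, A

# Toy usage on synthetic data standing in for vectorized image patches.
X = np.random.default_rng(1).standard_normal((64, 1000))
D, A = dictionary_learning(X, p=100, lam=0.2)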
The goal of this monograph is to present basic tools of sparse mod-
eling and their applications to visual recognition and image processing.
We aim to offer a self-contained view combining multi-disciplinary
methodology, practical advice, and a large review of the literature. Most
of the figures in the paper are produced with the software SPAMS (available at http://spams-devel.gforge.inria.fr/), and
the corresponding Matlab code will be provided on the first author’s
webpage.
The monograph is organized as follows: the current introductory
section is divided into several parts providing a simple historical view
of sparse estimation. In Section 1.1, we start with early concepts of
parsimony in statistics and information theory from the 1970s and 1980s.
We present in Section 1.2 the use of sparse estimation within the
wavelet framework, which was essentially developed in the 1990s. Section 1.3
introduces the era of “modern parsimony”—that is, the ℓ1-norm and
its variants, which have been heavily used during the last two decades.
Section 1.4 is devoted to the dictionary learning formulation originally
introduced by Olshausen and Field [1996, 1997], which is a key com-
ponent of most applications presented later in this monograph. In Sec-
tions 1.5 and 1.6, we conclude our introductory tour with some theo-
retical aspects, such as the concepts of “compressed sensing” and sparse
recovery, which have attracted a lot of attention in recent years.
With all these parsimonious tools in hand, we discuss the use of
sparse coding and related sparse matrix factorization techniques for
discovering the underlying structure of natural image patches in Sec-
tion 2. Even though the task here is subjective and exploratory, it is
the first successful instance of dictionary learning; the insight gained
from these early experiments forms the basis of concrete applications
presented in subsequent sections.
Section 3 covers numerous applications of sparse models of natural
image patches in image processing, such as image denoising, super-
resolution, inpainting, or demosaicking. This section is concluded with
other related patch-modeling approaches.
Section 4 presents recent successes of sparse models for visual recogni-
tion, such as codebook learning of visual descriptors, face recognition,
or more low-level tasks such as edge detection and classification of tex-
tures and digits. We conclude the section with other computer vision
applications such as visual tracking and data visualization.
Section 5 is devoted to optimization algorithms. It presents in a
concise way efficient algorithms for solving sparse decomposition and
dictionary learning problems.
We see our monograph as a good complement to other books and
monographs about sparse estimation, which offer different perspec-
tives, such as Mallat [2008], Elad [2010] in signal and image processing,
or Bach et al. [2012a] in optimization and machine learning. We also
put the emphasis on the structure of natural image patches learned with
dictionary learning, and thus present an alternative view to the book
of Hyvärinen et al. [2009], which is focused on independent component
analysis.
Notation. In this monograph, vectors are denoted by bold lower-case
letters and matrices by upper-case ones. For instance, we consider in
the rest of this paragraph a vector x in R^n and a matrix X in R^{m×n}.
The columns of X are represented by indexed vectors x1, . . . , xn such
that we can write X = [x1, . . . , xn]. The i-th entry of x is denoted
by x[i], and the i-th entry of the j-th column of X is represented
by X[i, j]. For any subset g of {1, . . . , n}, we denote by x[g] the vec-
tor in R^{|g|} that records the entries of x corresponding to indices in g.
For q ≥ 1, we define the ℓq-norm of x as $\|\mathbf{x}\|_q \triangleq \big(\sum_{i=1}^n |\mathbf{x}[i]|^q\big)^{1/q}$,
and the ℓ∞-norm as $\|\mathbf{x}\|_\infty \triangleq \lim_{q \to +\infty} \|\mathbf{x}\|_q = \max_{i=1,\ldots,n} |\mathbf{x}[i]|$.