Preface
Contents
Introduction
What is computer vision?
A brief history
Book overview
Sample syllabus
A note on notation
Additional reading
Image formation
Geometric primitives and transformations
Geometric primitives
2D transformations
3D transformations
3D rotations
3D to 2D projections
Lens distortions
Photometric image formation
Lighting
Reflectance and shading
Optics
The digital camera
Sampling and aliasing
Color
Compression
Additional reading
Exercises
Image processing
Point operators
Pixel transforms
Color transforms
Compositing and matting
Histogram equalization
Application: Tonal adjustment
Linear filtering
Separable filtering
Examples of linear filtering
Band-pass and steerable filters
More neighborhood operators
Non-linear filtering
Morphology
Distance transforms
Connected components
Fourier transforms
Fourier transform pairs
Two-dimensional Fourier transforms
Wiener filtering
Application: Sharpening, blur, and noise removal
Pyramids and wavelets
Interpolation
Decimation
Multi-resolution representations
Wavelets
Application: Image blending
Geometric transformations
Parametric transformations
Mesh-based warping
Application: Feature-based morphing
Global optimization
Regularization
Markov random fields
Application: Image restoration
Additional reading
Exercises
Feature detection and matching
Points and patches
Feature detectors
Feature descriptors
Feature matching
Feature tracking
Application: Performance-driven animation
Edges
Edge detection
Edge linking
Application: Edge editing and enhancement
Lines
Successive approximation
Hough transforms
Vanishing points
Application: Rectangle detection
Additional reading
Exercises
Segmentation
Active contours
Snakes
Dynamic snakes and CONDENSATION
Scissors
Level sets
Application: Contour tracking and rotoscoping
Split and merge
Watershed
Region splitting (divisive clustering)
Region merging (agglomerative clustering)
Graph-based segmentation
Probabilistic aggregation
Mean shift and mode finding
K-means and mixtures of Gaussians
Mean shift
Normalized cuts
Graph cuts and energy-based methods
Application: Medical image segmentation
Additional reading
Exercises
Feature-based alignment
2D and 3D feature-based alignment
2D alignment using least squares
Application: Panography
Iterative algorithms
Robust least squares and RANSAC
3D alignment
Pose estimation
Linear algorithms
Iterative algorithms
Application: Augmented reality
Geometric intrinsic calibration
Calibration patterns
Vanishing points
Application: Single view metrology
Rotational motion
Radial distortion
Additional reading
Exercises
Structure from motion
Triangulation
Two-frame structure from motion
Projective (uncalibrated) reconstruction
Self-calibration
Application: View morphing
Factorization
Perspective and projective factorization
Application: Sparse 3D model extraction
Bundle adjustment
Exploiting sparsity
Application: Match move and augmented reality
Uncertainty and ambiguities
Application: Reconstruction from Internet photos
Constrained structure and motion
Line-based techniques
Plane-based techniques
Additional reading
Exercises
Dense motion estimation
Translational alignment
Hierarchical motion estimation
Fourier-based alignment
Incremental refinement
Parametric motion
Application: Video stabilization
Learned motion models
Spline-based motion
Application: Medical image registration
Optical flow
Multi-frame motion estimation
Application: Video denoising
Application: De-interlacing
Layered motion
Application: Frame interpolation
Transparent layers and reflections
Additional reading
Exercises
Image stitching
Motion models
Planar perspective motion
Application: Whiteboard and document scanning
Rotational panoramas
Gap closing
Application: Video summarization and compression
Cylindrical and spherical coordinates
Global alignment
Bundle adjustment
Parallax removal
Recognizing panoramas
Direct vs. feature-based alignment
Compositing
Choosing a compositing surface
Pixel selection and weighting (de-ghosting)
Application: Photomontage
Blending
Additional reading
Exercises
Computational photography
Photometric calibration
Radiometric response function
Noise level estimation
Vignetting
Optical blur (spatial response) estimation
High dynamic range imaging
Tone mapping
Application: Flash photography
Super-resolution and blur removal
Color image demosaicing
Application: Colorization
Image matting and compositing
Blue screen matting
Natural image matting
Optimization-based matting
Smoke, shadow, and flash matting
Video matting
Texture analysis and synthesis
Application: Hole filling and inpainting
Application: Non-photorealistic rendering
Additional reading
Exercises
Stereo correspondence
Epipolar geometry
Rectification
Plane sweep
Sparse correspondence
3D curves and profiles
Dense correspondence
Similarity measures
Local methods
Sub-pixel estimation and uncertainty
Application: Stereo-based head tracking
Global optimization
Dynamic programming
Segmentation-based techniques
Application: Z-keying and background replacement
Multi-view stereo
Volumetric and 3D surface reconstruction
Shape from silhouettes
Additional reading
Exercises
3D reconstruction
Shape from X
Shape from shading and photometric stereo
Shape from texture
Shape from focus
Active rangefinding
Range data merging
Application: Digital heritage
Surface representations
Surface interpolation
Surface simplification
Geometry images
Point-based representations
Volumetric representations
Implicit surfaces and level sets
Model-based reconstruction
Architecture
Heads and faces
Application: Facial animation
Whole body modeling and tracking
Recovering texture maps and albedos
Estimating BRDFs
Application: 3D photography
Additional reading
Exercises
Image-based rendering
View interpolation
View-dependent texture maps
Application: Photo Tourism
Layered depth images
Impostors, sprites, and layers
Light fields and Lumigraphs
Unstructured Lumigraph
Surface light fields
Application: Concentric mosaics
Environment mattes
Higher-dimensional light fields
The modeling to rendering continuum
Video-based rendering
Video-based animation
Video textures
Application: Animating pictures
3D Video
Application: Video-based walkthroughs
Additional reading
Exercises
Recognition
Object detection
Face detection
Pedestrian detection
Face recognition
Eigenfaces
Active appearance and 3D shape models
Application: Personal photo collections
Instance recognition
Geometric alignment
Large databases
Application: Location recognition
Category recognition
Bag of words
Part-based models
Recognition with segmentation
Application: Intelligent photo editing
Context and scene understanding
Learning and large image collections
Application: Image search
Recognition databases and test sets
Additional reading
Exercises
Conclusion
Linear algebra and numerical techniques
Matrix decompositions
Singular value decomposition
Eigenvalue decomposition
QR factorization
Cholesky factorization
Linear least squares
Total least squares
Non-linear least squares
Direct sparse matrix techniques
Variable reordering
Iterative techniques
Conjugate gradient
Preconditioning
Multigrid
Bayesian modeling and inference
Estimation theory
Likelihood for multivariate Gaussian noise
Maximum likelihood estimation and least squares
Robust statistics
Prior models and Bayesian inference
Markov random fields
Gradient descent and simulated annealing
Dynamic programming
Belief propagation
Graph cuts
Linear programming
Uncertainty estimation (error analysis)
Supplementary material
Data sets
Software
Slides and lectures
Bibliography
References
Index
Computer Vision: Algorithms and Applications
Richard Szeliski
September 3, 2010 draft
© 2010 Springer

This electronic draft is for non-commercial personal use only, and may not be posted or re-distributed in any form. Please refer interested readers to the book's Web site at http://szeliski.org/Book/.
This book is dedicated to my parents, Zdzisław and Jadwiga, and my family, Lyn, Anne, and Stephen.
1 Introduction (page 1)
What is computer vision? • A brief history • Book overview • Sample syllabus • Notation

2 Image formation (page 29)
Geometric primitives and transformations • Photometric image formation • The digital camera

3 Image processing (page 99)
Point operators • Linear filtering • More neighborhood operators • Fourier transforms • Pyramids and wavelets • Geometric transformations • Global optimization

4 Feature detection and matching (page 205)
Points and patches • Edges • Lines

5 Segmentation (page 267)
Active contours • Split and merge • Mean shift and mode finding • Normalized cuts • Graph cuts and energy-based methods

6 Feature-based alignment (page 309)
2D and 3D feature-based alignment • Pose estimation • Geometric intrinsic calibration

7 Structure from motion (page 343)
Triangulation • Two-frame structure from motion • Factorization • Bundle adjustment • Constrained structure and motion

8 Dense motion estimation (page 381)
Translational alignment • Parametric motion • Spline-based motion • Optical flow • Layered motion

9 Image stitching (page 427)
Motion models • Global alignment • Compositing

10 Computational photography (page 467)
Photometric calibration • High dynamic range imaging • Super-resolution and blur removal • Image matting and compositing • Texture analysis and synthesis

11 Stereo correspondence (page 533)
Epipolar geometry • Sparse correspondence • Dense correspondence • Local methods • Global optimization • Multi-view stereo

12 3D reconstruction (page 577)
Shape from X • Active rangefinding • Surface representations • Point-based representations • Volumetric representations • Model-based reconstruction • Recovering texture maps and albedos

13 Image-based rendering (page 619)
View interpolation • Layered depth images • Light fields and Lumigraphs • Environment mattes • Video-based rendering

14 Recognition (page 655)
Object detection • Face recognition • Instance recognition • Category recognition • Context and scene understanding • Recognition databases and test sets
Preface

The seeds for this book were first planted in 2001 when Steve Seitz at the University of Washington invited me to co-teach a course called "Computer Vision for Computer Graphics". At that time, computer vision techniques were increasingly being used in computer graphics to create image-based models of real-world objects, to create visual effects, and to merge real-world imagery using computational photography techniques. Our decision to focus on the applications of computer vision to fun problems such as image stitching and photo-based 3D modeling from personal photos seemed to resonate well with our students.

Since that time, a similar syllabus and project-oriented course structure has been used to teach general computer vision courses both at the University of Washington and at Stanford. (The latter was a course I co-taught with David Fleet in 2003.) Similar curricula have been adopted at a number of other universities and also incorporated into more specialized courses on computational photography. (For ideas on how to use this book in your own course, please see Table 1.1 in Section 1.4.)

This book also reflects my 20 years' experience doing computer vision research in corporate research labs, mostly at Digital Equipment Corporation's Cambridge Research Lab and at Microsoft Research. In pursuing my work, I have mostly focused on problems and solution techniques (algorithms) that have practical real-world applications and that work well in practice. Thus, this book has more emphasis on basic techniques that work under real-world conditions and less on more esoteric mathematics that has intrinsic elegance but less practical applicability.

This book is suitable for teaching a senior-level undergraduate course in computer vision to students in both computer science and electrical engineering. I prefer students to have either an image processing or a computer graphics course as a prerequisite so that they can spend less time learning general background mathematics and more time studying computer vision techniques. The book is also suitable for teaching graduate-level courses in computer vision (by delving into the more demanding application and algorithmic areas) and as a general reference to fundamental techniques and the recent research literature. To this end, I have attempted wherever possible to at least cite the newest research in each sub-field, even if the technical details are too complex to cover in the book itself.

In teaching our courses, we have found it useful for the students to attempt a number of small implementation projects, which often build on one another, in order to get them used to working with real-world images and the challenges that these present. The students are then asked to choose an individual topic for each of their small-group final projects. (Sometimes these projects even turn into conference papers!) The exercises at the end of each chapter contain numerous suggestions for smaller mid-term projects, as well as more open-ended problems whose solutions are still active research topics. Wherever possible, I encourage students to try their algorithms on their own personal photographs, since this better motivates them, often leads to creative variants on the problems, and better acquaints them with the variety and complexity of real-world imagery.

In formulating and solving computer vision problems, I have often found it useful to draw inspiration from three high-level approaches:

• Scientific: build detailed models of the image formation process and develop mathematical techniques to invert these in order to recover the quantities of interest (where necessary, making simplifying assumptions to make the mathematics more tractable).

• Statistical: use probabilistic models to quantify the prior likelihood of your unknowns and the noisy measurement processes that produce the input images, then infer the best possible estimates of your desired quantities and analyze their resulting uncertainties. The inference algorithms used are often closely related to the optimization techniques used to invert the (scientific) image formation processes.

• Engineering: develop techniques that are simple to describe and implement but that are also known to work well in practice. Test these techniques to understand their limitations and failure modes, as well as their expected computational costs (run-time performance).

These three approaches build on each other and are used throughout the book.

My personal research and development philosophy (and hence the exercises in the book) has a strong emphasis on testing algorithms. It's too easy in computer vision to develop an algorithm that does something plausible on a few images rather than something correct. The best way to validate your algorithms is to use a three-part strategy. First, test your algorithm on clean synthetic data, for which the exact results are known. Second, add noise to the data and evaluate how the performance degrades as a function of noise level. Finally, test the algorithm on real-world data, preferably drawn from a wide variety of sources, such as photos found on the Web. Only then can you truly know if your algorithm can deal with real-world complexity, i.e., images that do not fit some simplified model or assumptions.
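The three-part testing strategy above lends itself to a small evaluation harness. The Python/NumPy sketch below is one possible illustration of that strategy, not code from the book: it uses a toy least-squares line-fitting estimator as a stand-in for an actual vision algorithm, and the function names, noise levels, and trial counts are assumptions made up for this example.

```python
# A minimal sketch of the three-part validation strategy, assuming a toy
# line-fitting estimator stands in for the algorithm under test.
import numpy as np

def fit_line(points):
    """Fit y = a*x + b to an (N, 2) array of points by linear least squares."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

def synthetic_points(a, b, n=100, noise=0.0, rng=None):
    """Generate points on y = a*x + b, optionally perturbed by Gaussian noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.linspace(0.0, 10.0, n)
    y = a * x + b + rng.normal(0.0, noise, size=n)
    return np.column_stack([x, y])

# 1. Clean synthetic data: the exact answer is known, so the estimate
#    should agree with it to numerical precision.
a_true, b_true = 2.0, -1.0
a_est, b_est = fit_line(synthetic_points(a_true, b_true))
assert abs(a_est - a_true) < 1e-9 and abs(b_est - b_true) < 1e-9

# 2. Add noise and measure how the error grows with the noise level.
rng = np.random.default_rng(1)
for sigma in (0.01, 0.1, 0.5, 1.0, 2.0):
    errors = [abs(fit_line(synthetic_points(a_true, b_true, noise=sigma, rng=rng))[0] - a_true)
              for _ in range(50)]  # average over repeated trials
    print(f"noise sigma={sigma:4.2f}  mean slope error={np.mean(errors):.4f}")

# 3. Real-world data: run the same estimator on measurements taken from your
#    own photographs (e.g., points sampled along a straight building edge).
#    There is no ground truth here, so what you are checking is plausibility,
#    behavior across varied images, and the algorithm's failure modes.
```

The same skeleton carries over when the toy estimator is replaced by, say, a feature matcher or an optical flow method; only the synthetic data generator and the error metric need to change.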