A Detailed Explanation of Several Dense Reconstruction (MVS) Methods.pdf

Multi-View Stereo: A Tutorial

Yasutaka Furukawa
Washington University in St. Louis
furukawa@wustl.edu

Carlos Hernández
Google Inc.
carloshernandez@google.com
Contents

1 Introduction
1.1 Imagery collection
1.2 Camera projection models
1.3 Structure from Motion
1.4 Bundle Adjustment
1.5 Multi-View Stereo
2 Multi-view Photo-consistency
2.1 Photo-consistency measures
2.2 Visibility estimation in state-of-the-art algorithms
3 Algorithms: From Photo-Consistency to 3D Reconstruction
3.1 Depthmap Reconstruction
3.2 Point-cloud Reconstruction
3.3 Volumetric data fusion
3.4 MVS Mesh Refinement
4 Multi-view Stereo and Structure Priors
4.1 Departure from Depthmap to Planemap
4.2 Departure from Planes to Geometric Primitives
4.3 Image Classification for Structure Priors
5 Software, Best Practices, and Successful Applications
5.1 Software
5.2 Best practices for Image Acquisition
5.3 Successful Applications
6 Limitations and Future Directions
6.1 Limitations
6.2 Open Problems
6.3 Conclusions
Acknowledgements
References
Abstract

This tutorial presents a hands-on view of the field of multi-view stereo, with a focus on practical algorithms. Multi-view stereo algorithms can construct highly detailed 3D models from images alone. They take a possibly very large set of images and construct a plausible 3D geometry that explains the images under some reasonable assumptions, the most important being scene rigidity. The tutorial frames the multi-view stereo problem as an image/geometry consistency optimization problem. It describes in detail its two main ingredients: robust implementations of photometric consistency measures, and efficient optimization algorithms. It then presents how these main ingredients are used by some of the most successful algorithms, applied in real applications, and deployed as products in industry. Finally, it describes more advanced approaches that exploit domain-specific knowledge, such as structural priors, and gives an overview of the remaining challenges and future research directions.
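The abstract names the two ingredients of MVS: a photometric consistency measure, and an optimization that searches for the geometry most consistent with the images. The sketch below shows the smallest possible instance of both, under stated assumptions: two rectified grayscale scanlines (so the correspondence search is 1D, over disparity), normalized cross-correlation (NCC) as the consistency measure, and an exhaustive winner-take-all search as the optimizer. These are common textbook choices, not necessarily the exact measures or optimizers this tutorial develops; the function names and the data are hypothetical.

```python
import numpy as np


def ncc(patch_a, patch_b, eps=1e-8):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1].

    Subtracting the mean and dividing by the patch energy makes the score
    invariant to a positive gain and an additive bias in intensity."""
    a = np.asarray(patch_a, dtype=float).ravel()
    b = np.asarray(patch_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    if denom < eps:
        return 0.0  # textureless patch: no signal to match, score is uninformative
    return float(np.dot(a, b) / denom)


def best_disparity(left, right, x, half_window, max_disparity):
    """Winner-take-all search (hypothetical helper): return the disparity d in
    [0, max_disparity] whose window around right[x - d] is most photo-consistent
    (highest NCC) with the window around left[x], plus that score."""
    ref = left[x - half_window : x + half_window + 1]
    best_d, best_score = None, -np.inf
    for d in range(max_disparity + 1):
        xr = x - d
        if xr - half_window < 0:
            break  # candidate window would fall off the scanline
        score = ncc(ref, right[xr - half_window : xr + half_window + 1])
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


# Hypothetical rectified scanlines: `left` sees the same pattern shifted by
# 3 pixels, with a different exposure (gain 2, bias 10).
right = np.array([0, 5, 1, 8, 3, 9, 2, 7, 4, 6, 1, 0, 5, 3, 8, 2], dtype=float)
left = 2.0 * np.roll(right, 3) + 10.0

d, score = best_disparity(left, right, x=10, half_window=2, max_disparity=5)
print(d, score)  # disparity 3, score within floating-point error of 1.0
```

Because NCC removes the mean and normalizes by patch energy, the brighter, higher-contrast left scanline still scores about 1.0 at the true disparity, whereas a raw intensity difference would not be zero there. Practical MVS systems replace the brute-force loop with far more efficient optimization.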
1 Introduction

Reconstructing 3D geometry from photographs is a classic Computer Vision problem that has occupied researchers for more than 30 years. Its applications range from 3D mapping and navigation to online shopping, 3D printing, computational photography, video games, and cultural heritage archival. Only recently, however, have these techniques matured enough to leave the controlled laboratory environment for the wild and to provide industrial-scale robustness, accuracy, and scalability.

Modeling the 3D geometry of real objects or scenes is a challenging task, and a variety of tools and approaches have been applied to it, such as Computer Aided Design (CAD) tools [3], arm-mounted probes, active methods [110, 131, 11, 10], and passive image-based methods [162, 165, 176]. Among these, passive image-based methods, the subject of this tutorial, provide a fast way of capturing accurate 3D content at a fraction of the cost of the other approaches. The steady increase in image resolution and quality has turned digital cameras into cheap, reliable, high-resolution sensors that can generate 3D content of outstanding quality.

The goal of an image-based 3D reconstruction algorithm can be described as "given a set of photographs of an object or a scene, estimate the most likely 3D shape that explains those photographs, under the assumptions of known materials, viewpoints, and lighting conditions" (see Figure 1.1).

Figure 1.1: Image-based 3D reconstruction. Given a set of photographs (left), the goal of image-based 3D reconstruction algorithms is to estimate the most likely 3D shape that explains those photographs (right).

The definition highlights the difficulty of the task, namely the assumption that materials, viewpoints, and lighting are known. If these are not known, the problem is generally ill-posed, since multiple combinations of geometry, materials, viewpoints, and lighting can produce exactly the same photographs. As a result, without further assumptions, no single algorithm can correctly reconstruct the 3D geometry from photographs alone. However, under a set of reasonable extra assumptions, e.g., rigid, textured, Lambertian surfaces, state-of-the-art techniques can produce highly detailed reconstructions, even from millions of photographs.

There exist many cues that can be used to extract geometry from photographs: texture, defocus, shading, contours, and stereo correspondence. The latter three have been very successful, with stereo correspondence being the most successful in terms of robustness and number of applications. Multi-view stereo (MVS) is the general term for a group of techniques that use stereo correspondence as their main cue and use more than two images [165, 176].

All the MVS algorithms described in the following chapters assume the same input: a set of images and their corresponding camera parameters. This chapter gives an overview of an MVS pipeline starting from photographs alone. Its important take-home message is simple: an MVS algorithm is only as good as the quality of its input images and camera parameters. Moreover, a large part of the recent success of MVS is due to the success of the underlying Structure from Motion (SfM) algorithms that compute the camera parameters.

Figure 1.2 provides a sketch of a generic MVS pipeline. Different applications may use different implementations of each of the main blocks, but the overall approach is always similar:
• Collect images.
• Compute camera parameters for each image.
• Reconstruct the 3D geometry of the scene from the set of images and corresponding camera parameters.
• Optionally, reconstruct the materials of the scene.

Figure 1.2: Example of a multi-view stereo pipeline. Clockwise: input imagery, posed imagery, reconstructed 3D geometry, textured 3D geometry.
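The third step of the pipeline turns posed images into geometry: once the same scene point has been matched in two images whose camera parameters are known, its 3D position follows by triangulation. A minimal sketch, assuming an ideal pinhole camera written as a 3x4 matrix P = K[R | t] with no lens distortion, and using linear (DLT) triangulation, a standard textbook method rather than a specific algorithm from this tutorial. The intrinsics K, the camera placement, and the function names are hypothetical.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (shape (3,)) with a 3x4 camera matrix P to pixels."""
    x = P @ np.append(X, 1.0)  # homogeneous image point
    return x[:2] / x[2]


def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point observed at pixels x1 and x2.

    From x ~ P X, each view gives two linear equations in the homogeneous
    point X; the solution is the right singular vector of A associated with
    the smallest singular value."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    Xh = vt[-1]
    return Xh[:3] / Xh[3]  # back from homogeneous coordinates


# Hypothetical intrinsics and two cameras whose centers are one unit apart along x.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 5.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(np.round(X_est, 6))  # recovers X_true for noise-free pixels
```

With noise-free pixels the estimate matches X_true up to floating-point error; with real, noisy matches the linear estimate is only approximate, and MVS systems combine evidence from many views rather than two.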
In this chapter, we give more insight into the first three main stages of MVS: imagery collection, camera parameter estimation, and 3D geometry reconstruction. Chapter 2 develops the notion of photo-consistency as the main signal optimized by MVS algorithms. Chapter 3 presents and compares some of the most successful MVS algorithms. Chapter 4 discusses the use of domain knowledge, in particular structural priors, to improve reconstruction quality. Chapter 5 gives an overview of successful applications, available software, and best practices. Finally, Chapter 6 describes some of the current limitations of MVS as well as research directions to address them.

1.1 Imagery collection

MVS capture setups can be roughly classified into three categories (see Figure 1.3):
• Laboratory setting,
• Outdoor small-scale scene capture,
• Large-scale scene capture using fleets or crowd-sourcing, e.g., cars, planes, drones, and the Internet.

Figure 1.3: Different MVS capture setups. From left to right: a controlled MVS capture using diffuse lights and a turntable, outdoor capture of small-scale scenes, and crowd-sourcing from online photo-sharing websites.

MVS algorithms first started in a laboratory setting [184, 147, 58], where the light conditions could be easily controlled and the camera