Foreword
Preface
Organization
Contents – Part IV
Poster Session 4 (Continued)
Generating Visual Explanations
1 Introduction
2 Related Work
3 Visual Explanation Model
3.1 Relevance Loss
3.2 Discriminative Loss
4 Experimental Setup
5 Results
5.1 Quantitative Results
5.2 Qualitative Results
6 Conclusion
References
Marker-Less 3D Human Motion Capture with Monocular Image Sequence and Height-Maps
1 Introduction
2 Literature Review
3 Proposed Method
3.1 Overview
3.2 Height-Map Generation
3.3 2D Joints Localization
3.4 3D Motion Estimation
4 Experiments
4.1 Datasets
4.2 Evaluation of 2D Joints Localization
4.3 Evaluation of 3D Motion Recovery with Ground-Truth 2D Joints
4.4 Evaluation of 3D Motion Recovery with Predicted 2D Joints
5 Conclusion
References
Tensor Representations via Kernel Linearization for Action Recognition from 3D Skeletons
1 Introduction
2 Related Work
3 Preliminaries
3.1 Tensor Notations
3.2 Kernel Linearization
4 Proposed Approach
4.1 Problem Formulation
4.2 Sequence Compatibility Kernel
4.3 Dynamics Compatibility Kernel
5 Computational Complexity
6 Experiments
6.1 Datasets
6.2 Experimental Setup
7 Conclusions
References
Manhattan-World Urban Reconstruction from Point Clouds
1 Introduction
2 Related Work
3 Overview
4 Candidate Box Generation
4.1 Plane Extraction
4.2 Candidate Boxes
5 Box Selection
5.1 Objectives
5.2 Optimization
6 Results and Discussion
7 Conclusions
References
From Multiview Image Curves to 3D Drawings
1 Introduction
2 Enhanced 3D Curve Sketch
3 From 3D Curve Sketch to 3D Drawing
4 Experiments and Evaluation
5 Conclusion
References
Shape from Selfies: Human Body Shape Estimation Using CCA Regression Forests
1 Introduction
2 Related Work
3 Shape Estimation Algorithm
3.1 Method Overview
3.2 Shape as a Geometric Model
3.3 Feature Extraction
3.4 View Direction Classification
3.5 Learning Shape Parameters
4 Validation and Results
5 Discussion and Conclusions
References
Can We Jointly Register and Reconstruct Creased Surfaces by Shape-from-Template Accurately?
1 Introduction
2 Background
2.1 Deformation Models and Priors in SfT
2.2 Data Constraints in SfT
2.3 Modelling Creases in Other Problem Domains and Previous Attempts in SfT
2.4 Contributions
3 Problem Formulation
3.1 Template Definition
3.2 Global Cost Function
4 Optimization
4.1 Overview
4.2 Improving Convergence
5 Experimental Results
5.1 Ground Truth Acquisition
5.2 Implementation Details and Evaluation Metrics
5.3 Results
6 Conclusion
References
Distractor-Supported Single Target Tracking in Extremely Cluttered Scenes
1 Introduction
2 Related Work
3 Proposed Distractor-Supported Single-Target Tracking Method
3.1 Robust Estimation with Coarse-to-fine Multi-level Clustering
3.2 Global Dynamic Constraint in a Feedback Loop
4 Experimental Results
4.1 Experiment on Highly Cluttered Dataset
4.2 Experiment on Non-cluttered Dataset
5 Conclusions
References
Connectionist Temporal Modeling for Weakly Supervised Action Labeling
1 Introduction
2 Related Work
3 Ordering Constrained Video Action Labeling
3.1 Extended Connectionist Temporal Classification
3.2 ECTC Forward-Backward Algorithm
4 Extension to Frame-Level Semi-supervised Learning
5 Experiments
5.1 Implementation Details
5.2 Evaluating Complex Activity Segmentation
5.3 Evaluating Action Detection
6 Conclusions
References
Deep Joint Image Filtering
1 Introduction
2 Related Work
3 Learning Deep Joint Image Filters
3.1 Network Architecture Design
3.2 Relationship to Prior Work
4 Experimental Results
4.1 Depth Map Upsampling
4.2 Joint Image Upsampling
4.3 Structure-Texture Separation
4.4 Cross-Modality Filtering for Noise Reduction
5 Discussions
6 Conclusions
References
Efficient Multi-frequency Phase Unwrapping Using Kernel Density Estimation
1 Introduction
1.1 Related Work
1.2 Structure
2 Depth Decoding
2.1 Phase Unwrapping
2.2 CRT Based Unwrapping
2.3 Phase Fusion
3 Kernel Density Based Unwrapping
3.1 Unwrapping Likelihood
3.2 Multiple Hypotheses
3.3 Phase Likelihood
3.4 Hypothesis Selection
3.5 Spatial Selection Versus Smoothing
4 Experiments
4.1 Implementation
4.2 Ground Truth for Unwrapping
4.3 Datasets
4.4 Comparison of Noise Propagation Models
4.5 Outlier Rejection
4.6 Parameter Settings
4.7 Coverage Experiments
4.8 Kinect Fusion
5 Concluding Remarks
References
A Multi-scale CNN for Affordance Segmentation in RGB Images
1 Introduction
2 Prior Work
3 Generation of Affordance Ground Truth
4 Affordance Segmentation with a Multi-scale CNN
5 Training
6 Results
7 Conclusion
References
Hierarchical Dynamic Parsing and Encoding for Action Recognition
1 Introduction
2 Related Work
3 Hierarchical Dynamic Parsing and Encoding
3.1 Unsupervised Temporal Clustering
3.2 The First Layer Modeling
3.3 The Second Layer Modeling
4 Experiments
4.1 Datasets
4.2 Experimental Setup
4.3 Influence of Parameters
4.4 Comparison of Pooling in the First Layer
4.5 Comparison with State-of-the-Art
5 Conclusions
References
Distinct Class-Specific Saliency Maps for Weakly Supervised Semantic Segmentation
1 Introduction
2 Related Work
2.1 CNN-Based Fully-Supervised Semantic Segmentation
2.2 CNN-Based Weakly-Supervised Segmentation
2.3 Gradient-Based Region Estimation with Back-Propagation
3 Methods
3.1 Overview
3.2 Training CNN
3.3 Class Saliency Maps
3.4 Fully Connected CRF
4 Experiments
4.1 Dataset
4.2 Experimental Setup
4.3 Evaluation on Class Saliency Maps
4.4 Effects of Parameter Choices
4.5 Comparison with Other Methods
5 Conclusions
References
A Diagram is Worth a Dozen Images
1 Introduction
2 Background
3 The Language of Diagrams
4 Syntactic Diagram Parsing
5 Semantic Interpretation
6 Dataset
7 Experiments
7.1 Generating Constituent Proposals
7.2 Generating Relationship Proposals
7.3 Syntactic Parsing: DPG Inference
7.4 Diagram Question Answering
8 Conclusion
References
Automatic Attribute Discovery with Neural Activations
1 Introduction
2 Related Work
3 Datasets and Pre-processing
3.1 Etsy Dataset
3.2 Wear Dataset
4 Attribute Discovery
4.1 Divergence of Neural Activations
4.2 Visualness
4.3 Human Perception
4.4 Experimental Results
5 Understanding Perceptual Depth
6 Saliency Detection
7 Conclusion
References
"What Happens If..." Learning to Predict the Effect of Forces in Images
1 Introduction
2 Related Work
3 Problem Statement
4 Forces in Scenes (ForScene) Dataset
5 Model
5.1 Model Architecture
5.2 Training
5.3 Testing
6 Experiments
6.1 Dataset Details
6.2 Force Representation
6.3 Network and Optimization Parameters
6.4 Prediction of Velocity Sequences
6.5 Unseen Categories
7 Conclusion
References
View Synthesis by Appearance Flow
1 Introduction
2 Related Work
3 Approach
3.1 Learning View Synthesis via Appearance Flow
3.2 Learning to Leverage Multiple Input Views
4 Experiments
4.1 Novel View Synthesis for Objects
4.2 Novel View Synthesis for Scenes
5 Discussion
References
Top-Down Learning for Structured Labeling with Convolutional Pseudoprior
1 Introduction
2 Related Work
3 Formulations
4 Experiments
4.1 Sequential Labeling: 1-D Case
4.2 Image Semantic Labeling: 2-D Case
5 Conclusions
References
Generative Image Modeling Using Style and Structure Adversarial Networks
1 Introduction
2 Related Work
3 Background for Generative Adversarial Networks
4 Style and Structure GAN
4.1 Structure-GAN
4.2 Style-GAN
4.3 Multi-task Learning with Pixel-Wise Constraints
4.4 Joint Learning for S^2-GAN
5 Experiments
5.1 Qualitative Results for Image Generation
5.2 Quantitative Results for Image Generation
5.3 Representation Learning for Recognition Tasks
6 Conclusion
References
Joint Learning of Semantic and Latent Attributes
1 Introduction
2 Related Work
3 Methodology
3.1 Formulation
3.2 Optimisation
3.3 Application to Person Re-ID
3.4 Application to User-Defined Attribute Prediction
4 Experiments
4.1 Person Re-ID
4.2 User-Defined Attribute Prediction
4.3 Zero-Shot Learning
4.4 Further Evaluations
5 Conclusions
References
A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection
1 Introduction
2 Related Work
3 Multi-scale Object Proposal Network
3.1 Multi-scale Detection
3.2 Architecture
3.3 Sampling
3.4 Implementation Details
4 Object Detection Network
4.1 CNN Feature Map Approximation
4.2 Context Embedding
4.3 Implementation Details
5 Experimental Evaluation
5.1 Proposal Evaluation
5.2 Object Detection Evaluation
6 Conclusions
References
Deep Specialized Network for Illuminant Estimation
1 Introduction
2 Related Work
3 Illuminant Estimation by Convolutional Network
3.1 Hypothesis Network - A Branch-Level Ensemble Network
3.2 Selection Network - A Hypothesis Selection Network
3.3 Local to Global Estimation
4 Experiments
4.1 Global-Illuminant Setting
4.2 Multi-illuminant Setting
5 Conclusion
References
Weakly-Supervised Semantic Segmentation Using Motion Cues
1 Introduction
2 Related Work
3 Learning Semantic Segmentation from Video
3.1 Network Architecture
3.2 Estimating Latent Variables with Label Prediction
3.3 Fine-Tuning M-CNN
4 Results and Evaluation
4.1 Experimental Protocol
4.2 Implementation Details
4.3 Evaluation of M-CNN
4.4 Training on Weakly-Annotated Videos and Images
4.5 Co-localization
5 Summary
References
Human-in-the-Loop Person Re-identification
1 Introduction
2 Human-in-the-Loop Incremental Learning
2.1 Problem Formulation
2.2 Modelling Human Feedback as a Loss Function
2.3 Real-Time Model Update for Instant Feedback Reward
3 Metric Ensemble Learning for Automated Re-id
4 Experiments
4.1 Evaluation on Human-in-the-Loop Person Re-id
4.2 Evaluation on Automated Person Re-Id
5 Conclusions
References
Real-Time Monocular Segmentation and Pose Tracking of Multiple Objects
1 Introduction
1.1 Related Work
1.2 Motivation
1.3 Contributions
2 Method
2.1 Pixel-Wise Posterior Object Segmentation
2.2 Level-Set Pose Embedding
2.3 Iterative Pose Optimization
2.4 Initialization
3 Implementation
3.1 Rendering Engine
3.2 Image Processing
3.3 Occlusion Handling
4 Evaluation
4.1 Performance Analysis
4.2 Experimental Comparison
5 Conclusions
References
Estimation of Human Body Shape in Motion with Wide Clothing
1 Introduction
2 Related Work
3 S-SCAPE Model
4 Estimating Model Parameters for a Motion Sequence
4.1 Prior Model for
4.2 Landmark Energy
4.3 Data Energy
4.4 Clothing Energy
4.5 Optimization Schedule
5 Implementation Details
6 Evaluation
6.1 Dataset
6.2 Evaluation of Posture and Shape Fitting
6.3 Comparative Evaluation
7 Conclusion
References
A Shape-Based Approach for Salient Object Detection Using Deep Learning
1 Introduction
2 Related Work
3 Proposed Method
3.1 Saliency Representation
3.2 Convolutional Neural Networks for Shape Prediction
3.3 Refinement of Saliency Maps Using Hierarchical Segmentations
4 Experimental Results
4.1 Experimental Settings
4.2 Experimental Results
5 Conclusions
References
Fast Optical Flow Using Dense Inverse Search
1 Introduction
1.1 Related Work
1.2 Contributions
2 Proposed Method
2.1 Fast Inverse Search for Correspondences
2.2 Fast Optical Flow with Multi-scale Reasoning
2.3 Fast Variational Refinement
2.4 Extensions
3 Experiments
3.1 Implementation and Parameter Selection
3.2 Evaluation of Inverse Search
3.3 MPI Sintel Optical Flow Results
3.4 KITTI Optical Flow Results
3.5 High Frame-Rate Optical Flow
4 Conclusions
References
Global Registration of 3D Point Sets via LRS Decomposition
1 Introduction
2 Low-Rank and Sparse Decomposition
3 Problem Definition
4 Proposed Approach
5 Experiments
5.1 Simulated Data
5.2 Real Data
6 Conclusions
References
Recognition from Hand Cameras: A Revisit with Deep Learning
1 Introduction
2 Related Work
2.1 Egocentric Recognition
2.2 Hand Detection and Pose Estimation
2.3 Camera for Hands
3 Our System
3.1 Wearable Cues
3.2 Hand Alignment
3.3 Hand States Recognition
3.4 State Change Detection
3.5 Full Model
3.6 Deep Feature
3.7 Object Discovery
3.8 Combining HandCam with HeadCam
4 Dataset
5 Implementation Details
6 Experiment Results
6.1 Free vs. Active Recognition
6.2 Gesture Recognition
6.3 Object Category Recognition
6.4 Combining HandCam with HeadCam
7 Conclusion
References
Learning
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
1 Introduction
2 Related Work
3 Binary Convolutional Neural Network
3.1 Binary-Weight-Networks
3.2 XNOR-Networks
4 Experiments
4.1 Efficiency Analysis
4.2 Image Classification
4.3 Ablation Studies
5 Conclusion
References
Top-Down Neural Attention by Excitation Backprop
1 Introduction
2 Related Work
3 Method
3.1 Top-Down Neural Attention Based on Probabilistic WTA
3.2 Excitation Backprop
3.3 Contrastive Top-Down Attention
4 Experiments
4.1 The Pointing Game
4.2 Localizing Dominant Objects
4.3 Text-to-Region Association
5 Conclusion
References
Learning Recursive Filters for Low-Level Vision via a Hybrid Neural Network
1 Introduction
2 Related Work
3 Recursive Filter via RNNs
3.1 Preliminaries of Recursive Filters
3.2 Recursive Decomposition
3.3 Constructing Recursive Filter via Linear RNNs
4 Learning Spatially Variant Recursive Filters
4.1 Spatially Variant LRNN
4.2 Learning LRNN Weight Maps via CNN
5 Experimental Results
5.1 Edge-Preserving Smoothing
5.2 Image Denoising
5.3 Image Propagation Examples
6 Conclusion
References
Learning Representations for Automatic Colorization
1 Introduction
2 Related Work
3 Method
3.1 Color Spaces
3.2 Loss
3.3 Inference
3.4 Histogram Transfer from Ground-Truth
3.5 Neural Network Architecture and Training
4 Experiments
4.1 Representation Learning
5 Conclusion
References
Poster Session 5
Deep Reconstruction-Classification Networks for Unsupervised Domain Adaptation
1 Introduction
2 Related Work
3 Deep Reconstruction-Classification Networks
4 Experiments and Results
4.1 Experiment I: SVHN, MNIST, USPS, CIFAR, and STL
4.2 Experiment II: Office Dataset
5 Analysis
6 Conclusions
References
Learning Without Forgetting
1 Introduction
2 Related Work
3 Learning Without Forgetting
4 Experiments
4.1 Main Experiments
4.2 Design Choices and Alternatives
5 Discussion
References
Identity Mappings in Deep Residual Networks
1 Introduction
2 Analysis of Deep Residual Networks
3 On the Importance of Identity Skip Connections
3.1 Experiments on Skip Connections
3.2 Discussions
4 On the Usage of Activation Functions
4.1 Experiments on Activation
4.2 Analysis
5 Results
6 Conclusions
References
Deep Networks with Stochastic Depth
1 Introduction
2 Background
3 Deep Networks with Stochastic Depth
4 Results
5 Analytic Experiments
6 Conclusion
References
Less Is More: Towards Compact CNNs
1 Introduction
2 Related Work
3 Sparse Constrained Convolutional Neural Networks
3.1 Training a Sparse Constrained CNN
3.2 Forward-Backward Splitting
4 Sparse Constraints
4.1 Tensor Low Rank Constraints
4.2 Group Sparse Constraints
5 Importance of Rectified Linear Units in Sparse Constrained CNNs
6 Experiments
6.1 LeNet on MNIST
6.2 CIFAR-10 Quick on CIFAR-10
6.3 AlexNet and VGG on ImageNet
7 Conclusion
References
Unsupervised Visual Representation Learning by Graph-Based Consistent Constraints
1 Introduction
2 Related Work
3 Overview
4 Unsupervised Constraint Mining
4.1 Positive Constraint Mining
4.2 Negative Constraint Mining
5 Visual Representation Learning
5.1 Unsupervised Feature Learning
5.2 Semi-supervised Learning
6 Experiments
6.1 Implementation Details
6.2 Datasets and Evaluation Metrics
6.3 Controlled Experiments
6.4 Unsupervised Learning Results
6.5 Semi-supervised Learning Results
7 Conclusions
References
Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation
1 Introduction
2 Related Work
3 Weakly Supervised Segmentation from Image-Level Labels
3.1 The SEC Loss for Weakly Supervised Image Segmentation
3.2 Training
4 Experiments
4.1 Experimental Setup
4.2 Results
4.3 Detailed Discussion
5 Conclusion
References
Patch-Based Low-Rank Matrix Completion for Learning of Shape and Motion Models from Few Training Samples
1 Introduction
1.1 Related Work
2 Methods
2.1 Low-Rank Matrix Completion of Ill-Conditioned Matrices
2.2 Patch-Based Model Generation
2.3 Patch Selection and Domain Partitioning
3 Experiments and Applications
3.1 2D Contour Data of the IMM Face Database
3.2 3D Lung Surfaces of the LIDC Database
3.3 Respiratory Lung Motion
4 Results
5 Discussion and Conclusion
References
Chained Predictions Using Convolutional Neural Networks
1 Introduction
2 Related Work
3 Chain Models for Structured Tasks
3.1 Chain Models for Single Images
3.2 Chain Models for Videos
3.3 Improved Learning with Scheduled Sampling
4 Experimental Evaluation
4.1 Pose Estimation from a Single Image
4.2 Pose Estimation from Videos
5 Conclusions
References
Multi-region Two-Stream R-CNN for Action Detection
1 Introduction
2 Related Work
3 End-to-end Two-Stream Faster R-CNN
4 Multi-region Two-Stream Faster R-CNN
5 Linking and Temporal Localization
6 Experiments
6.1 Datasets and Evaluation Metrics
6.2 Implementation Details
6.3 Evaluation of Multi-region Two-Stream Faster R-CNN
6.4 Comparison to the State of the Art
7 Conclusion
References
Semantic Co-segmentation in Videos
1 Introduction
2 Related Work
3 Proposed Algorithm
3.1 Overview
3.2 Semantic Tracklet Generation
3.3 Semantic Tracklet Co-selection via Submodular Function
4 Experimental Results
4.1 Experimental Settings
4.2 YouTube-Objects Dataset
4.3 MOViCS Dataset
4.4 Safari Dataset
5 Concluding Remarks
References
Attribute2Image: Conditional Image Generation from Visual Attributes
1 Introduction
2 Related Work
3 Attribute-Conditioned Generative Modeling of Images
3.1 Base Model: Conditional Variational Auto-Encoder (CVAE)
3.2 Disentangling CVAE with a Layered Representation
4 Posterior Inference via Optimization
5 Experiments
5.1 Attribute-Conditioned Image Generation
5.2 Attribute-Conditioned Image Reconstruction and Completion
6 Conclusion
References
Modeling Context Between Objects for Referring Expression Understanding
1 Introduction
2 Related Work
3 Modeling Context Between Objects
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Comparison of Different Techniques
4.4 Ablation Experiments
5 Conclusions
References
Friction from Reflectance: Deep Reflectance Codes for Predicting Physical Surface Properties from One-Shot In-Field Reflectance
1 Introduction
2 Related Work
3 One-Shot In-Field Reflectance Disks
4 Deep Reflectance Codes
5 Friction from Reflectance
5.1 Friction-Reflectance Database
5.2 Hashing for Friction Prediction
6 Experimental Results
6.1 Hashing for Material Recognition
6.2 Friction Prediction
7 Conclusions
References
Saliency Detection with Recurrent Fully Convolutional Networks
1 Introduction
2 Related Work
3 Saliency Prediction by Recurrent Networks
3.1 Fully Convolutional Networks for Saliency Detection
3.2 Recurrent Network for Saliency Detection
3.3 Training RFCN for Saliency Detection
3.4 Post-processing
4 Experiments
4.1 Experimental Setup
4.2 Performance Comparison with State-of-the-art
4.3 Ablation Studies
5 Conclusions
References
Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks
1 Introduction
2 Related Work
3 Method
3.1 Model Architecture
3.2 Reconstruction with Selection Layer
3.3 Scaling up to Full Resolution
4 Dataset
5 Experiments
5.1 Implementation Details
5.2 Comparison Algorithms
5.3 Results
5.4 Algorithm Analysis
6 Conclusions
References
Temporal Model Adaptation for Person Re-identification
1 Introduction
2 Relation to Existing Work
3 Temporal Model Adaptation for Re-identification
3.1 Preliminaries
3.2 Low-Rank Sparse Similarity-Dissimilarity Learning
3.3 Model Adaptation with Reduced Human Effort
3.4 Discussion
4 Experimental Results
4.1 State-of-the-art Comparisons
4.2 Influence of the Temporal Model Adaptation Components
4.3 Computational Complexity
5 Conclusion
References
Author Index
Bastian Leibe, Jiri Matas, Nicu Sebe, Max Welling (Eds.)
Computer Vision – ECCV 2016
14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016
Proceedings, Part IV
Lecture Notes in Computer Science (LNCS) 9908
Lecture Notes in Computer Science 9908
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board:
David Hutchison, Lancaster University, Lancaster, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Zurich, Switzerland
John C. Mitchell, Stanford University, Stanford, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7412
Editors:
Bastian Leibe, RWTH Aachen, Aachen, Germany
Jiri Matas, Czech Technical University, Prague, Czech Republic
Nicu Sebe, University of Trento, Povo-Trento, Italy
Max Welling, University of Amsterdam, Amsterdam, The Netherlands
Lecture Notes in Computer Science
ISSN 0302-9743, ISSN 1611-3349 (electronic)
ISBN 978-3-319-46492-3, ISBN 978-3-319-46493-0 (eBook)
DOI 10.1007/978-3-319-46493-0
Library of Congress Control Number: 2016951693
LNCS Sublibrary: SL6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics
© Springer International Publishing AG 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper.
This Springer imprint is published by Springer Nature. The registered company is Springer International Publishing AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Foreword

Welcome to the proceedings of the 2016 edition of the European Conference on Computer Vision, held in Amsterdam! It is safe to say that the European Conference on Computer Vision is one of the top conferences in computer vision. It is worth recalling the history of the conference to see the broad base it has built over its 13 previous editions. First held in 1990 in Antibes (France), it was followed by conferences in Santa Margherita Ligure (Italy) in 1992, Stockholm (Sweden) in 1994, Cambridge (UK) in 1996, Freiburg (Germany) in 1998, Dublin (Ireland) in 2000, Copenhagen (Denmark) in 2002, Prague (Czech Republic) in 2004, Graz (Austria) in 2006, Marseille (France) in 2008, Heraklion (Greece) in 2010, Florence (Italy) in 2012, and Zürich (Switzerland) in 2014.

For the 14th edition, many people worked hard to give attendees a most warm welcome while they enjoyed the best science. The Program Committee, Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, did an excellent job. Apart from the scientific program, the workshops were selected and handled by Hervé Jégou and Gang Hua, and the tutorials by Jacob Verbeek and Rita Cucchiara. Thanks for the great job. The coordination with the subsequent ACM Multimedia conference offered an opportunity to expand the tutorials with an additional invited session, offered by the University of Amsterdam and organized together with the help of ACM Multimedia.

Of the many people who worked hard as local organizers, we would like to single out Martine de Wit of the UvA Conference Office, who delicately and efficiently organized the main body. The local organizers Hamdi Dibeklioglu, Efstratios Gavves, Jan van Gemert, Thomas Mensink, and Mihir Jain also had their hands full. As a venue, we chose the Royal Theatre Carré, located on the Amstel River in downtown Amsterdam. Space in Amsterdam is scarce, so it was a little tighter than usual. The university lent us its downtown campuses for the tutorials and the workshops. A relatively new element was the industry and sponsor involvement, for which Ronald Poppe and Peter de With did a great job, while Andy Bagdanov and John Schavemaker arranged the demos. Michael Wilkinson took care to make Yom Kippur as comfortable as possible for those for whom it is an important day. We thank Marc Pollefeys, Alberto del Bimbo, and Virginie Mes for their advice and help behind the scenes, and all the anonymous volunteers for their hard and precise work.

We also thank our generous sponsors; their support is an essential part of the program, and it is good to see such a level of industrial interest in what our community is doing! Amsterdam needs no introduction. Please immerse yourself, but do not drown in it, and have a nice time.

October 2016
Theo Gevers
Arnold Smeulders
Preface

Welcome to the proceedings of the 2016 European Conference on Computer Vision (ECCV 2016) held in Amsterdam, The Netherlands. We are delighted to present this volume reflecting a strong and exciting program, the result of an extensive review process.

In total, we received 1,561 paper submissions. Of these, 81 violated the ECCV submission guidelines or did not pass the plagiarism test and were rejected without review; we employed the iThenticate software (www.ithenticate.com) for plagiarism detection. Of the remaining papers, 415 were accepted (26.6 %): 342 as posters (22.6 %), 45 as spotlights (2.9 %), and 28 as oral presentations (1.8 %). The spotlights, short five-minute podium presentations, are novel to ECCV and were introduced after their success at the CVPR 2016 conference. All orals and spotlights are presented as posters as well.

The selection process was a combined effort of four program co-chairs (PCs), 74 area chairs (ACs), 1,086 Program Committee members, and 77 additional reviewers. As PCs, we were primarily responsible for the design and execution of the review process. Beyond administrative rejections, we were involved in acceptance decisions only in the very few cases where the ACs were not able to agree on a decision. PCs, as is customary in the field, were not allowed to co-author a submission. General co-chairs and other co-organizers played no role in the review process, were permitted to submit papers, and were treated as any other author.

Acceptance decisions were made by two independent ACs. The 74 ACs were selected by the PCs according to their technical expertise, experience, and geographical diversity (41 from European, five from Asian, two from Australian, and 26 from North American institutions). The ACs were aided by 1,086 Program Committee members, to whom papers were assigned for reviewing, and by 77 additional reviewers, each supervised by a Program Committee member. The Program Committee was selected from committees of previous ECCV, ICCV, and CVPR conferences and was extended on the basis of suggestions from the ACs and the PCs. Having a large pool of Program Committee members for reviewing allowed us to match expertise while bounding reviewer loads: typically five papers, and never more than eight, were assigned to a Program Committee member, and graduate students had a maximum of four papers to review.

The ECCV 2016 review process was in principle double-blind: authors did not know reviewer identities, nor the ACs handling their paper(s). However, anonymity becomes difficult to maintain as more and more submissions appear concurrently on arXiv.org. This was not against the ECCV 2016 double submission rules, which followed the practice of other major computer vision conferences in the recent past. The existence of arXiv publications, mostly not peer-reviewed, raises difficult problems with the assessment of unpublished, concurrent, and prior art, content overlap, plagiarism, and self-plagiarism. Moreover, it undermines the anonymity of submissions. We found that not all cases can be covered by a simple set of rules. Almost all controversies during the review process were related to the arXiv issue, and most of the reviewer inquiries were resolved by giving the benefit of the doubt to ECCV authors. However, the problem will have to be discussed by the community so that a consensus is found on how to handle the issues brought by publishing on arXiv.

Particular attention was paid to handling conflicts of interest. Conflicts of interest between ACs, Program Committee members, and papers were identified based on the authorship of ECCV 2016 submissions, on home institutions, and on previous collaborations of all researchers involved. To find institutional conflicts, all authors, Program Committee members, and ACs were asked to list the Internet domains of their current institutions. To find collaborators, the Researcher.cc database (http://researcher.cc/), funded by the Computer Vision Foundation, was used to find any co-authored papers in the period 2012–2016.

We pre-assigned approximately 100 papers to each AC, based on affinity scores from the Toronto Paper Matching System. ACs then bid on these, indicating their level of expertise. Based on these bids and on conflicts of interest, approximately 40 papers were assigned to each AC. The ACs then suggested seven reviewers from the pool of Program Committee members for each paper, in ranked order, from which three were chosen automatically by CMT (Microsoft's Academic Conference Management Service), taking load balancing and conflicts of interest into account.

The initial reviewing period was five weeks long, after which reviewers provided reviews with preliminary recommendations. With the generous help of several last-minute reviewers, each paper received three reviews. Submissions with all three reviews suggesting rejection were independently checked by two ACs, and if they agreed, the manuscript was rejected at this stage ("early rejects"). In total, 334 manuscripts (22.5 %) were early-rejected, reducing the average AC load to about 30. Authors of the remaining submissions were then given the opportunity to rebut the reviews, primarily to identify factual errors. Following this, reviewers and ACs discussed papers at length, after which reviewers finalized their reviews and gave a final recommendation to the ACs. Each manuscript was evaluated independently by two ACs who were not aware of each other's identities. In most cases, after extensive discussions, the two ACs arrived at a common decision, which was always adhered to by the PCs. In the very few borderline cases where an agreement was not reached, the PCs acted as tie-breakers.

Owing to the rapid expansion of the field, which led to an unexpectedly large increase in the number of submissions, the size of the venue became a limiting factor and a hard upper bound on the number of accepted papers had to be imposed. We were able to increase the limit by replacing one oral session with a poster session. Nevertheless, this forced the PCs to reject some borderline papers that could otherwise have been accepted.

We want to thank everyone involved in making ECCV 2016 possible. First and foremost, the success of ECCV 2016 depended on the quality of papers submitted by the authors, and on the very hard work of the ACs, the Program Committee members, and the additional reviewers.
We are particularly grateful to Rene Vidal for his continuous support and sharing experience from organizing ICCV 2015, to Laurent Charlin for the use of the Toronto Paper Matching System, to Ari Kobren for the use of the Researcher.cc tools, to the Computer Vision Foundation (CVF) for facilitating the use of the iThenticate plagiarism detection software, and to Gloria Zen and Radu-Laurentiu Vieriu for setting up CMT and managing the various tools involved. We also owe a debt of gratitude for the support of the Amsterdam local organizers, especially Hamdi Dibeklioglu for keeping the