Foreword
Preface
Organization
Contents -- Part IV
Poster Session 4 (Continued)
Generating Visual Explanations
1 Introduction
2 Related Work
3 Visual Explanation Model
3.1 Relevance Loss
3.2 Discriminative Loss
4 Experimental Setup
5 Results
5.1 Quantitative Results
5.2 Qualitative Results
6 Conclusion
References
Marker-Less 3D Human Motion Capture with Monocular Image Sequence and Height-Maps
1 Introduction
2 Literature Review
3 Proposed Method
3.1 Overview
3.2 Height-Map Generation
3.3 2D Joints Localization
3.4 3D Motion Estimation
4 Experiments
4.1 Datasets
4.2 Evaluation of 2D Joints Localization
4.3 Evaluation of 3D Motion Recovery with Ground-Truth 2D Joints
4.4 Evaluation of 3D Motion Recovery with Predicted 2D Joints
5 Conclusion
References
Tensor Representations via Kernel Linearization for Action Recognition from 3D Skeletons
1 Introduction
2 Related Work
3 Preliminaries
3.1 Tensor Notations
3.2 Kernel Linearization
4 Proposed Approach
4.1 Problem Formulation
4.2 Sequence Compatibility Kernel
4.3 Dynamics Compatibility Kernel
5 Computational Complexity
6 Experiments
6.1 Datasets
6.2 Experimental Setup
7 Conclusions
References
Manhattan-World Urban Reconstruction from Point Clouds
1 Introduction
2 Related Work
3 Overview
4 Candidate Box Generation
4.1 Plane Extraction
4.2 Candidate Boxes
5 Box Selection
5.1 Objectives
5.2 Optimization
6 Results and Discussion
7 Conclusions
References
From Multiview Image Curves to 3D Drawings
1 Introduction
2 Enhanced 3D Curve Sketch
3 From 3D Curve Sketch to 3D Drawing
4 Experiments and Evaluation
5 Conclusion
References
Shape from Selfies: Human Body Shape Estimation Using CCA Regression Forests
1 Introduction
2 Related Work
3 Shape Estimation Algorithm
3.1 Method Overview
3.2 Shape as a Geometric Model
3.3 Feature Extraction
3.4 View Direction Classification
3.5 Learning Shape Parameters
4 Validation and Results
5 Discussion and Conclusions
References
Can We Jointly Register and Reconstruct Creased Surfaces by Shape-from-Template Accurately?
1 Introduction
2 Background
2.1 Deformation Models and Priors in SfT
2.2 Data Constraints in SfT
2.3 Modelling Creases in Other Problem Domains and Previous Attempts in SfT
2.4 Contributions
3 Problem Formulation
3.1 Template Definition
3.2 Global Cost Function
4 Optimization
4.1 Overview
4.2 Improving Convergence
5 Experimental Results
5.1 Ground Truth Acquisition
5.2 Implementation Details and Evaluation Metrics
5.3 Results
6 Conclusion
References
Distractor-Supported Single Target Tracking in Extremely Cluttered Scenes
1 Introduction
2 Related Work
3 Proposed Distractor-Supported Single-Target Tracking Method
3.1 Robust Estimation with Coarse-to-fine Multi-level Clustering
3.2 Global Dynamic Constraint in a Feedback Loop
4 Experimental Results
4.1 Experiment on Highly Cluttered Dataset
4.2 Experiment on Non-cluttered Dataset
5 Conclusions
References
Connectionist Temporal Modeling for Weakly Supervised Action Labeling
1 Introduction
2 Related Work
3 Ordering Constrained Video Action Labeling
3.1 Extended Connectionist Temporal Classification
3.2 ECTC Forward-Backward Algorithm
4 Extension to Frame-Level Semi-supervised Learning
5 Experiments
5.1 Implementation Details
5.2 Evaluating Complex Activity Segmentation
5.3 Evaluating Action Detection
6 Conclusions
References
Deep Joint Image Filtering
1 Introduction
2 Related Work
3 Learning Deep Joint Image Filters
3.1 Network Architecture Design
3.2 Relationship to Prior Work
4 Experimental Results
4.1 Depth Map Upsampling
4.2 Joint Image Upsampling
4.3 Structure-Texture Separation
4.4 Cross-Modality Filtering for Noise Reduction
5 Discussions
6 Conclusions
References
Efficient Multi-frequency Phase Unwrapping Using Kernel Density Estimation
1 Introduction
1.1 Related Work
1.2 Structure
2 Depth Decoding
2.1 Phase Unwrapping
2.2 CRT Based Unwrapping
2.3 Phase Fusion
3 Kernel Density Based Unwrapping
3.1 Unwrapping Likelihood
3.2 Multiple Hypotheses
3.3 Phase Likelihood
3.4 Hypothesis Selection
3.5 Spatial Selection Versus Smoothing
4 Experiments
4.1 Implementation
4.2 Ground Truth for Unwrapping
4.3 Datasets
4.4 Comparison of Noise Propagation Models
4.5 Outlier Rejection
4.6 Parameter Settings
4.7 Coverage Experiments
4.8 Kinect Fusion
5 Concluding Remarks
References
A Multi-scale CNN for Affordance Segmentation in RGB Images
1 Introduction
2 Prior Work
3 Generation of Affordance Ground Truth
4 Affordance Segmentation with a Multi-scale CNN
5 Training
6 Results
7 Conclusion
References
Hierarchical Dynamic Parsing and Encoding for Action Recognition
1 Introduction
2 Related Work
3 Hierarchical Dynamic Parsing and Encoding
3.1 Unsupervised Temporal Clustering
3.2 The First Layer Modeling
3.3 The Second Layer Modeling
4 Experiments
4.1 Datasets
4.2 Experimental Setup
4.3 Influence of Parameters
4.4 Comparison of Pooling in the First Layer
4.5 Comparison with State-of-the-Art
5 Conclusions
References
Distinct Class-Specific Saliency Maps for Weakly Supervised Semantic Segmentation
1 Introduction
2 Related Work
2.1 CNN-Based Fully-Supervised Semantic Segmentation
2.2 CNN-Based Weakly-Supervised Segmentation
2.3 Gradient-Based Region Estimation with Back-Propagation
3 Methods
3.1 Overview
3.2 Training CNN
3.3 Class Saliency Maps
3.4 Fully Connected CRF
4 Experiments
4.1 Dataset
4.2 Experimental Setup
4.3 Evaluation on Class Saliency Maps
4.4 Effects of Parameter Choices
4.5 Comparison with Other Methods
5 Conclusions
References
A Diagram is Worth a Dozen Images
1 Introduction
2 Background
3 The Language of Diagrams
4 Syntactic Diagram Parsing
5 Semantic Interpretation
6 Dataset
7 Experiments
7.1 Generating Constituent Proposals
7.2 Generating Relationship Proposals
7.3 Syntactic Parsing: DPG Inference
7.4 Diagram Question Answering
8 Conclusion
References
Automatic Attribute Discovery with Neural Activations
1 Introduction
2 Related Work
3 Datasets and Pre-processing
3.1 Etsy Dataset
3.2 Wear Dataset
4 Attribute Discovery
4.1 Divergence of Neural Activations
4.2 Visualness
4.3 Human Perception
4.4 Experimental Results
5 Understanding Perceptual Depth
6 Saliency Detection
7 Conclusion
References
"What Happens If..." Learning to Predict the Effect of Forces in Images
1 Introduction
2 Related Work
3 Problem Statement
4 Forces in Scenes (ForScene) Dataset
5 Model
5.1 Model Architecture
5.2 Training
5.3 Testing
6 Experiments
6.1 Dataset Details
6.2 Force Representation
6.3 Network and Optimization Parameters
6.4 Prediction of Velocity Sequences
6.5 Unseen Categories
7 Conclusion
References
View Synthesis by Appearance Flow
1 Introduction
2 Related Work
3 Approach
3.1 Learning View Synthesis via Appearance Flow
3.2 Learning to Leverage Multiple Input Views
4 Experiments
4.1 Novel View Synthesis for Objects
4.2 Novel View Synthesis for Scenes
5 Discussion
References
Top-Down Learning for Structured Labeling with Convolutional Pseudoprior
1 Introduction
2 Related Work
3 Formulations
4 Experiments
4.1 Sequential Labeling: 1-D Case
4.2 Image Semantic Labeling: 2-D Case
5 Conclusions
References
Generative Image Modeling Using Style and Structure Adversarial Networks
1 Introduction
2 Related Work
3 Background for Generative Adversarial Networks
4 Style and Structure GAN
4.1 Structure-GAN
4.2 Style-GAN
4.3 Multi-task Learning with Pixel-Wise Constraints
4.4 Joint Learning for S2-GAN
5 Experiments
5.1 Qualitative Results for Image Generation
5.2 Quantitative Results for Image Generation
5.3 Representation Learning for Recognition Tasks
6 Conclusion
References
Joint Learning of Semantic and Latent Attributes
1 Introduction
2 Related Work
3 Methodology
3.1 Formulation
3.2 Optimisation
3.3 Application to Person Re-ID
3.4 Application to User-Defined Attribute Prediction
4 Experiments
4.1 Person Re-ID
4.2 User-Defined Attribute Prediction
4.3 Zero-Shot Learning
4.4 Further Evaluations
5 Conclusions
References
A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection
1 Introduction
2 Related Work
3 Multi-scale Object Proposal Network
3.1 Multi-scale Detection
3.2 Architecture
3.3 Sampling
3.4 Implementation Details
4 Object Detection Network
4.1 CNN Feature Map Approximation
4.2 Context Embedding
4.3 Implementation Details
5 Experimental Evaluation
5.1 Proposal Evaluation
5.2 Object Detection Evaluation
6 Conclusions
References
Deep Specialized Network for Illuminant Estimation
1 Introduction
2 Related Work
3 Illuminant Estimation by Convolutional Network
3.1 Hypothesis Network - A Branch-Level Ensemble Network
3.2 Selection Network - A Hypothesis Selection Network
3.3 Local to Global Estimation
4 Experiments
4.1 Global-Illuminant Setting
4.2 Multi-illuminant Setting
5 Conclusion
References
Weakly-Supervised Semantic Segmentation Using Motion Cues
1 Introduction
2 Related Work
3 Learning Semantic Segmentation from Video
3.1 Network Architecture
3.2 Estimating Latent Variables with Label Prediction
3.3 Fine-Tuning M-CNN
4 Results and Evaluation
4.1 Experimental Protocol
4.2 Implementation Details
4.3 Evaluation of M-CNN
4.4 Training on Weakly-Annotated Videos and Images
4.5 Co-localization
5 Summary
References
Human-in-the-Loop Person Re-identification
1 Introduction
2 Human-in-the-Loop Incremental Learning
2.1 Problem Formulation
2.2 Modelling Human Feedback as a Loss Function
2.3 Real-Time Model Update for Instant Feedback Reward
3 Metric Ensemble Learning for Automated Re-id
4 Experiments
4.1 Evaluation on Human-in-the-Loop Person Re-id
4.2 Evaluation on Automated Person Re-id
5 Conclusions
References
Real-Time Monocular Segmentation and Pose Tracking of Multiple Objects
1 Introduction
1.1 Related Work
1.2 Motivation
1.3 Contributions
2 Method
2.1 Pixel-Wise Posterior Object Segmentation
2.2 Level-Set Pose Embedding
2.3 Iterative Pose Optimization
2.4 Initialization
3 Implementation
3.1 Rendering Engine
3.2 Image Processing
3.3 Occlusion Handling
4 Evaluation
4.1 Performance Analysis
4.2 Experimental Comparison
5 Conclusions
References
Estimation of Human Body Shape in Motion with Wide Clothing
1 Introduction
2 Related Work
3 S-SCAPE Model
4 Estimating Model Parameters for a Motion Sequence
4.1 Prior Model for
4.2 Landmark Energy
4.3 Data Energy
4.4 Clothing Energy
4.5 Optimization Schedule
5 Implementation Details
6 Evaluation
6.1 Dataset
6.2 Evaluation of Posture and Shape Fitting
6.3 Comparative Evaluation
7 Conclusion
References
A Shape-Based Approach for Salient Object Detection Using Deep Learning
1 Introduction
2 Related Work
3 Proposed Method
3.1 Saliency Representation
3.2 Convolutional Neural Networks for Shape Prediction
3.3 Refinement of Saliency Maps Using Hierarchical Segmentations
4 Experimental Results
4.1 Experimental Settings
4.2 Experimental Results
5 Conclusions
References
Fast Optical Flow Using Dense Inverse Search
1 Introduction
1.1 Related Work
1.2 Contributions
2 Proposed Method
2.1 Fast Inverse Search for Correspondences
2.2 Fast Optical Flow with Multi-scale Reasoning
2.3 Fast Variational Refinement
2.4 Extensions
3 Experiments
3.1 Implementation and Parameter Selection
3.2 Evaluation of Inverse Search
3.3 MPI Sintel Optical Flow Results
3.4 KITTI Optical Flow Results
3.5 High Frame-Rate Optical Flow
4 Conclusions
References
Global Registration of 3D Point Sets via LRS Decomposition
1 Introduction
2 Low-Rank and Sparse Decomposition
3 Problem Definition
4 Proposed Approach
5 Experiments
5.1 Simulated Data
5.2 Real Data
6 Conclusions
References
Recognition from Hand Cameras: A Revisit with Deep Learning
1 Introduction
2 Related Work
2.1 Egocentric Recognition
2.2 Hand Detection and Pose Estimation
2.3 Camera for Hands
3 Our System
3.1 Wearable Cues
3.2 Hand Alignment
3.3 Hand States Recognition
3.4 State Change Detection
3.5 Full Model
3.6 Deep Feature
3.7 Object Discovery
3.8 Combining HandCam with HeadCam
4 Dataset
5 Implementation Details
6 Experiment Results
6.1 Free vs. Active Recognition
6.2 Gesture Recognition
6.3 Object Category Recognition
6.4 Combining HandCam with HeadCam
7 Conclusion
References
Learning
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
1 Introduction
2 Related Work
3 Binary Convolutional Neural Network
3.1 Binary-Weight-Networks
3.2 XNOR-Networks
4 Experiments
4.1 Efficiency Analysis
4.2 Image Classification
4.3 Ablation Studies
5 Conclusion
References
Top-Down Neural Attention by Excitation Backprop
1 Introduction
2 Related Work
3 Method
3.1 Top-Down Neural Attention Based on Probabilistic WTA
3.2 Excitation Backprop
3.3 Contrastive Top-Down Attention
4 Experiments
4.1 The Pointing Game
4.2 Localizing Dominant Objects
4.3 Text-to-Region Association
5 Conclusion
References
Learning Recursive Filters for Low-Level Vision via a Hybrid Neural Network
1 Introduction
2 Related Work
3 Recursive Filter via RNNs
3.1 Preliminaries of Recursive Filters
3.2 Recursive Decomposition
3.3 Constructing Recursive Filter via Linear RNNs
4 Learning Spatially Variant Recursive Filters
4.1 Spatially Variant LRNN
4.2 Learning LRNN Weight Maps via CNN
5 Experimental Results
5.1 Edge-Preserving Smoothing
5.2 Image Denoising
5.3 Image Propagation Examples
6 Conclusion
References
Learning Representations for Automatic Colorization
1 Introduction
2 Related Work
3 Method
3.1 Color Spaces
3.2 Loss
3.3 Inference
3.4 Histogram Transfer from Ground-Truth
3.5 Neural Network Architecture and Training
4 Experiments
4.1 Representation Learning
5 Conclusion
References
Poster Session 5
Deep Reconstruction-Classification Networks for Unsupervised Domain Adaptation
1 Introduction
2 Related Work
3 Deep Reconstruction-Classification Networks
4 Experiments and Results
4.1 Experiment I: SVHN, MNIST, USPS, CIFAR, and STL
4.2 Experiment II: Office Dataset
5 Analysis
6 Conclusions
References
Learning Without Forgetting
1 Introduction
2 Related Work
3 Learning Without Forgetting
4 Experiments
4.1 Main Experiments
4.2 Design Choices and Alternatives
5 Discussion
References
Identity Mappings in Deep Residual Networks
1 Introduction
2 Analysis of Deep Residual Networks
3 On the Importance of Identity Skip Connections
3.1 Experiments on Skip Connections
3.2 Discussions
4 On the Usage of Activation Functions
4.1 Experiments on Activation
4.2 Analysis
5 Results
6 Conclusions
References
Deep Networks with Stochastic Depth
1 Introduction
2 Background
3 Deep Networks with Stochastic Depth
4 Results
5 Analytic Experiments
6 Conclusion
References
Less Is More: Towards Compact CNNs
1 Introduction
2 Related Work
3 Sparse Constrained Convolutional Neural Networks
3.1 Training a Sparse Constrained CNN
3.2 Forward-Backward Splitting
4 Sparse Constraints
4.1 Tensor Low Rank Constraints
4.2 Group Sparse Constraints
5 Importance of Rectified Linear Units in Sparse Constrained CNNs
6 Experiments
6.1 LeNet on MNIST
6.2 CIFAR-10 Quick on CIFAR-10
6.3 AlexNet and VGG on ImageNet
7 Conclusion
References
Unsupervised Visual Representation Learning by Graph-Based Consistent Constraints
1 Introduction
2 Related Work
3 Overview
4 Unsupervised Constraint Mining
4.1 Positive Constraint Mining
4.2 Negative Constraint Mining
5 Visual Representation Learning
5.1 Unsupervised Feature Learning
5.2 Semi-supervised Learning
6 Experiments
6.1 Implementation Details
6.2 Datasets and Evaluation Metrics
6.3 Controlled Experiments
6.4 Unsupervised Learning Results
6.5 Semi-supervised Learning Results
7 Conclusions
References
Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation
1 Introduction
2 Related Work
3 Weakly Supervised Segmentation from Image-Level Labels
3.1 The SEC Loss for Weakly Supervised Image Segmentation
3.2 Training
4 Experiments
4.1 Experimental Setup
4.2 Results
4.3 Detailed Discussion
5 Conclusion
References
Patch-Based Low-Rank Matrix Completion for Learning of Shape and Motion Models from Few Training Samples
1 Introduction
1.1 Related Work
2 Methods
2.1 Low-Rank Matrix Completion of Ill-Conditioned Matrices
2.2 Patch-Based Model Generation
2.3 Patch Selection and Domain Partitioning
3 Experiments and Applications
3.1 2D Contour Data of the IMM Face Database
3.2 3D Lung Surfaces of the LIDC Database
3.3 Respiratory Lung Motion
4 Results
5 Discussion and Conclusion
References
Chained Predictions Using Convolutional Neural Networks
1 Introduction
2 Related Work
3 Chain Models for Structured Tasks
3.1 Chain Models for Single Images
3.2 Chain Models for Videos
3.3 Improved Learning with Scheduled Sampling
4 Experimental Evaluation
4.1 Pose Estimation from a Single Image
4.2 Pose Estimation from Videos
5 Conclusions
References
Multi-region Two-Stream R-CNN for Action Detection
1 Introduction
2 Related Work
3 End-to-end Two-Stream Faster R-CNN
4 Multi-region Two-Stream Faster R-CNN
5 Linking and Temporal Localization
6 Experiments
6.1 Datasets and Evaluation Metrics
6.2 Implementation Details
6.3 Evaluation of Multi-region Two-Stream Faster R-CNN
6.4 Comparison to the State of the Art
7 Conclusion
References
Semantic Co-segmentation in Videos
1 Introduction
2 Related Work
3 Proposed Algorithm
3.1 Overview
3.2 Semantic Tracklet Generation
3.3 Semantic Tracklet Co-selection via Submodular Function
4 Experimental Results
4.1 Experimental Settings
4.2 Youtube-Objects Dataset
4.3 MOViCS Dataset
4.4 Safari Dataset
5 Concluding Remarks
References
Attribute2Image: Conditional Image Generation from Visual Attributes
1 Introduction
2 Related Work
3 Attribute-Conditioned Generative Modeling of Images
3.1 Base Model: Conditional Variational Auto-Encoder (CVAE)
3.2 Disentangling CVAE with a Layered Representation
4 Posterior Inference via Optimization
5 Experiments
5.1 Attribute-Conditioned Image Generation
5.2 Attribute-Conditioned Image Reconstruction and Completion
6 Conclusion
References
Modeling Context Between Objects for Referring Expression Understanding
1 Introduction
2 Related Work
3 Modeling Context Between Objects
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Comparison of Different Techniques
4.4 Ablation Experiments
5 Conclusions
References
Friction from Reflectance: Deep Reflectance Codes for Predicting Physical Surface Properties from One-Shot In-Field Reflectance
1 Introduction
2 Related Work
3 One-Shot In-Field Reflectance Disks
4 Deep Reflectance Codes
5 Friction from Reflectance
5.1 Friction-Reflectance Database
5.2 Hashing for Friction Prediction
6 Experimental Results
6.1 Hashing for Material Recognition
6.2 Friction Prediction
7 Conclusions
References
Saliency Detection with Recurrent Fully Convolutional Networks
1 Introduction
2 Related Work
3 Saliency Prediction by Recurrent Networks
3.1 Fully Convolutional Networks for Saliency Detection
3.2 Recurrent Network for Saliency Detection
3.3 Training RFCN for Saliency Detection
3.4 Post-processing
4 Experiments
4.1 Experimental Setup
4.2 Performance Comparison with State-of-the-art
4.3 Ablation Studies
5 Conclusions
References
Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks
1 Introduction
2 Related Work
3 Method
3.1 Model Architecture
3.2 Reconstruction with Selection Layer
3.3 Scaling up to Full Resolution
4 Dataset
5 Experiments
5.1 Implementation Details
5.2 Comparison Algorithms
5.3 Results
5.4 Algorithm Analysis
6 Conclusions
References
Temporal Model Adaptation for Person Re-identification
1 Introduction
2 Relation to Existing Work
3 Temporal Model Adaptation for Re-identification
3.1 Preliminaries
3.2 Low-Rank Sparse Similarity-Dissimilarity Learning
3.3 Model Adaptation with Reduced Human Effort
3.4 Discussion
4 Experimental Results
4.1 State-of-the-art Comparisons
4.2 Influence of the Temporal Model Adaptation Components
4.3 Computational Complexity
5 Conclusion
References
Author Index