Preface
Organization
Contents – Part I
Oral Session O1: Learning
Dual Generator Generative Adversarial Networks for Multi-domain Image-to-Image Translation
1 Introduction
2 Related Work
3 G2GAN: Dual Generator Generative Adversarial Networks
3.1 Model Formulation
3.2 Model Optimization
3.3 Implementation Details
4 Experiments
4.1 Experimental Setup
4.2 Comparison with the State-of-the-Art on Different Tasks
4.3 Model Analysis
5 Conclusion
References
Pioneer Networks: Progressively Growing Generative Autoencoder
1 Introduction
2 Related Work
3 Pioneer Networks
3.1 Intuition
3.2 Encoder–Decoder Losses
3.3 Model and Training
4 Experiments
4.1 CelebA and CelebA-HQ
4.2 LSUN Bedrooms
4.3 CIFAR-10
5 Discussion and Conclusion
References
Editable Generative Adversarial Networks: Generating and Editing Faces Simultaneously
1 Introduction
2 Related Work
2.1 Conditional GANs
2.2 Facial Attribute and Generation
3 Background
3.1 Connection Network
3.2 Selective Learning for Classification
4 Editable Generative Adversarial Networks
4.1 Formulation
5 Experiments
5.1 Qualitative Evaluation
5.2 Quantitative Evaluation
6 Conclusion
References
Cross Connected Network for Efficient Image Recognition
1 Introduction
2 Related Work
3 Cross Connected Network
3.1 Cross Connection
3.2 Pod Structure
3.3 Receptive Fields of Depthwise Convolutions
3.4 CrossNet Architecture
4 Implementation
5 Experiments
5.1 ImageNet-1k Dataset
5.2 MSCOCO Dataset
6 Conclusions
References
Answer Distillation for Visual Question Answering
1 Introduction
2 Related Work
3 Answer Distillation for Visual Question Answering
3.1 Overview
3.2 Answer Distillation
3.3 Answer Prediction
4 Experiments
4.1 Basic Configuration
4.2 Results
5 Conclusion
References
Spiral-Net with F1-Based Optimization for Image-Based Crack Detection
1 Introduction
1.1 Related Works
2 Spiral-Net
3 F1-Based Optimization
3.1 F1-Based Loss
3.2 F1-Guided Gradient Weighting
4 Experimental Results
4.1 Crack Dataset
4.2 Implementation Details
4.3 Performance Analysis on Spiral-Net
4.4 Comparison to Other Methods
5 Conclusion
References
Flex-Convolution
1 Introduction
2 Related Work
3 Method
3.1 Convolution Layer
3.2 Extending Sub-sampling to Irregular Data
4 Implementation
4.1 Neighborhood Processing
4.2 Efficient Implementation of Layer Primitives
4.3 Network Architecture for Large-Scale Semantic Segmentation
5 Experiments
5.1 Synthetic Data
5.2 Real-World Semantic Point Cloud Segmentation
6 Conclusion
References
Poster Session P1
Extreme Reverse Projection Learning for Zero-Shot Recognition
1 Introduction
2 Related Work
3 The Proposed Model
3.1 Problem Definition
3.2 Projection Learning
3.3 Model Formulation
3.4 Optimization
3.5 Discussion
4 Experiments
4.1 Datasets and Settings
4.2 Results Under Standard ZSL Setting
4.3 Results Under Pure ZSL Setting
4.4 Results Under Generalized ZSL Setting
4.5 Further Evaluations
5 Conclusion
References
A Defect Inspection Method for Machine Vision Using Defect Probability Image with Deep Convolutional Neural Network
1 Introduction
2 Machine Vision System
3 Related Work
4 Method
4.1 Defect Probability Image
4.2 Machine Vision and Deep Convolutional Neural Network Integration
5 Experiments
5.1 Database and Experimental Setup
5.2 Results and Discussion
6 Conclusion
References
3D Pick & Mix: Object Part Blending in Joint Shape and Image Manifolds
1 Introduction
2 Related Work
3 Overview
4 Methodology
4.1 Building Shape Manifolds for Object Parts
4.2 Building Shape Manifolds of Parts
4.3 Learning to Embed Images into the Shape Manifolds
4.4 Shape Blending Through Cross-Manifold Optimization
5 Results
5.1 Quantitative Results on Image-Based Shape Retrieval
5.2 Qualitative Results on Image-Based Shape Retrieval
6 Conclusions
References
Minutiae-Based Gender Estimation for Full and Partial Fingerprints of Arbitrary Size and Shape
1 Introduction
2 Related Work
3 Methodology
3.1 Proposed Concept
3.2 Feature Extraction and Model Building
3.3 Minutiae Ensemble Fusion
4 Experimental Setup
5 Results
6 Conclusion
References
Simultaneous Face Detection and Head Pose Estimation: A Fast and Unified Framework
1 Introduction
2 Related Work
2.1 Face Detection
2.2 Head Pose Estimation
3 Proposed Framework
3.1 Network Architecture
3.2 Pose Class
3.3 Training
3.4 Testing
4 Experimental Validations
4.1 Datasets and Evaluation Metric
4.2 Setting
4.3 Face Detection
4.4 Head Pose Estimation
4.5 Does Pose Class Help?
5 Conclusions
References
Progressive Feature Fusion Network for Realistic Image Dehazing
1 Introduction
2 Related Work
3 Progressive Feature Fusion Network (PFFNet) for Image Dehazing
4 Experiments
4.1 Dataset
4.2 Comparisons and Analysis
5 Conclusion
References
Semi-supervised Learning for Face Sketch Synthesis in the Wild
1 Introduction
2 Related Works
2.1 Exemplar Based Methods
2.2 Learning Based Methods
3 Semi-supervised Face Sketch Synthesis
3.1 Overview
3.2 Pseudo Sketch Feature Generator
3.3 Loss Functions
4 Implementation Details
4.1 Datasets
4.2 Patch Matching
4.3 Training Details
5 Evaluation on Public Benchmarks
5.1 Qualitative Comparison
5.2 Quantitative Comparison
6 Sketch Synthesis in the Wild
6.1 Qualitative Comparison
6.2 Effectiveness of Additional Training Photos
6.3 Mean Opinion Score Test
7 Conclusion
References
Evolvement Constrained Adversarial Learning for Video Style Transfer
1 Introduction
2 Related Works
3 Overview
4 The Evolve-Sync Loss
5 Video Style Transfer GAN (VST-GAN)
5.1 Architecture
5.2 Training
6 Experiments
6.1 Qualitative Comparison
6.2 Quantitative Comparison
7 Conclusion
References
An Unsupervised Deep Learning Framework via Integrated Optimization of Representation Learning and GMM-Based Modeling
1 Introduction
2 Related Work
3 The Proposed Approach
3.1 GMM
3.2 Representation Learning and GMM-Based Modeling
3.3 Network Structure
4 Experiments
4.1 Dataset
4.2 Implementation Details
4.3 Benchmarks
4.4 Performance
4.5 Discussion
5 Conclusion
References
Recovering Affine Features from Orientation- and Scale-Invariant Ones
1 Introduction
2 Theoretical Background
3 Recovering Affine Correspondences
3.1 Affine Transformation Model
3.2 Affine Correspondence from Epipolar Geometry
4 Experimental Results
4.1 Comparing Techniques to Estimate Affine Correspondences
4.2 Application: Homography Estimation
5 Conclusion
References
Totally Looks Like - How Humans Compare, Compared to Machines
1 Introduction
2 Related Work
3 Method
3.1 Dataset
3.2 Image Retrieval
4 Experiments
4.1 Feature Extraction
4.2 Matching Images
4.3 Human Experiments
5 Discussion
References
Generation of Virtual Dual Energy Images from Standard Single-Shot Radiographs Using Multi-scale and Conditional Adversarial Network
1 Introduction
1.1 Related Work
1.2 Contributions
2 Methods
2.1 Multi-scale and Conditional Adversarial Network (MCA-Net)
2.2 Bone Suppression with Cross Projection Tensor
3 Data and Training Details
4 Experiments and Results
4.1 Algorithm Evaluation
4.2 Clinical Evaluation
5 Discussion
6 Conclusion
References
GD-GAN: Generative Adversarial Networks for Trajectory Prediction and Group Detection in Crowds
1 Introduction
2 Related Work
2.1 Human Behaviour Prediction
2.2 Group Detection
3 Architecture
3.1 Neighbourhood Modelling
3.2 Trajectory Prediction
3.3 Group Detection
4 Evaluation and Discussion
4.1 Implementation Details
4.2 Evaluation of the Trajectory Prediction
4.3 Evaluation of the Group Detection
4.4 Ablation Experiment
4.5 Time Efficiency
5 Conclusions
References
Multi-level Sequence GAN for Group Activity Recognition
1 Introduction
2 Related Work
3 Methodology
3.1 Action Codes
3.2 Semi-supervised GAN Architecture
3.3 GAN Objectives
4 Experiments
4.1 Datasets
4.2 Metrics
4.3 Network Architecture and Training
4.4 Results
4.5 Ablation Experiments
4.6 Time Efficiency
5 Conclusions
References
Spatio-Temporal Fusion Networks for Action Recognition
1 Introduction
2 Related Work
3 Approach
3.1 Spatio-Temporal Fusion Networks
3.2 STFN Components
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Evaluation of Different Designs
4.4 Evaluation of Fusion Operations
4.5 Evaluation of Fusion Directions
4.6 Evaluation of a Number of Segments
4.7 Base Performance of Two-Stream Network
4.8 Comparison with the State-of-the-art
5 Conclusion
References
Image2Mesh: A Learning Framework for Single Image 3D Reconstruction
1 Introduction
2 Related Work
3 Proposed Learning Framework
3.1 3D Mesh Embedding
3.2 Collecting Synthetic Data
3.3 Learning the Image Latent Space
3.4 Learning the Index to Select a Model
3.5 Learning the Shape Parameters
4 Experiments
4.1 Estimating the Image Latent Space
4.2 Selecting a 3D Mesh
4.3 Estimating the Shape Parameters
4.4 3D Reconstruction from a Single Image
4.5 Implementation Details and Limitations
5 Discussion
6 Conclusion
References
Face Completion with Semantic Knowledge and Collaborative Adversarial Learning
1 Introduction
2 Related Work
3 Collaborative Face Completion
3.1 Collaborative GAN
3.2 Reconstruction Loss
3.3 Inpainting Concentrated Generation
3.4 Network Architecture
4 Experimental Results
4.1 Datasets
4.2 Models
4.3 Face Completion
5 Conclusion
References
Water-Filling: An Efficient Algorithm for Digitized Document Shadow Removal
1 Introduction
2 Related Work
2.1 Classical Image Binarization Based Approaches
2.2 Parametric Reconstruction Based Approaches
2.3 Background Shading Estimation Based Approaches
3 Water-Filling Algorithm
3.1 Modeling by Diffusion Equation
4 Experiments and Performance Evaluation
4.1 Datasets
4.2 Methods for Comparison
4.3 Parameters Tuning
4.4 Results Analysis
5 Conclusions
References
Matchable Image Retrieval by Learning from Surface Reconstruction
1 Introduction
2 Related Works
3 The GL3D Benchmark Dataset
4 Method
4.1 Problem Formulation
4.2 Network Architecture
4.3 Fine-Grained Training Data Generation
4.4 Learning with Batched Hard Mining
4.5 Pre-matching Regional Code (PRC)
5 Experiments
5.1 Distinctiveness of Matchable Image Retrieval
5.2 Experiments for Matchable Image Retrieval
5.3 Integration of Matchable Image Retrieval with SfM
6 Conclusions
References
Thinking Outside the Box: Generation of Unconstrained 3D Room Layouts
1 Introduction
2 Related Work
3 Planar R-CNN
3.1 Network Architecture
3.2 Implementation
4 3D Room Layout Generation
4.1 Spatial Voting
5 Experimental Evaluation
5.1 Training Using SUN RGB-D
5.2 2D Room Layout Estimation with NYUDv2 303
5.3 Whole Room Layout Estimation Using ScanNet
5.4 Results
6 Conclusions
References
VIENA2: A Driving Anticipation Dataset
1 Introduction
2 VIENA2
2.1 Scenarios and Data Collection
2.2 Comparison to Other Datasets
3 Benchmark Algorithms
3.1 Baseline Methods
3.2 A New Multi-modal LSTM
3.3 Action Modeling
3.4 Implementation Details
4 Benchmark Evaluation and Analysis
4.1 Action Anticipation on VIENA2
4.2 Challenges of VIENA2
4.3 Benefits of VIENA2 for Anticipation from Real Images
4.4 Bias Analysis
5 Conclusion
References
Multilevel Collaborative Attention Network for Person Search
1 Introduction
2 Related Work
3 Multilevel Collaborative Attention Network
3.1 Multilevel Selective Learning
3.2 Collaborative Attention Learning
3.3 Online Hard Mined Random Sampling Softmax
4 Experiments
4.1 Datasets and Evaluation Metrics
4.2 Implementation Details
4.3 Comparison with State-of-the-art Methods
4.4 Ablation Study
5 Conclusions
References
Enhancing Perceptual Attributes with Bayesian Style Generation
1 Introduction
2 Related Works
3 Enhancing Perceptual Attributes with BAE
3.1 Arbitrary Style Transfer
3.2 Bayesian Attribute Enhancement
3.3 Implementation
4 Experimental Validation
4.1 Experimental Setup and Datasets
4.2 Results
5 Conclusions
References
Deep Convolutional Compressed Sensing for LiDAR Depth Completion
1 Introduction
2 Related Work
2.1 Compressed Sensing
2.2 Deep Learning
3 Preliminary
3.1 Compressed Sensing
3.2 Deep Component Analysis
4 Deep Convolutional Compressed Sensing
4.1 Inference
4.2 Learning
5 Experiments
5.1 Implementation Details
5.2 KITTI Depth Completion Benchmark
5.3 Effect of Amount of Training Data
5.4 Effect of Iterative Optimization
6 Conclusion
References
Learning for Video Super-Resolution Through HR Optical Flow Estimation
1 Introduction
2 Related Work
2.1 Single Image SR
2.2 Video SR
3 Network Architecture
3.1 Optical Flow Reconstruction Net (OFRnet)
3.2 Motion Compensation
3.3 Super-Resolution Net (SRnet)
3.4 Loss Function
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Ablation Study
4.4 Comparisons to the State-of-the-Art
5 Conclusions and Future Work
References
Deep Multiple Instance Learning for Zero-Shot Image Tagging
1 Introduction
2 Related Work
3 Our Method
3.1 Problem Formulation
3.2 Network Architecture
4 Experiment
4.1 Setup
4.2 Tagging Performance
4.3 Zero Shot Recognition (ZSR)
4.4 Discussion
5 Conclusion
References
Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts
1 Introduction
2 Problem Description
3 Zero-Shot Detection
3.1 Model Architecture
3.2 Training and Inference
3.3 ZSD Without Pre-defined Unseen
4 Experiments
4.1 Dataset and Experiment Protocol
4.2 ZSD Performance
4.3 Zero Shot Recognition (ZSR)
4.4 Challenges and New Directions
5 Conclusion
References
Vision-Based Freezing of Gait Detection with Anatomic Patch Based Representation
1 Introduction
2 Related Works
2.1 Vision Based Parkinsonian Gait Analysis
2.2 Deep Learning Based Human Action Recognition
3 Proposed Method
3.1 Anatomic Joint Patches
3.2 Weakly-Supervised Learning for Patch Proposals
3.3 Patch and Global Feature Fusion
4 Experimental Results and Discussions
4.1 Dataset
4.2 Implementation Details
4.3 FoG Detection Results
4.4 Key Patch Localization
4.5 Comparison
5 Conclusion
References
Fast Video Shot Transition Localization with Deep Structured Models
1 Introduction
2 Related Work
3 Our Approach
3.1 An Overview
3.2 Initial Filtering
3.3 Cut Model
3.4 Gradual Model
4 ClipShots
5 Experiments
5.1 Databases and Evaluation Metrics
5.2 Experiments Configuration
5.3 Experiments on ClipShots
5.4 Experiments on TRECVID07
5.5 Experiments on RAI
6 Conclusion
References
Traversing Latent Space Using Decision Ferns
1 Introduction
2 Related Work
2.1 Learning Representations for Complex Inference Tasks
2.2 Decision Forests and Ferns
3 Background
3.1 Variational Autoencoders
3.2 Decision Forests and Ferns
4 Operating in Latent Space
4.1 Constructing a Latent Space
4.2 Traversing the Latent Space
4.3 Latent Space Traversal Network
5 Experiments
5.1 Imposing Spatial Transformation
5.2 Imposing Kinematics
5.3 Latent Space for Prediction
6 Conclusion
References
Parallel Convolutional Networks for Image Recognition via a Discriminator
1 Introduction
2 Related Work
3 D-PCN
3.1 Motivation
3.2 Architecture
3.3 Training Method
4 Experiments
4.1 Classification Results on CIFAR-100
4.2 Classification Results on ImageNet
4.3 Visualization
4.4 Segmentation on PASCAL VOC 2012
5 Conclusion
References
ENG: End-to-End Neural Geometry for Robust Depth and Pose Estimation Using CNNs
1 Introduction
2 Related Work
2.1 Single Image Depth Prediction
2.2 Optical Flow Prediction
2.3 Pose Estimation
3 Method
3.1 Network Architecture
3.2 Depth Prediction
3.3 Flow Prediction
3.4 Pose Estimation
3.5 Loss Functions
3.6 Training Regime
4 Results
4.1 Depth Estimation
4.2 Pose Estimation
4.3 Ablation Experiments
5 Conclusion and Further Work
References
Multi-scale Adaptive Structure Network for Human Pose Estimation from Color Images
1 Introduction
2 Related Works
3 Method
3.1 Adaptive Heatmaps
3.2 Limb Region
3.3 Multi-scale Adaptive Structure Supervision
4 Experiments
4.1 Implementation Details
4.2 Results
4.3 Ablation Study
4.4 Discussion
5 Conclusions
References
Full Explicit Consistency Constraints in Uncalibrated Multiple Homography Estimation
1 Introduction
2 Path to Constraints
3 Problem
4 Algebraic Prerequisites
4.1 The Characteristic Polynomial
4.2 A Double Root of the Characteristic Polynomial of a Cubic Polynomial
5 Full Constraints
6 Maximum Likelihood Estimation
7 Experiments
8 Conclusion
References
A New Method for Computing the Principal Point of an Optical Sensor by Means of Sphere Images
1 Introduction
2 How the Perspective Image of a Sphere Is Related to the Camera Intrinsics
3 The Locus of the Principal Point Determined by Two Sphere Images
4 Determination of the Principal Point by Three Sphere Images
5 Synthetic Experiments
6 Real Experiments
7 Conclusions
References
NightOwls: A Pedestrians at Night Dataset
1 Introduction
2 Related Work
3 Dataset
4 Experiments
5 Conclusion
References
Multi-view Consensus CNN for 3D Facial Landmark Placement
1 Introduction
2 Methods
2.1 Multi-view Rendering
2.2 Geometric Derivatives
2.3 Network Architecture and Loss Function
2.4 Landmark Detection and Consensus Estimation
3 Data
4 Results and Discussion
5 Conclusion
References
Author Index