A YEAR IN COMPUTER VISION
DEFINITION OF COMPUTER VISION
The automatic extraction, analysis and understanding of useful
information from a single image or a sequence of images.
— BMVA (The British Machine Vision Association)
CLASSIFICATION / LOCALISATION
• Classification: Assign a label to the whole image.
• Localisation: Output a bounding box around the object in the image.
• Single object in the image.
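The slide does not say how a predicted bounding box is scored. A common measure is intersection over union (IoU) between the predicted and ground-truth boxes. The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max) and assumes a threshold of 0.5 for counting a localisation as correct. The function names and the threshold default are illustrative, not taken from the slides.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def localisation_correct(pred_label, pred_box, true_label, true_box, threshold=0.5):
    """Assumed rule: the label must match and the boxes must overlap enough."""
    return pred_label == true_label and iou(pred_box, true_box) >= threshold
```

For example, `localisation_correct("cat", (0, 0, 10, 10), "cat", (0, 0, 10, 8))` is true under this rule, because the two boxes have an IoU of 0.8.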
IMAGENET LARGE SCALE VISUAL RECOGNITION CHALLENGE (ILSVRC)
• In ILSVRC 2017, the classification error was 0.023 and the localisation error was 0.062.
INTERESTING TAKEAWAYS FROM ILSVRC 2016
• Scene Classification: Label an image with a scene category such as “greenhouse” or “stadium”.
• Hikvision won scene classification with 9% top-5 error, using an ensemble of deep Inception-style networks and not-so-deep residual networks.
• Trimps-Soushen used an ensemble for classification, including Inception, Inception-ResNet, ResNet and Wide Residual Networks. For localisation, Faster R-CNN was used.
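• ResNeXt extends the original ResNet architecture by adding parallel transformation paths within each block (the number of paths is called cardinality).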
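The entries above refer to ensembles and to top-5 error. A minimal sketch of both ideas follows, assuming the ensemble simply averages the per-class scores of its member models. The slides do not say how the winning teams combined their models, so the averaging rule and the function names here are illustrative only.

```python
def average_scores(model_scores):
    """Average per-class scores over several models for one image.

    model_scores: list of score lists, one list per model, all the same length.
    """
    n_models = len(model_scores)
    n_classes = len(model_scores[0])
    return [sum(m[c] for m in model_scores) / n_models for c in range(n_classes)]


def top_k_error(scores_per_image, true_labels, k=5):
    """Fraction of images whose true label is not among the k highest-scoring classes."""
    errors = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            errors += 1
    return errors / len(true_labels)
```

Under this definition, a top-5 error of 9% means that for 9 in 100 images the correct label was not among the five highest-ranked predictions.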
INTERESTING TAKEAWAYS FROM ILSVRC 2017
• WMW: Squeeze-and-Excitation (SE) Building Block
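As a rough illustration of the SE building block, the sketch below follows the published squeeze-and-excitation design as commonly described: global average pooling per channel (squeeze), two fully connected layers with ReLU and sigmoid (excitation), then per-channel rescaling. It uses plain Python lists and omits biases for brevity. The function name, argument layout and weight shapes are assumptions made for this example, not the WMW team's implementation.

```python
import math


def se_block(feature_maps, w1, w2):
    """Apply a simplified squeeze-and-excitation step.

    feature_maps: list of C channels, each an H x W list of lists.
    w1: hidden x C weight matrix (reduction layer), biases omitted.
    w2: C x hidden weight matrix (expansion layer), biases omitted.
    """
    # Squeeze: global average pooling gives one number per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps
    ]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [
        [[v * g for v in row] for row in ch] for ch, g in zip(feature_maps, gates)
    ]
```

The gates lie between 0 and 1, so the block can only re-weight existing channels; it does not change the shape of the feature maps.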
INTERESTING TAKEAWAYS FROM ILSVRC 2017
• NUS-Qihoo-DPNs: Dual Path Network, ResNet + DenseNet
• The residual path implicitly reuses features, but it is not good at exploring new features. In contrast, the densely connected network keeps exploring new features but suffers from higher redundancy.
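The contrast between the two paths can be sketched in a few lines. This is a heavily simplified illustration that works on flat feature vectors rather than convolutional feature maps: the residual path adds new features onto a fixed-width state, while the dense path appends them, so it keeps growing. The function name `dual_path_step` and the `block_fn` argument are hypothetical and are not taken from the NUS-Qihoo code.

```python
def dual_path_step(residual_state, dense_state, block_fn):
    """One simplified dual-path step on flat feature vectors (lists of floats).

    block_fn takes the concatenated state and returns a new vector. Its first
    len(residual_state) entries feed the residual path, the rest the dense path.
    """
    n_res = len(residual_state)
    out = block_fn(residual_state + dense_state)  # list concatenation
    res_part, dense_part = out[:n_res], out[n_res:]
    # Residual path: element-wise addition, so the width stays fixed (feature reuse).
    new_residual = [r + o for r, o in zip(residual_state, res_part)]
    # Dense path: concatenation, so the width grows with every step (new features).
    new_dense = dense_state + dense_part
    return new_residual, new_dense
```

Calling the step repeatedly shows the trade-off described above: the residual state never grows, whereas the dense state gets longer each time and carries every earlier feature along with it.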
OBJECT DETECTION
• Object Detection: Output a bounding box and a label for each individual object in an image.
• One of the persistent issues in object detection is the detection of small objects.