Huazhong University of Science & Technology
Deep Neural Networks for Scene Text Reading
Xiang Bai
Huazhong University of Science and Technology
Problem definitions
Definitions
• Scene text detection: predicting the presence of text and localizing each instance (if any), usually at word or line level, in natural scenes
• Scene text recognition: converting text regions into computer-readable and editable symbols
• End-to-end recognition: scene text detection and scene text recognition combined
[Figure: example scene image containing the text "Summary Booklet"]
Xiang Bai, Kyoto, November 15
Outline
• Background
• Scene Text Detection
• Scene Text Recognition
• Applications
• Future Trends
Background
Document image vs. scene text image
• Scattered and sparse
• Multi-oriented
• Multi-lingual
Background
Scene text detection methods before 2016
• Proposals: generate candidates using hand-crafted features
• Filtering: text / non-text classification using a CNN or random forest
• Regression: refine locations using a CNN
[1] Jaderberg et al. Deep features for text spotting. ECCV, 2014.
[2] Jaderberg et al. Reading text in the wild with convolutional neural networks. IJCV, 2016.
[3] Huang et al. Robust scene text detection with convolution neural network induced MSER trees. ECCV, 2014.
[4] Zhang et al. Symmetry-based text line detection in natural scenes. CVPR, 2015.
[5] Gómez and Karatzas. TextProposals: a text-specific selective search algorithm for word spotting in the wild. Pattern Recognition, 70:60-74, 2017.
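The three stages on this slide (proposals, filtering, regression) chain into a simple pipeline. The following is a minimal sketch of that control flow only. The names `detect_text`, `propose`, `score`, and `refine`, the box format, and the 0.5 threshold are hypothetical and are not taken from the cited papers. The toy stand-ins replace the real hand-crafted proposal generator and the CNN / random forest models.

```python
from typing import Callable, List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def detect_text(
    image,
    propose: Callable[[object], List[Box]],
    score: Callable[[object, Box], float],
    refine: Callable[[object, Box], Box],
    threshold: float = 0.5,  # assumed text / non-text cut-off
) -> List[Box]:
    """Proposals -> filtering -> regression, as three pluggable stages."""
    candidates = propose(image)                                   # stage 1
    kept = [b for b in candidates if score(image, b) >= threshold]  # stage 2
    return [refine(image, b) for b in kept]                       # stage 3


# Toy stand-ins so the sketch runs; real systems plug in learned models.
def toy_propose(image):
    return [(0, 0, 10, 5), (20, 20, 60, 30)]


def toy_score(image, box):
    x0, _, x1, _ = box
    return 0.9 if (x1 - x0) >= 20 else 0.1  # pretend wide boxes are text


def toy_refine(image, box):
    x0, y0, x1, y1 = box
    return (x0 - 1, y0 - 1, x1 + 1, y1 + 1)  # pretend regression grows the box


boxes = detect_text(None, toy_propose, toy_score, toy_refine)
print(boxes)  # [(19, 19, 61, 31)]
```

Because each stage is a separate callable, errors in the proposal stage cannot be recovered later: a region that is never proposed is never scored or refined.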
Background
Scene text detection methods after 2016
Segmentation-based method[1]
Proposal-based method[2]
Hybrid method[3]
[1] Zhang Z, et al. Multi-oriented text detection with fully convolutional networks. CVPR, 2016.
[2] Gupta A, et al. Synthetic data for text localisation in natural images. CVPR, 2016.
[3] He W, et al. Deep Direct Regression for Multi-Oriented Scene Text Detection. ICCV, 2017.
[4] Liao et al. TextBoxes: A fast text detector with a single deep neural network. AAAI, 2017.
Background
Scene text recognition methods
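[Figure: a word classifier with one output class per word (apple, ball, coffee, ..., yellow, zoo; #words classes in total), a char classifier with one output class per character (a, ..., f, ..., y, z), and a sequence model (sequence feature extractor followed by RNN + CTC)]
Word/Char Level [1]
• Multi-class classification with one class per word/char
Sequence Level [2][3][4]
• Text is a sequence of chars
• The whole sequence is recognized
• Example: the per-frame output "-a-nnd-" is decoded as "and"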
[1] M. Jaderberg et al. Reading text in the wild with convolutional neural networks. IJCV, 2016.
[2] B. Su et al. Accurate scene text recognition based on recurrent neural network. ACCV, 2014.
[3] He et al. Reading Scene Text in Deep Convolutional Sequences. AAAI, 2016.
[4] Shi B et al. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. TPAMI, 2017.
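The "-a-nnd-" to "and" example on this slide is consistent with greedy CTC decoding: merge consecutive repeated labels, then drop the blank symbol. The sketch below assumes "-" is the blank and that the per-frame labels have already been picked (for example by taking the most likely label at each frame). The function name `ctc_collapse` is illustrative, not from the cited papers.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol, matching the "-" in the slide's example


def ctc_collapse(frames, blank=BLANK):
    """Collapse per-frame labels: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frames)]
    return "".join(label for label in merged if label != blank)


print(ctc_collapse("-a-nnd-"))  # and
print(ctc_collapse("hel-lo"))   # hello
print(ctc_collapse("hello"))    # helo
```

The last two calls show why the blank matters: a doubled letter survives only when a blank separates its two occurrences.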
Background
Recent Trend
Statistics of related papers published in 2017 top conferences
Conference   Detection   Recognition   End-to-end recognition
AAAI-17      0           0             2
IJCAI-17     0           1             0
NIPS-17      0           1             0
ICCV-17      5           1             2
CVPR-17      3           0             0
ICDAR-17     8           2             1
TOTAL        16          5             5
• Over 80% of text detection papers focus on multi-oriented text detection.
• Scene text recognition and end-to-end recognition receive less attention.
• Most papers focus on English text.
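The totals in the table can be checked with a short tally. This sketch assumes the three runs of numbers on the slide are the Detection, Recognition, and End-to-end recognition columns in that order. The variable names are illustrative.

```python
# Paper counts per conference: (detection, recognition, end-to-end).
# Column order is an assumption based on the header order on the slide.
counts = {
    "AAAI-17": (0, 0, 2),
    "IJCAI-17": (0, 1, 0),
    "NIPS-17": (0, 1, 0),
    "ICCV-17": (5, 1, 2),
    "CVPR-17": (3, 0, 0),
    "ICDAR-17": (8, 2, 1),
}

# Sum each column across conferences.
totals = tuple(sum(column) for column in zip(*counts.values()))
detection_share = totals[0] / sum(totals)

print(totals)                     # (16, 5, 5)
print(f"{detection_share:.0%}")   # 62%
```

Under this reading, detection accounts for 16 of the 26 listed papers, which matches the point that recognition and end-to-end recognition receive less attention.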