KBQA: Learning Question Answering over
QA Corpora and Knowledge Bases
Wanyun Cui@FUDAN, Yanghua Xiao*@FUDAN,
Haixun Wang@Facebook, Yangqiu Song@HKUST,
Seung-won Hwang@Yonsei, Wei Wang@FUDAN
kw.fudan.edu.cn/qa
Background
• Question Answering (QA) systems answer natural language
questions.
IBM Watson
Google Now
Apple Siri
Amazon Alexa
Microsoft Cortana
Why QA?
• QA application:
• One of the most natural forms of human-computer
interaction
• A key component of chatbots, which attract
wide research interest from industry
• QA for AI:
• One of the most important tasks for evaluating
machine intelligence: the Turing test
• Important testbed of many AI techniques,
such as machine learning, natural
language processing, machine cognition
Turing test
Why KBQA?
More and more knowledge bases are being created
• Google Knowledge Graph, YAGO, WordNet, Freebase, Probase, NELL, Cyc, DBpedia
• Large scale, clean data
The boom in knowledge bases
A fragment of a knowledge base, which consists of
triples such as (d, population, 390k)
How does KB-based QA work?
• Convert natural language questions into structured queries over
knowledge bases.
How many people live in Honolulu?
SPARQL
SELECT ?number
WHERE {
res:Honolulu dbo:population ?number .
}
SQL
SELECT value
FROM KB
WHERE subject = 'd' AND
predicate = 'population';
• Key: predicate inference
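The SQL form of the query can be tried end to end with a minimal sketch. It assumes a hypothetical in-memory table `KB(subject, predicate, value)` holding only the example triple (d, population, 390k), and it assumes that entity linking (Honolulu → d) and predicate inference (→ population) have already been done. The helper name `answer` is illustrative and not part of the original system.

```python
import sqlite3

# Hypothetical in-memory triple table; the schema mirrors the SQL on the slide,
# and the single row is the slide's example triple (d, population, 390k).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE KB (subject TEXT, predicate TEXT, value TEXT)")
conn.execute("INSERT INTO KB VALUES (?, ?, ?)", ("d", "population", "390k"))


def answer(subject, predicate):
    """Return every value stored for the (subject, predicate) pair."""
    rows = conn.execute(
        "SELECT value FROM KB WHERE subject = ? AND predicate = ?",
        (subject, predicate),
    ).fetchall()
    return [row[0] for row in rows]


# "How many people live in Honolulu?" with Honolulu linked to entity d
# and the predicate already inferred as population.
print(answer("d", "population"))
```

Once the predicate is known, the lookup itself is trivial, which is why predicate inference is the key step.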
Two challenges for predicate inference
• Question Representation
• Identify questions with the same semantics
• Distinguish questions with different intents
• Semantic matching
• Map the question representation to the predicate in the KB
• Vocabulary gap
Weakness of previous solutions
• Template/rule based approaches
• Questions are strings
• Represent questions by string-based templates, such as regular expressions
• Obtained by human labeling
• PROs:
• User-controllable
• Applicable to industry use
• CONs:
• Costly human effort
• Not good at handling the diversity of questions
• Neural network based approaches
• Questions are numeric
• Represent questions by numeric embeddings
• Obtained by learning from a corpus
• PROs:
• Able to understand diverse questions
• CONs:
• Poor interpretability
• Not controllable, which makes them unfriendly to industrial applications
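To make the diversity problem of string-based templates concrete, here is a small sketch with one hand-written regular expression for the population intent. The pattern and the function name `match_population` are hypothetical illustrations, not templates from any existing system.

```python
import re

# Hypothetical hand-written template: one regular expression for one intent.
POPULATION_TEMPLATE = re.compile(
    r"^how many people live in (?P<entity>.+?)\?$", re.IGNORECASE
)


def match_population(question):
    """Return the entity string if the question fits the template, else None."""
    m = POPULATION_TEMPLATE.match(question.strip())
    return m.group("entity") if m else None


# Fits the template.
print(match_population("How many people live in Honolulu?"))
# Same intent, different wording: the template does not fire.
print(match_population("What is the population of Honolulu?"))
```

Each new phrasing needs another hand-written pattern, which is where the human cost comes from.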
How can we retain the advantages of both approaches?
Our approach
• Representation: concept based templates.
• Questions ask about entities
• Interpretable
• User-controllable
• Learn templates from a QA corpus, instead of constructing them manually.
• 27 million templates, 2782 intents
• Understand diverse questions
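The idea of a concept-based template can be sketched roughly as follows: replace the entity mention in a question with its concept, then look up the predicate associated with the resulting template. Everything in this sketch is a toy assumption: the `$city` concept label, the two dictionaries, and the function names are hypothetical, and the template-to-predicate mapping is hand-filled here, whereas the approach above learns it from a QA corpus.

```python
# Hypothetical toy data: entity -> concept, and template -> predicate.
# In the approach above the template-to-predicate mapping is learned from a
# QA corpus; here it is filled in by hand purely for illustration.
ENTITY_CONCEPT = {"Honolulu": "$city", "Shanghai": "$city"}
TEMPLATE_PREDICATE = {
    "how many people live in $city?": "population",
    "what is the population of $city?": "population",
}


def to_template(question):
    """Replace the first known entity mention with its concept label."""
    lowered = question.lower()
    for entity, concept in ENTITY_CONCEPT.items():
        start = lowered.find(entity.lower())
        if start != -1:
            end = start + len(entity)
            return entity, lowered[:start] + concept + lowered[end:]
    return None, lowered


def infer_predicate(question):
    """Return (entity, predicate); predicate is None for unknown templates."""
    entity, template = to_template(question)
    return entity, TEMPLATE_PREDICATE.get(template)


print(infer_predicate("How many people live in Honolulu?"))
print(infer_predicate("What is the population of Shanghai?"))
```

Because the template is a readable string with a concept slot, it stays interpretable and can be inspected or edited, while different phrasings of the same intent are covered by different learned templates.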