
Data-Driven Methods for Large-Scale Knowledge Graph Construction.pdf

Data Driven Approaches for Large-scale Knowledge Graph Construction
Yanghua Xiao
Fudan University
Knowledge Works at Fudan (kw.fudan.edu.cn)

Knowledge Graph
• A knowledge graph is a large-scale semantic network consisting of entities and concepts, together with the semantic relationships among them
  • Higher coverage of entities and concepts
  • Richer semantic relationships
  • Usually organized as RDF
  • Quality assurance by crowdsourcing
• Why knowledge graphs?
  • Understanding the semantics of text needs background knowledge
  • A robot brain needs a knowledge base to understand the world
• Examples: YAGO, WordNet, Freebase, Probase, NELL, Cyc, DBpedia, ...

Data Driven vs Hand Crafted
• Manually constructed knowledge graph
  • Examples: WordNet, Cyc
  • Size: small
  • Quality: almost perfect, at a huge human cost (each relation is checked by experts)
• Automatically constructed knowledge graph
  • Automatically extracted from a huge web corpus
  • Examples: Probase, WikiTaxonomy, etc.
  • Size: huge
  • Quality: good, but the accuracy cannot reach 100%
  • Because the source corpus is so large, many wrong facts are extracted

Pipeline of KG construction
• Extraction (cost issue: costly human effort)
  • End-to-end
  • Domain specific
• Correction (quality issue: wrong data)
  • Graph structure based correction
• Completion (quality issue: missing data)
  • Collaborative filtering based completion
  • Transitivity inference based completion

Reference for the completion stage: Jiaqing Liang, Yanghua Xiao, et al., Probase+: Inferring Missing Links in Conceptual Taxonomies, to be published in TKDE 2017

Probase
• A web-scale taxonomy derived from web pages by Hearst linguistic patterns
  • "... famous basketball players such as Michael Jordan ..."
  • "... domestic animals such as cats and dogs ..."
  • "China is a developing country."
  • "Life is a box of chocolate."
• 10M concepts and 16M isA relations
Hearst patterns
• NP such as NP, NP, ..., and|or NP
• such NP as NP,* or|and NP
• NP, NP*, or other NP
• NP, NP*, and other NP
• NP, including NP,* or|and NP
• NP, especially NP,* or|and NP
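
The first Hearst pattern above ("NP such as NP, NP, ..., and|or NP") can be illustrated with a minimal Python sketch. This is not the Probase extraction code. It assumes the input fragment has already been trimmed so that it starts at the hypernym noun phrase and ends at the last hyponym, whereas a real extractor would need a noun-phrase chunker. The names `extract_such_as`, `SUCH_AS` and `LIST_SEPARATOR` are hypothetical.

```python
import re

# Assumption: the fragment is "<hypernym NP> such as <hyponym list>".
SUCH_AS = re.compile(r"^(?P<hyper>.+?)\s+such as\s+(?P<hypos>.+)$")

# Hyponyms are separated by ", and", ", or", "," or a bare "and" / "or".
LIST_SEPARATOR = re.compile(r"\s*,\s*(?:and|or)\s+|\s*,\s*|\s+(?:and|or)\s+")


def extract_such_as(fragment):
    """Return (hyponym, hypernym) isA pairs found by the "such as" pattern."""
    match = SUCH_AS.match(fragment.strip().rstrip("."))
    if match is None:
        return []  # the fragment does not use the "such as" pattern
    hypernym = match.group("hyper")
    hyponyms = [h.strip() for h in LIST_SEPARATOR.split(match.group("hypos"))]
    return [(h, hypernym) for h in hyponyms if h]


if __name__ == "__main__":
    for text in (
        "famous basketball players such as Michael Jordan",
        "domestic animals such as cats and dogs",
    ):
        print(extract_such_as(text))
```

Each returned pair reads "hyponym isA hypernym". Sentences of the form "X is a Y" are not covered by this pattern and yield an empty list here.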

Missing isA relationships in Probase
• "car" and "automobile" are synonyms
  • They should share hypernyms
  • "automobile" should be a "wheelbase vehicle"
• Missing isA relations hurt the understanding of the concepts of entities
  • Is Lincoln Zephyr a car?

Solution idea: CF based missing isA inference
• User-based collaborative filtering!
  • Hypernyms --- Items
  • Concepts --- Users
  • Synonyms or siblings --- Similar users
• Concepts with similar meanings tend to share hypernyms/hyponyms in an isA taxonomy
• To find missing hypernyms for a concept c
  • First find c's synonyms and siblings
  • Then transfer their hypernyms to c
Idea: if most of the terms similar to c have h as a hypernym, c is likely to have the hypernym h.
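
The idea above can be sketched as a small user-based collaborative filter in Python. This is an illustration under stated assumptions, not the Probase+ algorithm: similarity between concepts is taken to be the Jaccard overlap of their hypernym sets, and the neighbourhood size `k`, the `threshold`, the function names and the toy taxonomy are all hypothetical choices.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def infer_missing_hypernyms(taxonomy, concept, k=2, threshold=0.5):
    """Score hypernyms that similar concepts have but `concept` lacks.

    taxonomy maps each concept (the "user") to its set of known
    hypernyms (the "items"). The score of a candidate hypernym is the
    similarity-weighted share of the k nearest concepts that have it.
    """
    own = taxonomy[concept]
    ranked = sorted(
        ((jaccard(own, hypers), other)
         for other, hypers in taxonomy.items() if other != concept),
        reverse=True,
    )
    neighbours = [(sim, other) for sim, other in ranked[:k] if sim > 0]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return {}  # no similar concept, so nothing can be transferred
    scores = {}
    for sim, other in neighbours:
        for hypernym in taxonomy[other] - own:
            scores[hypernym] = scores.get(hypernym, 0.0) + sim / total
    return {h: s for h, s in scores.items() if s >= threshold}


# Toy taxonomy (made-up data for illustration only).
toy_taxonomy = {
    "automobile": {"vehicle", "product"},
    "car": {"vehicle", "product", "wheelbase vehicle"},
    "truck": {"vehicle", "wheelbase vehicle"},
    "apple": {"fruit", "food"},
}

if __name__ == "__main__":
    print(infer_missing_hypernyms(toy_taxonomy, "automobile"))
```

In the toy data, "car" and "truck" are the two concepts most similar to "automobile", and both have "wheelbase vehicle" as a hypernym, so it is proposed as a missing hypernym of "automobile". Raising `threshold` makes the filter more conservative.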