Foreword
Preface
Contents
Contributors
1 Enterprise Knowledge Graph: An Introduction
1.1 A Brief History of Knowledge Graph
1.1.1 The Arrival of Semantic Networks
1.1.2 From Semantic Networks to Linked Data
1.1.3 Knowledge Graphs: An Entity-Centric View of Linked Data
1.2 Knowledge Graph Technologies in a Nutshell
1.3 Applications of Knowledge Graphs for Enterprise
1.4 How to Read This Book
1.4.1 Structure of This Book
1.4.2 Who This Book Is For
1.4.3 How to Use This Book
Part I Knowledge Graph Foundations & Architecture
2 Knowledge Graph Foundations
2.1 Knowledge Representation and Query Languages
2.1.1 RDF and RDFS
2.1.2 OWL
2.1.3 SPARQL
2.2 Ontologies and Vocabularies
2.2.1 Some Standard Vocabularies
2.2.2 schema.org
2.3 Data Lifting Standards
2.3.1 RDB2RDF
2.3.2 GRDDL
2.4 Knowledge Graph Versus Linked Data
2.5 Knowledge Graph for Web Searching and Knowledge Graph for Enterprise
3 Knowledge Architecture for Organisations
3.1 Architecture Overview
3.2 Acquisition and Integration Layer
3.2.1 Ontology Development
3.2.2 Ontologisation of Non-Ontological Resources
3.2.3 Text Integration via Named Entity and Thematic Scope Resolution
3.2.4 Ontology Learning
3.3 Knowledge Storing and Accessing Layer
3.3.1 Ontology-Based Data Access
3.3.2 RDF Stores
3.3.3 Property Graph-Based Stores
3.3.4 Conclusion: Storing Knowledge Graphs Versus Relational Databases
3.4 Knowledge Consumption Layer
3.4.1 Semantic Search
3.4.2 Summarisation
3.4.3 Query Generation
3.4.4 Question Answering
3.4.5 Conclusion
Part II Constructing, Understanding and Consuming Knowledge Graphs
4 Construction of Enterprise Knowledge Graphs (I)
4.1 Knowledge Construction and Maintenance Lifecycle
4.2 Ontology Authoring: A Competency Question-Driven Approach
4.2.1 Competency Questions
4.2.2 Formulation of Competency Questions
4.2.3 Ontology Authoring Workflow
4.3 Semi-automated Linking of Enterprise Data for Virtual Knowledge Graphs
4.3.1 Virtual Knowledge Graph for Knowledge Discovery
4.3.2 Semantic Tagging and Data Interlinking
4.3.3 Usage Scenarios
4.3.4 Conclusion
5 Construction of Enterprise Knowledge Graphs (II)*
5.1 Scenario-Driven Named Entity and Thematic Scope Resolution of Unstructured Data*
5.1.1 Framework Description
5.1.2 Framework Application Evaluation
5.2 Open-World Schema Learning for Knowledge Graphs*
5.2.1 Motivation
5.2.2 BelNet+
5.2.3 TBox Learning as Inference
5.2.4 A Novel Evaluation Framework
5.2.5 Experiments
5.2.6 Experimental Results
5.2.7 Related Work and Summary
5.2.8 Conclusion
6 Understanding Knowledge Graphs
6.1 Understanding Things in KGs: The Summary of Individual Entities
6.1.1 Entity Data Organisation
6.1.2 Summarisation of Entity Data
6.1.3 Conclusion
6.2 Exploring KGs: The Summary of Entity Description Patterns
6.2.1 What Is the Entity Description Pattern?
6.2.2 How Can the Entity Description Pattern Help in Knowledge Exploitations?
6.2.3 Conclusion
6.3 Profiling KGs: A Goal-Driven Summarisation
6.3.1 Motivating Scenario and Problem Definition
6.3.2 Framework Description
6.3.3 Implementation
6.3.4 Application Example
6.3.5 Conclusion
6.4 Revealing Insights from KGs: A Query Generation Approach*
6.4.1 Candidate Insightful Queries
6.4.2 Query Generation Framework
6.4.3 Evaluation of the Query Generation Method
6.4.4 Conclusion and Future Work
7 Question Answering and Knowledge Graphs
7.1 Question Answering over Text Documents
7.1.1 Realising a QA System: Approaches and Key Steps
7.2 Question Answering over Knowledge Graphs
7.2.1 State-of-the-Art Approaches for Question Answering Over Knowledge Graphs
7.2.2 Question Answering in the Enterprise
7.3 Knowledge Graph and Watson DeepQA
7.3.1 What Is Watson DeepQA?
7.3.2 What Are the Knowledge Graphs Used in Watson DeepQA?
7.3.3 How Knowledge Graphs Are Used in Watson DeepQA?
7.3.4 Lessons Learnt from Watson DeepQA
7.4 Using Knowledge Graphs for Improving Textual Question Answering*
7.4.1 A Flexible QA Pipeline
7.4.2 Exploiting External Knowledge (Graphs) for Re-ranking
7.4.3 Evaluation: Impact of Knowledge Graphs in Semantic Structures
7.4.4 Conclusion
Part III Industrial Applications and Successful Stories
8 Success Stories
8.1 A Knowledge Graph for Innovation in the Media Industry
8.1.1 The Business Problem
8.1.2 The HAVAS 18 Knowledge Graph
8.1.3 Value Proposition
8.1.4 Challenges
8.2 Applying Knowledge Graphs in Cultural Heritage
8.2.1 Digital Cultural Heritage and Linked Data
8.2.2 The Challenges
8.2.3 The CURIOS Project
8.2.4 Constructing the Knowledge Graph
8.2.5 CURIOS: A Linked Data Adaptor for Content Management Systems
8.2.6 Presenting and Visualising Cultural Heritage Knowledge Graphs
8.2.7 Collaborative Construction and Maintenance of Cultural Heritage Knowledge Graphs
8.3 Applying Knowledge Graphs in Healthcare
8.3.1 The Problem in Clinical Practice Guidelines
8.3.2 Preparing the Data and Building the Knowledge Graphs
8.3.3 Services Based on the Knowledge Graphs
8.3.4 Contributions to Healthcare Practice
9 Enterprise Knowledge Graph: Looking into the Future
9.1 Conclusion
9.2 Get Started with Knowledge Graphs
9.2.1 A Small but Powerful Knowledge Graph
9.2.2 Troubleshooting
9.2.3 Variations
9.3 What Is Next: Experts' Predictions into the Future of Knowledge Graph
9.3.1 Future Visions
9.3.2 Foreseeable Obstacles
9.3.3 Suggestions on Next Steps
Appendix References
Index
Jeff Z. Pan · Guido Vetere · Jose Manuel Gomez-Perez · Honghan Wu, Editors

Exploiting Linked Data and Knowledge Graphs in Large Organizations
Editors
Jeff Z. Pan, University of Aberdeen, Aberdeen, UK
Guido Vetere, IBM Italia, Rome, Italy
Jose Manuel Gomez-Perez, iSOCO Lab, Madrid, Spain
Honghan Wu, University of Aberdeen, Aberdeen, UK

ISBN 978-3-319-45652-2
ISBN 978-3-319-45654-6 (eBook)
DOI 10.1007/978-3-319-45654-6
Library of Congress Control Number: 2016949103

© Springer International Publishing Switzerland 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword

When I began my research career as a graduate student at Rensselaer Polytechnic Institute in 1989, the phrase “knowledge graph” was not in use. The use of graphs, however, as a notation for “knowledge representation” (KR) was quite common. CLASSIC, the first real implemented description logic, was just being introduced from Bell Labs, and although it had a linear syntax, the community was still in the habit of drawing graphs that depicted the knowledge being represented. This habit traced its history at least as far back as M. Ross Quillian’s work on semantic networks, and subsequent researchers imagined knowledge to be intrinsic to the design of Artificial Intelligence (AI) systems, universally sketching the role of knowledge in graphical form. By the late 1980s the community had more or less taken up the call for formalisation proposed by Bill Woods and later his student, Ron Brachman; graph formalisms were perhaps the central focus of AI at the time, and stayed that way for another decade.

Despite this attention and focus, by the time I moved from academia to industrial research at IBM’s Watson Research Centre in 2002, the knowledge representation community had never really solved any problems other than our own. Knowledge representation and reasoning evolved, or perhaps devolved, into a form of mathematics, in which researchers posed difficult-to-solve puzzles that arose more from the syntactic properties of various formalisms than from consideration of anyone else’s actual use cases. Even though we tended to use the words “semantic” and “knowledge”, there was nothing particularly semantic about any of it, and indeed the co-opting by the KR community of terms like semantics, ontology, epistemology, etc. to refer to our largely algorithmic work reliably confused the hell out of people who actually knew what those terms meant.

In my 12-year career at IBM, I found myself shifting with the times as a revolution was happening in AI.
Many researchers roundly rejected the assumptions of the KR field, finding the focus on computation rather than data to be problematic. A new generation of data scientists who wanted to instrument and measure everything began to take over. I spent a lot of my time at IBM trying to convince others that KR technology was useful, and even helping them use it. It was a losing battle, and like the field in general I began to become enamoured of the influential power of empirical evidence—it made me feel like a scientist. Still, however, my allegiance to the KR vision, that knowledge was intrinsic to the design of AI systems, could not be completely dispelled.

In 2007, a group of 12 researchers at IBM began working on a top-secret moonshot project which we code-named “BlueJ”—building a natural language question answering system capable of the speed and accuracy necessary to achieve expert human-level performance on the TV quiz show Jeopardy! It was the most compelling and interesting project I have ever worked on, and it gave me an opportunity to prove that knowledge—human-created and curated knowledge—is a valuable tool.

At the start of the project, Dave Ferrucci, the team leader, challenged us all to “make bets” on what we thought would work and to commit to being measured on how well our bets impacted the ability to find the right answer, as well as to understand whether the answer is correct. I bet on KR, and for the first year, working alone on this particular bet, I failed, much as the KR community had failed more broadly to have any impact on any real problems other people had. But in the following year, Ferrucci agreed to put a few more people on it (partly because of my persuasive arguments, but mostly because he believed in the KR vision, too), and with the diversity of ideas and perspectives that naturally comes from having more people, we started to show impact. After our widely publicised and viewed victory over the two greatest Jeopardy! players in history, my team published the results of our experiments, which demonstrated that more than 10% of Watson’s winning performance (again, in terms of both finding answers and determining whether they were correct) came from represented knowledge.
Knowledge is not the destination

In order to make this contribution to IBM’s Watson, my team and I had to abandon our traditional notion of KR and adopt a new one, which I later came to call “Knowledge is not the destination”. The abject failure of KR to have any measurable impact on anything up to that point in time was due, I claim, to a subtle shift in that research community, sometime in the 1980s, from knowledge representation and reasoning as an integral part of some larger system, to KR&R as the ultimate engine of AI. This is where we were when I came into the field, and this was tacit in how I approached AI when I was working in digital libraries, Web systems, and my early efforts at IBM in natural language question answering.

The most ambitious KR&R activity before that time was Cyc, which prided itself on being able to conclude, “If you leave a snowman outside in the sun it will melt”. But Cyc could never possibly answer any of the myriad possible questions that might get asked about snowmen melting, because it would need a person to find the relevant Cyc micro-theory, look up the actual names and labels used in the axioms, type them in the correct and rather peculiar syntax, debug the reasoner and find the right set of heuristics that would make it give an answer; and even with all that, it still probably could not answer a question like, “If your snowman starts to do this, turn on the air conditioner”. Watson might actually have had a shot at answering something like this, but only because it knew from large language corpora that ‘snowman starts to melt’ is a common n-gram, not because it understands thermodynamics.

Working with people from Cycorp, or with anyone in the KR&R world, we became so enamoured of our elegant logic that, without a doubt, the knowledge became our focus. We—and I can say this with total confidence—we absolutely believed that getting the right answer was a trivial matter as long as you had the knowledge and reasoning right. The knowledge was the point.

“Knowledge is not the destination” refers to the epiphany I had while working on Watson. The knowledge was important, but it wasn’t the point—the point was to get answers right and to have confidence in them. If knowledge could not help with this, then it really was useless. But what kind of knowledge would help? Axioms about all the most general possible things in the world? Naïve physics? Expert physics? Deep Aristotelian theories? No. What mattered for Watson was having millions of simple “propositional” facts available at very high speed: recognising entities by their names, knowing some basic type information, knowing about very simple geospatial relationships like capitals and borders, where famous people were born and when, and much, much more.

Knowing all this was useful not because we looked up answers this way—Jeopardy! never asked about a person’s age—but because these little facts could be stitched together with many other pieces of evidence from other sources to understand how confident we were in each answer. This knowledge, a giant collection of subject-property-object triples, can be viewed as a graph. A very simple one, especially by KR&R standards, but this knowledge graph was not itself the goal of the project. The goal—the destination—of the project was winning Jeopardy! So, in fact, we made absolutely no effort to improve the knowledge we used from DBpedia and Freebase.
We needed to understand how well it worked for our problem in the general case, because there was no way to know what actual questions would be asked in the ultimate test in front of 50 million people.

Knowledge Graphs are Everywhere!

As of the publication of this book, most major IT companies—more accurately, most major information companies—including Bloomberg, the NY Times, Microsoft, Facebook, Twitter and many more, have significant knowledge graphs, as Watson did, and have invested in their curation. Not because any of these graphs is their business, but because using this knowledge helps them in their business. After Watson I moved to Google Research, where Freebase lives on in our own humongous knowledge graph. And while Google invests a lot in its curation and maintenance, Google’s purpose is not to build the greatest and most comprehensive knowledge graph on Earth, but to make search, email, YouTube, personal assistants and all the rest of our Web-scale services better. That’s our destination.

Many believe that the success of this kind of simplistic, propositional knowledge graph proves that the original KR&R vision was a misguided mistake, and an outspoken few have gone so far as to claim it was a 40+ year waste of some great minds. As much as I appreciate being described as a great mind, I prefer a different explanation: the work in KR for the past 40 years was not a waste of time; it was just the wrong place to start. It was solving a problem no one yet had, because no one had yet built systems that used this much explicit and declared knowledge. Now, knowledge graphs are everywhere. Now industry is investing in the knowledge that drives their core systems.

The editors of this volume, Jeff Pan, Guido Vetere, José Manuel Gómez Pérez and Honghan Wu, all themselves experts in this old yet burgeoning area of research, have gone to great lengths to put together research that matters today, in this world of large-scale graphs representing knowledge that makes a difference in the systems we use on the Web, on our phones, at work and at home. The editorial team members have unique backgrounds, yet have worked together before, such as in the EU Marie Curie K-Drive project, and this book is a natural extension of their recent work on studying the properties of knowledge graphs. Jeff started at Manchester, where he published widely on formal reasoning systems, and moved to Aberdeen, where his portfolio broadened considerably to include machine learning and large-scale data analysis, although he never strayed too far from practical reasoning, such as approximate reasoning and querying for knowledge graphs. Guido has run several successful schema management projects on large data systems at IBM, and was part of the team that worked to bring Watson to Italy. Jose has done important research in the areas of distributed systems, semantic data management and NLP, making knowledge easier to understand, access and consume by real users, and Honghan has been doing research in the area of medical knowledge systems.

After you finish this book, try to find a faded red copy of Readings in Knowledge Representation, lest we forget and reinvent the Semantic Network.

May 2016
Dr. Christopher Welty
Google Research NYC