NEURAL READING COMPREHENSION AND BEYOND
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Danqi Chen
December 2018
© 2018 by Danqi Chen. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License: http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/gd576xb1833
I certify that I have read this dissertation and that, in my opinion, it is fully adequate
in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Christopher Manning, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate
in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Dan Jurafsky
I certify that I have read this dissertation and that, in my opinion, it is fully adequate
in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Percy Liang
I certify that I have read this dissertation and that, in my opinion, it is fully adequate
in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Luke Zettlemoyer
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost for Graduate Education
This signature page was generated electronically upon submission of this dissertation in
electronic format. An original signed hard copy of the signature page is on file in
University Archives.
Abstract
Teaching machines to understand human language documents is one of the most elusive
and long-standing challenges in Artificial Intelligence. This thesis tackles the problem
of reading comprehension: how to build computer systems to read a passage of text and
answer comprehension questions. On the one hand, we think that reading comprehension is
an important task for evaluating how well computer systems understand human language.
On the other hand, if we can build high-performing reading comprehension systems, they
would be a crucial technology for applications such as question answering and dialogue
systems.
In this thesis, we focus on neural reading comprehension: a class of reading comprehension models built on top of deep neural networks. Compared to traditional sparse, hand-designed feature-based models, these end-to-end neural models have proven more effective at learning rich linguistic phenomena and have improved performance on all the modern reading comprehension benchmarks by a large margin.
This thesis consists of two parts. In the first part, we aim to cover the essence of neural reading comprehension and present our efforts in building effective neural reading comprehension models and, more importantly, in understanding what neural reading comprehension models have actually learned and what depth of language understanding is needed to solve current tasks. We also summarize recent advances and discuss future directions and open questions in this field.
In the second part of this thesis, we investigate how we can build practical applications
based on the recent success of neural reading comprehension. In particular, we pioneered
two new research directions: 1) how we can combine information retrieval techniques with
neural reading comprehension to tackle large-scale open-domain question answering; and
2) how we can build conversational question answering systems from current single-turn,
span-based reading comprehension models. We implemented these ideas in the DrQA and CoQA projects and demonstrated the effectiveness of these approaches. We believe that
they hold great promise for future language technologies.
Acknowledgments
The past six years at Stanford have been an unforgettable and invaluable experience for me. When I first started my PhD in 2012, I could barely speak fluent English (I was required to take five English courses at Stanford), knew little about this country, and had never heard of the term “natural language processing”. It is unbelievable that over the following years I have actually been doing research about language and training computer systems to understand human languages (English in most cases), as well as training myself to speak and write in English. At the same time, 2012 was the year that deep neural networks (also called deep learning) started to take off and come to dominate almost all the AI applications we are seeing today. I have witnessed from the beginning of this journey how fast Artificial Intelligence has been developing, and I feel quite excited — and occasionally panicked — to be a part of this trend. I would not have been able to make this journey without the help and support of many, many people, and I feel deeply indebted to them.
First and foremost, my greatest thanks go to my advisor Christopher Manning. I really didn’t know Chris when I first came to Stanford; only after I had worked with him and learned about NLP for a couple of years did I realize how privileged I was to work with one of the most brilliant minds in our field. He always has a very insightful, high-level view of the field, while also being uncommonly detail-oriented and understanding the nature of the problems very well. More importantly, Chris is an extremely kind, caring, and supportive advisor; I could not have asked for more. He is like an older friend of mine (if he doesn’t mind me saying so) and I can talk with him about everything. He always believes in me, even when I am not that confident about myself. I am forever grateful to him and I have already started to miss him.
I would like to thank Dan Jurafsky and Percy Liang — the other two giants of the
Stanford NLP group — for being on my thesis committee and for a lot of guidance and help throughout my PhD studies. Dan is an extremely charming, enthusiastic, and knowledgeable person, and I always feel my passion getting ignited after talking to him. Percy is a superman and a role model for all the NLP PhD students (myself included). I never understood how one person can accomplish so many things at the same time, and a big part of this dissertation is built on top of his research. I want to thank Chris, Dan, and Percy for setting up the Stanford NLP Group, my home at Stanford, and I will always be proud to be a part of this family.
It is also my great honor to have Luke Zettlemoyer on my thesis committee. The work
presented in this dissertation is very relevant to his research and I learned a lot from his
papers. I look forward to working with him in the near future. I would also like to thank Yinyu Ye for his time chairing my thesis defense.
During my PhD, I did two wonderful internships, at Microsoft Research and Facebook AI Research. I thank my mentors at these places: Kristina Toutanova, Antoine Bordes, and Jason Weston. My internship project at Facebook eventually led to the DrQA project and a part of this dissertation. I would also like to thank Microsoft and Facebook for providing me with fellowships.
Collaboration is a big lesson that I learned, and also a fun part of graduate school. I thank my fellow collaborators: Gabor Angeli, Jason Bolton, Arun Chaganty, Adam Fisch, Jon Gauthier, Shayne Longpre, Jesse Mu, Siva Reddy, Richard Socher, Yuhao Zhang, Victor Zhong, and others. In particular, with Richard I finished my first paper in graduate school; he had a very clear sense of how to define an impactful research project, while I had little experience at the time. With Adam and Siva I finished the DrQA and CoQA projects, respectively; not only am I proud of these two projects, but I also greatly enjoyed the collaborations, and we have since become good friends. With the KBP team, especially Yuhao, Gabor, and Arun, I enjoyed the teamwork during those two summers. Jon, Victor, Shayne, and Jesse are the younger students I got to work with, although I wish I could have done a better job. I also want to thank the two teaching teams (of 7 and 25 people, respectively) for the NLP class that I worked on; that was a very unique and rewarding experience for me.
I thank the whole Stanford NLP Group, especially Sida Wang, Will Monroe, Angel
Chang, Gabor Angeli, Siva Reddy, Arun Chaganty, Yuhao Zhang, Peng Qi, Jacob Steinhardt, Jiwei Li, He He, Robin Jia, and Ziang Xie, who gave me a lot of support at various times. I am not even sure whether there could be a better research group in the world than ours (I hope I can create a similar one in the future). The NLP retreat, the NLP BBQ, and those paper-swap nights are among my most vivid memories of graduate school.
Outside of the NLP group, I have been extremely lucky to be surrounded by many great friends. Just to name a few (and forgive me for not being able to list all of them): Yanting Zhao, my close friend of many years, who keeps pulling me out of my stressful PhD life and with whom I share a lot of joyous moments. Xueqing Liu, my classmate and roommate in college, who started her PhD at UIUC in the same year; she is the person I can keep talking to and exchanging feelings and thoughts with, especially on those bad days. Tao Lei, a brilliant NLP PhD and my algorithms “teacher” in high school; I keep learning from him and getting inspired by every discussion. Thanh-Vy Hua, my mentor and “elder sister”, who always makes sure that I am still on the right track in my life and who taught me many meta-skills for surviving this journey (even though we have only met three times in the real world). And everyone in the “cǎo yú” group: I am so happy to have spent many Friday evenings with you.
During the past year, I visited a great number of U.S. universities seeking an academic position. There are so many people I want to thank for their assistance along the way — I either received great help and advice from them, or felt extremely welcome during my visits — including Sanjeev Arora, Yoav Artzi, Regina Barzilay, Chris Callison-Burch, Kai-Wei Chang, Kyunghyun Cho, William Cohen, Michael Collins, Chris Dyer, Jacob Eisenstein, Julia Hirschberg, Julia Hockenmaier, Tengyu Ma, Andrew McCallum, Kathy McKeown, Rada Mihalcea, Tom Mitchell, Ray Mooney, Karthik Narasimhan, Graham Neubig, Christos Papadimitriou, Nanyun Peng, Drago Radev, Sasha Rush, Fei Sha, Yulia Tsvetkov, Luke Zettlemoyer, and many others. These people are a big part of the reason that I love our research community so much and want to follow in their footsteps and dedicate myself to an academic career. I hope to continue to contribute to our research community in the future.
A special thanks to Andrew Chi-Chih Yao for creating the Special Pilot CS Class where
I did my undergraduate studies. I am super proud of being a part of the “Yao class” family.