Mining the Social Web
THIRD EDITION
Matthew A. Russell and Mikhail Klassen
Mining the Social Web
by Matthew A. Russell and Mikhail Klassen
Copyright © 2019 Matthew Russell, Mikhail Klassen. All rights reserved.
Printed in Canada.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online
editions are also available for most titles (http://oreilly.com/safari). For more information,
contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Acquisitions Editor: Mary Treseler
Development Editor: Alicia Young
Production Editor: Nan Barber
Copyeditor: Rachel Head
Proofreader: Kim Cofer
Indexer: WordCo Indexing Services, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
December 2018: Third Edition
Revision History for the Third Edition
2018-11-29: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781491985045 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Mining the Social Web, the
cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher’s
views. While the publisher and the authors have used good faith efforts to ensure that the
information and instructions contained in this work are accurate, the publisher and the authors
disclaim all responsibility for errors or omissions, including without limitation responsibility for
damages resulting from the use of or reliance on this work. Use of the information and
instructions contained in this work is at your own risk. If any code samples or other technology
this work contains or describes is subject to open source licenses or the intellectual property
rights of others, it is your responsibility to ensure that your use thereof complies with such
licenses and/or rights.
9781491985045
[MBP]
Preface
The Web is more a social creation than a technical one.
I designed it for a social effect—to help people work together—and not as a technical toy. The
ultimate goal of the Web is to support and improve our weblike existence in the world. We
clump into families, associations, and companies. We develop trust across the miles and
distrust around the corner.
—Tim Berners-Lee, Weaving the Web (Harper)
A Note from Matthew Russell
It’s been more than five years since I put the finishing touches on the manuscript for
Mining the Social Web, 2nd Edition, and a lot has changed since then. I have lived and learned a
lot of new things, technology has continued to evolve at a blistering pace, and the social web
itself has matured to such an extent that governments are now formulating legal policy around
how data can be collected, shared, and used.
Knowing that my own schedule could not possibly allow for the immense commitment needed
to produce a new edition to freshen up and expand on the content, but believing wholeheartedly
that there has never been a better moment for the message this book delivers, I knew that it was
time to find a coauthor to help deliver it to the next wave of entrepreneurs, technologists, and
hackers who are curious about mining the social web. It took well over a year for me to find a
coauthor who shared the same passion for the subject and possessed the skill and determination
that’s required to write a book.
I can’t even begin to tell you how grateful I am for Mikhail Klassen and his incredible
contributions in keeping this labor of love alive for many more years to come. In the pages
ahead, you’ll see that he’s done a tremendous job of modernizing the code, improving the
accessibility of its runtime environment, and expanding the content with a substantial new
chapter—all in addition to editing and freshening up the overall manuscript itself and
enthusiastically carrying the mantle forward for the next wave of entrepreneurs, technologists,
and hackers who are curious about mining the social web.
README.1st
This book has been carefully designed to provide an incredible learning experience for a
particular target audience, and in order to avoid any unnecessary confusion about its scope or
purpose by way of disgruntled emails, bad book reviews, or other misunderstandings that can
come up, the remainder of this preface tries to help you determine whether you are part of that
target audience. As busy professionals, we consider our time our most valuable asset, and we
want you to know right from the beginning that we believe that the same is true of you.
Although we often fail, we really do try to honor our neighbors above ourselves as we walk out
this life, and this preface is our attempt to honor you, the reader, by making it clear whether or
not this book can meet your expectations.
Managing Your Expectations
Some of the most basic assumptions this book makes about you as a reader are that you want to
learn how to mine data from popular social web properties, avoid technology hassles when
running sample code, and have lots of fun along the way. Although you could read this book
solely for the purpose of learning what is possible, you should know up front that it has been
written in such a way that you really could follow along with the many exercises and become a
data miner once you’ve completed the few simple steps to set up a development environment. If
you’ve done some programming before, you should find that it’s relatively painless to get up
and running with the code examples. Even if you’ve never programmed before, if you consider
yourself the least bit tech-savvy I daresay that you could use this book as a starting point to a
remarkable journey that will stretch your mind in ways that you probably haven’t even imagined
yet.
To fully enjoy this book and all that it has to offer, you need to be interested in the vast
possibilities for mining the rich data tucked away in popular social websites such as Twitter,
Facebook, LinkedIn, and Instagram, and you need to be motivated enough to install Docker, use
it to run this book’s virtual machine experience, and follow along with the book’s example code
in the Jupyter Notebook, a fantastic web-based tool that features all of the examples for every
chapter. Executing the examples is usually as easy as pressing a few keys, since all of the code
is presented to you in a friendly user interface.
This book will teach you a few things that you’ll be thankful to learn and will add a few
indispensable tools to your toolbox, but perhaps even more importantly, it will tell you a story
and entertain you along the way. It’s a story about data science involving social websites, the
data that’s tucked away inside of them, and some of the intriguing possibilities of what you (or
anyone else) could do with this data.
If you were to read this book from cover to cover, you’d notice that this story unfolds on a
chapter-by-chapter basis. While each chapter roughly follows a predictable template that
introduces a social website, teaches you how to use its API to fetch data, and presents some
techniques for data analysis, the broader story the book tells crescendos in complexity. Earlier
chapters in the book take a little more time to introduce fundamental concepts, while later
chapters systematically build upon the foundation from earlier chapters and gradually introduce
a broad array of tools and techniques for mining the social web that you can take with you into
other aspects of your life as a data scientist, analyst, visionary thinker, or curious reader.
Some of the most popular social websites have transitioned from fad to mainstream to
household names over recent years, changing the way we live our lives on and off the web and
enabling technology to bring out the best (and sometimes the worst) in us. Generally speaking,
each chapter of this book interlaces slivers of the social web along with data mining, analysis,
and visualization techniques to explore data and answer the following representative questions:
Who knows whom, and which people are common to their social networks?
How frequently are particular people communicating with one another?
Which social network connections generate the most value for a particular niche?
How does geography affect your social connections in an online world?
Who are the most influential/popular people in a social network?
What are people chatting about (and is it valuable)?
What are people interested in based upon the human language that they use in a digital
world?
The answers to these basic kinds of questions often yield valuable insights and present
(sometimes lucrative) opportunities for entrepreneurs, social scientists, and other curious
practitioners who are trying to understand a problem space and find solutions. Activities such as
building a turnkey killer app from scratch to answer these questions, venturing far beyond the
typical usage of visualization libraries, and constructing just about anything state-of-the-art are
not within the scope of this book. You’ll be really disappointed if you purchase this book
because you want to do one of those things. However, the book does provide the fundamental
building blocks to answer these questions and provide a springboard that might be exactly what
you need to build that killer app or conduct that research study. Skim a few chapters and see for
yourself. This book covers a lot of ground.
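As a small taste of the first representative question listed above (which people are common to
two social networks), here is a minimal sketch in Python. The names, the connections dictionary,
and the common_connections helper are all hypothetical and invented for illustration; real data
would come from a social website’s API, and the book’s own examples may take a different
approach.

```python
# Hypothetical connection data: each person maps to the set of people they know.
# In practice this would be built from API responses, not typed in by hand.
connections = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol", "erin"},
}


def common_connections(network, person_a, person_b):
    """Return the people connected to both person_a and person_b."""
    # Set intersection keeps only the names present in both sets.
    # Unknown people are treated as having no connections.
    return network.get(person_a, set()) & network.get(person_b, set())


mutual = common_connections(connections, "alice", "bob")
print(sorted(mutual))  # ['carol']
```

Storing each person’s connections as a set is what makes the question a single intersection.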
One important thing to note is that APIs are constantly changing. Social media hasn’t been
around all that long, and even the platforms that appear the most established today are still
adapting to how people use them and confronting new threats to security and privacy. As such,
the interfaces between our code and their platforms (the APIs) are liable to change too, which
means that the code examples provided in this book may not work as intended in the future.
We’ve tried to create realistic examples that are useful for general purposes and app developers,
and therefore some of them will require submitting an application for review and approval.
We’ll do our best to flag those with notes, but be advised that API terms of service can change at
any time. Nevertheless, as long as your app abides by the terms of service, it will likely get
approved, so it’s worth the effort.
Python-Centric Technology
This book intentionally takes advantage of the Python programming language for all of its
example code. Python’s intuitive syntax, amazing ecosystem of packages that trivialize API
access and data manipulation, and core data structures that are practically JSON make it an
excellent teaching tool that’s powerful yet also very easy to get up and running. As if that
weren’t enough to make Python both a great pedagogical choice and a very pragmatic choice for
mining the social web, there’s the Jupyter Notebook, a powerful, interactive code interpreter
that provides a notebook-like user experience from within your web browser and combines code
execution, code output, text, mathematical typesetting, plots, and more. It’s difficult to imagine
a better user experience for a learning environment, because it trivializes the problem of
delivering sample code that you as the reader can follow along with and execute with no hassles.
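To make the remark about core data structures that are “practically JSON” concrete, here is a
minimal sketch using Python’s standard json module. The payload is a made-up example, not the
response format of any particular social website’s API.

```python
import json

# A hypothetical API response, written by hand for illustration only.
raw = (
    '{"user": {"screen_name": "example", "followers_count": 42}, '
    '"hashtags": ["python", "data"], "verified": false, "location": null}'
)

# Parsing JSON yields ordinary Python objects: objects become dicts,
# arrays become lists, and false/null become False/None.
record = json.loads(raw)

screen_name = record["user"]["screen_name"]
hashtags = record["hashtags"]

# Serializing and parsing again returns an equal structure.
round_tripped = json.loads(json.dumps(record))

print(screen_name, hashtags)
```

Because the parsed result is just dicts and lists, the usual Python indexing and iteration apply
to it directly.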
Figure P-1 provides an illustration of the Jupyter Notebook experience, demonstrating the
dashboard of notebooks for each chapter of the book. Figure P-2 shows a view of one notebook.
Figure P-1. Overview of the Jupyter Notebook; a dashboard of notebooks