Advanced Information and Knowledge Processing
Mohammed Zuhair Al-Taie
Seifedine Kadry
Python for Graph and Network Analysis
Advanced Information and Knowledge Processing
Series editors
Lakhmi C. Jain
Bournemouth University, Poole, UK and
University of South Australia, Adelaide, Australia
Xindong Wu
University of Vermont
Information systems and intelligent knowledge processing are playing an increasing
role in business, science and technology. Recently, advanced information systems
have evolved to facilitate the co-evolution of human and information networks
within communities. These advanced information systems use various paradigms
including artificial intelligence, knowledge management, and neural science as well
as conventional information processing paradigms. The aim of this series is to
publish books on new designs and applications of advanced information and
knowledge processing paradigms in areas including but not limited to aviation,
business, security, education, engineering, health, management, and science. Books
in the series should have a strong focus on information processing—preferably
combined with, or extended by, new results from adjacent sciences. Proposals for
research monographs, reference books, coherently integrated multi-author edited
books, and handbooks will be considered for the series and each proposal will be
reviewed by the Series Editors, with additional reviews from the editorial board and
independent reviewers where appropriate. Titles published within the Advanced
Information and Knowledge Processing series are included in Thomson Reuters’
Book Citation Index.
More information about this series at http://www.springer.com/series/4738
Mohammed Zuhair Al-Taie
Faculty of Computing
Universiti Teknologi Malaysia
Kuala Lumpur, Malaysia
Seifedine Kadry
School of Engineering and Technology
American University of the Middle East
Kuwait
Advanced Information and Knowledge Processing
ISSN 1610-3947
ISSN 2197-8441 (electronic)
ISBN 978-3-319-53003-1
ISBN 978-3-319-53004-8 (eBook)
DOI 10.1007/978-3-319-53004-8
Library of Congress Control Number: 2017935544
© Springer International Publishing AG 2017
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material contained herein or for any errors
or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims
in published maps and institutional affiliations.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
New Age of Web Usage
The rapid development of the Web and the Internet over the last decade, together
with advances in computing and communication, has brought people together in
innovative ways. Huge participatory social sites have emerged, enabling new forms
of collaboration and communication. Sites such as Twitter, Facebook, LinkedIn, and
Myspace allow people to form new virtual relationships. Wikis, blogs, and video
blogs make it easy for users to publish their ideas and thoughts without worrying
about publishing costs. A tremendous number of volunteers can today write articles
and share photos, videos, and links on a scale never imagined before. Product
recommendations provided by online marketplaces such as eBay and Amazon (after
analyzing user behavior) can tempt online consumers to place more orders. Tagging
mechanisms on the Web help users to express their preferences. Sending and
receiving e-mails, visiting a Web page, or posting a comment on a blog site leaves
a digital footprint that can be traced back to the person or group behind it.
Political movements can also use the Web today to create new forms of
collaboration between supporters.
None of these changes would have taken place without Web 2.0 technology, a term
popularized by Tim O’Reilly to convey that Internet users are more prepared than
before to reshape the content of the Web.
Social networking is a major factor in the emergence of such interactions, since
most Internet users are members of social sites and use them regularly and actively.
Recent studies have shown that social networking has become one of the three most
popular uses of the Internet, alongside Internet search and e-mail, which points to
the importance of this social trend and the role it plays in communities.
Social network analysis is an interesting interdisciplinary research area in which
computer scientists and sociologists combine their expertise to meet the
challenges of this fast-developing field. Computer scientists have the knowledge
to parse and process data,
while sociologists have the experience that is required for efficient data editing and
interpretation.
Social network analysis techniques, which are included in this book, will help
readers to efficiently analyze social data from Twitter, Facebook, LiveJournal,
GitHub, and many others at three levels of depth: ego, group, and community. They
will be able to analyze militant and revolutionary networks and candidate networks
during elections. They will even learn how the Ebola virus spread through
communities.
Social network analysis has been successfully applied in fields as varied as
health, cyber security, business, animal social networks, information retrieval, and
communications. For example, in animal social networks, social network analysis
was used to investigate relationships and social structures of animal gatherings and
the direct and indirect interactions between animal groups. It was also applied by
security agencies, particularly after the 9/11/2001 attacks, to study the structure and
dynamics of militant groups.
Learn, in Simple Words, the Theory and Practice of Social Network Analysis
This book on graph and network analysis integrates theory with applications. Step
by step, it introduces the main structural concepts and their applications in social
research. It tackles problems on graphs and social networks through dozens of
examples ranging in difficulty from simple to intermediate, which makes the book a
practical introduction to the field.
In every chapter except Chapter 1, each theoretical section is followed by examples
explaining how to perform graph and network analysis with Python, a general-purpose
programming language that is becoming increasingly popular for data science.
Companies worldwide use Python to harvest insights from their data and gain a
competitive edge. The book also uses the NetworkX library, an open-source Python
package for the creation, manipulation, and study of the structure, dynamics, and
functions of complex networks. Together with the Matplotlib package for data
visualization, these three open-source tools are used to analyze and visualize
social data. By the end, the reader will have the knowledge, skills, and tools to
apply social network analysis in a wide range of fields, from social media to
business administration and history.
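As a rough illustration of the workflow described above, the following minimal
sketch uses NetworkX to create, modify, and inspect a small network. The
friendship data and the helper draw_network are hypothetical, invented here for
illustration only; they are not taken from the book's own examples. Drawing
assumes that Matplotlib is installed.

```python
import networkx as nx

# Hypothetical friendship ties, invented for illustration only.
friendships = [
    ("Ann", "Bob"),
    ("Ann", "Cara"),
    ("Bob", "Cara"),
    ("Cara", "Dan"),
]

# Creation: build an undirected graph from a list of edges.
G = nx.Graph()
G.add_edges_from(friendships)

# Manipulation: add one more tie (the new node "Eve" is created implicitly).
G.add_edge("Dan", "Eve")

# Study: a few basic structural facts about the network.
num_nodes = G.number_of_nodes()
num_edges = G.number_of_edges()
degrees = dict(G.degree())   # number of ties per person
density = nx.density(G)      # share of possible ties that actually exist


def draw_network(graph, path="network.png"):
    """Save a simple drawing of the graph to a file (requires Matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")    # render to a file; no display is needed
    import matplotlib.pyplot as plt

    nx.draw(graph, with_labels=True, node_color="lightblue")
    plt.savefig(path)
    plt.close()


if __name__ == "__main__":
    print("nodes:", num_nodes, "edges:", num_edges)
    print("degrees:", degrees)
    print("density:", density)
```

Calling draw_network(G) would write the picture to network.png; the function is
kept separate so that the analysis part runs even where Matplotlib is unavailable.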
The book is intended for readers who want to learn the theory and practice of graph
and network analysis using the Python programming language, without going too far
into the underlying mathematical or statistical methods. It is suitable for courses
on social network analysis in all disciplines that use social methodology. We
believe that many readers are more interested in the implementation of social
network analysis than in its mathematical properties.
The book contains eight chapters. Chapter 1: Theoretical Concepts of Network
Analysis. This is the longest chapter. It gives an introduction to the major
theoretical concepts of network analysis, with emphasis on those used throughout
this book.
Chapter 2: Graph theory. This chapter presents the main features of graph theory,
the mathematical study of the application and properties of graphs, initially moti-
vated by the study of games of chance. It addresses topics such as origins of graph
theory, graph basics, types of graphs, graph traversals, and types of operations on
graphs.
Chapter 3: Network basics. This chapter introduces the concept of a network,
which is the core object of network analysis. We discuss topics such as types of
networks, network measures, installation and use of the NetworkX library, network
data representation, basic matrix operations, and data visualization.
Chapter 4: Social networks. This chapter introduces the main concepts of social
networks such as properties of social networks, data collection in social networks,
data sampling, and social network analysis.
Chapter 5: Node-level analysis. This chapter is concerned with building an
understanding of how to do network analysis at the node (ego) level. It shows how
to create social networks from scratch, how to import networks, how to find key
players in social networks using centrality measures, and how to visualize networks.
We will also introduce the important algorithms that are used to gain insights from
graphs.
Chapter 6: Group-level analysis. In this chapter, we present a number of
techniques for detecting cohesive groups in networks, such as cliques, clustering
coefficient, triadic analysis, structural holes, brokerage, transitivity,
hierarchical clustering, and blockmodels, all of which are based on how nodes in a
network interconnect. Of these, cohesion and brokerage are two major research
topics in social network analysis.
Chapter 7: Network-level analysis. In this chapter, we study graphs and networks
as a whole, in contrast to the previous chapters, which analyzed graphs at the node
level and the group level. This chapter therefore addresses concepts such as
components and isolates, cores and periphery, network density, shortest paths,
reciprocity, affiliation networks and two-mode networks, and homophily.
Chapter 8: Information diffusion in social networks. This chapter discusses con-
cepts of information diffusion in social networks. Information diffusion methods
are commonly used in viral marketing, in collaborative filtering systems, in emer-
gency management, in community detection, and in the study of citation networks.
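To give a flavor of the node, group, and network levels of analysis outlined for
Chapters 5 to 7, here is a minimal sketch using NetworkX. The toy graph (two
triangles joined by a single tie) and all variable names are assumptions made for
illustration; the example is not drawn from the chapters themselves.

```python
import math

import networkx as nx

# Hypothetical toy network: two triangles joined by the tie C-D.
G = nx.Graph()
G.add_edges_from([
    ("A", "B"), ("A", "C"), ("B", "C"),   # first triangle
    ("C", "D"),                           # bridging tie
    ("D", "E"), ("D", "F"), ("E", "F"),   # second triangle
])

# Node level: which nodes are the key players?
degree_centrality = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)
top_score = max(betweenness.values())
top_brokers = sorted(
    node for node, score in betweenness.items()
    if math.isclose(score, top_score)
)

# Group level: cohesive subgroups and how tightly nodes cluster.
triangles = sorted(
    sorted(clique) for clique in nx.find_cliques(G) if len(clique) == 3
)
transitivity = nx.transitivity(G)
clustering_of_c = nx.clustering(G, "C")

# Network level: properties of the graph as a whole.
density = nx.density(G)
num_components = nx.number_connected_components(G)
distance_a_to_f = nx.shortest_path_length(G, "A", "F")

if __name__ == "__main__":
    print("top brokers:", top_brokers)
    print("maximal cliques of size 3:", triangles)
    print("transitivity:", transitivity)
    print("density:", density)
    print("components:", num_components)
    print("distance A to F:", distance_a_to_f)
```

In this toy graph the two bridging nodes score highest on betweenness because
every shortest path between the two triangles passes through them, which is the
intuition behind brokerage.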
Johor, Malaysia    Mohammed Zuhair Al-Taie
Egaila, Kuwait    Seifedine Kadry