The Neo4j Graph Algorithms User
Guide v3.4
Table of Contents
Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Installation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Graph loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Building locally . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
The Yelp example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
The Yelp Open Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Graph model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
The PageRank algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
The Betweenness Centrality algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
The Closeness Centrality algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
The Harmonic Centrality algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
The Minimum Weight Spanning Tree algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
The Shortest Path algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
The Single Source Shortest Path algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
The A* algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
The Yen’s K-shortest paths algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
The All Pairs Shortest Path algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
The Triangle Counting / Clustering Coefficient algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
The Label Propagation algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
The Louvain algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
The Connected Components algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
The Strongly Connected Components algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
License: Creative Commons 4.0
This is the user guide for Neo4j Graph Algorithms version 3.4, authored by
the Neo4j Team.
The guide covers the following areas:
• Introduction — An introduction to Neo4j Graph Algorithms.
• The Yelp example — An illustration of how to use graph algorithms on a social network of
friends.
• Procedures — A list of Neo4j Graph Algorithm procedures.
• Algorithms — A detailed guide to each of the Neo4j Graph Algorithms, including use-cases and
examples.
1
Introduction
This chapter provides an introduction to the available graph algorithms,
and instructions for installation and use.
This library provides efficiently implemented, parallel versions of common graph algorithms for
Neo4j 3.x, exposed as Cypher procedures.
Releases are available here: https://github.com/neo4j-contrib/neo4j-graph-algorithms/releases
Algorithms
Centralities
These algorithms determine the importance of distinct nodes in a network:
• PageRank (algo.pageRank)
• Betweenness Centrality (algo.betweenness)
• Closeness Centrality (algo.closeness)
Community detection
These algorithms evaluate how a group is clustered or partitioned, as well as its tendency to
strengthen or break apart:
• Louvain (algo.louvain)
• Label Propagation (algo.labelPropagation)
• (Weakly) Connected Components (algo.unionFind)
• Strongly Connected Components (algo.scc)
• Triangle Count / Clustering Coefficient (algo.triangleCount)
Path finding
These algorithms help find the shortest path or evaluate the availability and quality of routes:
• Minimum Weight Spanning Tree (algo.mst)
• All Pairs- and Single Source - Shortest Path (algo.shortestPath, algo.allShortestPaths)
• A* Algorithm (algo.shortestPath.astar)
• Yen’s K-Shortest Paths (algo.kShortestPaths)
These procedures work either on the whole graph, or on a subgraph filtered by label and
relationship-type. You can also use filtering and projection using Cypher queries.
2
Installation
Download graph-algorithms-algo-[version].jar from the matching release and copy it into your
$NEO4J_HOME/plugins directory.
Because the algorithms use the lower level Kernel API to read from, and to write to Neo4j, for
security purposes you will also have to enable them in the configuration:
1. Add the following to your $NEO4J_HOME/conf/neo4j.conf file:
dbms.security.procedures.unrestricted=algo.*
2. Restart Neo4j
3. To see a list of all the algorithms, run the following query:
CALL algo.list()
Usage
These algorithms are exposed as Neo4j procedures. They can be called directly from Cypher in your
Neo4j Browser, from cypher-shell, or from your client code.
For most algorithms there are two procedures:
• algo. - this procedure writes results back to the graph as node-properties, and reports
statistics.
• algo..stream - this procedure returns a stream of data. For example, node-ids and
computed values.
For large graphs, the streaming procedure might return millions, or even billions of results. In
this case it may be more convenient to store the results of the algorithm, and then use them
with later queries.
We can project the graph we want to run algorithms on with either label and relationship-type
projection, or cypher projection.
The projected graph model is separate from Neo4j’s stored graph model to enable fast caching for
the topology of the graph, containing only relevant nodes, relationships and weights. The projected
graph model does not support multiple relationships between a single pair of nodes. During
3
projection, only one relationship between a pair of nodes per direction (in, out) is allowed in the
directed case, but two relationships are allowed for BOTH the undirected cases.
Label and relationship-type projection
We can project the subgraph we want to run the algorithm on by using the label parameter to
describe nodes, and relationship-type to describe relationships.
The general call syntax is:
CALL algo.('NodeLabel', "RelationshipType", {config})
For example, PageRank on DBpedia (11M nodes, 116M relationships):
CALL algo.pageRank('Page','Link',{iterations:5, dampingFactor:0.85, write: true,
writeProperty:'pagerank'});
// YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor,
write, writeProperty
CALL algo.pageRank.stream('Page','Link',{iterations:5, dampingFactor:0.85})
YIELD node, score
RETURN node.title, score
ORDER BY score DESC LIMIT 10;
Huge graph projection
The default label and relationship-type projection has a limitation of 2 billion nodes and 2 billion
relationships, so if our project graph is bigger than this we need to use a huge graph projection.
This can be enabled by setting graph:'huge' in the config.
The general call syntax is:
CALL algo.('NodeLabel', "RelationshipType", {graph: "huge"})
For example, PageRank on DBpedia:
CALL algo.pageRank('Page','Link',{iterations:5, dampingFactor:0.85,
writeProperty:'pagerank',graph:'huge'});
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor,
writeProperty
Cypher projection
If label and relationship-type projection is not selective enough to describe our subgraph to run the
algorithm on, we can use Cypher statements to project subsets of our graph. Use a node-statement
instead of the label parameter and a relationship-statement instead of the relationship-type, and
4
use graph:'cypher' in the config.
Relationships described in the relationship-statement will only be projected if both source and
target nodes are described in the node-statement. Relationships that don’t have both source and
target nodes described in the node-statement will be ignored.
We can also return a property value or weight (according to our config) in addition to the ids from
these statements.
Cypher projection enables us to be more expressive in describing our subgraph that we want to
analyse, but might take longer to project the graph with more complex cypher queries.
The general call syntax is:
CALL algo.(
'MATCH (n) RETURN id(n) AS id',
"MATCH (n)-->(m) RETURN id(n) AS source, id(m) AS target",
{graph: "cypher"})
For example, PageRank on DBpedia:
CALL algo.pageRank(
'MATCH (p:Page) RETURN id(p) as id',
'MATCH (p1:Page)-[:Link]->(p2:Page) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', iterations:5, write: true});
Cypher projection can also be used to project a virtual (non-stored) graph. Here is an example of
how to project an undirected graph of people who visited the same web page and run the Louvain
community detection algorithm on it, using the number of common visited web pages between
pairs of people as relationship weight:
CALL algo.louvain(
'MATCH (p:Person) RETURN id(p) as id',
'MATCH (p1:Person)-[:Visit]->(:Page)<-[:Visit]-(p2:Person)
RETURN id(p1) as source, id(p2) as target, count(*) as weight',
{graph:'cypher', iterations:5, write: true});
Graph loading
As it can take some time to load large graphs into the algorithm data structures, you can pre-load
graphs and then later refer to them by name for several graph algorithms. After usage they can be
removed from memory to free resources used:
5
// Load graph
CALL algo.graph.load('my-graph','Label','REL_TYPE',{graph:'heavy',..other config...})
YIELD name, graph, direction, undirected, sorted, nodes, loadMillis, alreadyLoaded,
nodeWeight, relationshipWeight, nodeProperty, loadNodes, loadRelationships;
// Info on loaded graph
CALL algo.graph.info('my-graph')
YIELD name, type, exists, removed, nodes;
// Use graph
CALL algo.pageRank(null,null,{graph:'my-graph',...})
// Remove graph
CALL algo.graph.remove('my-graph')
YIELD name, type, exists, removed, nodes;
Building locally
Currently aiming at Neo4j 3.x (with a branch per version):
git clone https://github.com/neo4j-contrib/neo4j-graph-algorithms
cd neo4j-graph-algorithms
git checkout 3.3
mvn clean install
cp algo/target/graph-algorithms-*.jar $NEO4J_HOME/plugins/
$NEO4J_HOME/bin/neo4j restart
6