Cluster-GCN: An Efficient Algorithm for
Training Deep and Large Graph Convolutional Networks
Wei-Lin Chiang (National Taiwan University); Xuanqing Liu (University of California, Los Angeles); Si Si (Google); Yang Li (Google); Samy Bengio (Google); Cho-Jui Hsieh (University of California, Los Angeles)
KDD 2019
Contents
⚫ 1. Background
⚫ 2. Contributions
⚫ 3. Methodology
⚫ 4. Experiments
⚫ 5. Conclusions
1. Background
Previous GCN Training Algorithms
– memory requirement
– time per epoch
– convergence speed (loss reduction) per epoch
GCN (full-batch gradient descent)
([memory: bad; convergence: bad])
– O(NFL) memory: stores all the intermediate embeddings (N nodes, F-dimensional embeddings, L layers)
– convergence of full-batch gradient descent is slow: parameters are updated only once per epoch
GraphSAGE: Inductive Representation Learning on Large Graphs
([memory: good; time per epoch: bad])
– mini-batch SGD with neighborhood sampling; the sampled neighborhood expands exponentially with the number of layers
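As a reminder of the contrast (standard notation, not taken from these slides): full-batch gradient descent computes the gradient over all labeled nodes before making a single update, while mini-batch SGD updates once per sampled batch and therefore many times per epoch.

```latex
% Full-batch GD: one update per pass over the whole graph.
W \leftarrow W - \eta\, \nabla_W \frac{1}{|\mathcal{Y}_L|}
    \sum_{i \in \mathcal{Y}_L} \mathrm{loss}\!\left(y_i,\; x_i^{(L)}\right)
% Mini-batch SGD: one update per batch B \subset \mathcal{Y}_L,
% hence many updates per epoch.
W \leftarrow W - \eta\, \nabla_W \frac{1}{|B|}
    \sum_{i \in B} \mathrm{loss}\!\left(y_i,\; x_i^{(L)}\right)
```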
1. Background
Previous GCN Training Algorithms
VR-GCN
([memory: bad; convergence: good])
– uses a variance reduction technique to shrink the number of sampled neighbors
– O(NFL) memory (stores all the intermediate embeddings)
Cluster-GCN
([memory: good; convergence: good])
– mini-batch gradient descent over graph clusters
– O(bFL) memory (b = batch/cluster size)
– only computes matrix multiplications; no neighborhood sampling is needed (a per-batch sketch follows below)
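A minimal per-batch sketch of the idea above, in dense NumPy for readability (the names are mine; the paper's implementation works on sparse matrices): each batch is one cluster, and the forward pass is plain matrix multiplication over the within-cluster block of the normalized adjacency, with no neighborhood sampling.

```python
import numpy as np

def cluster_gcn_forward(A_norm, X, weights, cluster):
    """One Cluster-GCN mini-batch forward pass (dense NumPy sketch).

    A_norm  -- (N, N) normalized adjacency matrix A'
    X       -- (N, F) input node features
    weights -- list of layer weight matrices [W_0, ..., W_{L-1}]
    cluster -- indices of the b nodes forming this batch's cluster
    """
    # Keep only the within-cluster block of A': links leaving the
    # cluster are dropped, so no neighborhood sampling is needed.
    A_block = A_norm[np.ix_(cluster, cluster)]    # (b, b)
    H = X[cluster]                                # (b, F)
    for W in weights:
        # Plain matrix multiplication per layer: ReLU(A' H W).
        H = np.maximum(A_block @ H @ W, 0.0)
    return H  # embeddings for the b cluster nodes only
```

Because only the b cluster rows of each layer's embeddings are ever materialized, the activation memory is O(bFL) rather than O(NFL).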
2. Contributions
(1) Cluster-GCN achieves the best memory usage on large-scale graphs, especially for deep GCNs
– constructs a new large-scale graph dataset, Amazon2M (2 million nodes and 61 million edges)
(2) Cluster-GCN is faster than VR-GCN as the network goes deeper
– Cluster-GCN’s complexity is linear in the number of layers L
– VR-GCN’s complexity is exponential in L (see the note after this list)
(3) Cluster-GCN is able to train very deep networks with large embedding sizes
– a 5-layer GCN obtains a new benchmark test F1 score of 99.36 on the PPI dataset
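A back-of-the-envelope reading of the linear-versus-exponential claim in (2), in my own notation (r = neighbors sampled per node, b = batch size, F = embedding dimension, A_bb = within-cluster adjacency block): sampling-based methods must materialize a receptive field that grows like r^L, while Cluster-GCN touches the same b cluster nodes at every layer.

```latex
\underbrace{O\!\left(b\, r^{L} F^{2}\right)}_{\text{sampling-based, e.g. VR-GCN}}
\qquad\text{vs.}\qquad
\underbrace{O\!\left(L \left(\|A_{bb}\|_{0}\, F + b F^{2}\right)\right)}_{\text{Cluster-GCN, per batch}}
```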
3. Methodology
Vanilla Cluster-GCN
– layer-wise propagation rule and loss function (the slide’s equations are reconstructed below)
– batching by graph clusters restricts each node’s neighborhood to its own partition, so neighborhoods no longer need to be sampled
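The equations referenced above were lost in extraction; the following is the standard GCN formulation the paper builds on (A' is the normalized adjacency matrix, X^{(l)} the layer-l node embeddings, W^{(l)} the layer-l weights, 𝒴_L the set of labeled nodes):

```latex
% Layer-wise propagation rule: propagate neighbor embeddings through A',
% transform with W^{(l)}, apply a nonlinearity \sigma (e.g. ReLU).
X^{(l+1)} = \sigma\!\left(A'\, X^{(l)}\, W^{(l)}\right)
% Loss: average per-node loss over the labeled nodes, using the
% final-layer embeddings x_i^{(L)}.
\mathcal{L} = \frac{1}{|\mathcal{Y}_L|} \sum_{i \in \mathcal{Y}_L}
    \mathrm{loss}\!\left(y_i,\; x_i^{(L)}\right)
```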
3. Methodology
– a clustering partition can produce batches with an unbalanced label distribution
– this increases the variance across different batches and may affect the convergence of SGD
– comparison: random partition versus clustering partition (sketched below)
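A sketch of the two batching schemes being compared (the clustering side is what the paper does with METIS; the helper below and the fallback comments are my own, for self-containedness):

```python
import numpy as np

def random_partition(num_nodes, num_parts, seed=0):
    """Random partition: assign each node to a part uniformly at random.
    Such batches cut many edges but mix labels fairly evenly."""
    rng = np.random.default_rng(seed)
    parts = rng.integers(num_parts, size=num_nodes)
    return [np.where(parts == p)[0] for p in range(num_parts)]

# Clustering partition: the paper uses METIS, which groups densely
# connected nodes so that few within-batch links are lost.  With the
# `metis` Python bindings (an assumption; any graph clustering works),
# it would look roughly like:
#
#     import metis
#     _, parts = metis.part_graph(adjacency_list, nparts=num_parts)
#
# The trade-off this slide points at: clustered batches preserve more
# edges but can have skewed label distributions, which raises the
# gradient variance across batches.
```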