Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks
Wei-Lin Chiang (National Taiwan University), Xuanqing Liu (University of California, Los Angeles), Si Si (Google), Yang Li (Google), Samy Bengio (Google), Cho-Jui Hsieh (University of California, Los Angeles)
KDD '19
Content
⚫ 1. Background
⚫ 2. Contributions
⚫ 3. Methodology
⚫ 4. Experiments
⚫ 5. Conclusions
1. Background
Previous GCN training algorithms, compared on three axes:
– memory requirement
– time per epoch
– convergence speed (loss reduction) per epoch

GCN [memory: bad; convergence: bad]
– full-batch gradient descent, O(NFL) memory (stores all the intermediate embeddings; see the memory accounting sketched below)
– convergence is slow because parameters are updated only once per epoch

GraphSAGE ("Inductive Representation Learning on Large Graphs") [memory: good; convergence: bad]
– mini-batch SGD
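A quick accounting of these memory figures, with N nodes, embedding dimension F, and L layers; this is a sketch of the standard argument, not a line from the slides:

```latex
% Full-batch GCN keeps every layer's embedding matrix for backprop:
%   X^{(l)} \in \mathbb{R}^{N \times F}, \quad l = 1, \dots, L
% so activation memory is
\underbrace{N F}_{\text{one layer}} \cdot L \;=\; O(NFL).
% A mini-batch method that materializes only b nodes per layer
% instead needs O(bFL).
```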
1. Background (cont.)

VR-GCN [memory: bad; convergence: good]
– uses a variance-reduction technique to shrink the size of the sampled neighborhood
– still O(NFL) memory (stores all the intermediate embeddings)

Cluster-GCN [memory: good; convergence: good]
– mini-batch gradient descent over graph clusters (see the training-loop sketch below)
– O(bFL) memory for batch size b
– only matrix multiplications are computed; no neighborhood sampling is needed
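A minimal sketch of the per-cluster training step this slide describes, assuming PyTorch, a dense normalized adjacency `A_norm`, and a precomputed node partition `clusters`; the two-layer model and all names are illustrative assumptions, not the authors' code:

```python
import torch
import torch.nn.functional as F

class TwoLayerGCN(torch.nn.Module):
    """Minimal GCN: logits = A' relu(A' X W1) W2."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.W1 = torch.nn.Linear(in_dim, hid_dim, bias=False)
        self.W2 = torch.nn.Linear(hid_dim, n_classes, bias=False)

    def forward(self, A_norm, X):
        H = F.relu(A_norm @ self.W1(X))
        return A_norm @ self.W2(H)

def train_cluster_gcn(A_norm, X, y, clusters, model, opt, epochs=10):
    """clusters: list of node-index tensors, one per graph partition
    (the paper partitions with METIS). Each step touches only one
    cluster's induced subgraph, so memory is O(bFL), not O(NFL)."""
    for _ in range(epochs):
        for idx in clusters:
            A_sub = A_norm[idx][:, idx]   # within-cluster links only
            logits = model(A_sub, X[idx])
            loss = F.cross_entropy(logits, y[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
```

Because each step materializes only the b×b subgraph and b rows of each layer's embeddings, the memory cost matches the O(bFL) figure above.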
2. Contributions
(1) Cluster-GCN achieves the best memory usage on large-scale graphs, especially for deep GCNs
– a new large-scale graph dataset, Amazon2M (2 million nodes, 61 million edges), is constructed
(2) Cluster-GCN is faster than VR-GCN when the network goes deeper
– Cluster-GCN's complexity is linear in the number of layers L
– VR-GCN's complexity is exponential in L
(3) Cluster-GCN can train a very deep network with a large embedding size
– a 5-layer GCN obtains a new benchmark accuracy of 99.36 on the PPI dataset
3. Methodology
Vanilla Cluster-GCN
– partition the graph into clusters; each batch trains on one cluster's induced subgraph, which reduces the size of the sampled neighborhood
– layer-wise propagation rule and loss function (see the formulas sketched below)
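The two formulas the slide names are the standard ones from the paper, reconstructed here: $A'$ is the normalized adjacency with self-loops (e.g. $D^{-1}(A+I)$), $X^{(l)}$ the layer-$l$ node embeddings, $W^{(l)}$ the layer weights, and $\mathcal{Y}_L$ the set of labeled nodes:

```latex
% Layer-wise propagation rule (X^{(0)} = X, the input features)
X^{(l+1)} = \sigma\!\left( A'\, X^{(l)}\, W^{(l)} \right)

% Loss averaged over labeled nodes, with z_i^{(L)} the
% final-layer output for node i
\mathcal{L} = \frac{1}{|\mathcal{Y}_L|} \sum_{i \in \mathcal{Y}_L}
    \operatorname{loss}\!\left( y_i,\; z_i^{(L)} \right)
```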
3. Methodology (cont.)
Issue: unbalanced label distribution
– clustering tends to group similar nodes, so a single cluster's label distribution can be far from the graph-wide one
– this increases the variance across different batches and may affect the convergence of SGD
– random partition versus clustering partition (the paper's remedy, stochastic multiple partitions, is sketched below)
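The paper's remedy, stochastic multiple partitions, merges several randomly chosen clusters into each batch, which rebalances labels across batches and restores the between-cluster links inside a batch. A minimal sketch, reusing the hypothetical names from the training loop above:

```python
import random
import torch

def stochastic_multi_cluster_batches(clusters, q):
    """Stochastic multiple partitions: each batch merges q randomly
    chosen clusters, reducing the label-distribution variance that a
    pure clustering partition introduces."""
    order = random.sample(range(len(clusters)), len(clusters))
    for i in range(0, len(order), q):
        picked = [clusters[j] for j in order[i:i + q]]
        yield torch.cat(picked)  # node indices of the merged batch

# Usage: replace `for idx in clusters:` in train_cluster_gcn with
#   for idx in stochastic_multi_cluster_batches(clusters, q=5):
# The induced subgraph A_norm[idx][:, idx] then also contains the
# links between the merged clusters.
```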