Pooling in Graph Convolutional Neural Networks
Mark Cheung
Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA, USA
markcheu@andrew.cmu.edu

John Shi
Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA, USA
jshi3@andrew.cmu.edu

Lavender Jiang
Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA, USA
yaoj@andrew.cmu.edu

Oren Wright
Software Engineering Institute
Carnegie Mellon University
Pittsburgh, PA, USA
owright@sei.cmu.edu

José M.F. Moura
Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA, USA
moura@andrew.cmu.edu
Abstract—Graph convolutional neural networks (GCNNs) are
a powerful extension of deep learning techniques to graph-
structured data problems. We empirically evaluate several pool-
ing methods for GCNNs, and combinations of those graph pooling
methods with three different architectures: GCN, TAGCN, and
GraphSAGE. We confirm that graph pooling, especially DiffPool,
improves classification accuracy on popular graph classification
datasets and find that, on average, TAGCN achieves comparable
or better accuracy than GCN and GraphSAGE, particularly for
datasets with larger and sparser graph structures.
Index Terms—graph convolutional neural network, graph
pooling, TAGCN, graph classification, graph signal processing
I. INTRODUCTION
Over the past decade, deep learning techniques such as
convolutional neural networks (CNNs) have transformed fields
like computer vision and other Euclidean data domains (i.e.,
domains in which data have a uniform, grid-like structure).
Many important domains, however, consist of non-
Euclidean data (i.e., data with irregular relationships that
require mathematical concepts like graphs or manifolds to
model explicitly). Such domains include social networks, sensor
feeds, web traffic, supply chains, and biological systems. As
these data grow in size and complexity, deep learning seems
to recommend itself as a tool for classification and pattern
recognition, but conventional deep learning approaches are
often sharply limited when data lack a Euclidean structure to
exploit. There are ongoing efforts to extend deep learning to
these non-Euclidean domains, and such techniques have been
dubbed geometric deep learning [1].
In parallel with advances in geometric deep learning are
advances in graph signal processing (GSP) [2], [3]. Research
in GSP attempts to generalize classical signal processing
theory for irregular data defined on graphs. One attraction
of GSP is that it provides a unified mathematical framework
through which to view the spectral and vertex domains of a
graph. Concepts like frequency or smoothness, which can be
understood intuitively in classical signal processing, can be
explicitly defined for data on graphs.

This material is based upon work partially funded and supported by the
Department of Defense under Contract No. FA8702-15-D-0002 with Carnegie
Mellon University for the operation of the Software Engineering Institute, a
federally funded research and development center. This work is also partially
supported by NSF grants CCF 1837607 and CCN 1513936.
Graph convolutional neural networks (GCNNs), an exten-
sion of CNNs to graph-structured data, were first implemented
with concepts from spectral graph theory [4], and methods
based on the spectral approach have since been refined and
expanded [5], [6]. Reference [7] proposes the topology adap-
tive graph convolutional network (TAGCN) that defines graph
convolution directly in the vertex domain as multiplication by
polynomials of the graph adjacency matrix. This is consistent
with the concept of convolution in graph signal processing
[2]. TAGCN designs a set of fixed-size learnable filters whose
topologies are adaptive to the topology of the graph as the
filters scan the graph to perform convolution; see also [8], [9].
Other implementations, such as GraphSAGE [10] and graph
attention networks (GATs) [11], are also defined directly in the
vertex domain of the graph and apply a learned, convolution-
like aggregation function.
An important operation in conventional CNNs is pooling, a
nonlinear downsampling operation. Pooling layers in a CNN
shrink the number of dimensions of the feature representation,
thereby reducing the computation cost, memory footprint, and
number of learned parameters. As a result, pooling allows for
deeper networks in practice and can help control overfitting.
Additionally, pooling has translation invariance properties that
are desirable in many applications. Recently, the use of pooling
in CNNs has come into question, but it remains popular.
Just as convolution and convolution-like methods have been
proposed to create graph convolutional layers in GCNNs, sev-
eral methods have been proposed to perform pooling
in GCNNs [12], [13], [14]. Unlike convolution, which has
been derived in GSP [2], pooling has not been rigorously
defined. Therefore, the current generation of pooling methods
is based on ad hoc rather than systematic approaches. They
nonetheless have shown improved accuracy on popular graph
classification datasets.
In this paper, we perform experiments on graph classification
datasets, varying both the graph convolution and the graph
pooling methods used in GCNNs. Graph classification is a
supervised learning task in which previously unseen graphs are
classified based on a set of labeled graphs; it is analogous to
image classification. As with CNNs and image classification,
tools like pooling layers are important for constructing high-level
representations from node-level information.
The paper is organized as follows: we first present the back-
ground and related work in Section II. Section III describes our
proposed approach. In Section IV, we discuss the datasets used
and present the results and analysis. Finally, we conclude the
paper in section V.
A. Graph Signal Processing Perspective
The convolutional and pooling operators in graph neural
networks have a theoretical foundation in GSP. GSP [2] extends
traditional discrete signal processing to graph signals, signals
that are indexed by the nodes in a graph.
Let G(A) = (V, E) be a graph with adjacency matrix A ∈ C^{N×N},
where V is the set of N nodes and a nonzero entry
[A]_{ij} denotes a directed edge e_{ij} ∈ E from node i to node j.
A graph signal on G is a map s : V → S, where S ⊆ C^N is the
signal space over the nodes of G. We write s = (s_1, s_2, ..., s_N) ∈ C^N,
where s_i represents a measurement at node v_i ∈ V.
The heart of GCNNs is applying convolutional filters to
graph signals. In GSP, convolution is the matrix-vector multi-
plication of a polynomial P(A) of the adjacency matrix with
the graph signal s. This definition is used to create the graph
convolutional layer in GCNNs.
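To make this concrete, the following is a minimal NumPy sketch of graph convolution as the matrix-vector product P(A)s; the 4-node graph and the filter taps are hypothetical placeholders, not values from our experiments.

```python
import numpy as np

# Hypothetical 4-node directed graph (adjacency matrix) and graph signal.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
s = np.array([1.0, -0.5, 2.0, 0.0])

# Taps of a degree-2 polynomial filter P(A) = h0*I + h1*A + h2*A^2.
h = [0.5, 0.3, 0.2]

# Graph convolution in GSP: the matrix-vector product P(A) s.
P_A = sum(hk * np.linalg.matrix_power(A, k) for k, hk in enumerate(h))
s_filtered = P_A @ s
print(s_filtered)
```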
The GSP literature also addresses sampling: in [15] and
[16], several sampling set selection and sampling methods
are proposed. The pooling methods explored herein are not
based specifically on these sampling methods, but we observe
that there is a relationship between sampling in GSP and
pooling in GCNNs. Both reduce the number of values in
the signal and can reduce the number of nodes in the graph.
The key difference is that, in sampling, we focus on how to
recover the original signal given the sampled signal. However,
recoverability is not required in pooling algorithms in GCNNs.
II. RELATED WORK
In this section, we describe the infrastructure for graph
convolutional and pooling layers and the related literature.
A. Graph Convolutional Layer
We concentrate on three implementations of GCNNs, de-
rived from different definitions of graph convolution: graph
convolutional networks (GCNs) [6], GraphSAGE [10], and
topology-adaptive graph convolutional networks (TAGCNs)
[7], [9].
Fig. 1: Graph pooling, yielding a new signal and adjacency matrix.

In GCN [6], given a graph signal X^{(0)} \in R^{n \times c} (where 0
denotes the input layer, n is the number of nodes, and c is
the number of features/input channels) and a graph structure
A \in R^{n \times n}, a graph convolutional layer is defined as follows:

X^{(l+1)} = \sigma\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X^{(l)} W^{(l)} \right)    (1)

where \tilde{A} = A + I_n, \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}, W^{(l)} \in R^{c \times c'} is the
trainable weight matrix, \sigma is the nonlinear activation function,
and c' is the number of output channels. l = 0 for the
first layer, and we can propagate the graph signal through
additional layers in the network. This approach is based on a
first-order approximation of localized spectral filters on graphs
[17].
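As an illustration, here is a minimal dense PyTorch sketch of the propagation rule (1). It is not the implementation used in our experiments (which relies on PyTorch Geometric's sparse layers); the graph, channel sizes, and weights below are arbitrary placeholders.

```python
import torch

def gcn_layer(X, A, W):
    """One GCN layer: sigma(D^{-1/2} (A + I) D^{-1/2} X W), per Eq. (1)."""
    n = A.size(0)
    A_tilde = A + torch.eye(n)                     # add self-loops
    d = A_tilde.sum(dim=1)                         # degrees of A_tilde (always >= 1)
    D_inv_sqrt = torch.diag(d.pow(-0.5))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt      # symmetrically normalized adjacency
    return torch.relu(A_hat @ X @ W)

# Hypothetical graph with n=5 nodes, c=3 input and c'=4 output channels.
A = (torch.rand(5, 5) > 0.6).float()
A = ((A + A.t()) > 0).float()                      # symmetrize for this example
X = torch.randn(5, 3)
W = torch.randn(3, 4)
print(gcn_layer(X, A, W).shape)                    # torch.Size([5, 4])
```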
In GraphSAGE [10], graph convolution is defined as follows,
for each node v of X:

X_v^{(l+1)} = \sigma\left( W^{(l)} \cdot f_k\left( X_v^{(l)}, X_u^{(l)}, \forall u \in S_N(v) \right) \right)    (2)

where f_k is an aggregator function (e.g., sum, mean, or max),
and S_N(v) is a random sample of the node v's neighbors.

In TAGCN [7], it is defined as follows:

X^{(l+1)} = \sigma\left( \sum_{i=0}^{N} \left( D^{-1/2} A D^{-1/2} \right)^i X^{(l)} W_i^{(l)} \right)    (3)

where D_{ii} = \sum_j A_{ij} and W_i^{(l)} \in R^{c \times c'} for i = 0, \dots, N.
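Similarly, a dense sketch of the TAGCN layer (3), assuming a symmetric adjacency matrix, an arbitrary polynomial degree, and placeholder weights; PyTorch Geometric's sparse layers are used in the actual experiments.

```python
import torch

def tagcn_layer(X, A, Ws):
    """One TAGCN layer: sigma(sum_i (D^{-1/2} A D^{-1/2})^i X W_i), per Eq. (3)."""
    n = A.size(0)
    d = A.sum(dim=1).clamp(min=1)                  # guard against isolated nodes
    D_inv_sqrt = torch.diag(d.pow(-0.5))
    A_norm = D_inv_sqrt @ A @ D_inv_sqrt
    out = 0
    A_pow = torch.eye(n)                           # (A_norm)^0
    for W_i in Ws:                                 # one weight matrix per power i = 0..N
        out = out + A_pow @ X @ W_i
        A_pow = A_pow @ A_norm
    return torch.relu(out)

# Hypothetical graph: n=6 nodes, c=3 channels, degree-3 filter, c'=4 output channels.
A = (torch.rand(6, 6) > 0.5).float(); A = ((A + A.t()) > 0).float()
X = torch.randn(6, 3)
Ws = [torch.randn(3, 4) for _ in range(4)]         # W_0, ..., W_3
print(tagcn_layer(X, A, Ws).shape)                 # torch.Size([6, 4])
```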
B. Graph Pooling Layer
Similar to graph convolution, graph pooling is inspired by
pooling in CNNs. In addition to static pooling methods [18],
[19], various differentiable methods have been proposed.
Using the same notation as (1), a graph pooling operator
should yield a new signal X' ∈ R^{n'×c} and adjacency matrix
A' ∈ R^{n'×n'}, usually with n' ≤ n. See Fig. 1 for an example.
An important benefit of graph pooling is the hierarchical
representation of data and structure. Otherwise, global patterns
in the data are usually not considered until the final aggrega-
tion layer of a network. Below we describe four recent graph
pooling algorithms.
1) Sort Pooling: Sort Pooling (SortPool) [14] operates
after the last graph convolution layer. Instead of summing
or averaging features, SortPool arranges the vertices in a
consistent order and outputs a fixed-size representation, so
that a conventional CNN can then be trained on it.
The vertices are sorted based on their structural roles within
the graph. Using the connection between graph convolution
and the Weisfeiler-Lehman subtree kernel [20], SortPool sorts
the nodes in descending order of their last-layer features,
breaking ties using the features of earlier layers, and finally
selects the top k nodes.
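The core selection step can be sketched as follows. This simplified version sorts only by the last feature channel and zero-pads small graphs, whereas the full method in [14] also breaks ties using earlier channels.

```python
import torch

def sort_pool(X, k):
    """Keep the k nodes with the largest last-channel value, in sorted order."""
    n, c = X.shape
    order = torch.argsort(X[:, -1], descending=True)
    X_sorted = X[order]
    if n >= k:
        return X_sorted[:k]
    # Pad with zero rows when the graph has fewer than k nodes.
    return torch.cat([X_sorted, torch.zeros(k - n, c)], dim=0)

X = torch.randn(7, 5)              # hypothetical graph: 7 nodes, 5 channels
print(sort_pool(X, k=4).shape)     # torch.Size([4, 5])
```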
2) Differentiable Pooling: Differentiable Pooling (Diff-
Pool) [12] is a differentiable graph pooling module that learns
hierarchical representations of the graphs by aggregating nodes
through several pooling layers. It uses a learned assignment
matrix S ∈ Rn1×n2 and updates the graph signal and topology
as follows:
X^{(l+1)} = S^{(l)T} Z^{(l)}    (4)
A^{(l+1)} = S^{(l)T} A^{(l)} S^{(l)}    (5)

where Z^{(l)} \in R^{n_1 \times c} is the output of the GraphSAGE [10]
operator with the mean aggregator applied to (X^{(l)}, A^{(l)}).
DiffPool achieves significantly better
prediction accuracy than GraphSAGE, SortPool, and certain
kernel methods, especially when global features are important
for classification [12].
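A dense sketch of the update (4)-(5): for brevity, the assignment matrix here comes from a plain linear layer plus softmax and Z^{(l)} is replaced by X^{(l)}, rather than the GraphSAGE-based GNNs used in [12].

```python
import torch

def diff_pool(X, A, S_logits):
    """Coarsen (X, A) to n2 clusters using a soft assignment S (Eqs. (4)-(5))."""
    S = torch.softmax(S_logits, dim=1)   # n1 x n2, each row sums to 1
    X_new = S.t() @ X                    # n2 x c   (Eq. 4, with Z approximated by X here)
    A_new = S.t() @ A @ S                # n2 x n2  (Eq. 5)
    return X_new, A_new

# Hypothetical graph with n1=8 nodes pooled to n2=3 clusters, c=4 channels.
A = (torch.rand(8, 8) > 0.6).float(); A = ((A + A.t()) > 0).float()
X = torch.randn(8, 4)
S_logits = torch.nn.Linear(4, 3)(X)      # learned assignment scores (placeholder)
X2, A2 = diff_pool(X, A, S_logits)
print(X2.shape, A2.shape)                # torch.Size([3, 4]) torch.Size([3, 3])
```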
3) Top-k Pooling: Top-k Pool [13] pools using a trainable
projection vector p^{(l)} ∈ R^f: it selects the top-k indices of the
projection X^{(l)} p^{(l)} and keeps the corresponding edges in A^{(l)}.
Top-k Pool is inspired by encoder-decoder architectures like
U-Nets. In addition to the Top-k Pool operation, there is also
an Unpool operation that reverses the process. Combined, these
two create an encoder-decoder model on graphs, known as the
graph U-Net [13]. Reference [13] shows that Top-k Pool with
the U-Net structure performs better than DiffPool; here we
examine how well it works standalone compared with the other
pooling algorithms.
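A sketch of the standalone Top-k pooling step, assuming a dense adjacency matrix; the tanh gate applied to the kept features is an illustrative choice, not necessarily the exact nonlinearity of [13].

```python
import torch

def top_k_pool(X, A, p, k):
    """Keep the k highest-scoring nodes and gate their features by the score."""
    score = (X @ p) / p.norm()                              # projection score per node
    idx = torch.topk(score, k).indices                      # indices of top-k nodes
    X_new = X[idx] * torch.tanh(score[idx]).unsqueeze(1)    # gate kept features
    A_new = A[idx][:, idx]                                  # induced subgraph on kept nodes
    return X_new, A_new

# Hypothetical graph: 10 nodes, 6 channels, keep k=4 nodes.
A = (torch.rand(10, 10) > 0.7).float(); A = ((A + A.t()) > 0).float()
X = torch.randn(10, 6)
p = torch.randn(6, requires_grad=True)                      # trainable projection vector
X2, A2 = top_k_pool(X, A, p, k=4)
print(X2.shape, A2.shape)                                   # torch.Size([4, 6]) torch.Size([4, 4])
```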
4) Self-Attention Graph Pooling: Self-Attention Graph
Pooling (SagPool) [21] uses an attention mechanism to select
the important nodes:

y = \text{GNN}(X^{(l)}, A^{(l)})    (6)
i = \text{top}_k(y)    (7)
X^{(l+1)} = (X^{(l)} \odot \tanh(y))_i    (8)
A^{(l+1)} = A^{(l)}_{i,i}    (9)

The attention score y is calculated with a GCN, and the top
k nodes are selected based on it. Since graph convolution is
used to obtain the self-attention score, SagPool uses both the
graph features and structure [21]. Reference [21] shows that
SagPool performs better than DiffPool and Top-k Pool across
some biochemical datasets.
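A dense sketch of (6)-(9), with the attention score produced by a one-layer GCN whose weights are a single placeholder vector w.

```python
import torch

def sag_pool(X, A, w, k):
    """SagPool (Eqs. (6)-(9)): GCN attention scores, then top-k node selection."""
    n = A.size(0)
    A_tilde = A + torch.eye(n)
    d_inv_sqrt = torch.diag(A_tilde.sum(1).pow(-0.5))
    y = (d_inv_sqrt @ A_tilde @ d_inv_sqrt) @ X @ w       # Eq. (6): one score per node
    idx = torch.topk(y, k).indices                        # Eq. (7)
    X_new = X[idx] * torch.tanh(y[idx]).unsqueeze(1)      # Eq. (8)
    A_new = A[idx][:, idx]                                # Eq. (9)
    return X_new, A_new

# Hypothetical graph: 9 nodes, 5 channels, keep k=3 nodes.
A = (torch.rand(9, 9) > 0.6).float(); A = ((A + A.t()) > 0).float()
X = torch.randn(9, 5)
w = torch.randn(5, requires_grad=True)                    # attention weights (c -> 1)
X2, A2 = sag_pool(X, A, w, k=3)
print(X2.shape, A2.shape)                                 # torch.Size([3, 5]) torch.Size([3, 3])
```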
III. PROPOSED METHOD
We first compare GCN, GraphSAGE, and TAGCN for
graph classification across four benchmark datasets. We then
investigate how pooling affects these results, by combining
the different convolutional architectures with the four pooling
techniques described above, i.e., SortPool [14], DiffPool [12],
Top-k Pool [13], and SagPool [21]. In each instance, the pool-
ing method is paired with GCN or GraphSAGE (whichever is
used in the corresponding pooling paper) and compared with
the same pooling method paired with TAGCN.
IV. EXPERIMENTS
A. Datasets
To evaluate the efficacy of the different methods, we apply
our methods on real-world graph kernel benchmarks. See Ta-
ble I for the properties of these datasets. We evaluate our meth-
ods on bioinformatics datasets and social network datasets.
Both the MUTAG and Proteins datasets are bioinformatics data.
MUTAG [22] is a dataset consisting of chemical compounds
represented by graphs. The task is to predict whether the
chemical compound is mutagenic. Proteins [23] is a dataset
consisting of proteins represented by graphs. The objective is
to predict whether a protein functions as an enzyme. In both of
the datasets, the nodes are structure elements, and two nodes
are connected if there is a chemical bond between the structure
elements represented by the nodes.
For social network datasets, we chose IMDB-Binary and
Reddit-Binary. IMDB-Binary [24] is a set of graphs corre-
sponding to ego-networks of actors and actresses. An edge is
drawn between two actors if they were cast in the same movie.
The task is to predict whether a movie is romance or action.
In Reddit-Binary [24], each graph corresponds to an online
discussion thread. An edge is drawn between two users if one
has replied to the other. The task is to predict whether a thread
belongs to a discussion forum or a question answering forum.
TABLE I: Properties of Graph Classification Datasets

Dataset         Graphs   Classes   Avg Nodes   Avg Edges
MUTAG              188         2       17.7        38.9
Proteins          1113         2       39.06       72.82
IMDB-Binary       1000         2       19.77       96.53
Reddit-Binary     2000         2      429.63      497.75
B. Network Training
We perform 5-fold cross-validation to select the hyper-
parameters based on validation accuracy and to estimate the
test accuracy. For the baselines, the hyperparameters are the
number of graph convolutional layers, number of channels in
each layer, dropout rates, pooling rate (number or percentage
of nodes to keep), and (for TAGCN) order of polynomial filter.
For a fairer comparison, we considered 1-5 layers for TAGCN
(using graph polynomial filters of degree 3) vs. 1-15 layers for
GCN and GraphSAGE, to show that one layer of TAGCN with
degree n is not equivalent to n layers of GCN/GraphSAGE. We use
cross-entropy loss and ADAM optimization with a starting
learning rate of 0.01, a decay factor of 0.5, and a decay step
size of 50. Experiments were performed in PyTorch using code
from the PyTorch Geometric Library [25].
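A sketch of this optimization setup (Adam with learning rate 0.01, step decay of 0.5 every 50 epochs, cross-entropy loss); the model and data below are placeholders standing in for a GCNN variant and the graph classification batches.

```python
import torch

# Placeholder model; in the experiments this would be a GCNN variant with pooling.
model = torch.nn.Linear(16, 2)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Halve the learning rate every 50 epochs (decay factor 0.5, decay step size 50).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

for epoch in range(200):
    optimizer.zero_grad()
    x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))   # placeholder batch
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()
```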
C. Results
Fig. 2 shows the results of GCNN variants with no pooling,
DiffPool, SagPool, SortPool, and Top-K Pool. The green,
orange, and blue bars are the means of the cross-validated
accuracy and the smaller black error bars are their standard
deviations.
Fig. 2: Comparison of graph classification accuracies of GCNN variants (TAGCN, GCN, GraphSAGE) for no pooling, SortPool,
DiffPool, Top-k Pool, and SagPool across four datasets (MUTAG, Proteins, IMDB-Binary, Reddit-Binary).
1) Graph Convolution Comparison:
In general, TAGCN
performs better than GCN and GraphSAGE on the four graph
classification benchmarks. However, due to its increased
complexity, TAGCN has higher variance, especially for denser
graph structures. TAGCN performs better as graphs become
less sparse, i.e., as the average degree increases.
We also showed empirically that simply increasing the num-
ber of layers in GCN and GraphSAGE is not analogous to
increasing the order of the polynomial filter in TAGCN. We
attribute the advantage of TAGCN mainly to: 1) passing a
residual connection of the graph signal, and 2) having weights
associated with each power of the adjacency matrix. In
comparison, GCN and GraphSAGE do not improve much after
five layers, perhaps also suffering from oversmoothing.
2) Graph Convolution and Graph Pooling Comparison:
Among the pooling algorithms, DiffPool generally performs
the best. SagPool and SortPool perform better for MUTAG and
Proteins, but similar or worse for IMDB-Binary and Reddit-
Binary. Top-k pool performs poorly, suggesting that it requires
the auto-encoder structure to perform better. In general, only
Diffpool is consistently better than no pooling.
The trends observed for graph convolution carry over when
pooling is added: TAGCN with pooling generally performs
better than GCN and GraphSAGE with pooling, but is more
prone to overfitting, likely for the same reasons.
V. CONCLUSION
On average, TAGCN compares favorably with GCN
and GraphSAGE on graph classification datasets, with and
without pooling, particularly for sparser and larger graphs. We also find
that DiffPool generally outperforms the other pooling methods
evaluated. For future work, we would like to develop a better
theoretical understanding of GCNNs by studying problems
such as oversmoothing and the principled design of architectural
parameters.
REFERENCES
[1] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst,
“Geometric Deep Learning: Going Beyond Euclidean Data,” IEEE
Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, July 2017.
[2] A. Sandryhaila and J. M. F. Moura, “Discrete Signal Processing on
Graphs,” IEEE Trans. Signal Proc., vol. 61, no. 7, pp. 1644–1656, April
2013.
[3] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst,
“Graph Signal Processing: Overview, Challenges, and Applications,”
Proceedings of the IEEE, vol. 106, no. 5, pp. 808–828, May 2018.
[4] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral Networks and
Locally Connected Networks on Graphs,” in International Conference
on Learning Representations (ICLR), 2014.
[5] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional Neural
Networks on Graphs with Fast Localized Spectral Filtering,” in Advances
in Neural Information Processing Systems (NIPS), 2016.
[6] T. N. Kipf and M. Welling, “Semi-Supervised Classification with Graph
Convolutional Networks,” in 5th International Conference on Learning
Representations (ICLR) 2017, Toulon, France, April 24-26, 2017,
Conference Track Proceedings, 2017.
[7] J. Du, S. Zhang, G. Wu, J. M. F. Moura, and S. Kar, “Topology Adaptive
Graph Convolutional Networks,” Computing Research Repository, vol.
abs/1710.10370, 2017. [Online]. Available: http://arxiv.org/abs/1710.
10370
[8] J. Du, J. Shi, S. Kar, and J. M. F. Moura, “On Graph Convolution For
Graph CNNs,” in 2018 IEEE Data Science Workshop (DSW), June 2018,
pp. 1–5.
[9] J. Shi, M. Cheung, J. Du, and J. M. F. Moura, “Classification with
Vertex-Based Graph Convolutional Neural Networks,” in 2018 52nd
Asilomar Conference on Signals, Systems, and Computers, Oct 2018,
pp. 752–756.
[10] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive Representation
Learning on Large Graphs,” in Advances in Neural Information
Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach,
R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates,
Inc., 2017, pp. 1024–1034. [Online]. Available: http://papers.nips.cc/
paper/6703-inductive-representation-learning-on-large-graphs.pdf
[11] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Ben-
gio, “Graph Attention Networks,” in 6th International Conference on
Learning Representations (ICLR), 2018, Vancouver, BC, Canada, April
30 - May 3, 2018, Conference Track Proceedings, 2018.
[12] R. Ying, J. You, C. Morris, X. Ren, W. L. Hamilton, and J. Leskovec,
“Hierarchical Graph Representation Learning with Differentiable
Pooling,” in 32nd International Conference on Neural Information
Processing Systems (NIPS). Curran Associates Inc., 2018, pp. 4805–
4815. [Online]. Available: http://dl.acm.org/citation.cfm?id=3327345.
3327389
[13] H. Gao and S. Ji, “Graph U-Nets,” in 36th International Conference
on Machine Learning, ser. Proceedings of Machine Learning Research,
K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97. Long Beach,
California, USA: PMLR, 09–15 Jun 2019, pp. 2083–2092. [Online].
Available: http://proceedings.mlr.press/v97/gao19a.html
[14] M. Zhang, Z. Cui, M. Neumann, and Y. Chen, “An End-to-End
Deep Learning Architecture for Graph Classification,” in 32nd
AAAI Conference on Artificial Intelligence, 2018. [Online]. Available:
https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17146
[15] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, “Discrete Signal
Processing on Graphs: Sampling Theory,” IEEE Transactions on Signal
Processing, vol. 63, no. 24, pp. 6510–6523, Dec 2015.
[16] A. Anis, A. Gadde, and A. Ortega, “Efficient Sampling Set Selection
for Bandlimited Graph Signals Using Graph Spectral Proxies,” IEEE
Transactions on Signal Processing, vol. 64, no. 14, pp. 3775–3789, July
2016.
[17] D. K. Hammond, P. Vandergheynst, and R. Gribonval, “Wavelets on
Graphs via Spectral Graph Theory,” Applied and Computational Har-
monic Analysis, vol. 30, no. 2, pp. 129 – 150, 2011. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S1063520310000552
[18] E. Luzhnica, B. Day, and P. Liò, “Clique Pooling for Graph Classifi-
cation,” Computing Research Repository, vol. abs/1904.00374, 2019.
[19] Y. Boykov and O. Veksler, “Graph Cuts in Vision and Graphics: Theories
and Applications,” in Handbook of Mathematical Models in Computer
Vision, N. Paragios, Y. Chen, and O. D. Faugeras, Eds. Springer, 2006,
pp. 79–96.
[20] N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn,
and K. M. Borgwardt, “Weisfeiler-Lehman Graph Kernels,” J. Mach.
Learn. Res., vol. 12, pp. 2539–2561, Nov. 2011. [Online]. Available:
http://dl.acm.org/citation.cfm?id=1953048.2078187
[21] J. Lee, I. Lee, and J. Kang, “Self-Attention Graph Pooling,” in
Proceedings of the 36th International Conference on Machine Learning,
ser. Proceedings of Machine Learning Research, K. Chaudhuri and
R. Salakhutdinov, Eds., vol. 97.
Long Beach, California, USA:
PMLR, 09–15 Jun 2019, pp. 3734–3743.
[Online]. Available:
http://proceedings.mlr.press/v97/lee19c.html
[22] A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman,
and C. Hansch, “Structure-activity Relationship of Mutagenic Aromatic
and Heteroaromatic Nitro Compounds. Correlation with Molecular Or-
bital Energies and Hydrophobicity,” Journal of Medicinal Chemistry,
vol. 34, no. 2, pp. 786–797, 1991.
[23] K. M. Borgwardt, C. S. Ong, S. Schönauer, S. V. N. Vishwanathan, A. J. Smola,
and H.-P. Kriegel, “Protein Function Prediction via Graph Kernels,”
Intelligent Systems for Molecular Biology, 2005.
[24] P. Yanardag and S. Vishwanathan, “Deep Graph Kernels,” in Proceedings
of the 21th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, ser. KDD ’15. New York, NY, USA:
ACM, 2015, pp. 1365–1374.
[25] M. Fey and J. E. Lenssen, “Fast Graph Representation Learning with
PyTorch Geometric,” in ICLR Workshop on Representation Learning on
Graphs and Manifolds, 2019.