Читать книгу Artificial Intelligence and Quantum Computing for Advanced Wireless Networks - Savo G. Glisic - Страница 80
5.2.1 RecGNNs
ОглавлениеThese networks go back to the beginnings of GNNs. They apply the same set of parameters recurrently over nodes in a graph to extract high‐level node representations. Limited by computational power, earlier research mainly focused on directed acyclic graphs, while later works extended these models to handle general types of graphs, for example, acyclic, cyclic, directed, and undirected graphs [1]. Based on an information diffusion mechanism, this particular GNN (in the sequel referred to as GNN*) updates nodes’ states by exchanging neighborhood information recurrently until a stable equilibrium is reached. A node’s hidden state is recurrently updated by
(5.33)
where f(·) is a parametric function, and is initialized randomly. The sum operation enables GNN* to be applicable to all nodes, even if the number of neighbors differs and no neighborhood ordering is known. To ensure convergence, the recurrent function f(·) must be a contraction mapping, which shrinks the distance between two points after projecting them into a latent space. If f(·) is a neural network, a penalty term has to be imposed on the Jacobian matrix of parameters. When a convergence criterion is satisfied, the last step node’s hidden states are forwarded to a readout layer. GNN* alternates the stage of node state propagation and the stage of parameter gradient computation to minimize a training objective. This strategy enables GNN to handle cyclic graphs.
Graph Echo State Network (GraphESN), developed in follow‐up works [40], extends echo state networks to improve the training efficiency of GNN*. GraphESN consists of an encoder and an output layer. The encoder is randomly initialized and requires no training. It implements a contractive state transition function to recurrently update node states until the global graph state reaches convergence. Afterward, the output layer is trained by taking the fixed node states as inputs.
Figure 5.2 Illustrations of ConvGNN network: (a) A ConvGNN with multiple graph convolutional layers. A graph convolutional layer encapsulates each node’s hidden representation by aggregating feature information from its neighbors. After feature aggregation, a nonlinear transformation is applied to the resulted outputs. By stacking multiple layers, the final hidden representation of each node receives messages from a further neighborhood. (b) Recurrent Graph Neural Networks (RecGNNs) use the same graph recurrent layer (Grec) to update node representations. (c) Convolutional Graph Neural Networks (ConvGNNs) use a different graph convolutional layer (Gconv) to update node representations.
Source: Wu et al. [38].
Figure 5.3 2D Convolution versus graph convolution: (a) 2D convolution. Analogous to a graph, each pixel in an image is taken as a node where neighbors are determined by the filter size. The 2D convolution takes the weighted average of pixel values of the gray node along with its neighbors. The neighbors of a node are ordered and have a fixed size. (b) Graph convolution. To get a hidden representation of the gray node, one simple solution of the graph convolutional operation is to take the average value of the node features of the gray node along with its neighbors. Unlike image data, the neighbors of a node are unordered and variable in size.
Source: Wu et al. [38].
GGNN [41] employs a GRU [42] as a recurrent function, reducing the recurrence to a fixed number of steps. The advantage is that it no longer needs to constrain parameters to ensure convergence. A node’s hidden state is updated by its previous hidden states and its neighboring hidden states, defined as
Figure 5.4 Parametric graph convolution: (a) Conventional graph convolutional concept. (b) Parametric graph convolution, where r controls the maximum distance of the considered neighborhood, and the dimensionality of the output [39].
Figure 5.5 A ConvGNN with pooling and readout layers for graph classification. A graph convolutional layer is followed by a pooling layer to coarsen a graph into subgraphs so that node representations on coarsened graphs represent higher graph‐level representations. A readout layer summarizes the final graph representation by taking the sum/mean of hidden representations of subgraphs.
Source: Wu et al. [38].
(5.34)
where . Unlike GNN and GraphESN, GGNN uses the backpropagation through time (BPTT) algorithm to learn the model parameters. This can be problematic for large graphs, as GGNN needs to run the recurrent function multiple times over all nodes, requiring the intermediate states of all nodes to be stored in memory.
Stochastic Steady‐state Embedding (SSE) uses a learning algorithm that is more scalable to large graphs [43]. It updates a node’s hidden states recurrently in a stochastic and asynchronous fashion. It alternatively samples a batch of nodes for state update and a batch of nodes for gradient computation. To maintain stability, the recurrent function of SSE is defined as a weighted average of the historical states and new states, which takes the form
where α is a hyperparameter, and is initialized randomly. Although conceptually important, SSE does not theoretically prove that the node states will gradually converge to fixed points by applying Eq. (5.35) repeatedly.