Читать книгу Artificial Intelligence and Quantum Computing for Advanced Wireless Networks - Savo G. Glisic - Страница 82
5.2.3 Graph Autoencoders (GAEs)
ОглавлениеThese are deep neural architectures that map nodes into a latent feature space and decode graph information from latent representations. GAEs can be used to learn network embeddings or generate new graphs.
Network embedding (encoding) is a low‐dimensional vector representation of a node that preserves a node’s topological information. GAEs learn network embeddings using an encoder to extract network embeddings and using a decoder to enforce network embeddings to preserve the graph topological information such as the PPMI matrix and the adjacency matrix (see Figure 5.6).
Earlier approaches mainly employ multi‐layer perceptrons to build GAEs for network embedding learning. Deep Neural Network for Graph Representations (DNGR) uses a stacked denoising autoencoder to encode and decode the PPMI matrix via multi‐layer perceptrons. Concurrently, Structural Deep Network Embedding (SDNE) uses a stacked autoencoder to jointly preserve the node first‐order proximity and second‐order proximity. SDNE proposes two loss functions on the outputs of the encoder and the outputs of the decoder separately. The first loss function enables the learned network embeddings to preserve the node’s first‐order proximity by minimizing the distance between a node’s network embedding and its neighbors’ network embeddings. The first loss function L1st is defined as
Figure 5.6 A graph autoencoder (GAE) for network embedding. The encoder uses graph convolutional layers to get a network embedding for each node. The decoder computes the pairwise distance given network embeddings. After applying a nonlinear activation function, the decoder reconstructs the graph adjacency matrix. The network is trained by minimizing the discrepancy between the real adjacency matrix and the reconstructed adjacency matrix.
Source: Wu et al. [38].
(5.60)
where xv = Av, : and enc(·) is an encoder that consists of a multi‐layer perceptron. The second loss function enables the learned network embeddings to preserve the node’s second‐order proximity by minimizing the distance between a node’s inputs and its reconstructed inputs and is defined as
(5.61)
where bv, u = 1 if Av, u = 0, bv, u = β > 1 if Av, u = 1, and dec(·) is a decoder that consists of a multi‐layer perceptron.
DNGR [54] and SDNE [55] only consider node structural information about the connectivity between pairs of nodes. They ignore the fact that the nodes may contain feature information that depicts the attributes of nodes themselves. Graph Autoencoder (GAE*) [56] leverages GCN [14] to encode node structural information and node feature information at the same time. The encoder of GAE* consists of two graph convolutional layers, which takes the form
where Z denotes the network embedding matrix of a graph, f(·) is a ReLU activation function, and the Gconv (·) function is a graph convolutional layer defined by Eq. (5.44). The decoder of GAE* aims to decode node relational information from their embeddings by reconstructing the graph adjacency matrix, which is defined as
(5.63)
where zv is the embedding of node v. GAE* is trained by minimizing the negative cross‐entropy given the real adjacency matrix A and the reconstructed adjacency matrix
Simply reconstructing the graph adjacency matrix may lead to overfitting due to the capacity of the autoencoders. The variational graph autoencoder (VGAE) [56] is a variational version of GAE that was developed to learn the distribution of data. The VGAE optimizes the variational lower bound L:
where KL(·) is the Kullback–Leibler divergence function, which measures the distance between two distributions; p(Z) is a Gaussian prior , with q(zi ∣ X, A) = N(zi ∣ μi , diag .
The mean vector μi is the i‐th row of an encoder’s outputs defined by Eq. (5.62), and log σi is derived similarly as μi with another encoder. According to Eq. (5.64), VGAE assumes that the empirical distribution q(Z ∣ X, A) should be as close as possible to the prior distribution p(Z). To further enforce this, the empirical distribution q(Z ∣ X, A) is chosen to approximate the prior distribution p(Z).
Like GAE*, GraphSAGE [23] encodes node features with two graph convolutional layers. Instead of optimizing the reconstruction error, GraphSAGE shows that the relational information between two nodes can be preserved by negative sampling with the loss:
(5.65)
where node u is a neighbor of node v, node vn is a distant node to node v and is sampled from a negative sampling distribution Pn(v), and Q is the number of negative samples. This loss function essentially imposes similar representations on close nodes and dissimilar representations on distant nodes.
Deep Recursive Network Embedding (DRNE) [57] assumes that a node’s network embedding should approximate the aggregation of its neighborhood network embeddings. It adopts an LSTM network [26] to aggregate a node’s neighbors. The reconstruction error of DRNE is defined as
where zv is the network embedding of node v obtained by a dictionary look‐up, and the LSTM network takes a random sequence of node v′s neighbors ordered by their node degree as inputs. As suggested by Eq. (5.66), DRNE implicitly learns network embeddings via an LSTM network rather than by using the LSTM network to generate network embeddings. It avoids the problem that the LSTM network is not invariant to the permutation of node sequences.Network representations with adversarially regularized autoencoders (NetRA) [58] proposes a graph encoder‐decoder framework with a general loss function, defined as
(5.67)
where dist (·) is the distance measure between the node embedding z and the reconstructed z. The encoder and decoder of NetRA are LSTM networks with random walks rooted on each node v ∈ V as inputs.
Graph generation (decoding): With multiple graphs, GAEs are able to learn the generative distribution of graphs by encoding graphs into hidden representations and decoding a graph structure given the hidden representations. These methods either propose a new graph in a sequential manner or in a global manner.
Sequential approaches generate a graph by proposing nodes and edges step by step.
Deep Generative Model of Graphs (DeepGMG) [59] assumes that the probability of a graph is the sum over all possible node permutations:
(5.68)
where π denotes a node ordering. It captures the complex joint probability of all nodes and edges in the graph. DeepGMG generates graphs by making a sequence of decisions, namely, whether to add a node, which node to add, whether to add an edge, and which node to connect to the new node. The decision process of generating nodes and edges is conditioned on the node states and the graph state of a growing graph updated by an RecGNN.
Global approaches output a graph all at once.
Graph variational autoencoder (GraphVAE) [60] models the existence of nodes and edges as independent random variables. By assuming the posterior distribution qφ (z∣ G) defined by an encoder and the generative distribution pθ(G ∣ z) defined by a decoder, GraphVAE optimizes the variational lower bound:
(5.69)
where p(z) follows a Gaussian prior, φ and θ are learnable parameters. With a ConvGNN as the encoder and a simple multi‐layer perception as the decoder, GraphVAE outputs a generated graph with its adjacency matrix, node attributes, and edge attributes