5.1 Concept of Graph Neural Network (GNN)
As a start, in this section we present a brief overview of the GNN concept, and then in the subsequent sections we discuss some of the most popular variants in more detail.
The basic idea here is to extend existing neural networks for the purpose of processing data represented in graph domains [1]. In a graph, each node is defined by its own features and the features of the related nodes. The target of the GNN is to learn, for each node, a state embedding h_v ∈ ℝ^s that contains information on its neighborhood. The state embedding h_v is an s-dimensional vector of node v and can be used to produce an output o_v. Let f be a parametric function, called the local transition function, that is shared among all nodes and updates the node state according to the input neighborhood. Let g be the local output function that describes how the output is produced. Then, h_v and o_v are defined as

h_v = f(x_v, x_co[v], h_ne[v], x_ne[v]),   o_v = g(h_v, x_v)    (5.1)
where x_v, x_co[v], h_ne[v], and x_ne[v] are the features of v, the features of its edges, the states of its neighboring nodes, and the features of its neighboring nodes, respectively. If H, O, X, and X_N are the vectors constructed by stacking all the states, all the outputs, all the features, and all the node features, respectively, then we can write

H = F(H, X),   O = G(H, X_N)    (5.2)

In Eq. (5.2), F, the global transition function, and G, the global output function, are stacked versions of f and g for all nodes in a graph, respectively.
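To make Eqs. (5.1) and (5.2) more concrete, the following minimal sketch (Python/NumPy) applies a shared transition function f and output function g to every node of a toy three-node graph. The single-layer parameterization, the neighbor-sum aggregation that gives f a fixed-size input, the omission of the edge features x_co[v], and all names and dimensions in the snippet are simplifying assumptions made for illustration only; they are not part of the model described above.

```python
# Minimal NumPy sketch of the local transition function f and local output
# function g in Eq. (5.1). Dimensions, the toy graph, and the neighbor-sum
# aggregation are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 4      # dimension of node features x_v
STATE_DIM = 3     # dimension of the state embedding h_v (s in the text)

# Shared parameters: the same weights are used by f and g for every node.
W_f = rng.normal(scale=0.1, size=(STATE_DIM, FEAT_DIM + FEAT_DIM + STATE_DIM))
W_g = rng.normal(scale=0.1, size=(1, STATE_DIM + FEAT_DIM))

def transition(x_v, x_ne, h_ne):
    """f(x_v, x_ne[v], h_ne[v]): one-layer net; neighbor terms are summed so
    the input size does not depend on the node degree (edge features omitted)."""
    z = np.concatenate([x_v, x_ne.sum(axis=0), h_ne.sum(axis=0)])
    return np.tanh(W_f @ z)                    # new state h_v in R^STATE_DIM

def output(h_v, x_v):
    """g(h_v, x_v): maps the state of node v to its output o_v."""
    return W_g @ np.concatenate([h_v, x_v])

# Toy graph: 3 nodes, given as neighbor lists.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
X = rng.normal(size=(3, FEAT_DIM))             # node features
H = np.zeros((3, STATE_DIM))                   # initial states H(0)

# One application of the stacked global transition F of Eq. (5.2):
# apply the shared f (and then g) to every node of the graph.
H_new = np.stack([transition(X[v], X[neighbors[v]], H[neighbors[v]])
                  for v in range(3)])
O = np.stack([output(H_new[v], X[v]) for v in range(3)])
print(H_new.shape, O.shape)                    # (3, 3) (3, 1)
```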
The value of H is the fixed point of Eq. (5.2) and is uniquely defined under the assumption that F is a contraction map. The GNN uses the following iterative scheme for computing the state (Banach's fixed-point theorem [2]):

H^{t+1} = F(H^t, X)    (5.3)

where H^t denotes the t-th iteration of H. The dynamical system of Eq. (5.3) converges exponentially fast to the solution of Eq. (5.2) for any initial value H(0). Note that the computations described by f and g can be interpreted as feedforward neural networks.
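The fixed-point computation of Eq. (5.3) can be sketched in a few lines. In the toy example below, the global transition F is replaced by a small tanh map whose weights are scaled down so that it is a contraction; this stand-in, together with all names and dimensions in the snippet, is an assumption made only to show the iteration converging, not the GNN's actual parameterization.

```python
# Illustrative sketch of the iterative scheme in Eq. (5.3): H is repeatedly
# passed through a stand-in for the global transition F until it stops changing.
import numpy as np

rng = np.random.default_rng(1)
n_nodes, FEAT_DIM, STATE_DIM = 5, 4, 3

X = rng.normal(size=(n_nodes, FEAT_DIM))
A = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM))   # state-to-state weights
B = rng.normal(scale=0.1, size=(STATE_DIM, FEAT_DIM))    # feature-to-state weights

def global_transition(H, X):
    """Stand-in for F(H, X); with small weights it is a contraction in H."""
    return np.tanh(H @ A.T + X @ B.T)

H = np.zeros((n_nodes, STATE_DIM))            # initial value H(0); any start works
for t in range(200):
    H_next = global_transition(H, X)
    if np.linalg.norm(H_next - H) < 1e-6:     # fixed point reached, H(T) ≈ H
        print(f"converged after {t + 1} iterations")
        break
    H = H_next
```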
Given the framework of GNN, how do we learn the parameters of f and g? With the target information (t_v for a specific node) for the supervision, the loss can be written as

loss = Σ_{i=1}^{p} (t_i − o_i)    (5.4)
where p is the number of supervised nodes. The learning algorithm is based on a gradient-descent strategy and is composed of the following steps (a toy sketch of all three steps is given after the list):
1 The states are iteratively updated by Eq. (5.1) until a time T; they then approach the fixed-point solution of Eq. (5.2), H(T) ≈ H.
2 The gradient of weights W is computed from the loss.
3 The weights W are updated according to the gradient computed in the last step.
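Below is a hedged, end-to-end sketch of the three training steps for a toy linear GNN: the state is iterated toward its fixed point, the loss over the supervised nodes is evaluated, and the weights W are moved along the negative gradient. Two deliberate simplifications: a squared error (t_v − o_v)^2 is used so that the objective has a well-defined minimum, and the gradient is obtained by finite differences rather than the backpropagation-style scheme of the original GNN. All names, dimensions, and targets in the snippet are illustrative assumptions.

```python
# Toy end-to-end sketch of the three training steps: (1) iterate the state to
# (approximate) convergence, (2) evaluate a squared-error variant of Eq. (5.4)
# on the supervised nodes, (3) update W by gradient descent (finite-difference
# gradients are used here purely for brevity).
import numpy as np

rng = np.random.default_rng(2)
n, FEAT_DIM, STATE_DIM = 6, 3, 2
X = rng.normal(size=(n, FEAT_DIM))
targets = {0: 1.0, 3: -1.0}                   # supervised nodes with target t_v

def forward(W, X, T=50):
    """Step 1: iterate H(t+1) = F(H(t), X) until t = T, then apply the output map."""
    H = np.zeros((n, STATE_DIM))
    for _ in range(T):
        H = np.tanh(H @ W["A"].T + X @ W["B"].T)   # contraction for small weights
    O = H @ W["C"].T                               # o_v = g(h_v)
    return O[:, 0]

def loss(W):
    """Step 2: squared-error variant of Eq. (5.4) over the p supervised nodes."""
    O = forward(W, X)
    return sum((t - O[v]) ** 2 for v, t in targets.items())

W = {"A": rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM)),
     "B": rng.normal(scale=0.1, size=(STATE_DIM, FEAT_DIM)),
     "C": rng.normal(scale=0.1, size=(1, STATE_DIM))}

lr, eps = 0.1, 1e-5
for step in range(100):
    # Finite-difference estimate of the gradient of the loss w.r.t. every weight.
    grads = {k: np.zeros_like(v) for k, v in W.items()}
    for k in W:
        for idx in np.ndindex(W[k].shape):
            W[k][idx] += eps
            up = loss(W)
            W[k][idx] -= 2 * eps
            down = loss(W)
            W[k][idx] += eps
            grads[k][idx] = (up - down) / (2 * eps)
    # Step 3: move the weights along the negative gradient.
    for k in W:
        W[k] -= lr * grads[k]
    if step % 25 == 0:
        print(f"step {step:3d}  loss = {loss(W):.4f}")
```

Running the sketch prints a decreasing loss, illustrating how the fixed-point forward pass and the gradient-descent update interact; any practical implementation would replace the finite-difference gradient with proper backpropagation.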