
Theorem 5.2


(Backpropagation): Let Fw and Gw be the transition and the output functions of a GNN, respectively, and assume that Fw(x, l) and Gw(x, lN) are continuously differentiable w.r.t. x and w. Let z(t) be defined by

z(t) = z(t + 1) · (∂Fw/∂x)(x, l) + (∂ew/∂o) · (∂Gw/∂x)(x, lN) (5.80)

Then, the sequence z(T), z(T − 1), … converges to a vector z = lim t → −∞ z(t), and the convergence is exponential and independent of the initial state z(T). In addition, we have

∂ew/∂w = (∂ew/∂o) · (∂Gw/∂w)(x, lN) + z · (∂Fw/∂w)(x, l) (5.81)

where x is the stable state of the GNN (for the proof, see [1]).
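To make the convergence claim concrete, the following minimal numpy sketch iterates a recursion of the same form as Eq. (5.80), z(t) = z(t + 1)·A + b, with A standing in for (∂Fw/∂x)(x, l) and b for (∂ew/∂o)·(∂Gw/∂x)(x, lN). The matrix values are hypothetical toy data, not taken from the text; A is simply scaled so that ‖A‖1 < 1. Two different initializations reach the same fixed point z = b(I − A)^(−1), illustrating the independence from z(T) stated in the theorem.

```python
import numpy as np

# Toy stand-ins for the Jacobians in Eq. (5.80) (hypothetical values).
rng = np.random.default_rng(0)
s = 4                                    # state dimension
A = rng.standard_normal((s, s))
A *= 0.9 / np.abs(A).sum(axis=0).max()   # force ||A||_1 < 1 (contraction)
b = rng.standard_normal(s)               # row vector (de_w/do) * (dG_w/dx)

def backward_iteration(z_init, n_steps=500):
    """Iterate z(t) = z(t + 1) @ A + b backward in time."""
    z = z_init.copy()
    for _ in range(n_steps):
        z = z @ A + b
    return z

z_from_zeros = backward_iteration(np.zeros(s))
z_from_rand = backward_iteration(rng.standard_normal(s))
z_closed_form = b @ np.linalg.inv(np.eye(s) - A)   # fixed point of Eq. (5.80)

print(np.allclose(z_from_zeros, z_from_rand))      # True: the limit ignores z(T)
print(np.allclose(z_from_zeros, z_closed_form))    # True: z = b (I - A)^(-1)
```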

The relationship between the gradient defined by Eq. (5.81) and the gradient computed by the Almeida–Pineda algorithm can be easily recognized. The first term on the right‐hand side of Eq. (5.81) represents the contribution to the gradient due to the output function Gw. Backpropagation calculates the first term while it is propagating the derivatives through the layer of the functions gw (see Chapter 3, Figure 3.10). The second term represents the contribution due to the transition function Fw. In fact, from Eq. (5.80)

z(t) = z(T) [(∂Fw/∂x)(x, l)]^(T − t) + ∑_{i=t+1}^{T} (∂ew/∂o) · (∂Gw/∂x)(x, lN) · [(∂Fw/∂x)(x, l)]^(T − i)

If we assume z(T) = ∂ew(T)/∂o(T) · (∂Gw/∂x(T))(x(T), lN) and x(t) = x, for t0 ≤ t ≤ T, it follows that

z(t) = ∑_{i=t}^{T} ∂ew(T)/∂x(i)
So, Eq. (5.80) accumulates the ∂ew(T)/∂x(i) into the variable z. This mechanism corresponds to backpropagating the gradients through the layers containing the fw units. The learning algorithm is detailed in Table 5.1 [1]. It consists of a main procedure and two functions: FORWARD and BACKWARD. The function FORWARD takes as input the current set of parameters w and iterates until convergence to the fixed point. The iteration is stopped when ‖x(t) − x(t − 1)‖ is less than a given threshold εf according to a given norm ‖ · ‖. The function BACKWARD computes the gradient: the system in Eq. (5.80) is iterated until ‖z(t − 1) − z(t)‖ is smaller than a threshold εb; then, the gradient is calculated by Eq. (5.81).
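The accumulation just described can also be checked numerically. The short sketch below, using the same hypothetical stand-ins A and b for the Jacobians as in the previous sketch, compares z(t) obtained by iterating Eq. (5.80) from z(T) = b with the explicit sum of the backpropagated terms ∂ew(T)/∂x(i) = b·A^(T − i) over the unfolded layers.

```python
import numpy as np
from numpy.linalg import matrix_power

rng = np.random.default_rng(0)
s = 4
A = rng.standard_normal((s, s))
A *= 0.9 / np.abs(A).sum(axis=0).max()   # stand-in for (dFw/dx)(x, l), ||A||_1 < 1
b = rng.standard_normal(s)               # stand-in for (de_w/do) * (dGw/dx)(x, lN)

T, t = 30, 10
# Iterate Eq. (5.80) backward from z(T) = b down to z(t).
z = b.copy()
for _ in range(T - t):
    z = z @ A + b

# BPTT view: accumulate de_w(T)/dx(i) = b A^(T - i) over the unfolded layers i = t..T.
bptt_sum = sum(b @ matrix_power(A, T - i) for i in range(t, T + 1))

print(np.allclose(z, bptt_sum))          # True: Eq. (5.80) accumulates the de_w(T)/dx(i)
```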

The function FORWARD computes the states, whereas BACKWARD calculates the gradient. The procedure MAIN minimizes the error by calling FORWARD and BACKWARD iteratively.

Transition and output function implementations: The implementation of the local output function gw does not need to fulfill any particular constraint. In GNNs, gw is a multilayered FNN. On the other hand, the local transition function fw plays a crucial role in the proposed model, since its implementation determines the number and the existence of the solutions of Eq. (5.74). The assumption underlying GNN is that the design of fw is such that the global transition function Fw is a contraction map with respect to the state x. In the following, we describe two neural network models that fulfill this purpose using different strategies. These models are based on the nonpositional form described by Eq. (5.76). It can be easily observed that there exist two corresponding models based on the positional form as well.

1 Linear (nonpositional) GNN. Eq. (5.76) can naturally be implemented by

hw(ln, l(n, u), lu, xu) = An, u xu + bn (5.82)

where the vector bn ∈ Rs and the matrix An, u ∈ Rs × s are defined by the output of two FNNs, whose parameters correspond to the parameters of the GNN. More precisely, let us call the transition network an FNN that has to generate An, u and the forcing network another FNN that has to generate bn. Let φw and ρw be the functions implemented by the transition and the forcing network, respectively. Then, we define

An, u = (μ/(s∣ne[u]∣)) · Θ, bn = ρw(ln) (5.83)

where μ ∈ (0, 1) and Θ = resize(φw(ln, l(n, u), lu)) hold, and resize(·) denotes the operator that allocates the elements of an s²‐dimensional vector into an s × s matrix. Thus, An, u is obtained by arranging the outputs of the transition network into the square matrix Θ and by multiplication with the factor μ/(s∣ne[u]∣). On the other hand, bn is just a vector that contains the outputs of the forcing network. In the following, we denote the 1‐norm of a matrix M = {mi, j} as ‖M‖1 = maxj ∑i ∣mi, j∣ and assume that ‖φw(ln, l(n, u), lu)‖1 ≤ s holds; this can be straightforwardly verified if the output neurons of the transition network use an appropriately bounded activation function, for example, a hyperbolic tangent. A numerical sketch of this construction is given after the contraction argument below.

Table 5.1 [1] Learning algorithm.

Main
  initialize w;
  x = Forward(w);
  repeat
    ∂ew/∂w = Backward(x, w);
    update w with a gradient step along −∂ew/∂w;
    x = Forward(w);
  until (a stopping criterion);
  return w;
end

Forward(w)
  initialize x(0), t = 0;
  repeat
    x(t + 1) = Fw(x(t), l);
    t = t + 1;
  until ‖x(t) − x(t − 1)‖ ≤ εf;
  return x(t);
end

Backward(x, w)
  iterate z(t) = z(t + 1) · (∂Fw/∂x)(x, l) + (∂ew/∂o) · (∂Gw/∂x)(x, lN)   [Eq. (5.80)]
    until ‖z(t − 1) − z(t)‖ ≤ εb;
  compute ∂ew/∂w by Eq. (5.81);
  return ∂ew/∂w;
end
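The table can be read as the following Python sketch. It is a minimal illustration under stated assumptions, not the reference implementation from [1]: the transition function Fw, the Jacobian callables dFw_dx, dFw_dw, dGw_dx, dGw_dw, and the error derivative dew_do are assumed to be supplied by the caller as plain functions returning numpy arrays (row-vector convention), the weight update is a fixed-step gradient descent with rate lr, and eps_f, eps_b play the roles of εf and εb. BACKWARD is filled in from the description in the text: iterate Eq. (5.80) until the threshold εb, then assemble the gradient of Eq. (5.81).

```python
import numpy as np

def forward(Fw, x0, l, w, eps_f=1e-6, max_iter=10_000):
    """FORWARD: iterate x(t + 1) = Fw(x(t), l, w) until ||x(t) - x(t - 1)|| <= eps_f."""
    x = x0
    for _ in range(max_iter):
        x_new = Fw(x, l, w)
        if np.linalg.norm(x_new - x, 1) <= eps_f:
            return x_new
        x = x_new
    return x

def backward(x, l, lN, w, dFw_dx, dFw_dw, dGw_dx, dGw_dw, dew_do,
             eps_b=1e-6, max_iter=10_000):
    """BACKWARD: iterate Eq. (5.80) to its fixed point z, then apply Eq. (5.81)."""
    A = dFw_dx(x, l, w)              # Jacobian of the transition function w.r.t. x
    b = dew_do(x, lN, w) @ dGw_dx(x, lN, w)
    z = b.copy()                     # initialization of z(T); the limit does not depend on it
    for _ in range(max_iter):
        z_new = z @ A + b            # Eq. (5.80)
        converged = np.linalg.norm(z_new - z, 1) <= eps_b
        z = z_new
        if converged:
            break
    # Eq. (5.81): contributions of the output function and of the transition function
    return dew_do(x, lN, w) @ dGw_dw(x, lN, w) + z @ dFw_dw(x, l, w)

def main(Fw, x0, l, lN, w, jacobians, lr=0.01, n_epochs=100):
    """MAIN: alternate FORWARD and BACKWARD and take gradient steps on w."""
    for _ in range(n_epochs):
        x = forward(Fw, x0, l, w)
        grad = backward(x, l, lN, w, **jacobians)
        w = w - lr * grad            # stopping criterion here: fixed number of epochs
    return w
```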


Note that in the linear (nonpositional) case Fw(x, l) = Ax + b, where b is the vector constructed by stacking all the bn, and A is a block matrix {Ān, u}, with Ān, u = An, u if u is a neighbor of n and Ān, u = 0 otherwise. Vectors bn and matrices An, u do not depend on the state x, but only on the node and edge labels. Thus, ∂Fw/∂x = A, and, by simple algebra, we have

‖∂Fw/∂x‖1 = ‖A‖1 ≤ maxu ∑n ∈ ne[u] ‖An, u‖1 ≤ maxu (μ/(s∣ne[u]∣)) ∑n ∈ ne[u] ‖φw(ln, l(n, u), lu)‖1 ≤ μ,

which implies that Fw is a contraction map (w.r.t. ‖ · ‖1) for any set of parameters w.
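As a check of this bound, the sketch below builds the blocks An, u of Eq. (5.83) for a small hypothetical undirected graph, using a randomly initialized tanh "transition network" in place of a trained φw, and verifies that the assembled block matrix A satisfies ‖A‖1 ≤ μ. The graph, the label dimensions, and the two-layer network are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
s, ln_dim, le_dim, mu = 3, 2, 2, 0.9            # state/label sizes and contraction constant (assumed)

# A small undirected toy graph given as neighbor lists ne[n].
ne = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
node_labels = {n: rng.standard_normal(ln_dim) for n in ne}
edge_labels = {(n, u): rng.standard_normal(le_dim) for n in ne for u in ne[n]}

# Untrained stand-in for the transition network phi_w: tanh outputs keep ||Theta||_1 <= s.
W1 = rng.standard_normal((16, 2 * ln_dim + le_dim))
W2 = rng.standard_normal((s * s, 16))

def phi_w(l_n, l_nu, l_u):
    h = np.tanh(W1 @ np.concatenate([l_n, l_nu, l_u]))
    return np.tanh(W2 @ h)                      # s*s values, each in (-1, 1)

def A_block(n, u):
    """A_{n,u} from Eq. (5.83): resize the transition-network output and scale it."""
    theta = phi_w(node_labels[n], edge_labels[(n, u)], node_labels[u]).reshape(s, s)
    return mu / (s * len(ne[u])) * theta

# Assemble the global block matrix A (zero block when u is not a neighbor of n).
N = len(ne)
A = np.zeros((N * s, N * s))
for n in ne:
    for u in ne[n]:
        A[n * s:(n + 1) * s, u * s:(u + 1) * s] = A_block(n, u)

norm_1 = np.abs(A).sum(axis=0).max()            # ||A||_1 = maximum column sum
print(norm_1, norm_1 <= mu)                     # contraction: ||A||_1 <= mu < 1
```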

2 Nonlinear (nonpositional) GNN. In this case, hw is realized by a multilayered feedforward NN. Since three‐layered neural networks are universal approximators [67], hw can approximate any desired function. However, not all the parameters w can be used, because it must be ensured that the corresponding transition function Fw is a contraction map. This can be achieved by adding a penalty term L(‖∂Fw/∂x‖1) to the cost function of Eq. (5.79), where the penalty L(y) is (y − μ)2 if y > μ and 0 otherwise, and the parameter μ ∈ (0, 1) defines the desired contraction constant of Fw. More generally, the penalty term can be any expression, differentiable with respect to w, that is monotone increasing with respect to the norm of the Jacobian. For example, in our experiments, we use the penalty term ∑i L(‖Ai‖1), where Ai is the i‐th column of ∂Fw/∂x. In fact, such an expression is an approximation of L(‖∂Fw/∂x‖1) = L(maxi ‖Ai‖1).
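As an illustration of this penalty, the sketch below computes ∑i L(‖Ai‖1) for a given Jacobian matrix. The Jacobian here is a random stand-in for ∂Fw/∂x, and the helper names are hypothetical, not part of the text.

```python
import numpy as np

def penalty_L(y, mu):
    """L(y) = (y - mu)^2 if y > mu, else 0."""
    return (y - mu) ** 2 if y > mu else 0.0

def contraction_penalty(jacobian, mu=0.9):
    """Sum of L(||A_i||_1) over the columns A_i of the Jacobian dFw/dx.

    Surrogate for L(||dFw/dx||_1) = L(max_i ||A_i||_1): it penalizes every
    column whose 1-norm exceeds the desired contraction constant mu.
    """
    column_norms = np.abs(jacobian).sum(axis=0)          # ||A_i||_1 for each column i
    return sum(penalty_L(norm, mu) for norm in column_norms)

# Example: a random stand-in for dFw/dx whose columns may violate the bound.
rng = np.random.default_rng(2)
J = rng.standard_normal((6, 6))
print(contraction_penalty(J, mu=0.9))   # > 0 whenever some column norm exceeds mu
```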

