Artificial Intelligence and Quantum Computing for Advanced Wireless Networks, Savo G. Glisic, Page 47
3.4.2 Feedback Options in Recurrent Neural Networks
Feedback in recurrent neural networks: In Figure 3.11, the inputs to the network are drawn from the discrete time signal y(k). Conceptually, it is straightforward to consider connecting delayed versions of the output, ŷ(k), of the network to its input. Such connections, however, introduce feedback into the network, and therefore the stability of such networks must be considered. The provision of feedback, with delay, introduces memory to the network and so is appropriate for prediction. The feedback within recurrent neural networks can be achieved in either a local or a global manner. An example of a recurrent neural network with connections for both local and global feedback is shown in Figure 3.11. The local feedback is achieved by the introduction of feedback within the hidden layer, whereas the global feedback is produced by the connection of the network output to the network input. Interneuron connections can also exist in the hidden layer, but they are not shown in Figure 3.11. Although explicit delays are not shown in the feedback connections, they are assumed to be present within the neurons for the network to be realizable. The operation of a recurrent neural network predictor that employs global feedback can now be represented by
ŷ(k) = Φ(y(k − 1), …, y(k − p), ŷ(k − 1), …, ŷ(k − q))    (3.60)
Figure 3.11 Recurrent neural network.
where again Φ(·) represents the nonlinear mapping of the neural network, and ŷ(k − j), j = 1, …, q, are the delayed predictions fed back from the output.
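A predictor of the form (3.60) can be run directly once the mapping Φ is available. The sketch below is a minimal illustration, assuming a toy tanh-based stand-in for the trained mapping Φ(·) and illustrative orders p = 3 and q = 2; none of these choices come from the text.

```python
import numpy as np

def global_feedback_predict(phi, y_hist, yhat_hist):
    """One prediction step of Eq. (3.60): the mapping phi sees the
    past inputs y(k-1..k-p) and the past predictions yhat(k-1..k-q),
    fed back globally from the network output to its input."""
    return phi(np.concatenate([y_hist, yhat_hist]))

# Hypothetical stand-in for the trained nonlinear mapping Phi(.)
phi = lambda x: float(np.tanh(x).mean())

y_hist = np.array([0.1, 0.2, 0.3])     # y(k-1), y(k-2), y(k-3); p = 3
yhat_hist = np.array([0.15, 0.25])     # yhat(k-1), yhat(k-2); q = 2
yhat_k = global_feedback_predict(phi, y_hist, yhat_hist)
```

Feeding ŷ(k) back into `yhat_hist` and shifting the buffers yields multi-step (autoregressive) prediction.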
State‐space representation and canonical form: Any feedback network can be cast into a canonical form that consists of a feedforward (static) network (FFSN) (i) whose outputs are the outputs of the neurons that have the desired values, and the values of the state variables, and (ii) whose inputs are the inputs of the network and the values of the state variables, the latter being delayed by one time unit.
The general canonical form of a recurrent neural network is represented in Figure 3.12. If the state is assumed to contain N variables, then a state vector is defined as s(k) = [s1(k), s2(k), … , sN(k)]T, and a vector of p external inputs is given by y(k − 1) = [y(k − 1), y(k − 2), … , y(k − p)]T. The state evolution and output equations of the recurrent network for prediction are given, respectively, by
Figure 3.12 Canonical form of a recurrent neural network for prediction.
Figure 3.13 Recurrent neural network (RNN) architectures: (a) activation feedback and (b) output feedback.
s(k) = φ(s(k − 1), y(k − 1))    (3.61)
ŷ(k) = Ψ(s(k))    (3.62)
where φ and Ψ represent general classes of nonlinearities.
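The state-space recursion (3.61)–(3.62) can be iterated directly. The following is a minimal sketch, assuming simple tanh-based choices for φ and Ψ; the matrices A, B, C and the dimensions N = 4, p = 3 are illustrative, not from the text.

```python
import numpy as np

N, p = 4, 3                              # state and external-input dimensions
rng = np.random.default_rng(0)
A = 0.1 * rng.normal(size=(N, N))        # state-to-state weights (assumed)
B = 0.1 * rng.normal(size=(N, p))        # input-to-state weights (assumed)
C = rng.normal(size=(1, N))              # state-to-output weights (assumed)

def step(s_prev, y_prev):
    s = np.tanh(A @ s_prev + B @ y_prev)  # state evolution, Eq. (3.61)
    y_hat = np.tanh(C @ s)                # output equation, Eq. (3.62)
    return s, y_hat

s = np.zeros(N)                           # state delayed by one time unit
for k in range(10):
    y_prev = rng.normal(size=p)           # external inputs y(k-1), ..., y(k-p)
    s, y_hat = step(s, y_prev)
```

The one-unit delay of the state on the feedback path is what makes the canonical form realizable: `step` only ever reads the previous state.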
Recurrent neural network (RNN) architectures: Activation feedback and output feedback are two ways to include recurrent connections in neural networks, as shown in Figure 3.13a and b, respectively.
The output of a neuron shown in Figure 3.13a can be expressed as
v(k) = Σ_{i=0}^{M} ωu,i(k) u(k − i) + Σ_{i=1}^{N} ωv,i(k) v(k − i),  y(k) = Φ(v(k))    (3.63)
where ωu,i and ωv,i are the weights associated with u and v, respectively. In the case of Figure 3.13b, we have
y(k) = Φ( Σ_{i=0}^{M} ωu,i(k) u(k − i) + Σ_{j=1}^{N} ωy,j(k) y(k − j) )    (3.64)
where ωy,j are the weights associated with the delayed outputs. The previous networks exhibit a locally recurrent structure, but when connected into a larger network, they have a feedforward architecture and are referred to as locally recurrent–globally feedforward (LRGF) architectures. A general LRGF architecture is shown in Figure 3.14. It allows dynamic synapses to be included within both the input (represented by H1, …, HM) and the output feedback (represented by HFB), thereby encompassing several of the aforementioned schemes. Some typical examples of these networks are shown in Figures 3.15–3.18.
The following equations fully describe the RNN from Figure 3.17
yn(k) = Φ(vn(k)),  n = 1, …, N
vn(k) = Σ_{l=1}^{p+N+1} ωn,l(k) ul(k)
u(k) = [y(k − 1), …, y(k − p), 1, y1(k − 1), …, yN(k − 1)]^T    (3.65)
Figure 3.14 General locally recurrent–globally feedforward (LRGF) architecture.
Figure 3.15 An example of Elman recurrent neural network (RNN).
Figure 3.16 An example of Jordan recurrent neural network (RNN).
where the (p + N + 1) × 1 dimensional vector u comprises both the external and feedback inputs to a neuron, as well as the unity valued constant bias input.
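One forward step of the fully connected RNN described by (3.65) can be sketched as follows, with illustrative sizes N = 3, p = 2 and random weights (assumptions, not from the text):

```python
import numpy as np

N, p = 3, 2                                 # neurons and external inputs
rng = np.random.default_rng(1)
W = 0.2 * rng.normal(size=(N, p + N + 1))   # full N x (p + N + 1) weight matrix

def forward(W, y_ext, y_prev, phi=np.tanh):
    # u(k) = [y(k-1), ..., y(k-p), 1, y_1(k-1), ..., y_N(k-1)]^T
    u = np.concatenate([y_ext, [1.0], y_prev])
    v = W @ u                  # net inputs v_n(k)
    return phi(v), u           # outputs y_n(k) = Phi(v_n(k))

y_out, u = forward(W, np.array([0.5, -0.3]), np.zeros(N))
```

Note that the bias sits at position p + 1 of u(k), between the external inputs and the fed-back neuron outputs.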
Training: Here, we discuss training the single fully connected RNN shown in Figure 3.17. For nonlinear time series prediction, only one output neuron of the RNN is used. Training is based on minimizing the instantaneous squared error at the output of the first neuron, which can be expressed as
E(k) = (1/2) e²(k),  e(k) = s(k) − y1(k)    (3.66)
where e(k) denotes the error at the output y1 of the RNN, and s(k) is the training signal. Hence, the correction for the l‐th weight of neuron n at the time instant k is
Δωn,l(k) = −η ∂E(k)/∂ωn,l(k) = η e(k) ∂y1(k)/∂ωn,l(k)    (3.67)
Figure 3.17 A fully connected recurrent neural network (RNN; Williams–Zipser network). The neurons (nodes) are depicted by circles and incorporate the operation Φ applied to the sum of their inputs.
Since the external signal vector s does not depend on the elements of W, the error gradient becomes ∂e(k)/∂ωn,l(k) = − ∂y1(k)/∂ωn,l(k). Using the chain rule gives
∂y1(k)/∂ωn,l(k) = Φ′(v1(k)) [ Σ_{α=1}^{N} ω1,α+p+1(k) ∂yα(k − 1)/∂ωn,l(k) + δn1 ul(k) ]    (3.68)
where δnl = 1 if n = l and 0 otherwise. When the learning rate η is sufficiently small, we have ∂yα(k − 1)/∂ωn,l(k) ≈ ∂yα(k − 1)/∂ωn,l(k − 1). By introducing the notation π^j_{n,l}(k) = ∂yj(k)/∂ωn,l(k), we have, recursively, for every time step k and all appropriate j, n, and l,
π^j_{n,l}(k) = Φ′(vj(k)) [ Σ_{α=1}^{N} ωj,α+p+1(k) π^α_{n,l}(k − 1) + δnj ul(k) ]    (3.69)
Figure 3.18 Nonlinear IIR filter structures. (a) A recurrent nonlinear neural filter, (b) a recurrent linear/nonlinear neural filter structure.
with the initial conditions π^j_{n,l}(0) = 0. We introduce three new matrices, the N × (N + p + 1) matrix Θj(k), the N × (N + p + 1) matrix Uj(k), and the N × N diagonal matrix F(k), as
Θj(k) = [π^n_{j,l}(k)], n = 1, …, N, l = 1, …, N + p + 1;  Uj(k) = 0 except for its j‐th row, which equals u^T(k);  F(k) = diag(Φ′(v1(k)), …, Φ′(vN(k)))    (3.70)
With this notation, the gradient updating equation regarding the recurrent neuron can be symbolically expressed as
Θj(k) = F(k) [ Uj(k) + Wα(k) Θj(k − 1) ]    (3.71)
where Wα denotes the set of those entries in W that correspond to the feedback connections.
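This gradient recursion is real-time recurrent learning (RTRL), and the pieces (3.65)–(3.69) fit together into a short online training loop. The sketch below is a minimal illustration, assuming a tanh activation (so Φ′(v) = 1 − tanh²(v)), a toy sine training signal, and illustrative sizes N = 3, p = 2; none of these choices come from the text.

```python
import numpy as np

N, p, eta = 3, 2, 0.05
rng = np.random.default_rng(0)
W = 0.2 * rng.normal(size=(N, p + N + 1))
# Pi[j, n, l] = pi^j_{n,l}(k) = dy_j(k)/dw_{n,l}(k); zero initial conditions
Pi = np.zeros((N, N, p + N + 1))
y = np.zeros(N)

s = np.sin(0.3 * np.arange(200))              # illustrative training signal s(k)
sq_errors = []
for k in range(p, len(s)):
    u = np.concatenate([s[k - p:k][::-1], [1.0], y])   # u(k), as in Eq. (3.65)
    v = W @ u
    y = np.tanh(v)                             # y_n(k) = Phi(v_n(k))
    e = s[k] - y[0]                            # e(k) = s(k) - y_1(k), Eq. (3.66)
    W_a = W[:, p + 1:]                         # feedback-connection entries of W
    # Eq. (3.69): pi^j(k) = Phi'(v_j(k)) [ sum_a w_{j,a} pi^a(k-1) + delta_{nj} u_l(k) ]
    Pi_new = np.einsum('ja,anl->jnl', W_a, Pi)
    Pi_new[np.arange(N), np.arange(N), :] += u # delta_{nj} u_l(k) term
    Pi_new *= (1.0 - y ** 2)[:, None, None]    # Phi'(v) = 1 - tanh^2(v)
    Pi = Pi_new
    W = W + eta * e * Pi[0]                    # Eq. (3.67): eta e(k) dy_1/dw_{n,l}
    sq_errors.append(e ** 2)
```

The update is fully online: each step costs O(N²(N + p)) for the sensitivity recursion, which is the well-known price of RTRL compared with backpropagation through time.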