
3.1.2 Weights Optimization


The specific mapping realized by a network is obtained by an appropriate choice of weight values. Optimizing a set of weights is referred to as network training. An example of a supervised learning scheme is shown in Figure 3.3. A training set of input vectors, each paired with a desired output vector, $\{(x_1, d_1), \ldots, (x_P, d_P)\}$, is provided. The difference between the desired output and the actual output of the network, for a given input x, is defined as the error

$e = d - y \qquad (3.3)$

The overall objective function to be minimized over the training set is the squared error

$J = \sum_{p=1}^{P} \left\| d_p - y_p \right\|^2 = \sum_{p=1}^{P} e_p^T e_p \qquad (3.4)$

The training should find the set of weights w that minimizes the cost J subject to the constraints imposed by the network topology. We see that training a neural network represents a standard optimization problem.
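As a concrete illustration of Eq. (3.4), the cost can be evaluated by accumulating the per-sample squared errors over the training set. The following minimal Python/NumPy sketch assumes a hypothetical forward(x, w) routine that returns the network output y for input x; the names and data layout are illustrative, not part of the text.

import numpy as np

def total_cost(forward, X, D, w):
    """Total squared error J of Eq. (3.4) over the training set {(x_p, d_p)}."""
    J = 0.0
    for x, d in zip(X, D):
        e = np.asarray(d) - forward(x, w)   # per-sample error, Eq. (3.3)
        J += float(np.dot(e, e))            # accumulate e^T e
    return J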

A stochastic gradient descent (SGD) algorithm is one option for the optimization method. For each sample from the training set, the weights are adapted as

$\Delta w = -\mu \, \nabla_w \left( e^T e \right) \qquad (3.5)$

where $\nabla_w \left( e^T e \right)$ is the error gradient for the current input pattern, and μ is the learning rate.
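A minimal sketch of this per-sample update, assuming a hypothetical gradient(w, x, d) routine that returns the error gradient of Eq. (3.5) for the current pattern (the names and the NumPy-array weight vector are assumptions for illustration):

def sgd_epoch(w, X, D, gradient, mu=0.01):
    """One pass over the training set, applying Eq. (3.5) once per sample."""
    for x, d in zip(X, D):
        w = w - mu * gradient(w, x, d)   # Δw = -μ ∇_w (e^T e)
    return w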

Backpropagation: This is the standard way to compute the error gradient $\nabla_w \left( e^T e \right)$ in Eq. (3.5). Here we provide a formal derivation.

Single neuron case – Consider first a single linear neuron, which we may describe compactly as

$y = \sum_{i=0}^{N} w_i x_i = w^T x \qquad (3.6)$

where $w = [w_0, w_1, \ldots, w_N]$ and $x = [1, x_1, \ldots, x_N]$. In this simple setup


Figure 3.3 Schematic representation of supervised learning.

$\dfrac{\partial e^2}{\partial w} = 2e \, \dfrac{\partial e}{\partial w} = 2e \, \dfrac{\partial \left( d - w^T x \right)}{\partial w} = -2 e \, x \qquad (3.7)$

so that $\Delta w = 2 \mu e x$. From this, we have $\Delta w_i = 2 \mu e x_i$, which is the least mean square (LMS) algorithm.
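Equations (3.6) and (3.7) translate directly into the LMS training loop. The sketch below is one possible NumPy implementation under the stated setup (bias absorbed as x_0 = 1); the step size, epoch count, and example data are illustrative.

import numpy as np

def lms_train(X, d, mu=0.01, epochs=50):
    """Train a single linear neuron y = w^T x with the LMS rule."""
    w = np.zeros(X.shape[1])          # X: (P, N+1) patterns with a leading 1 for the bias
    for _ in range(epochs):
        for x, dp in zip(X, d):
            e = dp - w @ x            # error e = d - y, Eqs. (3.3) and (3.6)
            w = w + 2.0 * mu * e * x  # LMS update Δw_i = 2 μ e x_i
    return w

# Example: recover w = [1, 2] from noiseless data y = 1 + 2 x_1
X = np.column_stack([np.ones(100), np.linspace(-1.0, 1.0, 100)])
d = X @ np.array([1.0, 2.0])
w = lms_train(X, d)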

In a multi-layer network, we formally extend this procedure. Let $w_{ij}^{(l)}$ denote the weight from neuron $i$ of layer $l-1$ to neuron $j$ of layer $l$, $s_j^{(l)} = \sum_i w_{ij}^{(l)} y_i^{(l-1)}$ the net input of neuron $j$ in layer $l$, and $y_j^{(l)} = f\!\left( s_j^{(l)} \right)$ its output. For this we use the chain rule

$\dfrac{\partial \left( e^T e \right)}{\partial w_{ij}^{(l)}} = \dfrac{\partial \left( e^T e \right)}{\partial s_j^{(l)}} \, \dfrac{\partial s_j^{(l)}}{\partial w_{ij}^{(l)}} \qquad (3.8)$

with $\delta_j^{(l)} \equiv -\dfrac{\partial \left( e^T e \right)}{\partial s_j^{(l)}}$ and $\dfrac{\partial s_j^{(l)}}{\partial w_{ij}^{(l)}} = y_i^{(l-1)}$, leading to the weight update $\Delta w_{ij}^{(l)} = \mu \, \delta_j^{(l)} \, y_i^{(l-1)}$.

Parameters δ are derived recursively starting from the output layer:

$\delta_j^{(L)} = -\dfrac{\partial \left( e^T e \right)}{\partial s_j^{(L)}} = -\dfrac{\partial \left( e^T e \right)}{\partial y_j^{(L)}} \, \dfrac{\partial y_j^{(L)}}{\partial s_j^{(L)}} = -\dfrac{\partial \left( e^T e \right)}{\partial y_j^{(L)}} \, f'\!\left( s_j^{(L)} \right) \qquad (3.9)$

where $f'$ is the derivative of the sigmoid activation function evaluated at $s$. We have also used $y_j^{(L)} = f\!\left( s_j^{(L)} \right)$ for the output layer. With this, at the output layer each neuron has an explicit desired response, so we can write

$\dfrac{\partial \left( e^T e \right)}{\partial y_j^{(L)}} = \dfrac{\partial}{\partial y_j^{(L)}} \sum_k \left( d_k - y_k^{(L)} \right)^2 = -2 e_j, \qquad e_j = d_j - y_j^{(L)} \qquad (3.10)$

Substituting into Eq. (3.9) yields $\delta_j^{(L)} = 2 e_j \, f'\!\left( s_j^{(L)} \right)$.
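For concreteness, if f is the logistic sigmoid (one common choice; the text only assumes a sigmoid activation), its derivative can be written in terms of the neuron output itself:

$f(s) = \dfrac{1}{1 + e^{-s}}, \qquad f'(s) = \dfrac{e^{-s}}{\left( 1 + e^{-s} \right)^2} = f(s)\left( 1 - f(s) \right),$

so that the output-layer term becomes $\delta_j^{(L)} = 2 e_j \, y_j^{(L)} \left( 1 - y_j^{(L)} \right)$.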

To calculate the δs of the hidden layers, we note that $e^T e$ is influenced by $s_j^{(l)}$ only indirectly, through all node values in the next layer. Referring to the upper part of Figure 3.3, we again employ the chain rule

$\delta_j^{(l)} = -\dfrac{\partial \left( e^T e \right)}{\partial s_j^{(l)}} = -\sum_k \dfrac{\partial \left( e^T e \right)}{\partial s_k^{(l+1)}} \, \dfrac{\partial s_k^{(l+1)}}{\partial s_j^{(l)}} = \sum_k \delta_k^{(l+1)} \, \dfrac{\partial s_k^{(l+1)}}{\partial s_j^{(l)}} \qquad (3.11)$

with

$\dfrac{\partial s_k^{(l+1)}}{\partial s_j^{(l)}} = \dfrac{\partial s_k^{(l+1)}}{\partial y_j^{(l)}} \, \dfrac{\partial y_j^{(l)}}{\partial s_j^{(l)}} = w_{jk}^{(l+1)} \, f'\!\left( s_j^{(l)} \right) \qquad (3.12)$

Recalling that $s_k^{(l+1)} = \sum_j w_{jk}^{(l+1)} y_j^{(l)}$, we get $\delta_j^{(l)} = f'\!\left( s_j^{(l)} \right) \sum_k \delta_k^{(l+1)} w_{jk}^{(l+1)}$. In summary, we have

$\Delta w_{ij}^{(l)} = \mu \, \delta_j^{(l)} \, y_i^{(l-1)} \qquad (3.13)$

$\delta_j^{(l)} = \begin{cases} 2 \, e_j \, f'\!\left( s_j^{(L)} \right), & l = L \ \text{(output layer)} \\[4pt] f'\!\left( s_j^{(l)} \right) \displaystyle\sum_k \delta_k^{(l+1)} w_{jk}^{(l+1)}, & \text{hidden layers} \end{cases} \qquad (3.14)$

For the bias weight $w_{0j}^{(l)}$, we note that $y_0^{(l-1)} = 1$ in Eq. (3.13). The above processing is illustrated in Figure 3.4, indicating the symmetry between the forward propagation of neuron activation values and the backward propagation of the δ terms.

Figure 3.4 Illustration of backpropagation.
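The recursions in Eqs. (3.13) and (3.14) can be collected into a single per-sample training step for a fully connected network with sigmoid units. The Python/NumPy sketch below is one possible reading of these equations, not the book's reference implementation; the layer sizes, initialization, learning rate, and the bias-as-first-row weight layout are assumptions made for illustration.

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def backprop_step(weights, x, d, mu=0.1):
    """One per-sample update of Eqs. (3.13)-(3.14).

    weights[l] has shape (n_in + 1, n_out); row 0 holds the bias weights,
    matching the convention y_0 = 1 for the bias input (assumed layout).
    """
    # Forward pass: store augmented activations [1, y_1, ..., y_n] per layer.
    ys = [np.concatenate(([1.0], x))]
    for W in weights:
        s = ys[-1] @ W                       # net inputs s_j of the next layer
        ys.append(np.concatenate(([1.0], sigmoid(s))))

    # Output-layer deltas, Eq. (3.14): delta_j = 2 e_j f'(s_j), with f' = f (1 - f).
    y_out = ys[-1][1:]
    delta = 2.0 * (d - y_out) * y_out * (1.0 - y_out)

    # Backward pass and weight updates, Eq. (3.13): Δw_ij = μ δ_j y_i.
    new_weights = [None] * len(weights)
    for l in range(len(weights) - 1, -1, -1):
        W = weights[l]
        new_weights[l] = W + mu * np.outer(ys[l], delta)
        if l > 0:
            y_hid = ys[l][1:]                # hidden activations f(s_j)
            # Hidden-layer deltas, Eq. (3.14): f'(s_j) Σ_k δ_k w_jk
            delta = y_hid * (1.0 - y_hid) * (W[1:, :] @ delta)
    return new_weights

# Example: one update of a 2-3-1 network on a single training pair (illustrative)
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.5, size=(3, 3)),   # 2 inputs (+ bias) -> 3 hidden units
           rng.normal(scale=0.5, size=(4, 1))]   # 3 hidden (+ bias) -> 1 output
weights = backprop_step(weights, x=np.array([0.2, -0.7]), d=np.array([1.0]))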

