3 Artificial Neural Networks
3.1 Multi‐layer Feedforward Neural Networks
3.1.1 Single Neurons
A biological [1] and mathematical model of a neuron can be represented as shown in Figure 3.1, with the output of the neuron modeled as

s = Σi wi xi + wb (3.1)

y = f(s) (3.2)
where xi are the inputs to the neuron, wi are the synaptic weights, and wb models a bias. In general, f represents a nonlinear activation function. Early models used a sign function for the activation; in this case, the output y would be +1 or −1 depending on whether the total input s at the node exceeds 0. Nowadays, a sigmoid function is used rather than a hard threshold. One should immediately notice the similarity of Eqs. (3.1) and (3.2) with Eqs. (2.1) and (2.2) defining the operation of a linear predictor. This suggests that in this chapter we will take the problem of parameter estimation to the next level. The sigmoid, shown in Figure 3.1, is a differentiable squashing function usually evaluated as y = tanh(s). This engineering model is an oversimplified approximation to the biological model: it neglects temporal relations because the goals of the engineer differ from those of the neurobiologist, and the engineer must use models that are feasible for practical implementation. The computational abilities of an isolated neuron are extremely limited.
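As a minimal sketch, the neuron of Eqs. (3.1) and (3.2) can be evaluated directly in Python (NumPy); the function name and the numerical weights below are purely illustrative and are not taken from the text.

```python
import numpy as np

def neuron_output(x, w, w_b):
    """Single-neuron model: weighted sum plus bias (Eq. 3.1),
    passed through a tanh squashing activation (Eq. 3.2)."""
    s = np.dot(w, x) + w_b   # total input at the node
    return np.tanh(s)        # differentiable sigmoid (squashing) function

# Example with three inputs and arbitrary illustrative weights
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.2, 0.4, -0.1])
print(neuron_output(x, w, w_b=0.1))
```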
For electrical engineers, the most popular application of single neurons is the adaptive finite impulse response (FIR) filter. Here, y(k) = Σi wi x(k − i), where k represents a discrete time index and the inputs to the neuron are delayed samples of a signal x(k). Usually, a linear activation function is used. In electrical engineering, adaptive filters are used in signal processing, with practical applications such as adaptive equalization and active noise cancelation.
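A short sketch of such a single linear neuron used as an adaptive FIR filter is given below; the LMS weight update and the step size mu are common choices assumed here, since the text does not prescribe a particular adaptation rule.

```python
import numpy as np

def adaptive_fir(x, d, num_taps=4, mu=0.01):
    """Single linear neuron as an adaptive FIR filter.
    x: input signal, d: desired response, mu: LMS step size (assumed).
    At time k the neuron computes y(k) = sum_i w_i x(k - i)."""
    w = np.zeros(num_taps)
    y = np.zeros(len(x))
    for k in range(num_taps - 1, len(x)):
        x_k = x[k - num_taps + 1:k + 1][::-1]  # x(k), x(k-1), ..., x(k-N+1)
        y[k] = np.dot(w, x_k)                  # linear activation
        e = d[k] - y[k]                        # error against desired signal
        w += mu * e * x_k                      # LMS gradient step
    return y, w

# Example: identify a 2-tap channel d(k) = 0.7 x(k) + 0.3 x(k-1) (illustrative)
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
d = 0.7 * x + np.concatenate(([0.0], 0.3 * x[:-1]))
y, w = adaptive_fir(x, d, num_taps=2, mu=0.05)
print(w)  # should approach [0.7, 0.3]
```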
Multi‐layer neural networks: A neural network is built up by incorporating the basic neuron model into different configurations. One example is the Hopfield network, where the output of each neuron can have a connection to the input of every neuron in the network, including a self‐feedback connection. Another option is the multi‐layer feedforward network illustrated in Figure 3.2. Here, we have layers of neurons where the output of a neuron in a given layer is input to all the neurons in the next layer. We may also have sparse connections or direct connections that bypass layers. In these networks, no feedback loops exist within the structure; they are sometimes referred to as backpropagation networks.
Figure 3.1 From biological to mathematical simplified model of a neuron.
Source: CS231n Convolutional Neural Networks for Visual Recognition [1].
Figure 3.2 Block diagram of feedforward network.
Notation: A single neuron extracted from the l‐th layer of an L‐layer network is also depicted in Figure 3.2. Parameters w^l_ij denote the weights on the links between neuron i in the previous layer and neuron j in layer l. The output of the j‐th neuron in layer l is represented by the variable x^l_j. The outputs in the last, L‐th layer represent the overall outputs of the network; here we use the notation yi for the outputs, so that yi ≡ x^L_i. Parameters xi, defined as inputs to the network, may be viewed as a 0‐th layer with the notation xi ≡ x^0_i. These definitions are summarized in Table 3.1.
Table 3.1 Multi‐layer network notation.
Weight connecting neuron i in layer l − 1 to neuron j in layer l | w^l_ij
Bias weight for neuron j in layer l | w^l_bj
Summing junction for neuron j in layer l | s^l_j
Activation (output) value for neuron j in layer l | x^l_j
i‐th external input to the network | xi ≡ x^0_i
i‐th output of the network | yi ≡ x^L_i
Define an input vector x = [x0, x1, x2, … xN] and an output vector y = [y0, y1, y2, … yM]. The network maps the input x to the outputs y using the weights w, that is, y = N(w, x). Since fixed weights are used, this mapping is static; there are no internal dynamics. Still, this network is a powerful tool for computation.
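A minimal sketch of this static mapping y = N(w, x), using the layer notation of Table 3.1, is given below; the tanh activation in every layer and the random weight values are illustrative assumptions, not a prescription from the text.

```python
import numpy as np

def forward(x, weights, biases):
    """Static mapping y = N(w, x) of an L-layer feedforward network.
    weights[l-1] holds w^l_ij (rows: neurons in layer l, columns: layer l-1),
    biases[l-1] holds w^l_bj.  tanh is assumed in every layer."""
    a = np.asarray(x, dtype=float)   # x^0_i: inputs viewed as the 0-th layer
    for W, b in zip(weights, biases):
        s = W @ a + b                # summing junctions s^l_j
        a = np.tanh(s)               # activations x^l_j
    return a                         # x^L_i, i.e. the network outputs yi

# Example: 3 inputs -> 4 hidden neurons -> 2 outputs (random illustrative weights)
rng = np.random.default_rng(1)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(forward([0.5, -1.0, 2.0], weights, biases))
```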
It has been shown that with two or more layers and a sufficient number of internal neurons, any uniformly continuous function can be represented with acceptable accuracy. The performance therefore rests on how this “universal function approximator” is utilized.