3.5 Cellular Neural Networks (CeNN)
A spatially invariant CeNN architecture [14, 15] is an M × N array of identical cells (Figure 3.21 [top]). Each cell, Cij, (i, j) ∈ {1, …, M} × {1, …, N}, has identical connections with adjacent cells in a predefined neighborhood, Nr(i, j), of radius r. The size of the neighborhood is m = (2r + 1)², where r is a positive integer.
A conventional analog CeNN cell consists of one resistor, one capacitor, 2m linear voltage‐controlled current sources (VCCSs), one fixed current source, and one specific type of nonlinear voltage‐controlled voltage source (Figure 3.21 [bottom]). The input, state, and output of a given cell Cij correspond to the nodal voltages uij, xij, and yij, respectively. VCCSs controlled by the input and output voltages of each neighbor deliver feedback and feedforward currents to a given cell. The dynamics of a CeNN are captured by a system of M × N ordinary differential equations, each of which is simply Kirchhoff's current law (KCL) at the state node of the corresponding cell, per Eq. (3.74).
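Written out, the KCL at each state node takes the standard Chua–Yang form shown below (a sketch of the commonly cited equation; C and R denote the cell capacitance and resistance, and the output nonlinearity is the usual piecewise‐linear sigmoid):

```latex
C\,\frac{dx_{ij}(t)}{dt} = -\frac{1}{R}\,x_{ij}(t)
  + \sum_{C_{kl}\in N_r(i,j)} a_{ij,kl}\,y_{kl}(t)
  + \sum_{C_{kl}\in N_r(i,j)} b_{ij,kl}\,u_{kl}(t) + Z,
\qquad
y_{ij}(t) = \tfrac{1}{2}\bigl(\,|x_{ij}(t)+1| - |x_{ij}(t)-1|\,\bigr).
```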
Figure 3.21 (Top) Cellular neural network (CeNN) architecture; (bottom) circuitry in a CeNN cell.
CeNN cells typically employ a nonlinear sigmoid‐like transfer function at the output to ensure fixed binary output levels. The parameters aij,kl and bij,kl serve as weights for the feedback and feedforward currents from cell Ckl to cell Cij. Parameters aij,kl and bij,kl are space invariant and are represented by two (2r + 1) × (2r + 1) matrices. (If r = 1, they are captured by 3 × 3 matrices.) The matrices of a and b parameters are typically referred to as the feedback template (A) and the feedforward template (B), respectively. Design flexibility is further enhanced by the fixed bias current Z, which provides a means to adjust the total current flowing into a cell. A CeNN can solve a wide range of image processing problems by carefully selecting the values of the A and B templates (as well as Z). Various circuits, including inverters, Gilbert multipliers, operational transconductance amplifiers (OTAs), etc. [15, 16], can be used to realize VCCSs. OTAs provide a large linear range for voltage‐to‐current conversion, and can implement a wide range of transconductances, allowing different CeNN templates to be realized. Nonlinear templates/OTAs can lead to CeNNs with richer functionality. For more information, see [14, 17–19].
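As an illustration of how the A and B templates (together with Z) define an image‐processing task, the following minimal Python sketch integrates the CeNN state equation with forward Euler. The template values are illustrative assumptions, loosely modeled on published edge‐detection templates, and R = C = 1 is assumed; they are not taken from the text.

```python
import numpy as np

def cenn_run(u, A, B, Z, dt=0.05, steps=400):
    """Integrate the Chua-Yang CeNN state equation with forward Euler.

    u : 2D input image scaled to [-1, 1]; A, B : 3x3 feedback/feedforward
    templates; Z : bias current. R = C = 1 is assumed for simplicity.
    """
    x = u.copy()                                    # common choice: initial state = input
    pad = lambda a: np.pad(a, 1, mode="edge")
    for _ in range(steps):
        y = 0.5 * (np.abs(x + 1) - np.abs(x - 1))   # output nonlinearity
        yp, up = pad(y), pad(u)
        fb = np.zeros_like(x)
        ff = np.zeros_like(x)
        for di in range(3):                         # sum over the r = 1 neighborhood
            for dj in range(3):
                fb += A[di, dj] * yp[di:di + x.shape[0], dj:dj + x.shape[1]]
                ff += B[di, dj] * up[di:di + u.shape[0], dj:dj + u.shape[1]]
        x = x + dt * (-x + fb + ff + Z)             # KCL at the state node
    return 0.5 * (np.abs(x + 1) - np.abs(x - 1))

# Illustrative (assumed) edge-extraction-style templates:
A = np.zeros((3, 3)); A[1, 1] = 2.0
B = -np.ones((3, 3)); B[1, 1] = 8.0
Z = -0.5
img = np.where(np.random.rand(32, 32) > 0.5, 1.0, -1.0)    # random binary image
out = cenn_run(img, A, B, Z)
```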
Memristor‐based cellular nonlinear/neural network (MCeNN): The memristor was theoretically postulated in 1971, but it garnered renewed research interest due to the much‐acclaimed discovery of memristive behavior in nanoscale crossbar memories by engineers at Hewlett‐Packard Labs. The memristor is a nonlinear passive device with variable resistance states. It is mathematically defined by its constitutive relationship between the charge q and the flux ϕ, that is, dϕ/dt = (dϕ(q)/dq)·dq/dt. With the basic circuit laws v(t) = dϕ/dt and i(t) = dq/dt, this leads to v(t) = (dϕ(q)/dq)·i(t) = M(q)i(t), where M(q) = dϕ(q)/dq is defined as the resistance of the memristor, called the memristance; in more general memristive devices, the resistance is a function of an internal state variable x (and possibly of the current i). The Simmons tunnel barrier model, reported by Hewlett‐Packard Labs [20], is the most accurate physical model of the TiO2/TiO2−x memristor.
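For intuition about variable resistance states, the following minimal Python sketch integrates the simpler linear ion‐drift memristor model (not the Simmons tunnel‐barrier model cited above); all parameter values are illustrative assumptions.

```python
import numpy as np

def linear_ion_drift(v, dt=1e-4, R_on=100.0, R_off=16e3,
                     D=10e-9, mu_v=1e-14, x0=0.1, p=10):
    """Simulate a memristor with the linear ion-drift model and a
    Joglekar-style window; v is an array of voltage samples."""
    x = x0                                           # normalized doped-region width in [0, 1]
    i_out, m_out = [], []
    for vk in v:
        M = R_on * x + R_off * (1.0 - x)             # memristance
        i = vk / M
        window = 1.0 - (2.0 * x - 1.0) ** (2 * p)    # keeps x inside [0, 1]
        x += dt * (mu_v * R_on / D**2) * i * window  # ion-drift state equation
        x = min(max(x, 0.0), 1.0)
        i_out.append(i)
        m_out.append(M)
    return np.array(i_out), np.array(m_out)

# A slow sinusoidal drive produces the characteristic pinched hysteresis loop.
t = np.arange(0.0, 2.0, 1e-4)
v = 1.2 * np.sin(2 * np.pi * 1.0 * t)
i, M = linear_ion_drift(v, dt=1e-4)
```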
The memristor is expected to be co‐integrated with nanoscale CMOS technology to revolutionize conventional von Neumann as well as neuromorphic computing. In Figure 3.22, a compact memristor‐based cellular neural network (CeNN) model is presented along with its performance analysis and applications. In the new design, the memristor bridge circuit acts as the synaptic circuit element and substitutes for the complex multiplication circuit used in traditional CeNN architectures. In addition, the negative differential resistance and nonlinear current–voltage characteristics of the memristor have been leveraged to replace the linear resistor in conventional CeNN cells. The proposed design [21–26] has several merits, for example, high density, nonvolatility, and programmability of synaptic weights. The operation of the proposed memristor‐based CeNN design for several image‐processing functions is illustrated through simulation and contrasted with that of conventional CeNNs.
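The memristor bridge synapse mentioned above is commonly described as a Wheatstone‐bridge arrangement of four memristors whose differential output realizes a signed, programmable weight. The Python sketch below expresses that weighting relation; it is an assumption based on the published bridge‐synapse circuit, not a circuit taken from the text, and the memristance values are illustrative.

```python
def bridge_weight(m1, m2, m3, m4):
    """Signed synaptic weight of a four-memristor bridge:
    w = M2/(M1 + M2) - M4/(M3 + M4), so w can be positive, zero, or
    negative depending on how the four memristances are programmed."""
    return m2 / (m1 + m2) - m4 / (m3 + m4)

def bridge_output(v_in, m1, m2, m3, m4):
    """Differential output voltage of the bridge: the programmed
    weight multiplies the input voltage."""
    return bridge_weight(m1, m2, m3, m4) * v_in

# Example: a positive and a negative weight (memristances in ohms, assumed).
w_pos = bridge_weight(5e3, 15e3, 15e3, 5e3)        # 0.75 - 0.25 = +0.5
w_neg = bridge_weight(15e3, 5e3, 5e3, 15e3)        # 0.25 - 0.75 = -0.5
v_out = bridge_output(0.2, 5e3, 15e3, 15e3, 5e3)   # 0.2 V * 0.5 = 0.1 V
```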
Training MCeNN: In the classical representation, the conductance of a memristor G depends directly on the integral over time of the voltage across the device, sometimes referred to as the flux. Formally, a memristor obeys i(t) = G(s(t))v(t) and ds(t)/dt = v(t). A generalization of the memristor model, called a memristive system, was proposed in [27]. In memristive devices, s is a general state variable, rather than an integral of the voltage. Such memristive models, which are more commonly used to model actual physical devices, are discussed in [20, 28, 29]. For simplicity, we assume that the variations in the value of s(t) are restricted to be small, so that G(s(t)) can be linearized around some point s*. Measuring s(t) as the deviation from s*, the conductivity of the memristor is given, to first order, by G(s(t)) ≈ ĝ(1 + s(t)/ŝ), where ĝ = G(s*) and ŝ = G(s*)/G′(s*). Such a linearization is formally justified if sufficiently small inputs are used, so that s does not stray far from the fixed point (i.e., |s(t)|/ŝ ≪ 1, making second‐order contributions negligible). The only (rather mild) assumption is that G(s) is differentiable near s*. Despite this linearization, the memristor is still a nonlinear component, since from the previous relations we have i(t) = ĝv(t) + (ĝ/ŝ)s(t)v(t). Importantly, this nonlinear product s(t)v(t) underscores the key role of the memristor in the proposed design, where an input signal v(t) is being multiplied by an adjustable internal value s(t). Thus, the memristor enables an efficient implementation of trainable multilayered neural networks (MNNs) in hardware, as explained below.
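To make the multiplier role concrete, the short sketch below evaluates the linearized memristor current and separates the fixed conductance term from the weight‐times‐input product term; the numerical values of ĝ and ŝ are illustrative assumptions.

```python
def memristor_current(v, s, g_hat=1e-3, s_hat=1.0):
    """Linearized memristor: i = g_hat*(1 + s/s_hat)*v.
    The first term is a constant conductance; the second term,
    (g_hat/s_hat)*s*v, is the product of the stored state s
    (the 'weight') and the applied voltage v (the 'input')."""
    bias_term = g_hat * v
    product_term = (g_hat / s_hat) * s * v
    return bias_term + product_term

i = memristor_current(v=0.1, s=0.3)   # 1e-4 + 3e-5 = 1.3e-4 A
```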
Figure 3.22 Memristor‐based cellular nonlinear/neural network (MCeNN).
Online gradient descent learning, which was described earlier, can be used here as well. With the notation used here, we assume a learning system that operates on K discrete presentations of inputs (trials), indexed by k = 1, 2, … , K. For brevity, the iteration number is sometimes not indexed when it is clear from the context. On each trial k, the system receives empirical data, a pair of two real column vectors of sizes M and N: a pattern x(k) ∈ ℝ^M and a desired label d(k) ∈ ℝ^N, with all pairs sharing the same desired relation d(k) = f(x(k)). Note that two distinct patterns can have the same label. The objective of the system is to estimate (learn) the function f(·) using the empirical data. As a simple example, suppose W is a tunable N × M matrix of parameters, and consider the estimator
r = Wx (3.75)
which is a single‐layer NN. The result of the estimator r = Wx should aim to predict the correct desired labels d = f(x) for new unseen patterns x. As before, to solve this problem, W is tuned to minimize some measure of error between the estimated and desired labels over a K0‐long subset of the training set (for which k = 1, … , K0). If we define the error vector as y(k) ≜ d(k) − r(k), then a common measure is the mean square error, MSE = (1/K0)Σk‖y(k)‖², where the sum runs over k = 1, … , K0. Other error measures can also be used. The performance of the resulting estimator is then tested over a different subset, called the test set (k = K0 + 1, … , K).
As before, a reasonable iterative algorithm for minimizing this objective is the online (stochastic) gradient descent (SGD) iteration W(k + 1) = W(k) − (η/2)∇W‖y(k)‖², where the 1/2 coefficient is written for mathematical convenience; η is the learning rate, a (usually small) positive constant; and at each iteration k, a single empirical sample x(k) is chosen randomly and presented at the input of the system. Using ∇W‖y(k)‖² = −2y(k)(x(k))T and defining ΔW(k) = W(k + 1) − W(k), with (·)T denoting the transpose operation, we obtain the outer product
ΔW(k) = ηy(k)(x(k))T (3.76)
The parameters of more complicated estimators can also be similarly tuned (trained), using backpropagation, as discussed earlier in this chapter.
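A minimal Python sketch of this single‐layer estimator and its online gradient descent update, Eqs. (3.75) and (3.76), is given below; the target mapping f, the dimensions, and the learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K0, K = 8, 3, 2000, 2500          # pattern/label sizes, train/test split
eta = 0.05                              # learning rate

W_true = rng.standard_normal((N, M))    # assumed target mapping f(x) = W_true @ x
X = rng.standard_normal((K, M))
D = X @ W_true.T                        # desired labels d(k) = f(x(k))

W = np.zeros((N, M))                    # tunable parameters of the estimator
for k in range(K0):                     # online (stochastic) gradient descent
    x = X[k]                            # one empirical sample per iteration
    d = D[k]
    r = W @ x                           # Eq. (3.75): r = Wx
    y = d - r                           # error vector y(k) = d(k) - r(k)
    W += eta * np.outer(y, x)           # Eq. (3.76): delta W(k) = eta * y(k) x(k)^T

# Test MSE over the held-out subset k = K0 + 1, ..., K
R_test = X[K0:] @ W.T
mse = np.mean(np.sum((D[K0:] - R_test) ** 2, axis=1))
print(f"test MSE: {mse:.3e}")
```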