
Sigmoid

Sigmoid is a commonly used activation function, as shown in figure 3.5 (Han and Moraga 1995). The sigmoid function maps its input to a value between 0 and 1, where 0 represents not activated at all and 1 represents fully activated. This quasi-binary behavior simulates the two states of a biological neuron, in which an action potential is transmitted only if the accumulated stimulation strength is above a threshold; a larger output value means a stronger response of the neuron. The mathematical expression is as follows:

\sigma(x)=\frac{1}{1+e^{-x}}.\quad(3.3)
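As a concrete illustration (not from the book), equation (3.3) can be evaluated directly. The minimal sketch below assumes plain Python with the standard math module; the split by sign is a common trick to avoid overflow in exp for large negative inputs.

    import math

    def sigmoid(x):
        """Evaluate sigma(x) = 1 / (1 + exp(-x)) in an overflow-safe way."""
        if x >= 0:
            return 1.0 / (1.0 + math.exp(-x))
        # For very negative x, exp(-x) would overflow; use the
        # algebraically equivalent form exp(x) / (1 + exp(x)).
        e = math.exp(x)
        return e / (1.0 + e)

    print(sigmoid(0.0))   # 0.5, the midpoint of the output range
    print(sigmoid(4.0))   # ~0.982, almost fully activated
    print(sigmoid(-4.0))  # ~0.018, almost not activated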

The sigmoid function was very popular in the past, since it has a clear interpretation as the firing rate of a neuron. However, the sigmoid nonlinearity is rarely used now because of a major drawback: when the activation saturates at either tail, near 0 or 1, the gradient in these regions is almost zero, which is undesirable for optimizing the network parameters. During backpropagation-based training (see section 3.1.7 for details on backpropagation), the local gradient of a neuron is multiplied with other factors according to the chain rule to compute the overall gradient for parameter updating, so a nearly zero local gradient effectively ‘kills’ the overall gradient, and almost no signal flows through the neuron to its weights and, recursively, to its input data. For the same reason, the weights of sigmoid neurons must be initialized carefully to avoid saturation; for example, if some weights are set too large initially, the corresponding neurons saturate and learn very little.

Also, sigmoid outputs are not zero-centered. Since every output is greater than 0, the inputs to the next layer are all positive, and in the gradient derivation for backpropagation the gradients of a layer’s weights then all share the same sign. The weight updates therefore move in a biased direction, compromising training efficiency. In addition, the sigmoid function involves the exponential operation, which is computationally demanding.
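To make the saturation argument concrete, the derivative of equation (3.3) is \sigma'(x)=\sigma(x)(1-\sigma(x)), which peaks at only 0.25 at x = 0 and decays toward zero at both tails. The following sketch (an illustration, not from the book) prints this local gradient at a few points:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def sigmoid_grad(x):
        # Local gradient: sigma'(x) = sigma(x) * (1 - sigma(x)).
        s = sigmoid(x)
        return s * (1.0 - s)

    for x in (0.0, 2.0, 5.0, 10.0):
        print(f"x = {x:5.1f}   sigma'(x) = {sigmoid_grad(x):.2e}")

    # Output is roughly 2.5e-01, 1.1e-01, 6.6e-03, 4.5e-05: even at
    # its peak the local gradient is only 0.25, and it collapses
    # toward zero once the neuron saturates, choking the
    # backpropagated signal.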


Figure 3.5. The sigmoid function is differentiable and monotonic, with the open interval (0, 1) as its range and the entire real axis as its domain.
