ReLU
Currently, the rectified linear unit (ReLU) has become very popular; it is shown in figure 3.7. Unlike sigmoid/tanh, ReLU outputs 0 if its input is less than 0; otherwise, it simply reproduces the input. This mechanism resembles the behavior of biological neurons in the visual cortex. ReLU allows some neurons to output zero while the rest respond positively, often giving a sparse response that helps alleviate overfitting and simplifies computation. In the brain, only when there is a suitable stimulus signal do some specialized neurons respond at a high frequency; otherwise, the response frequency of a neuron is no more than 1 Hz, much as if the signal were passed through a half-wave rectifier. The formula of ReLU is as follows:
ReLU(x) = max(0, x). (3.5)
Figure 3.7. The ReLU function, which is equal to zero for a negative input, and otherwise reproduces the input.
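As a quick illustration (not part of the original text), equation (3.5) can be implemented element-wise in a single line; the short NumPy sketch below uses the illustrative function name relu.

    import numpy as np

    def relu(x):
        # Equation (3.5): ReLU(x) = max(0, x), applied element-wise
        return np.maximum(0.0, x)

    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(relu(x))  # negative inputs map to 0; positive inputs are reproduced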
As shown in figure 3.7, the ReLU activation is easy to calculate: it simply thresholds the input at zero. The ReLU function has several merits: (i) there is no saturation zone for positive inputs, so the gradient does not diminish there; (ii) no exponential operation is needed, so the computation is highly efficient; and (iii) during network training, convergence with ReLU is much faster than with sigmoid/tanh. On the other hand, the ReLU function is not perfect. Its output is not always informative, which affects the efficiency of network training. Specifically, the ReLU output is always zero when x<0, and the gradient there is zero as well, so the associated network parameters cannot be updated, leading to the phenomenon of ‘dead neurons’.
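To make the ‘dead neuron’ issue concrete, the sketch below (an illustrative NumPy snippet, not from the original text) evaluates the ReLU derivative: it equals 1 for positive inputs and exactly 0 for negative ones, so a neuron whose pre-activation stays negative passes back no gradient and its weights stop updating.

    import numpy as np

    def relu_grad(x):
        # Derivative of ReLU: 1 where x > 0, 0 where x <= 0
        return (x > 0).astype(float)

    pre_activations = np.array([-3.0, -1.0, 0.5, 4.0])
    print(relu_grad(pre_activations))  # [0. 0. 1. 1.]: negative inputs contribute no gradient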