Definition
When a computer represents a real number, it usually introduces some approximation error, because the number must be stored in a finite number of binary digits. In many cases the computer rounds such values. Simply put, if a number can only keep 5 decimal places, then a value of $10^{-6}$ may be rounded to 0. The problems caused by this rounding are especially common in compound operations, where the rounding error from one step is carried into the next.
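A quick Python illustration of the two effects described above: binary approximation error, and rounding to a fixed number of decimal places. The 5-decimal rounding is modeled here with Python's built-in `round`, an assumption made only for illustration; real hardware rounds to a fixed number of binary digits, not decimal places.

```python
# 0.1, 0.2 and 0.3 have no exact binary representation, so each stored
# value is already an approximation and the errors show up in arithmetic.
sum_is_exact = (0.1 + 0.2 == 0.3)
print(sum_is_exact)   # False

# The stored value of 0.1, printed with enough digits to expose the error.
print(f"{0.1:.20f}")

# Keeping only 5 decimal places turns 10^-6 into 0, as in the example above.
rounded = round(1e-6, 5)
print(rounded)        # 0.0
```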
Underflow: When a number close to zero is rounded to zero, an underflow occurs.
Overflow: When a number of very large magnitude is approximated as $\infty$ or $-\infty$, an overflow occurs.
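Both definitions can be observed directly with Python's double-precision floats (standard library only). Note that Python's `math.exp` raises `OverflowError` instead of returning infinity, while plain float multiplication overflows silently to `inf`.

```python
import math
import sys

# Underflow: e^-1000 is about 5e-435, far below the smallest positive
# double (about 5e-324), so the result is rounded to exactly 0.
underflowed = math.exp(-1000)
print(underflowed)                          # 0.0

# Overflow: multiplying the largest finite double by 10 gives +inf.
overflowed = sys.float_info.max * 10
print(overflowed, math.isinf(overflowed))   # inf True

# math.exp reports overflow by raising an exception rather than returning inf.
try:
    math.exp(1000)
    exp_raised = False
except OverflowError:
    exp_raised = True
print(exp_raised)                           # True
```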
In fact, in gradient descent, underflow and overflow correspond to vanishing gradients and exploding gradients, respectively. Whichever of the two occurs, it can cause model training to fail.
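A toy illustration of this link, assuming a hypothetical chain of `depth` layers in which backpropagation multiplies the gradient by the same `factor` at every layer (real networks are not this uniform). The helper `chain_gradient` is made up for this sketch.

```python
def chain_gradient(factor: float, depth: int) -> float:
    """Hypothetical toy: the gradient is scaled by `factor` once per layer."""
    grad = 1.0
    for _ in range(depth):
        grad *= factor
    return grad

# Factors below 1 shrink the gradient until it underflows to 0 (vanishing).
vanishing = chain_gradient(0.1, 400)   # exact value 1e-400 is not representable
# Factors above 1 grow it until it overflows to inf (exploding).
exploding = chain_gradient(10.0, 400)  # exact value 1e400 is not representable
print(vanishing, exploding)            # 0.0 inf
```

Once the gradient is 0 the parameters stop changing; once it is `inf`, the next update produces `inf` or `nan` parameters.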
Example
The softmax function is a typical example of a function prone to overflow and underflow. It is defined as follows:
$$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}$$
Assume all $x_i$ are equal to a constant $c$. Then every output of the softmax function is $\frac{1}{n}$.
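A direct translation of the softmax formula into plain Python (the name `naive_softmax` is our own), checking the claim that a constant input gives $\frac{1}{n}$ for every output. The values $c = 2$ and $n = 4$ are arbitrary.

```python
import math

def naive_softmax(x):
    """softmax(x)_i = exp(x_i) / sum_j exp(x_j), computed literally."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

c, n = 2.0, 4
out = naive_softmax([c] * n)
print(out)  # every entry is approximately 1/n = 0.25
```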
We can roughly estimate what happens at the extremes. When $c$ is a very large negative number, $\exp(c)$ is very close to 0 and may underflow to exactly 0. The denominator is then 0, so the output of the softmax function is undefined. Similarly, when $c$ is a very large positive number, $\exp(c)$ may overflow, making both the numerator and the denominator infinite, so the whole expression is again undefined.
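Running the literal softmax formula with an extreme constant reproduces both failures. In plain Python they surface as exceptions: `math.exp` raises `OverflowError` for large arguments, and dividing by a denominator of `0.0` raises `ZeroDivisionError`. An array library such as NumPy would instead return `inf` or `nan`, typically with a `RuntimeWarning`. The values $c = \pm 1000$ and the helper name `diagnose` are arbitrary choices for illustration.

```python
import math

def naive_softmax(x):
    """Literal softmax formula with no protection against over/underflow."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def diagnose(x):
    try:
        return naive_softmax(x)
    except ZeroDivisionError:
        # exp(c) underflowed to 0 for every entry, so the denominator is 0.
        return "underflow: denominator is 0"
    except OverflowError:
        # exp(c) is too large to be stored in a double.
        return "overflow: exp(c) is not representable"

print(diagnose([-1000.0] * 3))  # underflow: denominator is 0
print(diagnose([1000.0] * 3))   # overflow: exp(c) is not representable
```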
Solution
There are several ways to avoid overflow and underflow in softmax. A common one is to preprocess the input values before applying softmax:
$$\mathrm{softmax}(z), \qquad z = x - \max_i x_i$$
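A minimal plain-Python sketch of the shifted computation: form $z = x - \max_i x_i$, then apply softmax to $z$. The function name `stable_softmax` is our own choice, not a library API.

```python
import math

def stable_softmax(x):
    """Softmax evaluated on z = x - max_i x_i instead of on x directly."""
    m = max(x)
    z = [v - m for v in x]          # shift so the largest entry becomes 0
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

print(stable_softmax([1000.0, 1000.0, 1000.0]))     # each entry is 1/3
print(stable_softmax([-1000.0, -1000.0, -1000.0]))  # each entry is 1/3
```

Both inputs would break the literal formula, but after the shift every entry of $z$ is 0, so each term is $\exp(0) = 1$.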
Subtracting the same value from every input does not change the output of the softmax function at all, so the ordering of the outputs is preserved as well: larger inputs still produce larger outputs, and smaller inputs still produce smaller outputs.
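The invariance holds because the shift contributes the same factor to the numerator and to every term of the denominator, so it cancels. Writing $m = \max_k x_k$ (in fact any constant works):

```latex
\frac{\exp(x_i - m)}{\sum_{j=1}^{n} \exp(x_j - m)}
  = \frac{\exp(x_i)\,\exp(-m)}{\exp(-m)\sum_{j=1}^{n} \exp(x_j)}
  = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}
```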
With this shift, even when $c$ is very large, the largest element of $z$ is exactly 0, so the largest argument passed to $\exp$ is 0 and $\exp(0) = 1$. This rules out overflow.
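A small check of this claim, using distinct large inputs (chosen arbitrarily) so the result is not trivially uniform. The literal formula would raise `OverflowError` on them in Python, while after the shift the largest argument to `math.exp` is exactly 0.

```python
import math

x = [1000.0, 999.0, 998.0]
m = max(x)
z = [v - m for v in x]            # [0.0, -1.0, -2.0]
print(max(z), math.exp(max(z)))   # 0.0 1.0

exps = [math.exp(v) for v in z]   # every term lies in (0, 1], so nothing overflows
total = sum(exps)
probs = [e / total for e in exps]

# Same result as softmax of [2, 1, 0]: only differences between inputs matter.
small = [2.0, 1.0, 0.0]
small_total = sum(math.exp(v) for v in small)
reference = [math.exp(v) / small_total for v in small]
print(all(math.isclose(p, r) for p, r in zip(probs, reference)))  # True
```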
It also guarantees that the denominator of the softmax formula is greater than or equal to 1. For the index $i$ that attains the maximum, $x_i - \max_i x_i = 0$, so that term contributes $\exp(0) = 1$, and every other term is positive. The denominator therefore cannot underflow to 0, which effectively prevents the division by zero described above.
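Checking the denominator bound on very negative inputs (chosen arbitrarily): without the shift every term underflows and the sum is 0, while with the shift the maximum element contributes exactly 1.

```python
import math

x = [-1000.0, -1001.0, -1002.0]
m = max(x)

unshifted = sum(math.exp(v) for v in x)     # every term underflows, sum is 0.0
shifted = sum(math.exp(v - m) for v in x)   # 1 + e^-1 + e^-2, about 1.503
print(unshifted, shifted)
```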
Addendum
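Besides the denominator, the numerator can also cause trouble. Even after the shift, a numerator term $\exp(z_i)$ underflows to 0 when $z_i$ is very negative, so the corresponding softmax output becomes exactly 0. This matters for $\log \mathrm{softmax}(x)$: if it is computed by first running softmax and then taking the logarithm, a zero output yields $\log 0 = -\infty$, which is again a numerical failure.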
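A sketch contrasting two ways of computing log softmax. `naive_log_softmax` runs the shifted softmax and then takes the log; because Python's `math.log(0.0)` raises `ValueError` rather than returning $-\infty$, the sketch maps a zero probability to `-math.inf` explicitly, mirroring what array libraries return. The remedy in `log_softmax` is the standard log-sum-exp rewriting, $\log \mathrm{softmax}(x)_i = z_i - \log \sum_j \exp(z_j)$ with $z = x - \max_k x_k$; this technique is a common one but is not stated in the text above. Both function names are our own.

```python
import math

def naive_log_softmax(x):
    """Shifted softmax first, then log: the numerator can still underflow."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    probs = [e / total for e in exps]
    # math.log(0.0) raises ValueError, so treat a zero probability as -inf.
    return [math.log(p) if p > 0.0 else -math.inf for p in probs]

def log_softmax(x):
    """log softmax via log-sum-exp: z_i - log(sum_j exp(z_j)), z = x - max(x)."""
    m = max(x)
    z = [v - m for v in x]
    log_total = math.log(sum(math.exp(v) for v in z))  # sum >= 1, so log is safe
    return [v - log_total for v in z]

x = [0.0, -1000.0]
print(naive_log_softmax(x))  # [0.0, -inf]
print(log_softmax(x))        # [0.0, -1000.0]
```

The second entry is about $-1000$ mathematically; only the naive route loses it to $-\infty$.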