2.4 Identifying Emergent/Convergent Statistics and Anomalous Statistics
Expectation, E(X), of random variable (r.v.) X:
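E(X) = ∑i xi p(xi), the probability‐weighted sum over the possible values xi of X.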
X is the total of rolling two six‐sided dice: X = 2 can occur in only one way, rolling “snake eyes,” while X = 7 can occur in six ways, etc., giving E(X) = 7. Now consider the expectation for rolling a single die: E(X) = 3.5. Notice that the value of the expectation need not be one of the possible outcomes (it is really hard to roll a 3.5).
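As a quick check, here is a minimal Python sketch (an illustration, not code from the text) that enumerates the equally likely die faces and computes these two expectations:

from itertools import product

def expectation(dist):
    # E(X) = sum over outcomes x of x * p(x), for a discrete distribution {x: p(x)}
    return sum(x * p for x, p in dist.items())

one_die = {x: 1/6 for x in range(1, 7)}
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, 0) + 1/36

print(expectation(one_die))    # 3.5
print(expectation(two_dice))   # 7.0 (up to floating-point rounding)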
The expectation, E(g(X)), of a function g of r.v. X:
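E(g(X)) = ∑i g(xi) p(xi).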
Consider the special case where g(xi) = −log(p(xi)):
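E(−log(p(X))) = −∑i p(xi) log(p(xi)),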
which is Shannon Entropy for the discrete distribution p(xi). For Mutual Information, similarly, use g(X,Y) = log(p(xi, yi)/[p(xi)p(yi)]):
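E(g(X,Y)) = ∑ p(xi, yi) log(p(xi, yi)/[p(xi)p(yi)]) (sum over all joint outcomes),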
if p(xi), p(yi), and p(xi, yi) are all ∈ ℜ+; this is the Relative Entropy between the joint distribution and the same distribution if the r.v.'s were independent: D(p(xi, yi) ‖ p(xi)p(yi)).
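The same expectation machinery gives a direct numerical recipe. The following Python sketch (illustrative only, with a made-up 2 × 2 joint distribution) computes the Shannon entropy of a marginal and the mutual information of a joint distribution, in natural-log units (nats):

import math

def shannon_entropy(p):
    # H = -sum_i p(x_i) log p(x_i); terms with p = 0 contribute 0
    return -sum(px * math.log(px) for px in p if px > 0)

def mutual_information(joint):
    # I(X;Y) = sum_{i,j} p(x_i,y_j) log[ p(x_i,y_j) / (p(x_i) p(y_j)) ]
    px = [sum(row) for row in joint]           # marginal p(x_i)
    py = [sum(col) for col in zip(*joint)]     # marginal p(y_j)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi

independent = [[0.25, 0.25], [0.25, 0.25]]
dependent   = [[0.5, 0.0], [0.0, 0.5]]
print(shannon_entropy([0.5, 0.5]))       # log 2
print(mutual_information(independent))   # 0.0
print(mutual_information(dependent))     # log 2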
Jensen's Inequality:
Let φ(⋅) be a convex function on a convex subset of the real line: φ: χ → ℜ. Convexity by definition: φ(λ1 x1 + … + λn xn) ≤ λ1 φ(x1) + … + λn φ(xn), where λi ≥ 0 and ∑λi = 1. Thus, if λi = p(xi), the λi satisfy the conditions for line interpolation as well as for a discrete probability distribution, so the relation can be rewritten in terms of the Expectation definition:
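φ(∑i p(xi) xi) ≤ ∑i p(xi) φ(xi), i.e. φ(E(X)) ≤ E(φ(X)).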
Since φ(x) = −log(x) is a convex function:
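−log(E(X)) ≤ E(−log(X)), or equivalently E(log(X)) ≤ log(E(X)).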
Variance:
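Var(X) = E[(X − E(X))²] = E(X²) − [E(X)]².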
Chebyshev's Inequality:
For k > 0, P(|X − E(X)| > k) ≤ Var(X)/k².
Proof:
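Using the standard indicator‐function argument (one of several equivalent proofs): let μ = E(X) and let I be the indicator of the event |X − μ| > k. Since (X − μ)² ≥ 0 everywhere and (X − μ)² > k² on that event,
Var(X) = E[(X − μ)²] ≥ E[(X − μ)² I] ≥ k² E(I) = k² P(|X − μ| > k).
Dividing both sides by k² gives P(|X − μ| > k) ≤ Var(X)/k².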