
3.1.1 The Khinchin Derivation


In his now famous 1948 paper [106], Claude Shannon provided a quantitative measure of entropy in connection with communication theory. The Shannon entropy measure was later put on a more formal footing by A. I. Khinchin in an article where he proves that, under certain assumptions, the Shannon entropy is unique [107]. (Dozens of similar axiomatic proofs have since been made.) A statement of the theorem is as follows:

Khinchin Uniqueness Theorem: Let H(p1, p2, …, pn) be a function defined for any integer n and for all values p1, p2, …, pn such that pk ≥ 0 (k = 1, 2, …, n) and ∑k pk = 1. If for any n this function is continuous with respect to all its arguments, and if it obeys the three properties listed below, then H(p1, p2, …, pn) = −λ ∑k pk log(pk), where λ is a positive constant (Shannon entropy is recovered with the convention λ = 1). The three properties are:

1 For given n and for ∑k pk = 1, the function takes its largest value for pk = 1/n (k = 1, 2, …, n). This is equivalent to Laplace’s principle of insufficient reason, which says that if you know nothing, assume the uniform distribution (it also agrees with the Occam’s Razor assumption of minimum structure).

2 H(ab) = H(a) + Ha(b), where Ha(b) = −∑a ∑b p(a) p(b|a) log(p(b|a)) is the conditional entropy. This is consistent with H(ab) = H(a) + H(b) when a and b are independent, with the conditional-probability form used when they are not (see the numerical check after this list).

3 H(p1, p2, …, pn, 0) = H(p1, p2, …, pn). This reductive relationship, or something like it, is implicitly assumed when describing any system in “isolation.”
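To make the three properties concrete, the short Python sketch below checks each of them numerically for the Shannon form H = −∑k pk log(pk) with λ = 1 and the natural logarithm; the specific distributions used are made-up examples for illustration, not taken from the text.

```python
# A minimal numerical check of the three Khinchin properties for the
# Shannon entropy H(p) = -sum_k p_k log(p_k), with the convention
# 0*log(0) = 0 and lambda = 1. The distributions below are arbitrary
# illustrative examples.
import numpy as np

def entropy(p):
    """Shannon entropy in nats; terms with p_k = 0 contribute zero."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

# Property 1: for fixed n, the uniform distribution maximizes H.
n = 4
uniform = np.full(n, 1.0 / n)
skewed = np.array([0.7, 0.1, 0.1, 0.1])
assert entropy(uniform) >= entropy(skewed)

# Property 2 (chain rule): H(ab) = H(a) + Ha(b),
# where Ha(b) = -sum_{a,b} p(a) p(b|a) log p(b|a).
p_ab = np.array([[0.10, 0.30],   # joint distribution p(a, b)
                 [0.25, 0.35]])
p_a = p_ab.sum(axis=1)
p_b_given_a = p_ab / p_a[:, None]
h_b_given_a = -np.sum(p_a[:, None] * p_b_given_a * np.log(p_b_given_a))
assert np.isclose(entropy(p_ab.ravel()), entropy(p_a) + h_b_given_a)

# Property 3: appending a zero-probability outcome does not change H.
assert np.isclose(entropy([0.7, 0.1, 0.1, 0.1, 0.0]), entropy(skewed))

print("All three Khinchin properties verified numerically.")
```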

Note that the above axiomatic derivation is still “weak” in that it assumes the existence of the conditional entropy in property (2).

