3.1 Shannon Entropy, Relative Entropy, Maxent, Mutual Information
If you have a discrete probability distribution P, with individual components p_k, then the rules for probabilities require that the sum of the probabilities of the individual outcomes must be 1 (as mentioned in Chapter 2). This is written in math shorthand as:

\[ \sum_{k} p_k = 1 \]
Furthermore, the individual outcome probabilities must always be positive, and by some conventions, nonzero. In the case of hexamers, there are 4096 types, thus the index variable "k" ranges from 1 to 4^6 = 4096. If we introduce a second discrete probability distribution Q, with individual components q_k, those components sum to 1 as well:

\[ \sum_{k} q_k = 1 \]
The definition of Shannon Entropy in this math notation, for the P distribution, is:

\[ H(P) = -\sum_{k} p_k \log(p_k) \]
The degree of randomness in a discrete probability distribution P can be measured in terms of Shannon entropy [106].
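As a quick numerical illustration of the definition above, here is a minimal Python sketch (the function name shannon_entropy, the toy sequence, and the choice of log base 2 are illustrative assumptions, not taken from the text) that estimates hexamer probabilities from a sequence and computes their Shannon entropy:

    import math
    from collections import Counter

    def shannon_entropy(probs, base=2.0):
        # H(P) = -sum_k p_k log(p_k); zero-probability outcomes contribute 0
        return -sum(p * math.log(p, base) for p in probs if p > 0.0)

    # Estimate hexamer probabilities from the counts observed in a toy sequence
    seq = "gattacagattacacatgcatgac"
    hexamer_counts = Counter(seq[i:i + 6] for i in range(len(seq) - 5))
    total = sum(hexamer_counts.values())
    probs = [count / total for count in hexamer_counts.values()]

    print(shannon_entropy(probs))   # entropy of the observed hexamer distribution (bits)
    print(math.log(4 ** 6, 2))      # maximum possible entropy: 12 bits for 4096 equiprobable hexamers

For 4^6 = 4096 equally likely hexamers the entropy reaches its maximum of log2(4096) = 12 bits; any bias in hexamer usage lowers the value, which is the sense in which Shannon entropy measures the degree of randomness.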
Shannon entropy appears in fundamental contexts in communications theory and in statistical physics [100]. The desire to derive Shannon entropy from some deeper theory drove early efforts to at least obtain axiomatic derivations, with the one used by Khinchine, given in the next section, being the most popular. The axiomatic approach is limited by the assumptions of its axioms, however, so it was not until the fundamental role of relative entropy was established in an "information geometry" context [113–115] that a path emerged to show that Shannon entropy is uniquely qualified as a measure (c. 1999). The fundamental (extremal optimum) aspect of relative entropy (and of Shannon entropy as a special case) is found by differential geometry arguments akin to those of Einstein on Riemannian spaces, here involving spaces defined by the family of exponential distributions. Whereas the "natural" notion of metric and distance locally is given by the Minkowski metric and Euclidean distance, a similar analysis comparing distributions (evaluating their "distance" from each other) indicates that the natural measure is relative entropy (which reduces to Shannon entropy in variational contexts when the relative entropy is taken relative to the uniform probability distribution). Further details on this derivation are given in Chapter 8.
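The reduction of relative entropy to Shannon entropy (up to a constant) when the reference distribution is uniform can be seen directly. Taking the Kullback–Leibler form of relative entropy and writing U for the uniform distribution over N outcomes (notation chosen here, not drawn from the text):

\[ D(P \,\|\, U) = \sum_{k=1}^{N} p_k \log\!\left(\frac{p_k}{1/N}\right) = \log N + \sum_{k=1}^{N} p_k \log p_k = \log N - H(P) \]

Since log N is a constant, minimizing the relative entropy to the uniform distribution is equivalent to maximizing the Shannon entropy H(P), which is the variational sense in which the former reduces to the latter.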