3.1.4 Mutual Information
One of the most powerful uses of relative entropy is in evaluating the statistical linkage between two sets of outcomes, e.g. in determining whether two random variables are independent or not. In probability we can talk about the probability of two events happening, such as the probabilities for the outcomes of rolling two dice, P(X, Y), where X is the first die, with outcomes x1 = 1, …, x6 = 6, and similarly for the second die Y. If they are both fair dice and act independently of each other, then their joint probability reduces to the product P(X, Y) = P(X)P(Y); for example, the probability of rolling two sixes is (1/6)(1/6) = 1/36.
If using loaded dice, but dice that have no interaction, then they are still independent of each other, and their joint probability still reduces to the product of two simpler probability distributions (with one argument each) as shown above. In games where two dice are rolled together (craps) it is possible to have dice that individually roll as fair, with a uniform distribution on outcomes, but that interact when rolled together such that their combined rolls are biased. This can be accomplished with small bar magnets oriented from the “1” to the “6” faces, such that the dice tend to come up with their magnets anti‐aligned, one showing its “1” face and the other showing its “6” face, for a total roll of “7” (where a roll of “7” has special significance in the game of craps). For such magnetized dice the outcomes of the individual die rolls are not independent, and the simplification of P(X, Y) to P(X)P(Y) cannot be made.
In evaluating whether there is a statistical linkage between two events we are essentially asking whether those events are independent, i.e. does P(X, Y) = P(X)P(Y)? In this situation we are again in the position of comparing two probability distributions, P(X, Y) and P(X)P(Y), so if relative entropy is best for such comparisons, why not evaluate D(P(X, Y) ‖ P(X)P(Y))? This is precisely what should be done, and in doing so we have arrived at the definition of what is known as “mutual information” (finally a name for an information measure that is perfectly self‐explanatory!).
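Written out over the discrete outcomes, this is I(X; Y) = D(P(X, Y) ‖ P(X)P(Y)) = Σx,y P(x, y) log[P(x, y)/(P(x)P(y))]. As a minimal sketch of that calculation (in Python with NumPy, not taken from the text; the function name and the toy “magnetic dice” joint distribution below are illustrative assumptions), one can compute the mutual information of a joint distribution as the relative entropy between the joint and the product of its marginals:

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) = D(P(X,Y) || P(X)P(Y)), in bits, for a 2-D joint distribution."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal P(X)
    py = joint.sum(axis=0, keepdims=True)   # marginal P(Y)
    independent = px * py                   # product distribution P(X)P(Y)
    mask = joint > 0                        # terms with P(x,y) = 0 contribute 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / independent[mask])))

# Fair, non-interacting dice: every joint outcome has probability 1/36, so I(X;Y) = 0.
fair = np.full((6, 6), 1.0 / 36.0)
print(mutual_information(fair))    # ~0.0 bits

# Hypothetical "magnetic" dice: each die is individually fair (uniform marginals),
# but half the joint probability mass sits on the anti-aligned pairs summing to 7.
biased = np.full((6, 6), 0.5 / 30.0)   # remaining mass spread over the other 30 cells
for x in range(6):
    biased[x, 5 - x] = 0.5 / 6.0       # extra weight on pairs with x + y = 7
print(mutual_information(biased))  # > 0 bits, exposing the statistical linkage
```

The fair dice give zero mutual information because the joint already factorizes; the interacting dice give a strictly positive value even though each marginal, taken alone, still looks uniform.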
The use of mutual information is very powerful in bioinformatics, and in informatics in general, as it allows statistical linkages to be discovered that are not otherwise apparent. In Section 3.2 we will start by evaluating the mutual information between genomic nucleotides at various degrees of separation. If we see nonzero mutual information in the genome for bases separated by certain specified gap distances, we will have uncovered that there is “structure” of some sort.
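As a preview of that kind of analysis, the following sketch (again Python; the function name, the toy sequence, and the chosen gap values are made up for illustration, not the procedure of Section 3.2 itself) estimates the mutual information between bases separated by a given gap from observed pair counts:

```python
from collections import Counter
from math import log2

def gap_mutual_information(seq, gap):
    """Estimate I (bits) between bases at positions i and i + gap, from pair counts."""
    pairs = [(seq[i], seq[i + gap]) for i in range(len(seq) - gap)]
    n = len(pairs)
    joint = Counter(pairs)                    # counts for P(X, Y)
    first = Counter(a for a, _ in pairs)      # counts for P(X)
    second = Counter(b for _, b in pairs)     # counts for P(Y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_xy = count / n
        p_x = first[a] / n
        p_y = second[b] / n
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi

# Toy "genome" with a built-in repeat structure, purely for demonstration.
genome = "ATGCGATATATCGCGATATGCATGC" * 40
for gap in (1, 2, 3, 6, 9):
    print(gap, round(gap_mutual_information(genome, gap), 4))
```

A real genome scan would use much longer sequences and a range of gap distances; gaps at which the estimated mutual information stays well above zero are the ones signaling underlying structure.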