Statistics and the Evaluation of Evidence for Forensic Scientists - Franco Taroni

1.7.7 Probabilities and Frequencies: The Role of Exchangeability


It is not uncommon for subjective (or personal) probabilities to be considered as a synonym for arbitrariness. This is not so: adopting a subjective view does not mean neglecting the acquired knowledge that is often available in the form of relative frequencies. The main source of misunderstanding concerns the relationship between frequencies and beliefs. The two terms are, unfortunately, often regarded as equivalent because frequency data can be used to inform probabilities (Lindley 1991), but they are not equivalent. Dawid and Galavotti (2009, p. 100) quoted de Finetti's view:

every probability evaluation essentially depends on two components: (1) the objective component, consisting of the evidence of known data and facts; and (2) the subjective component, consisting of the opinion concerning unknown facts based on known evidence.

As emphasised more recently by D'Agostini (2016):

It is a matter of fact that relative frequency and probability are somehow connected within probability theory, without the need for identifying the two concepts. (p. 13)

It is reasonable to use relative frequencies to inform measures of belief, and the relationship takes the form of a mathematical theorem, de Finetti's representation theorem. According to the theorem, the convergence of one's personal probability towards the value of observed frequencies, as the number of observations increases, is a logical consequence of Bayes' theorem if a condition called exchangeability is satisfied by the degrees of belief prior to the observations (Dawid 2004).

As an illustration of the connection between frequency and probability, consider again an urn containing a certain number of balls, indistinguishable except by their colour, which is either white or black, with the number of balls of each colour being known. The extraction of a ball from this urn defines an experiment having two and only two possible outcomes, generally denoted as success (say, the withdrawal of a white ball) or failure (say, the withdrawal of a black ball). Let $W$ denote the event ‘a white ball is extracted’. Given that the balls are all indistinguishable from each other except for their colour, the subjective probability of extracting a white ball can be assessed as the known proportion $\theta$ of white balls, that is, $\Pr(W) = \theta$. Assuming the urn contains a large number of balls, so that the extraction of a few balls does not alter its composition substantially, individual draws (i.e. sampling) will be considered as made with replacement, and the probability of extracting a white ball at subsequent withdrawals will still be $\theta$, independently of previous observations. In this way one realises a series of so-called Bernoulli trials (Section A.2.1), where the outcome of each trial has a constant probability independent of previous outcomes.

Suppose now the observer knows neither the absolute number of balls present nor the proportion that are of each colour. De Finetti (1931a) showed that every series of experiments having two and only two possible outcomes that can be taken as exchangeable (i.e. the probability assigned to the outcomes of a sequence of trials is invariant to permutation) can be represented as random withdrawals from an urn of unknown composition. If one can assess one's uncertainty in such a way that labelling of the trials is not relevant, then it can be proved that as the number of observations increases the relative frequencies of successes (i.e. the relative frequency of white balls) tend to a limiting value $\theta$ that is the proportion of white balls. A subjective assessment about the outcome of a sequence of Bernoulli trials is equivalent to placing a prior distribution $f(\theta)$ on $\theta$. According to this, one only needs to model a prior distribution for the possible values that $\theta$ might take: personal beliefs concerning the colour of the next ball extracted, say that it is white (event $W$), can be computed as

$$\Pr(W) = \int_0^1 \Pr(W \mid \theta)\, f(\theta)\, d\theta = \int_0^1 \theta\, f(\theta)\, d\theta. \tag{1.2}$$

The introduction of a prior probability distribution modelling personal belief about the proportion $\theta$ may seem, at first sight, in contradiction with statements that probability is a single number. One can have probabilities for events, or probabilities for propositions, but not probabilities of probabilities, otherwise one would have an infinite regression (de Finetti 1976). Confusion may arise from the fact that the parameter $\theta$ is generally termed the ‘probability of success’. However, it is worth noting that, although it is effectively a probability, it represents a chance rather than a belief.
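The step from a prior distribution on the unknown proportion of white balls (written θ here) to a belief about the next ball can be sketched numerically. The sketch below assumes a Beta(a, b) prior, which is a convenient choice made for this illustration and not one prescribed by the text; the function name and the observed counts are likewise hypothetical. Under that assumption the probability that the next ball is white is the mean of the current Beta distribution, and with more observations it moves towards the observed relative frequency.

```python
from fractions import Fraction

def predictive_prob_white(a, b, whites=0, blacks=0):
    """Probability that the next ball is white under an assumed Beta(a, b) prior.

    After observing `whites` white and `blacks` black balls the belief about
    theta is Beta(a + whites, b + blacks); the predictive probability of a
    white ball is the mean of that distribution.
    """
    return Fraction(a + whites, a + b + whites + blacks)

# Before any draw, a uniform Beta(1, 1) prior gives a probability of 1/2.
print(predictive_prob_white(1, 1))

# Hypothetical data with a relative frequency of 0.3 white balls:
# the predictive probability approaches 0.3 as the number of draws grows.
for n in (10, 100, 1000, 10000):
    whites = 3 * n // 10
    print(n, float(predictive_prob_white(1, 1, whites, n - whites)))
```

The prior matters when few balls have been seen and is progressively dominated by the data, which is the behaviour the representation theorem leads one to expect for exchangeable trials.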

A set of observations is said to be exchangeable – for you, given a knowledge base – if their joint distribution is invariant under permutation. A formal definition is as follows (Bernardo and Smith 2000):

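The random quantities $x_1, \ldots, x_n$ are said to be judged exchangeable under a probability measure $P$ if the implied joint degree of belief distribution satisfies $p(x_1, \ldots, x_n) = p(x_{\pi(1)}, \ldots, x_{\pi(n)})$ for all permutations $\pi$ defined on the set $\{1, \ldots, n\}$. (p. 169)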

Practically, consider the following hypothetical case example. A laboratory receives a consignment of discrete items whose attributes may be relevant within the context of a criminal investigation. The laboratory is requested to conduct analyses in order to gather information that should allow an inference to be drawn, for example about the proportion of items in the consignment that are of a certain kind (e.g. counterfeit products). The term ‘positive’ is used here to refer to the presence of an item's property that is of interest (e.g. counterfeit); otherwise the result of the analysis is termed ‘negative’. This allows the introduction of a random variable $X$ that takes the value 1 (i.e. success) if the analysed unit is positive and 0 (i.e. failure) otherwise. This is a generic type of case that applies well to many situations, such as surveys or, more generally, sampling procedures conducted to infer the proportion of individuals or items in a population who share a given property or possess certain characteristics (e.g. that of being counterfeit).

Suppose now that $n = 10$ units are analysed, so that there are $2^{10} = 1024$ possible outcomes. The forensic scientist should be able to assign a probability to each of the 1024 possible outcomes. At this point, if it were reasonable to assume that only the observed values matter and not the order in which they appear, the forensic scientist would have a considerably simplified task. In fact, the total number of probability assignments would reduce from 1024 to 11, since it is assumed that all sequences are assigned the same probability if they have the same number of 1's (i.e. successes), and that number can only be one of the values 0 to 10. This is possible if it is thought that all the items are indistinguishable in the sense that it does not matter which particular item produced a success (e.g. a positive response) or a failure (e.g. a negative response). Stated otherwise, this means that one's probability assignment is invariant under changes in the order of successes and failures. If the outcomes were permuted in any way, assigned probabilities would be unchanged. For a coin-tossing experiment, Lindley (2014) has expressed this as follows:

One way of expressing this is to say that any one toss, with its resulting outcome, may be exchanged for any other with the same outcome, in the sense that the exchange will not alter your belief, expressing the idea that the tosses were done under conditions that you feel were identical. (p. 148)
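The reduction from 1024 to 11 probability assignments in the consignment example can be checked by direct enumeration. The following is a minimal sketch, assuming ten analysed units (the number implied by the 1024 possible sequences); the variable names are chosen for this illustration only.

```python
from collections import Counter
from itertools import product

n = 10  # number of analysed units implied by the 1024 sequences

# Every ordered sequence of outcomes (1 = positive, 0 = negative).
sequences = list(product((0, 1), repeat=n))

# Under exchangeability only the number of successes matters, so sequences
# are grouped by their sum; each group shares a single probability.
classes = Counter(sum(seq) for seq in sequences)

print(len(sequences))  # ordered outcomes needing a probability without exchangeability
print(len(classes))    # distinct success counts needing a probability with it
print(classes[5])      # number of orderings that contain exactly five successes
```

Each group gathers all the orderings that an exchangeable assessment treats alike, so one probability per success count is enough to fix the probability of every individual sequence.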

The role of exchangeability in the reconciliation of subjective probabilities and frequencies in forensic science is developed in Taroni et al. (2018). It is possible to give relative frequency an explicit role in probability assignments but this does not mean that probabilities can only be given when relative frequencies are available.

The existence of relative frequencies is not a necessary condition for the assignment of probabilities. Typically, relative frequencies are not available in the case of single (not replicable) events. Other methods of elicitation, such as scoring rules, can be implemented to deal with such situations. An extended discussion on elicitation is given by O'Hagan et al. (2006).

The use of scores for the assessment of forecasts is described in DeGroot and Fienberg (1983). The association between scores for the assessment of forecasts and scores for the assessment of the performance of methods for evidence evaluation is made clear in Section 8.4.3. A score is used to evaluate and compare forecasters who present their predictions of whether or not an event will occur as a subjective probability of the occurrence of that event. A common use for forecasts is that of weather from one day to the next. Let $x$ denote a forecaster's prediction of rain on the following day. Let $p$ be the forecaster's actual subjective probability of rain for that day. Let an arbitrary function $g_1(x)$ be the forecaster's score if rain occurs and let another arbitrary function $g_2(x)$ be their score if rain does not occur. With an assumption that the forecaster wishes to maximise their score, assume that $g_1(x)$ is an increasing function of $x$ and $g_2(x)$ is a decreasing function of $x$. For a prediction of $x$ and an actual subjective probability of $p$, the expected score of the forecaster is

$$p\, g_1(x) + (1 - p)\, g_2(x). \tag{1.3}$$

A proper scoring rule is one for which (1.3) is maximised when $x = p$. A strictly proper scoring rule is one for which $x = p$ is the only value of $x$ that maximises (1.3).
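What the definition of a proper scoring rule excludes can be seen with a rule that is not proper. The sketch below assumes a hypothetical linear rule, with score x if rain occurs and 1 - x if it does not; this rule is not taken from the text and is used only as a counter-example. It satisfies the monotonicity assumptions, yet the expected score is maximised by an extreme prediction and not by the forecaster's actual probability. The function names and the grid search are illustrative choices.

```python
def expected_score(x, p, g1, g2):
    """Expected score for stated prediction x when the actual probability is p."""
    return p * g1(x) + (1 - p) * g2(x)

def best_prediction(p, g1, g2, steps=1000):
    """Grid search over [0, 1] for the prediction that maximises the expected score."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda x: expected_score(x, p, g1, g2))

p = 0.7  # hypothetical actual subjective probability of rain

# Hypothetical linear rule: g1 is increasing and g2 is decreasing in x.
best_linear = best_prediction(p, lambda x: x, lambda x: 1 - x)
print(best_linear)
```

With p = 0.7 the search returns 1.0: under the linear rule the forecaster does best by overstating their belief, so the rule gives no incentive to report p itself.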

One of the earliest scoring rules, proposed for meteorological forecasts, is the quadratic scoring rule (Brier 1950). This score has the property that the forecaster will minimise their subjective expected Brier score on any particular day by stating as their prediction $x$ their actual subjective probability $p$ of rain on that day. The expected Brier score is then

$$p(x - 1)^2 + (1 - p)x^2. \tag{1.4}$$

This is minimised uniquely when $x = p$. The negative of the Brier score is a strictly proper scoring rule with $g_1(x) = -(x - 1)^2$ and $g_2(x) = -x^2$ (minimisation of a function corresponds to maximisation of the negative of the function).
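The unique minimum of the expected quadratic score can be verified directly. Writing x for the stated prediction and p for the forecaster's actual probability, the expected Brier score for a binary event is p(x - 1)^2 + (1 - p)x^2; its derivative with respect to x is 2p(x - 1) + 2(1 - p)x = 2(x - p), which vanishes only at x = p, and the second derivative is 2 > 0. The sketch below confirms this numerically on a grid; the function names and the values of p are hypothetical.

```python
def expected_brier(x, p):
    """Expected quadratic (Brier) score for prediction x and actual probability p."""
    return p * (x - 1) ** 2 + (1 - p) * x ** 2

def minimising_prediction(p, steps=1000):
    """Grid search over [0, 1] for the prediction with the smallest expected score."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda x: expected_brier(x, p))

# Hypothetical actual probabilities of rain.
for p in (0.1, 0.3, 0.7):
    print(p, minimising_prediction(p), expected_brier(p, p))
```

In each case the minimising prediction coincides with p, and the minimum expected score equals p(1 - p).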

The notion of exchangeability is illustrated with the following example of selection without replacement of items of a particular type, say, $\Gamma$, from a small population. As an example of what $\Gamma$ might be, consider tablets in a consignment of drugs; the tablets may be either illicit ($\Gamma$) or licit. The descriptor ‘small’ for the population size is used to indicate that removal of a member from the population, as in selection without replacement, affects the probability of possession of $\Gamma$ when the next member is selected for removal.

Denote the population size by $N$. Of the items in the population, $R$ possess $\Gamma$ and $N - R$ do not, and $R$ is not known. A sample of size $n$ is taken. The probability the first item selected from the population is of type $\Gamma$ is $R/N$. If the first member selected from the population possesses $\Gamma$, the probability the next member selected also possesses $\Gamma$ is $(R - 1)/(N - 1)$. The population size is sufficiently small that $(R - 1)/(N - 1)$ cannot be approximated meaningfully by $R/N$. Successive draws from the consignment are not independent in that knowledge of the outcome of one draw affects the probability of a particular outcome at the next draw.

Let $X$ be the number of members of the sample of size $n$ that possess $\Gamma$. The probability distribution for $X$ is the hypergeometric distribution (Section 4.3.2 and Appendix A.2.5) and

$$\Pr(X = x) = \frac{\binom{R}{x}\binom{N - R}{n - x}}{\binom{N}{n}}.$$

This distribution does not depend on the order in which the members are drawn from the population, only on the number which possess $\Gamma$ and the number which do not. The property that the distribution is independent of the order is that of exchangeability.

As $R$ is not known, it is not possible to determine $\Pr(X = x)$. However, it is possible given values for $N$, $n$ and $x$ to make inferences about $R$. A comparison of the frequentist and Bayesian approaches to this small consignment sampling problem is given in Section 4.3.2 and Aitken (1999).
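The order-invariance of sampling without replacement can be checked on a small case. The sketch below takes a hypothetical consignment of 10 items of which 4 are of the type of interest, values chosen purely for illustration, and treats the number of such items as known so that the probabilities can be computed. It evaluates the probability of every ordering of a sample containing two items of that type and one without, and compares the total with the hypergeometric probability. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def hypergeom_pmf(x, N, R, n):
    """Probability of x items of the type of interest in a sample of n drawn
    without replacement from N items, R of which are of that type."""
    return Fraction(comb(R, x) * comb(N - R, n - x), comb(N, n))

def sequence_prob(seq, N, R):
    """Probability of one ordered sequence of draws (1 = item is of the type)."""
    prob, with_type, remaining = Fraction(1), R, N
    for outcome in seq:
        favourable = with_type if outcome == 1 else remaining - with_type
        prob *= Fraction(favourable, remaining)
        with_type -= outcome   # one fewer item of the type if one was drawn
        remaining -= 1         # one fewer item in the consignment either way
    return prob

N, R, n = 10, 4, 3  # hypothetical consignment, illustrative values only

orderings = set(permutations((1, 1, 0)))
probs = {seq: sequence_prob(seq, N, R) for seq in orderings}
print(probs)                              # the same probability for every ordering
print(sum(probs.values()), hypergeom_pmf(2, N, R, n))
```

Every ordering receives the same probability although the draws are not independent, and summing over the orderings recovers the hypergeometric probability: exchangeability is a weaker requirement than independence.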

Probabilities based on frequencies may be thought of as objective probabilities. They are considered objective in the sense that there is a well‐defined set of circumstances for the long‐run repetition of the trials, such that the corresponding probabilities are well‐defined and that one's personal or subjective views will not alter the value of the probabilities. Each person considering these circumstances will provide the same values for the probabilities. The frequency model relates to a relative frequency obtained in a long sequence of trials, assumed to be performed in an identical manner, physically independent of each other. Such a circumstance has certain difficulties. This point of view does not allow a statement of probability for any situation that does not happen to be embedded, at least conceptually, in a long sequence of events giving equally likely outcomes. However, note the following words of Lindley (2004):

Objectivity is merely subjectivity when nearly everyone agrees. (p. 87)

