Inquiries into Human Faculty and Its Development, by Francis Galton
STATISTICAL METHODS
The object of statistical science is to discover methods of condensing information concerning large groups of allied facts into brief and compendious expressions suitable for discussion. The possibility of doing this is based on the constancy and continuity with which objects of the same species are found to vary. That is to say, we always find, after sorting any large number of such objects in the order (let us suppose) of their lengths, beginning with the shortest and ending with the tallest, and setting them side by side like a long row of park palings between the same limits, their upper outline will be identical. Moreover, it will run smoothly and not in irregular steps. The theoretical interpretation of the smoothness of outline is that the individual differences in the objects are caused by different combinations of a large number of minute influences; and as the difference between any two adjacent objects in a long row must depend on the absence in one of them of some single influence, or of only a few such, that were present in the other, the amount of difference will be insensible. Whenever we find on trial that the outline of the row is not a flowing curve, the presumption is that the objects are not all of the same species, but that part are affected by some large influence from which the others are free; consequently there is a confusion of curves. This presumption is never found to be belied.
It is unfortunate for the peace of mind of the statistician that the influences by which the magnitudes, etc., of the objects are determined can seldom if ever be roundly classed into large and small, without intermediates. He is tantalised by the hope of getting hold of sub-groups of sufficient size that shall contain no individuals except those belonging strictly to the same species, and he is almost constantly baffled. In the end he is obliged to exercise his judgment as to the limit at which he should cease to subdivide. If he subdivides very frequently, the groups become too small to have statistical value; if less frequently, the groups will be less truly specific.
A species may be defined as a group of objects whose individual differences are wholly due to different combinations of the same set of minute causes, no one of which is so powerful as to be able by itself to make any sensible difference in the result. A well-known mathematical consequence flows from this, which is also universally observed as a fact, namely, that in all species the number of individuals who differ from the average value, up to any given amount, is much greater than the number who differ more than that amount, and up to the double of it. In short, if an assorted series be represented by upright lines arranged side by side along a horizontal base at equal distances apart, and of lengths proportionate to the magnitude of the quality in the corresponding objects, then their shape will always resemble that shown in Fig. 1.
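The consequence just stated can be checked on a toy model. The sketch below assumes, purely for illustration, that every object is subject to n independent minute influences, each equally likely to be present or absent, so that the number of objects having k influences present is proportional to the binomial coefficient C(n, k). The function name and the choice n = 20 are hypothetical and are not taken from the text.

```python
from math import comb


def counts_near_and_far(n, a):
    """Return (near, far) for n equally likely present-or-absent influences.

    near: combinations whose total deviates from the average n/2 by at most a.
    far:  combinations deviating by more than a but by at most 2a.
    """
    near = far = 0
    for k in range(n + 1):
        deviation = abs(k - n / 2)
        if deviation <= a:
            near += comb(n, k)
        elif deviation <= 2 * a:
            far += comb(n, k)
    return near, far


if __name__ == "__main__":
    # Assumed illustration: 20 minute influences, three trial amounts.
    for a in (1, 2, 3):
        near, far = counts_near_and_far(20, a)
        print(f"a={a}: within a -> {near}, between a and 2a -> {far}")
```

In this model the band nearest the average outnumbers the band next beyond it for each amount tried, which is the pattern the text describes.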
The form of the bounding curve resembles that which is called in architectural language an ogive, from "augive," an old French word for a cup, the figure being not unlike the upper half of a cup lying sideways with its axis horizontal. In consequence of the multitude of mediocre values, we always find that on either side of the middlemost ordinate Cc, which is the median value and may be accepted as the average, there is a much less rapid change of height than elsewhere. If the figure were pulled out sideways to make it accord with such physical conceptions as that of a row of men standing side by side, the middle part of the curve would be apparently horizontal.
The mathematical conception of the curve is best expressed in Fig. 2, where PQ represents any given deviation from the average value, and the ratio of PO to AB represents the relative probability of its occurrence. The equation to the curve and a discussion of its properties will be found in the Proceedings of the Royal Society, No. 198, 1879, by Dr. M'Alister. The title of the paper is the "Law of the Geometric Mean," and it follows one by myself on "The Geometric Mean in Vital and Social Statistics."
We can lay down the ogive of any quality, physical or mental, whenever we are capable of judging which of any two members of the group we are engaged upon has the larger amount of that quality. I have called this the method of statistics by intercomparison. There is no bodily or mental attribute in any race of individuals that can be so dealt with, whether our judgment in comparing them be guided by common-sense observation or by actual measurement, which cannot be gripped and consolidated into an ogive with a smooth outline, and thenceforward be treated in discussion as a single object.
It is easy to describe any given ogive which has been based upon measurements, so that it may be drawn from the description with approximate truth. Divide AB into a convenient number of fractional parts, and record the height of the ordinates at those parts. In reproducing the ogive from these data, draw a base line of any convenient length, divide it in the same number of fractional parts, erect ordinates of the stated lengths at those parts, connect their tops with a flowing line, and the thing is done. The most convenient fractional parts are the middle (giving the median), the outside quarters (giving the upper and lower quartiles), and similarly the upper and lower octiles or deciles. This is sufficient for most purposes. It leaves only the outer eighths or tenths of the cases undescribed and undetermined, except so far as may be guessed by the run of the intermediate portion of the curve, and it defines all of the intermediate portion with as close an approximation as is needed for ordinary or statistical purposes.
Thus the heights of all but the outer tenths of the whole body of adult males of the English professional classes may be derived from the five following ordinates, measured in inches, of which the outer pair are deciles:--
67.2; 67.5; 68.8; 70.3; 71.4.
Many other instances will be found in the Report of the Anthropometric Committee of the British Association in 1881, pp. 245–257.
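The reconstruction of an ogive from a few recorded ordinates can be sketched as follows. Two assumptions are made here. First, the five ordinates quoted above are taken to stand at the centesimal grades 10, 25, 50, 75 and 90; the text says only that the outer pair are deciles, and reading the others as the quartiles and the median follows the preceding paragraph. Second, straight lines between neighbouring ordinates are used in place of the "flowing line", which is a simplification. The function name is hypothetical.

```python
from bisect import bisect_right

# Grades are an assumption (deciles, quartiles, median); heights are from the text.
GRADES = [10.0, 25.0, 50.0, 75.0, 90.0]
HEIGHTS = [67.2, 67.5, 68.8, 70.3, 71.4]  # inches


def height_at_grade(grade, grades=GRADES, heights=HEIGHTS):
    """Estimate the ordinate at a centesimal grade by straight-line interpolation.

    Grades outside the recorded range are refused, because the outer
    tenths of the ogive are left undetermined by the description.
    """
    if not grades[0] <= grade <= grades[-1]:
        raise ValueError("grade lies in the undescribed outer part of the ogive")
    # Index of the recorded ordinate just to the right of the grade.
    i = min(bisect_right(grades, grade), len(grades) - 1)
    g0, g1 = grades[i - 1], grades[i]
    h0, h1 = heights[i - 1], heights[i]
    return h0 + (h1 - h0) * (grade - g0) / (g1 - g0)


if __name__ == "__main__":
    for g in (10, 40, 50, 62.5, 90):
        print(g, round(height_at_grade(g), 2))
```

A smoother curve through the same five points would follow the description more closely; the straight segments are chosen only to keep the sketch short.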
When we desire to compare any two large statistical groups, we may compare median with median, quartiles with quartiles, and octiles with octiles; or we may proceed on the method to be described in the next paragraph but one.
We are often called upon to define the position of an individual in his own series, in which case it is most conformable to usage to give his centesimal grade--that is, his place on the base line AB--supposing it to be graduated from 0° to 100°. In reckoning this, a confusion ought to be avoided between "graduation" and "rank," though it leads to no sensible error in practice. The first of the "park palings" does not stand at A, which is 0°, nor does the hundredth stand at B, which is 100°, for that would make 101 of them: but they stand at 0°.5 and 99°.5 respectively. Similarly, all intermediate ranks stand half a degree short of the graduation bearing the same number. When the class is large, the value of half a place becomes extremely small, and the rank and graduation may be treated as identical.
Examples of method of calculating a centesimal position:--
1. A child A is classed after examination as No. 5 in a class of 27 children; what is his centesimal graduation?
Answer.--If AB be divided into 27 graduations, his rank of No. 5 will correspond to the graduation 4°.5; therefore if AB be graduated afresh into 100 graduations, his centesimal grade, x, will be found by the Rule of Three, thus--
x : 4°.5 :: 100 : 27; x = 450°/27 = 16°.6.
2. Another child B is classed No. 13 in a class of 25.
Answer.--If AB be divided into 25 graduations, the rank of No. 13 will correspond to graduation 12°.5, whence as before--
x : 12°.5 :: 100 : 25; x = 1250°/25 = 50°; i.e. B is the median.
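The two worked examples follow one rule: place the rank half a graduation short of its own number, then rescale the base line to 100 graduations. A minimal sketch of that rule is given below; the function name is hypothetical. The exact quotient for child A is 450/27 = 16.66..., so the printed 16°.6 appears to drop the later digits rather than round them.

```python
def centesimal_grade(rank, class_size):
    """Centesimal grade of a rank (1 = first) in a class of class_size.

    Rank r stands at graduation r - 0.5 on a base line of class_size
    graduations; the result rescales that position to 100 graduations.
    """
    if not 1 <= rank <= class_size:
        raise ValueError("rank must lie between 1 and class_size")
    return (rank - 0.5) * 100 / class_size


if __name__ == "__main__":
    print(round(centesimal_grade(5, 27), 2))  # child A, No. 5 of 27
    print(centesimal_grade(13, 25))           # child B, No. 13 of 25
```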
The second method of comparing two statistical groups, to which I alluded in the last paragraph but one, consists in stating the centesimal grade in the one group that corresponds with the median or any other fractional grade in the other. This, it will be remarked, is a very simple method of comparison, absolutely independent of any theory, and applicable to any statistical groups whatever, whether of physical or of mental qualities. Wherever we can sort in order, there we can apply this method. Thus, in the above examples, suppose A and B had been selected because they were equal when compared together, then we can concisely express the relative merits of the two classes to which they respectively belong, by saying that 16°.6 in the one is equal to 50° (the median) in the other.
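This second method can be sketched for groups that have been measured rather than merely ranked. The code below finds the centesimal grade in one group that corresponds to the median of the other. The function names and the two sets of marks are hypothetical illustrations, not data from the text. Members equal to the value are counted as half, which is one way (an assumption here) of carrying over the half-graduation convention for ranks.

```python
from statistics import median


def grade_of_value(group, value):
    """Centesimal grade that `value` would occupy in `group`.

    Members below the value count in full and members equal to it count
    as half, so an untied member of rank r falls at graduation r - 0.5.
    """
    if not group:
        raise ValueError("group must not be empty")
    below = sum(1 for x in group if x < value)
    equal = sum(1 for x in group if x == value)
    return 100 * (below + 0.5 * equal) / len(group)


def grade_matching_median(group, other):
    """Grade in `group` that corresponds to the median of `other`."""
    return grade_of_value(group, median(other))


if __name__ == "__main__":
    # Hypothetical marks for two classes, invented for illustration only.
    class_one = [52, 55, 58, 61, 64, 67, 70, 73, 76, 79]
    class_two = [45, 50, 55, 60, 65]
    print(grade_matching_median(class_one, class_two))
```

Only the ordering of the values enters the calculation, so the sketch keeps the property stressed in the text: it is independent of any theory about the form of the curve.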
I frequently make statistical records of form and feature, in the streets or in company, without exciting attention, by means of a fine pricker and a piece of paper. The pricker is a converted silver pencil-case, with the usual sliding piece; it is a very small one, and is attached to my watch chain. The pencil part has been taken out and replaced by a fine short needle, the open mouth of the case is covered with a hemispherical cap having a hole in the centre, and the adjustments are such that when the slide is pushed forward as far as it can go, the needle projects no more than one-tenth of an inch. If I then press it upon a piece of paper, held against the ball of my thumb, the paper is indelibly perforated with a fine hole, and the thumb is not wounded. The perforations will not be found to run into one another unless they are very numerous, and if they happen to do so now and then, it is of little consequence in a statistical inquiry. The holes are easily counted at leisure, by holding the paper against the light, and any scrap of paper will serve the purpose. It will be found that the majority of inquiries take the form of "more," "equal to," or "less," so I arrange the paper in a way to present three distinct compartments to the pricker, and to permit of its being held in the correct position and used by the sense of touch alone.
I do so by tearing the paper into the form of a cross--that is, maimed in one of its arms--and hold it by the maimed part between the thumb and finger, the head of the cross pointing upward. The head of the cross receives the pricks referring to "more"; the solitary arm that is not maimed, those meaning "the same"; the long foot of the cross those meaning "less." It is well to write the subject of the measurement on the paper before beginning to use it, then more than one set of records can be kept in the pocket at the same time, and be severally added to as occasion serves, without fear of mistaking one for the other.