Introduction to Abnormal Child and Adolescent Psychology - Robert Weis
Reliability
Reliability refers to the consistency of a psychological test. Reliable tests yield consistent scores over time and across administrations. Although there are many types of reliability, the three most common are test–retest reliability, inter-rater reliability, and internal consistency (Hogan & Tsushima, 2018).
Test–retest reliability refers to the consistency of test scores over time. Imagine that you purchase a Fitbit to help you get into shape. You wear the Fitbit each morning while walking to your first class. If the number of steps estimated by the Fitbit is approximately the same each day, we would say that the Fitbit shows high test–retest reliability. The device yields consistent scores across repeated administrations. Psychological tests should also have high test–retest reliability. A child who earns an FSIQ of 110 should earn a similar FSIQ score several months later.
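The passage does not say how test–retest reliability is calculated. One common approach, assumed here for illustration, is to correlate the scores from the first administration with the scores from the second. The sketch below uses the Pearson correlation; the function name `pearson_r` and all FSIQ scores are hypothetical and invented for this example.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical FSIQ scores for five children, tested twice several months apart.
fsiq_time1 = [110, 95, 102, 88, 120]
fsiq_time2 = [108, 97, 100, 90, 118]

test_retest_r = pearson_r(fsiq_time1, fsiq_time2)
print(f"test-retest reliability estimate: {test_retest_r:.2f}")
```

Because each child's second score is close to the first and the children keep roughly the same rank order, the estimate falls near 1.0. Scores that shifted unpredictably between administrations would pull it toward 0.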
Inter-rater reliability refers to the consistency of test scores across two or more raters or observers. Imagine that you are affluent enough to own a Fitbit and a Garmin to measure your daily activity, one on each wrist. If the number of steps were similar for each device, we would say that the devices showed excellent inter-rater reliability; they agree with each other. Similarly, psychological tests should show high inter-rater reliability. For example, on portions of the WISC–V, psychologists assign points based on the thoroughness of children’s answers. If a child defines an elephant as an animal, she might earn 1 point, whereas if she defines it as an animal with four legs, a trunk, and large ears, she might earn 2 points. Different psychologists should assign the same points for the same response, showing high inter-rater reliability.
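Agreement between raters can be summarized in several ways, and the passage does not specify one. As a rough sketch, the code below computes simple percent agreement and Cohen's kappa, a statistic that adjusts agreement for what two raters would be expected to reach by chance. The function names and the point values assigned by the two psychologists are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of responses to which both raters gave the same points."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of ratings")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own base rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical points (0, 1, or 2) that two psychologists assigned
# to the same ten vocabulary responses.
psychologist_a = [2, 1, 0, 2, 1, 2, 0, 1, 2, 1]
psychologist_b = [2, 1, 0, 2, 1, 1, 0, 1, 2, 2]

agreement = percent_agreement(psychologist_a, psychologist_b)
kappa = cohens_kappa(psychologist_a, psychologist_b)
print(f"percent agreement: {agreement:.2f}, kappa: {kappa:.2f}")
```

Kappa is lower than raw percent agreement because some matches would occur by chance alone. Unlike the 0-to-1.0 coefficients described in this section, kappa can fall below 0 when raters agree less often than chance would predict.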
Internal consistency refers to the degree to which test items yield consistent scores. Imagine that you want to obtain an estimate of your physical activity using your Fitbit. You decide to measure activity in three ways: (1) using the Fitbit’s step count, (2) using GPS data, and (3) by manually recording your activity. If you exercise a lot that day, all three scores should be high, because they all measure the same construct (i.e., activity). On the other hand, if you are sedentary that day, all three scores should be low. Such data would indicate good internal consistency; items measuring the same construct should yield consistent results.
Psychological tests should also have high internal consistency. For example, the WISC–V verbal comprehension tests show very high internal consistency. Children with excellent verbal skills tend to answer most test items correctly, whereas children with lower verbal skills tend to struggle on these items. High internal consistency suggests that items on the verbal comprehension index measure the same construct (i.e., verbal comprehension) and not other constructs such as the child’s visual–spatial skills or memory.
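Internal consistency is often summarized with Cronbach's alpha, a statistic the passage does not name and that is assumed here for illustration. Alpha compares the variance of the individual items with the variance of the total score: when items rise and fall together across children, the total score varies much more than the items do separately, and alpha approaches 1.0. The item scores below are hypothetical and are not real WISC–V data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of rows, one row of item scores per child."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and at least 2 children")
    if any(len(row) != k for row in rows):
        raise ValueError("every child needs a score on every item")
    # Transpose so each tuple holds one item's scores across all children.
    items = list(zip(*rows))
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores must vary for alpha to be defined")
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical scores (0, 1, or 2) for five children on four verbal items.
verbal_items = [
    [2, 2, 2, 1],
    [1, 1, 2, 1],
    [0, 1, 0, 0],
    [2, 2, 1, 2],
    [1, 0, 1, 1],
]

alpha = cronbach_alpha(verbal_items)
print(f"Cronbach's alpha: {alpha:.2f}")
```

In this made-up data set, children who score high on one item tend to score high on the others, so alpha is fairly high. Adding an item that measured an unrelated skill would tend to lower it.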
Reliability can be quantified using a coefficient ranging from 0 to 1.0. A reliability coefficient of 1.0 indicates perfect consistency. What constitutes “acceptable” reliability depends on the type of reliability and on the construct the test measures. For example, tests that assess traits that are believed to be stable over time, such as FSIQ, should have high test–retest reliability. In contrast, tests that measure mental states that are likely to change over time, such as symptoms of depression or anxiety, may have lower test–retest reliability.