Statistics for HCI - Alan Dix


CHAPTER 3

Properties of randomness

We’ve seen how wild random phenomena can be; however, this does not mean they cannot be understood and at least partially tamed.

3.1 BIAS AND VARIABILITY

When you take a measurement, whether it is the time for someone to complete a task using some software, or a preferred way of doing something, you are using that measurement to find out something about the ‘real’ world—the average time for completion, or the overall level of preference amongst your users.

Two of the core things you need to know about are bias (is it a fair estimate of the real value?) and variability (how likely is it to be close to the real value?). Are your results fair and are they reliable?

3.1.1 BIAS

The word ‘bias’ in statistics has a precise meaning, but it is very close to its day-to-day meaning. Bias is about systematic effects that skew your results in one way or another. In particular, if you use your measurements to predict some real-world effect, is that prediction likely to over- or under-estimate the true value? In other words, is it a fair estimate?

Say you take 20 users, and measure their average time to complete some task. You then use that as an estimate of the ‘true’ value, the average time to completion of all your users. Your particular estimate may be low or high (as we saw with the coin tossing experiments). However, if you repeated that experiment very many times, would the average of your estimates end up being the true average?

If the complete user base were employees of a large company, and the company forced them to engage in your study, you could randomly select your 20 users, and in that case, yes, the estimate based on the users would be unbiased.¹
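To make the ‘repeat the experiment very many times’ idea concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the user base of 1,000 people, the completion times (drawn from a normal distribution with mean 60 seconds), and the function name `simulate_sample_means`. It draws 20 users at random, records their average, and repeats this 5,000 times.

```python
import random
import statistics


def simulate_sample_means(population, sample_size=20, repeats=5000, seed=1):
    """Draw many simple random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]


# Hypothetical user base: 1,000 task completion times in seconds.
pop_rng = random.Random(0)
population = [pop_rng.gauss(60, 15) for _ in range(1000)]

true_mean = statistics.fmean(population)
estimates = simulate_sample_means(population)

print(f"true mean:            {true_mean:.2f}")
print(f"one estimate (n=20):  {estimates[0]:.2f}")
print(f"average of estimates: {statistics.fmean(estimates):.2f}")
```

Any single estimate may sit some way from the true mean, but because every user is equally likely to be selected, the average of the estimates should settle very close to it. That is what it means for the estimate to be unbiased.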

However, imagine you are interested in the popularity of Ariana Grande and issued a survey on a social network as a way to determine this. The results would be very different depending on whether you chose to use LinkedIn or TikTok. No matter how randomly you select users from LinkedIn, they are probably not representative of the population as a whole, so you would end up with a biased estimate of Grande’s popularity.²
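The same kind of simulation can show why random selection does not help when the pool you select from is skewed. This is only a sketch with invented numbers: the two age groups, the liking rates (70% and 20%), and the chance that a person is on the platform (5% and 50%) are all assumptions, and `platform` stands for a hypothetical network, not any real one.

```python
import random
import statistics

rng = random.Random(42)


def make_people(n, age_group, like_rate):
    """Return n (age_group, liked) pairs; liked is 1 or 0."""
    return [(age_group, 1 if rng.random() < like_rate else 0) for _ in range(n)]


def mean_liking(people):
    """Proportion of the given people who like the artist."""
    return statistics.fmean([liked for _, liked in people])


# Hypothetical population with assumed liking rates per age group.
population = make_people(4000, "under25", 0.7) + make_people(6000, "25plus", 0.2)

# Hypothetical platform whose membership skews older (assumed join rates).
join_rate = {"under25": 0.05, "25plus": 0.5}
platform = [p for p in population if rng.random() < join_rate[p[0]]]

# Repeatedly survey 20 randomly chosen platform members.
estimates = [mean_liking(rng.sample(platform, 20)) for _ in range(5000)]

print(f"true popularity:      {mean_liking(population):.3f}")
print(f"platform popularity:  {mean_liking(platform):.3f}")
print(f"average of estimates: {statistics.fmean(estimates):.3f}")
```

Under these assumptions the average of the estimates settles on the popularity among platform members, not among the population as a whole. Repeating the survey more often reduces the scatter but leaves the gap untouched, which is the hallmark of bias.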

The good news is that sometimes it is possible to model bias and correct for it. For example, you might ask questions about age or other demographics and then use known population demographics to give more weight to groups that are under-represented in your sample … although I doubt this would work for the Ariana Grande example: if there are 15-year-old members of LinkedIn, they are unlikely to be typical 15-year-olds!
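One simple way of adding weight to under-represented groups is to average within each group and then combine the group averages using the known population shares. The sketch below assumes that approach; the function name `post_stratified_mean`, the two age groups, the 40%/60% population shares and the 20 survey answers are all hypothetical.

```python
from collections import defaultdict


def post_stratified_mean(sample, population_shares):
    """Weight each group's sample mean by that group's known population share.

    sample: list of (group, value) pairs
    population_shares: dict mapping group -> share of the population (sums to 1)
    """
    values_by_group = defaultdict(list)
    for group, value in sample:
        values_by_group[group].append(value)

    # A group with no respondents cannot be re-weighted.
    missing = set(population_shares) - set(values_by_group)
    if missing:
        raise ValueError(f"no respondents in groups: {sorted(missing)}")

    return sum(
        share * sum(values_by_group[group]) / len(values_by_group[group])
        for group, share in population_shares.items()
    )


# Hypothetical survey: only 2 of 20 respondents are under 25, although that
# group is assumed to make up 40% of the population. 1 = likes the artist.
sample = [("under25", 1)] * 2 + [("25plus", 1)] * 4 + [("25plus", 0)] * 14
shares = {"under25": 0.4, "25plus": 0.6}

raw = sum(value for _, value in sample) / len(sample)
weighted = post_stratified_mean(sample, shares)

print(f"unweighted estimate: {raw:.3f}")
print(f"weighted estimate:   {weighted:.3f}")
```

Note what the correction relies on: the two under-25 respondents now stand in for 40% of the population. If they are not typical of their age group, the weighting amplifies that problem instead of fixing it, and with so few of them the weighted estimate is also much more variable.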
