Chapter 2

Individual Reasoning

Our reasoning skills are not perfect—and we have known that for a long time. Maybe we have been aware of it ever since we started arguing with one another. Noticing the mistakes of others is something we do quite well, particularly when we disagree with them. Our own failures are harder to accept, but very few sane people (if any) would dare to call their own reasoning skills infallible.

Whenever the first actual debates about correct reasoning and argumentation may have started, it is clear that the problem of building solid arguments was already well known in ancient Greece. By that time, there were already proposals for solving it. Aristotle’s Organon (Aristotle 2007) provides very clear rules for demonstrations. It tells us how, if we accept some ideas, we must also accept the conclusions that become unavoidable. But it also shows there are situations where we cannot reach any conclusion with certainty. Aristotle’s goal was, indeed, to determine when we can say without doubt that a conclusion follows from a set of premises—and, from there, to determine what is true.

Yet, while we have been able to identify many reasoning mistakes for thousands of years, we still commit them. We even make very trivial mistakes. Some of the cases in Aristotelian logic are quite simple to understand and learn. With a little training, it should be possible to avoid those mistakes, but generation after generation has fallen prey to those well-mapped cases. Nowadays, large lists of fallacies are easily available to everyone, in books and on websites. It would make sense to assume people should want to be competent at reasoning, yet it is not hard to find intelligent people making simple mistakes. That is far more puzzling than we usually think. If reasoning correctly is an advantage, intelligent people should try harder to correct their own mistakes. While some of us do train our minds to avoid simple logical errors, in general, that is not what we observe.

As we keep committing the same mistakes, a couple of questions come to mind: Are we that dumb? Are most of us incapable of logical reasoning? Calling humans rational beings might be an exaggeration, if that is the case. Or maybe there are other reasons why we make those mistakes. Is it possible that we actually gain something from making errors? To understand what might be happening, we need to briefly review the literature about our errors. Cognitive experiments have been very common in recent decades. Despite problems such as the recent replication crisis, there are solid results where many experiments point in the same direction. We should be able to get some initial answers there. Once we understand this puzzle better, we will be in a better position to see whether it is only the uneducated among us who commit those mistakes, or whether we should expect problems even among well-educated and intelligent researchers.

Some of the first experiments on human cognition were designed to test non-trivial rules of rationality. It seems we assumed that humans were competent at reasoning and that we only needed to worry about harder details. We were considered intelligent, if not perfect. If we failed, it would be at rules that seemed less natural and harder to follow. That was the general spirit of Allais’s (1953) and Ellsberg’s (1961) experiments. Those experiments were designed to show that we were not perfect at obeying a specific rule of Expected Utility Theory (EUT) (von Neumann and Morgenstern 1947). Both Allais and Ellsberg tested whether, when people chose between gambles, they ignored the details that were identical in both options. EUT says those details should be ignored, but the experiments showed we do not obey that rule.

From early on, we were aware that humans make far simpler mistakes. Or, at least, we were aware that others commit those mistakes. We would certainly like to see ourselves as rational, as immune to trivial mistakes—at least when we take the time to think through a problem. The errors of others could, in principle, not be actual errors. It might be a character problem. Maybe those who disagreed with us knew how to reason but chose to make a few mistakes in order to fool those who are not as well educated. Or maybe they were just not competent enough. People of low intelligence exist, after all. Whatever assumptions people actually made, we did see humanity as composed of rational beings. If we assume people are rational when given the opportunity to think through a problem, most mistakes would happen on less important details. Testing easier problems would make sense, for completeness’ sake at least. But, I suspect, humans were expected to succeed at those. The rules that were not so clear, even controversial, were the cases where we were most likely to find trouble.

That optimistic view of our reasoning abilities was not to last long, however. P. C. Wason and P. N. Johnson-Laird describe an interesting experiment on our ability to solve a trivial logical problem (Wason and Johnson-Laird 1972). The experiment consisted of showing four cards on a table to groups of volunteers. The volunteers were told that all the cards came from a deck where each card had a letter on one side and a number on the other. There was also a possible rule those cards might or might not obey: whenever there was a vowel on the letter side, the number side would show an even number. On the table, the volunteers could see the cards “E,” “K,” “4,” and “7”; and they had to answer a simple question. If you looked at the other side of those cards, which ones could provide proof that the rule was wrong?

The correct answer is, of course, cards “E” and “7.” If you find an odd number behind the letter “E,” the rule is false; likewise, if you find a vowel behind the “7,” the rule fails. Most people get the “E” right with no difficulty. When the test was run, however, it turned out the majority tended to pick “E” and “4.” Yet, while the card “4” can provide an example of the rule working fine, it cannot provide an example of failure. It is as if people were looking for cases where the rule was confirmed, even when told to look for failures. Indeed, one of the explanations proposed for this experiment’s results is that we have a tendency to look for information that confirms our beliefs. We also avoid information that might show we are wrong. That effect is known as confirmation bias (Nickerson 1998).
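
For readers who like to see the logic spelled out, here is a minimal sketch in Python—my own illustration, not part of the original experiment—that simply checks, for each visible card, whether some hidden side could falsify the rule “vowel implies even number.”

```python
# Illustrative check of the card task: a card matters only if some hidden
# side could violate the rule "vowel on one side => even number on the other."
VOWELS = set("AEIOU")

def can_falsify(visible_face):
    if visible_face.isalpha():
        # The hidden side is a number; only a vowel can be refuted by an odd back.
        return visible_face in VOWELS
    # The hidden side is a letter; only an odd number can be refuted by a vowel back.
    return int(visible_face) % 2 != 0

for card in ["E", "K", "4", "7"]:
    print(card, "must be turned over" if can_falsify(card) else "cannot refute the rule")
# Only "E" and "7" must be turned over; whatever is behind "4" leaves the rule intact.
```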

Many mistakes have been observed since those initial experiments. And the list of our known biases keeps expanding. As we go over that list, a few themes start to show up. We seem to use fast heuristics quite often (Gigerenzer et al. 2000). They often provide correct answers but, as they exist for speed and not only for accuracy, they can fail. Still, they make sense from both a practical and an evolutionary point of view. Solving a problem in a way that makes us reasonably sure of the answer can be mentally demanding. That would mean an increased use of energy, requiring more food. It could also mean devoting more time to thinking the problem through. Depending on the situation, we might have neither the time nor the energy resources to devote to finding the actual best solution. If there might be a lion behind the bushes, waiting to be sure is a luxury we cannot afford. A fast heuristic that is reliable but not perfect might do a much better job at keeping us alive. The actual problem evolution had to solve involved not only the quality of the answer but also how fast we could have an answer and how much energy that would consume. In situations like those, instead of looking for the optimal solution, it makes sense to adopt “satisficing” strategies (Simon 1956).

The availability heuristic is a classic example of how that works (Tversky and Kahneman 1973). According to that heuristic, when making judgments we treat the information that is more available in our minds as associated with more probable events. If you are a physician evaluating a patient, before you gather more information, it is more probable your patient has a common disease than a rare one. It makes sense to assume the diseases you see every day will be fresher in your memory, while very rare cases will likely be hard to remember. The cases that are more easily available in your memory, thus, are likely to be the most probable ones.

While the ease with which you remember information is associated with how frequent it is, that is not the only factor. Dramatic cases make stronger memories, and they also make better news, which is why dramatic incidents appear far more frequently in newspapers and on TV than they do in reality. Shark attacks get much more attention than home accidents, despite the fact that sharks kill far fewer people. That does not mean that heuristics are a bad thing, or necessarily wrong. They do provide good initial guesses. Where we go wrong is in trusting those guesses as if our intuitions were certainly right.

Heuristics are not the whole story, though. They can explain some of the discrepancies between expected results and how we actually reason, but they do not account for everything. A second possible cause for those observed discrepancies is the way many of the experiments were designed. The intention of several experiments was to check how well we obeyed EUT. So they were planned with the tools of EUT in mind: probability values and utilities. Money was used as a measure of utility, but no specific utility function was assumed in the experiments. The only suppositions were that people prefer more money to less and that they reasoned in a way equivalent to assigning some utility to their total assets. Probabilities, on the other hand, were presented in a straightforward way. The volunteers were often told the probability of each outcome in a gamble.

In the typical experiment, volunteers had to choose between two gambles. Those gambles had distinct chances attached to different amounts of money. By comparing several choices, researchers were able to determine that a person’s set of choices was incompatible with the existence of a utility function. That meant the volunteers did not obey EUT. The natural question was: in what ways were people departing from the normative choices?

Kahneman and Tversky realized that they could still use the EUT framework if they made a simple alteration (Kahneman and Tversky 1979). Maybe the subjects were not using the probability values given by the researchers. They assumed people altered the values they heard toward less extreme values. And they observed that such a correction could explain the differences between the observed behavior and EUT. It is as if, when we tell people there is only a one-in-a-million chance they will win a lottery, they behave as if the odds were much better—maybe one in ten thousand, maybe even better. Indeed, if we observe how many people behave toward improbable events, humans bet on the lottery as if it were much easier to win than it actually is.

That is, a weighting function on the probabilities could explain a good part of the observed behavior. But more recent studies have shown weighting functions cannot account for everything. We sometimes fail even at choosing an obviously better bet. That surprising decision was observed by Michael Birnbaum (2008). He observed that, depending on the details, people might prefer a gamble that was clearly worse. He asked volunteers to choose between similar bets: in one, there was a 10 percent chance of winning $12; in the other, that chance was split into two outcomes—a 5 percent chance of winning $12 and a 5 percent chance of winning $14. Both gambles also had the complementary 90 percent chance of winning $96. Many people picked the first gamble, even though it is clearly the worse choice, as the second gamble divides the chance to win $12 into the possibility of getting either the same $12 or a better outcome, $14.
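
Using the amounts as described above—my reading of the text, not Birnbaum’s exact stimuli—the arithmetic of the two gambles can be checked in a few lines:

```python
# Expected values of the two gambles described in the text.
gamble_a = [(0.10, 12), (0.90, 96)]              # 10% of $12, 90% of $96
gamble_b = [(0.05, 12), (0.05, 14), (0.90, 96)]  # the 10% slice split into $12 and $14

def expected_value(gamble):
    return sum(p * x for p, x in gamble)

print(round(expected_value(gamble_a), 2))  # 87.6
print(round(expected_value(gamble_b), 2))  # 87.7 -- never worse in any outcome, so it dominates
```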

Gambles with details like those might not look like a simple case. In a much simpler scenario, however—two consecutive bets, each with equal chances of winning and losing—Cohen and collaborators (1971) had already observed that people mistakenly think they are far more likely to win two consecutive games than the odds allow. The actual probability of winning twice is 25 percent, yet the subjects they observed estimated it, on average, at around 45 percent.

It seems that we are not well adjusted to using probability values. Indeed, Gigerenzer and Hoffrage (1995) proposed that we do a better job when we are not given probability estimates at all. Under some circumstances, it seems we make better judgments when we receive the same information in terms of frequencies. It might be better to state that something has been observed fifty times in a hundred attempts than to say the estimated probability is 50 percent.

Those errors do not fit well into the heuristics explanation. Even if our brains use heuristics, some of those cases seem to make no sense. Why perform extra calculations to alter probability values and, with that extra effort, get worse results? An important clue came from experiments on how well people estimate correlations from data. In those experiments, no probability values were stated. Instead, people observed only data containing two variables. The researchers then asked whether one of those variables seemed to be associated with the other or not.

That situation was investigated by two teams of researchers in two different scenarios. In the first case, Chapman and Chapman (1967, 1969) tried to measure whether people would see false correlations where none existed. They wanted to see what would happen when those false correlations were expected. They showed the volunteers pairs of words. Every pair was displayed the same number of times, but the volunteers reported that the pairs that made sense, like lion and tiger, appeared more often than the pairs that seemed random, such as lion and eggs. When there was an expected relationship, people seemed to perceive that relation even when there was none.

Expectations also played a role in the reverse case, observed by Hamilton and Rose (1980). In their experiment, they used variables between which no correlation was expected; and, in line with those prior expectations, their subjects failed to notice any association in the data when it was weak or moderate. If the association was very strong, the volunteers noticed it, but even then they felt the association was weaker than what the data showed.

That initial opinions would have any effect on data analysis might seem wrong—and it is, if the question is only what the data says—but things change if you change the question a bit. If you are interested not only in the data but in the original question you wanted the data to answer, Bayesian methods (Bernardo and Smith 1994; O’Hagan 1994) provide a way to incorporate everything you know about the problem into an answer. They do so by telling us how we should combine what we already know with the new information we get. Assume you want to know whether an association is real, not only in the data, but in the real world. In that case, the correct way to answer is to use your prior knowledge as well as the new information. Normal people are far more interested in answering questions about the real world than in just describing a specific data set.

As a consequence, your final estimate might show some correlation when you initially expected one, even if no correlation exists in the data. Bayesian rules state that your expected correlation should move toward what the data says. That means, if you expect some correlation and observe none, your initial expectation must become weaker, but it should not become zero. The same is true when your prior knowledge is that your variables should be independent. In that case, when combined with a dependency in the data, your final conclusion should be that there is some correlation. The final correlation estimate, however, will be weaker than what you would get from the new information alone.
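
As an illustration of that kind of compromise, here is a minimal sketch of a Bayesian-style update for a correlation. The normal approximation in Fisher-z space, the prior strength, and the sample size are all assumptions chosen for the example; they are not the procedure used in the experiments discussed above.

```python
import math

def fisher_z(r):
    # Fisher transform: makes correlation estimates approximately normal.
    return 0.5 * math.log((1 + r) / (1 - r))

def combine(prior_r, prior_weight, sample_r, n):
    # Precision-weighted average of prior belief and sample correlation
    # in Fisher-z space (a normal-normal Bayesian update).
    z_prior, z_data = fisher_z(prior_r), fisher_z(sample_r)
    w_data = max(n - 3, 1)              # approximate precision of a sample correlation
    z_post = (prior_weight * z_prior + w_data * z_data) / (prior_weight + w_data)
    return math.tanh(z_post)            # back to the correlation scale

# Expecting a clear association (r = 0.6) but observing none in 20 data points:
print(round(combine(0.6, prior_weight=10, sample_r=0.0, n=20), 2))  # about 0.25: weaker, not zero
```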

That does not mean that the volunteers were right. If the question was what was happening in the data, their answers were wrong. And if you are helping scientists in an experiment, it might not make much sense to have initial expectations. We do not live in laboratories, however; in real life, we might always have initial opinions. When there is some intuition that two things should be related, we may, at first, just assume that they are. It is certainly much faster than considering all the available information about the situation. Being fast does matter.

It seems we might reason in ways that approximate the results of a Bayesian analysis. Since we are not born probabilists, the next question is how close we get to a Bayesian estimate. To some extent, we do seem to be naturally born Bayesians. We might get the numbers wrong but, from a qualitative point of view, our reasoning seems to follow the basic ideas of Bayesianism (Tenenbaum et al. 2007). We take our initial opinions and mix them with any new observation, obtaining a posterior estimate. That is exactly how Bayesian methods work. Experiments show that we reason following those general guidelines even at about twelve months of age (Téglás et al. 2011).
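
The rule itself is simple to state. In its standard form, Bayes’ theorem says that

$$
P(\text{hypothesis} \mid \text{data}) \;=\; \frac{P(\text{data} \mid \text{hypothesis})\, P(\text{hypothesis})}{P(\text{data})},
$$

where the prior P(hypothesis) plays the role of the initial opinion, the likelihood P(data | hypothesis) measures how well each hypothesis explains the new observation, and the posterior on the left is the updated estimate.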

As soon as we look at the numbers instead of only the qualitative description, however, the story seems to change. Sometimes we ignore part of the information that is available to us. We make easy probability mistakes (Tversky and Kahneman 1983). We fail at estimating our chances in two-stage lotteries. We also make the mistake known as base rate neglect, first reported by Kahneman and Tversky (1973). That happens when we have information about the initial rate at which an event occurs in a population. When we add to that an uncertain observation that gives us clues about the probability of that event, we have a tendency to use only the observation. We simply ignore the initial base rate. Apparently, we only care about initial opinions that are our own, not about information we have not somehow internalized.

One traditional example of how base rate neglect works, and how damaging its consequences can be, is the case of disease testing. Suppose there is a disease that is very serious but affects only a small part of the population: 1 person in 1,000 has it. There is also a reliable but not perfect test for that disease. It reports a positive result for a sick patient 98 percent of the time, and it reports a negative result for a healthy patient, also 98 percent of the time. You are a medical doctor, and a new patient you know nothing about enters your office. She brings the results of that test, and the result is positive, suggesting she might be ill. The question before you is how likely that woman is to actually have the disease. You have not examined her; the test result is all you know. The test might be right, but it might have failed.

Most people estimate that the chance the woman has the disease must be around 98 percent. After all, the test works 98 percent of the time. But, while the test’s reliability is very important information for the final answer, it is not all we know. We also know the base rate in the population, where only 1 in 1,000 people are sick. Ignoring that part of the information can lead to very bad medical decisions. If we take the base rate as the initial probability—the initial opinion you should have—the actual chance the patient is sick is about 4.7 percent. That number is obtained by using Bayes’ theorem.

While surprising, it is not hard to understand why the probability is so low. As we had an initial probability of the disease of 0.1 percent, we can see that the probability increased by a factor of 47. That is a huge increase. But, starting from a very small chance, we still end up with a probability that is less small but not large. More importantly, as the result was positive, one of two things might have happened. The patient might have been sick, which had an initial chance of 0.1 percent, or the test might have returned a wrong answer, which happens 2 percent of the time. It is much more probable that something with a 2 percent chance of happening caused the positive result than something with a 0.1 percent chance.
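
The number quoted above follows directly from the figures in the example; a short sketch makes the calculation explicit:

```python
# Bayes' theorem with the numbers from the example:
# prevalence 1 in 1,000; the test is right 98 percent of the time either way.
prevalence = 0.001       # P(sick)
sensitivity = 0.98       # P(positive | sick)
false_positive = 0.02    # P(positive | healthy)

p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
p_sick_given_positive = sensitivity * prevalence / p_positive

print(round(p_sick_given_positive, 3))  # 0.047 -- about 4.7 percent
```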

The correct solution to the problem is similar to how our reasoning works. We start with an initial estimate and correct it using the new data. But, from the point of view of our minds, that is not how the question is presented. The initial rate, while technically equivalent to an initial opinion, plays no such role in our brains. In problems where we neglect the base rate, we have no initial information, and we get two pieces of information, the base rate and the result of some test (or similar information). In that case, it makes sense to use the base rate as a prior estimate. That is exactly how I obtained the 4.7 percent figure.

A complete use of the Bayesian tools, as we will discuss further ahead, would, however, demand that I had some initial estimate about the problem before getting any data—base rates included. The initial estimate can be non-informative. That is, I might look for the best way to represent the fact that I know nothing about the problem. That would be a correct theoretical start. From there, the problem gives me two pieces of information: the base rate and the chances the test might fail. Both must be used to get the final chance the patient is sick. What happens in this case is not that people are disregarding initial opinions. They are not paying attention to part of the data. That could happen because they are only looking at the problem and trying to locate the number that makes the most sense as an answer. That would be a solution in the spirit of a fast heuristic: pick the data that looks most relevant and use only that. If that is what we do, we still need to learn what happens when humans actually have an opinion before they get the data.

Both memory and sensory perception provide more evidence that we reason in ways that mimic a Bayesian analysis. Almost everyone has seen pictures that deceive our eyes in some way. Some of those pictures have two possible interpretations. Others cause mistakes in our estimates of the size or the alignment of geometric figures. More complex figures can induce the illusion of movement when no actual movement exists. There are far too many visual illusions to list. Attempts to systematize them into a few types or a theoretical framework have proved a surprisingly hard task (Changizi et al. 2008), but it does seem that the way our brain interprets the information it receives from our eyes is similar to the way we reason (Helmholtz 1867). The task is indeed similar. Given what we know about the world and what we observe, the brain tries to arrive at the best possible conclusion. The reasoning, in this case, is not conscious. Our brains just provide us with their best guesses as if they were true (Eagleman 2001). Most of the time, those guesses are remarkably good—but not always.

If we sometimes get what we see wrong, that is not necessarily a bad thing. Recognizing patterns is a very useful skill, whether those patterns emerge in the financial market or in the behavior of the game one is hunting. If you are the first to identify a pattern, there is more to gain. That can be enough to compensate for the cost of false detections. Indeed, both in general reasoning and in interpreting visual information, we are able to identify patterns very fast, but we also too often identify random, meaningless noise as something meaningful. This general phenomenon is called apophenia (Fyfe et al. 2008). An interesting example of how we see more than what is there is our tendency to identify faces everywhere. We do it with simple typographical juxtapositions of characters such as “:)” or “;-(” and we do it when seeing faces on rocks, on toast, or in shadowy, blurred images from Mars. Quickly identifying other people and inferring their emotional state is a very useful trait for a social animal (Sagan 1995). The cost of thinking you see a face where there is none is far less important than the cost you can incur from failing to notice there is an actual person in your visual field.

This over-interpretation allows us to create new ways to communicate. It can also give extra meaning to art. But the reality is that much of what we see as faces is only the hard-wired conclusion of our brains. Evidence from MRI scanning shows that the specific areas of the cortex that become more active when we see faces (Kanwisher et al. 1997) show the same type of activity when we perceive a non-face image as a face (Hadjikhani et al. 2009). The timing of the activity suggests we do not see an image and then conclude it looks like a face. Our brains seem to see an actual face, instead of giving us a later reinterpretation of the image.

We tend to think of our memory as if it were stored in boxes. We might not find a specific box when we want it, and then we would say we forgot something, but we trust the content of each box. If we manage to find it, whatever is inside should correspond to how we experienced a situation. At best, we may acknowledge that our senses might have been deceived. We might have perceived something wrongly, but we recall accurately what we lived through.

Unless some delusional state is involved, we trust memories, ours or other people’s. Others may lie, but their memories are also dependable. We trust memories so completely that we send people to jail every day based only on witness testimony—on what those witnesses remember they saw or heard. We assume a healthy person will remember things as she perceived them and will not create false memories or alter the original ones.

This picture was accepted not only by laypeople but, until recently, by psychologists. Many practitioners believed in the concept of “repressed memory” (Loftus 1993). As a matter of fact, some psychologists still seem to believe in it (Patihis et al. 2014). A “repressed memory” would be an event that a person experienced in the past and committed to memory but is unable to retrieve in the present; in other words, only conscious access to the memory would be missing. That would presumably happen because the event the person experienced was too traumatic. Many therapists worked on the idea that these memories could be recovered through treatment and that, once “recovered,” they would correspond to actual events in the life of the patient.

The first sign that there was something wrong with this picture appeared in the 1990s. Therapists observed an unexpectedly large number of cases where people claimed to have recovered memories of abuses they had suffered. Two things were suspicious about those supposedly recovered memories: they often included elements that should be very rare, such as satanic practices, and all of them were recovered under particular types of psychotherapy. And, as should happen if those memories were real, arrests and convictions followed. The large number of those cases, however, worried some researchers. Maybe those memories, as vivid and real as they seemed to be, were only an artifact of the therapy.

Research followed. In a series of interesting experiments, Elizabeth Loftus (1997) observed that she could create false memories in the minds of her subjects. Cases involving innocent people who had been wrongly found guilty were later discovered, and those wrong convictions were not only related to “repressed memories.” Many times the evidence of guilt had consisted only of witness reports based on distorted memories. Simple things, such as showing pictures of innocent people to a victim, could cause the error: at a later point, the victim would wrongly identify a man from those pictures as the man who had raped her. It is not clear how many innocent lives were destroyed by our lack of understanding of how our minds work, or how many real culprits were never identified because investigations did not follow other leads once wrongly accused culprits had been found.

Our memory seems to be much more fluid than any of us would have thought. It changes to accept new information. Its final content is a function of what we perceived, our initial estimate, but it also depends on what we learn later, the new data. Memory combines both and tries to keep a record of whatever our brains conclude is the more plausible scenario. Our brains change the recording of events so that it will fit our new beliefs. Missing pieces of information can be obtained from sources as unrelated to the event as a picture one observes later. What we carry in our minds is actually a mixture of what we observed, what we expected to see, and things we have experienced or thought afterward. As Steven Novella said in his blog, “When someone looks at me and earnestly says, ‘I know what I saw,’ I am fond of replying, ‘No, you don’t. You have a distorted and constructed memory of a distorted and constructed perception, both of which are subservient to whatever narrative your brain is operating under’” (Novella 2014).

Unfortunately, we have no uncertainty associated with our memories. A completely correct description of events would try to keep different possible scenarios. It would have chances associated with each one, and the most likely scenario would be the first one we remember, followed by other possible alternatives. But that is not how our brains work. While correct, that method would require extra mental effort and a larger memory capacity. Our brains simply choose the most probable alternative and keep that. It is an approximation, but not as absurd as it might seem at first. Storing memories this way might also be a useful heuristic, but we need to be aware of it and not trust memories blindly.

When we have an initial opinion, we have a tendency to keep it. But how strong is that tendency? Phillips and Edwards (1966) tested how much we change our estimates when we get new data. Their experiment had two bags filled with chips, and the composition of each bag was known. They asked their subjects to estimate from which bag a series of chips had been drawn, assuming that, at first, both bags were equally likely. What they observed was that estimates did not change as much as they should. When their subjects moved from that initial estimate of 50 percent to around 70 percent, probability rules dictated that the correct new estimate should have been close to 97 percent. Phillips and Edwards coined the term conservatism for this tendency to change our opinions less than we should.
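
A small sketch shows how large the correct update is. The 70/30 bag compositions and the particular run of draws are assumptions chosen only to reproduce the numbers in the text; they are not the exact parameters of the Phillips and Edwards experiment.

```python
# Two bags of chips; start at 50/50 and update after each observed chip.
p_red_A, p_red_B = 0.7, 0.3           # assumed compositions (illustrative)
draws = ["red"] * 8 + ["blue"] * 4    # a run that mildly favors bag A

likelihood_A = likelihood_B = 1.0
for chip in draws:
    likelihood_A *= p_red_A if chip == "red" else 1 - p_red_A
    likelihood_B *= p_red_B if chip == "red" else 1 - p_red_B

posterior_A = likelihood_A / (likelihood_A + likelihood_B)  # equal priors cancel out
print(round(posterior_A, 2))  # about 0.97, while typical subjects stop near 0.70
```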

Peterson and DuCharme observed similar results and called this tendency the primacy effect (Peterson and DuCharme 1967). They also asked their volunteers to estimate from which urn a set of chips had been drawn. The difference between the two experiments was that Peterson and DuCharme rigged their draws. The total sequence of draws their volunteers observed favored neither urn, but the initial thirty draws seemed to come from urn A, while the next thirty reversed that effect. However, once the volunteers started thinking A was the correct urn, the same amount of evidence in favor of B was not enough to make them doubt it. Indeed, after thirty draws that favored A, it took fifty new draws favoring B to compensate for that initial opinion.

Conservatism is not observed only in laboratory experiments. It happens even when there is a lot of money to be made or lost, as, for example, in the world of corporate finance. Indeed, investors seem to react less than they should when they get new information (Kadiyala and Rau 2004). We seem to be too confident of our initial guesses. We might ignore data that we could have used as an initial opinion, as we saw in the base rate neglect bias. But when the initial opinion is ours, we tend to keep it. It is as if we trusted our own evaluation more than external data, and we do that even when our opinion was formed recently from the same type of external data.

That mistrust can actually work as an extra explanation for our biases. We trust our minds more than we trust other sources. In many circumstances, that makes sense. Others might have hidden agendas; some people might lie to us and try to deceive us. Outside sources are less reliable than we are, at least in symmetrical situations where both sides have had access to the same amount of information and training. When others know as much as we do, it is reasonable to trust ourselves more. People lie, after all.

Taking outside information with a grain of salt makes sense. If you do that, changing your opinions more slowly than a naïve Bayesian analysis would recommend might not be wrong at all. Indeed, to avoid deception, we should perform more rigorous calculations rather than putting our complete trust in the numbers provided in the experiments. We should include the possibility that others might be mistaken, lying, or exaggerating. Including those possibilities in a more realistic model of the world has the consequence of making us change our opinions less. In that case, we may be doing better work than assumed in those experiments—though not perfect work. Ignoring base rates is ignoring relevant information; that is clearly an error, and our tendency to pay more attention to initial data than to what we get later is also wrong. But we might have heuristics that, although fallible, compensate for the fact that the real world is far more uncertain than the ideal situations of laboratory experiments.

Indeed, if we look more carefully at the way we alter the probabilities we hear, the same effect can be observed. In most real-life situations—that is, outside labs and classrooms—when someone tells us a probability, that is a raw estimate. That estimate is uncertain, and it is often based on few observations. The raw estimate is not an exact number; it is data. And we can use that data to help us infer what the correct value is. If we include those ideas in a more sophisticated Bayesian model, we get estimates for the actual probability values. Suppose your teenage son tells you that you should let him drive your car. He claims there is only one chance in a million he will cause any kind of damage to it. And you know that is a gross exaggeration, even if, for the moment, we assume he is telling the exact truth about what he thinks. Part of his number comes from a faulty estimate on his part. As he only has data on a few cases of his own driving, he could never estimate chances of one in a million. His estimate also suffers from something called survivor bias. If he had crashed the car before, it is very likely he would not make that claim—if only because it would be a poor argument, one you would not believe as you reminded him of the incident. So only lucky teenagers get to make that claim. Of course, in the real case, your son is also probably lying on top of his estimate, trying to convince you. The probability he provides is not the true value.

Take a look at the lab experiments again. They were choices between gambles, and the gambles had exact probability values. Of course, the scientists are not your kid trying to get your car keys, but they are still people, and you will still use your everyday skills if you volunteer for the experiment. When someone tells you there is only a 5 percent chance of rain today, you should ask yourself how precise that figure is. That 5 percent becomes your data. If your initial opinion about the problem was weak, the chance of rain could have been anything at first. The correct analysis uses that weak initial estimate and the 5 percent you heard. Your final estimate will be a probability somewhere between your initial, uninformative guess and the 5 percent. If the source of the 5 percent figure is very credible, it should dominate your final guess; if not, it should matter little.

In the original experiments, Kahneman and Tversky observed a curve that described what our brains seem to do when they hear a probability (Kahneman and Tversky 1979). They called it a weighting curve. A weighting curve is the function that gives the value w our brains might be using as the probability when they hear a stated probability value p. We do not know whether our brains actually use weighting functions, though. All Kahneman and Tversky observed was that our behavior could be described as if we did. If we use them, we can still understand some of our choices as if they obeyed decision theory, except with altered probability values. Is there any reason why we would do that?

Back to the rain problem: if you want to know how close your final estimate will get to the 5 percent, you need information on how reliable that value is. Is it a wild guess? Is it the result of state-of-the-art models that processed a huge amount of accurate climate information? Each case gives a different weight to the rain estimate. As we are looking for what would work well in most daily scenarios, the question is what a typical person would mean by a 5 percent value. Evidence shows people usually make guesses with little information and from small samples (Kareev et al. 1997). It is natural that our brains would assume something like that.

Put all those ingredients together and you can do a Bayesian estimate of the actual chance of raining. I was able to show that the shape of the estimated curves we obtain from those considerations matches those observed in the experiments (Martins 2005, 2006). Even the cases where we pick worse gambles can be described by an appropriate choice of assumptions. In the model, that corresponds to a proper choice of parameter values.
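
A minimal sketch in the spirit of that idea—not the exact model in those papers—treats a stated probability as if it were a frequency observed in a small hypothetical sample and combines it with a vague prior. The sample size of twenty and the uniform Beta(1,1) prior are assumptions chosen for illustration.

```python
def perceived_probability(stated_p, assumed_sample_size=20):
    # Read the stated probability as "successes out of a small sample" ...
    successes = round(stated_p * assumed_sample_size)
    # ... and combine with a uniform Beta(1,1) prior (Laplace's rule of succession).
    return (successes + 1) / (assumed_sample_size + 2)

print(round(perceived_probability(0.05), 3))   # 0.091 -- a stated 5% behaves like ~9%
print(round(perceived_probability(1e-6), 3))   # 0.045 -- "one in a million" is not taken literally
print(round(perceived_probability(0.95), 3))   # 0.909 -- extreme high values are pulled down
```

The output mimics the inverse-S shape of the observed weighting curves: very small stated probabilities are treated as larger, and very large ones as smaller.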

Heuristics still seem to be part of the answer. But here, it seems we have specialized ways to interpret information about chances. Even when we should consider those chances exact, we seem to treat them as uncertain. It looks like we treat probabilities as the result of a guess made by ordinary humans—and ordinary humans do not use probability theory or statistics for their guesses. Our ancestors had to make their guesses based on the inaccurate estimates of their peers. That is also how we receive information when we are kids. Whatever our brains are doing, they might be helping us function better in a normal social setting. That might also make us fail badly in cases where we can get exact probability values, but those exact values only exist in artificial problems, usually with little impact on daily life.

There is an approximate description of human reasoning that has become standard terminology in the psychology literature (Kahneman 2011). It is only an approximation of how our reasoning works, but it is a useful way to understand how we are shaped. While criticized, and almost certainly nothing more than an approximation, it is still a useful terminology for illustrating some important points.

That terminology claims our mental skills work as if we had two types of systems in our brains. System 1 would be fast, based on heuristics, and would work automatically with no conscious effort. System 2 would be activated when we need to solve complex problems that require conscious effort, such as calculations. It would work much more slowly, and we tend to associate it with mental tasks that involve decision and agency. It would be related to our conscious thought.

A similar division was also proposed among artificial intelligence (AI) researchers, although they had a different question in mind. Moravec (1998) observed that creating artificial systems that mimic our higher cognitive functions is actually much easier than creating systems that perform tasks we consider trivial, such as walking or recognizing faces. Those trivial tasks are the ones we share with most animals.

The division between the two systems, however, is not perfect. First of all, they interact; one system obviously communicates with the other. Also, some tasks can move between systems. If you learned to drive long ago, you can relate to that. The first time behind the wheel of a car, you had to make a genuine effort to remember all the details and to coordinate everything you had to do at the same time. That is, you were basically using your System 2. After months of practice, things became more natural; and after a while, you no longer noticed all the individual decisions your brain was making. Driving became natural, instinctive, System 1–like. Some harder tasks outside your comfort zone—parking a car you have never driven before into a small space, for example—might require you to go back to using System 2. You will need to pay attention and make conscious decisions, and you will feel you are demanding more from your brain.

And yet, System 1 usage is far from effortless or easy; we are just not aware of it. Even if multiplying two three-digit numbers might feel like a far harder task than walking, it is not. The number of calculations our brain has to perform to keep you moving and to keep you from falling is staggeringly larger. Old, very simple calculators can easily do the multiplication. State-of-the-art robots are still learning how to move as effectively and gracefully as we can.

Indeed, the parts of our brain that we might call System 1 can do many very hard tasks so fast and, most of the time, so precisely that we are not even aware of their workings. Most of us are incredibly good at detecting when someone is angry from very few clues. We can understand our native language even when the sound is horribly mixed with other sources and noise. We recognize faces well, often even when the person has made significant changes, such as changing their hairstyle, adding glasses, putting on makeup, shaving their beard, and so on. System 1, based on whatever heuristics our brain uses, is a very efficient system.

Sometimes, however, it fails; and, when we recognize a specific failure of System 1, so the terminology says, we can activate System 2. We can stop, wonder what went wrong, analyze scenarios, conduct whatever mental calculations we consider necessary. And try to learn, if possible. That is not a tale of incompetence. It is a tale of a system with some remarkable and efficient characteristics. It is not a perfect system—far from it—but we can still feel some pride in it. We must, however, learn how to use our brain better, understand its limitations, and recognize when our natural reasoning can be trusted—or not. While the terminology is not perfect, it highlights how we might use fallible heuristics and still not be considered a failure.

Too Confident

Probability and logic are not skills with which we are born—nothing new there. If our brains were only compensating for the natural uncertainty of the world, we could stop here. The lesson would be that we should trust our instincts less and trust our complete calculations—when they exist—more. But the assumption that our brains are only trying to do their best in a complicated situation has implications, and we can check whether that is the case.

Assume all that mattered was finding the best answers with the least effort. In that case, having a sense that we might be making mistakes would be useful. Even if we did not try to estimate how likely those mistakes are, simply being aware that they might happen would be good. It would make it easier to reevaluate our opinions when we notice the world seems to be at odds with them. That is, it would be reasonable to expect us to be cautious and not too confident about our estimates. That is something we can verify. Measuring our confidence and comparing it with how accurate we are is something researchers have been doing for a while now.

To test how well professionals know how accurate they are, Oskamp performed a series of experiments (Oskamp 1965). The group of subjects was composed of clinical psychologists with several years of experience, psychology graduate students, and advanced undergraduate students. They received only written data about a certain Joseph Kidd. From that data, they had to evaluate his personality and predict his attitudes and typical actions. Extra information was introduced at each of four stages; that way, it was possible to see how the subjects’ opinions evolved as they received more information. Surprisingly, the subjects did not get better at answering the questions as they got more information. Their accuracy, the percentage of questions they got right, oscillated a little but stayed in the 23–28 percent range. Notice that, as they were answering a multiple-choice test, that is only a little better than the random rate of 20 percent. But they did become more confident about their answers. At first, they thought they had gotten about 33 percent of the questions right. After the fourth stage, they believed they had gotten a little less than 53 percent right.

New information made the subjects feel more confident, but it did not help them answer more correctly. That tendency was confirmed in other studies. In another area of expertise, predicting the results of basketball games, people became more confident with new data, but there was at least one type of extra information that seemed to make predictions worse. When Hall et al. informed their subjects of the names of the teams, their estimates got worse (Hall et al. 2007).

Of course, not all information is damaging. Tsai and collaborators (2008) asked their subjects to predict the outcomes of games as well. They slowly added more information on performance statistics, but they provided no names. For the first six cues, the ability to predict the outcomes did improve. After that, and up to thirty different cues, accuracy no longer improved, but confidence kept increasing.

When we think our performance was much better than it actually was, we say we are badly calibrated. In other words, we are overconfident. In the experiments I described, as people got more information, they became more and more overconfident. Even professionals seemed to think their work was better than it was. People trust their own opinions and evaluations when they should not. That happens to teachers (Praetorius et al. 2013), law enforcement officers (Chaplin and Shaw 2015), and consumers (Alba and Hutchinson 2000).

Overconfidence does not happen all the time. When accuracy increases to large values, overconfidence tends to diminish. Lichtenstein and Fischhoff (1977) described that in their experiments; when subjects scored higher than 80 percent, they usually became underconfident. Confidence did seem to increase with accuracy, but it did not increase as much as it should have when individuals got more competent.

Yet it is not true that high confidence corresponds to high accuracy. Highly accurate people do tend to report high confidence, but those who score poorly can also feel confident about abilities they do not have. When individuals estimated they were 99 percent sure they had answered correctly, Fischhoff and collaborators (1977) observed they were right between 73 percent and 87 percent of the time. Even near certainty, represented by a claimed chance of only one in a million of being wrong, did not hold up: when people were that sure, the researchers still observed an error rate between 4 percent and 10 percent. Remember how we change the probability values we hear to less extreme values. When everyone states they are far more sure than their competence warrants, those corrections make perfect sense.

Dunning et al. have suggested that part of the problem might come from incompetence itself. In several areas, estimating one’s competence is only possible if you are competent. Incompetent people might not have the competence required to know how accurate they are (Dunning et al. 2003). But whether or not that effect is real, it is at most one cause of the problem. It is also likely that the social context provides important incentives for overconfidence. In many situations, we choose experts based on how confident those experts are. When we do that, we are encouraging overconfidence over accuracy (Radzevick and Moore 2010). Indeed, it seems we are far more overconfident about our own skills than about the skills of others. Cooper and collaborators (1988) observed that entrepreneurs showed more overconfidence about their own chance of success than about the chances of others. That confidence did not depend on their competence. Both well-prepared and poorly prepared entrepreneurs seemed to show the same optimism.

When we get solid feedback about our mistakes, we can be trained to avoid being too confident. On the other hand, when that feedback does not exist, the problem can become quite serious. Physicians still tend to be overconfident (Christensen-Szalanski and Bushyhead 1981), but the same is no longer true for meteorologists (Murphy and Winkler 1984). That does not mean that meteorologists make accurate predictions, of course. It only means that it rains 40 percent of the time when they say there is a 40 percent chance of rain—and not, say, 60 percent of the time. Meteorologists were not well calibrated in the past. They managed to improve their estimates by checking whether their predictions came true. Of course, checking whether it rains is much easier than determining whether a diagnosis was correct, and that made assessing their own calibration much easier. By recording how often their forecasts were correct and feeding that information back into their models, meteorologists managed to basically get rid of the miscalibration problem—but they had to learn how to do it, and they rely on solid mathematical methods to stay well calibrated.
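
Checking calibration of that kind requires nothing more than comparing stated chances with observed frequencies. The sketch below uses made-up forecast records just to show the bookkeeping:

```python
from collections import defaultdict

# (stated probability of rain, whether it actually rained) -- invented records
forecasts = [(0.4, True), (0.4, False), (0.4, False), (0.4, True), (0.4, False),
             (0.9, True), (0.9, True), (0.9, False), (0.9, True), (0.9, True)]

buckets = defaultdict(list)
for stated_p, it_rained in forecasts:
    buckets[stated_p].append(it_rained)

for stated_p, outcomes in sorted(buckets.items()):
    observed = sum(outcomes) / len(outcomes)
    print(f"said {stated_p:.0%}, rained {observed:.0%} of the time")
# A well-calibrated forecaster's two numbers match, bucket by bucket; a mismatch
# is exactly the feedback that lets the forecasts be corrected.
```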

Our failure at knowing what we know is problematic. It would be useful to know the limitations of our abilities; among other things, that would allow us to know when we need to improve. Blindly believing we are capable when we are not does not sound like a successful strategy. It does not sound like a good heuristic. It also does not seem to be an answer to any kind of uncertainty in the information we receive. Overconfidence seems to be so prevalent that we need a new, extra explanation for some of our mistakes. We still need to explore our own cognition in more detail to understand what might be going on.

References

Alba, J. W., and J. W. Hutchinson. 2000. “Knowledge Calibration: What Consumers Know and What They Think They Know.” Journal of Consumer Research 27(2), 123–56.

Allais, M. 1953. “The Behavior of Rational Man in Risky Situations—a Critique of the Axioms and Postulates of the American School.” Econometrica 21, 503–46.

Aristotle. 2007. The Organon or Logical Treatises of Aristotle. Whitefish, MT: Kessinger Publishing.

Bernardo, J. M., and A. Smith. 1994. Bayesian Theory. New York: Wiley.

Birnbaum, M. H. 2008. “New Paradoxes of Risky Decision Making.” Psychological Review 115(2), 463–501.

Changizi, M. A., A. Hsieh, R. Nijhawan, R. Kanai, and S. Shimojo. 2008. “Perceiving the Present and a Systematization of Illusions.” Cognitive Science 32, 459–503.

Chaplin, C., and J. Shaw. 2015. “Confidently Wrong: Police Endorsement of Psycho-Legal Misconceptions.” Journal of Police and Criminal Psychology 31(3), 1–9.

Chapman, L. J., and J. P. Chapman. 1967. “Genesis of Popular but Erroneous Psychodiagnostic Observations.” Journal of Abnormal Psychology 72(3), 193–204.

———. 1969. “Illusory Correlation as an Obstacle to the Use of Valid Psychodiagnostic Signs.” Journal of Abnormal Psychology 74(3), 271–80.

Christensen-Szalanski, J. J., and J. B. Bushyhead. 1981. “Physicians’ Use of Probabilistic Information in a Real Clinical Setting.” Journal of Experimental Psychology: Human Perception and Performance 7(4), 928–35.

Cohen, J., E. Chesnick, and D. Haran. 1971. “Evaluation of Compound Probabilities in Sequential Choice.” Nature 232(5310), 414–16.

Cooper, A. C., C. Y. Woo, and W. C. Dunkelberg. 1988. “Entrepreneurs’ Perceived Chances for Success.” Journal of Business Venturing 3(2), 97–108.

Dunning, D., K. Johnson, J. Ehrlinger, and J. Kruger. 2003. “Why People Fail to Recognize Their Own Incompetence.” Current Directions in Psychological Science 12, 83–87.

Eagleman, D. M. 2001. “Visual Illusions and Neurobiology.” Nature Reviews Neuroscience 2(12), 920–26.

Ellsberg, D. 1961. “Risk, Ambiguity and the Savage Axioms.” Quarterly Journal of Economics 75, 643–69.

Fischhoff, B., P. Slovic, and S. Lichtenstein. 1977. “Knowing with Certainty: The Appropriateness of Extreme Confidence.” Journal of Experimental Psychology: Human Perception and Performance 3(4), 552–64.

Fyfe, S., C. Williams, O. J. Mason, and G. J. Pickup. 2008. “Apophenia, Theory of Mind and Schizotypy: Perceiving Meaning and Intentionality in Randomness.” Cortex 44(10), 1316–25.

Gigerenzer, G., and U. Hoffrage. 1995. “How to Improve Bayesian Reasoning without Instruction: Frequency Formats.” Psychological Review 102, 684–704.

Gigerenzer, G., P. M. Todd, and T. A. R. Group. 2000. Simple Heuristics That Make Us Smart. New York: Oxford University Press.

Hadjikhani, N., K. Kveraga, P. Naik, and S. Ahlfors. 2009. “Early (m170) Activation of Face-Specific Cortex by Face-like Objects.” Neuroreport 20(4), 403–7.

Hall, C. C., L. Ariss, and A. Todorov. 2007. “The Illusion of Knowledge: When More Information Reduces Accuracy and Increases Confidence.” Organizational Behavior and Human Decision Processes 103(2), 277–90.

Hamilton, D. L., and T. L. Rose. 1980. “Illusory Correlation and the Maintenance of Stereotypic Beliefs.” Journal of Personality and Social Psychology 39(5), 832–45.

Helmholtz, H. v. 1867. “Concerning the Perceptions in General.” In Treatise on Physiological Optics, Volume III. The Optical Society of America (1925).

Kadiyala, P., and P. R. Rau. 2004. “Investor Reaction to Corporate Event Announcements: Underreaction or Overreaction?” The Journal of Business 77(2), 357–86.

Kahneman, D. 2011. Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.

Kahneman, D., and A. Tversky. 1973. “On the Psychology of Prediction.” Psychological Review 80(4), 237–51.

———. 1979. “Prospect Theory: An Analysis of Decision under Risk.” Econometrica 47(2), 263–91.

Kanwisher, N., J. McDermott, and M. M. Chun. 1997. “The Fusiform Face Area: A Module in Human Extrastriate Cortex Specialized for Face Perception.” Journal of Neuroscience 17(11), 4302–11.

Kareev, Y., I. Lieberman, and M. Lev. 1997. “Through a Narrow Window: Sample Size and the Perception of Correlation.” Journal of Experimental Psychology: General 126, 278–87.

Lichtenstein, S., and B. Fischhoff. 1977. “Do Those Who Know More Also Know More about How Much They Know? The Calibration of Probability Judgments.” Organizational Behavior and Human Performance 20(2), 552–64.

Loftus, E. F. 1993. “The Reality of Repressed Memories.” American Psychologist 48(5), 518–37.

———. 1997. “Creating False Memories.” Scientific American 277(3), 70–75.

Martins, A. C. R. 2005. “Adaptive Probability Theory: Human Biases as an Adaptation.” Cogprints preprint at http://cogprints.org/4377/.

———. 2006. “Probabilistic Biases as Bayesian Inference.” Judgment and Decision Making 1(2), 108–17.

Moravec, H. 1998. “When Will Computer Hardware Match the Human Brain?” Journal of Evolution and Technology 1.

Murphy, A. H., and R. L. Winkler. 1984. “Probability Forecasting in Meteorology.” Journal of the American Statistical Association 79(387), 489–500.

Nickerson, R. S. 1998. “Confirmation Bias: A Ubiquitous Phenomenon in Many Guises.” Review of General Psychology 2(2), 175–220.

Novella, S. 2014. “Sleep and False Memory.” Neurologica, http://theness.com/neurologicablog/index.php/sleep-and-false-memory/.

O’Hagan, A. 1994. Kendall’s Advanced Theory of Statistics: Bayesian Inference, Volume 2B. London: Arnold.

Oskamp, S. 1965. “Overconfidence in Case-Study Judgments.” Journal of Consulting Psychology 29(3), 261–65.

Patihis, L., L. Y. Ho, I. W. Tingen, S. O. Lilienfeld, and E. F. Loftus. 2014. “Are the ‘Memory Wars’ Over? A Scientist-Practitioner Gap in Beliefs about Repressed Memory.” Psychological Science 25(2), 519–30.

Peterson, C. R., and W. M. DuCharme. 1967. “A Primacy Effect in Subjective Probability Revision.” Journal of Experimental Psychology 73(1), 61.

Phillips, L. D., and W. Edwards. 1966. “Conservatism in a Simple Probability Inference Task.” Journal of Experimental Psychology 72(3), 346–54.

Praetorius, A.-K., V.-D. Berner, H. Zeinz, A. Scheunpflug, and M. Dresel. 2013. “Judgment Confidence and Judgment Accuracy of Teachers in Judging Self-Concepts of Students.” The Journal of Educational Research 106(1), 64–76.

Radzevick, J. R., and D. A. Moore. 2010. “Competing to Be Certain (but Wrong): Market Dynamics and Excessive Confidence in Judgment.” Management Science 57(1), 93–106.

Sagan, C. 1995. The Demon-Haunted World: Science as a Candle in the Dark. New York: Random House.

Simon, H. A. 1956. “Rational Choice and the Structure of Environments.” Psychological Review 63(2), 129–38.

Téglás, E., E. Vul, V. Girotto, M. Gonzalez, J. B. Tenenbaum, and L. L. Bonatti. 2011. “Pure Reasoning in 12-Month-Old Infants as Probabilistic Inference.” Science 332(6033), 1054–59.

Tenenbaum, J. B., C. Kemp, and P. Shafto. 2007. “Theory-Based Bayesian Models of Inductive Reasoning.” In A. Feeney and E. Heit (Eds.), Inductive Reasoning. Cambridge: Cambridge University Press.

Tsai, C. I., J. Klayman, and R. Hastie. 2008. “Effects of Amount of Information on Judgment Accuracy and Confidence.” Organizational Behavior and Human Decision Processes 107(2), 97–105.

Tversky, A., and D. Kahneman. 1973. “Availability: A Heuristic for Judging Frequency and Probability.” Cognitive Psychology 5(2), 207–32.

———. 1983. “Extensional versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment.” Psychological Review 90, 293–315.

von Neumann, J., and O. Morgenstern. 1947. Theory of Games and Economic Behavior. Princeton: Princeton University Press.
