- Preface
**Part I**- 1 The Monty Hall Problem
- 2 Logic
- 3 Truth Tables
- 4 The Gambler’s Fallacy
- 5 Calculating Probabilities
- 6 Conditional Probability
- 7 Calculating Probabilities, Part 2
- 8 Bayes’ Theorem
- 9 Multiple Conditions
- 10 Probability & Induction
**Part II**- 11 Expected Value
- 12 Utility
- 13 Challenges to Expected Utility
- 14 Infinity & Beyond
**Part III**- 15 Two Schools
- 16 Beliefs & Betting Rates
- 17 Dutch Books
- 18 The Problem of Priors
- 19 Significance Testing
- 20 Lindley’s Paradox
**Appendix**- A Cheat Sheet
- B The Axioms of Probability
- C The Grue Paradox
- D The Problem of Induction
- E Solutions to Selected Exercises

It’s tough to make predictions, especially about the future.

—Yogi Berra

Many inductive arguments work by projecting an observed pattern onto as-yet unobserved instances. All the ravens we’ve observed have been black, so all ravens are. All the emeralds we’ve seen have been green, so all emeralds are.

The assumption that the unobserved will resemble the observed seems to be central to induction. Philosophers call this assumption the *Principle of Induction*.11 See Section 2.5 and Appendix C for previous discussions of the Principle of Induction. But what justfies this assumption? Do we have any reason to think the parts of reality we’ve observed so far are a good representation of the parts we haven’t seen yet?

Actually there are strong reasons to doubt whether this assumption can be justified. It may be impossible to give any good argument for expecting the unobserved to resemble the observed.

We noted in Chapter 2 that there are two kinds of argument, inductive and deductive. Some arguments establish their conclusions necessarily, others only support them with high probability. If there is an argument for the Principle of Induction, it must be one of these two kinds. Let’s consider each in turn.

Could we give an inductive argument for the Principle of Induction? At first it seems we could. Scientists have been using inductive reasoning for millenia, often with great success. Indeed, it seems humans, and other creatures too, have relied on it for much longer, and could not have survived without it. So the Principle of Induction has a very strong track record. Isn’t that a good argument for believing it’s correct?

Figure D.1: David Hume (1711–1776) raised the problem of induction in \(1739\). Our presentation of it here is somewhat modernized from his original argument.

No, because the argument is circular. It uses the Principle of Induction to justify believing in the Principle of Induction. Consider that the argument we are attempting looks like this:

The principle has worked well when we’ve used it in the past.

Therefore it will work well in future instances.

This is an inductive argument, an argument from observed instances to ones as yet unobserved. So, under the hood, it appeals to the Principle of Induction. But that’s exactly the conclusion we’re trying to establish. And one can’t use a principle to justify itself.

What about our second option: could a deductive argument establish the Principle of Induction? Well, by definition, a deductive argument establishes its conclusion with necessity. Is it necessary that the unobserved will be like the observed? It doesn’t look like it. It seems perfectly possible that tomorrow the world will go haywire, randomly switching from pattern to pattern, or even to no pattern at all.

Maybe tomorrow the sun will fail to rise. Maybe gravity will push apart instead of pull together, and all the other laws of physics will reverse too. And just as soon as we get used to those patterns and start expecting them to continue, another pattern will arise. And then another. And then, just as we give up and come to have no expectation at all about what will come next, everything will return to normal. Until we get comfortable and everything changes again.

Thankfully, our universe hasn’t been so mischievous. We get surprised now and again, but for the most part inductive reasoning is pretty reliable, when we do it carefully. But we’re lucky in this respect, is the point.

Nature *could* have been mischievous, totally unpredictable. It is not a necessary truth that the unobserved must resemble the observed. And so it seems there cannot be a deductive argument for the Principle of Induction. Because such an argument would establish the principle as a necessary truth.

If you read Appendix C, you know of another famous problem with the Principle of Induction: the grue paradox. (If you haven’t read that chapter, you might want to skip this section.)

The two problems are quite different, but it’s easy to get them confused. The problem we’re discussing here is about justifying the Principle of Induction. Is there any reason to believe it’s true? Whereas the grue paradox points out that we don’t even really know what the principle says, in a way. It says that what we’ve observed is a good indicator of what we haven’t yet obsered. But in what respects? Will unobserved emeralds be green, or will they be grue?

So the challenge posed by grue is to spell out, precisely, what the Principle of Induction says. But even if we can meet that challenge, this challenge will remain. Why should we believe the principle, once it’s been spelled out? Neither a deductive argument nor an inductive argument seems possible.

The Problem of Induction is centuries old. Isn’t it out of date? Hasn’t the modern, mathematical theory of probability solved the problem for us?

Not at all, unfortunately. One thing we learn in this book is that the laws of probability are very weak in a way. They don’t tell us much, without us first telling them what the prior probabilities are. And as we’ve seen over and again throughout Part III, the problem of priors is very much unsolved.

For example, suppose we’re going to flip a mystery coin three times. We don’t know whether the coin is fair or biased, but we hope to have some idea after a few flips.

Now suppose we’ve done the first two flips, both heads. The Principle of Induction says we should expect the next flip to be heads too. At least, that outcome should now be more probable.

Do the laws of probability agree? Well, we need to calculate the quantity: \[ \p(H_3 \given H_1 \wedge H_2).\] The definition of conditional probability tells us: \[ \begin{aligned} \p(H_3 \given H_2 \wedge H_1) &= \frac{\p(H_3 \wedge H_2 \wedge H_1)}{\p(H_2 \wedge H_1)}. \end{aligned} \] But the laws of probability don’t tell us what numbers go in the numerator and the denominator.

The numbers have to be between \(0\) and \(1\). And we have to be sure mutually exclusive propositions have probabilities that add up, according to the Additivity rule. But that still leaves things wide open.

For example, we could assume that all possible sequences of heads and tails are equally likely. In other words: \[ \p(HHH) = \p(THH) = \p(HTH) = \ldots = \p(TTT) = 1/8. \] Given that assumption, we get the result that \(\p(H_3 \given H_2 \wedge H_1) = 1/2\). \[ \begin{aligned} \p(H_3 \given H_2 \wedge H_1) &= \frac{\p(H_3 \wedge H_2 \wedge H_1)}{\p(H_2 \wedge H_1)}\\ &= \frac{1/8}{1/8 + 1/8}\\ &= 1/2. \end{aligned} \] But that means the first two flips didn’t tell us anything about the third! Even after we got two heads, the chance of another heads is still stuck at \(1/2\), same as it was to start with.

We didn’t *have* to assume all possible sequences are equally likely. We can make a different assumption, and get a different result.

Let’s try assuming instead that all possible *frequencies* of heads are equally probable. In other words, the probability of getting \(0\) heads is the same as the probability of getting \(1\) head, which is also the same as the probability of getting \(2\) heads, and likewise for \(3\) heads. So we’re grouping the possible sequences like so:

- \(0\) heads: \(TTT\)
- \(1\) head: \(HTT\), \(THT\), \(TTH\)
- \(2\) heads: \(HHT\), \(HTH\), \(THH\)
- \(3\) heads: \(HHH\)

Each grouping has the same probability, \(1/4\). And for the groups in the middle, which have multiple members, we divide that evenly between the members. So \(\p(HTH) = 1/12\), for example, but \(\p(TTT) = 1/4\).

This might seem a funny way of assigning prior probabilities. But it actually leads to very sensible results; the same results as the Rule of Succession in fact! For example, we get \(\p(H_3 \given H_2 \wedge H_1) = 3/4\): \[ \begin{aligned} \p(H_3 \given H_1 \wedge H_2) &= \frac{\p(H_3 \wedge H_2 \wedge H_1)}{\p(H_2 \wedge H_1)}\\ &= \frac{1/4}{1/4 + 1/12}\\ &= 3/4. \end{aligned} \] So, on this analysis, the first two tosses do tell us what to expect on the next toss.

We’ve seen two different ways of assigning prior probabilities, which lead to very different results. The first way, where we assume all possible sequences are equally likely, disagrees with the Principle of Induction. Our observations of the first two flips tell us nothing about the next one. But the second way, where we assume all possible frequencies are equally likely, agrees with with the Principle of Induction. Observing heads on the first two does tell us to expect another heads on the next one.

Both assumptions are consistent with the laws of probability. So those laws don’t, by themselves, tell us what to expect. The laws of probability only tell us what to expect once we’ve specified the prior probabilities. The problem of induction challenges us to justify one choice of prior probabilities over the alternatives.

In the \(280\) years since this challenge was first raised by David Hume, no answer has gained general acceptance.

Suppose we do \(100\) draws, with replacement, from an urn containing an unknown mixture of black and white balls. All \(100\) draws come out black. Which of the following is correct?

- According to the laws of probability, the next draw is more likely to be black than white.
- According to the laws of probability, the next draw is more likely to be white than black.
- According to the laws of probability, the next draw is equally likely to be black vs. white.
- The laws of probability are consistent with any of the above conclusions; it depends on the prior probabilities.
- None of the above.

Write a short essay (\(3\)–\(4\) paragraphs) explaining the problem of induction. Your essay should include all of the following:

- a clear, accurate explanation of the Principle of Induction,
- a clear, accurate explanation of the dilemma we face in justifying the Principle of Induction,
- a clear, accurate explanation of the challenge for the deductive horn of this dilemma, and
- a clear, accurate explanation of the challenge for the inductive horn of the dilemma.

A coin will be tossed \(3\) times, so the number of heads could be \(0\), \(1\), \(2\), or \(3\). Suppose all \(4\) of these possibilities are equally likely. Moreover, any two sequences with the same number of heads are also equally likely. For example, \(\p(H_1 \wedge H_2 \wedge T_3) = \p(T_1 \wedge H_2 \wedge H_3)\). Answer each of the following.

- What is \(\p(H_2 \given H_1)\)?
- What is \(\p(H_3 \given H_1 \wedge H_2)\)?

Now suppose we do \(4\) tosses instead of \(3\). The prior probabilities follow the same rules: all possible numbers of heads are equally likely, and any two sequences with the same number of heads are equally likely.

- If the first \(n\) tosses come up heads, what is the probability the next toss will come up heads? In other words, give a formula for \(\p(H_{n+1} \given H_1 \wedge \ldots \wedge H_n)\) in terms of \(n\).
- What if only \(k\) out of the first \(n\) tosses land heads, then what formula gives the probability of heads on the next toss?

Suppose a computer program prints out a stream of A’s and B’s. After observing the sequence A, A, B, we want to know the probability of an A next.

Our friend Charlie suggests we reason as follows. We are going to observe \(4\) characters total. Before we observed any characters, there were \(5\) possibilities: the total number of As could turn out to be \(0\), \(1\), \(2\), \(3\), or \(4\). And all of these possibilities are equally likely. So each has prior probability \(1/5\).

Some of these possibilities can be subdivided. For example, there are \(4\) ways to get \(3\) A’s:

A, A, A, B

A, A, B, A

A, B, A, A

B, A, A, ASo each of these sequences gets \(1/4\) of \(1/5\), in other words prior probability \(1/20\).

According to Charlie’s way of reasoning, what is the probability the \(4\)th character will be an A, given that the first \(3\) were A, A, B?

In this chapter we considered a coin to be flipped \(3\) times, with the same prior probability for every possible sequence of heads/tails. Now suppose the coin will be flipped some very large number of times, \(n\). And again suppose the prior probability is the same for every possible sequence of heads/tails. Prove that no matter how many times the coin lands heads, the probability of heads on the next toss is still \(1/2\). In other words, prove that \(\p(H_{k+1} \given H_1 \wedge \ldots \wedge H_{k}) = 1/2\) no matter how large \(k\) gets.

Suppose a coin will be flipped \(3\) times. There are \(2^3 = 8\) possible sequences of heads/tails that we might get. Find a way of assigning prior probabilities to these \(8\) sequences so that, the more heads we observe, the

*less*likely it becomes we’ll get heads on the next toss. In other words, assign a prior probability to each sequence so that \(\p(H_2 \given H_1) < \p(H_2)\) and \(\p(H_3 \given H_2 \wedge H_1) < \p(H_3 \given H_1)\).Suppose a coin will be flipped \(4\) times. Recall the two different ways of assigning a probability to each possible sequence of heads and tails discussed in the chapter:

- Scheme 1: all possible sequences have the same probability.
- Scheme 2: all possible
*frequencies*(numbers of heads) have the same probability, and all sequences that share the same frequency have the same probability.

According to each scheme, how probable is each of the following propositions?

- \(A =\) All tosses will land the same way.

- \(B =\) The number of heads and tails will be the same.

Now answer the same question with \(10\) tosses instead of \(4\).