10
Why Not Bayes
10.1. Prelude
A central proposition of this book is that there are no universal rules for inductive inference. The chapters so far have sought to argue for this proposition and illustrate it by showing how several popular accounts of inductive inference fail to provide universally applicable rules. Many in an influential segment of the philosophy of science community will judge these efforts to be mistaken and futile. In their view, the problem has been solved, finally and irrevocably.
This segment of the community represents “Bayesians” who work in what has come to be called “Bayesian epistemology.” Its central idea is that issues of belief and inductive inference are to be treated solely by means of the probability calculus. The central structure is a conditional probability measure P(A | B), the probability of proposition A against the background proposition B. The term “Bayesian” derives from an easily proven theorem in the probability calculus, Bayes’ theorem:

P(H | E & B) = P(E | H & B) P(H | B) / P(E | B)
The theorem provides the central engine for inference in Bayesian epistemology. The inference starts with some prior belief or inductive strength of support for a hypothesis H on the background B, P(H | B). Learning evidence E leads the prior probability to be updated to the posterior probability P(H | E & B), which now incorporates the full import of the evidence E. This posterior probability is computed via Bayes’ theorem using the auxiliary quantities, the “likelihood” P(E | H & B) and the “expectedness” P(E | B) = P(E | H & B) P(H | B) + P(E | −H & B) P(−H | B).
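The update described here is a short piece of arithmetic. As a minimal sketch (the prior and likelihood values below are invented for illustration and do not come from the text), the posterior can be computed from the prior and the two likelihoods via the expectedness:

```python
# Bayes' theorem as given above:
#   P(H | E & B) = P(E | H & B) P(H | B) / P(E | B)
# with the expectedness P(E | B) computed by total probability.

def posterior(prior_h, likelihood_h, likelihood_not_h):
    """Posterior P(H | E & B) from the prior P(H | B) and the two likelihoods."""
    expectedness = likelihood_h * prior_h + likelihood_not_h * (1.0 - prior_h)
    return likelihood_h * prior_h / expectedness

# Illustrative values: a modest prior of 0.2, with evidence three times as
# likely under H as under not-H.
print(posterior(prior_h=0.2, likelihood_h=0.6, likelihood_not_h=0.2))  # ≈ 0.4286
```

The evidence raises the probability of H from 0.2 to about 0.43; the expectedness in the denominator is what normalizes the result.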
This is the barest sketch of the core notions of the Bayesian approach, which is now so widely known as not to require further elaboration. There is much more to the general Bayesian approach and there are many variant forms. Recalcitrant cases that do not easily fit with the core notions in their bare form are treated by “imprecise probabilities.” The imprecision derives from replacing a single probability measure by a set of measures, or by replacing an additive measure by a superadditive measure. These are conceived of as providing a generalized probabilistic analysis. In more recent scholarship, Bayesian epistemology has been subsumed under the heading of “formal epistemology,” whose leading idea is that epistemic problems are to be addressed by formal and mathematical methods. There is little real change, however, as far as belief and inductive inference are concerned. Probability measures remain the principal instrument used to treat them.
For present purposes, the core commitment of the Bayesian tradition resides in a single idea:
It’s all probabilities.
Here just what “probabilities” mean can be construed differently according to one’s interpretive inclinations. However, this general conception is taken to solve the essential problems addressed in this book. There are universal rules, the Bayesian tradition holds. They are axioms of the probability calculus. Once this is recognized, all that remains are the finer details of determining just how they are to be applied to each problem. The big problem is solved.
The purpose of this chapter and the remaining chapters of this book is to explain why I am dissatisfied with this Bayesian solution.
10.2. Introduction
The case against the universality of probabilities will be made in this chapter in two parts. The first part will apply the general argument developed in Chapter 2 against the idea that any calculus—probabilistic or otherwise—can be universally applicable. The core of the argument can be summed up in the following:
Any logic of induction must restrict what happens in ways that go beyond logical consistency. Hence, a logic of induction is applicable in some domain if the facts of that domain match the factual restrictions of the logic of induction. Since there is no universally applicable factual restriction, in general, different domains require different inductive logics.
This argument will be developed more fully in Sections 10.4 and 10.5 and will conclude that any calculus of induction must eventually reach a boundary to its domain of applicability beyond which it fails.
I will argue in Section 10.6 that efforts to develop theories of imprecise probabilities are misplaced attempts to disguise these boundaries. They use additive measures merely as adjuncts to simulate the non-additive inductive logic of a new domain. In foundational terms, they mislead by fostering the impression that “it’s all probabilities” even when the logic simulated is inherently non-additive.
As a foil for further analysis, Sections 10.7, 10.8, and 10.9 will present an extreme but simple example of such a non-additive inductive logic. The example is the relation of “completely neutral support,” which is derived from the principle of indifference and illustrated by Richard von Mises’ example of different mixtures of wine and water. Section 10.10 reviews the extent to which theories of imprecise probability can accommodate completely neutral support. In so far as they do not accommodate it, they are inadequate; in so far as they do, they are superfluous. Any success in this one case merely postpones the inevitable failures of a probabilistic logic, I argue, that must eventually arise when it seeks to accommodate more exotic logics.
The second part of the case against the universality of probabilities is a general rebuttal of the many proofs offered in the literature as demonstrating the necessity of probabilities. These proofs come in different guises. One of the oldest and best known is Frank Ramsey and Bruno de Finetti’s Dutch book argument. It is used to infer that non-probabilistic distributions of belief are “incoherent,” which is a form of irrationality. All such proofs must fail and they must fail in the same way, for this reason:
A proof of the necessity of probabilities is a deductive argument whose premises must be at least as strong logically as the conclusion. Therefore the assumptions of the proof must already presuppose the necessity of probabilities or something logically stronger. Hence, by dominance, we are better off simply presupposing the necessity of probabilities at the outset and forgoing the proof.
This general argument is developed in Sections 10.11 and 10.12. It is used to predict that a careful analysis of a proof of the necessity of probabilities triggers a regress of reasons. For the assumptions used in the proof will always be found to be improperly grounded. Attempts to provide proper grounding will require new assumptions that will then also prove to be improperly grounded.
The principal illustration of this circularity and the ensuing regress of reasons will be the recent efforts to vindicate probabilities by means of notions of accuracy and scoring rules. Since the analysis is quite extensive, it is postponed until the next chapter. To show that other attempts at vindication fail in the same way, two more examples are given briefer treatment in this chapter. Section 10.13 examines the Dutch book argument. It identifies which assumptions already have the axioms of the probability calculus built into them and recounts the failure of attempts to remove the circularity. Section 10.14 repeats this analysis for a different approach developed by Richard Cox (1961) and Edwin Jaynes (2003). There, necessary conditions are identified for strengths of inductive support, and from them the computational rules of the probability calculus are recovered by functional analysis. We will see that the necessity of the conditions requires further grounding, triggering the now familiar regress of reasons. Conclusions in Section 10.15 suggest further directions of exploration.
As a preliminary, in Section 10.3, I will distinguish objective from subjective approaches and apologize to the reader for not always distinguishing them clearly as the chapter unfolds.
Finally, before proceeding, I would like to give Bayesian epistemology its due. My view is far from a complete dismissal of Bayesian epistemology. I view it in the same way as I view all other candidate logics of induction. Whether it applies in some domain is determined by the background facts of the domain. These background facts will also determine the variety of Bayesianism applicable. Stronger facts will authorize strict Bayesianism in which inductive support or, subjectively speaking, beliefs are measured by a single probability measure. Weaker facts will authorize a relaxed Bayesianism in which these supports are represented by sets of probability measures or upper and lower bounds. There are many domains in which varieties of Bayesian analysis are authorized and can be applied. In such domains, it provides a wonderful instrument.
Bayesian epistemology is formally precise where other accounts flounder. Arguments from analogy struggle to separate the strong from the weak analogies. Accounts that reward simplicity cannot provide a clear and unobjectionable notion of simplicity whose measure translates mechanically into inductive strength. In contrast, once the probability space is well defined, the Bayesian analysis has no such trouble. Determining all its relations is reduced to well-defined computations in the probability space. When the system under investigation becomes very complicated, other approaches provide little guidance on how apparently conflicting evidence is to be combined. The fossil record is best explained by an old earth. The earth’s cool temperature is best explained, through Newton’s law of cooling, by a young earth. Using inference to the best explanation, to which do we infer? If they can pass the formidable hurdle of providing a well-defined probability space, Bayesians can answer the corresponding questions by mechanical computation. For all the information needed to trade off competing items of evidence lies within the conditional probabilities. Indeed, if any general question about belief or inductive support can be translated into a precise query in probability theory, it can be decided by a theorem that affirms or denies it.
With virtues as strong as these, it is all too appealing to hope that Bayesian analysis can be applied universally. When the inevitable problems arise, it is easy to dismiss them as the routine teething troubles of an infant who will outlive them and grow to boisterous maturity. Once that was a defensible attitude. As time passes and the problems remain unsolved, we can no longer afford to indulge the universal aspirations. If we are to understand what inductive inference is fundamentally, we need a different approach.
10.3. Objective and Subjective Bayesianism
My concerns in this book are the objective relations of inductive support. Bayesians are also interested in these relations in so far as they expect them to be embraced by their analyses in one form or another. What complicates responding to Bayesian ambitions of universality is that Bayesianism is not a univocal doctrine. It comes in many varieties.
A major division is between the objective Bayesians, such as Jaynes, and the subjective Bayesians, such as de Finetti. The objective Bayesians are distinguished by the claim that, in any epistemic situation, there is one correct probability distribution applicable. There is, in particular, one correct prior probability. In this regard, the project of objective Bayesians is closest to mine. I am comfortable regarding the objective Bayesians’ conditional probability P(H | E & B) as an attempted expression of the objective strength of inductive support provided by evidence E in background B for hypothesis H. As I will remark briefly in the concluding section of this chapter, the primary obstacle facing objective Bayesians specifically is that the rules needed to define this one correct prior are arbitrary.
Subjective Bayesians permit many probability distributions, constrained only by conformity with the axioms of the probability calculus. They characterize the freedom in our choice among the probability distributions as a free exercise of opinion. Thus, antecedent to the consideration of evidence, we are free to choose any prior probability distribution we like. Hence, at best, for a subjective Bayesian the conditional probability P(H | E & B) cannot simply express the strength of inductive support accrued to H, since it has no unique value. Rather it is, at best, one of many possible mixes of evidential support and opinion. The hope is that eventually, in some longer-term limit, the balance will move decisively toward objective support. There are also many attempts to derive measures of confirmatory support from the subjective probabilities.
My principal concern with the subjective approach is that it demotes strengths of inductive support to a derived quantity. The primary quantity—the conditional probability—is a measure of belief or credence. Strengths of inductive support are to be recovered from such measures. That a notion of belief should be more primitive than a notion of inductive support has it the wrong way round. We wish to assess the strength of inductive support that evolutionary theory derives from the fossil record or that Big Bang cosmology derives from the cosmic background radiation. To make this assessment, we should not first have to determine our beliefs or credences. These strengths should not be dependent on our beliefs, else the objectivity of science is at risk. Worse, the project of assessing these strengths of inductive support from beliefs has proven so troublesome that no univocal assessment is recoverable from the present literature on subjective Bayesianism. The approach was risky and has not yet succeeded.
Since there is so much in common between the objective and subjective approaches to Bayesianism, it is impractical in what follows to keep them fully separated. There is often no need, since argumentation concerning subjective probabilities can often be adapted to apply to objective probabilities, and vice versa. Thus, in this chapter and the chapters that follow, I will move freely between treating probabilities as objective relations of support and subjective credences.
10.4. The Main Failing of Bayesianism
As noted at the outset of this chapter, I have no quarrel with the application of Bayesian analysis in specific cases. There are many successful and interesting cases that arise when the background facts provide the warrant needed. For example, a physical theory may supply the probabilities as the physical chances; and the particular conditions of the system may provide unambiguous prior probabilities.
My concern is the claim that Bayesian analysis is universally applicable to all systems, where it can supply the one, true logic of inductive inference. Against this, I will argue that there is a boundary beyond which Bayesian analysis fails. The existence of this limit is a corollary of the more general claim that there are no universal rules of inductive inference. For without the boundary, Bayesian analysis would be providing a universal rule of inductive inference. The argument for this general claim was developed in Chapter 2. It is recapitulated here for the special case in which a mathematical calculus is used in one manner or another as a logic of inductive inference. The argument applies to all calculi used this way, not just to the probability calculus.
The key premise in the argument is that a calculus of inductive inference must place limitations on what can happen that are more restrictive than logical consistency.1 Otherwise, the calculus is not part of a system that implements inductive inferences. If the limitation goes beyond logical necessity, then it is by definition a contingent restriction—that is, one that may without logical contradiction be true or false. It follows that there will be some conceivable domains that conform with the restriction and some that do not. Whether the calculus in question can be applied in some domain as a logic of inductive inference will depend on whether the requisite facts obtain. When they do, these facts warrant the use of this calculus in this domain.
This consideration applies directly to objective Bayesian approaches, for according to them, the probability calculus is the “logic of science.” Or so proclaims Jaynes (2003) in the title of his treatise, Probability Theory: The Logic of Science. The consideration also applies to subjective Bayesian approaches. While a probability measure for them mixes opinion and evidential warrant, the probability calculus does constrain someone to conform their beliefs with the evidence. The expectation is that conditionalization on a sufficiently rich body of evidence for some theory will lead a subjective Bayesian to mass the probability almost entirely on the true theory. Since such a body of evidence is finite but the theory’s scope is infinite, this is a form of inductive inference. In this sense, a subjective Bayesian is implementing a scheme of inductive inference, although not as directly as an objective Bayesian.
We have now concluded that contingent facts warrant the applicability of some particular calculus of inductive inference in some domain. We might still hope that a single calculus is warranted universally. This would happen if it turns out that there is a single, contingent warranting fact that obtains in all domains where we might conceivably practice inductive inference. Such a fact was pursued, in effect, in the nineteenth century under the guise of a search for a principle of the uniformity of nature. The search for such a universally applicable principle failed. As described in greater detail in Chapter 2, all candidates either proved empirically false or so hedged as to be vacuous.
The facts prevailing in some domain warrant the inductive logic applicable there. There is no single warranting fact common to them all. It follows that different inductive logics are warranted in different domains. This locality does not preclude one domain from being very large. The success of probabilistic methods suggests that the domain or domains in which they are warranted as a logic of inductive inference are large. However, every such domain is bounded and there are others where a different logic is warranted. The logic warranted in these other domains may be governed by a calculus. But there may well be domains so irregular in their facts that no well-developed calculus can systematize whichever inductive inferences are warranted in them.
10.5. Probabilities without Warrants
The need for some sort of warrant for the use of probabilities becomes quite apparent if we consider cases in which there is no warrant. What results are striking inductive fallacies.
A simple example is provided by Peter van Inwagen’s (1996, p. 95) question “Why is there anything at all?” He notes that there is only one possible world with nothing at all. There are infinitely many other possible worlds with something, however, each differing in its configuration of something. Since we are assuming antecedently to have no basis for knowing what there is—if anything at all—we distribute our probabilities roughly uniformly over all possible worlds. The result is that all the probability mass is attracted to the set of worlds in which there is something. It follows that the probability of there being nothing is zero. As van Inwagen (p. 99) concludes, it is “as improbable as anything can be.”
This conclusion is derived fallaciously. Not even the prodigious powers of the probability calculus can legitimately extract such a strong conclusion from premises so bereft of content. The fallacy derives directly from employing a probabilistic analysis in a context in which no background facts warrant the probabilities. This particular fallacy is unfortunately widespread. It is a version of what I have elsewhere called the “inductive disjunctive fallacy” (Norton 2010, §4).2
The “doomsday argument”3 provides another illustration of a related fallacy. It uses only the evidence that our world has survived for t years. It asks after the probability that our world will end in T > t years—“doom.” Since our t can equally be any of the total T years of our world, the probability that our present world has survived for t years is P(t | T) = 1/T. The quantity we seek is the posterior probability P(T | t), the probability that the world meets its doom after T years, given that it has survived for t years. An application of the ratio form of Bayes’ theorem tells us that4

P(T1 | t) / P(T2 | t) = [P(t | T1) / P(t | T2)] × [P(T1) / P(T2)] = (1/T1) / (1/T2) = T2 / T1

where the prior probabilities P(T1) and P(T2) are taken to be equal and so cancel.
Substitute T1 = t and T2 = 10t, and we recover P(T1 | t) / P(T2 | t) = 10. This extraordinary conclusion tells us that doom is ten times more likely right now at t than is survival for another ten ages to 10t.
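The arithmetic of the ratio form can be checked directly. A minimal sketch, assuming (as the argument does) that the prior ratio is one so that the priors cancel:

```python
def doomsday_ratio(T1, T2, prior_ratio=1.0):
    """Posterior ratio P(T1 | t) / P(T2 | t) via the ratio form of Bayes'
    theorem, with likelihoods P(t | T) = 1/T.  The prior ratio P(T1)/P(T2)
    defaults to 1, so the priors cancel as in the text."""
    likelihood_ratio = (1.0 / T1) / (1.0 / T2)   # = T2 / T1
    return likelihood_ratio * prior_ratio

t = 1.0
print(doomsday_ratio(T1=t, T2=10 * t))  # 10.0
```

The result is independent of the actual age t: the argument delivers the same tenfold preference for imminent doom whatever the world's age, which is one symptom of its fallacious character.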
Once again, the analysis delivers too much. The evidence is just that our world has survived t years. This is too thin an evidential basis for the strong conclusions drawn. They are not a reflection of what the evidence authorizes. They are merely artifacts of the use of an unwarranted inductive logic.5 See Norton (2010, §6) for further analysis and for a proposal for a reduced inductive logic more appropriate to the problem.
10.6. Mapping the Boundaries: The Fate of Imprecise Probability
It is an interesting exercise to map out the boundaries for a probabilistic inductive logic. A simple axiom system, such as Andrey Kolmogorov’s (1950) celebrated system, guides us to the boundaries.6 The axioms have us posit the following over a set of propositions:7
• non-negativity: we assign a non-negative, real-valued probability P(A) to proposition A;
• normalization: we assign a probability of unity P(Ω) = 1 to the universal proposition (tautology) Ω; and
• additivity: P(A ∨ B) = P(A) + P(B) when propositions A and B are mutually exclusive.
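On a finite outcome space, all three axioms can be checked mechanically for any candidate measure. A minimal sketch (the outcome space and weights below are invented for illustration):

```python
from itertools import chain, combinations

def satisfies_kolmogorov(measure, tol=1e-9):
    """Check non-negativity, normalization, and finite additivity for a
    measure given as {atom: weight} over a finite outcome space."""
    universal = frozenset(measure)
    def P(event):
        return sum(measure[x] for x in event)
    # every event is a subset of the atoms
    events = [frozenset(s) for s in chain.from_iterable(
        combinations(universal, r) for r in range(len(universal) + 1))]
    if any(P(e) < -tol for e in events):            # non-negativity
        return False
    if abs(P(universal) - 1.0) > tol:               # normalization
        return False
    for A in events:                                # additivity on disjoint pairs
        for B in events:
            if A.isdisjoint(B) and abs(P(A | B) - (P(A) + P(B))) > tol:
                return False
    return True

print(satisfies_kolmogorov({"a": 0.5, "b": 0.3, "c": 0.2}))  # True
```

The check is exhaustive only because the space is finite; it is the additivity clause that the “completely neutral support” relation below will fail.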
Inductive problems in which each of these fails are well known. This chapter and the next will recount some of them. The “completely neutral support” relation below violates additivity, as do the indeterministic systems of Chapter 15. Normalization and additivity are violated by the infinite lottery. The requirement that the probability be a real-valued function is violated by the quantum inductive logic of Chapter 16.
That some boundary has been reached is not controversial. What is controversial, at least in my mind, is how we should respond to it. I believe the correct response is to recognize that we have found the boundaries of probabilistic logic; that we should recognize that different logics prevail in the domains beyond it; and that we should begin the task of identifying them.
The standard response in the literature is different. It is to weaken the probability calculus until the generalized calculus encompasses whatever troublesome counterexample has arisen. For example, as we shall see shortly, violations of additivity are accommodated by replacing a single probability measure with a set of them, or by employing superadditive measures. This is the project of a variety of approaches grouped under the heading of “imprecise probability.”8
These approaches are, ultimately, ill-fated attempts to preserve the core idea that “It’s all probabilities.” The domains covered are ones in which an inherently non-probabilistic inductive logic is warranted. What imprecise probability does is to employ an additive calculus to simulate a non-additive logic. It thus preserves the illusion that an additive measure is somehow still the core of the logic. The better approach would simply have been to recognize that a qualitatively distinct logic is required and to map out its behavior as a distinct logic.
In any case, the general stratagem of extending the calculus offers only temporary respite. For as long as the generalized calculus supports inductive inference, it must place restrictions on the systems that go beyond mere logical consistency. These are contingent constraints that, recalling the argument of Section 10.4 above, will not obtain in all domains. Further investigation will reveal new boundaries and new domains beyond them, as we will see shortly in Section 10.10 below.
The standard response leads to an unhappy ending. Each time a calculus is generalized to embrace new examples, it is weakened in the sense that it becomes less restrictive. As long as the generalized logic places some restrictions on systems beyond logical consistency, the domains in which it applies are limited. The need to generalize to embrace unanticipated counterexamples will continue. The process of generalization can only assuredly terminate when the logic places no restrictions beyond logical consistency on its domain. But then it has ceased to be an inductive logic.
10.7. The Principle of Indifference
To make the foregoing concerns more concrete, it will be helpful to develop the simplest case in which the boundaries of probabilistic analysis are breached. It is “completely neutral support” and arises when we have inductive support that is, in objective terms, maximally uninformative. In subjective terms, it corresponds to the case of complete ignorance. This case has been explored extensively in Norton (2008, 2010) as part of an investigation of the import of the principle of indifference. We will first recall the principle and its application and then develop the notion of completely neutral support in the next section.
The form of the principle that I prefer is the following:
Principle of Indifference. If one has no grounds for distinguishing several outcomes, then we should assign equal inductive support to them.
The principle in this form is a truism of evidence. It reflects the requirement that discriminations in inductive support cannot be made arbitrarily.
The principle applies when we have indistinguishable outcomes. They are realized most securely through invariances, which are transformations that leave the relations of inductive support unchanged. Their use is familiar and unproblematic, initially. When we have a fair coin toss, our formal analysis is unchanged if we switch the labels on the sides of the coin. Whatever grounds we have for favoring heads would remain unchanged if we reassigned the label “heads” to the other side of the coin and similarly reassigned the label “tails.” That is, we would have no grounds for distinguishing the outcome of either side. The principle of indifference requires us to assign equal support to each. If the relations of support are probabilistic, then each side is assigned equal probability.
In the case of a coin toss, the invariance under this permutation of the labels is derived from background physical facts: the mechanical conditions of tossing are such that they favor both sides equally. One can also have a more epistemic version. The coin need not even be tossed. We might just imagine it as sitting somewhere, untouched, in a drawer. But since our information about the coin is so limited, we have no grounds for distinguishing whether it is heads up or tails up.
Invariances like these seem benign until we start to combine them. Then they yield the well-known paradoxes of indifference. An early and well-known example is presented by John Maynard Keynes (1921, chap. 4), who also named the principle of indifference. We ask of a man what country he may be from:
France, Ireland, Great Britain (1)
Since we have no grounds for discriminating among them, we assign equal probability of 1/3 to each. However, the disjunction of two outcomes (Ireland or Great Britain) was equivalent to the British Isles at the time Keynes first wrote. So we might equally ask of the man:
France, British Isles (2)
Since again we have no grounds for discriminating, we assign equal probability of 1/2 to each. We have now arrived at contradictory assignments, for we have assigned both probability 1/3 and probability 1/2 to France as the man’s country.
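The contradiction is simple arithmetic: applying a uniform assignment to each description of the outcome space yields incompatible values for the same proposition. A small sketch of Keynes' example:

```python
def indifferent(outcomes):
    """Assign equal probability to each listed outcome."""
    return {o: 1.0 / len(outcomes) for o in outcomes}

# Description (1): three outcomes
p1 = indifferent(["France", "Ireland", "Great Britain"])

# Description (2): the disjunctive coarsening, British Isles = Ireland or GB
p2 = indifferent(["France", "British Isles"])

print(p1["France"], p2["France"])  # 1/3 versus 1/2 for the same proposition
# Coarsening also fails to preserve the first assignment: under (1),
# P(British Isles) = 1/3 + 1/3 = 2/3, not the 1/2 assigned under (2).
```

The code does nothing but make vivid that a uniform, additive assignment cannot be invariant under disjunctive coarsening; that invariance is exactly what the next section exploits.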
Examples like these are usually used to impugn the principle of indifference. This is a misdiagnosis. The principle is a truism of evidence and not readily discarded. It just says that the support for outcomes should not differ without a reason. What is overlooked in these efforts to impugn the principle is that the real cause of the trouble lies elsewhere. It is the presumption that relations of inductive support must always be probabilistic. These paradoxes of indifference are an early indication that they need not always be so.
In recent scholarship, there have been several alternative interpretations of the import of the principle of indifference for representing the neutrality of support. Yann Benétreau-Dupin (2015) draws on existing ideas in imprecise probability and explores representing completely neutral support through sets of probability measures. Benjamin Eva (2019) proposes a novel accommodation in which not all degrees of support are comparable.
10.8. Completely Neutral Support
10.8.1. Invariance under Redescription
The transformation from (1) to (2) in the section above is a “disjunctive coarsening” of the outcome space. Two outcomes are replaced by a single outcome, their disjunction. The reverse transformation is a “disjunctive refinement.” If we conceive of these operations as redescriptions of the outcomes, Keynes’ example depends on a particular invariance:
Invariance under redescription. In cases of completely neutral support, the equality of inductive support over outcomes is preserved under disjunctive coarsening and refinement.
We arrive at the formal representation of completely neutral support by applying this invariance to an outcome space that has a finite number of mutually exclusive atoms A1, A2, A3, …, An, where an atom is the logically strongest proposition in the outcome space.9 If our circumstance is maximally uninformative concerning these outcomes, then the principle of indifference enjoins us to assign equal support to each. Thus we write:
[A | B] = inductive support accrued to proposition A from B.
We then infer:

[A1 | W] = [A2 | W] = … = [An | W] = I        (3)

where I represents the common inductive strength of support.
We can disjunctively coarsen the outcome space by replacing the first two propositions A1 and A2 by their disjunction, which we will write as A1∨2 = A1 ∨ A2. The new outcome space is A1∨2, A3, …, An and has only n − 1 propositions. Proceeding as did Keynes, we remain maximally uninformed about these propositions, so we must assign equal support to each. That is,

[A1∨2 | W] = [A3 | W] = … = [An | W]        (4)
The same strength of support I must be used in both (3) and (4), since they have many common terms. For example, both include [A3 | W], so we can infer

[A1∨2 | W] = I
Continuing by forming more disjunctive coarsenings of the original outcome space, it is easy to see that the support offered to any contingent disjunction of atoms10 is the same strength of support I:
[any contingent disjunction of atoms | W] = I
However, every contingent proposition in the outcome space is equivalent to some disjunction of atoms. Thus we arrive at11

[A | W] = 1, if A is the universal proposition Ω;
[A | W] = I, if A is contingent;
[A | W] = 0, if A is the contradiction.        (5)
The strengths “1” and “0” have been chosen for continuity with the familiar probabilistic case. Their arithmetic properties are not invoked.
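The resulting support relation can be written out as a function on a finite outcome space, which makes its non-additivity explicit. A minimal sketch, with the symbol I left as an uninterpreted label, as in the text (only the endpoints 0 and 1 are fixed):

```python
I = "I"   # an uninterpreted label; its arithmetic properties are never invoked

def neutral_support(event, universal):
    """Completely neutral support [A | W] as characterized in (5):
    0 for the contradiction, I for any contingent proposition,
    1 for the universal proposition."""
    if not event:
        return 0
    if set(event) == set(universal):
        return 1
    return I

W = {"A1", "A2", "A3"}
# Non-additivity: [A1 | W], [A2 | W], and [A1 or A2 | W] are all just I,
# so the support of a disjunction is not any sum of its disjuncts' supports.
print(neutral_support({"A1"}, W), neutral_support({"A1", "A2"}, W))  # I I
```

Since every contingent proposition receives the same label, no additive (or superadditive) recombination of the disjuncts' values can recover the relation; this is the sense in which the logic is inherently non-additive.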
This subsection argues for a unique characterization (5) of completely neutral support. It is tempting to block the argument at equation (4) by asserting that the principle of indifference should only be applied to the most refined outcome space, which is (3) in this case. This blocking move presumes that we know that (3) is the most refined outcome space. I set aside the question of whether this presumption is adequate to block the argument, for either way the difficulty is easily escaped. Consider the case in which we do not know that (3) is the maximum refinement; or in which we know positively that there is no maximum refinement. Then the argument for completely neutral support goes through.
How might there be no maximum refinement? Such a case arises if the propositions represent ranges of some real-valued parameter x (e.g., Am might correspond to m ≤ x < m + 1). Then, we can disjunctively refine the outcome space by replacing Am by a disjunction B ∨ C, where B corresponds to m ≤ x < m + 1/3 and C corresponds to m + 1/3 ≤ x < m + 1. Since these intervals can be divided indefinitely, there is no most refined outcome space.12
10.8.2. Invariance under Negation
The invariance under redescription of the last section is already well represented in the literature. There is a second, less familiar invariance that leads to the same result. To see it, consider some proposition A about which you know nothing at all. How well supported is it? Now consider its negation, not-A. Is the negation any more or any less well supported? If there is any doubt, imagine that the first question had instead asked about the proposition not-A, which we relabel as B. Is not-B any more or less well supported than B?
Invariance under negation asserts that the two strengths of support are the same, simply because by supposition we have no basis for discriminating between them. Switching their labels makes no difference to the strengths of support.
Invariance under negation. In cases of completely neutral support, the inductive support for a contingent13 proposition and its negation are the same.
Let us implement this invariance in the outcome space with atoms A1, A2, A3, …, An. Any contingent proposition consists of a disjunction of some number of atoms from one to n − 1. For example, the negation of A1 is A2 ∨ A3 ∨ … ∨ An; and the negation of A2 ∨ A3 ∨ … ∨ An is A1; and so on for all other possible combinations. If we have a case of completely neutral support, we infer from invariance under negation that
[A1 | W] = [A2 ∨ A3 ∨ … ∨ An | W] = I1
[A1 ∨ A2 | W] = [A3 ∨ A4 ∨ … ∨ An | W] = I1,2
and so on for the remaining combinations.
The strengths of support must be distinguished as I1, I1,2, … at this stage, since negation invariance by itself is not strong enough to force all the strengths to the same value. Their equality, however, can be recovered if we add the following condition:
Monotonicity. The strength of support of a proposition is no greater than14 that of its consequences. If A entails B, then [A | W] ≤ [B | W].
Since A1 entails A1 ∨ A2 and A2 ∨ A3 ∨ … ∨ An is entailed by A3 ∨ A4 ∨ … ∨ An, we have
I1 = [A1 | W] ≤ [A1 ∨ A2 | W] = I1,2
I1,2 = [A3 ∨ A4 ∨ … ∨ An | W] ≤ [A2 ∨ A3 ∨ … ∨ An | W] = I1
These two inequalities can only obtain if the two strengths are equal: I1 = I1,2.
Proceeding analogously for all the other cases, we recover the equality of the strengths of support with the single value I for all contingent propositions. The details of the recovery are straightforward but somewhat tedious; they are given in Norton (2008, §6.3).
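The recovery can also be checked mechanically. The following sketch is my own brute-force verification, not Norton's derivation: it encodes monotonicity as directed constraints f(A) ≤ f(B), encodes negation invariance as equalities, takes the transitive closure, and confirms that every pair of contingent propositions is forced to the same strength of support.

```python
from itertools import combinations

def contingent_props(n):
    """All contingent propositions: nonempty proper subsets of n atoms."""
    return [frozenset(c) for r in range(1, n)
            for c in combinations(range(n), r)]

def forced_equal(n):
    props = contingent_props(n)
    idx = {p: i for i, p in enumerate(props)}
    full = frozenset(range(n))
    m = len(props)
    # le[i][j] records that the constraints force f(props[i]) <= f(props[j])
    le = [[i == j for j in range(m)] for i in range(m)]
    for a in props:
        for b in props:
            if a < b:                        # monotonicity: A entails B
                le[idx[a]][idx[b]] = True
        neg = full - a                       # negation invariance: f(A) = f(not-A)
        le[idx[a]][idx[neg]] = le[idx[neg]][idx[a]] = True
    for k in range(m):                       # Warshall transitive closure
        for i in range(m):
            for j in range(m):
                le[i][j] = le[i][j] or (le[i][k] and le[k][j])
    # All strengths are equal iff every pair is constrained in both directions.
    return all(le[i][j] for i in range(m) for j in range(m))

assert forced_equal(3) and forced_equal(4)   # every contingent strength collapses to one value
```

The check reproduces in miniature the tedious case-by-case recovery mentioned above: the two constraints together leave no room for distinct strengths of support among the contingent propositions.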
10.8.3. Invariance from Ignorance or Positive Warrant
The representation of completely neutral support has been developed assuming that the invariances prescribed can come about in some circumstance. It is tempting to invert the argumentation. Since the totality of these invariances is incompatible with a probabilistic treatment of inductive support, might we then infer that it is impossible for us ever to be in a position in which these invariances are realized? I have argued that sufficient ignorance will realize these invariances. However, my real concern here is not ignorance but strengths of support warranted by background facts. We might well wonder what sort of facts could realize these invariances. What sort of coin tosses or die throws or other similar machines could yield probabilities such that P(A1) = P(A2) = … = P(An) = P(A1 ∨ A2) = P(A1 ∨ A2 ∨ A3) = … as required by (5)? Since these probability assignments violate the additivity axiom of the probability calculus, no probabilistic randomizer can realize them, precisely because its mechanism is probabilistic.
This inversion of the argument succeeds only in so far as we are restricted to warrants for support arising from probabilistic randomizers. Chapter 15 on “Indeterministic Physical Systems” describes many physical systems whose indeterminism is not probabilistic and which realize the two invariances employed above. The inductive logic warranted for these systems conforms with completely neutral support (5). Note, however, that where I have here used the three values 0, I, and 1, in the later chapter I relabel these values as imp (“impossible”), poss (“possible”), and nec (“necessary”) to reflect better the physical underpinnings of these new cases.
10.9. Von Mises’ Wine and Water
The argument for completely neutral support derives from either of the two invariances just stated. It is tempting to try to defeat them by calling upon asymmetries in propositions that are not respected by the invariances. For example, while there may be no finest disjunctive refinement, there are some refinements that are finer and some that are coarser. Not all negations are the same in their atom counts. The negation of a single-atom proposition A1 has a different atom count from the negation of a disjunction of n − 1 atoms, A2 ∨ … ∨ An. These are all asymmetries among the cases that are not reflected in the invariances. We may well ask how the invariances can be maintained in the face of these asymmetries.
The uniform response to all concerns of this type is merely to reduce our knowledge still further, until the symmetries are restored. Then, the invariances apply and completely neutral support is recoverable. The asymmetries depend on choosing a particular outcome space to describe a system’s behavior. A proposition may consist of one atom in one outcome space but a disjunction of n − 1 atoms in another, where both spaces describe the same system. The different atom counts are then immaterial to the invariance if we have no way to discern which of the outcome spaces is the “right one.”
A version of von Mises’ wine and water example illustrates this effect. A goblet contains a mixture of wine and water. All we know is that the ratio x of wine to water lies in the interval 0.5 < x < 2. It follows that the ratio y = 1/x of water to wine also lies in the interval 0.5 < y < 2. The variables x and y each allow the definition of outcome spaces that describe the same physical goblet. The first has atoms15
A1: 0.5 < x < 1, A2: 1 ≤ x < 1.5, A3: 1.5 ≤ x < 2
The second has atoms
B1: 0.5 < y < 1, B2: 1 ≤ y < 1.5, B3: 1.5 ≤ y < 2
(The assignment of shared endpoints is a matter of convention and does not affect the argument.)
The principle of indifference requires us to assign uniform support across the atoms:
[A1 | W] = [A2 | W] = [A3 | W] and [B1 | W] = [B2 | W] = [B3 | W]
One might try to adapt probabilities to these equalities, by setting
P(A1) = P(A2) = P(A3) = 1/3 and P(B1) = P(B2) = P(B3) = 1/3
The adaptation fails since A1 = B2 ∨ B3 so that we end up with a contradiction:
1/3 = P(A1) = P(B2 ∨ B3) = P(B2) + P(B3) = 2/3
A similar contradiction follows from B1 = A2 ∨ A3.
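The clash can be checked with a few lines of arithmetic. This is my own sketch, using exact fractions, not part of the original text:

```python
from fractions import Fraction

third = Fraction(1, 3)
# The principle of indifference, applied separately in each description:
P_A = {"A1": third, "A2": third, "A3": third}   # wine-to-water atoms
P_B = {"B1": third, "B2": third, "B3": third}   # water-to-wine atoms

# But A1 (wine-to-water ratio below 1) is the same outcome as B2 v B3:
p_from_A = P_A["A1"]                  # indifference over the A-atoms gives 1/3
p_from_B = P_B["B2"] + P_B["B3"]      # additivity over the B-atoms gives 2/3
assert p_from_A != p_from_B           # the two probability assignments contradict
```

The contradiction arises entirely from the additivity of the probability measure; the completely neutral support value I, which is not additive, faces no such clash.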
Figure 10.1 illustrates how the atoms in the two spaces are related.
Figure 10.1. Relations between two outcome spaces for the wine and water example.
Instead of trying to impose probabilities where they do not belong, we can apply the invariances to arrive at the completely neutral support (5). We start with a coarsened outcome space with two outcomes: the ratio of wine to water is either greater or lesser than 1. This space can be represented in two equivalent ways:
A1, A2 ∨ A3 or, equivalently, B2 ∨ B3, B1
Following the principle of indifference, we assign equal support to each outcome:
[A1 | W] = [A2 ∨ A3 | W], that is, [B2 ∨ B3 | W] = [B1 | W]
Implementing invariance under redescription, we disjunctively refine the outcome space and expect the equalities to be preserved. There are two ways to implement the disjunctive refinement. We can refine B1 = A2 ∨ A3 and end up with
[A1 | W] = [A2 | W] = [A3 | W]
Since A2 ∨ A3 = B1 we have:
[A1 | W] = [A2 | W] = [A3 | W] = [B1 | W] = I
Or we can refine A1 = B2 ∨ B3 and end up with:
[B1 | W] = [B2 | W] = [B3 | W]
Since B2 ∨ B3 = A1 we have:
[B1 | W] = [B2 | W] = [B3 | W] = [A1 | W] = I
In these relations, we have now recovered much of the completely neutral support (5) for the two outcome spaces. Continued application of the invariances recovers the remainder. For example, applying either invariance to the disjunctive coarsening A2, A1∨3 = A1 ∨ A3 returns [A2 | W] = [A1 ∨ A3 | W] = I.
Returning to the concerns expressed at the start of this section, there is no sense in which one of the outcome spaces A1, A2, A3 or B1, B2, B3 is more refined than the other. The first represents a refinement of B1 but not A1; and the second represents a refinement of A1 but not B1. The perfect symmetry in all the formulae gives us no basis for preferring one over the other.
Negation invariance is also implemented automatically. A1 is the negation of B1 and vice versa. They are both assigned the same support I. The earlier concern that negation invariance might be troubled by an asymmetry in atom counts is also not realized. In the A-outcome space, A1 consists of one atom and B1 is a disjunction of two atoms. In the B-outcome space, this is reversed: A1 is a disjunction of two atoms and B1 consists of one atom. Once again, the perfect symmetry in all the formulae gives us no basis for preferring one over the other.
10.10. Imprecise Probabilities Again
The literature on imprecise probability treats the case of completely neutral support or—as they often characterize it in subjective terms—of complete ignorance. In so far as these treatments seek to replicate formally the behavior of completely neutral support (5), they do the right thing. The invariances show that (5) is the correct representation for the case above. There will be room to quibble about the treatments of imprecise probability, as I will point out below, but these quibbles are minor in comparison to the major concern: namely, that they do not do the right thing if the representation of complete ignorance is intended as part of a case for the universality of the particular scheme employed, now conceived of as some kind of a generalized probability theory. For I have already described above in Section 10.5 how all such efforts are necessarily ill-fated. To the extent that the generalized probability theory places restrictions that go beyond logical consistency, these restrictions are contingent, and it is inevitable that there are systems that contradict these factual restrictions. The generalized probability theory must fail to apply to these systems so that its aspirations for universality must fail.
Consider a popular approach to imprecise probability that employs sets of probability measures to represent credal states. Benétreau-Dupin (2015, §3) has given a careful account of them and their prospects in regard to complete neutrality of support. This approach depends on the assumption that probability measures can be defined for the system in question. Otherwise, sets of the measures cannot be formed. Thus, the approach fails when applied to systems with nonmeasurable outcomes, such as those investigated in Chapter 14 on “Uncountable Problems”; for these outcomes admit no probability measures. The approach also fails when applied to infinite dimensional outcome spaces, for they admit no non-trivial additive measure, even if the requirement of normalization to unity is dropped. We shall see an example of this in Chapter 15 on “Indeterministic Systems” in conjunction with an indeterminism in Newtonian cosmology.
There are lesser technical issues as well. As Benétreau-Dupin (2015, §3) points out, it is unclear just which set of probability measures should represent completely neutral support. The natural choice is the set of all probability measures on the outcome space. However, that set has the unappealing property for Bayesians that it is preserved under conditionalization so that inductive learning is precluded.
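The self-reproduction under conditionalization is easy to see in a small computation. This is my own sketch; a coarse grid of measures stands in for the full set of all probability measures:

```python
from itertools import product

def grid_measures(n, steps=4):
    """A coarse grid standing in for the set of all measures on n atoms."""
    pts = [i / steps for i in range(steps + 1)]
    return {m for m in product(pts, repeat=n) if abs(sum(m) - 1) < 1e-9}

def conditionalize(measure, keep):
    """Bayes-conditionalize on the evidence that the true atom lies in `keep`."""
    z = sum(measure[i] for i in keep)
    if z == 0:
        return None                    # conditionalization undefined on this measure
    return tuple(measure[i] / z for i in keep)

before = grid_measures(3)              # "complete ignorance": all measures
evidence = (0, 1)                      # learn that atom 2 is false
after = {c for m in before if (c := conditionalize(m, evidence)) is not None}

# Every measure on the remaining atoms is still represented: the vacuous
# credal state reproduces itself, so nothing has been learned.
assert grid_measures(2) <= after
```

The same behavior holds for the genuine, infinite set of all measures: conditionalizing the maximally uncommitted credal state returns the maximally uncommitted credal state on the reduced space.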
In my view, this representation is needlessly complicated, since it assigns no definite probability value to a given outcome. Rather, it assigns all values to each outcome and not just as an interval of values, but with each value part of one of infinitely many probability measures. There is no analog of the simple and adequate ignorance strength I. Worse, a set of probability measures violates invariance under negation. This arises because every probability measure is non-decreasing as we pass through chains of deductive consequences, such as
(contradiction) entails A1 entails (A1 ∨ A2) entails (A1 ∨ A2 ∨ A3) entails … entails (tautology).
Since the contradiction is assigned zero probability and the tautology unit probability, the probabilities of these outcomes must, at some point, be strictly increasing. This strict increase endows the probability measures with a directedness, from propositions with fewer atoms to propositions with more atoms. The operation of negation maps the strengths assigned to propositions with fewer atoms to those with more atoms and vice versa. That is, the operation flips the assignments of inductive strengths with respect to this direction. It follows that any assignment of strengths that is so directed cannot be preserved by negation. Since all measures have this directedness, a set of measures cannot be preserved under negation. The invariance is violated. Norton (2007, §6) explores this failure as a failure of a duality required by the representation of completely neutral support. Benétreau-Dupin (2015, §3) has sketched a dissenting view.
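A few lines make the directedness concrete. This is my own sketch; any additive measure over the atoms will do:

```python
from itertools import accumulate

def chain(p):
    """Probabilities along: contradiction, A1, A1 v A2, ..., the tautology."""
    return [0.0] + list(accumulate(p))

p = [0.2, 0.5, 0.3]        # any probability measure over three atoms
vals = chain(p)             # approximately [0.0, 0.2, 0.7, 1.0]

# The chain must climb from 0 to 1, so it strictly increases somewhere:
assert vals[0] == 0.0 and abs(vals[-1] - 1.0) < 1e-9
assert any(b > a for a, b in zip(vals, vals[1:]))
# Negation reverses the chain (P(not-X) = 1 - P(X)), flipping that direction,
# so no measure can assign every contingent proposition the one neutral value I.
```

Since every measure in the set carries this direction, no set of measures can be invariant under an operation that reverses it.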
Other popular approaches employ some form of superadditivity of a measure.16 An early version of this is found in Shafer-Dempster belief functions. The “vacuous belief function” (Shafer 1976, p. 22) assigns unity to the tautology and zero to every other proposition, including the contradiction. This vacuous belief function does respect both invariances of Section 10.8 above. However, it has the awkward feature of assigning the same value of zero also to the contradiction. This means that its individual values do not distinguish complete disbelief, which we must have in the contradiction, from complete ignorance, which is presumed for all the contingent propositions.
Peter Walley’s (1991) related approach represents a credence in each outcome by two numbers, a lower and an upper probability. So-called “vacuous upper and lower probabilities” (p. 92) assign a zero lower probability and a unit upper probability to all contingent propositions. “They seem to be,” Walley writes, “the only reasonable models for ‘complete ignorance.’” He notes that this representation accords with appropriate invariance properties: it is invariant under refinements and coarsenings of the outcome space. Walley’s representation, considered in isolation from the rest of his system, is unobjectionable. It is equivalent to the representation of completely neutral support (5). Wherever strength I appears in (5), for example, Walley has the functionally equivalent pair of lower and upper probabilities: 0, 1. Unlike the Shafer-Dempster vacuous belief function, Walley’s representation distinguishes contingent propositions from the contradiction: the contradiction is assigned both a lower and an upper probability of zero.
While this particular representation of complete ignorance is successful, there are other problems with Walley’s system. Notably, he derives his quantities in the de Finetti tradition as previsions associated with betting scenarios through which some sort of universality of applicability is suggested. My concerns about this betting approach will be addressed below.
More generally, exotic problems in inductive inference will present continuing challenges to aspirations of universality for all systems of imprecise probability. We will see some in the next chapters. It is not clear how these systems would accommodate the different sectors of the logic native to an infinite lottery, discussed in Chapter 13. These sectors are divided into finite sets of outcomes, infinite-co-infinite sets of outcomes, and infinite-co-finite sets of outcomes, each with its own distinctive structure. If some form of probabilistic account is to be preserved, the three sectors would appear to need both infinitely small and infinitely large probabilities. Still more serious is the challenge of recovering the logic native to quantum systems, as sketched in Chapter 16 on “Quantum Inductive Inference,” for the basic structure of that logic is not a real-valued function but an operator in a Hilbert space.
As long as theories of imprecise probability implement an inductive logic, they will place contingent constraints on the domains to which they can apply. This means that we should always expect new systems to arise to which their inductive logic does not apply. The cycle of extension and counterexample can continue without end, unless the theory of imprecise probability is so weakened by generalization that it places no factual restriction on the domains to which it applies. However, the theory would then cease to implement an inductive logic. It would implement only the requirement of logical consistency.
10.11. All Proofs of the Necessity of Probabilities Are Circular
One of the more appealing aspects of the Bayesian approach is that its proponents have systematically taken on the burden of demonstrating that their approach is the uniquely correct one. The efforts at proof go back at least as far as the “Dutch book” arguments of Ramsey (1926) and de Finetti (1937) and their expansion by Leonard Savage (1954). Other approaches include the identification of necessary conditions and their consequences through representation theorems, such as those developed by Cox (1961) and Jaynes (2003). More recent approaches, such as those recounted by Richard Pettigrew (2016), focus directly on the notion of accuracy as measured by scoring rules.
The literature is energetic. Existing approaches are subject to continuing amendment and expansion, and new approaches are offered. Optimists will see this as a proper and ever-improving response to a worthy problem of the first order. My reaction is more pessimistic. The ferment is the inevitable outcome when a literature sets itself an unattainable task. No proposal proves sustainable, but there is always the hope that a new approach might escape the problems that beset the last one.
That the goal is unattainable follows from the material approach to inductive inference. The proofs seek to establish the necessity of probabilities—that is, that objective degrees of inductive support or subjective degrees of belief must be probabilities. It has already been argued in Section 10.4 that this is a contingent proposition. It may be true or false. It is not a necessary truth, demonstrable by pure logic alone.
It then follows that all proofs of the universal necessity of a probabilistic inductive logic, objective or subjective, must be circular. For all such proofs are logical deductions. They start with premises and from them deduce the conclusion sought. It is a basic fact of deductive logic that these premises must be at least as strong logically as the conclusion sought. Since the necessity of probabilities is not an a priori truth of logic, it follows that the premises of any demonstration of the necessity of probabilities must already contain exactly that necessity as contingent propositions, in one form or another.17
From this perspective, there is a simple procedure for undoing all purported proofs of the necessity of probabilities: one merely needs to explore the premises of the proof and uncover the disguised presumption of probabilities. No matter how natural and comfortable the proof’s starting points, no matter how congenial and convincing they may appear initially, the proof will depend on contingent premises that presuppose precisely what is to be proved. One then sees that the proofs are, in the best case, no better than merely positing probabilities in the first place. In the worst case, the premises are logically stronger, so one must assume more than the necessity of probabilities in order to derive the necessity of probabilities. In this case, one is better off positing probabilities directly in the first place.
There is a dominance argument implicit in these last observations. If our interest is to minimize risk of error in an attempt to vindicate probabilities, we are never better off using one of these proofs. That is, directly positing probabilities weakly dominates, in a game theoretic sense, any justification by a proof. The only possible gain from the proof is a psychological one: one might find the premises posited by the proof more intuitively congenial, even if they risk being logically stronger than the goal.
This way of approaching the proofs casts a different light on the activity of the vindicators of probabilities. For nearly a century, their efforts have produced a flourishing literature that never quite produces a final, definitive demonstration. Rather, the community of vindicators finds itself forever dissatisfied with the latest vindication. Sometimes the dissatisfaction leads to new avenues being explored. Sometimes it results in a regress: a quest for further demonstrations that would establish the premises of the most recent demonstration. The regress cannot end well, for each further demonstration faces the same challenge anew: it must find new, contingent premises from which to derive the old ones; and then it must justify these new premises in turn. This is a regress of reasons that cannot end in the proof sought.
We can now see that the problem is not the result of some maddening inability of the vindicators to find just the right premises for their demonstrations. Rather, it is the inevitable result of the awkward fact that there are no premises truly adequate to the task. The best a vindicator can do is to proceed from an assumption that probabilists will find intuitively appealing, since the assumption is equivalent to or logically stronger than the presumption of probabilities. The same assumption will appear arbitrary and even uncongenial to someone who is antecedently unconvinced of the necessity of probabilities.
The future of the vindication project is easy to predict. Like the circle squarers and angle trisectors of old, the vindicators will be trapped perpetually in the frustrating cycle of promising avenues, proofs that finally seem to succeed, and then the unhappy recognition that the latest proof falls just short. If they persist, it must be so. The escape from the trap lies in the recognition that there is no necessity to probabilities.
Might one worry that this mode of objection is made too easily? If it works, might it not be able to refute any demonstration of any proposition in philosophy? This worry is easily set aside, for this mode of objection applies only when a deductive proof is offered for a contingent proposition. Then, it is cogent and should be applied.
For example, consider a theist who offers a deductive demonstration of the necessity of God’s existence, such as Anselm’s ontological argument. What reply can be given by a skeptic who holds the assertion of God’s existence to be a contingent proposition? The skeptic would proceed as I have with the contingency of probabilities. Assuming the steps are valid, the skeptic would look at the premises of the theist’s demonstration and expect to find contingent premises that are logically at least as strong as the necessity of God’s existence. The theist would be untroubled by the display of these contingent premises. They would merely be a reformulation, possibly in a logically stronger form, of what the theist already believes. The skeptic, however, would object that, precisely because of this, the demonstration is no demonstration at all but only assumes what is to be proved.
10.12. Illustrations of Circularity
The last section established as a generality that all vindications of probabilities will prove circular, and it predicted a manifestation of this failure in a regress of reasons. The exercise now is to find the circularities in the standard vindications. The literature on vindications is so large that it is impractical to cover it all in sufficient detail. Therefore, I have chosen to examine a recent, presently popular vindication in greater detail. The scoring rule or accuracy-based vindication is driven by a single, intuitively appealing idea. It suggests that there is a unique way to distribute our beliefs such that we cannot improve their accuracy, whichever circumstance may prove to be the true one. The distribution is probabilistic. Treating the case adequately, however, has required a separate chapter, which follows this one. That chapter illustrates how the presumption of probability resides in the particular choice of the scoring rule used to measure accuracy. The choice must be carefully fine-tuned, else the approach fails to return probabilities. We will then see how efforts to protect the fine-tuning from suggestions of circularity trigger precisely the doomed regress predicted above.
Here, we will take a briefer look at two other attempts to vindicate probabilities, and we will see that each presumes the very thing sought.
10.13. The Dutch Book Argument
10.13.1. The Betting Scenario
The Dutch book argument or arguments, if we separate out various forms of them, derive initially from Ramsey (1926) and de Finetti (1937). They have been a mainstay of the subjective Bayesian approach for decades.18 The argument begins with the assertion that beliefs must be manifested operationally. The method chosen is to offer agents various bets and determine their beliefs from which bets they accept and refuse. The argument then takes on a normative19 burden: if—and only if—the beliefs manifested do not conform with the probability calculus, then it is possible, the argument goes, to offer the agent a combination of bets that results in a sure loss. This combination is the “Dutch book.” Beliefs that allow this sure loss are disparaged as incoherent and are taken to reflect the supposed irrationality of non-probabilistic beliefs.
The central structure of the argument is the wager offered. The stake S is a sum of money, or some other similar valuable, associated with the bet. Its distribution is decided by the presently unknown truth or falsity of a proposition A. In a bet “on” A, the agent pays a price qS for the possibility of gaining S > 0 if A turns out to be true. Otherwise, if A is false, the agent simply loses the price. In sum:
Table 10.1. Payoffs of a bet “on” A (S > 0) and “against” A (S < 0).
Proposition A is true: agent gains S − qS
Proposition A is false: agent gains −qS
This arrangement is reversed for a bet “against” A. It is most simply implemented by selecting a negative S and using the same payoffs as in Table 10.1. The full analysis requires an additional assumption to which we will return shortly:
Existence of a fair bet. For any proposition A, for each agent, there is a “fair” betting quotient q such that the agent is willing to accept either side of the bet: “on” A or “against” A. This betting quotient measures the agent’s strength of belief in A.
The main result is that failing to conform the betting quotients to the axioms of the probability calculus allows a Dutch book to be made against the agent (Dutch book theorem); and that conforming the betting quotients to the axioms makes it impossible for the Dutch book to be made (converse Dutch book theorem). A simple illustration does not even require the notion of a fair bet. Avoidance of sure loss immediately precludes q > 1. For if q > 1, a bet on any proposition A leads to a loss of S(1 − q) < 0 if A turns out to be true; and a loss of −qS < 0 if A turns out to be false.
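The payoff arithmetic behind this illustration can be sketched directly. This is my own fragment, following the payoffs of Table 10.1:

```python
def net_gain(q, S, a_true):
    """Net gain from the bet of Table 10.1 at betting quotient q and stake S
    (S > 0 is a bet 'on' A; S < 0 is a bet 'against' A)."""
    return S - q * S if a_true else -q * S

# A betting quotient q > 1 guarantees a loss on a bet "on" A:
q, S = 1.2, 10.0
assert net_gain(q, S, True) < 0     # S(1 - q) < 0 when A is true
assert net_gain(q, S, False) < 0    # -qS < 0 when A is false
```

Whichever way A turns out, the agent with q > 1 loses; this is the simplest instance of the sure loss the Dutch book theorems generalize.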
10.13.2. The Dubious Presumption
How does this construction presume probabilities? The principal, tendentious presumption is laid out in plain sight at the very start. In requiring agents always to express their beliefs in terms of monetary bets—accepted or refused—it forces agents to represent their beliefs on a single numerical scale. The betting quotient q has to be a real number, else the payoffs S − qS and −qS cannot be formed. Since there is so much more detail to come, it is easy to treat this presumption as an unimportant preliminary and to skip past it. This is a mistake if one wants to understand which are the strongest assumptions underlying the Dutch book argument. Once proponents of the argument can get us to accept that beliefs are measurable on a real number scale, most of their work is done.
The arguments for this presumption are weak in relation to the strength of what it asserts. De Finetti (1937, p. 139) pretends the assumption is innocuous. It is, he says, “the trivial and obvious idea that the degree of probability attributed by an individual to a given event is revealed by the conditions under which he would be disposed to bet on that event.” Ramsey (1926, p. 166), however, recognized that the idea is not nearly so innocent: “It is a common view,” he conceded, “that belief and other psychological variables are not measurable.” He recognized that something stronger is needed, and he asserted that without some measurement protocol meaninglessness threatens: “degree of a belief is just like a time interval; it has no precise meaning unless we specify more exactly how it is to be measured” (p. 167). Here, Ramsey echoes operationist sentiments of growing popularity in the 1920s. Bridgman (1927) was then writing his manifesto of operationism. It used special relativity, including its treatment of time, as a motivating example (chap. 1). Perhaps Ramsey’s remark on time alluded to this example. At the same time in psychology, behaviorists were urging the elimination of invisible thoughts and ideas in favor of observable behaviors.
Nearly a century later, operationism and behaviorism have long since fallen from favor. They proved unable to deliver accounts that matched the richness of complex physical theories and mental content. The deepest problem with operationism was its core assertion. It is, in Bridgman’s words: “In general, we mean by any concept nothing more than a set of operations; the concept is synonymous with the corresponding set of operations” (1927, p. 5; emphasis in original). This is a false assertion. Concepts are not synonymous with the operations that measure them. Time is not the ticking of a clock; or length the laying out of a ruler; or mass the extension of a spring in a weighing scale; or electric current the deflection of a needle in an ammeter. Correspondingly, belief is not the behavior of accepting or refusing bets. To reiterate a widespread objection to the Dutch book approach: beliefs have cognitive goals concerning learning the truth; betting behaviors have pragmatic goals of maximizing one’s fortune.20 Supposing otherwise risks oversimplifications comparable to those that doomed operationist analyses elsewhere. Here, we might imagine the (possibly fictional) enlightened Buddhist who has no material desires. Such a figure would, under these operationist strictures, be incapable of holding beliefs.
While concepts are not operations, there is still some value in asking how something might be measured as long as we do not infer too hastily to meaninglessness if the operations prove elusive. What operations might measure strength of belief? Here we face the awkward realization that only one operation has been proposed: measurement through monetary bets accepted or refused. Why must we accept just this? Why is money the measure of belief?
The answer is that nothing forces the acceptance in general. However, there are quite specific circumstances in which we are forced to make money the measure of belief. The most obvious case arises if we are wagering in a casino or racetrack. Other cases arise if we are buying or selling insurance. We must determine what premium is appropriate as insurance against some uncertain, future harm whose gravity will be measured monetarily.21 In the financial futures market, one can buy a contract that allows purchase of some asset at a fixed price at some later date. For example, an airline, fearing an increase in jet fuel prices, might buy a contract that enables the purchase of jet fuel at present prices but at a later date. Whether the later purchase will be made depends on whether jet fuel prices rise or fall, which is presently unknown. Thus, pricing the contract requires the same sort of judgments over uncertainties as insurance and wagering.
Something close to a fair bet is also realized in these circumstances. In casinos and racetracks, every wager bought is sold by someone. In insurance, every policy bought by someone is sold by someone else. In the futures market, every contract bought by one trader is sold by another. These transactions would constitute fair bets if we neglect a small spread between the buying and selling prices and the house’s small margin in casino gambling.
Viewed materially, if we are in any of these circumstances, then pragmatic goals will force us to reason inductively as the framework of the Dutch book argument requires. The facts of the circumstances warrant the resulting logic. It provides a nice illustration of how the material theory of induction is applied. When we move to other circumstances, however—when facts of this type are missing—nothing warrants an inductive logic in which strengths of support must conform to defensive gambling strategies. Viewed materially, Dutch book argumentation fails to establish the universal rationality of probabilistic inference precisely because the factual presumptions of the Dutch book scenarios do not hold universally.
Alan Hájek (2008) has proposed an amusing device that we can use to underscore the dependence of the logic on the background conditions. If an agent has incoherent betting quotients, a benevolent bookie can offer the agent a combination of bets that assures a gain, a “Czech book.” For example, if the agent’s betting quotient q is greater than one for some proposition A, the bookie could offer the agent a bet against A. Since S < 0, the agent would make a gain of S(1 − q) > 0 if A turned out to be true; and a loss of −qS > 0 if A turned out to be false. That is, if one found oneself in the clutches of a benevolent bookie, coherent betting quotients would be the only thing preventing the benevolent bookie providing you with an assured gain!
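The arithmetic of the benevolent bookie can be checked directly. The following minimal sketch assumes the payoff scheme of Table 10.1; the incoherent quotient q = 1.2 and the stake S = −10 are illustrative values of my own choosing, not Hájek's.

```python
# Payoffs of a bet with stake S and betting quotient q (as in Table 10.1):
# the bettor gains S(1 - q) if A is true and loses qS if A is false.
def payoff_if_true(S, q):
    return S * (1 - q)

def payoff_if_false(S, q):
    return -q * S

# Incoherent agent: betting quotient q > 1 for proposition A.
q = 1.2
# The benevolent bookie offers a bet "against" A, i.e. with negative stake.
S = -10.0

gain_if_A_true = payoff_if_true(S, q)    # S(1 - q) > 0 since both factors are negative
gain_if_A_false = payoff_if_false(S, q)  # -qS > 0 since S < 0

# The agent gains whether A is true or false: a "Czech book."
assert gain_if_A_true > 0 and gain_if_A_false > 0
```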
10.13.3. The Rationality of Refusing to Bet
A sharper expression of the coercive presumption of the betting scenario is provided by a common response to it: it can be an expression of rationality for agents simply to refuse to bet. In the abstract, this refusal may seem like a crafty evasion. That it need not be is easier to see if we consider how a bookie might seek to force a Dutch book on someone whose beliefs conform with completely neutral support (5). Consider the case of three mutually exclusive outcomes, A1, A2, and A3, such as those of the wine and water problem. An agent whose credences conform with (5) would judge all of the following to be equally supported: A1, A2, and A3, along with their negations not-A1 = (A2 or A3), not-A2 = (A1 or A3), and not-A3 = (A1 or A2).
Assume per impossibile that there is some bet on A1 that is acceptable to the agent as fair. Since the agent regards not-A1 as equally supported, the agent will accept as fair a bet on not-A1 with the same payoffs. Since a bet “on” A1 is just the same as a bet “against” not-A1, one can see that the two net payoffs S(1 − q) and −qS must be numerically equal but different in sign.22 For the sake of simplicity, assume that the bet “on” A1 pays a net of 1 if A1 is true, and −1 if A1 is false. The agent will judge similar bets fair for A2 and A3.
The three bets on A1, A2, and A3 combined form the Dutch book in Table 10.2:
Table 10.2. Dutch book for an agent with beliefs conforming with completely neutral support.
| | Bet on A1 pays: | Bet on A2 pays: | Bet on A3 pays: | Net payoff |
|---|---|---|---|---|
| A1 is true | +1 | −1 | −1 | −1 |
| A2 is true | −1 | +1 | −1 | −1 |
| A3 is true | −1 | −1 | +1 | −1 |
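The net payoffs in Table 10.2 can be verified mechanically. The following sketch assumes only the unit payoffs stated above: each bet pays +1 when its proposition is true and −1 otherwise.

```python
# Three mutually exclusive outcomes; the agent bets "on" each of them.
outcomes = ["A1", "A2", "A3"]

def net_payoff(true_outcome):
    # Each bet "on" an outcome pays +1 if that outcome obtains, -1 otherwise.
    # Sum the payoffs of the three bets.
    return sum(+1 if bet_on == true_outcome else -1 for bet_on in outcomes)

# The agent loses 1 no matter which outcome obtains: a Dutch book.
payoffs = {o: net_payoff(o) for o in outcomes}
assert all(p == -1 for p in payoffs.values())
```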
What we cannot conclude from this Dutch book is that the assignments of support by the agent are irrational. They were determined as the only assignments compatible with the invariances of the system in question. If this Dutch book impugns the rationality of these assignments, then all we can conclude is that no rational treatment of systems like von Mises’ wine and water is possible.
The obvious alternative is to recognize that someone who harbors assignments of support like those of (5) should not accept bets in accordance with the rules specified in the Dutch book gambling scenario. For such an agent’s credences are in conflict with the assumptions of the scenario. The irrationality lies not in the assignment of beliefs but in the indiscriminate acceptance of bets devised using those rules. Here, I concur fully with the assessment of Bacchus et al. (1990, pp. 504–05) who argue “that an agent ought not to accept a set of wagers according to which she loses come what may, if she would prefer not to lose, is a matter of deductive logic and not of propriety of belief.”
10.13.4. Circularity in the Notion of a Fair Bet
Consider again a key assumption in the Dutch book argument: for any proposition A, an agent can find a fair bet with payoffs comprising those of Table 10.1; and the associated betting quotient q is the strength of belief in A. This may seem like a benign preliminary before the real work of assembling a Dutch book begins, but it is not. That assumption in effect already has the axioms of the probability calculus built into it, and an excursion in repeated betting shows it.
Consider a set of atomic propositions A1, A2, …, An and their Boolean combinations over which an agent distributes belief. Imagine that there are repeated scenarios in which there is a similar set of propositions over which the agent distributes the same beliefs. Call the corresponding propositions of the form A1 in each scenario “like propositions”; and so on for the remaining A2, …, An.
The obvious example is provided by the two propositions that a tossed coin shows heads (A1) or that it shows tails (A2). The repeated scenarios are then just independent tossing of many coins. For another example, we might consider the propositions that someone named in a telephone directory was born on Monday (A1), or born on Tuesday (A2), or born on some Boolean combination of days, such as (not-Monday and not-Friday) = (not-A1 & not-A5). We create scenarios with identical beliefs over like propositions by scanning down a list of names in a telephone directory and asking for the birthday of each person named.
Since the agent has the same belief in the truth of each of the like propositions in the corresponding sets, the agent can execute the same bet on each like proposition. That is, the agent’s betting quotient for proposition Ai in each scenario is the same value qi for what the agent judges to be a fair bet with the same fixed stake Si in each case.23 Assuming that there are Wi wins and N − Wi losses among N bets, we find the following for the bets on proposition Ai:

Total payoff = Wi Si (1 − qi) − (N − Wi) qi Si
From this, we compute the average payoff per wager in terms of the frequency ri = Wi/N with which the propositions turn out to be true:

Average payoff per wager = [Wi Si (1 − qi) − (N − Wi) qi Si] / N = Si (ri − qi)
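A numerical sketch confirms that tallying the payoffs of N bets directly agrees with the formula Si (ri − qi); the stake, betting quotient, and win count below are arbitrary illustrative values.

```python
# N repeated bets on like propositions, each with stake S and quotient q.
# W of the propositions turn out true; the rest turn out false.
N, W = 1000, 314          # arbitrary illustrative numbers
S, q = 5.0, 0.25          # arbitrary stake and betting quotient

total = W * S * (1 - q) - (N - W) * q * S  # wins pay S(1-q); losses cost qS
average = total / N

r = W / N                 # frequency of truth among the N propositions
assert abs(average - S * (r - q)) < 1e-9   # average payoff per wager = S(r - q)
```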
To proceed we need to separate two cases. The frequencies ri may or may not stabilize to definite limiting values as N grows indefinitely large. In the first case, we can define the limiting frequency as

pi = limit of ri = Wi/N as N → ∞
It would be natural to identify the limiting frequency pi with the probability of truth among the propositions Ai; for, if there is such a probability, the law of large numbers assures us that, with probability one, it will be revealed as this limit. However, to arrive at the results that interest us, we do not need to do this. We can simply treat the pi as parameters that have the specific property of importance here. Since they are derived from relative frequencies, they conform with the axioms of the probability calculus. That is, they are non-negative, additive for mutually exclusive outcomes, and normalize to unity. For example, the limiting frequencies of truth pi among the repetitions of the atomic propositions Ai always sum to unity:

p1 + p2 + … + pn = 1
We can now see that the following two propositions are equivalent where the same set of betting quotients qi is indicated in each proposition:
(a) There are fair betting quotients qi such that the agent fares equally well by making all the bets over Ai “on” bets with Si > 0; or by making all the bets over Ai “against” bets with Si < 0.
(b) There are betting quotients qi that equal the limiting frequency of truth pi among the propositions Ai (so that these betting quotients conform with the axioms of the probability calculus).
To infer from (a) to (b), note that “fares equally well” means that “on” and “against” betting yields the same results concerning payoffs. It follows that the limiting average payoff must be unaltered when we merely change the sign of Si from positive to negative, where

Limiting average payoff = (pi − qi) Si
If we interpret the parameters pi as probabilities, this limiting average payoff is just the expected payoff per bet. Now, the limiting average payoff is linear in Si. So it can only remain unchanged under an alteration of the sign of Si if it is zero. That is, (pi − qi) Si = 0. It follows immediately that pi = qi. In other words, we have inferred (b). The reverse inference from (b) to (a) follows by taking the steps of the inference in reverse order: pi = qi entails that both total and average payoffs are zero so that bets “on” and “against” are equally attractive.
In the second case, there is no stable limit to the frequencies ri as N grows indefinitely large. This is an uncommon case, but it can occur. We shall see, for example, that it occurs for outcomes of draws from an infinite lottery in Chapter 13. It is the case that is unfavorable to probabilities, and thus we might not expect the assumption of the existence of fair betting quotients still to drive the quotients toward conformity with the axioms of the probability calculus. However, it does so, in the following sense.
Since the frequencies ri have no limiting value, there is no unique value for them unless we specify the specific number of repetitions N. Once this is specified, fairness of the bet on proposition Ai is implemented if the agent can pick a betting quotient qi that matches the actual frequency of truth ri among the set of like propositions Ai in the N repetitions. For then the average payoffs are the same for both “on” and “against” bets. Any other value of qi will favor either the bets “on” or “against” the like propositions according to whether qi > ri or qi < ri.
As the number of repetitions varies, the particular target set of frequencies of truth ri will vary. But what will not vary is that the target set for the betting quotients qi will always be a set of frequencies. Frequencies obey the axioms of the probability calculus but with the added restriction that they are rational number valued. Thus we have weaker analogs of the equivalent propositions (a) and (b) for the same set of betting quotients qi in each proposition:
(a’) For some fixed set of repetitions N, there are fair betting quotients qi such that the agent fares equally well by making all the bets over Ai “on” bets with Si > 0; or by making all the bets over Ai “against” bets with Si < 0.
(b’) For some fixed set of repetitions N, there are betting quotients qi that equal the frequency of truth ri among the propositions Ai (so that these betting quotients conform with the axioms of the probability calculus).
The proof of the equivalence of (a’) and (b’) is analogous.
In sum, for both cases, the assumption that there are fair betting quotients in the context of repeated betting scenarios is equivalent to assuming that the betting quotients behave like frequencies—that is, that they conform with the axioms of the probability calculus. Thus, one should not think that the assumption of fair betting quotients is an innocent background assumption. It does not merely provide a context in which Dutch book argumentation can prove that credences must conform with the axioms of the probability calculus. Rather, conformity with those axioms is already tacitly presumed by the assumption of fair betting quotients. All the Dutch book argumentation does is to make that conformity visible.
This outcome may be untroubling to someone who already believes that credences must be probabilistic. Why be troubled by a demonstration that just clarifies the probabilist’s commitments? If, however, you are someone like me who does not believe that credences must be probabilistic, you will find this result damning. What was supposed to be a demonstration of the incoherence of non-probabilistic beliefs turns out to be an exercise in circularity. Probabilities are demonstrable simply because they were introduced covertly in an assumption of the argument at the outset.
10.13.5. The Regresses Begin
The prediction of the general analyses above is that recognition of weaknesses in an attempted proof of probabilities leads to a regress. One form is a successive weakening of what is sought to be proved. This form of regress is well underway for Dutch book arguments; for it has been long recognized in the literature that the assumption of fairness is arbitrary and can be discarded without compromise to the rationality of the enterprise. This recognition is at least a half century old, extending as far back as Cedric Smith (1961). It is the basis of the analysis of Walley’s (1991) treatise Statistical Reasoning with Imprecise Probabilities.
Following Walley (1991, p. 28), it may be quite prudent for an agent to refuse to admit any bet over some proposition A as fair. Rather, the agent may be willing to accept a bet “on” A with a maximum betting quotient of qlower and be willing to accept a bet “against” A with a minimum betting quotient of qupper. If the two are equal, then they comprise a fair bet over A. If qlower < qupper, then the agent is more cautious in the agent’s betting behavior. No Dutch book can be made against such an agent. The agent’s belief in A is no longer a single probability but an interval bounded by a lower probability equal to qlower and an upper probability equal to qupper.
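Walley’s cautious betting behavior can be rendered as a simple decision rule. In the sketch below, the interval bounds 0.3 and 0.7 are hypothetical values chosen only for illustration.

```python
def betting_decision(q, q_lower, q_upper):
    # q_lower: maximum quotient at which the agent will bet "on" A.
    # q_upper: minimum quotient at which the agent will bet "against" A.
    assert 0 <= q_lower <= q_upper <= 1
    if q <= q_lower:
        return "accept bet on A"
    if q >= q_upper:
        return "accept bet against A"
    # In the gap between the bounds, the agent declines either side,
    # so no combination of accepted bets can form a Dutch book.
    return "refuse to bet"

# Hypothetical interval of imprecision: lower probability 0.3, upper 0.7.
assert betting_decision(0.2, 0.3, 0.7) == "accept bet on A"
assert betting_decision(0.5, 0.3, 0.7) == "refuse to bet"
assert betting_decision(0.8, 0.3, 0.7) == "accept bet against A"
```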
That betting quotients qlower < qupper betoken caution becomes most evident in the extreme case in which qlower = 0 and qupper = 1. In this extreme case, the agent is willing only to accept individual bets for which no loss is possible.24 The interval of probabilities is maximally large, bounded by 0 and 1. For this reason, Walley (1991, p. 66) associates this state with vacuity or maximum ignorance.
To support this discarding of the necessary existence of fair bets, Walley decries what he calls “the Bayesian dogma of precision”—“that uncertainty should always be measured by a single (additive) probability measure.” He writes:
For example, de Finetti assumes that for each event of interest, there is some betting rate that you regard as fair, in the sense that you are willing to accept either side of a bet on the event at that rate. This fair betting rate is your personal probability for the event. More generally, we take your lower probability to be the maximum rate at which you are prepared to bet on the event, and your upper probability to be the minimum rate at which you are prepared to bet against the event. It is not irrational for you to assess an upper probability that is strictly greater than your lower probability. Indeed, you ought to do so when you have little information on which to base your assessments. In that case we say that your beliefs about the event are indeterminate, and that (for you) the event has imprecise probability.
This work is motivated by the ideas that the dogma of precision is mistaken, and that imprecise probabilities are needed in statistical reasoning and decision. (1991, p. 3)
There are two ways to understand the import of this relaxation of the conditions of the betting scenarios. The correct way, in my view, is merely to regard the various betting scenarios envisaged as circumstances that may or may not arise in different domains. There is no necessity for their implementation everywhere. All we can say is that if an agent is in a circumstance in which the assumptions of the scenario are realized, then reasoning inductively according to the prescribed system is their best course. Such circumstances arise, as noted above, in the insurance and futures markets. Indeed, the slight spread between the buying and selling prices in both cases suggests that Walley’s imprecise logic is the appropriate one.
The incorrect way to understand the import of this relaxation is to think of it as a successful purging from the Dutch book analysis of an unwarranted element—the necessary existence of a fair bet—so that the analysis that remains is universally applicable. This would just replace the dogma of precision by the dogma of imprecision. For there is no necessity in the presumption of strict upper and lower limits on the betting quotients or even that having beliefs requires their operational manifestation in betting behavior. With this understanding, we have taken the first step in the regressive weakening of what is sought to be proved, as described in Sections 10.5 and 10.10 above.
Another sort of regress arises when we retain what is sought to be proved, but seek to strengthen the grounds used in the proof. This is how I see Savage’s (1954) decision theoretic proof of probabilities. Like the Dutch book argument, it seeks to infer from an agent’s preferences to the beliefs that must conform with them and thereby show them necessarily to be probabilistic. Savage acknowledges (p. 4) inspiration from de Finetti’s (1937) work. While de Finetti simply posits certain betting behaviors, and his posits are—as we saw above—quite susceptible to challenge, I read Savage’s analysis as an attempt to provide a more secure grounding for this approach.
Savage’s full theory is based on seven postulates. Since they entail the same contingent result that beliefs are probabilities, they must contain that contingency in one form or another. Once again, careful scrutiny should reveal its presence. Because of the complexity of Savage’s system, detailed analysis is precluded here. However, we can discern the direction of the analysis by considering just the first postulate. It asserts, in effect (p. 18), that the relation of preference over acts is a total order; that is, any two acts are comparable under the relation, and the relation is antisymmetric and transitive.
Consider the transitivity of preference. Accordingly, if you strictly prefer A to B and B to C, then you must prefer A to C. (Antisymmetry precludes you also strictly preferring C to A.) This form of transitivity is important in the system. It provides an order that, when filtered through the other postulates, orders strengths of beliefs and eventually enables them to be real valued. Savage provides no argument to preclude intransitivity when the postulate is introduced. He merely announces that “the definition of preference suggests” it (p. 18). If you are antecedently disposed towards probabilities, it is quite easy to accept the suggestion and let the reasoning lead you to the result you expect. However, if you are not so disposed, you will have seen no good reason in the account that precludes intransitive preferences. Say I prefer eating apple pie to cherry pie and cherry pie to apricot pie. Aside from an unsupported declaration in the definition of preference or in notions of rationality, nothing precludes me from preferring apricot pie to apple pie. But that would be an intransitive set of preferences.
It was soon recognized that more was needed if the prohibition on intransitivity was to be sustained. That is, the regress of reasons continued. The instrument that sustains it came to be known as the “money pump” argument, which appears in the literature as early as Davidson et al. (1955, pp. 145–46). Assume an agent who harbors intransitive preferences, captured compactly by the obvious notation: A > B, B > C, C > A. Presumably, the agent could be induced to trade a C to gain a B, while paying some small price, such as $1; and a B to gain an A, for $1; and an A to gain a C for $1. The net effect is that the agent has paid $3 to be returned to the original C. This, it is supposed, makes the intransitive preferences “irrational.”
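The money pump can be simulated in a few lines. The sketch below encodes the cyclic preferences A > B, B > C, C > A and charges $1 for each trade, as in the scenario of Davidson et al.

```python
# Intransitive preferences: A > B, B > C, C > A, encoded as ordered pairs.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}

def trade(holding, offered, cash, fee=1):
    # The agent pays the fee to trade whenever it prefers the offered item.
    if (offered, holding) in prefers:
        return offered, cash - fee
    return holding, cash

holding, cash = "C", 0
for offered in ["B", "A", "C"]:   # the bookie's sequence of offers
    holding, cash = trade(holding, offered, cash)

# The agent is back where it started, $3 poorer.
assert holding == "C" and cash == -3
```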
Once again, we have an argument that can only be convincing to someone who already believes that there is some irrationality in intransitive preferences. Someone who does not believe this will have no trouble seeing that the irrationality is not in the intransitivity of the preferences. Rather, it lies in the agent engaging in free trading with a second commodity (money) over which the agent’s preferences are transitive. That trading behavior is dangerous and should be avoided is all the money pump argument shows. Patrick Maher puts it well:
This is such a simple and vivid argument that it is a pity it is fallacious. But fallacious it is. The fallacy lies in a careless analysis of sequential choice. The argument assumes that someone with intransitive preferences will make each choice without any thought about what future options will be available, yet this is not in general a rational way to proceed. (1993, p. 36)
10.14. Necessary Conditions
To further illustrate the inevitable failure of the proofs of necessity of probabilities described in Section 10.11, consider the approach taken by Cox (1961) and Jaynes (2003). The general approach is both elegant and appealing. Necessary conditions are laid down for a structure called “i | h” (Cox) or “A | B” (Jaynes), which represents the strength of support of the first (i or A) afforded by the second (h or B). From them, by some simple but powerful functional analysis, the computational rules of the probability calculus are derived.
Precisely because these computational rules can be derived, the assumptions used must be at least as logically strong as them. Since the conclusion is contingent, so are the assumptions. Any hope that the assumptions might somehow be self-evident will fail under scrutiny. Sustaining the proof will then trigger a regress of reasons, each of which fails in the sense that the new reasons themselves need further support. We shall see this regress begin with Cox first positing the necessary conditions with short justifications. The inadequacy of the justifications becomes clear. Jaynes then intervenes and provides a stronger justification; and then sometimes when that stronger justification proves inadequate, yet another is provided, but still without arriving at a satisfactory end point.
There are three necessities: that the strengths are real values and what Jaynes calls the sum and product rules. We shall look at each in turn.
First, Cox (1961, p. 1) introduces the idea that the strengths are real valued with some rather casual remarks about their measurability. As an analogy, he mentions the measurability of the pitch of a stairway.25 Jaynes is rightly not satisfied with such a casual development. He makes the requirement explicit as his first desideratum: “(I) Degrees of plausibility are represented by real numbers” (p. 17). Jaynes first seeks to establish that it is satisfied by means of his parable of a robot (pp. 8–9) who will compute with the degrees. He then asserts that “desideratum (I) is practically forced on us by the requirement that the robot’s brain must operate by the carrying out of some definite physical process” (p. 17). Of course, this is incorrect. A robot can represent and compute with all sorts of magnitudes and relational structures. To presume otherwise suggests willful ignorance in anyone with even a minimal understanding of computers. To establish that the magnitudes treated are real numbers, we must assume quite an extensive list of specific properties, including a transitive order (“greater than”) and universal comparability under this order of all the magnitudes.
Tribus (1969, chap. 1) developed a similar account that included the parable of the robot. He reports (p. 6) drawing on Cox (1961) and unpublished course notes by Jaynes. He remarks a few pages later:
The only general way in which objects may be compared with one another is to assign to the objects a real number. The real number system provides the only scale of universal comparability. (1969, p. 13)
It is easy to see that one might let this pass if one already believes that the strengths of support must be probabilities. Otherwise, it is baffling that such a claim could be made.
The regress of reasons continues. Jaynes presumably recognized the weakness of the robotic justification and included an Appendix (pp. 656–59) designed specifically to strengthen it. He noted that if an order on the strengths is transitive and universal, then, in the case of a finite outcome space, real-valued degrees can be adapted to it. He proceeded to argue rather ineffectively for both transitivity and universality. Counterexamples to transitivity can be readily constructed, as in Norton (2007a, pp. 149–50). Keynes (1921, chap. 3) long ago realized that we must take seriously the possibility of incomparable degrees. More troublesome is that transitivity and universal comparability are insufficient to assure that the strengths can be fully represented by real numbers.
Cox’s second necessity is expressed as “the probability of an inference on given evidence determines the probability of its contradictory on the same evidence” (p. 3). Cox’s justification is brief. He gives a few simple examples (p. 2) and announces that “in this all schools can agree.” Jaynes’ treatment is similarly hasty and incomplete. He declares: “The plausibility that A is false must depend in some way on the plausibility that it is true” (p. 30). He proceeds immediately to conclude the much stronger result that there must be a functional relation of dependence between A | B and not-A | B and even that “common sense requires [the function] to be a continuous monotonic decreasing function.”
Once again, all these suppositions can pass without objection if one already has the goal of additivity of strengths in mind. That is, one might imagine that these necessities are simply reduced descriptions of the rule in the probability calculus that P(A | B) + P(not-A | B) = 1. If one is not antecedently committed to this rule or something like it, these necessities will appear as unfounded stipulations. One need only consider superadditive measures to find all the conditions laid down by Cox and Jaynes violated.
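A small sketch with made-up numbers illustrates how a superadditive measure violates these conditions: support for a proposition and its negation can sum to less than one, and a single value for (A | B) can coexist with different values for (not-A | B), so that no function connects the two.

```python
# Two hypothetical superadditive assignments of support over {A, not-A}.
# Superadditivity: m(A) + m(not-A) may fall short of m(A or not-A) = 1.
case_1 = {"A": 0.2, "not-A": 0.3}
case_2 = {"A": 0.2, "not-A": 0.6}

for m in (case_1, case_2):
    assert m["A"] + m["not-A"] < 1.0    # the sums fall short of unity

# Same support for A, different support for not-A: the value of (A | B)
# does not determine (not-A | B), so no functional relation links them.
assert case_1["A"] == case_2["A"] and case_1["not-A"] != case_2["not-A"]
```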
Cox’s third necessity is the following:
The probability on given evidence that both of two inferences are true is determined by their separate probabilities, one on the given evidence, the other on this evidence with the additional assumption that the first inference is true. (p. 4)
The content of this necessity is more easily grasped if we give it in symbolic form, as does Jaynes (p. 25). The support for the conjunction of A and B on the evidence C, (AB | C) is some function F of two other strengths (B | C) and (A | BC):

(AB | C) = F[(B | C), (A | BC)]     (6)
To someone remote from probability theory, this functional stipulation will appear quite arbitrary. To probabilists, it immediately calls to mind the product rule for forming conjunctions:

P(AB | C) = P(B | C) P(A | BC)
So, for them, it can pass as reasonable and even natural. Both Cox and Jaynes seek to establish this functional dependence by recalling informal sequences of inferences. If we are to infer to (A and B) from C, we might first establish from the truth of C that B is true. Then we would establish from the truth of (C and B) that A is true and so also that the conjunction (A and B) is true. (For later reference, represent this as “C→ B→ A→ AB.”) This sequence is one way that we might proceed deductively. Cox and Jaynes then declare that the functional dependence (6) follows since it mimics the same order of steps.
The inference is quite dubious. Indeed, one of the lessons of twentieth-century philosophy of science was that transferring properties of deductive inference over to inductive inference regularly produces incorrect rules. For example, if C deductively entails each of A and B separately, then C also deductively entails their conjunction. However, the corresponding rule for induction fails. C may strongly support each of A and B separately, but actually refute their conjunction.
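Even within the probability calculus, a toy model of my own construction makes the point concrete: on a four-element outcome space, evidence C raises the probability of each of A and B while refuting their conjunction.

```python
# Outcome space {1, 2, 3, 4} with a non-uniform prior distribution.
prior = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.7}
A = {1, 2}          # A and B overlap only in outcome 2
B = {2, 3}
C = {1, 3}          # the evidence excludes exactly the overlap

def prob(event, dist):
    return sum(p for w, p in dist.items() if w in event)

def cond(event, given, dist):
    return prob(event & given, dist) / prob(given, dist)

# C strongly supports each of A and B separately ...
assert cond(A, C, prior) > prob(A, prior)   # 0.5 versus 0.2
assert cond(B, C, prior) > prob(B, prior)   # 0.5 versus 0.2
# ... yet refutes their conjunction A & B = {2}.
assert cond(A & B, C, prior) == 0.0         # was 0.1 before the evidence
```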
Presumably, because they recognize the inadequacy of the arguments for (6), Jaynes and Tribus (1969, chap. 1) embarked on a more elaborate demonstration. Its basic supposition is that (AB | C) must be a function of some or all of the following four strengths only:

(A | C), (B | C), (A | BC), (B | AC)
They then argue that the only possibility is (6) or its equivalent form under relabeling, (AB | C) = F[(A | C) , (B | AC)]. Locally the argumentation is quite cogent. For example, (AB | C) cannot depend functionally on just (A | C) and (B | C). For each of A and B may be strongly confirmed by C, but C may either confirm or even refute (A and B).
However, the next attempt to buttress the functional dependence of (6) fails. For the assumptions are still far too strong, and they are likely only unobjectionable if one already accepts the final result. The lacunae are both narrow and broad. In a narrow sense, consider the details of the functional dependencies. They infer that (AB | C) can depend functionally on (A | C) and (B | AC); or that it can depend functionally on (B | C) and (A | BC). These dependencies are analogous to the two deductive pathways “C→ B→ A→ AB” and “C→ A→ B→ AB.” Since either individually suffices in the deductive case, Jaynes seems to presume that either will also suffice in the inductive case. This certainly does not follow, since the analogies between deduction and induction are fragile. They have not ruled out the case that both inductive pathways must enter into the functional dependence, which would mean that (AB | C) is still a function of all four strengths listed.26
Taking a broader, synoptic view, the most obvious lacuna is the assumption that (AB | C) must be a function of the four strengths listed. They might be related, but must the relationship be functional? Might there not be a more complicated relationship? Perhaps one that involves some auxiliary quantities from which strengths such as (A | C) are derived? Or might there simply be no definite relation at all? This last possibility would then mimic the situation with superadditive measures. These measures decouple the values of (A | C) and (not-A | C), so that there is no functional relation between them. Each value of (A | C) may be compatible with many (not-A | C) and vice versa.
10.15. Conclusion
The approach taken in this chapter has pursued the two lines of criticism indicated. But there are more grounds for hesitation over probabilities than those covered in this chapter. In (Norton 2011), I review many of these further grounds. Perhaps the best known and most intractable of these problems is the problem of the priors. It arises from the need for a Bayesian analysis always to provide some prior probability, P(H | B), antecedent to the consideration of evidence. The very fact that priors must be provided in this way introduces an arbitrariness into the analysis that has been the bane of all forms of Bayesianism. Objective Bayesians try to find good reasons for picking a particular prior. Jaynes’ ill-chosen maximum entropy principle is an example of this.27 Subjective Bayesians try to avoid the problem by demoting the prior probability to mere opinion, which can be freely chosen. Theirs has proven to be a poor bargain, since once one allows opinion to be mingled with evidential warrant, they prove virtually impossible to separate.
The necessity for prior probabilities is a form of incompleteness of the inductive logic: the priors always supply inductive content that is beyond the reach of the evidence to be considered subsequently. One might imagine that it is a problem peculiar to the probability calculus so that the best escape is to find another calculus free of the problem. In recent work, I have shown that this escape fails. The sort of incompleteness that troubles the probability calculus must arise in a large class of calculi of induction, which would include all those we would reasonably entertain. An informal development of this result is provided in Chapter 12.
What I have sought to establish in this chapter is that the probability calculus does not supply a universally applicable logic of inductive inference. The emphasis here is on universal applicability. I do not doubt the utility of Bayesian analysis in specific domains in which background facts positively warrant it. My hope is that Bayesians can relinquish the tacit commitment to the idea that “It’s all probabilities” and to the notion that this idea solves the foundational problems of inductive inference; for then we will be able to address these foundational problems anew and, it is to be hoped, find better solutions. Readers, of course, will know that I offer the material theory of induction as my solution to the foundational problem of the nature of inductive inference.
If we are loosening tacit Bayesian commitments, there is a second one that can be relaxed profitably. The general view seems to be that the probability calculus must be accepted or rejected as a whole. Against this, I have argued that we can be more selective. In Norton (2007a), I provided an axiomatization of the probability calculus using familiar techniques. Its novelty was that it was designed explicitly to identify qualitative properties of support relations that could be employed selectively. The most important result was that these qualitative properties were shown to comprise two components, that the two could be readily separated, and that they could be deployed individually as circumstances demanded.
The first is a property I have called “addition,” which captures the additivity of the calculus. It resides in a reciprocal relation between the support accorded to a proposition and its negation. Addition is an appropriate property when degrees of support span from positive to negative. It should be dropped, however, if neutral support is to be represented.
The second property, the “Bayes property,” provides the probability calculus with the updating dynamics characteristic of Bayesian analysis. It depends on a particular mode of updating in which the import of the evidence is simply to refute those disjunctive parts of the hypothesis that are logically incompatible with it, and then to redistribute the surviving support uniformly.
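In probabilistic terms, this refute-and-redistribute dynamics can be sketched as follows. (This is a gloss on the property in the familiar calculus, not a reproduction of the axiomatization of Norton 2007a.)

```latex
% Write the hypothesis as the disjunction of a part compatible
% with the evidence E and a part incompatible with it:
H = (H \,\&\, E) \lor (H \,\&\, {-E}).
% Conditionalizing on E refutes the incompatible part H & -E; the
% support for the surviving part is rescaled by the uniform factor
% 1/P(E | B):
P(H \mid E \,\&\, B) = \frac{P(H \,\&\, E \mid B)}{P(E \mid B)}.
```

The rescaling factor 1/P(E | B) is the same for every hypothesis, which is the sense in which the redistribution of support is uniform.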
Many of the successes of Bayesian analysis can be traced back to these properties. Since conditions may favor the use of one but not the other, their utility can only be increased if we decide to employ them separately, for then they can be used more widely. For example, the completely neutral support described in this chapter contradicts additivity, but it is compatible with the Bayes property. Thus, an extension of the theory of completely neutral support will permit updating by a Bayesian-style dynamics.
References
Bacchus, Fahiem, Henry E. Kyburg Jr., and Miriam Thalos. 1990. “Against Conditionalization.” Synthese 85: pp. 475–506.
Benétreau-Dupin, Yann. 2015. “The Bayesian Who Knew Too Much.” Synthese 192: pp. 1527–42.
Bostrom, Nick. 2002. Anthropic Bias: Observation Selection Effects in Science and Philosophy. New York: Routledge.
Bradley, Seamus. 2016. “Imprecise Probabilities.” The Stanford Encyclopedia of Philosophy. Winter 2016 Edition. Edited by Edward N. Zalta, https://plato.stanford.edu/archives/win2016/entries/imprecise-probabilities/.
Bridgman, Percy W. 1927. The Logic of Modern Physics. New York: MacMillan.
Cox, Richard T. 1961. The Algebra of Probable Inference. Baltimore: The Johns Hopkins University Press.
Davidson, Donald, J. C. C. McKinsey, and Patrick Suppes. 1955. “Outlines of a Formal Theory of Value, I.” Philosophy of Science 22: pp. 140–60.
De Finetti, Bruno. 1937. “Foresight: Its Logical Laws, Its Subjective Sources.” In Breakthroughs in Statistics. Vol. 1, Foundations and Basic Theory, edited by S. Kotz and N. L. Johnson, pp. 134–74. New York: Springer Verlag, 1992.
Eva, Benjamin. 2019. “Principles of Indifference.” The Journal of Philosophy 116: pp. 390–411.
Hájek, Alan. 2008. “Arguments for–or against–Probabilism?” British Journal for the Philosophy of Science 59: pp. 793–819.
———. 2009. “Dutch Book Arguments.” In The Handbook of Rational and Social Choice: an Overview of New Foundations and Applications, edited by Paul Anand, Prasanta K. Pattanaik, and Clemens Puppe, pp. 173–95. Oxford: Oxford University Press.
Jaynes, Edwin T. 2003. Probability Theory: The Logic of Science. Cambridge: Cambridge University Press.
Keynes, John Maynard. (1921) 1979. A Treatise of Probability. London: Macmillan. Reprint, New York: AMS.
Kolmogorov, Andrey. 1950. Foundations of the Theory of Probability. New York: Chelsea Publishing Company.
Maher, Patrick. 1993. Betting on Theories. Cambridge: Cambridge University Press.
Norton, John D. 2007. “Disbelief and the Dual of Belief.” International Studies in the Philosophy of Science 21: pp. 231–52.
———. 2007a. “Probability Disassembled.” British Journal for the Philosophy of Science 58: pp. 141–71.
———. 2008. “Ignorance and Indifference.” Philosophy of Science 75: pp. 45–68.
———. 2010. “Cosmic Confusions: Not Supporting Versus Supporting Not.” Philosophy of Science 77: pp. 501–23.
———. 2010a. “Deductively Definable Logics of Induction.” Journal of Philosophical Logic 39: pp. 617–54.
———. 2011. “Challenges to Bayesian Confirmation Theory.” In Handbook of the Philosophy of Science. Vol. 7, Philosophy of Statistics, edited by Prasanta S. Bandyopadhyay and Malcolm R. Forster, pp. 391–439. Amsterdam: Elsevier.
Pettigrew, Richard. 2016. Accuracy and the Laws of Credence. Oxford: Oxford University Press.
Ramsey, Frank P. (1926) 1931. “Truth and Probability.” In The Foundations of Mathematics and Other Logical Essays, edited by R. B. Braithwaite, pp. 156–98. London: Kegan, Paul, Trench, Trubner & Co. Reprint, New York: Harcourt, Brace and Company.
Savage, Leonard J. (1954) 1972. The Foundations of Statistics. John Wiley & Sons. Revised ed., New York: Dover.
Shafer, Glenn. 1976. A Mathematical Theory of Evidence. Princeton: Princeton University Press.
Smith, Cedric A. B. 1961. “Consistency in Statistical Inference and Decision.” Journal of the Royal Statistical Society, Series B, 23: pp. 1–37.
Tribus, Myron. 1969. Rational Descriptions, Decisions and Designs. New York: Pergamon.
Van Inwagen, Peter. 1996. “Why Is There Anything at All?” Proceedings of the Aristotelian Society 70 (suppl.): pp. 95–120.
Vineberg, Susan. 2016. “Dutch Book Arguments.” The Stanford Encyclopedia of Philosophy. Spring 2016 Edition. Edited by Edward N. Zalta, https://plato.stanford.edu/archives/spr2016/entries/dutch-book/.
von Mises, Richard. 1957. Probability, Statistics and Truth. London: George Allen & Unwin.
Walley, Peter. 1991. Statistical Reasoning with Imprecise Probabilities. London: Chapman and Hall.
Weirich, Paul. 2011. “The Bayesian Decision-Theoretic Approach to Statistics.” In Handbook of the Philosophy of Science. Vol. 7, Philosophy of Statistics, edited by Prasanta S. Bandyopadhyay and Malcolm R. Forster, pp. 233–61. Amsterdam: Elsevier.
1 That it must do this might not be immediately clear. The standard manipulations within the probability calculus, for example, are all deductive. Take ten independent tosses of a fair coin. If the probability of heads is 1/2 in each toss, then the probability that at least one head appears is 1 – 1/2¹⁰ ≈ 0.999. So far all the reasoning has been deductive. The inductive component only enters when we employ an interpretive rule that tells us that outcomes of near unit probability are to be expected. Without some sort of interpretive rule like this, the probability is simply a mathematical quantity with no import for real things in the world.
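The deductive step in this example, written out in full:

```latex
% The only way to fail to obtain at least one head is for all ten
% independent tosses to land tails, each with probability 1/2:
P(\text{at least one head})
  = 1 - P(\text{no heads in ten tosses})
  = 1 - \left(\tfrac{1}{2}\right)^{10}
  = 1 - \tfrac{1}{1024}
  \approx 0.999.
```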
2 More examples are described in Norton (2010, §4).
3 See Bostrom (2002, chaps. 6–7) for an entry into the earlier literature on this argument.
4 Assume equal prior probabilities P(T1) = P(T2).
5 Bostrom (2002, p. 57) seeks to warrant probabilistic analysis with his “self-sampling assumption.” In his view, “One should reason as if one were a random sample from the set of all observers in one’s reference class.” Here, “random sampling” implies equal probability of each sample drawn. Since it is an assumption without factual basis, it provides no warrant. Rather, it enables us to identify and name the arbitrary posit that is the origin of the inductive fallacy.
6 Kolmogorov’s axioms are simpler since they specify an unconditional probability P(A). Conditional probabilities are introduced through the definition P(A | B) = P(A & B)/P(B). This approach is more restrictive than providing more complicated axioms directly for conditional probabilities. But the simpler approach suffices for the present analysis since the more complicated approaches only move the boundaries slightly.
7 More precisely, we posit a Boolean algebra of propositions, which is a set of propositions closed under finite or countable disjunction ∨, conjunction &, and negation −.
8 For an introduction to this literature, see Bradley (2016) and the resources on the website of the Society for Imprecise Probability: http://www.sipta.org.
9 As before, the outcomes space is a Boolean algebra of propositions whose universal proposition or tautology Ω is the disjunction of all the atoms Ω = A1 ∨ A2 ∨ … ∨ An. Proposition Ai is an atom just if any proposition A that entails it is either Ai itself or the contradiction.
10 This requirement of contingency excludes the tautology Ω.
11 Keeping distinct values for the tautology and contradiction presumes sufficient logical knowledge that we can discern them from the contingent propositions. One could also define a still more extreme case in which we are maximally ignorant of deductive relations among the propositions, so that all the strengths are I.
12 It is tempting to argue that the refinement must divide parameter values into uniform intervals. This requirement fails since what may be a uniform division for one scaling of the parameter will not be so for another. The wine and water example below illustrates the problem.
13 A stronger version arises if we cannot discern the tautology and contradiction from the contingent propositions. Then the restriction to contingent propositions can be dropped.
14 That is, we assume that there is a partial order “<” defined over the strengths. It is transitive and antisymmetric. Monotonicity is widely assumed but unnecessary, and one can conceive of logics in which it fails. An example is the specific conditioning logic of Norton (2010a, §11.2).
15 To avoid the need to juggle too many “<” and “≤,” I employ the expedient assumption that x and y can never adopt exactly the values 0.5, 2/3, 1, 1.5, and 2.
16 A measure is superadditive if the value assigned to a disjunction of mutually incompatible outcomes A ∨ B is greater than the sum of the values assigned to A and B individually.
17 While I know of no such efforts, one might seek to show the universal necessity of a probabilistic inductive logic through a demonstration that itself employs inductive inferences. These efforts would face a dilemma. If the inductive inferences used are not probabilistic, it is conceded at the outset that some inductive inferences are not probabilistic. If the inductive inferences are probabilistic, then it must be shown that this particular probabilistic demonstration of the necessity of probabilities is not viciously circular.
18 For recent surveys of a very extensive literature, see Hájek (2009) and Vineberg (2016).
19 The normative element is essential. The system so narrowly constrains an agent’s possible responses that it is a dismal means of ascertaining beliefs non-coercively.
20 Recounted in Weirich (2011, p. 246).
21 Starting in 1931, de Finetti had worked for an insurance company. Might this explain why he found it trivial that monetarily rewarded betting behavior measures belief?
22 Suppose that the bet “on” A1 pays a net of X > 0, if A1 is true, and Y < 0, if A1 is false. Then the bet “against” A1 pays a net of −X < 0, if A1 is true, and −Y > 0, if A1 is false. The bet with the same stakes “on” not-A1 pays X > 0, if not-A1 is true, and Y < 0, if not-A1 is false. These last two bets can only be the same if X = −Y.
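Tabulating the two payoffs over the same contingencies makes the condition explicit:

```latex
% Payoffs of the bet "against" A_1, re-expressed over the truth
% values of not-A_1:
\text{against } A_1:\quad
  -Y \ \text{if } \neg A_1 \text{ is true},\qquad
  -X \ \text{if } \neg A_1 \text{ is false}.
% Payoffs of the bet "on" not-A_1 with the same stakes:
\text{on } \neg A_1:\quad
  \phantom{-}X \ \text{if } \neg A_1 \text{ is true},\qquad
  \phantom{-}Y \ \text{if } \neg A_1 \text{ is false}.
% The two coincide just in case X = -Y; equating either pair of
% entries yields the same condition.
```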
23 What follows is an analysis concerning the atomic propositions. An analogous analysis can be applied to the propositions that are Boolean combinations of them.
24 The agent is willing to make a bet “on” A only if the bet pays a net of S – 0 ⋅ S = S > 0 if A is true, and −0 ⋅ S = 0 if A is false. The agent is willing to accept a bet “against” A only if the bet pays S – 1 ⋅ S = 0 if A is true, and −1 ⋅ S = −S > 0 if A is false, since S < 0 for an “against” bet.
25 Cox’s (1961, pp. 29–34) later remarks on measurement pertain not to whether the strengths have real-valued magnitudes, but whether they can be assessed with precision.
26 Tribus (1969, pp. 16–17) has an argument against this possibility that appears flawed. He seems to argue that it is ruled out since (B | AC) becomes ill-defined when C is not-A. But this sort of difficulty is routinely overcome by allowing that in some special cases the function is ill-defined.
27 The principle tells us to distribute our prior probabilities as uniformly as the external constraints allow. Thus it is an extended form of the principle of indifference. If there are no constraints other than conformity with the probability calculus, maximizing entropy reduces to choosing the uniform probability distribution required by the original principle of indifference. This principle, as we have seen in Sections 10.7, 10.8, and 10.9, is an insecure basis for reasoning within the probability calculus since it rapidly produces results that contradict the calculus.
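That the unconstrained maximum entropy distribution over n atoms is the uniform distribution follows from a standard Lagrange-multiplier calculation:

```latex
% Maximize the entropy H(p) = -\sum_i p_i \ln p_i subject only to
% the normalization constraint \sum_i p_i = 1:
\frac{\partial}{\partial p_i}
  \Big[-\sum_j p_j \ln p_j + \lambda\Big(\sum_j p_j - 1\Big)\Big]
  = -\ln p_i - 1 + \lambda = 0,
% so that p_i = e^{\lambda - 1}, the same value for every i.
% Normalization then fixes
p_i = \tfrac{1}{n},
% which is just the uniform distribution demanded by the original
% principle of indifference.
```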