In the first part, I explained that the two-update problem can be understood in two different ways: (1) adding an axiom $A$ to a logic before defining a probability distribution for it is generally different from a Bayesian update on $A$, and (2) improving our approximation of a logical prior $\mathbb{P}$ upon observing a proof of $A$ is generally different from a Bayesian update on $A$. (These two forms appear to really be the same problem.)

This may not be fundamentally concerning: perhaps neither of these should be a Bayesian update. However, Paul Christiano's paradox of ignorance suggests that we will have some problems if we depart from Bayesian updates. I closed with the question of whether we can circumvent this by making the update-on-proofs into a Bayesian update; I'll address that question here.

What we would want to do is start with a probability distribution which is ignorant of logic, and do a Bayesian update on facts of logic as we prove them. (I think this suggestion was made once or twice elsewhere; I'll link to original sources as I find them, but let me know if you know of them.) Can this be made to work?

The simplest case is to start out by assigning every statement a probability of 1/2, independently of all other statements. We then run our theorem prover and update on any proofs we find. This doesn't just mean updating single sentences as they are proved; indeed, that would do nothing to alter the probability of other sentences. The important thing is to eliminate combinations of sentence truth-assignments which are proven to be inconsistent. For example, if we find that $A \rightarrow B$, we rule out the states where $A$ is true and $B$ is false. If $B$ is a new sentence not yet touched by anything else, this modifies the odds of $A$ by a likelihood ratio of 1:2 (making it less probable, since its truth now depends on a more constrained circumstance). Similarly, if $A$ is new, the odds of $B$ are modified by the likelihood ratio 2:1.
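To make the arithmetic concrete, here is a minimal sketch (my code, nothing from the original discussion) of this update over just two sentences $A$ and $B$, representing the joint distribution explicitly as a table over truth assignments:

```python
# A minimal sketch: the prior that assigns every sentence probability 1/2
# independently, over two sentences A and B, updated on a proof of A -> B
# by ruling out the inconsistent truth assignment.
from itertools import product

# Joint prior: each of the four truth assignments to (A, B) gets probability 1/4.
prior = {state: 0.25 for state in product([True, False], repeat=2)}

# Condition on A -> B: drop the state where A is true and B is false, renormalize.
posterior = {s: p for s, p in prior.items() if not (s[0] and not s[1])}
total = sum(posterior.values())
posterior = {s: p / total for s, p in posterior.items()}

p_a = sum(p for s, p in posterior.items() if s[0])  # P(A | A -> B) = 1/3
p_b = sum(p for s, p in posterior.items() if s[1])  # P(B | A -> B) = 2/3

print(p_a / (1 - p_a))  # ~0.5: the odds of A went from 1:1 to 1:2
print(p_b / (1 - p_b))  # ~2.0: the odds of B went from 1:1 to 2:1
```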

It follows that no matter how far along in the process we are, the likelihood ratio of $A$ is always being doubled or halved: it's halved when we encounter a proof of a new instance of $A \rightarrow B$ (choosing $B$ to select an untouched sentence), and doubled when we encounter a proof of a new instance of $B \rightarrow A$ (again, with $B$ chosen fresh). This means that the distribution can only converge if it converges to probabilities that are just 0 or 1. Furthermore, if $A$ is never proved or disproved, whether it converges to 0 or 1 can be manipulated by changing the order of the proof search: we can bias the search toward instances of $A \rightarrow B$ or of $B \rightarrow A$, depending on which direction we want to drive the probability of $A$.
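Here is a quick illustration (again my own sketch, not anything from the original) of the manipulation, tracking only the odds of $A$:

```python
# A biased proof search drives P(A) wherever we like. Each proof of A -> B_i
# with a fresh B_i multiplies the odds of A by 1/2; each proof of B_i -> A
# with a fresh B_i multiplies them by 2.

def prob(odds):
    return odds / (1 + odds)

odds = 1.0            # prior P(A) = 1/2
for _ in range(20):
    odds *= 0.5       # search biased toward proving instances of A -> B_i
print(prob(odds))     # ~9.5e-07: P(A) driven toward 0

odds = 1.0
for _ in range(20):
    odds *= 2.0       # search biased toward proving instances of B_i -> A
print(prob(odds))     # ~0.999999: P(A) driven toward 1
```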

(Again, I seem to recall that something like this has been discussed before -- let me know if you know where I should be linking to.)

So, this isn't any good. It seems our initial distribution over sentences was too simplistic. For fixed $A$, as we observe more and more proofs of $A \rightarrow B$ or $B \rightarrow A$, the probability of $A$ should change less and less.

What if we start with the Solomonoff distribution over sentence truth-values instead? We could regard a sequence of 0s and 1s as representing the truth and falsehood of sentences, with the $n$th position in the sequence telling us about the $n$th sentence. Unfortunately, logically consistent assignments of 1s and 0s are, taken together, an event of measure 0 in the Solomonoff distribution. There is no stochastic Turing machine which is guaranteed to produce such an assignment. (If there were such a machine, we could use it to specify a complete and consistent extension of PA by feeding it 1s rather than random bits, and taking just the first-order statements of the form "the axioms of PA ...".) This means the result of a Bayesian update is not defined by Bayes' theorem; perhaps we can construct something reasonable by taking limits, but this seems likely to run into the same problem we saw with the prior that assigns 1/2 to everything.
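To spell out why the update is undefined (my notation, not the original post's): writing $\mathrm{Con}$ for the measure-zero event that the sampled sequence is a consistent assignment, Bayes' theorem yields an indeterminate ratio:

$$\mathbb{P}(A \mid \mathrm{Con}) \;=\; \frac{\mathbb{P}(A \wedge \mathrm{Con})}{\mathbb{P}(\mathrm{Con})} \;=\; \frac{0}{0}, \qquad \text{since } \mathbb{P}(\mathrm{Con}) = 0.$$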

Benja suggested a way to get around this: use a modified version of Solomonoff induction which uses Turing machines with a halting oracle. These machines can decide logical implication, and therefore will generate consistent assignments with nonzero probability.
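As a toy version of this sampling scheme (my sketch, with one big simplification: the halting oracle is replaced by brute-force propositional consistency checking, which is decidable), here is how such a machine can turn arbitrary random bits into a consistent assignment, sentence by sentence:

```python
# Truth values are sampled sentence by sentence, consulting an "oracle" that
# decides whether the choices made so far are jointly consistent. A halting
# oracle can play this role for first-order logic; propositional formulas
# over a few atoms stand in here.
import random
from itertools import product

ATOMS = ["p", "q", "r"]

# Each "sentence" maps an assignment of the atoms (a dict) to a truth value.
SENTENCES = [
    lambda m: m["p"],
    lambda m: m["p"] or m["q"],
    lambda m: not m["q"],
    lambda m: m["r"] and m["p"],
]

def consistent(decisions):
    """Oracle stand-in: does some atom assignment realize all decided values?"""
    for values in product([True, False], repeat=len(ATOMS)):
        model = dict(zip(ATOMS, values))
        if all(SENTENCES[i](model) == v for i, v in decisions.items()):
            return True
    return False

def sample_assignment():
    decisions = {}
    for i in range(len(SENTENCES)):
        value = random.random() < 0.5            # consume a "random bit"
        if not consistent({**decisions, i: value}):
            value = not value                    # forced: only one value is consistent
        decisions[i] = value
    return decisions

print(sample_assignment())  # always a consistent truth assignment
```

The point the sketch illustrates is that whenever the choices so far are consistent, at least one truth value for the next sentence keeps them consistent, so the sampler never gets stuck, and it assigns nonzero probability to every consistent assignment.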

Unfortunately, the resulting distribution is not approximable. Machines with a halting oracle are equivalent to Schmidhuber's GTMs, which are not approximable for reasons exactly analogous to the halting problem.

What Benja effectively did was say: let's use a mixture of all approximable probability distributions! This includes any reasonable probability distribution over logic we might ever come up with (if "reasonable" includes approximability). Updating on logic as a constraint is then straightforward. Sadly, this mixture distribution is not itself approximable.

In the next post in this series, I plan to describe ways we considered to try and get around this.
