An Untrollable Mathematician
post by Abram Demski 542 days ago | Alex Appel, Sam Eisenstat, Vanessa Kosoy, Jack Gallagher, Jessica Taylor, Paul Christiano, Scott Garrabrant and Vladimir Slepnev like this | 1 comment

Follow-up to All Mathematicians are Trollable.

It is relatively easy to see that no computable Bayesian prior on logic can converge to a single coherent probability distribution as we update it on logical statements. Furthermore, the non-convergence behavior is about as bad as could be: someone selecting the ordering of provable statements to update on can drive the Bayesian’s beliefs arbitrarily up or down, arbitrarily many times, despite only saying true things. I called this wild non-convergence behavior “trollability”. Previously, I showed that if the Bayesian updates on the provability of a sentence rather than updating on the sentence itself, it is still trollable. I left open the question of whether some other side information could save us. Sam Eisenstat has closed this question, providing a simple logical prior and a way of doing a Bayesian update on it which (1) cannot be trolled, and (2) converges to a coherent distribution.

Major parts of this post were cribbed from Sam’s notes, with some modifications.

# Construction

Set some prior on sampling individual sentences, $$\mu(\phi)$$, which gives positive probability to every sentence. Now sample an infinite sequence of sentences, $$\phi_1, \phi_2, ...$$ by at each step rejection-sampling from $$\mu$$ with the requirement of propositional consistency with the sentences sampled so far. The prior probability of a sentence, $$\mathbb{P}(\psi)$$, is the probability that it appears anywhere in this sequence.
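To make the construction concrete, here is a toy Monte Carlo sketch (my own illustration, not from the post): sentences are encoded as the sets of worlds satisfying them over two propositional atoms, $$\mu$$ is uniform over a hypothetical six-sentence pool, and the infinite sequence is truncated at a finite length, so "appears anywhere in the sequence" is only approximated.

```python
import random

# Toy encoding (illustrative only): two atoms, worlds 0..3 (bit0 = A, bit1 = B).
# A sentence is represented by the set of worlds in which it is true.
WORLDS = frozenset(range(4))
SENTENCES = {
    "A": frozenset({1, 3}), "~A": frozenset({0, 2}),
    "B": frozenset({2, 3}), "~B": frozenset({0, 1}),
    "A|B": frozenset({1, 2, 3}), "A&B": frozenset({3}),
}
NAMES = list(SENTENCES)

def sample_sequence(rng, length=40):
    """Rejection-sample a sequence of sentences, each propositionally
    consistent with those sampled so far (a shared satisfying world exists)."""
    live, seq = WORLDS, []
    while len(seq) < length:
        name = rng.choice(NAMES)          # mu = uniform, for simplicity
        if live & SENTENCES[name]:        # consistent with the prefix?
            seq.append(name)
            live &= SENTENCES[name]
    return seq

def prior(name, trials=5000, seed=0):
    """Monte Carlo estimate of P(psi): the chance that psi appears
    anywhere in the (truncated) sequence."""
    rng = random.Random(seed)
    return sum(name in sample_sequence(rng) for _ in range(trials)) / trials
```

With this pool, the estimate for $$A \lor B$$ comes out higher than for $$A \land B$$, since $$A \land B$$ is ruled out by more potential prefixes, and both estimates sit strictly between 0 and 1.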

This is nothing new – it’s just the old Demski prior with propositional consistency only. However, Sam’s idea is to interpret the sequence of sentences as the sequence of things Nature tells us / the sequence of things which are proved. (Why didn’t I think of that?) Thus, in addition to the raw probability distribution over sentences, we use the probability distribution over sequence locations. Start with $$n=1$$. When a sentence $$\psi$$ is proved (or otherwise observed), we perform a Bayesian update on $$\psi = \phi_n$$, and increment $$n$$.

This distribution can be computed to any desired accuracy $$\epsilon$$ in finite time. The probability of a sequence prefix $$\phi_1, \phi_2, ... \phi_n$$ can be computed to within $$\epsilon$$ by computing the normalization factor for $$\mu$$ at each step in the sequence (enumerating sentences and checking which are propositionally consistent with the sequence so far) to sufficient accuracy to guarantee the accuracy of the whole. The joint probability of any finite set of ordering assertions $$\psi = \phi_n$$ and unordered sentence assertions $$\psi$$ can be computed to within $$\epsilon$$ by enumerating the sequences of sentence selections by which these assertions can become jointly true and by which they can become false, calculating their probabilities to increasing accuracy until the combined probability of the cases accounted for is known to exceed $$1 - \epsilon$$.
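The first of these computations can be made exact in a finite toy model (my own illustration, with a hypothetical uniform $$\mu$$ over six sentences encoded as sets of satisfying worlds): the probability of a prefix is the product over steps of $$\mu(\phi_i)$$ divided by the normalization factor, the total $$\mu$$-mass of sentences consistent with the sequence so far.

```python
from fractions import Fraction

# Toy encoding (illustrative only): two atoms, worlds 0..3 (bit0 = A, bit1 = B);
# a sentence is the set of worlds in which it is true.
SENTENCES = {
    "A": frozenset({1, 3}), "~A": frozenset({0, 2}),
    "B": frozenset({2, 3}), "~B": frozenset({0, 1}),
    "A|B": frozenset({1, 2, 3}), "A&B": frozenset({3}),
}
MU = {name: Fraction(1, 6) for name in SENTENCES}  # uniform toy mu

def prefix_probability(prefix):
    """Exact probability that rejection sampling yields this prefix:
    the product of mu(phi_i) / Z_i, where Z_i is the mu-mass of the
    sentences propositionally consistent with the earlier choices."""
    live = frozenset(range(4))
    p = Fraction(1)
    for name in prefix:
        if not live & SENTENCES[name]:
            return Fraction(0)            # inconsistent prefix: impossible
        z = sum(MU[s] for s in SENTENCES if live & SENTENCES[s])
        p *= MU[name] / z
        live &= SENTENCES[name]
    return p
```

For instance, `prefix_probability(["A", "B"])` works out to $$\frac{1}{6} \cdot \frac{1/6}{5/6} = \frac{1}{30}$$: after $$A$$ is announced, $$\neg A$$ drops out of the pool, leaving normalization mass $$5/6$$.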

# Untrollability

So long as the probability of a sentence $$\psi$$ is not yet 1 or 0, it is at least $$\mu(\psi)$$, since $$\psi$$ could be sampled next. Similarly, it is at most $$1 - \mu(\neg \psi)$$, since $$\neg \psi$$ could be sampled next. Hence, it is not possible to drive the probability arbitrarily close to 1 or to 0, no matter what order we prove things in.
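This bound can be probed empirically. In the following sketch (my own toy model, not from the post, with $$\mu$$ uniform over six sentences encoded as sets of satisfying worlds), a troll announces true sentences that never decide $$A$$, in every order, and the estimated posterior stays inside $$[\mu(A),\, 1-\mu(\neg A)]$$.

```python
import random
from itertools import permutations

# Toy encoding (illustrative only): two atoms, worlds 0..3 (bit0 = A, bit1 = B);
# a sentence is the set of worlds in which it is true; mu is uniform (1/6 each).
S = {
    "A": frozenset({1, 3}), "~A": frozenset({0, 2}),
    "B": frozenset({2, 3}), "~B": frozenset({0, 1}),
    "A|B": frozenset({1, 2, 3}), "A&B": frozenset({3}),
}
NAMES, MU = list(S), 1 / 6

def posterior(psi, announced, rng, trials=4000, length=40):
    """Estimate P(psi appears | Nature announced `announced` first).
    Sampling is Markov in the set of live worlds, so conditioning on a
    prefix just means continuing the chain from that prefix."""
    live0 = frozenset(range(4))
    for s in announced:
        live0 &= S[s]
    hits = 0
    for _ in range(trials):
        live, found = live0, False
        for _ in range(length):           # count accepted samples only
            c = rng.choice(NAMES)
            while not live & S[c]:        # rejection sampling
                c = rng.choice(NAMES)
            if c == psi:
                found = True
                break
            live &= S[c]
        hits += found
    return hits / trials

rng = random.Random(0)
# The troll announces truths (of the world where A and B both hold) that
# never decide A, in every order; the posterior stays in [mu(A), 1 - mu(~A)].
for order in permutations(["B", "A|B"]):
    for k in range(len(order) + 1):
        p = posterior("A", list(order[:k]), rng)
        assert MU <= p <= 1 - MU, (order[:k], p)
```

Since the continuation depends only on the live-world set, both orders of the same announcements give the same estimate up to sampling noise, which also previews the order-independence noted under Convergence.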

# Convergence

Note that $$\mathbb{P}$$ expects nature to eventually decide every proposition, in which case convergence to a single distribution is trivial; beliefs converge to 0 or 1 on every proposition. However, the posterior probabilities also converge to a single distribution even if some sentences are never updated on – as is the case when we restrict ourselves to updating on provable sentences.

To see this, take some sentence $$\psi$$ and some $$\epsilon>0$$. We want to show that for all sufficiently large $$n$$, we have $$| \mathbb{P}(\psi | \phi_1, \phi_2, ... \phi_n) - \mathbb{P}(\psi | \phi_1, \phi_2, ... \phi_m)| < \epsilon$$ for all $$m>n$$. If $$\psi$$ is eventually decided, this is trivially true. Otherwise, let $$\Phi$$ be a large finite collection of sentences which will never be decided by the environment, chosen so that the probability of sampling a sentence that will never be decided but that is outside of $$\Phi$$ before sampling $$\psi$$ or $$\neg \psi$$ (as we continue adding to the sequence by sampling) is less than $$\epsilon /4$$, no matter what other sentences have been decided already. Then, pick $$N$$ large enough that (1) the sentences announced after time $$N$$ do not announce any new logical relations between the sentences in $$\Phi$$; and (2) after time $$N$$, the probability of deciding any new sentence not in $$\Phi$$ before deciding $$\psi$$ is less than $$\epsilon / 2$$. (1) is possible since there are only $$2^{|\Phi|}$$ joint assignments of truth values to sentences in $$\Phi$$, so after finite time, all joint assignments which will ever be ruled out have already been ruled out. (2) is possible since the probability of sampling a sentence that will never be decided but that is outside of $$\Phi$$ is already small enough, and the probability of deciding the remaining sentences outside of $$\Phi$$ (those the environment will eventually decide) only goes down as more sentences get decided, approaching zero, so it is small enough for some $$N$$.

So, for $$m>N$$, the probability that $$\psi$$ is decided before any sentence outside of $$\Phi$$ is at least $$1-\epsilon / 2$$, ensuring that any dependence of the posterior on the announcements after time $$N$$ is less than $$\epsilon$$.

Furthermore, this argument makes it clear that the probability distribution we converge to depends only on the set of sentences which the environment will eventually assert, not on their ordering! For any ordering of assertions, we can find an $$N$$ as specified, and the joint probability distribution on $$\Phi$$ (and $$\psi$$) will be the same to within $$\epsilon$$.

We can also see that if the environment eventually asserts every provable sentence of $$PA$$, then the limit which we converge to must be a distribution on completions of $$PA$$: if $$PA \vdash \psi$$, then there is some $$n$$ such that $$\phi_n=\psi$$, so the posterior probability is $$1$$ then and beyond. Similarly, although we only require propositional coherence of $$\mathbb{P}$$, the limit will be fully coherent so long as the environment eventually asserts all logical truths (whatever else it may assert).

# Comments

by Sam Eisenstat 542 days ago | Abram Demski likes this

I at first didn’t understand your argument for claim (2), so I wrote an alternate proof that’s a bit more obvious/careful. I now see why it works, but I’ll give my version below for anyone interested. In any case, what you really mean is the probability of deciding a sentence outside of $$\Phi$$ by having it announced by nature; there may be a high probability of sentences being decided indirectly via sentences in $$\Phi$$.

Instead of choosing $$\Phi$$ as you describe, pick $$\Phi$$ so that the probability $$\mu(\Phi)$$ of sampling something in $$\Phi$$ is greater than $$1 - \mu(\psi) \cdot \varepsilon / 2$$. Then, the probability of sampling something in $$\Phi - \{\psi\}$$ is at least $$1 - \mu(\psi) \cdot (1 + \varepsilon / 2)$$. Hence, no matter what sentences have been decided already, the probability that repeatedly sampling from $$\mu$$ selects $$\psi$$ before it selects any sentence outside of $$\Phi$$ is at least

\begin{align*} \sum_{k = 0}^\infty (1 - \mu(\psi) \cdot (1 + \varepsilon / 2))^k \cdot \mu(\psi) & = \frac{\mu(\psi)}{\mu(\psi) \cdot (1 + \varepsilon / 2)} \\ & > 1 - \varepsilon / 2 \end{align*}

as desired.

> Furthermore, this argument makes it clear that the probability distribution we converge to depends only on the set of sentences which the environment will eventually assert, not on their ordering!

Oh, I didn’t notice that aspect of things. That’s pretty cool.
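Sam's geometric-series step can be sanity-checked numerically; a minimal sketch, with a finite partial sum standing in for the infinite series and arbitrary example values for $$\mu(\psi)$$ and $$\varepsilon$$:

```python
def select_psi_first_bound(mu_psi, eps, terms=200_000):
    """Partial sum of sum_{k>=0} (1 - mu(psi)*(1 + eps/2))^k * mu(psi),
    the comment's lower bound on sampling psi before leaving Phi."""
    r = 1 - mu_psi * (1 + eps / 2)
    return sum(r ** k * mu_psi for k in range(terms))

mu_psi, eps = 0.01, 0.1               # arbitrary example values
bound = select_psi_first_bound(mu_psi, eps)
# Closed form: mu(psi) / (mu(psi)*(1 + eps/2)) = 1 / (1 + eps/2) > 1 - eps/2.
```

With these values the partial sum agrees with the closed form $$1/(1+\varepsilon/2)$$ to high precision, and indeed exceeds $$1 - \varepsilon/2$$.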
