An Untrollable Mathematician
post by Abram Demski 25 days ago | Alex Appel, Sam Eisenstat, Paul Christiano, Scott Garrabrant and Vladimir Slepnev like this | 1 comment

Follow-up to All Mathematicians are Trollable.

It is relatively easy to see that no computable Bayesian prior on logic can converge to a single coherent probability distribution as we update it on logical statements. Furthermore, the non-convergence behavior is about as bad as could be: someone selecting the ordering of provable statements to update on can drive the Bayesian’s beliefs arbitrarily up or down, arbitrarily many times, despite only saying true things. I called this wild non-convergence behavior “trollability”. Previously, I showed that if the Bayesian updates on the provabilily of a sentence rather than updating on the sentence itself, it is still trollable. I left open the question of whether some other side information could save us. Sam Eisenstat has closed this question, providing a simple logical prior and a way of doing a Bayesian update on it which (1) cannot be trolled, and (2) converges to a coherent distribution.

Major parts of this post were cribbed from Sam’s notes, with some modifications.

# Construction

Set some prior on sampling individual sentences, $$\mu(\phi)$$, which gives positive probability to every sentence. Now sample an infinite sequence of sentences, $$\phi_1, \phi_2, ...$$ by at each step rejection-sampling from $$\mu$$ with the requirement of propositional consistency with the sentences sampled so far. The prior probability of a sentence, $$\mathbb{P}(\psi)$$, is the probability that it appears anywhere in this sequence.

This is nothing new – it’s just the old Demski prior with propositional consistency only. However, Sam’s idea is to interpret the sequence of sentences as the sequence of things Nature tells us / the sequence of things which is proved. (Why didn’t I think of that?) Thus, in addition to the raw probability distribution over sentences, we use the probability distribution over sequence locations. Start with $$n=1$$. When a sentence $$\psi$$ is proved (or otherwise observed), we perform a Bayesian update on $$\psi = \phi_n$$, and increment $$n$$.

This distribution can be computed to any desired accuracy $$\epsilon$$ in finite time. The probability of a sequence prefix $$\phi_1, \phi_2, ... \phi_n$$ can be computed to within $$\epsilon$$ by computing the normalization factor for $$\mu$$ at each step in the sequence to within sufficient accuracy to ensure the accuracy of the whole, by enumerating sentences and checking which are propositionally consistent with the sequence so far. The joint probability of any finite set of ordering assertions $$\psi = \phi_n$$ and unordered sentence assertions $$\psi$$ can be computed to within $$\epsilon$$ by enumerating the sequences of sentence selections by which these can become jointly true and by which they can become false, and calculating their probabilities with increasing accuracy, until the sum of the probability of ways they can become true and ways they can become false is less than epsilon with certainty.

# Untrollability

So long as the probability of a sentence $$\psi$$ is not yet 1 or 0, it has probability at least $$\mu(\psi)$$, since it could be sampled next. Similarly, its probability probability is at most $$\mu(\neg \psi)$$. Hence, it is not possible to drive the probability arbitrarily up or arbitrarily down, no matter what order we prove things in.

# Convergence

Note that $$\mathbb{P}$$ expects nature to eventually decide every proposition, in which case convergence to a single distribution is trivial; beliefs converge to 0 or 1 on every proposition. However, the posterior probabilities also converge to a single distribution even if some sentences are never updated on – as is the case when we restrict ourselves to updating on provable sentences.

To see this, take some sentence $$\phi$$ and some $$\epsilon>0$$. We want to show that for all sufficiently large $$n$$, we have $$| \mathbb{P}(\psi | \phi_1, \phi_2, ... \phi_n) - \mathbb{P}(\psi | \phi_1, \phi_2, ... \phi_m)| < \epsilon$$ for all $$m>n$$. If $$\psi$$ is eventually decided, this is trivially true. Otherwise, let $$Phi$$ be a large finite collection of sentences which will never be decided by the environment, chosen so that the probability of sampling a sentence that will never be decided but that is outside of $$\Phi$$ before sampling $$psi$$ or $$\neg \psi$$ (as we continue adding to the sequence by sampling) is less than $$\epsilon /4$$ no matter what other sentences have been decided already. Then, pick $$N$$ large enough that (1) after time $$N$$, the sentences announced after time $$N$$ do not announce any new logical relations between the sentences in $$\Phi$$; and, (2) the probability of deciding any new sentence not in $$\Phi$$ before deciding $$\psi$$ is less than $$\epsilon / 2$$. (1) is possible since there are only $$2^{|\Phi|}$$ joint assignments of truth values to sentences in $$\Phi$$, so after finite time all joint assignments which will ever be ruled out have already been. (2) is possible since the probability of sampling a sentence that will never be decided but that is outside of $$\Phi$$ is already small enough, and the probability of deciding all the rest of the sentences outside of $$\Phi$$ only goes down as more sentences get decided, approaching zero, so that it is small enough for some $$N$$.

So, for $$m>N$$, the probability that $$\psi$$ is decided before any sentences outside of $$\Phi$$ is $$1-(\epsilon / 2)$$, ensuring that any dependence is less than $$\epsilon$$.

Furthermore, this argument makes it clear that the probability distribution we converge to depends only on the set of sentences which the environment will eventually assert, not on their ordering! For any ordering of assertions, we can find an $$N$$ as specified, and the joint probability distribution on $$\Phi$$ (and $$\psi$$) will be the same to within epsilon.

We can also see that if the environment eventually asserts every provable sentence of $$PA$$, then the limit which we converge to must be a distribution on completions of $$PA$$: if $$PA \vdash \psi$$, then there is some $$n$$ such that $$\phi_n=\psi$$, so the posterior probability is $$1$$ then and beyond. Similarly, although we only require propositional coherence of $$\mathbb{P}$$, the limit will be fully coherent so long as the environment eventually asserts all logical truths (whatever else it may assert).

 by Sam Eisenstat 25 days ago | Abram Demski likes this | link I at first didn’t understand your argument for claim (2), so I wrote an alternate proof that’s a bit more obvious/careful. I now see why it works, but I’ll give my version below for anyone interested. In any case, what you really mean is the probability of deciding a sentence outside of $$\Phi$$ by having it announced by nature; there may be a high probability of sentences being decided indirectly via sentences in $$\Phi$$. Instead of choosing $$\Phi$$ as you describe, pick $$\Phi$$ so that the probability $$\mu(\Phi)$$ of sampling something in $$\Phi$$ is greater than $$1 - \mu(\psi) \cdot \varepsilon / 2$$. Then, the probability of sampling something in $$\Phi - \{\psi\}$$ is at least $$1 - \mu(\psi) \cdot (1 + \varepsilon / 2)$$. Hence, no matter what sentences have been decided already, the probability that repeatedly sampling from $$\mu$$ selects $$\psi$$ before it selects any sentence outside of $$\Phi$$ is at least \begin{align*} \sum_{k = 0}^\infty (1 - \mu(\psi) \cdot (1 + \varepsilon / 2))^k \cdot \mu(\psi) & = \frac{\mu(\psi)}{\mu(\psi) \cdot (1 + \varepsilon / 2)} \\ & > 1 - \varepsilon / 2 \end{align*} as desired. Furthermore, this argument makes it clear that the probability distribution we converge to depends only on the set of sentences which the environment will eventually assert, not on their ordering! Oh, I didn’t notice that aspect of things. That’s pretty cool. reply

### NEW DISCUSSION POSTS

[Delegative Reinforcement
 by Vadim Kosoy on Stable Pointers to Value II: Environmental Goals | 1 like

Intermediate update: The
 by Alex Appel on Further Progress on a Bayesian Version of Logical ... | 0 likes

Since Briggs [1] shows that
 by 258 on In memoryless Cartesian environments, every UDT po... | 2 likes

This doesn't quite work. The
 by Nisan Stiennon on Logical counterfactuals and differential privacy | 0 likes

I at first didn't understand
 by Sam Eisenstat on An Untrollable Mathematician | 1 like

This is somewhat related to
 by Vadim Kosoy on The set of Logical Inductors is not Convex | 0 likes

This uses logical inductors
 by Abram Demski on The set of Logical Inductors is not Convex | 0 likes

Nice writeup. Is one-boxing
 by Tom Everitt on Smoking Lesion Steelman II | 0 likes

Hi Alex! The definition of
 by Vadim Kosoy on Delegative Inverse Reinforcement Learning | 0 likes

A summary that might be
 by Alex Appel on Delegative Inverse Reinforcement Learning | 1 like

I don't believe that
 by Alex Appel on Delegative Inverse Reinforcement Learning | 0 likes

This is exactly the sort of
 by Stuart Armstrong on Being legible to other agents by committing to usi... | 0 likes

When considering an embedder
 by Jack Gallagher on Where does ADT Go Wrong? | 0 likes

The differences between this
 by Abram Demski on Policy Selection Solves Most Problems | 1 like

Looking "at the very
 by Abram Demski on Policy Selection Solves Most Problems | 0 likes