Intelligent Agent Foundations Forumsign up / log in
An Untrollable Mathematician
post by Abram Demski 124 days ago | Alex Appel, Sam Eisenstat, Vadim Kosoy, Jack Gallagher, Jessica Taylor, Paul Christiano, Scott Garrabrant and Vladimir Slepnev like this | 1 comment

Follow-up to All Mathematicians are Trollable.

It is relatively easy to see that no computable Bayesian prior on logic can converge to a single coherent probability distribution as we update it on logical statements. Furthermore, the non-convergence behavior is about as bad as could be: someone selecting the ordering of provable statements to update on can drive the Bayesian’s beliefs arbitrarily up or down, arbitrarily many times, despite only saying true things. I called this wild non-convergence behavior “trollability”. Previously, I showed that if the Bayesian updates on the provabilily of a sentence rather than updating on the sentence itself, it is still trollable. I left open the question of whether some other side information could save us. Sam Eisenstat has closed this question, providing a simple logical prior and a way of doing a Bayesian update on it which (1) cannot be trolled, and (2) converges to a coherent distribution.

Major parts of this post were cribbed from Sam’s notes, with some modifications.


Set some prior on sampling individual sentences, \(\mu(\phi)\), which gives positive probability to every sentence. Now sample an infinite sequence of sentences, \(\phi_1, \phi_2, ...\) by at each step rejection-sampling from \(\mu\) with the requirement of propositional consistency with the sentences sampled so far. The prior probability of a sentence, \(\mathbb{P}(\psi)\), is the probability that it appears anywhere in this sequence.

This is nothing new – it’s just the old Demski prior with propositional consistency only. However, Sam’s idea is to interpret the sequence of sentences as the sequence of things Nature tells us / the sequence of things which is proved. (Why didn’t I think of that?) Thus, in addition to the raw probability distribution over sentences, we use the probability distribution over sequence locations. Start with \(n=1\). When a sentence \(\psi\) is proved (or otherwise observed), we perform a Bayesian update on \(\psi = \phi_n\), and increment \(n\).

This distribution can be computed to any desired accuracy \(\epsilon\) in finite time. The probability of a sequence prefix \(\phi_1, \phi_2, ... \phi_n\) can be computed to within \(\epsilon\) by computing the normalization factor for \(\mu\) at each step in the sequence to within sufficient accuracy to ensure the accuracy of the whole, by enumerating sentences and checking which are propositionally consistent with the sequence so far. The joint probability of any finite set of ordering assertions \(\psi = \phi_n\) and unordered sentence assertions \(\psi\) can be computed to within \(\epsilon\) by enumerating the sequences of sentence selections by which these can become jointly true and by which they can become false, and calculating their probabilities with increasing accuracy, until the sum of the probability of ways they can become true and ways they can become false is less than epsilon with certainty.


So long as the probability of a sentence \(\psi\) is not yet 1 or 0, it has probability at least \(\mu(\psi)\), since it could be sampled next. Similarly, its probability probability is at most \(\mu(\neg \psi)\). Hence, it is not possible to drive the probability arbitrarily up or arbitrarily down, no matter what order we prove things in.


Note that \(\mathbb{P}\) expects nature to eventually decide every proposition, in which case convergence to a single distribution is trivial; beliefs converge to 0 or 1 on every proposition. However, the posterior probabilities also converge to a single distribution even if some sentences are never updated on – as is the case when we restrict ourselves to updating on provable sentences.

To see this, take some sentence \(\phi\) and some \(\epsilon>0\). We want to show that for all sufficiently large \(n\), we have \(| \mathbb{P}(\psi | \phi_1, \phi_2, ... \phi_n) - \mathbb{P}(\psi | \phi_1, \phi_2, ... \phi_m)| < \epsilon\) for all \(m>n\). If \(\psi\) is eventually decided, this is trivially true. Otherwise, let \(Phi\) be a large finite collection of sentences which will never be decided by the environment, chosen so that the probability of sampling a sentence that will never be decided but that is outside of \(\Phi\) before sampling \(psi\) or \(\neg \psi\) (as we continue adding to the sequence by sampling) is less than \(\epsilon /4\) no matter what other sentences have been decided already. Then, pick \(N\) large enough that (1) after time \(N\), the sentences announced after time \(N\) do not announce any new logical relations between the sentences in \(\Phi\); and, (2) the probability of deciding any new sentence not in \(\Phi\) before deciding \(\psi\) is less than \(\epsilon / 2\). (1) is possible since there are only \(2^{|\Phi|}\) joint assignments of truth values to sentences in \(\Phi\), so after finite time all joint assignments which will ever be ruled out have already been. (2) is possible since the probability of sampling a sentence that will never be decided but that is outside of \(\Phi\) is already small enough, and the probability of deciding all the rest of the sentences outside of \(\Phi\) only goes down as more sentences get decided, approaching zero, so that it is small enough for some \(N\).

So, for \(m>N\), the probability that \(\psi\) is decided before any sentences outside of \(\Phi\) is \(1-(\epsilon / 2)\), ensuring that any dependence is less than \(\epsilon\).

Furthermore, this argument makes it clear that the probability distribution we converge to depends only on the set of sentences which the environment will eventually assert, not on their ordering! For any ordering of assertions, we can find an \(N\) as specified, and the joint probability distribution on \(\Phi\) (and \(\psi\)) will be the same to within epsilon.

We can also see that if the environment eventually asserts every provable sentence of \(PA\), then the limit which we converge to must be a distribution on completions of \(PA\): if \(PA \vdash \psi\), then there is some \(n\) such that \(\phi_n=\psi\), so the posterior probability is \(1\) then and beyond. Similarly, although we only require propositional coherence of \(\mathbb{P}\), the limit will be fully coherent so long as the environment eventually asserts all logical truths (whatever else it may assert).

by Sam Eisenstat 123 days ago | Abram Demski likes this | link

I at first didn’t understand your argument for claim (2), so I wrote an alternate proof that’s a bit more obvious/careful. I now see why it works, but I’ll give my version below for anyone interested. In any case, what you really mean is the probability of deciding a sentence outside of \(\Phi\) by having it announced by nature; there may be a high probability of sentences being decided indirectly via sentences in \(\Phi\).

Instead of choosing \(\Phi\) as you describe, pick \(\Phi\) so that the probability \(\mu(\Phi)\) of sampling something in \(\Phi\) is greater than \(1 - \mu(\psi) \cdot \varepsilon / 2\). Then, the probability of sampling something in \(\Phi - \{\psi\}\) is at least \(1 - \mu(\psi) \cdot (1 + \varepsilon / 2)\). Hence, no matter what sentences have been decided already, the probability that repeatedly sampling from \(\mu\) selects \(\psi\) before it selects any sentence outside of \(\Phi\) is at least

\begin{align*} \sum_{k = 0}^\infty (1 - \mu(\psi) \cdot (1 + \varepsilon / 2))^k \cdot \mu(\psi) & = \frac{\mu(\psi)}{\mu(\psi) \cdot (1 + \varepsilon / 2)} \\ & > 1 - \varepsilon / 2 \end{align*}

as desired.

Furthermore, this argument makes it clear that the probability distribution we converge to depends only on the set of sentences which the environment will eventually assert, not on their ordering!

Oh, I didn’t notice that aspect of things. That’s pretty cool.






Note: I currently think that
by Jessica Taylor on Predicting HCH using expert advice | 0 likes

Counterfactual mugging
by Jessica Taylor on Doubts about Updatelessness | 0 likes

What do you mean by "in full
by David Krueger on Doubts about Updatelessness | 0 likes

It seems relatively plausible
by Paul Christiano on Maximally efficient agents will probably have an a... | 1 like

I think that in that case,
by Alex Appel on Smoking Lesion Steelman | 1 like

Two minor comments. First,
by Sam Eisenstat on No Constant Distribution Can be a Logical Inductor | 1 like

A: While that is a really
by Alex Appel on Musings on Exploration | 0 likes

> The true reason to do
by Jessica Taylor on Musings on Exploration | 0 likes

A few comments. Traps are
by Vadim Kosoy on Musings on Exploration | 1 like

I'm not convinced exploration
by Abram Demski on Musings on Exploration | 0 likes

Update: This isn't really an
by Alex Appel on A Difficulty With Density-Zero Exploration | 0 likes

If you drop the
by Alex Appel on Distributed Cooperation | 1 like

Cool! I'm happy to see this
by Abram Demski on Distributed Cooperation | 0 likes

Caveat: The version of EDT
by 258 on In memoryless Cartesian environments, every UDT po... | 2 likes

[Delegative Reinforcement
by Vadim Kosoy on Stable Pointers to Value II: Environmental Goals | 1 like


Privacy & Terms