Intelligent Agent Foundations Forumsign up / log in
The Two-Update Problem, Part 1
post by Abram Demski 822 days ago | Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this | discuss

This is a problem which Scott Garrabrant and I were thinking about quite a bit last fall, but moved on from as other problems in logical uncertainty captured our interest. As far as I know, it remains unsolved!

Scott and I were working on the problem of probabilistic priors, or as we’ve started calling them now, random extensions. Suppose you have a set (possibly empty) of axioms in a logic, and you want a “reasonable” probability distribution over the rest of the sentences, which are neither proved nor disproved by your set of sentences. This probability assignment is required to be coherent in the sense given by Paul Christiano or me: they “respect the logic”, namely (using Paul’s version):

  • For any \(\phi\) and \(\psi\), \(\mathbb{B}(\phi) = \mathbb{B}(\phi \wedge \psi)+\mathbb{B}(\phi \wedge \neg \psi)\).
  • For any tautology \(\phi\), \(\mathbb{B}(\phi)=1\).
  • For any contradiction \(\phi\), \(\mathbb{B}(\phi)=0\).

Paul proves that this is equivalent to requiring \(\mathbb{P}\) to be derived from a probability distribution on models (so that the probability of a sentence is the probability that the sentence is true).

Denote the random extension starting with theory \(T\) as \(\mathbb{P}_T (\phi)\). We additionally require that for \(\phi \in T\), \(\mathbb{B}(\phi)=1\).

We also wanted these probability distributions to be approximable, among other desirable properties.

The two-update problem began as the observation that, for my prior, adding a sentence \(\phi\) to the initial theory is quite different from performing a Bayesian update on \(\phi\). As our research progressed, we noticed that this seemed to be true of all the proposals we came up with. There were two distinct ways to “update” on a sentence: either do a Bayesian update to make its probability 1, or add it to the initial set of axioms which are true before the prior’s construction.

Stated as a formal problem, this may look easy to “solve”. For finite non-empty \(T\), we can simply re-define \(\mathbb{P}_T (\phi) = \mathbb{P}_{\emptyset}(\phi|T)\). (There is a complication if you try to do this for infinite \(T\), which is explained here. ) However, this seems to miss the point. Despite re-defining things in this way, if you look at the actual specification of the priors we were considering, a kind of non-bayesian update was quite visible. The non-bayesian update doesn’t go away if we take away the set of axioms (AKA the theory we’re extending into a probability distribution). Rather, it is visible in how we update on proofs.

As we approximate \(\mathbb{P}\) to increasing degree, we observe proofs of one thing or another and incorporate this into the distribution. This “incorporation” step was different depending on the prior \(\mathbb{P}\), but all of our options seemed to have it, and none of them looked like a Bayesian update. This “incorporation” step is a different way of understanding the meaning of the two-update problem, but harder to formalize exactly. It seems to us that this is really the same two-update problem, if you don’t make the ad-hoc patch mentioned above.

To put it simply: updating on an observed proof does not look like a Bayesian update. This seems potentially concerning.

Paul Christiano hit upon the same idea in his paradox of ignorance (Section 6.1.5 of the non-omniscience paper). He presented the following scenario:

Suppose two agents are trying to determine the truth of \(\forall x: \phi (x)\) for computable \(\phi\). These agents are both using the same definition of \(\mathbb{P}\), and the same set of axioms (we can suppose these are \(PA\)). However, one agent has a large amount of computing power with which to approximate \(\mathbb{P}\), and the other has relatively little. Both agents have been given a resource: a machine which computes \(\phi(x)\) for any value of \(x\). Unfortunately, this machine gets slower and slower the larger the value of \(x\) is asked for.

Suppose that \(\forall x: \phi (x)\) is actually true, but that this fact is not provable in \(PA\). Now, it naively seems that the agent with a large amount of computing power is at an advantage. However, it’s not quite so clear. What that computing power does is allow \(\mathbb{P}\) to be computed more accurately. In particular, the strong agent will have time to prove many more cases of \(\phi(x)\) than the weak agent. This means that the strong agent cannot update on those cases of \(\phi(x)\) when computed by the machine: the small examples already have probability 1, because the agent has proved them. To improve its knowledge, this agent must wait for large examples to slowly compute. The small agent, on the other hand, can look at many small values very quickly.

This is far from a formal proof of the paradox, and Paul suggests several possible ways out. It’s possible that the approximation of \(\mathcal{P}\) raises the probability of \(\forall x: \phi (x)\) just as much when it computes a new case of \(\phi (x)\) as it’s raised if we do a Bayesian update on its provability (observing the machine compute it). However, this is harder than it may initially sound.

Continued in part 2.





Unfortunately, it's not just
by Vadim Kosoy on Catastrophe Mitigation Using DRL | 0 likes

>We can solve the problem in
by Wei Dai on The Happy Dance Problem | 1 like

Maybe it's just my browser,
by Gordon Worley III on Catastrophe Mitigation Using DRL | 2 likes

At present, I think the main
by Abram Demski on Looking for Recommendations RE UDT vs. bounded com... | 0 likes

In the first round I'm
by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

Fine with it being shared
by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

I think the point I was
by Abram Demski on Predictable Exploration | 0 likes

(also x-posted from
by Sören Mindermann on The Three Levels of Goodhart's Curse | 0 likes

(x-posted from Arbital ==>
by Sören Mindermann on The Three Levels of Goodhart's Curse | 0 likes

>If the other players can see
by Stuart Armstrong on Predictable Exploration | 0 likes

Thinking about this more, I
by Abram Demski on Predictable Exploration | 0 likes

> So I wound up with
by Abram Demski on Predictable Exploration | 0 likes

Hm, I got the same result
by Alex Appel on Predictable Exploration | 1 like

Paul - how widely do you want
by David Krueger on Funding opportunity for AI alignment research | 0 likes

I agree, my intuition is that
by Abram Demski on Smoking Lesion Steelman III: Revenge of the Tickle... | 0 likes


Privacy & Terms