Intelligent Agent Foundations Forumsign up / log in
The Two-Update Problem, Part 1
post by Abram Demski 760 days ago | Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this | discuss

This is a problem which Scott Garrabrant and I were thinking about quite a bit last fall, but moved on from as other problems in logical uncertainty captured our interest. As far as I know, it remains unsolved!


Scott and I were working on the problem of probabilistic priors, or as we’ve started calling them now, random extensions. Suppose you have a set (possibly empty) of axioms in a logic, and you want a “reasonable” probability distribution over the rest of the sentences, which are neither proved nor disproved by your set of sentences. This probability assignment is required to be coherent in the sense given by Paul Christiano or me: they “respect the logic”, namely (using Paul’s version):

  • For any \(\phi\) and \(\psi\), \(\mathbb{B}(\phi) = \mathbb{B}(\phi \wedge \psi)+\mathbb{B}(\phi \wedge \neg \psi)\).
  • For any tautology \(\phi\), \(\mathbb{B}(\phi)=1\).
  • For any contradiction \(\phi\), \(\mathbb{B}(\phi)=0\).

Paul proves that this is equivalent to requiring \(\mathbb{P}\) to be derived from a probability distribution on models (so that the probability of a sentence is the probability that the sentence is true).

Denote the random extension starting with theory \(T\) as \(\mathbb{P}_T (\phi)\). We additionally require that for \(\phi \in T\), \(\mathbb{B}(\phi)=1\).

We also wanted these probability distributions to be approximable, among other desirable properties.

The two-update problem began as the observation that, for my prior, adding a sentence \(\phi\) to the initial theory is quite different from performing a Bayesian update on \(\phi\). As our research progressed, we noticed that this seemed to be true of all the proposals we came up with. There were two distinct ways to “update” on a sentence: either do a Bayesian update to make its probability 1, or add it to the initial set of axioms which are true before the prior’s construction.

Stated as a formal problem, this may look easy to “solve”. For finite non-empty \(T\), we can simply re-define \(\mathbb{P}_T (\phi) = \mathbb{P}_{\emptyset}(\phi|T)\). (There is a complication if you try to do this for infinite \(T\), which is explained here. ) However, this seems to miss the point. Despite re-defining things in this way, if you look at the actual specification of the priors we were considering, a kind of non-bayesian update was quite visible. The non-bayesian update doesn’t go away if we take away the set of axioms (AKA the theory we’re extending into a probability distribution). Rather, it is visible in how we update on proofs.

As we approximate \(\mathbb{P}\) to increasing degree, we observe proofs of one thing or another and incorporate this into the distribution. This “incorporation” step was different depending on the prior \(\mathbb{P}\), but all of our options seemed to have it, and none of them looked like a Bayesian update. This “incorporation” step is a different way of understanding the meaning of the two-update problem, but harder to formalize exactly. It seems to us that this is really the same two-update problem, if you don’t make the ad-hoc patch mentioned above.

To put it simply: updating on an observed proof does not look like a Bayesian update. This seems potentially concerning.

Paul Christiano hit upon the same idea in his paradox of ignorance (Section 6.1.5 of the non-omniscience paper). He presented the following scenario:

Suppose two agents are trying to determine the truth of \(\forall x: \phi (x)\) for computable \(\phi\). These agents are both using the same definition of \(\mathbb{P}\), and the same set of axioms (we can suppose these are \(PA\)). However, one agent has a large amount of computing power with which to approximate \(\mathbb{P}\), and the other has relatively little. Both agents have been given a resource: a machine which computes \(\phi(x)\) for any value of \(x\). Unfortunately, this machine gets slower and slower the larger the value of \(x\) is asked for.

Suppose that \(\forall x: \phi (x)\) is actually true, but that this fact is not provable in \(PA\). Now, it naively seems that the agent with a large amount of computing power is at an advantage. However, it’s not quite so clear. What that computing power does is allow \(\mathbb{P}\) to be computed more accurately. In particular, the strong agent will have time to prove many more cases of \(\phi(x)\) than the weak agent. This means that the strong agent cannot update on those cases of \(\phi(x)\) when computed by the machine: the small examples already have probability 1, because the agent has proved them. To improve its knowledge, this agent must wait for large examples to slowly compute. The small agent, on the other hand, can look at many small values very quickly.

This is far from a formal proof of the paradox, and Paul suggests several possible ways out. It’s possible that the approximation of \(\mathcal{P}\) raises the probability of \(\forall x: \phi (x)\) just as much when it computes a new case of \(\phi (x)\) as it’s raised if we do a Bayesian update on its provability (observing the machine compute it). However, this is harder than it may initially sound.

Continued in part 2.



NEW LINKS

NEW POSTS

NEW DISCUSSION POSTS

RECENT COMMENTS

Note that the problem with
by Vadim Kosoy on Open Problems Regarding Counterfactuals: An Introd... | 0 likes

Typos on page 5: *
by Vadim Kosoy on Open Problems Regarding Counterfactuals: An Introd... | 0 likes

Ah, you're right. So gain
by Abram Demski on Smoking Lesion Steelman | 0 likes

> Do you have ideas for how
by Jessica Taylor on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

I think I understand what
by Wei Dai on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

>You don’t have to solve
by Wei Dai on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

Your confusion is because you
by Vadim Kosoy on Delegative Inverse Reinforcement Learning | 0 likes

My confusion is the
by Tom Everitt on Delegative Inverse Reinforcement Learning | 0 likes

> First of all, it seems to
by Abram Demski on Smoking Lesion Steelman | 0 likes

> figure out what my values
by Vladimir Slepnev on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

I agree that selection bias
by Jessica Taylor on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

>It seems quite plausible
by Wei Dai on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

> defending against this type
by Paul Christiano on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

2. I think that we can avoid
by Paul Christiano on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

I hope you stay engaged with
by Wei Dai on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

RSS

Privacy & Terms