Intelligent Agent Foundations Forumsign up / log in
The Two-Update Problem, Part 1
post by Abram Demski 912 days ago | Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this | discuss

This is a problem which Scott Garrabrant and I were thinking about quite a bit last fall, but moved on from as other problems in logical uncertainty captured our interest. As far as I know, it remains unsolved!


Scott and I were working on the problem of probabilistic priors, or as we’ve started calling them now, random extensions. Suppose you have a set (possibly empty) of axioms in a logic, and you want a “reasonable” probability distribution over the rest of the sentences, which are neither proved nor disproved by your set of sentences. This probability assignment is required to be coherent in the sense given by Paul Christiano or me: they “respect the logic”, namely (using Paul’s version):

  • For any \(\phi\) and \(\psi\), \(\mathbb{B}(\phi) = \mathbb{B}(\phi \wedge \psi)+\mathbb{B}(\phi \wedge \neg \psi)\).
  • For any tautology \(\phi\), \(\mathbb{B}(\phi)=1\).
  • For any contradiction \(\phi\), \(\mathbb{B}(\phi)=0\).

Paul proves that this is equivalent to requiring \(\mathbb{P}\) to be derived from a probability distribution on models (so that the probability of a sentence is the probability that the sentence is true).

Denote the random extension starting with theory \(T\) as \(\mathbb{P}_T (\phi)\). We additionally require that for \(\phi \in T\), \(\mathbb{B}(\phi)=1\).

We also wanted these probability distributions to be approximable, among other desirable properties.

The two-update problem began as the observation that, for my prior, adding a sentence \(\phi\) to the initial theory is quite different from performing a Bayesian update on \(\phi\). As our research progressed, we noticed that this seemed to be true of all the proposals we came up with. There were two distinct ways to “update” on a sentence: either do a Bayesian update to make its probability 1, or add it to the initial set of axioms which are true before the prior’s construction.

Stated as a formal problem, this may look easy to “solve”. For finite non-empty \(T\), we can simply re-define \(\mathbb{P}_T (\phi) = \mathbb{P}_{\emptyset}(\phi|T)\). (There is a complication if you try to do this for infinite \(T\), which is explained here. ) However, this seems to miss the point. Despite re-defining things in this way, if you look at the actual specification of the priors we were considering, a kind of non-bayesian update was quite visible. The non-bayesian update doesn’t go away if we take away the set of axioms (AKA the theory we’re extending into a probability distribution). Rather, it is visible in how we update on proofs.

As we approximate \(\mathbb{P}\) to increasing degree, we observe proofs of one thing or another and incorporate this into the distribution. This “incorporation” step was different depending on the prior \(\mathbb{P}\), but all of our options seemed to have it, and none of them looked like a Bayesian update. This “incorporation” step is a different way of understanding the meaning of the two-update problem, but harder to formalize exactly. It seems to us that this is really the same two-update problem, if you don’t make the ad-hoc patch mentioned above.

To put it simply: updating on an observed proof does not look like a Bayesian update. This seems potentially concerning.

Paul Christiano hit upon the same idea in his paradox of ignorance (Section 6.1.5 of the non-omniscience paper). He presented the following scenario:

Suppose two agents are trying to determine the truth of \(\forall x: \phi (x)\) for computable \(\phi\). These agents are both using the same definition of \(\mathbb{P}\), and the same set of axioms (we can suppose these are \(PA\)). However, one agent has a large amount of computing power with which to approximate \(\mathbb{P}\), and the other has relatively little. Both agents have been given a resource: a machine which computes \(\phi(x)\) for any value of \(x\). Unfortunately, this machine gets slower and slower the larger the value of \(x\) is asked for.

Suppose that \(\forall x: \phi (x)\) is actually true, but that this fact is not provable in \(PA\). Now, it naively seems that the agent with a large amount of computing power is at an advantage. However, it’s not quite so clear. What that computing power does is allow \(\mathbb{P}\) to be computed more accurately. In particular, the strong agent will have time to prove many more cases of \(\phi(x)\) than the weak agent. This means that the strong agent cannot update on those cases of \(\phi(x)\) when computed by the machine: the small examples already have probability 1, because the agent has proved them. To improve its knowledge, this agent must wait for large examples to slowly compute. The small agent, on the other hand, can look at many small values very quickly.

This is far from a formal proof of the paradox, and Paul suggests several possible ways out. It’s possible that the approximation of \(\mathcal{P}\) raises the probability of \(\forall x: \phi (x)\) just as much when it computes a new case of \(\phi (x)\) as it’s raised if we do a Bayesian update on its provability (observing the machine compute it). However, this is harder than it may initially sound.

Continued in part 2.



NEW LINKS

NEW POSTS

NEW DISCUSSION POSTS

RECENT COMMENTS

[Delegative Reinforcement
by Vadim Kosoy on Stable Pointers to Value II: Environmental Goals | 1 like

Intermediate update: The
by Alex Appel on Further Progress on a Bayesian Version of Logical ... | 0 likes

Since Briggs [1] shows that
by 258 on In memoryless Cartesian environments, every UDT po... | 2 likes

This doesn't quite work. The
by Nisan Stiennon on Logical counterfactuals and differential privacy | 0 likes

I at first didn't understand
by Sam Eisenstat on An Untrollable Mathematician | 1 like

This is somewhat related to
by Vadim Kosoy on The set of Logical Inductors is not Convex | 0 likes

This uses logical inductors
by Abram Demski on The set of Logical Inductors is not Convex | 0 likes

Nice writeup. Is one-boxing
by Tom Everitt on Smoking Lesion Steelman II | 0 likes

Hi Alex! The definition of
by Vadim Kosoy on Delegative Inverse Reinforcement Learning | 0 likes

A summary that might be
by Alex Appel on Delegative Inverse Reinforcement Learning | 1 like

I don't believe that
by Alex Appel on Delegative Inverse Reinforcement Learning | 0 likes

This is exactly the sort of
by Stuart Armstrong on Being legible to other agents by committing to usi... | 0 likes

When considering an embedder
by Jack Gallagher on Where does ADT Go Wrong? | 0 likes

The differences between this
by Abram Demski on Policy Selection Solves Most Problems | 1 like

Looking "at the very
by Abram Demski on Policy Selection Solves Most Problems | 0 likes

RSS

Privacy & Terms