Intelligent Agent Foundations Forum
Agent Simulates Predictor using Second-Level Oracles
discussion post by Patrick LaVictoire

Prerequisite: Agents that can predict their Newcomb predictor

(This post is from Benja’s backlog of topics that should appear here, which he’s deputized me to write up. The original idea was due to Jessica Taylor.)

The phenomena of agent-simulates-predictor and agent-trusts-predictor, in which one party has stronger axioms and the other has stronger powers of deduction, can be exhibited at various points on the tradeoff between realistic examples and tractable proofs. In the previous post, I considered the version that pits a bounded proof search against a halting oracle. There is also a fully finite version, in which one process uses a bounded proof search with a much higher bound than the other. In this post, though, I'll discuss the clean infinite version, which pits a halting oracle against an oracle one level higher in the arithmetical hierarchy.

Here, we’ll use two different formal systems built on Peano Arithmetic.

A \(\Pi_1\) statement of PA is any statement of the form \(\forall n\, \phi(n)\), where \(\phi(\cdot)\) uses only bounded quantifiers. (For example, the claim that PA is consistent is a \(\Pi_1\) sentence, asserting that every natural number is not the Gödel number of a proof of a contradiction.)

\(\textsf{PA}+\Pi_1\) is PA plus, as additional axioms, every \(\Pi_1\) statement which is true in the standard model of PA (an axiom set which is not decidable). We can think of it as a formal system which can make a call in finite time to a standard halting oracle.
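As a hypothetical illustration of why truth of \(\Pi_1\) statements corresponds to a halting oracle: the sentence \(\forall n\, \phi(n)\) is false exactly when a brute-force search for a counterexample halts. The `counterexample_search` function and its bound below are artifacts of this sketch; the idealized search is unbounded.

```python
# Illustration only: a Pi_1 sentence "forall n, phi(n)" is false exactly
# when a search for a counterexample halts, so a halting oracle decides it.
# The `bound` parameter is an artifact of this sketch; the idealized
# search runs forever when the sentence is true.

def counterexample_search(phi, bound):
    """Return the least n < bound refuting phi, or None if none is found."""
    for n in range(bound):
        if not phi(n):
            return n  # the sentence is refuted by this witness
    return None       # no counterexample below the bound

# A true Pi_1-style claim: "forall n, n*n >= n" over the naturals.
assert counterexample_search(lambda n: n * n >= n, 10_000) is None
# A false one: "forall n, n < 5" has least counterexample n = 5.
assert counterexample_search(lambda n: n < 5, 10_000) == 5
```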

(Note that there’s a different meaning of \(\Pi_1\) for ZFC, involving the analogous definitions for set theory. In order to avoid that ambiguity, we could instead just use the formal system \(\textsf{PA+Sound(PA)}\), which is PA plus the axiom schema \(\forall \vec{n} \;\Box_{\textsf{PA}}\varphi(\vec{n}) \to \varphi(\vec{n})\) for every \(\varphi\). However, in that case we’d need to go through a proof that \(\textsf{PA+Sound(PA)}\) also knows that \(\textsf{PA}+\Pi_1\) is sound.)

We'll obtain exactly the same results as in the last post, so we'll prove the analogous lemma, showing that the stronger axiom system (here, ZFC) can prove that statements of a certain type hold if and only if they are provable by an oracle with weaker axioms but stronger computation (here, \(\textsf{PA}+\Pi_1\)).

Lemma: Let \(\varphi\) be a propositional combination of \(\Pi_1\) statements. Then \(\textsf{ZFC}\vdash \varphi \leftrightarrow \Box_{\textsf{PA}+\Pi_1}\varphi\).

Proof: First, we note that \(\textsf{ZFC}\vdash \Box_{\textsf{PA}+\Pi_1}\varphi \to \varphi\) for any \(\varphi\): ZFC can define the standard model of arithmetic and prove that it satisfies every axiom of \(\textsf{PA}+\Pi_1\), so it proves that \(\textsf{PA}+\Pi_1\) is sound.

Now we show that if \(\varphi\) is a propositional combination of \(\Pi_1\) statements, then \(\textsf{ZFC}\vdash \varphi \to \Box_{\textsf{PA}+\Pi_1}\varphi\). Express \(\varphi\) in disjunctive normal form; since proving any true disjunct proves \(\varphi\), it suffices to show this for \(\varphi\) that are conjunctions of \(\Pi_1\) statements and negations of \(\Pi_1\) statements. If such a \(\varphi\) holds, then all of its \(\Pi_1\) conjuncts hold, and these are therefore axioms of \(\textsf{PA}+\Pi_1\). Each negated \(\Pi_1\) conjunct is of the form \(\exists n\, \psi(n)\), where \(\psi\) has only bounded quantifiers; since it holds, there is a witness \(n\), and PA can verify the bounded statement \(\psi(n)\) by direct computation. Thus \(\textsf{PA}+\Pi_1\) proves every conjunct, and hence proves \(\varphi\). (The inductive argument for this reasoning can be carried out in ZFC.)
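To make the argument concrete, here is a small illustrative instance (my example, not from the original post): a conjunction of a true \(\Pi_1\) statement and the negation of a false one.

```latex
% Illustrative instance of the lemma's second direction.
\varphi \;\equiv\;
  \underbrace{\forall n\,(n \cdot n \ge n)}_{\pi_1 \text{ (true, hence an axiom of } \textsf{PA}+\Pi_1)}
  \;\wedge\;
  \neg \underbrace{\forall n\,(n < 5)}_{\pi_2 \text{ (false)}}
% Here \neg\pi_2 is equivalent to \exists n\,\neg(n < 5); the witness
% n = 5 is verified by a finite computation, which PA can carry out.
% So PA + \Pi_1 proves both conjuncts, and therefore proves \varphi.
```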

Now we can show the same two cases, with the same proofs as before:

Case 1: Agent Trusts Predictor

The agent uses modal UDT and a halting oracle for ZFC; Omega uses the second-level oracle for \(\textsf{PA}+\Pi_1\). Note that the statements “\(A()=1\)” and “\(A()=2\)” are each propositional combinations of \(\Pi_1\) statements about Gödel numbers.

Since we (from outside the problem) believe that ZFC is consistent, and we know by the Lemma that Omega will predict correctly, the agent will fail to prove that it can get $1M + $1K by two-boxing or by one-boxing, and will likewise fail to prove that it can get $1M by two-boxing. Then, using the Lemma, it can prove that if it one-boxes it will get $1M, and so it one-boxes.
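The agent's search order in this case can be sketched as follows. This is a toy rendering of the modal UDT decision procedure the post assumes: check outcomes from best to worst, and take the first action for which the oracle proves the corresponding implication. The `provable` callable is a hypothetical stand-in for the agent's halting oracle over ZFC, with the provability facts from Case 1 hard-coded.

```python
# Hedged sketch of the modal-UDT search assumed in the text: scan outcomes
# from best to worst, returning the first action a such that the oracle
# proves "A()=a -> U()=u". `provable` is a stand-in for the ZFC oracle.

def modal_udt(actions, outcomes_desc, provable):
    for u in outcomes_desc:               # best outcome first
        for a in actions:
            if provable(a, u):            # oracle: proves "A()=a -> U()=u"
                return a
    return actions[0]                     # default action if nothing provable

# Toy Newcomb instance matching Case 1: only "one-box -> $1M" is provable,
# so the searches for $1,001,000 (either action) and $1M-by-two-boxing fail.
provable = lambda a, u: (a, u) == ("one-box", 1_000_000)
action = modal_udt(["one-box", "two-box"],
                   [1_001_000, 1_000_000, 1_000], provable)
assert action == "one-box"
```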

Case 2: Agent Simulates Predictor

The agent now uses modal UDT as well as a second-level halting oracle for \(\textsf{PA}+\Pi_1\); Omega uses a halting oracle for ZFC.

Now, as before, the Lemma implies that Omega can prove that \(\Omega()=2 \to A()=2\), and that \(\Omega()=1\to A()=2\), so it proves \(A()=2\) and does not fill the box; then the agent two-boxes.
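Omega's side of this case can be rendered as a minimal toy (not the post's formalism): Omega predicts two-boxing exactly when it proves \(A()=2\) under each of its own possible outputs. The `provable` stand-in hard-codes the provability facts the Lemma supplies in Case 2.

```python
# Toy sketch of Omega's reasoning in Case 2. provable(x, a) stands for:
# Omega's oracle proves "Omega()=x -> A()=a". Omega predicts two-boxing
# iff it proves A()=2 under both of its possible outputs.

def omega_predicts_two_box(provable):
    return provable(1, 2) and provable(2, 2)

# Per the Lemma, both implications are provable here, so the box is empty.
provable = lambda omega_out, action: action == 2
assert omega_predicts_two_box(provable)
```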




