Agent Simulates Predictor using Second-Level Oracles discussion post by Patrick LaVictoire 1234 days ago | Jessica Taylor and Nate Soares like this | discuss Prerequisite: Agents that can predict their Newcomb predictor (This post is from Benja’s backlog of topics that should appear here, which he’s deputized me to write up. The original idea was due to Jessica Taylor.) The phenomena of agent-simulates-predictor or agent-trusts-predictor, where one has stronger axioms and the other has stronger powers of deduction, can be done at various levels of the tradeoff between realistic examples and tractable proofs. In the previous post, I considered the version that uses a bounded proof search on one hand, and a halting oracle on the other. There’s a finite version where one process uses a bounded proof search with a much higher bound than the other process. But in this post, I’ll discuss the clean infinite version where we pit a halting oracle against an oracle for the next level of the arithmetical hierarchy of Peano Arithmetic. Here, we’ll use two different formal systems built on Peano Arithmetic. A $$\Pi_1$$ statement of PA is any statement of the form $$\forall n\, \phi(n)$$, where $$\phi(\cdot)$$ uses only bounded quantifiers. (For example, the claim that PA is consistent is a $$\Pi_1$$ sentence, asserting that every natural number is not the Gödel number of a proof of a contradiction.) $$\textsf{PA}+\Pi_1$$ is PA plus the (non-decidable in PA) axioms consisting of every $$\Pi_1$$ statement which is true in the standard model of PA. We can think of it as a formal system which can make a call in finite time to a standard halting oracle. (Note that there’s a different meaning of $$\Pi_1$$ for ZFC, involving the analogous definitions for set theory. In order to avoid that ambiguity, we could instead just use the formal system $$\textsf{PA+Sound(PA)}$$, which is PA plus the axiom schema $$\forall \vec{n} \;\Box_{\textsf{PA}}\varphi(\vec{n}) \to \varphi(\vec{n})$$ for every $$\varphi$$. However, in that case we’d need to go through a proof that $$\textsf{PA+Sound(PA)}$$ also knows that $$\textsf{PA}+\Pi_1$$ is sound.) We’ll have the exact same results as in the last post, so we’ll prove the analogous lemma, showing that the stronger axiom system (here, ZFC) can prove that statements of a certain type hold if and only if they are provable by an oracle with weaker axioms but stronger computation (here, $$\textsf{PA}+\Pi_1$$) Lemma: Let $$\varphi$$ be a propositional combination of $$\Pi_1$$ statements. Then $$\textsf{ZFC}\vdash \varphi \leftrightarrow \Box_{\textsf{PA}+\Pi_1}\varphi$$. Proof: First, we note that $$\textsf{ZFC}\vdash \Box_{\textsf{PA}+\Pi_1}\varphi \to \varphi$$ for any $$\varphi$$, because ZFC knows about models of $$\textsf{PA}+\Pi_1$$, and thus knows it is sound. Now we show that if $$\varphi$$ is a propositional combination of $$\Pi_1$$ statements, then $$\textsf{ZFC}\vdash \varphi \to \Box_{\textsf{PA}+\Pi_1}\varphi$$. Express $$\varphi$$ in disjunctive normal form; it is thus clear that it suffices to show this for $$\varphi$$ that are conjunctions of $$\Pi_1$$ statements and negations of $$\Pi_1$$ statements. Now if such a $$\varphi$$ holds, then all of its $$\Pi_1$$ substatements must hold, and thus these are axioms of $$\textsf{PA}+\Pi_1$$; also, all of the negations of $$\Pi_1$$ statements are of the form $$\exists n\, \psi(n)$$ where $$\psi$$ has only bounded quantifiers, and if this holds, indeed there must be such an $$n$$ for each substatement, and thus $$\textsf{PA}+\Pi_1$$ will successfully prove $$\varphi$$. (The inductive argument for this reasoning can be carried out in ZFC.) Now we can show the same two cases, with the same proofs as before: Case 1: Agent Trusts Predictor The agent uses modal UDT and a halting oracle for ZFC; Omega uses the second-level oracle for $$\textsf{PA}+\Pi_1$$. Note that the statements “$$A()=1$$” and “$$A()=2$$” are each propositional combinations of $$\Pi_1$$ statements about Gödel numbers. Since we (from outside the problem) believe that ZFC is consistent and we know by the Lemma that Omega will predict correctly, the agent will fail to prove that it can get $1M +$1K by two-boxing or by one-boxing, and will fail to prove that it can get $1M by two-boxing. Then, using the Lemma, it can prove that if it one-boxes, it will get$1M, and so it does. Case 2: Agent Simulates Predictor The agent now uses modal UDT as well as a second-level halting oracle for $$\textsf{PA}+\Pi_1$$; Omega uses a halting oracle for ZFC. Now, as before, the Lemma implies that Omega can prove that $$\Omega()=2 \to A()=2$$, and that $$\Omega()=1\to A()=2$$, so it proves $$A()=2$$ and does not fill the box; then the agent two-boxes.

### NEW DISCUSSION POSTS

[Note: This comment is three
 by Ryan Carey on A brief note on factoring out certain variables | 0 likes

There should be a chat icon
 by Alex Mennen on Meta: IAFF vs LessWrong | 0 likes

Apparently "You must be
 by Jessica Taylor on Meta: IAFF vs LessWrong | 1 like

There is a replacement for
 by Alex Mennen on Meta: IAFF vs LessWrong | 1 like

Regarding the physical
 by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think that we should expect
 by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think I understand your
 by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

This seems like a hack. The
 by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

After thinking some more,
 by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yes, I think that we're
 by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

My intuition is that it must
 by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

To first approximation, a
 by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Actually, I *am* including
 by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yeah, when I went back and
 by Alex Appel on Optimal and Causal Counterfactual Worlds | 0 likes

> Well, we could give up on
 by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes