Intelligent Agent Foundations Forum
Agent Simulates Predictor using Second-Level Oracles
discussion post by Patrick LaVictoire 923 days ago | Jessica Taylor and Nate Soares like this

Prerequisite: Agents that can predict their Newcomb predictor

(This post is from Benja’s backlog of topics that should appear here, which he’s deputized me to write up. The original idea was due to Jessica Taylor.)

The phenomena of agent-simulates-predictor and agent-trusts-predictor, where one party has stronger axioms and the other has stronger powers of deduction, can be exhibited at various points on the tradeoff between realistic examples and tractable proofs. In the previous post, I considered the version that pits a bounded proof search against a halting oracle. There's a fully finite version where one process uses a bounded proof search with a much higher bound than the other process. But in this post, I'll discuss the clean infinite version, which pits a halting oracle against an oracle for the next level of the arithmetical hierarchy.

Here, we’ll use two different formal systems built on Peano Arithmetic.

A \(\Pi_1\) statement of PA is any statement of the form \(\forall n\, \phi(n)\), where \(\phi(\cdot)\) uses only bounded quantifiers. (For example, the claim that PA is consistent is a \(\Pi_1\) sentence, asserting that every natural number is not the Gödel number of a proof of a contradiction.)
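To make the "bounded quantifiers" condition concrete, here is a small Python sketch (my own illustration, not from the post). Goldbach's conjecture has \(\Pi_1\) shape: checking any single \(n\) involves only a bounded search, so each instance is decidable, while the universal claim over all \(n\) is not.

```python
# Illustrative example (mine, not the post's): a Pi_1 claim is
# "forall n, phi(n)" where phi(n) is decidable (bounded quantifiers).
# Goldbach's conjecture has this shape.

def is_prime(k: int) -> bool:
    if k < 2:
        return False
    return all(k % d for d in range(2, int(k ** 0.5) + 1))

def phi(n: int) -> bool:
    """Decidable instance check: if n is even and >= 4, then n is a
    sum of two primes. Only a bounded search over p < n is needed."""
    if n < 4 or n % 2:
        return True  # vacuously satisfied
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n))

# Each instance phi(n) is checkable in finite time; only the full
# Pi_1 claim "forall n, phi(n)" needs the halting oracle.
print(all(phi(n) for n in range(100)))  # -> True for this finite prefix
```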

\(\textsf{PA}+\Pi_1\) is PA plus an axiom for every \(\Pi_1\) statement that is true in the standard model of PA (a set of axioms that is not even recursively enumerable). We can think of it as a formal system which can make a call in finite time to a standard halting oracle.
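The "halting oracle" reading rests on a standard equivalence: a \(\Pi_1\) sentence \(\forall n\, \phi(n)\) with decidable \(\phi\) is true exactly when the program searching for a counterexample never halts. A minimal sketch (my illustration; the oracle itself is uncomputable and only posited):

```python
# Sketch (mine): deciding a Pi_1 sentence "forall n, phi(n)" reduces
# to a single halting query about this counterexample search.

def counterexample_search(phi):
    """Halts (returning a witness) iff the Pi_1 sentence is false."""
    n = 0
    while True:
        if not phi(n):
            return n
        n += 1

# A halting oracle settles the sentence in one call:
#   "forall n, phi(n)" is true  <=>  counterexample_search(phi) never halts.
# For a false Pi_1 sentence, the search halts with a witness:
print(counterexample_search(lambda n: n != 7))  # -> 7
```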

(Note that there’s a different meaning of \(\Pi_1\) for ZFC, involving the analogous definitions for set theory. In order to avoid that ambiguity, we could instead just use the formal system \(\textsf{PA+Sound(PA)}\), which is PA plus the axiom schema \(\forall \vec{n} \;\Box_{\textsf{PA}}\varphi(\vec{n}) \to \varphi(\vec{n})\) for every \(\varphi\). However, in that case we’d need to go through a proof that \(\textsf{PA+Sound(PA)}\) also knows that \(\textsf{PA}+\Pi_1\) is sound.)

We'll obtain the exact same results as in the previous post, so we'll prove the analogous lemma, showing that the stronger axiom system (here, ZFC) can prove that statements of a certain type hold if and only if they are provable by the system with weaker axioms but stronger computation (here, \(\textsf{PA}+\Pi_1\)).

Lemma: Let \(\varphi\) be a propositional combination of \(\Pi_1\) statements. Then \(\textsf{ZFC}\vdash \varphi \leftrightarrow \Box_{\textsf{PA}+\Pi_1}\varphi\).

Proof: First, we note that \(\textsf{ZFC}\vdash \Box_{\textsf{PA}+\Pi_1}\varphi \to \varphi\) for any \(\varphi\), because ZFC proves that \(\textsf{PA}+\Pi_1\) has a model (the standard model of arithmetic), and thus proves that it is sound.

Now we show that if \(\varphi\) is a propositional combination of \(\Pi_1\) statements, then \(\textsf{ZFC}\vdash \varphi \to \Box_{\textsf{PA}+\Pi_1}\varphi\). Express \(\varphi\) in disjunctive normal form; since proving any single disjunct suffices to prove the disjunction, it is enough to show this for \(\varphi\) that are conjunctions of \(\Pi_1\) statements and negations of \(\Pi_1\) statements. If such a \(\varphi\) holds, then all of its \(\Pi_1\) conjuncts hold, and so they are axioms of \(\textsf{PA}+\Pi_1\). Each negated \(\Pi_1\) conjunct is of the form \(\exists n\, \psi(n)\) where \(\psi\) has only bounded quantifiers; since it holds, there is a witness \(n\), and \(\textsf{PA}+\Pi_1\) can verify the corresponding instance \(\psi(n)\). Thus \(\textsf{PA}+\Pi_1\) proves every conjunct, and hence proves \(\varphi\). (The inductive argument for this reasoning can be carried out in ZFC.)
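The witness-finding step for the negated \(\Pi_1\) conjuncts can be sketched in Python (my illustration, not the post's): a true \(\Sigma_1\) sentence \(\exists n\, \psi(n)\) with decidable \(\psi\) is verified by enumerating until a witness appears, and the prover then only needs to check that single instance.

```python
# Sketch (mine) of the witness search used in the Lemma's proof:
# for a true Sigma_1 sentence "exists n, psi(n)" with decidable psi,
# enumeration terminates at a witness, and checking that one instance
# is a finite (bounded-quantifier) verification.

def find_witness(psi):
    """Terminates exactly when the Sigma_1 sentence is true."""
    n = 0
    while not psi(n):
        n += 1
    return n

# e.g. "there is an n with n*n > 50" has witness 8:
print(find_witness(lambda n: n * n > 50))  # -> 8
```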

Now we can show the same two cases, with the same proofs as before:

Case 1: Agent Trusts Predictor

The agent uses modal UDT and a halting oracle for ZFC; Omega uses the second-level oracle for \(\textsf{PA}+\Pi_1\). Note that the statements “\(A()=1\)” and “\(A()=2\)” are each propositional combinations of \(\Pi_1\) statements about Gödel numbers.

Since we (from outside the problem) believe that ZFC is consistent and we know by the Lemma that Omega will predict correctly, the agent will fail to prove that it can get $1M + $1K by two-boxing or by one-boxing, and will fail to prove that it can get $1M by two-boxing. Then, using the Lemma, it can prove that if it one-boxes, it will get $1M, and so it does.
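The agent's deliberation above can be sketched schematically (my sketch; the names are mine, and `provable` stands in for the agent's halting-oracle proof search, which is uncomputable): modal UDT walks through outcomes from best to worst and takes the first action for which it proves the corresponding conditional.

```python
# Schematic modal-UDT loop (my sketch, not the post's formalism).
# 'provable(a, u)' stands in for proof search over "A() = a -> U() = u".

def modal_udt(actions, outcomes_desc, provable):
    """Return the first action whose conditional payoff is provable,
    scanning outcomes from best to worst."""
    for u in outcomes_desc:
        for a in actions:
            if provable(a, u):
                return a
    return actions[-1]  # default action if nothing is provable

# In Case 1 the agent fails on the $1,001,000 conditionals, but
# (via the Lemma) proves "one-box -> $1,000,000":
provable = lambda a, u: (a, u) == ("one-box", 1_000_000)
print(modal_udt(["one-box", "two-box"],
                [1_001_000, 1_000_000, 1_000, 0],
                provable))  # -> one-box
```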

Case 2: Agent Simulates Predictor

The agent now uses modal UDT as well as a second-level halting oracle for \(\textsf{PA}+\Pi_1\); Omega uses a halting oracle for ZFC.

Now, as before, the Lemma implies that Omega can prove that \(\Omega()=2 \to A()=2\), and that \(\Omega()=1\to A()=2\), so it proves \(A()=2\) and does not fill the box; then the agent two-boxes.
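Omega's side of Case 2 can be sketched the same way (my sketch, with hypothetical names): Omega case-splits on its own output, settles the agent's action in each branch via the Lemma, and fills the box only if some branch yields one-boxing.

```python
# Schematic of Omega's case split in Case 2 (my sketch).
# agent_action_given(o) is the action Omega proves the agent takes
# under the assumption Omega() = o; 1 = one-box, 2 = two-box.

def omega_predicts(agent_action_given):
    """Return 2 (leave box empty) iff the agent provably two-boxes
    in both branches of the case split; otherwise fill the box."""
    if agent_action_given(1) == 2 and agent_action_given(2) == 2:
        return 2
    return 1

# Here the Lemma gives two-boxing in both branches, so the box stays empty:
print(omega_predicts(lambda o: 2))  # -> 2
```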




