Intelligent Agent Foundations Forum
Agents that can predict their Newcomb predictor
post by Patrick LaVictoire 932 days ago | Jessica Taylor likes this | 4 comments

There’s a certain type of problem where it appears that having more computing power hurts you: the “agent simulates predictor” variant of Newcomb’s Problem.

As Gary Drescher put it:

There’s a version of Newcomb’s Problem that poses the same sort of challenge to UDT that comes up in some multi-agent/game-theoretic scenarios.

Suppose:

  • The predictor does not run a detailed simulation of the agent, but relies instead on a high-level understanding of the agent’s decision theory and computational power.
  • The agent runs UDT, and has the ability to fully simulate the predictor.

Since the agent can deduce (by low-level simulation) what the predictor will do, the agent does not regard the prediction outcome as contingent on the agent’s computation. Instead, either predict-onebox or predict-twobox has a probability of 1 (since one or the other of those is deducible), and a probability of 1 remains the same regardless of what we condition on. The agent will then calculate greater utility for two-boxing than for one-boxing.

Meanwhile, the predictor, knowing that the agent runs UDT and will fully simulate the predictor, can reason as in the preceding paragraph, and thus deduce that the agent will two-box. So the large box is left empty and the agent two-boxes (and the agent’s detailed simulation of the predictor correctly shows the predictor correctly predicting two-boxing).

The agent would be better off, though, running a different decision theory that does not two-box here, and that the predictor can deduce does not two-box.
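
(In payoff terms, using the amounts that appear in the cases below: once the prediction is treated as a settled fact rather than as contingent on the agent’s choice, two-boxing dominates, since

\[ \$1001000 > \$1000000 \quad \text{(two-box vs. one-box, box full)}, \qquad \$1000 > \$0 \quad \text{(two-box vs. one-box, box empty)}. \]

Either way the agent calculates a $1000 gain from taking both boxes.)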

EDITED 5/19/15: There’s a formal model of this due to Vladimir Slepnev in which the agent and the predictor have different types of predictive power, such that in some sense each knows how the other will act in this universe. We’ll write this out below, along with another case where things work out properly.

(One algorithm has more computing power, but the other has stronger axioms: in particular, strong enough to prove that the other formal system is sound, as ZFC proves that PA is sound.)

In one of the following cases, proof-based UDT one-boxes for correct reasons; in the other case, it two-boxes analogously to the reasoning above.


In both cases, we’ll use the following lemma:

Lemma: Let \(A()\) be an algorithm that returns either 1 or 2, such that ZFC has a proof of length \(\approx \log(N)\) of “\(A()\) returns in \(\leq N\) steps”. Then ZFC has a proof of length \(\approx \log(N)\) that “\(A()=1\) if and only if PA proves that \(A()=1\).”

Proof: ZFC knows that PA is sound, so ZFC has a short proof that “if PA proves that \(A()=1\), then \(A()=1\)”. The other direction follows from the fact that the algorithm is bounded, and PA can simply show the execution trace of \(A\) in \(\leq CN\) steps.
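
In symbols, the two directions of the proof are

\[ \text{ZFC} \vdash \Box_{PA}[A()=1] \to A()=1 \quad \text{(ZFC proves PA sound)}, \]

\[ \text{ZFC} \vdash A()=1 \to \Box_{PA}[A()=1] \quad \text{(PA can verify the bounded execution trace).} \]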

Case 1: Success (Agent Trusts Predictor)

Omega looks through all proofs in PA; when it finds a proof that the agent one-boxes or a proof that the agent two-boxes, it stops looking and fills or empties the box accordingly. (If it fails to prove either, it empties the box.)

The agent makes several passes through all proofs in ZFC of length \(\leq N\): first it looks for a proof that one of its actions leads to the payoff $1001000, then $1000000, then $1000, then $0. The first time it finds a proof that an action implies the outcome it’s looking for, it immediately takes that action. (If it fails to find any such proof, it two-boxes.)
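
For concreteness, here is a rough Python sketch of the two search procedures. The helpers enumerate_proofs(system, max_length) and proves(proof, statement) are hypothetical placeholders for a proof enumerator and a proof checker (nothing like this exists as a ready-made API), and the statement strings are schematic; the sketch only illustrates the control flow described above.

# Hypothetical helpers (placeholders, not a real library):
#   enumerate_proofs(system, max_length=None) -- yields proofs in `system`,
#       restricted to length <= max_length when a bound is given
#   proves(proof, statement)                  -- True iff `proof` establishes `statement`

def omega(system, enumerate_proofs, proves, max_length=None):
    """Omega's search: enumerate proofs of `system` until the agent's action is
    settled, then fill or empty the big box accordingly."""
    for proof in enumerate_proofs(system, max_length=max_length):
        if proves(proof, "A() = 1"):
            return "fill"                  # predicted one-boxing
        if proves(proof, "A() = 2"):
            return "empty"                 # predicted two-boxing
    return "empty"                         # proved neither: leave the box empty

def agent(system, enumerate_proofs, proves, max_length=None):
    """The agent's search: for each payoff in descending order, look for a proof
    in `system` of `A() = a -> U = payoff`; take the first action so proven."""
    for payoff in ("$1001000", "$1000000", "$1000", "$0"):
        for proof in enumerate_proofs(system, max_length=max_length):
            for action in (1, 2):
                if proves(proof, f"A() = {action} -> U = {payoff}"):
                    return action          # 1 = one-box, 2 = two-box
    return 2                               # no proof found: two-box by default

# Case 1: Omega searches PA (no length bound); the agent searches ZFC proofs
# of length <= N.
#   omega("PA", enumerate_proofs, proves)
#   agent("ZFC", enumerate_proofs, proves, max_length=N)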

I claim that for sufficiently large \(N\), the agent one-boxes, and Omega fills the box.

Proof: We use the Lemma. ZFC proves in \(\leq N\) steps that \(A()=1 \leftrightarrow \Box_{PA}[A()=1]\). Since we (from outside the problem) assume ZFC is consistent, the agent will not find a ZFC proof that two-boxing gets it $1001000, nor that one-boxing gets it $1001000, nor that two-boxing gets it $1000000, but it will find a proof that one-boxing gets it $1000000. Thus it one-boxes. Since it runs in finite time, of course PA proves that it one-boxes, and so Omega fills the box.
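
Spelling out the step the agent acts on: the lemma, together with Omega’s definition and ZFC’s knowledge that PA is consistent (so PA cannot also prove \(A()=2\)), gives short ZFC proofs of

\[ A()=1 \;\leftrightarrow\; \Box_{PA}[A()=1] \;\leftrightarrow\; \text{Omega fills the box}, \]

and hence of \(A()=1 \to U=\$1000000\), which is the counterfactual the agent finds and acts on.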

Case 2: Failure (Agent Simulates Predictor)

We’ll swap the deductive processes this time. Omega looks through all proofs in ZFC of length \(\leq N\); when it finds a proof that the agent one-boxes or a proof that the agent two-boxes, it stops looking and fills or empties the box accordingly. (If it fails to prove either, it empties the box.)

The agent makes several passes through all proofs in PA: first it looks for a proof that one of its actions leads to the payoff $1001000, then $1000000, then $1000, then $0. The first time it finds a proof that an action implies the outcome it’s looking for, it immediately takes that action. (If it fails to find any such proof, it two-boxes.)
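
In terms of the Python sketch from Case 1 (with the same hypothetical helpers), only the instantiation changes:

# Case 2: Omega searches ZFC proofs of length <= N; the agent searches PA
# (no length bound).
#   omega("ZFC", enumerate_proofs, proves, max_length=N)
#   agent("PA", enumerate_proofs, proves)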

I claim that the agent two-boxes, and Omega leaves the box empty.

Proof: Again, by the lemma, ZFC proves that \(\Omega()=1 \leftrightarrow \Box_{PA}[\Omega()=1]\) (where \(\Omega()=1\) means “Omega predicts one-boxing and thus fills the box”). Now note that \(\Omega()=1\) implies that PA proves \(A()=2 \to U=\$1001000\) and does not prove that \(A()=1 \to U=\$1001000\), and thus in this case \(A()=2\); and similarly, \(\Omega()=2\) implies that PA proves \(A()=2 \to U=\$1000\) and does not prove any of the counterfactuals earlier in the agent’s search order, so again \(A()=2\). All of these proofs are short within ZFC, so Omega finds a ZFC proof that \(A()=2\), and thus \(\Omega()=2\).
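
Schematically, and suppressing the “PA does not prove the spurious counterfactuals” steps noted above, the case analysis ZFC carries out is

\[ \Omega()=1 \;\Rightarrow\; \Box_{PA}[\Omega()=1] \;\Rightarrow\; \Box_{PA}[A()=2 \to U=\$1001000] \;\Rightarrow\; A()=2, \]

\[ \Omega()=2 \;\Rightarrow\; \Box_{PA}[\Omega()=2] \;\Rightarrow\; \Box_{PA}[A()=2 \to U=\$1000] \;\Rightarrow\; A()=2, \]

so ZFC proves \(A()=2\) in either case, Omega finds such a proof within its length bound, and the box stays empty.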



by Vladimir Slepnev 921 days ago | Benja Fallenstein and Patrick LaVictoire like this | link

Your second case seems really similar to my post about ASP from 2011: http://lesswrong.com/lw/5rq/example_decision_theory_problem_agent_simulates/


by Patrick LaVictoire 920 days ago | link

You’re absolutely right. I know I was in a hurry writing this post, but I don’t know how I managed to skip reading the second half of the post that I excerpted.


by Benja Fallenstein 906 days ago | Patrick LaVictoire likes this | link

The other direction follows from the fact that the algorithm is bounded, and PA can simply show the execution trace of \(A\) in \(\le CN\) steps.

Unimportant technical point: I think the length of the PA proof grows faster than this. (More precisely, the length in symbols, rather than the length in number of statements; we’re almost always interested in the former, since it determines how quickly a proof can be checked or be found by exhaustive search.) The obvious way of showing \(A() = 1\) in PA is to successively show for higher and higher \(t\) that “after \(t\) ticks, the Turing machine computing \(A()\) is in the following state, and its tapes have the following content”. But even to write out the contents of the tape of a Turing machine that has run for \(t\) ticks will in general take \(t\) symbols. So the obvious way of doing the proof takes at least \(O(t^2)\) symbols. Moreover, I’m not sure whether we can get from “at time \(t\), the configuration is such-and-such” to “at time \(t+1\), the configuration is such-and-such” in a constant number of proof steps. I suspect we don’t need more than \(O(t^3)\) symbols overall, but I’m not extremely confident. (I’d be quite surprised if the time turned out not to be polynomial, though!)
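
(Tallying that estimate: if writing out the configuration after step \(t\) takes on the order of \(t\) symbols, then over \(N\) steps the transcript alone costs at least \(\sum_{t=1}^{N} t = \Theta(N^2)\) symbols, which is where the \(O(t^2)\) figure above comes from.)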


by Patrick LaVictoire 897 days ago | Jessica Taylor likes this | link

Note that in both this and the other model of Agent Simulates Predictor, the agent’s formal system does not know that the predictor’s formal system is consistent. One could argue that in this case it’s right to two-box. However, it’s still worrisome that the agent’s formal system proves \(A()=1 \to U()= \$0\) and does not prove \(A()=1 \to U()= \$1000000\), rather than proving both counterfactuals (spuriously) or neither.



