The many counterfactuals of counterfactual mugging
discussion post by Scott Garrabrant 1197 days ago | Ryan Carey and Tsvi Benson-Tilsen like this | 2 comments

This post is roughly an explanation of my current understanding of what the correct solution to the counterfactual mugging problem might look like. This post is all philosophy, with no real math. The interesting part is that even if we could perform the standard counterfactuals about what happens if I take a different action, and also look at a counterfactual in which the coin flip went the other way, we would still not be done, because we would not know the true probability of the coin.

The Problem: You are a deterministic agent who knows a bunch of facts about math. In particular, you know that \(2^{2^{2^{2^2}}}\) starts with a 2 in base 10. \(\Omega\) comes up to you and shows you his source code. In \(\Omega\)’s source code, you see that \(\Omega\) first calculates the first digit of \(2^{2^{2^{2^2}}}.\) If it is even, \(\Omega\) shows you this code and asks you for 10 dollars. If it is odd, then \(\Omega\) tries to predict what you would do if the digit is even, and pays you 9 dollars if and only if he predicts that you would have paid the 10 dollars.
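To make the setup concrete, here is a minimal sketch of what \(\Omega\)'s source code might look like. This is only an illustration of the prose above: the helper names `predict_agent_pays_if_even` and `ask_agent_to_pay` are hypothetical, and the return values are the agent's payoffs in dollars.

```python
# Minimal sketch of Omega's source code (illustrative only).
# `ask_agent_to_pay()` is the agent's actual decision when shown this code
# and asked for 10 dollars; `predict_agent_pays_if_even()` is Omega's
# prediction of that decision. Both are hypothetical helpers.

def first_digit(n):
    """Return the leading base-10 digit of n."""
    return int(str(n)[0])

def omega(predict_agent_pays_if_even, ask_agent_to_pay):
    digit = first_digit(2 ** 2 ** 2 ** 2 ** 2)  # the "logical coin"
    if digit % 2 == 0:
        # Even branch: show this code, ask the agent for 10 dollars.
        return -10 if ask_agent_to_pay() else 0
    else:
        # Odd branch: pay 9 dollars iff the agent would have paid
        # in the even branch.
        return 9 if predict_agent_pays_if_even() else 0
```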

One possible solution: So first, since I am a deterministic agent, I either pay the 10 dollars or I don’t. Therefore, to compare the two actions, I have to do the standard decision-theory counterfactuals: I compute counterfactually what happens if I pay the 10 dollars, and counterfactually what happens if I don’t. (One of these two counterfactuals is the actual world.) Note that in both of these first two counterfactuals, \(2^{2^{2^{2^2}}}\) still starts with a 2, so the only difference between them is that I lose 10 dollars if I pay.

Observe that I cannot trust \(\Omega\)’s source code to tell me about the counterfactual world in which \(2^{2^{2^{2^2}}}\) starts with an odd digit. For all I know, \(\Omega\) might not exist at all if \(2^{2^{2^{2^2}}}\) didn’t start with a 2. So first, I have to compute for myself the counterfactual world in which \(2^{2^{2^{2^2}}}\) starts with an odd digit. If, in this counterfactual, I am not playing this game at all, I probably should not pay the 10 dollars. Assume I perform this counterfactual and see a world in which \(\Omega\) is running the same code.

Within this counterfactual, \(\Omega\) is performing his own counterfactual. \(\Omega\) is computing what I do in the counterfactual in which \(2^{2^{2^{2^2}}}\) starts with an even number. (This is in fact a counterfactual, because it is taking place within the counterfactual world in which \(2^{2^{2^{2^2}}}\) starts with an odd number.)

Let’s say that my counterfactual in which \(2^{2^{2^{2^2}}}\) starts with an odd number and \(\Omega\)’s counterfactual in which \(2^{2^{2^{2^2}}}\) starts with an even number turn out to be inverses. Then, when \(\Omega\) performs this counterfactual, \(\Omega\) ends up looking at the real world.

We now have two different possible top level worlds. The EVEN world, in which the digit is even, and I am counterfacting to predict \(\Omega,\) and the ODD world, in which the digit is odd, and \(\Omega\) is counterfacting to predict me. In the EVEN world, we already performed the counterfactuals and observed that if I pay, I get -10, and if I don’t, I get 0. In the ODD world, we can again perform the standard counterfactuals, and see that if I pay, I get 9, and if I don’t, I get 0.
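The four payoffs just described, restated compactly (dollar values only, nothing new):

```python
# Agent's payoff in each top-level world, depending on its policy.
payoffs = {
    ("EVEN", "pay"):      -10,  # I pay Omega 10 dollars
    ("EVEN", "dont_pay"):   0,
    ("ODD",  "pay"):        9,  # Omega predicts I would pay, so it pays me 9
    ("ODD",  "dont_pay"):   0,
}
```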

Depending on how you count, we have performed somewhere from 4 to 6 counterfactuals already, but we are not done. Even with a complete ability to analyze what the ODD world looks like, we still have to figure out how much we should care about the ODD world as a whole relative to the EVEN world.

We can’t expect either the EVEN world or the ODD world to tell us how much to care about each world. Relative to the EVEN world, the EVEN world is true, and relative to the ODD world, the ODD world is true. We must take a 7th counterfactual that looks something like counterfacting on ourselves not knowing which digit \(2^{2^{2^{2^2}}}\) starts with, and asking what probability we would assign to the EVEN world. Performing this counterfactual, we see that the probability that \(2^{2^{2^{2^2}}}\) starts with an even digit is 39.11% (from Benford’s Law). Paying 10 dollars with probability 39.11% to gain 9 dollars with probability 60.89% is a good deal, so we should pay.
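As a sanity check on the arithmetic, here is a short sketch computing the Benford probability of an even leading digit and the expected value of the paying policy:

```python
from math import log10

# Benford's Law: P(leading digit = d) = log10(1 + 1/d).
p_even = sum(log10(1 + 1 / d) for d in (2, 4, 6, 8))  # ~0.3911
p_odd = 1 - p_even                                     # ~0.6089

# Expected value of the "pay" policy: lose 10 in the EVEN world,
# gain 9 in the ODD world (the "don't pay" policy has EV 0).
ev_pay = -10 * p_even + 9 * p_odd                      # ~ +1.57
print(f"P(even) = {p_even:.4f}, EV(pay) = {ev_pay:.2f}")
```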



by Vladimir Slepnev 1190 days ago | link

Counterfactual mugging with a logical coin is a tricky problem. It might be easier to describe the problem with a “physical” coin first. We have two world programs, mutually quined:

  1. The agent decides whether to pay the predictor 10 dollars. The predictor doesn’t decide anything.

  2. The agent doesn’t decide anything. The predictor decides whether to pay the agent 100 dollars, depending on the agent’s decision in world 1.

By fiat, the agent cares about the two worlds equally, i.e. it maximizes the total sum of money it receives in both worlds. The usual UDT-ish solution can be crisply formulated in modal logic, PA or a bunch of other formalisms.
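A rough sketch of this two-world structure, assuming the dollar amounts from the comment; the mutual quining is only gestured at here by passing the agent's world-1 decision into world 2 rather than quining the agent into both worlds:

```python
# Sketch of the two world programs described above (illustrative only).

def world_1(agent_pays):
    # The agent decides whether to pay the predictor 10 dollars.
    return -10 if agent_pays else 0

def world_2(agent_pays_in_world_1):
    # The predictor pays the agent 100 dollars depending on the
    # agent's decision in world 1.
    return 100 if agent_pays_in_world_1 else 0

def total_utility(agent_pays):
    # By fiat, the agent cares about both worlds equally.
    return world_1(agent_pays) + world_2(agent_pays)

# total_utility(True) == 90 > total_utility(False) == 0, so the agent pays.
```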

Does that make sense?

reply

by Scott Garrabrant 1189 days ago | Ryan Carey likes this | link

This makes sense. My main point is that the “care about the two worlds equally” part makes sense if it is part of the problem description, but otherwise we don’t know where that part comes from.

My logical example was supposed to illustrate that sometimes you should not care about them equally.

reply


