by Vanessa Kosoy 983 days ago

This is more or less what I was talking about here (see last paragraph). This should also give us superrationality, provided that instead of allowing an arbitrary “future version”, we constrain the future version to be a limited agent with access to a powerful “oracle” for queries of the form $$E[U \mid \pi]$$ for all possible policies $$\pi$$ (answering such a query might involve constructing another, even more powerful, agent). If we don’t impose this constraint, we run into the problem of “self-stabilizing mutually detrimental blackmail” in multi-agent scenarios.
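
A minimal sketch of the proposed constraint, in Python (all names here are hypothetical, introduced only for illustration): the “future version” is restricted to querying an oracle for $$E[U \mid \pi]$$ over a fixed menu of policies and acting on the answers, rather than running arbitrary computation of its own.

```python
from typing import Callable, Hashable

Policy = Hashable  # policies are treated as opaque labels here


def constrained_future_version(
    policies: list[Policy],
    oracle: Callable[[Policy], float],  # returns an estimate of E[U | pi]
) -> Policy:
    """A deliberately limited agent: it may only ask the oracle for the
    expected utility of each candidate policy and commit to the best one.
    It cannot spawn an unconstrained successor or run open-ended search."""
    return max(policies, key=oracle)


# Hypothetical usage: the oracle itself may be realized by a far more
# powerful process (possibly by constructing an even stronger agent),
# but the deciding agent only ever sees the scalar estimates it returns.
estimates = {"cooperate": 3.0, "defect": 1.0}
chosen = constrained_future_version(list(estimates), estimates.get)
```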

by Wei Dai 982 days ago

I may be misunderstanding what you’re proposing, but suppose each decision process has the option to output “I’ve thought enough, no need for another version of me, it’s time to take action X”, where X is “construct this other agent and transfer my resources to it”. In that case, the constraint on future versions doesn’t seem to actually do much.
by Vanessa Kosoy 979 days ago

Well, the time to make a decision is limited. I guess that for this to work in full generality, we would need the total computing time of the future agents over a time-discount horizon to be insufficient to simulate the “oracle” of even the first agent, which might be too harsh a restriction. Perhaps restricting space would help, since space aggregates as a max rather than as a sum. I don’t have a detailed understanding of this, but IMO any decision theory that yields robust superrationality (i.e. not only for symmetric games and perfectly identical agents) needs to have some aspect that behaves like this.
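
A toy illustration of the aggregation point above, again in Python with made-up numbers: the combined time budget of a chain of successor agents grows with the length of the chain, whereas a space budget is reused and stays bounded by the largest single agent, which is why restricting space might compose more gracefully than restricting time.

```python
def chain_time(time_costs: list[float]) -> float:
    """Computing time of successive agents adds up, so a long enough
    chain can eventually match the first agent's oracle."""
    return sum(time_costs)


def chain_space(space_costs: list[float]) -> float:
    """Space is reused between agents, so the chain's footprint is the
    maximum any single agent needs, regardless of chain length."""
    return max(space_costs)


# Four hypothetical successor agents, each with the same budget.
budgets = [10.0, 10.0, 10.0, 10.0]
assert chain_time(budgets) == 40.0   # grows with the number of agents
assert chain_space(budgets) == 10.0  # does not grow with the number of agents
```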
