by Vanessa Kosoy 983 days ago This is more or less what I was talking about here (see last paragraph). This should also give us superrationality, provided that instead of allowing an arbitrary “future version”, we constrain the future version to be a limited agent with access to a powerful “oracle” for queries of the form $$E[U \mid \pi]$$ for all possible policies $$\pi$$ (which might involve constructing another, even more powerful, agent). If we don’t impose this constraint, we run into the problem of “self-stabilizing mutually detrimental blackmail” in multi-agent scenarios.

 by Wei Dai 982 days ago I may be misunderstanding what you’re proposing, but suppose each decision process has the option to output “I’ve thought enough, no need for another version of me, it’s time to take action X”, where X can be “construct this other agent and transfer my resources to it”. In that case the constraint on future versions doesn’t seem to actually do much.
 by Vanessa Kosoy 979 days ago Well, the time available to make a decision is limited. I guess that for this to work in full generality, we would need the total computing time of the future agents over a time discount horizon to be insufficient to simulate the “oracle” of even the first agent, which might be too harsh a restriction. Perhaps restricting space will help, since space aggregates as a max rather than as a sum. I don’t have a detailed understanding of this, but IMO any decision theory that yields robust superrationality (i.e. not only for symmetric games and perfectly identical agents) needs to have some aspect that behaves like this.
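
To illustrate the remark that space aggregates as a max rather than as a sum, here is a minimal sketch. It assumes the successor agents run one after another and reuse the same memory; the `AgentRun` type, the `aggregate` helper, and all the numbers (including the oracle cost thresholds) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    time: int   # computation steps used by one successor agent (hypothetical units)
    space: int  # peak memory used by that agent (hypothetical units)


def aggregate(runs):
    """Combine resource use of agents that run sequentially.

    Time adds up across the chain; space is reused between runs,
    so only the largest single footprint matters.
    """
    total_time = sum(r.time for r in runs)
    peak_space = max((r.space for r in runs), default=0)
    return total_time, peak_space


# Five successive agents, each individually far below the oracle's cost.
runs = [AgentRun(time=40, space=8) for _ in range(5)]
total_time, peak_space = aggregate(runs)

# Hypothetical cost of simulating the first agent's oracle.
ORACLE_TIME = 100
ORACLE_SPACE = 10

# A per-agent time bound does not bound the chain: 5 * 40 exceeds ORACLE_TIME.
time_bound_holds = total_time < ORACLE_TIME
# A per-agent space bound does carry over to the chain: max(8, ...) stays below ORACLE_SPACE.
space_bound_holds = peak_space < ORACLE_SPACE
```

Under these assumptions, a long enough chain of time-limited agents can jointly exceed any fixed time budget, while a per-agent space limit is automatically a limit on the whole chain.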

