Intelligent Agent Foundations Forum
by Vadim Kosoy 376 days ago | Abram Demski likes this

The claim that “this isn’t changed at all by trying updateless reasoning” depends on the assumptions about updateless reasoning. If the agent chooses a policy in the form of a self-sufficient program, then you are right. On the other hand, if the agent chooses a policy in the form of a program with oracle access to the “utility estimator,” then there is an equilibrium in which both smoke-lovers and non-smoke-lovers self-modify into CDT. Admittedly, there are also “bad” equilibria, e.g. non-smoke-lovers staying with EDT while smoke-lovers mix between EDT and CDT with some probability. However, it seems arguable that the presence of bad equilibria is due to the “degenerate” property of the problem that one type of agent has an incentive to move away from EDT whereas the other type has exactly zero such incentive.
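The structural point behind the equilibrium claim can be shown in a toy numerical sketch (the payoff numbers here are hypothetical, chosen only to make the structure visible, not taken from the original post): a CDT successor that queries the utility oracle at action time acts on whichever action the true, hidden utility function rates higher, so it behaves optimally for either type of builder.

```python
# Toy sketch of the "CDT successor with oracle access" claim.
# Hypothetical payoffs: smoke-lovers value smoking at +10,
# non-smoke-lovers at -10; abstaining is worth 0 to everyone.

PAYOFF = {
    "lover":    {"smoke": 10,  "abstain": 0},
    "nonlover": {"smoke": -10, "abstain": 0},
}

def cdt_with_oracle(agent_type):
    """A CDT successor that queries the utility oracle at action time:
    it picks whichever action the true (hidden) utility function rates higher."""
    utilities = PAYOFF[agent_type]  # the oracle reveals the true utilities
    return max(utilities, key=utilities.get)

for t in ("lover", "nonlover"):
    action = cdt_with_oracle(t)
    print(t, action, PAYOFF[t][action])
# → lover smoke 10
# → nonlover abstain 0
```

Under these assumed payoffs each type gets its best feasible outcome from building such a successor, which is why neither type would unilaterally deviate, i.e. why this is a candidate equilibrium.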



by Abram Demski 375 days ago | Vadim Kosoy likes this

The non-smoke-loving agents think of themselves as having a negative incentive to switch to CDT in that case. They reason that if they build a CDT agent with oracle access to their true reward function, the successor may smoke (since they don’t know what their own true reward function is). So I don’t think there’s an equilibrium there. The non-smoke-lovers would prefer to explicitly give a CDT successor a non-smoke-loving utility function, if they wanted to switch to CDT at all. But then, this action would itself be evidence about their own true utility function, likely counter-balancing any reason to switch to CDT.
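The evidential point in the last sentence is an update on one's own action. A minimal Bayes-rule sketch, with made-up policy probabilities (the 0.9/0.1 numbers and the uniform prior are hypothetical, purely for illustration): if non-smoke-lovers are much more likely to hard-code a non-smoke-loving utility function into the successor than smoke-lovers are, then choosing to hard-code is itself strong evidence about one's own type.

```python
# Bayes update sketch: the act of hard-coding a non-smoke-loving utility
# function is evidence about the agent's own (hidden) utility function.
# Hypothetical policy probabilities of performing the hard-coding action:
prior_nonlover = 0.5
p_hardcode = {"nonlover": 0.9, "lover": 0.1}

# P(hardcode) = sum over types of P(type) * P(hardcode | type)
evidence = (prior_nonlover * p_hardcode["nonlover"]
            + (1 - prior_nonlover) * p_hardcode["lover"])

# P(nonlover | hardcode) by Bayes' rule
posterior_nonlover = prior_nonlover * p_hardcode["nonlover"] / evidence
print(round(posterior_nonlover, 2))
# → 0.9
```

Conditioning on the action moves the agent's credence that it is a non-smoke-lover from 0.5 to 0.9 under these assumed numbers, which is the sense in which the action "gives evidence of their own true utility function."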

I was wondering what happens if the agents try to write a strategy for switching between using such a utility oracle and a hand-written utility function (which would in fact be the same function, since they prefer their own utility function). But this probably doesn’t do anything nice either, since a useful choice of policy there would also reveal too much information about the agent’s motives.


by Vadim Kosoy 375 days ago | Abram Demski likes this

Yeah, you’re right. This setting is quite confusing :) In fact, if your agent doesn’t commit to a policy once and for all, things get pretty weird, because it doesn’t trust its future self.



