Intelligent Agent Foundations Forum
by Abram Demski 375 days ago | Vadim Kosoy likes this | link | parent

The non-smoke-loving agents think of themselves as having a negative incentive to switch to CDT in that case. They think that if they build a CDT agent with oracle access to their true reward function, it may smoke (since they don’t know what their true reward function is). So I don’t think there’s an equilibrium there. If they wanted to switch to CDT, the non-smoke-lovers would prefer to explicitly give the CDT successor a non-smoke-loving utility function. But then, this action itself would give evidence of their own true utility function, likely counter-balancing any reason to switch to CDT.
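
To make the last point concrete, here is a toy Bayes update showing how an action can carry evidence about the agent's own type. It is only a sketch: the function name, the 50/50 prior, and both likelihoods are made-up assumptions, and it is not a model of the full smoking-lesion setup.

```python
def posterior_smoke_lover(prior, p_act_given_lover, p_act_given_non_lover):
    """Bayes' rule: P(smoke-lover | the agent took the observed action)."""
    joint_lover = prior * p_act_given_lover
    joint_non_lover = (1.0 - prior) * p_act_given_non_lover
    return joint_lover / (joint_lover + joint_non_lover)


# Hypothetical numbers, chosen only for illustration.
prior = 0.5                   # agent's prior credence that it is a smoke-lover
p_act_given_lover = 0.1       # assumed chance a smoke-lover hand-writes a
                              # non-smoke-loving utility into its successor
p_act_given_non_lover = 0.9   # assumed chance a non-smoke-lover does so

post = posterior_smoke_lover(prior, p_act_given_lover, p_act_given_non_lover)
print(f"P(smoke-lover | hand-wrote non-smoke-loving utility) = {post:.3f}")
```

Under these assumed numbers, the credence in being a smoke-lover drops from 0.5 to roughly 0.1 after the action. That shift is the kind of evidence that could offset the motive for switching.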

I was wondering about what happens if the agents try to write a strategy for switching between using such a utility oracle and a hand-written utility function (which would in fact be the same function, since they prefer their own utility function). But this probably doesn’t do anything nice either, since a useful choice of policy there would also reveal too much information about motives.



by Vadim Kosoy 375 days ago | Abram Demski likes this | link

Yeah, you’re right. This setting is quite confusing :) In fact, if your agent doesn’t commit to a policy once and for all, things get pretty weird because it doesn’t trust its future self.



