Intelligent Agent Foundations Forum
by Stuart Armstrong 576 days ago

Suppose the humans have already decided whether to press the shutdown button or order the AI to maximise paperclips. Let \(o_s\) be the observation of the shutdown command and \(o_p\) the observation of the paperclip-maximising command, with \(u_s\) and \(u_p\) the corresponding utilities. Then \(P\) can be defined by \(P(u_s|h_{m-1}o_s)=1\) and \(P(u_p|h_{m-1}o_p)=1\), for all histories \(h_{m-1}\).
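
As a rough illustration, the definition of \(P\) can be sketched in Python. The string labels and the function name `P` are just stand-ins for the symbols above, and the dictionary representation of a distribution over utilities is an assumption made for this sketch.

```python
from typing import Dict, Sequence

# Labels standing in for the symbols in the text (illustrative only).
U_S, U_P = "u_s", "u_p"  # shutdown utility, paperclip utility
O_S, O_P = "o_s", "o_p"  # shutdown command observed, paperclip command observed


def P(history: Sequence[str], observation: str) -> Dict[str, float]:
    """Distribution over utilities after history h_{m-1} followed by an observation.

    The history argument is deliberately ignored, since the definition
    is stated to hold for all histories h_{m-1}.
    """
    if observation == O_S:
        return {U_S: 1.0, U_P: 0.0}
    if observation == O_P:
        return {U_S: 0.0, U_P: 1.0}
    raise ValueError(f"P is only specified for {O_S} and {O_P}, got {observation!r}")
```

Whatever came before, seeing the command settles which utility is the correct one with certainty.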

Then define \(\widehat{P}\) as the probability of \(o_s\) versus \(o_p\), conditional on the fact that the agent follows a particular deterministic policy \(\pi^0\).
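
One possible reading of this definition, as a toy sketch: roll out the deterministic policy \(\pi^0\), ask a world model how likely \(o_s\) is given the resulting actions, and use that as the weight on \(u_s\). Everything concrete here is hypothetical and not from the original comment: the action names, the numbers in `prob_shutdown_observed`, the horizon, and the choice of `pi_0`.

```python
from typing import Callable, Dict, Tuple

History = Tuple[str, ...]
Policy = Callable[[History], str]


def pi_0(history: History) -> str:
    # Hypothetical default policy pi^0: always take the same neutral action.
    return "noop"


def prob_shutdown_observed(actions: History) -> float:
    # Hypothetical toy world model: probability that o_s (rather than o_p)
    # is observed, given the actions the agent took. Numbers are made up.
    return 0.1 if "block_button" in actions else 0.4


def p_hat(policy: Policy, horizon: int) -> Dict[str, float]:
    """Probability of o_s versus o_p, conditional on following `policy`."""
    history: History = ()
    for _ in range(horizon):
        history += (policy(history),)  # deterministic rollout of the policy
    p_s = prob_shutdown_observed(history)
    return {"u_s": p_s, "u_p": 1.0 - p_s}
```

Because `p_hat` is computed from \(\pi^0\) alone, it is a fixed distribution: it does not depend on what the agent actually does.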

If the agent does indeed follow \(\pi^0\), then \(\widehat{P}=\widehat{P}'\). If it deviates from this policy, then \(\widehat{P}'\) is altered in proportion to the expected change in \(\widehat{P}\) caused by choosing a different action.


