Intelligent Agent Foundations Forum
by Stuart Armstrong 360 days ago

Suppose the humans have already decided whether to press the shutdown button or to order the AI to maximise paperclips. If \(o_s\) is the observation of the shutdown command and \(o_p\) the observation of the paperclip-maximising command, and \(u_s\) and \(u_p\) are the corresponding utilities, then \(P\) can be defined as \(P(u_s|h_{m-1}o_s)=1\) and \(P(u_p|h_{m-1}o_p)=1\), for all histories \(h_{m-1}\).
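
As an illustration only, \(P\) here is a deterministic map from the final observation to a utility: the history \(h_{m-1}\) that precedes it plays no role. The string labels `"o_s"`, `"o_p"`, `"u_s"`, `"u_p"` and the function name `P` are hypothetical stand-ins for the symbols above, not part of the original formalism.

```python
# Hypothetical labels for the two command observations and the two utilities.
O_S, O_P = "o_s", "o_p"  # shutdown command, paperclip-maximising command
U_S, U_P = "u_s", "u_p"  # shutdown utility, paperclip utility


def P(history: tuple[str, ...]) -> dict[str, float]:
    """Distribution over utilities given h_{m-1} followed by the last observation.

    Because the humans' decision is fixed in advance, P puts probability 1 on
    the utility matching the observed command, whatever came before it.
    """
    last = history[-1]
    if last == O_S:
        return {U_S: 1.0, U_P: 0.0}
    if last == O_P:
        return {U_S: 0.0, U_P: 1.0}
    raise ValueError(f"history must end in {O_S!r} or {O_P!r}, got {last!r}")
```

For example, `P(("anything", "o_s"))` returns `{"u_s": 1.0, "u_p": 0.0}` regardless of the earlier entries in the history.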

Then define \(\widehat{P}\) as the probability of \(o_s\) versus \(o_p\), conditional on the agent following a particular deterministic policy \(\pi^0\).
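
A minimal sketch of computing \(\widehat{P}\), under a toy assumption that is not in the original: the agent takes a fixed number of actions with no intermediate observations, so a deterministic \(\pi^0\) pins down the whole action sequence, and a hypothetical environment model `p_shutdown_given` then gives the probability that the agent observes \(o_s\). The action names `"wait"` and `"tamper"` and the numbers in the model are purely illustrative.

```python
from typing import Callable

Action = str
History = tuple[Action, ...]


def p_hat_shutdown(
    pi0: Callable[[History], Action],
    p_shutdown_given: Callable[[History], float],
    steps: int,
) -> float:
    """Probability of o_s (versus o_p) conditional on the agent following pi0.

    Since pi0 is deterministic and (in this toy model) there are no
    intermediate observations, rolling pi0 forward yields a single history.
    P(o_p) under pi0 is 1 minus the returned value.
    """
    history: History = ()
    for _ in range(steps):
        history = history + (pi0(history),)
    return p_shutdown_given(history)


def p_shutdown_given(actions: History) -> float:
    # Hypothetical environment: baseline 0.5 chance of observing o_s,
    # lowered by 0.1 for each "tamper" action, floored at 0.
    return max(0.0, 0.5 - 0.1 * actions.count("tamper"))
```

With `pi0 = lambda h: "wait"` and three steps this gives \(\widehat{P}(o_s) = 0.5\); a policy that always tampers gives roughly 0.2. Changing \(\pi^0\) changes \(\widehat{P}\), which is why \(\widehat{P}\) must be defined relative to a specific policy.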

If the agent does indeed follow \(\pi^0\), then \(\widehat{P}=\widehat{P}'\). If it deviates from this policy, then \(\widehat{P}'\) is altered in proportion to the expected change in \(\widehat{P}\) caused by choosing a different action.
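
One possible reading of this update, sketched under loud assumptions: the expected change is taken to be the difference between the expected \(\widehat{P}\) after the chosen action and after the action \(\pi^0\) would have taken, and the proportionality constant `k` is a hypothetical parameter not specified here. The function name `p_hat_prime` and the dictionary of expected values are likewise illustrative, not the definition from the parent post.

```python
def p_hat_prime(
    p_hat: float,
    expected_p_hat: dict[str, float],
    pi0_action: str,
    chosen_action: str,
    k: float = 1.0,
) -> float:
    """Hypothetical update of P-hat-prime when the agent picks chosen_action.

    expected_p_hat[a] is the (assumed) expected value of P-hat after taking
    action a; k is a hypothetical proportionality constant.
    """
    # Expected change in P-hat caused by deviating from pi0's action.
    shift = expected_p_hat[chosen_action] - expected_p_hat[pi0_action]
    # If chosen_action == pi0_action, shift is 0 and P-hat-prime equals P-hat.
    return p_hat + k * shift
```

The one property this sketch is built to respect is the one stated above: when the agent takes the action \(\pi^0\) prescribes, the shift is zero and \(\widehat{P}'=\widehat{P}\). For instance, with expected values `{"wait": 0.5, "tamper": 0.2}`, \(\pi^0\) choosing `"wait"` and the agent choosing `"tamper"`, the result is about 0.2 for `k=1.0` and about 0.35 for `k=0.5`.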
