by Jessica Taylor 528 days ago
Hmm… I'm finding that I can't write down a simple shutdown problem in this framework (e.g. an environment where the agent should switch between maximizing paperclips and shutting down) in order to analyze what this algorithm does. To know what the algorithm does, I need to know what $$P$$ and $$\hat{P}$$ are, since these are parameters of the algorithm. From those I can derive $$P'$$ and $$\hat{P}'$$, which determine the agent's action. But at the moment I have no way of proceeding, since I don't know what $$P$$ and $$\hat{P}$$ are. Can you get me unstuck?

by Stuart Armstrong 522 days ago
Suppose the humans have already decided whether to press the shutdown button or to order the AI to maximise paperclips. Let $$o_s$$ be the observation of the shutdown command, $$o_p$$ the observation of the paperclip-maximising command, and $$u_s$$ and $$u_p$$ the corresponding utilities. Then $$P$$ can be defined by $$P(u_s|h_{m-1}o_s)=1$$ and $$P(u_p|h_{m-1}o_p)=1$$, for all histories $$h_{m-1}$$. Next, define $$\widehat{P}$$ as the probability of $$o_s$$ versus $$o_p$$, conditional on the agent following a particular deterministic policy $$\pi^0$$. If the agent does indeed follow $$\pi^0$$, then $$\widehat{P}=\widehat{P}'$$. If it deviates from this policy, then $$\widehat{P}'$$ is altered in proportion to the expected change in $$\widehat{P}$$ caused by choosing a different action.
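To make these definitions concrete, here is a minimal Python sketch of $$P$$ and $$\widehat{P}$$ for this toy environment. The string labels, the default action `"wait"`, and `human_model` (with its assumed shutdown probability of 0.25) are hypothetical stand-ins, not part of the framework. The sketch only covers the on-policy case, where $$\widehat{P}'=\widehat{P}$$; it does not attempt the off-policy adjustment of $$\widehat{P}'$$.

```python
from typing import Callable, Dict, Tuple

History = Tuple[str, ...]  # a history h_{m-1}, modelled here as a tuple of labels

# Hypothetical labels for the two commands and the two utilities.
O_S, O_P = "o_s", "o_p"  # shutdown command, paperclip-maximising command
U_S, U_P = "u_s", "u_p"  # shutdown utility, paperclip utility


def P(history: History, obs: str) -> Dict[str, float]:
    """P(u | h_{m-1} o): probability 1 on u_s after o_s and on u_p after o_p,
    for every history h_{m-1} (so the history argument is deliberately unused)."""
    if obs == O_S:
        return {U_S: 1.0, U_P: 0.0}
    if obs == O_P:
        return {U_S: 0.0, U_P: 1.0}
    raise ValueError(f"unknown observation: {obs!r}")


def P_hat(
    history: History,
    pi0: Callable[[History], str],
    human_model: Callable[[History, str], float],
) -> Dict[str, float]:
    """Probability of o_s versus o_p, conditional on the agent taking the
    action that the fixed deterministic policy pi0 prescribes at this history.

    human_model(history, action) -> P(o_s) is a hypothetical stand-in for the
    environment; if the humans have already decided, it ignores the action.
    """
    action = pi0(history)  # the action pi^0 would take here
    p_s = human_model(history, action)
    return {O_S: p_s, O_P: 1.0 - p_s}


def default_policy(history: History) -> str:
    """Hypothetical deterministic policy pi^0: always take the action 'wait'."""
    return "wait"


def humans_already_decided(history: History, action: str) -> float:
    """Hypothetical human model: shutdown was chosen with probability 0.25,
    independently of the agent's action."""
    return 0.25


if __name__ == "__main__":
    print(P((), O_S))  # {'u_s': 1.0, 'u_p': 0.0}
    print(P_hat((), default_policy, humans_already_decided))  # {'o_s': 0.25, 'o_p': 0.75}
```

Because the humans have already decided in this setup, the hypothetical `human_model` ignores the action; the `pi0` argument only matters in variants where the agent's behaviour can influence which command is given.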