Indifference and compensatory rewards discussion post by Stuart Armstrong 216 days ago | discuss A putative new idea for AI control; index here. It’s occurred to me that there is a framework where we can see all “indifference” results as corrective rewards, both for the utility function change indifference and for the policy change indifference. Imagine that the agent has reward $$R_0$$ and is following policy $$\pi_0$$, and we want to change it to having reward $$R_1$$ and following policy $$\pi_1$$. Then the corrective reward we need to pay it, so that it doesn’t attempt to resist or cause that change, is simply the difference between the two expected values: $$V(R_0|\pi_0)-V(R_1|\pi_1)$$, where $$V$$ is the agent’s own valuation of the expected reward, conditional on the policy. This explains why off-policy reward-based agents are already safely interruptible: since we change the policy, not the reward, $$R_0=R_1$$. And since off-policy agents have value estimates that are indifferent to the policy followed, $$V(R_0|\pi_0)=V(R_1|\pi_1)$$, and the compensatory rewards are zero.

### NEW DISCUSSION POSTS

Note that the problem with
 by Vadim Kosoy on Open Problems Regarding Counterfactuals: An Introd... | 0 likes

Typos on page 5: *
 by Vadim Kosoy on Open Problems Regarding Counterfactuals: An Introd... | 0 likes

Ah, you're right. So gain
 by Abram Demski on Smoking Lesion Steelman | 0 likes

> Do you have ideas for how
 by Jessica Taylor on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

I think I understand what
 by Wei Dai on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

>You don’t have to solve
 by Wei Dai on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

 by Vadim Kosoy on Delegative Inverse Reinforcement Learning | 0 likes

My confusion is the
 by Tom Everitt on Delegative Inverse Reinforcement Learning | 0 likes

> First of all, it seems to
 by Abram Demski on Smoking Lesion Steelman | 0 likes

> figure out what my values
 by Vladimir Slepnev on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

I agree that selection bias
 by Jessica Taylor on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

>It seems quite plausible
 by Wei Dai on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

> defending against this type
 by Paul Christiano on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

2. I think that we can avoid
 by Paul Christiano on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

I hope you stay engaged with
 by Wei Dai on Autopoietic systems and difficulty of AGI alignmen... | 0 likes