Indifference and compensatory rewards

discussion post by Stuart Armstrong 492 days ago

A putative new idea for AI control; index here.

It’s occurred to me that there is a framework where we can see all “indifference” results as corrective rewards, both for the utility function change indifference and for the policy change indifference.

Imagine that the agent has reward $$R_0$$ and is following policy $$\pi_0$$, and we want to change it to having reward $$R_1$$ and following policy $$\pi_1$$.

Then the corrective reward we need to pay it, so that it doesn’t attempt to resist or cause that change, is simply the difference between the two expected values:

$$V(R_0|\pi_0)-V(R_1|\pi_1)$$,

where $$V$$ is the agent’s own valuation of the expected reward, conditional on the policy.

This explains why off-policy reward-based agents are already safely interruptible: since we change the policy, not the reward, $$R_0=R_1$$. And since off-policy agents have value estimates that are indifferent to the policy followed, $$V(R_0|\pi_0)=V(R_1|\pi_1)$$, and the compensatory rewards are zero.
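
As a rough illustration of the formula $$V(R_0|\pi_0)-V(R_1|\pi_1)$$, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and is not from the post: the two-state deterministic MDP, the discount factor `GAMMA`, the reward table `R0`, the policies `PI0` and `PI1`, and the helper names. In particular, `policy_independent_value` uses value iteration as a stand-in for an off-policy learner's value estimate, which is one way (not necessarily the author's) to model an estimate that does not depend on the policy actually followed.

```python
from typing import Dict, Tuple

State = int
Action = str
Reward = Dict[Tuple[State, Action], float]
Policy = Dict[State, Action]

GAMMA = 0.9  # assumed discount factor
ACTIONS = ("stay", "go")

# Hypothetical deterministic transitions: "go" switches state, "stay" does not.
TRANSITIONS: Dict[Tuple[State, Action], State] = {
    (0, "stay"): 0, (0, "go"): 1,
    (1, "stay"): 1, (1, "go"): 0,
}


def policy_value(reward: Reward, policy: Policy, start: State = 0,
                 horizon: int = 200) -> float:
    """V(R | pi): discounted return from following `policy`, truncated at `horizon`."""
    total, state, discount = 0.0, start, 1.0
    for _ in range(horizon):
        action = policy[state]
        total += discount * reward[(state, action)]
        state = TRANSITIONS[(state, action)]
        discount *= GAMMA
    return total


def policy_independent_value(reward: Reward, start: State = 0,
                             sweeps: int = 200) -> float:
    """Value-iteration estimate; it never looks at the policy being followed."""
    values = {0: 0.0, 1: 0.0}
    for _ in range(sweeps):
        values = {
            s: max(reward[(s, a)] + GAMMA * values[TRANSITIONS[(s, a)]]
                   for a in ACTIONS)
            for s in values
        }
    return values[start]


def compensatory_reward(value_before: float, value_after: float) -> float:
    """The corrective payment: V(R_0 | pi_0) - V(R_1 | pi_1)."""
    return value_before - value_after


# Reward 1 for staying in state 1, zero otherwise (illustrative).
R0: Reward = {(0, "stay"): 0.0, (0, "go"): 0.0, (1, "stay"): 1.0, (1, "go"): 0.0}
R1: Reward = R0  # only the policy changes, so R_0 = R_1

PI0: Policy = {0: "go", 1: "stay"}    # moves to state 1 and collects reward
PI1: Policy = {0: "stay", 1: "stay"}  # e.g. an interrupted agent stuck in state 0

# Agent whose valuation depends on the policy: compensation is non-zero.
on_policy_comp = compensatory_reward(policy_value(R0, PI0), policy_value(R1, PI1))

# Agent whose valuation ignores the policy: compensation vanishes.
off_policy_comp = compensatory_reward(policy_independent_value(R0),
                                      policy_independent_value(R1))
```

Under these assumptions, the policy-dependent agent would need a payment of roughly 9 (the discounted reward it loses by being held in state 0), while the policy-independent estimate gives the same value before and after the change, so its compensation is zero.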
