Indifference and compensatory rewards

Discussion post by Stuart Armstrong, 102 days ago

It’s occurred to me that there is a framework where we can see all “indifference” results as corrective rewards, both for indifference to utility function changes and for indifference to policy changes.

Imagine that the agent has reward $$R_0$$ and is following policy $$\pi_0$$, and we want to change it to having reward $$R_1$$ and following policy $$\pi_1$$. Then the corrective reward we need to pay it, so that it doesn’t attempt to resist or cause that change, is simply the difference between the two expected values: $$V(R_0|\pi_0)-V(R_1|\pi_1)$$, where $$V$$ is the agent’s own valuation of the expected reward, conditional on the policy. With this payment, the agent’s total expected value after the change equals its expected value before it.

This explains why off-policy reward-based agents are already safely interruptible. Interruption changes the policy, not the reward, so $$R_0=R_1$$. And since off-policy agents have value estimates that are indifferent to the policy followed, $$V(R_0|\pi_0)=V(R_1|\pi_1)$$, and the compensatory rewards are zero.
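
To make the arithmetic concrete, here is a minimal Python sketch of the compensatory reward in a one-step setting. Everything in it is an illustrative assumption rather than part of the post: the action names, the reward numbers, and the two valuation functions `on_policy_value` and `off_policy_value`. In particular, it models an off-policy agent as one that values a reward by its best available action regardless of the policy actually followed, which is a simplification.

```python
from typing import Callable, Dict

Reward = Dict[str, float]   # hypothetical: action name -> reward
Policy = Dict[str, float]   # hypothetical: action name -> probability


def on_policy_value(reward: Reward, policy: Policy) -> float:
    """V(R | pi) for an agent whose valuation depends on the policy followed."""
    return sum(prob * reward[action] for action, prob in policy.items())


def off_policy_value(reward: Reward, policy: Policy) -> float:
    """V(R | pi) for an agent whose valuation ignores the policy followed.

    Assumption: the estimate is the value of the best action, as in a
    one-step Q-learning-style estimate. The policy argument is unused.
    """
    return max(reward.values())


def compensatory_reward(
    value: Callable[[Reward, Policy], float],
    r0: Reward, pi0: Policy,
    r1: Reward, pi1: Policy,
) -> float:
    """Corrective reward V(R_0 | pi_0) - V(R_1 | pi_1)."""
    return value(r0, pi0) - value(r1, pi1)


# Illustrative numbers only.
r0 = {"work": 1.0, "idle": 0.0}
r1 = {"work": 0.0, "idle": 0.5}
pi0 = {"work": 1.0, "idle": 0.0}
pi1 = {"work": 0.0, "idle": 1.0}

# Changing both reward and policy: the agent is paid the shortfall.
c_change = compensatory_reward(on_policy_value, r0, pi0, r1, pi1)

# Interrupting (policy change only, R_0 = R_1) an on-policy valuer.
c_on_policy = compensatory_reward(on_policy_value, r0, pi0, r0, pi1)

# Interrupting an off-policy valuer: no compensation is needed.
c_off_policy = compensatory_reward(off_policy_value, r0, pi0, r0, pi1)

print(c_change, c_on_policy, c_off_policy)
```

Under these assumed numbers, the on-policy valuer needs a nonzero payment when interrupted, while the off-policy valuer’s compensatory reward is zero, matching the argument above.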
