Indifference and compensatory rewards
discussion post by Stuart Armstrong 67 days ago

It has occurred to me that there is a framework in which all "indifference" results can be seen as corrective rewards, for both utility-function-change indifference and policy-change indifference.

Imagine that the agent has reward $$R_0$$ and is following policy $$\pi_0$$, and we want to change it to having reward $$R_1$$ and following policy $$\pi_1$$. Then the corrective reward we need to pay it, so that it doesn't attempt to resist or cause that change, is simply the difference between the two expected values: $$V(R_0|\pi_0)-V(R_1|\pi_1)$$, where $$V$$ is the agent's own valuation of the expected reward, conditional on the policy.

This explains why off-policy reward-based agents are already safely interruptible. Since we change the policy, not the reward, $$R_0=R_1$$. And since off-policy agents have value estimates that are indifferent to the policy followed, $$V(R_0|\pi_0)=V(R_1|\pi_1)$$, so the compensatory reward is zero.
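To make the formula concrete, here is a minimal sketch in Python. It is an illustration under strong simplifying assumptions, not a model from the post: the environment is a one-step, two-action problem, and the action names, the reward numbers, and the functions `on_policy_value`, `off_policy_value` and `compensatory_reward` are all hypothetical. The off-policy valuation is modelled as the greedy value (the maximum reward over actions), which is a crude stand-in for how a Q-learning-style agent values its situation independently of the policy it is actually made to follow.

```python
from typing import Callable, Dict

Reward = Dict[str, float]  # action -> reward (hypothetical one-step setting)
Policy = Dict[str, float]  # action -> probability of taking that action


def on_policy_value(reward: Reward, policy: Policy) -> float:
    """V(R | pi) for an agent that values the policy it actually follows."""
    return sum(prob * reward[action] for action, prob in policy.items())


def off_policy_value(reward: Reward, policy: Policy) -> float:
    """V(R | pi) for an agent whose estimate ignores the followed policy.

    Assumption: modelled as the greedy value, so `policy` is unused.
    """
    return max(reward.values())


def compensatory_reward(
    value: Callable[[Reward, Policy], float],
    r_old: Reward,
    pi_old: Policy,
    r_new: Reward,
    pi_new: Policy,
) -> float:
    """Corrective reward V(R_0 | pi_0) - V(R_1 | pi_1)."""
    return value(r_old, pi_old) - value(r_new, pi_new)


# Interruption: the reward stays the same (R_0 = R_1), only the policy changes.
R0 = {"work": 1.0, "halt": 0.0}
pi0 = {"work": 1.0, "halt": 0.0}  # original policy: always work
pi1 = {"work": 0.0, "halt": 1.0}  # interrupted policy: always halt

on_policy_comp = compensatory_reward(on_policy_value, R0, pi0, R0, pi1)
off_policy_comp = compensatory_reward(off_policy_value, R0, pi0, R0, pi1)

print("on-policy compensation:", on_policy_comp)
print("off-policy compensation:", off_policy_comp)
```

In this toy setting the on-policy agent loses one unit of expected reward when interrupted, so it would need to be paid that amount to be indifferent, while the agent with the policy-independent valuation needs no compensation at all.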
