Intelligent Agent Foundations Forumsign up / log in
Indifference and compensatory rewards
discussion post by Stuart Armstrong 157 days ago | discuss

A putative new idea for AI control; index here.

It’s occurred to me that there is a framework where we can see all “indifference” results as corrective rewards, both for the utility function change indifference and for the policy change indifference.

Imagine that the agent has reward \(R_0\) and is following policy \(\pi_0\), and we want to change it to having reward \(R_1\) and following policy \(\pi_1\).

Then the corrective reward we need to pay it, so that it doesn’t attempt to resist or cause that change, is simply the difference between the two expected values:

  • \(V(R_0|\pi_0)-V(R_1|\pi_1)\),

where \(V\) is the agent’s own valuation of the expected reward, conditional on the policy.

This explains why off-policy reward-based agents are already safely interruptible: since we change the policy, not the reward, \(R_0=R_1\). And since off-policy agents have value estimates that are indifferent to the policy followed, \(V(R_0|\pi_0)=V(R_1|\pi_1)\), and the compensatory rewards are zero.



NEW LINKS

NEW POSTS

NEW DISCUSSION POSTS

RECENT COMMENTS

A few thoughts: I agree
by Sam Eisenstat on Some Criticisms of the Logical Induction paper | 0 likes

Thanks, so to paraphrase your
by Wei Dai on Current thoughts on Paul Christano's research agen... | 0 likes

> Why does Paul think that
by Paul Christiano on Current thoughts on Paul Christano's research agen... | 0 likes

Given that ALBA was not meant
by Wei Dai on Current thoughts on Paul Christano's research agen... | 0 likes

Thank you for writing this.
by Wei Dai on Current thoughts on Paul Christano's research agen... | 1 like

I mostly agree with this
by Paul Christiano on Current thoughts on Paul Christano's research agen... | 2 likes

>From my perspective, I don’t
by Johannes Treutlein on Smoking Lesion Steelman | 2 likes

Replying to Rob. I don't
by Vadim Kosoy on Some Criticisms of the Logical Induction paper | 0 likes

Replying to Rob. Actually,
by Vadim Kosoy on Some Criticisms of the Logical Induction paper | 0 likes

Replying to 240 (I can't
by Vadim Kosoy on Some Criticisms of the Logical Induction paper | 0 likes

Yeah, you're right. This
by Vadim Kosoy on Smoking Lesion Steelman | 1 like

The non-smoke-loving agents
by Abram Demski on Smoking Lesion Steelman | 1 like

Replying to "240" First,
by Vadim Kosoy on Some Criticisms of the Logical Induction paper | 0 likes

Clarification: I'm not the
by Tarn Somervell Fletcher on Some Criticisms of the Logical Induction paper | 0 likes

Alex, the difference between
by Vadim Kosoy on Some Criticisms of the Logical Induction paper | 1 like

RSS

Privacy & Terms