by Stuart Armstrong 66 days ago

> but the agent incorrectly observes the action

It’s a bit annoying that this has to rely on an incorrect observation. Why not replace the human action, in state $$s_2$$, with a simple automated system that chooses $$a_1^H$$? It’s an easy mistake to make while programming, and the agent has no fundamental understanding of the difference between the human and an imperfect automated system.

Basically, if the human acts in perfect accordance with their preferences, and if the agent correctly observes and learns this, the agent will converge on the right values. You illustrate wireheading by removing “correctly observes”, but I think removing “human acts in perfect accordance with their preferences” makes a better example of wireheading.
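To make the contrast concrete, here is a toy sketch in Python. It is not the model from the parent post: the two preference hypotheses, the noise rate `eps`, and every function name are assumptions chosen for illustration. The agent does a Bayesian update over which action the human prefers, assuming each observed action reflects the human's preference with probability `1 - eps`. When the action source is swapped for an automated system that always picks $$a_1^H$$, the agent cannot tell the difference and becomes confident in the wrong hypothesis.

```python
def posterior_prefers_a1(actions, prior=0.5, eps=0.1):
    """P(human prefers a1 | observed actions), by repeated Bayes updates.

    Assumed observation model (illustrative only): an observed action
    matches the human's true preference with probability 1 - eps.
    """
    p = prior
    for action in actions:
        like_a1 = 1 - eps if action == "a1" else eps  # P(action | prefers a1)
        like_a2 = 1 - eps if action == "a2" else eps  # P(action | prefers a2)
        p = like_a1 * p / (like_a1 * p + like_a2 * (1 - p))
    return p


def faithful_human(true_preference, n):
    """A human who acts in perfect accordance with their preference."""
    return [true_preference] * n


def automated_system(n):
    """Hypothetical stand-in that always outputs a1, whatever the human wants."""
    return ["a1"] * n


# The human truly prefers a2 in both scenarios.
p_faithful = posterior_prefers_a1(faithful_human("a2", 20))
p_automated = posterior_prefers_a1(automated_system(20))

print(f"faithful human:   P(prefers a1) = {p_faithful:.3g}")
print(f"automated system: P(prefers a1) = {p_automated:.3g}")
```

In the first scenario the posterior on "prefers $$a_1^H$$" goes to nearly zero, which is correct. In the second it goes to nearly one, even though every observation was accurate: what failed was the assumption that the observed actions track the human's preferences.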
