Intelligent Agent Foundations Forum
by Stuart Armstrong 401 days ago

“but the agent incorrectly observes the action”

It’s a bit annoying that this has to rely on an incorrect observation. Why not replace the human action, in state \(s_2\), with a simple automated system that chooses \(a_1^H\)? It’s an easy mistake to make while programming, and the agent has no fundamental understanding of the difference between the human and an imperfect automated system.

Basically, if the human acts in perfect accordance with their preferences, and if the agent correctly observes and learns from this, the agent will converge on the right values. You introduce wireheading by removing “correctly observes”, but I think removing “the human acts in perfect accordance with their preferences” is a better example of wireheading.
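
To make the symmetry concrete, here is a minimal sketch (a hypothetical two-reward toy; the reward names, likelihoods, and function names are invented for illustration and are not from the original post). Corrupting the observation channel and corrupting the human's policy produce exactly the same data stream, so the agent's learned values go wrong in exactly the same way:

```python
# Toy sketch: two ways the value-learning setup can break.
# Either the sensor misreports the human's action, or the human
# systematically acts against their own preferences.

REWARDS = ["A", "B"]
TRUE_REWARD = "A"   # the human's actual preference


def human_action(preferred, rational=True):
    """A rational human takes the action matching their preferred reward.
    An 'irrational' human (drugs, propaganda) systematically takes the other one."""
    if rational:
        return preferred
    return "B" if preferred == "A" else "A"


def observe(action, corrupted=False):
    """The agent's sensor. A corrupted channel misreports the action."""
    if corrupted:
        return "B" if action == "A" else "A"
    return action


def learn(n_steps, human_rational=True, sensor_corrupted=False):
    """Bayesian updating over which reward the human prefers, assuming
    (wrongly, in the corrupted cases) honest observations and a rational human."""
    belief = {r: 0.5 for r in REWARDS}
    for _ in range(n_steps):
        obs = observe(human_action(TRUE_REWARD, human_rational), sensor_corrupted)
        # Likelihood model: a human preferring r takes the matching action 90% of the time.
        for r in REWARDS:
            belief[r] *= 0.9 if obs == r else 0.1
        total = sum(belief.values())
        belief = {r: p / total for r, p in belief.items()}
    return belief


if __name__ == "__main__":
    print("clean setup:      ", learn(20))
    print("corrupted sensor: ", learn(20, sensor_corrupted=True))
    print("irrational human: ", learn(20, human_rational=False))
```

In both corrupted runs the agent ends up confident in the wrong reward, and nothing in its data distinguishes which link of the chain was broken, which is the sense in which the two modifications are interchangeable from the agent's point of view.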



by Tom Everitt 318 days ago

Adversarial examples for neural networks make it plausible that the agent could misinterpret the human's action.

But it is true that a situation where the human acts irrationally in some state (e.g. because of drugs or propaganda) could be modeled in much the same way.

I preferred the sensory error since it doesn't require an irrational human. Perhaps I should have been clearer that I'm interested in the agent wireheading itself (in some sense) rather than the wireheading of the human.
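
One way to write that distinction (using notation loosely modelled on the post's \(s_2\) and \(a_1^H\); the symbols \(o\) and \(\pi^*\) are added here for illustration, where \(\pi^*\) is the policy that acts in perfect accordance with the human's preferences and \(o\) is the agent's observation of the human action \(a^H\)):

\[
\begin{aligned}
\text{intended setup:} \quad & a^H = \pi^*(s), \quad o = a^H \\
\text{agent wireheads itself (sensory error):} \quad & a^H = \pi^*(s), \quad o \neq a^H \\
\text{human is wireheaded (irrational human):} \quad & a^H \neq \pi^*(s), \quad o = a^H
\end{aligned}
\]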

(Sorry for being slow to reply – I didn’t get notified about the comments.)



