by Stuart Armstrong 638 days ago

Wireheading the human is the ultimate goal of the AI. I chose heroin as the first step along those lines, but wireheading is where the human would ultimately end up. For instance, once the human is on heroin, the AI could ask, “Is your true reward function $$r$$? If you answer yes, you’ll get heroin.” Under the assumption that the human is rational and the heroin offered is short term, this allows the AI to conclude that the human’s reward function is any given $$r$$.

by Jessica Taylor 638 days ago

I strongly predict that if you make your argument really precise (as you did in the main post), it will have a visible flaw in it. In particular, I expect the fact that $$r$$ and $$r - 1000$$ are indistinguishable to prevent the argument from going through (though it’s hard to say exactly how this applies without having access to a sufficiently mathematical argument).
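
The indistinguishability of $$r$$ and $$r - 1000$$ can be illustrated with a minimal sketch. This is an editorial illustration, not part of either commenter's argument. It assumes a toy setting with three hypothetical actions, made-up per-step rewards, a fixed horizon, and a discount factor of 0.9. Subtracting a constant from every reward lowers the return of every plan by the same amount, so the optimal plan does not change, and behavior alone cannot tell the two reward functions apart.

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor, for illustration only
ACTIONS = ("work", "rest", "heroin")  # hypothetical action set

# Hypothetical per-step rewards; the numbers are invented for this sketch.
R = {"work": 1.0, "rest": 0.5, "heroin": 2.0}
# The same reward function shifted down by a constant 1000.
R_SHIFTED = {action: value - 1000.0 for action, value in R.items()}


def discounted_return(reward, plan):
    """Discounted sum of per-step rewards along a fixed sequence of actions."""
    return sum(GAMMA ** t * reward[action] for t, action in enumerate(plan))


def best_plan(reward, horizon=3):
    """Brute-force search for the action sequence with the highest return."""
    return max(
        product(ACTIONS, repeat=horizon),
        key=lambda plan: discounted_return(reward, plan),
    )


plan_r = best_plan(R)
plan_shifted = best_plan(R_SHIFTED)

# A rational agent behaves identically under both reward functions.
print(plan_r == plan_shifted)
```

Because the shift contributes the same constant to every plan of a given length, any inference procedure that looks only at the human's choices would assign $$r$$ and $$r - 1000$$ the same evidence.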
