by Ryan Carey 205 days ago

[Note: This comment was written three years after the post.]

The “obvious idea” here unfortunately seems not to work, because it is vulnerable to so-called “infinite improbability drives”. Suppose $$B$$ is a shutdown button, and $$P(b|e)$$ gives some weight to both $$B=pressed$$ and $$B=unpressed$$. Then the AI will benefit from selecting a $$Q$$ that always chooses an action $$a$$ in which it enters a lottery and, if it does not win, the button $$B$$ is pushed. In this circumstance, $$P(b|e)$$ is unchanged, while both $$P(c|b=pressed,a,e)$$ and $$P(c|b=unpressed,a,e)$$ allocate almost all of the probability to great $$C$$ outcomes. So the approach will create an AI that wants to exploit its ability to determine $$B$$.
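To make the failure mode concrete, here is a minimal toy sketch. It assumes the factored objective has the form $$\sum_b P(b|e)\,E[U \mid b,a,e]$$, with $$P(b|e)$$ held fixed regardless of the action; that form, the 50/50 prior, the win probability, and the utilities are all illustrative assumptions, not taken from the post. The toy only ties the unpressed branch to winning, which is already enough to make the lottery action look far better than it really is.

```python
from collections import defaultdict

# Hypothetical numbers, chosen only for illustration.
P_WIN = 1e-6                                      # chance of winning the lottery
PRIOR_B = {"pressed": 0.5, "unpressed": 0.5}      # assumed fixed P(b|e)
UTILITY = {"great": 1.0, "ok": 0.1, "lose": 0.0}  # assumed utilities over C

def conditional_utility(joint):
    """E[U | b] for each button state b, from a joint {(b, c): prob}."""
    mass = defaultdict(float)
    weighted = defaultdict(float)
    for (b, c), p in joint.items():
        mass[b] += p
        weighted[b] += p * UTILITY[c]
    return {b: weighted[b] / mass[b] for b in mass if mass[b] > 0}

def factored_score(joint):
    """Assumed objective: sum_b P(b|e) * E[U | b, a, e], with P(b|e) fixed."""
    cond = conditional_utility(joint)
    return sum(PRIOR_B[b] * cond.get(b, 0.0) for b in PRIOR_B)

def true_expected_utility(joint):
    """Ordinary expected utility under the action's actual joint distribution."""
    return sum(p * UTILITY[c] for (_, c), p in joint.items())

# Default action: the button is independent of the outcome, which is always "ok".
default_joint = {("pressed", "ok"): 0.5, ("unpressed", "ok"): 0.5}

# Lottery action: the button stays unpressed only if the AI wins.
lottery_joint = {("unpressed", "great"): P_WIN, ("pressed", "lose"): 1.0 - P_WIN}

score_default = factored_score(default_joint)
score_lottery = factored_score(lottery_joint)
eu_lottery = true_expected_utility(lottery_joint)

print("factored score, default:", score_default)
print("factored score, lottery:", score_lottery)
print("true expected utility, lottery:", eu_lottery)
```

Under these assumptions the factored score ranks the lottery action above the default even though its true expected utility is close to zero, because conditioning on $$B=unpressed$$ selects exactly the worlds in which the lottery was won, while the fixed $$P(b|e)$$ still gives that branch substantial weight.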
