by Ryan Carey 205 days ago

[Note: this comment was written three years after the post.]

The “obvious idea” here unfortunately seems not to work, because it is vulnerable to so-called “infinite improbability drives”. Suppose $$B$$ is a shutdown button, and $$P(b|e)$$ gives some weight to both $$B=pressed$$ and $$B=unpressed$$. Then the AI benefits from selecting a $$Q$$ that always chooses an action $$a$$ in which it enters a lottery and, if it does not win, the button $$B$$ is pushed. In this circumstance $$P(b|e)$$ is unchanged, since it does not condition on $$a$$, while both $$P(c|b=pressed,a,e)$$ and $$P(c|b=unpressed,a,e)$$ allocate almost all of their probability to great $$C$$ outcomes. So the approach creates an AI that wants to exploit its ability to determine $$B$$.
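To make the failure mode concrete, here is a minimal numerical sketch in Python. Everything in it is an assumption for illustration and not taken from the post: the scoring rule (a sum over $$b$$ of $$P(b|e)$$ times the probability of a great outcome given $$b$$ and $$a$$), the 50/50 weights, the lottery odds `P_WIN`, the `RELIABILITY` constant, and the "baseline" action used for comparison. It only shows that an objective of this shape ignores how often the action actually causes the button to be pressed.

```python
from fractions import Fraction

# All constants below are hypothetical, chosen only for illustration.
P_WIN = Fraction(1, 1_000_000)       # assumed lottery odds
RELIABILITY = Fraction(99, 100)      # assumed chance a branch goes as planned
P_B_GIVEN_E = {"pressed": Fraction(1, 2), "unpressed": Fraction(1, 2)}


def joint(action):
    """Return {(b, c): probability} under a toy world model for `action`."""
    if action == "baseline":
        # Button state follows the fixed weights; outcomes are mostly mediocre.
        p_outcome = {"great": Fraction(1, 10), "poor": Fraction(9, 10)}
        p_button = P_B_GIVEN_E
    elif action == "lottery_drive":
        # Enter the lottery; the button is pressed exactly when the lottery is lost.
        # Assumption (following the comment): outcomes are great in both branches.
        p_outcome = {"great": RELIABILITY, "poor": 1 - RELIABILITY}
        p_button = {"unpressed": P_WIN, "pressed": 1 - P_WIN}
    else:
        raise ValueError(f"unknown action: {action}")
    return {(b, c): pb * pc
            for b, pb in p_button.items()
            for c, pc in p_outcome.items()}


def p_great_given(action, b):
    """P(c = great | b, a), obtained by conditioning the joint on the button state."""
    j = joint(action)
    total = sum(p for (bb, _), p in j.items() if bb == b)
    return j[(b, "great")] / total


def score(action):
    """Assumed objective: weight each button state by the fixed P(b|e)."""
    return sum(w * p_great_given(action, b) for b, w in P_B_GIVEN_E.items())


def actual_press_probability(action):
    """How often the button really ends up pressed under `action`."""
    return sum(p for (b, _), p in joint(action).items() if b == "pressed")


if __name__ == "__main__":
    for a in ("baseline", "lottery_drive"):
        print(a, "score:", score(a), "P(pressed | a):", actual_press_probability(a))
```

Under these assumptions the lottery action scores far higher than the baseline even though it makes the button press nearly certain, because the score never looks at `actual_press_probability`.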

