Intelligent Agent Foundations Forum
The universal prior is malign
link by Paul Christiano 234 days ago | Ryan Carey, Vadim Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments


by Paul Christiano 224 days ago | link

I’m curious about the extent to which people:

  • agree with this argument,
  • expect to find a form of induction that avoids this problem (e.g. by incorporating the anthropic update),
  • expect to completely avoid anything like the universal prior (e.g. via UDT).


by Vadim Kosoy 209 days ago | link

I think that the problem is worse than what you believe. You seem to think it only applies to exotic AI designs that “depend on the universal prior,” but I think this problem naturally arises in most realistic AI designs.

Any realistic AI has to be able to effectively model its environment, even though the environment is much more complex than the AI itself and cannot be emulated directly inside the AI. This means that the AI will make the sort of predictions that would result from a process that “reasons abstractly about the universal prior.” Indeed, if there is a compelling reason to believe that an alien superintelligence Mu has strong incentives to simulate me, then it seems rational for me to believe that, with high probability, I am inside Mu’s simulation. In these conditions it seems that any rational agent (including a relatively rational human) would make decisions as if it assigns high probability to being inside Mu’s simulation.

I don’t see how UDT solves the problem. Yes, if I already know my utility function, then UDT tells me that, if many copies of me are inside Mu’s simulation, I should still behave as if I am outside the simulation, since the copies outside the simulation have much more influence on the universe. We don’t even need fully fledged UDT for that. As long as the simulation hypotheses have much lower utility variance than normal hypotheses, normal hypotheses will win despite lower probability. The problem is that the AI doesn’t a priori know the correct utility function, and whatever process it uses to discover that function is going to be attacked by Mu. For example, if the AI is doing IRL, Mu will “convince” the AI that what looks like a human is actually a “muman”, something that only pretends to be human in order to take over the IRL process, whereas its true values are Mu-ish.
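To make the utility-variance point concrete, here is a toy numerical sketch. All probabilities, utilities, and names (`hypotheses`, `act_normally`, `obey_mu`) are made-up assumptions for illustration, not part of the original argument. The simulation hypothesis gets 99% of the probability mass, but the agent's choice barely changes the outcome there, while under the low-probability "real world" hypothesis the choice matters a lot.

```python
# Toy illustration only: every number below is an assumption chosen for clarity.
# Each hypothesis maps to (probability, {action: utility under that hypothesis}).
hypotheses = {
    # High probability, but the action makes almost no difference (low utility variance).
    "simulation": (0.99, {"act_normally": 1.00, "obey_mu": 1.01}),
    # Low probability, but the action makes a large difference (high utility variance).
    "real_world": (0.01, {"act_normally": 100.0, "obey_mu": 0.0}),
}

ACTIONS = ["act_normally", "obey_mu"]


def expected_utility(action):
    """Probability-weighted utility of an action across all hypotheses."""
    return sum(prob * utils[action] for prob, utils in hypotheses.values())


def best_action():
    """Action with the highest expected utility."""
    return max(ACTIONS, key=expected_utility)


if __name__ == "__main__":
    for action in ACTIONS:
        print(action, expected_utility(action))
    print("best:", best_action())
```

With these assumed numbers, the real-world hypothesis dominates the decision even though it has only 1% of the probability, because the stakes there are far larger. This only works when the utility function is already fixed, which is exactly the assumption questioned in the rest of this comment.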


by Paul Christiano 207 days ago | link

Re: UDT solving the problem, I agree with what you say. UDT fixes some possible problems, but something like the universal prior still plays a role in all credible proposals for recovering a utility function.


by Paul Christiano 207 days ago | link

I agree that for now, this problem is likely to be a deal-breaker for any attempt to formally analyze any AI.

We may disagree about the severity of the problem or how likely it is to disappear once we have a deeper understanding. But we probably both agree that it is a pain point for current theory, so it’s not clear our disagreements are action-relevant.



