Intelligent Agent Foundations Forum
A note on misunderstanding the boundaries of models
discussion post by Stuart Armstrong

This is a reminder for me as much as for anyone. I’m often posting a bunch of ideas and impossibility results, but it’s important to remember that those results only apply within their models. And the models likely don’t match up exactly with either reality or what we want/need.

For instance, there are the various no-free-lunch theorems (any two optimization algorithms are equivalent when their performance is averaged across all possible problems), and Rice's theorem (no algorithm can decide a non-trivial semantic property for all programs).
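
To make the first of these concrete, here is a minimal sketch (not from the original post) of the no-free-lunch observation on a toy search space; the search space, query orders, and performance measure are all illustrative choices. Because the average runs over every possible function, any two query orders see exactly the same collection of value sequences, and so score identically on any measure that depends only on the observed values.

```python
from itertools import product

# A minimal sketch of the no-free-lunch observation on a toy search space.
# Over *all* functions f: X -> {0, 1}, any two fixed query orders see the
# same multiset of value sequences, so any performance measure that depends
# only on the observed values averages out identically.
# (The search space, orders, and performance measure are illustrative choices.)

X = [0, 1, 2]  # tiny search space
all_functions = [dict(zip(X, values)) for values in product([0, 1], repeat=len(X))]

def queries_to_find_a_one(f, order):
    """Queries needed until a point with value 1 is seen (len(order) if there is none)."""
    for i, x in enumerate(order, start=1):
        if f[x] == 1:
            return i
    return len(order)

def average_performance(order):
    return sum(queries_to_find_a_one(f, order) for f in all_functions) / len(all_functions)

print(average_performance([0, 1, 2]))  # 1.75
print(average_performance([2, 0, 1]))  # 1.75 -- identical once averaged over every f
```

The equality holds only because the average runs over all eight functions; drop some of them and the two orders can come apart.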

These results stop biting once we note that we have strong evidence that we live in a very specific environment and only have to solve a small class of problems, and that most programs we meet are (or can be) designed to be clear and understandable, rather than being selected at random.
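
The same point can be made for Rice's theorem with a minimal sketch, again with an illustrative toy "program" class of my own choosing: the theorem blocks deciding non-trivial semantic properties of arbitrary programs, but a property like "does it ever return zero" becomes trivially checkable once we restrict attention to total, pure functions over a small finite input range.

```python
# A minimal sketch of how Rice's theorem loses its bite on a restricted class.
# The property "does this program ever return 0?" is undecidable for programs
# in general, but for a toy class -- total, pure functions over a small finite
# input range -- we can decide it by exhaustive evaluation.
# (The class of "programs" and the input range are illustrative choices.)

def ever_returns_zero(program, inputs=range(100)):
    """Decidable here only because each 'program' is a total pure function
    over a small, finite set of inputs."""
    return any(program(x) == 0 for x in inputs)

print(ever_returns_zero(lambda x: x * x - 49))  # True  (x = 7)
print(ever_returns_zero(lambda x: x * x + 1))   # False
```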

Conversely, we have results on how to use Oracles safely. However, these results fall apart once you consider acausal trade. The initial results still hold: the design is safe if the multiple iterations of the AI don't care about each other's reward. But the acausal trade example demonstrates that that assumption can fail in ways we didn't expect.

Both of these cases stem from misunderstanding the boundaries of the model. For the no-free-lunch theorems, we might informally be thinking "of course we want the AI to be good at things in general", without realising that this doesn't perfectly match up with "good performance across every environment", because "every environment" includes weird, pathological, and highly random environments. Similarly, "don't have the AIs care about each other's reward" seems like something that could be achieved with simple programming and boxing approaches, but what we actually achieve is "AIs that don't care about each other's rewards in the conventional human understanding of those terms".

The boundary of the model did not lie where we thought it did.


