Intelligent Agent Foundations Forum
by Paul Christiano 915 days ago

Prior to working more on decision theory or thin priors, it seems worth clearly fleshing out the desiderata that a naive account (say a task/act-based AI that uses CDT) fails to satisfy.

You say: “[Son of X] is very opaque. There is an extra level of indirection, where you can’t just directly reason about what agent X will do. Instead [you] have to reason about what agent X will modify into, which gives you a new agent, which you probably understand much less than you understand agent X, and then you have to reason about what that new agent will do. Second, it is unmotivated. If you had a good reason to like Son of X, you would probably not be calling it Son of X.”

But if we trust X’s decisions, then shouldn’t we trust its decision to replace itself with Son of X? And if we don’t trust X’s decisions, aren’t we in trouble anyway? Why do we need to reason about Son of X, any more than we need to reason about the other ways in which the agent will self-modify, or even other decisions the agent will make?
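
To make the shape of that argument concrete, here is a minimal Python sketch (the names `Agent`, `ReplaceSelf`, and `trusted` are hypothetical, not anyone’s actual proposal): if one per-decision trust check covers everything X does, then the choice to hand control to Son of X is just one more checked decision, not a new object of analysis.

```python
# A minimal sketch (all names hypothetical) of the argument above: if the same
# trust check is applied to every decision X makes, then the decision to hand
# control to a successor is covered by that check like any other action, and
# "Son of X" never needs a separate analysis.
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Act:
    """An ordinary action taken in the world."""
    name: str


@dataclass
class ReplaceSelf:
    """The agent replaces itself with a successor ("Son of X")."""
    successor: "Agent"


Action = Union[Act, ReplaceSelf]


@dataclass
class Agent:
    decide: Callable[[str], Action]  # maps a state to the agent's chosen action


def run(agent: Agent, states: List[str],
        trusted: Callable[[Agent, str, Action], bool]) -> None:
    """Run the agent, applying one trust check to every decision it makes."""
    for state in states:
        action = agent.decide(state)
        if not trusted(agent, state, action):
            raise RuntimeError(f"untrusted decision in state {state!r}")
        if isinstance(action, ReplaceSelf):
            # Self-modification was itself a checked decision; the successor is
            # then subject to the same per-decision scrutiny going forward.
            agent = action.successor


if __name__ == "__main__":
    son_of_x = Agent(decide=lambda s: Act("noop"))
    x = Agent(decide=lambda s: ReplaceSelf(son_of_x) if s == "upgrade" else Act("noop"))
    # A trivially permissive check, standing in for whatever lets us "trust X's decisions".
    run(x, ["observe", "upgrade", "observe"], trusted=lambda a, s, act: True)
```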

I agree this makes thinking about Son of X unsatisfying; we should just be thinking about X. But I don’t see why this makes building X problematic. I’m not sure whether you are claiming it does, but other people seem to believe something like that, so I thought I’d respond here.

I agree that there is a certain perspective on which this situation is unsatisfying. And using a suboptimal decision theory / too thick a logical prior in the short term will certainly involve some cost (this is a special case of my ongoing debate with Wei Dai about the urgency of philosophical problems). But it seems to me that this is far less pressing than other possible problems, e.g. how to do aligned search or aligned induction in the regime with limited unlabelled data.

These other problems: (a) seem like they kill us by default in a much stronger sense, (b) seem necessary for both of MIRI’s agendas as well as my own, and I suspect will be necessary quite generally, (c) seem pretty approachable, and (d) don’t really seem to be made easier by progress on decision theory, logical induction, the construction of a thin prior, etc.

(Actually I can see how a good thin prior would help with the induction problem, but it leads to a quite different perspective and a different set of desiderata.)

On the flip side, I think there is a reasonably good chance that problems like decision theory will be obsoleted by a better understanding of how to build task/act-based AI (and I feel like for the most part people haven’t convincingly engaged with those arguments).


