Intelligent Agent Foundations Forum
by Vladimir Nesov 657 days ago

> It seems to me like for the people to get stuck you have to actually imagine there is some particular level they reach where they can't find any further way to self-improve.

For philosophy, levels of ability are not comparable, because the problems to be solved are not sufficiently well formulated. Approximate one-day humans (as in HCH) will formulate different values from accurate ten-year humans, not just be worse at elucidating them. So perhaps you could re-implement cognition starting from approximate one-day humans, but the values of the resulting process won't be like mine.

Approximate short-lived humans may be useful for building a task AI that lets accurate long-lived humans (ems) work on the problem, but that AI must also allow them to make decisions. It can't be trusted to prevent what it considers to be a mistake, and so it can't guard the world from AI risk posed by the long-lived humans, because they are necessary for formulating values. The risk is that "getting our house in order" outlaws philosophical progress and prevents changing things based on considerations that the risk-prevention sovereign doesn't accept. So the scope of the "house" that is being kept in order must be limited: there should be people working on alignment who are not constrained.

I agree that working on philosophy seems hopeless/inefficient at this point, but that doesn't resolve the issue; it just makes it necessary to reduce the problem to setting up a very long-term alignment research project, (initially) performed by accurate long-lived humans and guarded from current risks, so that this project can do the work on philosophy. If this step is not in the design, very important things could be lost, things we currently don't even suspect. A messy task AI could be part of setting up the environment that makes this happen (for example, by enforcing the absence of AI or nanotech outside the research project). Your writing gives me hope that this is indeed possible. Perhaps this is sufficient to survive long enough to run a principled sovereign capable of enacting the values eventually computed by an alignment research project (encoded in its goal), even where the research project comes up with philosophical considerations that the designers of the sovereign didn't see (as in this comment). Perhaps this task AI can make the first long-lived accurate uploads, using approximate short-lived human predictions as initial building blocks. Even avoiding the interim sovereign altogether is potentially an option, if the task AI is good enough at protecting the alignment research project from the world, although that comes with astronomical opportunity costs.