Intelligent Agent Foundations Forum
by Wei Dai, 816 days ago

> Do you have a current best guess at an architecture that will be most amenable to us applying metaphilosophical insights to avoid value drift?

Interesting question. I guess it depends on the form in which the metaphilosophical knowledge arrives, but it's currently hard to see what that form could be. I can only think of two possibilities, and neither seems highly plausible.

  1. It comes as a set of instructions that humans (or emulations/models of humans) can use to safely and correctly process philosophical arguments, along with justifications for why those instructions are safe/correct; something like a detailed design for meta-execution, together with theory/evidence for why it works. But natural language is fuzzy and imprecise, and humans are full of unknown security holes, so it's hard to see how such instructions could possibly make us safe/correct, or what kind of information could make us confident of that. (A toy sketch of the shape this would take appears after this list.)

  2. It comes as a set of algorithms for reasoning about philosophical problems in a formal language, along with instructions/algorithms for translating natural-language philosophical problems/arguments into that formal language, and justifications for why these are all safe/correct. But this kind of result seems very far from our current knowledge base, and it doesn't seem compatible with any of the current trends in AI design (including deep learning, decision-theory-based ideas, and Paul's kinds of designs). (A second toy sketch of this pipeline also appears below.)
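
To make the shape of 1 concrete, here is a toy sketch. Every name in it, and the split-at-the-midpoint rule, are placeholders I'm making up, not anyone's actual design; in a real meta-execution design the human (or model of one) would produce a semantic decomposition. It illustrates only the recursive structure in which no single step processes the whole argument, not how any step could be made safe or correct, which is exactly the part that's missing:

```python
from dataclasses import dataclass

@dataclass
class Task:
    question: str
    budget: int  # remaining levels of decomposition allowed

def overseer_step(task):
    """One bounded step by a human (or model of one): either answer a
    small question directly, or split it into pieces without resolving
    it. Splitting at the midpoint is a placeholder for a real semantic
    decomposition, which the human/model would produce."""
    if task.budget == 0 or len(task.question) < 40:
        return ("answer", f"<judgment on: {task.question.strip()!r}>")
    mid = len(task.question) // 2
    return ("subtasks", [Task(task.question[:mid], task.budget - 1),
                         Task(task.question[mid:], task.budget - 1)])

def meta_execute(task):
    """Recursively apply overseer steps; no single step ever sees the
    whole argument, which is where the hoped-for safety would come from."""
    kind, payload = overseer_step(task)
    if kind == "answer":
        return payload
    return " + ".join(meta_execute(sub) for sub in payload)

print(meta_execute(Task("Is moral realism true, and if so, what follows "
                        "for how an AI should treat human values?", budget=2)))
```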
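
And a deliberately trivial illustration of the pipeline shape in 2, using propositional logic as a stand-in for the (so far nonexistent) formal language and a rigid pattern-matcher as a stand-in for the (so far nonexistent) translator; anything the translator doesn't recognize is treated as an opaque atomic claim, and all reasoning happens over the formal objects, never over raw natural language:

```python
from itertools import product

def translate(sentence):
    """Hypothetical NL -> formal translator: only understands
    'if P then Q'; everything else becomes an opaque atom."""
    s = sentence.strip().rstrip(".").lower()
    if s.startswith("if ") and " then " in s:
        p, q = s[3:].split(" then ", 1)
        return ("implies", p.strip(), q.strip())
    return ("atom", s)

def atoms(f):
    return {f[1]} if f[0] == "atom" else {f[1], f[2]}

def holds(f, valuation):
    if f[0] == "atom":
        return valuation[f[1]]
    return (not valuation[f[1]]) or valuation[f[2]]  # material implication

def valid_inference(premises, conclusion):
    """Truth-table check: does the conclusion hold in every valuation
    in which all the premises hold?"""
    forms = [translate(p) for p in premises]
    concl = translate(conclusion)
    vs = sorted(set().union(*(atoms(f) for f in forms + [concl])))
    for values in product([False, True], repeat=len(vs)):
        valuation = dict(zip(vs, values))
        if all(holds(f, valuation) for f in forms) and not holds(concl, valuation):
            return False
    return True

# Modus ponens goes through; an unsupported leap does not.
print(valid_inference(["If consequentialism is true then we should maximize welfare",
                       "consequentialism is true"],
                      "we should maximize welfare"))   # True
print(valid_inference(["consequentialism is true"],
                      "we should maximize welfare"))   # False
```

The enormous gap between this toy and what 2 actually requires, a translator and reasoner that capture philosophical content, with justified safety/correctness, is the point.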

So I’m not very optimistic that a metaphilosophical approach will succeed either. If it ultimately does, it seems like maybe there will have to be some future insights whose form I can’t foresee. (Edit: Either that or a lot of time free from arms-race pressure to develop the necessary knowledge base and compatible AI design for 2.)


