Intelligent Agent Foundations Forum
by David Krueger 562 days ago

I really agree with #2 (and I think with #1, as well, but I’m not as sure I understand your point there).

I’ve been trying to convince people that there will be strong trade-offs between safety and performance, and have been surprised that this doesn’t seem obvious to most… but I haven’t really considered that “efficient aligned AIs almost certainly exist as points in mindspace”. In fact I’m not sure I agree 100% (basically because of “Moloch” (http://slatestarcodex.com/2014/07/30/meditations-on-moloch/)).

I think “trying to find and pursue other approaches to solving the “AI risk” problem, especially ones that don’t require the same preconditions in order to succeed” remains perhaps the most important thing to do; do you have anything in particular in mind? Personally, I tend to think that we ought to address the coordination problem head-on and attempt a solution before AGI really “takes off”.



by Paul Christiano 561 days ago

> I’ve been trying to convince people that there will be strong trade-offs between safety and performance

What do you see as the best arguments for this claim? I haven’t seen much public argument for it and am definitely interested in seeing more. I definitely grant that it’s prima facie plausible (as is the alternative).

Some caveats:

It’s obvious there are trade-offs between safety and performance in the usual sense of “safety.” But we are interested in a special kind of failure, where a failed system ends up controlling a significant share of the entire universe’s resources (rather than e.g. causing an explosion), and it’s less obvious that preventing such failures necessarily requires a significant cost.

It’s also obvious that there is an additional cost to be paid in order to solve control, e.g. consider the fact that we are currently spending time on it. But the question is how much additional work needs to be done. Does building aligned systems require 1000% more work? 10%? 0.1%? I don’t see why it should be obvious that this number is on the order of 100% rather than 1%.

Similarly for performance costs. I’m willing to grant that an aligned system will be more expensive to run. But is that cost an extra 1000% or an extra 0.1%? Both seem quite plausible. From a theoretical perspective, the question is whether the required overhead is linear or sublinear.

I haven’t seen strong arguments for the “linear overhead” side, and my current guess is that the answer is sublinear. But again, both positions seem quite plausible.
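One way to make the linear/sublinear distinction concrete is sketched below. The notation is an assumption introduced for illustration and does not appear in the thread: $C(n)$ stands for the cost of running an unaligned system at some capability level $n$, $A(n)$ for the cost of an aligned system at the same level, and $O(n)$ for the overhead.

```latex
% Assumed notation (not from the original comment):
%   C(n) = cost of an unaligned system at capability level n
%   A(n) = cost of an aligned system at the same level
\[
  O(n) = A(n) - C(n)
\]
% "Linear overhead": the overhead grows in proportion to the base cost,
% for example with some fixed constant c > 0:
\[
  O(n) = c \, C(n)
  \quad\Longrightarrow\quad
  \frac{A(n)}{C(n)} = 1 + c
\]
% "Sublinear overhead": the overhead grows strictly slower than the base cost:
\[
  O(n) = o\bigl(C(n)\bigr)
  \quad\Longrightarrow\quad
  \frac{A(n)}{C(n)} \to 1 \quad \text{as } n \to \infty
\]
```

Under this reading, “an extra 1000%” corresponds to $c = 10$ and “an extra 0.1%” to $c = 0.001$ in the linear case, while in the sublinear case the percentage overhead shrinks toward zero as systems scale.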

(There are currently a few major obstructions to my approach that could plausibly give a tight theoretical argument for linear overhead, such as the translation example in the discussion with Wei Dai. In the past such obstructions have ended up seeming surmountable, but I think that it is totally plausible that eventually one won’t. And at that point I hope to be able to make clean statements about exactly what kind of thing we can’t hope to do efficiently+safely / exactly what kinds of additional assumptions we would have to make / what the key obstructions are).

> Personally, I tend to think that we ought to address the coordination problem head-on

I think this is a good idea and a good project, which I would really like to see more people working on. In the past I may have seemed more dismissive and if so I apologize for being misguided. I’ve spent a little bit of time thinking about it recently and my feeling is that there is a lot of productive and promising work to do.

My current guess is that AI control is the more valuable thing for me personally to do, though I could imagine being convinced out of this.

I feel that AI control is valuable given that (a) it has a reasonable chance of succeeding even if we can’t solve these coordination problems, and (b) convincing evidence that the problem is hard would be a useful input into getting the AI community to coordinate.

If you managed to get AI researchers to effectively coordinate around conditionally restricting access to AI (if it proved to be dangerous), then that would seriously undermine argument (b). I believe that a sufficiently persuasive/charismatic/accomplished person could probably do this today.

If I ended up becoming convinced that AI control was impossible, this would undermine argument (a) (though hopefully that impossibility argument could itself be used to satisfy desideratum (b)).
