Intelligent Agent Foundations Forumsign up / log in
by Paul Christiano 338 days ago | link | parent

If such a recipe existed for a project, that doesn’t mean it’s not problematic. For example, suppose there are projects A and B, each of which had such a recipe. It’s still possible that using intermediate products of both A and B, one could build a more efficient unaligned AI but not a competitive aligned AI

My hope is to get technique A working, then get technique B working, and then get A+B working, and so on, prioritizing based on empirical guesses about what combinations will end up being deployed in practice (and hoping to develop general understanding that can be applied across many programs and combinations). I expect that in many cases, if you can handle A and B you can handle A+B, though some interactions will certainly introduce new problems. This program doesn’t have much chance of success without new abstractions.

indirectly by allowing adversarial messages to reach the human from other AIs, it clearly can’t be considered benign

This is definitely benign on my accounting. There is a further question of how well you do in conflict. A window is benign but won’t protect you from inputs that will drive you crazy. The hope is that if you have an AI that is benign + powerful than you may be OK.

directly through its own actions

If the agent is trying to implement deliberation in accordance with the user’s preferences about deliberation, then I want to call that benign. There is a further question of whether we mess up deliberation, which could happen with or without AI. We would like to set things up in such a way that we aren’t forced to deliberate earlier than we would otherwise want to. (And this included in the user’s preferences about deliberation, i.e. a benign AI will be trying to secure for the user the option of deliberating later, if the user believes that deliberating later is better than deliberating in concert with the AI now.)

Malign just means “actively optimizing for something bad,” the hope is to avoid that, but this doesn’t rule out other kinds of problems (e.g. causing deliberation to go badly due to insufficient competence, blowing up the world due to insufficient competence, etc.)

Overall, my current best guess is that this disagreement is better to pursue after my research program is further along, we know things like whether “benign” makes sense as an abstraction, I have considered some cases where benign agents necessarily seem to be less efficient, and so on.

I am still interested in arguments that might (a) convince me to not work on this program, e.g. because I should be working on alternative social solutions, or (b) convince others to work on this program, e.g. because they currently don’t see how it could succeed but might work on it if they did, or (c) which clarify the key obstructions for this research program.



NEW LINKS

NEW POSTS

NEW DISCUSSION POSTS

RECENT COMMENTS

This is exactly the sort of
by Stuart Armstrong on Being legible to other agents by committing to usi... | 0 likes

When considering an embedder
by Jack Gallagher on Where does ADT Go Wrong? | 0 likes

The differences between this
by Abram Demski on Policy Selection Solves Most Problems | 0 likes

Looking "at the very
by Abram Demski on Policy Selection Solves Most Problems | 0 likes

Without reading closely, this
by Paul Christiano on Policy Selection Solves Most Problems | 1 like

>policy selection converges
by Stuart Armstrong on Policy Selection Solves Most Problems | 0 likes

Indeed there is some kind of
by Vadim Kosoy on Catastrophe Mitigation Using DRL | 0 likes

Very nice. I wonder whether
by Vadim Kosoy on Hyperreal Brouwer | 0 likes

Freezing the reward seems
by Vadim Kosoy on Resolving human inconsistency in a simple model | 0 likes

Unfortunately, it's not just
by Vadim Kosoy on Catastrophe Mitigation Using DRL | 0 likes

>We can solve the problem in
by Wei Dai on The Happy Dance Problem | 1 like

Maybe it's just my browser,
by Gordon Worley III on Catastrophe Mitigation Using DRL | 2 likes

At present, I think the main
by Abram Demski on Looking for Recommendations RE UDT vs. bounded com... | 0 likes

In the first round I'm
by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

Fine with it being shared
by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

RSS

Privacy & Terms