Intelligent Agent Foundations Forum
New circumstances, new values?
Discussion post by Stuart Armstrong, 22 days ago

A putative new idea for AI control; index here.

Quick, is there anything wrong with a pleasant, low-intensity, ten-minute conversation with someone we happen to disagree with?

Our moral intuitions say no, as do our legal system and most philosophical or political ideals since the Enlightenment.

Quick, is there anything wrong with brainwashing people into perfectly obedient sheep, willing and eager to take any orders and betray all their previous ideals?

There’s a bit more disagreement there, but that is generally seen as a bad thing.

But what happens when the low-intensity conversation and the brainwashing are the same thing? At the moment, no human can overwhelm most other humans in the course of ten minutes of talking and rewrite their goals into anything else. But an AI may well be capable of doing so: people have certainly fallen in love in less than ten minutes, and we don’t know how “hard” this is to pull off, in some absolute sense.

This is a warning that relying on revealed and stated preferences or meta-preferences won’t be enough. Our revealed and (most) stated preferences are that the ten-minute conversation is probably ok. But disentangling how much of that “ok” relies on our understanding of the consequences will be a challenge.


