Intelligent Agent Foundations Forum
Humans are not agents: short vs long term
post by Stuart Armstrong 19 days ago | 2 comments

A putative new idea for AI control; index here.

This is an example of humans not being (idealised) agents.

Imagine a human who has a preference not to live beyond a hundred years. However, they also want to live until next year, and it's predictable that, every year they are alive, they will have the same desire to survive until the following year.


This human (not a completely implausible example, I hope!) has a contradiction between their long term and short term preferences. So which is their real preference? It seems we could resolve the conflict in favour of the short term ("live forever") or of the long term ("die after a century").
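As a minimal sketch (the function names and the horizon are purely illustrative, not from the post), here is the contradiction made concrete: chaining the short-term preference year by year never endorses stopping, while the stated long-term preference caps life at a hundred.

```python
LONG_TERM_CAP = 100  # stated preference: "do not live beyond a hundred years"

def wants_another_year(age: int) -> bool:
    """Short-term preference: every year alive, the same desire for one more year."""
    return True  # predictably the same, at any age

def short_term_resolution(horizon: int) -> int:
    """Resolve in favour of the short term: follow the yearly desire."""
    age = 0
    while age < horizon and wants_another_year(age):
        age += 1
    return age  # reaches any horizon: effectively "live forever"

def long_term_resolution() -> int:
    """Resolve in favour of the stated long-term preference."""
    return LONG_TERM_CAP

print(short_term_resolution(horizon=10_000))  # 10000 (unbounded in the limit)
print(long_term_resolution())                 # 100
```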

Now, at this point, maybe we could appeal to meta-preferences: what would the human themselves want, if they could choose? But these meta-preferences are often un- or under-formed, and can be influenced by how the question or debate is framed.

Specifically, suppose we are scheduling this human's agenda. We have the choice of making them meet one of two philosophers (not meeting anyone is not an option). If they meet Professor R. T. Long, he will advise them to follow their long term preferences. If instead they meet Paul Kurtz, he will advise them to pay attention to their short term preferences. Whichever one they meet, they will argue with him for a while and then settle on the recommended preference resolution. And they will not change that resolution, whomever they meet subsequently.

Since we are doing the scheduling, we effectively control the human's meta-preferences on this issue. What should we do, and what principles should we use to decide? We are trying to maximise human preferences, but we can also control what they are (and have to control what they are, through our choice of which philosopher they meet first).
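A toy sketch of the dynamics (all names hypothetical, just restating the scenario above): the human settles on whatever the first philosopher recommends, subsequent meetings change nothing, so the scheduler's one choice fixes the "true" preference it is then supposed to maximise.

```python
from typing import Optional

def meet(philosopher: str, settled: Optional[str]) -> str:
    """The human settles on the first recommendation and never revises it."""
    if settled is not None:
        return settled  # subsequent meetings change nothing
    return {"R. T. Long": "long-term", "Paul Kurtz": "short-term"}[philosopher]

# Whichever philosopher we schedule first determines the resolution:
for first, second in [("R. T. Long", "Paul Kurtz"), ("Paul Kurtz", "R. T. Long")]:
    settled = meet(first, None)      # the settling meeting
    settled = meet(second, settled)  # has no further effect
    print(f"{first} first -> {settled}")
# R. T. Long first -> long-term
# Paul Kurtz first -> short-term
```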

It’s clear that this can apply to AIs: if they are simultaneously aiding humans and learning their preferences, they will have multiple opportunities for this sort of preference-shaping.




