Intelligent Agent Foundations Forum
Our values are underdefined, changeable, and manipulable
discussion post by Stuart Armstrong 22 days ago

A putative new idea for AI control; index here.

When asked whether “communist” journalists could report freely from the USA, only 36% of 1950 Americans agreed. A follow-up question about American journalists reporting freely from the USSR got 66% agreement. When the order of the questions was reversed, 90% were in favour of American journalists - and an astounding 73% in favour of the communist ones.

There are many examples of survey responses depending on question order, or subtle issues of phrasing.

So there are people whose answers depended on question order. What then are the “true” values of these individuals?

Underdetermined values

I think the best way of characterising their values is to call them “underdetermined”. There were/are presumably some people for whom universal freedom of the press or strict national security were firm and established values. But for most, there were presumably some soft versions of freedom of the press and nationalism, and the first question triggered one narrative more strongly than the other. What, then, are their “real” values? That’s the wrong question - akin to asking if Argentina really won the 1986 World Cup.
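As a toy illustration of this picture (a sketch only: the numeric strengths, the `priming_boost` parameter and every name below are made up for the example, not taken from the survey or from any model of it), imagine each respondent holding two soft values, with the first question boosting whichever narrative it triggers:

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Hypothetical strengths of two "soft" values, on an arbitrary 0-1 scale.
    press_freedom: float
    nationalism: float
    # Assumed boost given to whichever narrative the first question triggers.
    priming_boost: float = 0.3


def allows_communist_reporters(r: Respondent, first_question: str) -> bool:
    """Answer to the "communist journalists in the USA" question,
    given which question the survey asked first."""
    press, nation = r.press_freedom, r.nationalism
    if first_question == "american_in_ussr":
        # Asking about American reporters first triggers the press-freedom narrative.
        press += r.priming_boost
    elif first_question == "communist_in_usa":
        # Asking about communist reporters first triggers the national-security narrative.
        nation += r.priming_boost
    else:
        raise ValueError(f"unknown question: {first_question}")
    return press > nation


firm = Respondent(press_freedom=0.9, nationalism=0.1)
soft = Respondent(press_freedom=0.5, nationalism=0.5)

for name, person in [("firm", firm), ("soft", soft)]:
    answers = {q: allows_communist_reporters(person, q)
               for q in ("communist_in_usa", "american_in_ussr")}
    print(name, answers)
```

In this sketch the “firm” respondent gives the same answer under both orderings, while the “soft” respondent’s answer flips with the order. Nothing in the respondent’s two numbers picks out one of those answers as the real one.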

Politicians can change the opinions of a large sector of the voting public with a single pronouncement - were the people’s real opinions the ones before, or the ones after? Again, this seems to be the wrong question. But don’t people fret about this inconsistency? I’d wager that they aren’t really aware of this, because people are the most changeable on issues they’ve given the least thought to.

And rationalists and EAs are not immune to this - we presumably don’t shift much on what we identify as our core values, but on less important values, we’re probably as changeable as anyone. But such contingent values can become very strong if attacked, thus becoming a core part of our identity - even if it’s very plausible we could have held the opposite position in a slightly different world.

Frameworks and moral updating

People often rely on a small number of moral frameworks and principles to guide them. When a new moral issue arises, we generally try to fit it into a moral framework - and when there are multiple ones that could fit, we can go in multiple directions, driven by mood, bias, tribalism, and many other contingent factors.

The moral frameworks themselves can and do shift, due to issues like tribalism, cognitive dissonance, life experience, and our own self-analysis. Or the frameworks can accumulate so many exceptions or refinements that they transform in practice if not in name - it’s very interesting that my leftist opinions agree with Anders Sandberg’s libertarian opinions on most important issues. We seem to have changed positions without changing labels.

Metaethics

In a sense, you could see all of metaethics as the refinement and analysis of these frameworks. There are urges towards simplicity, to get a more stable and elegant system, and towards complexity, to capture the full spectrum of human values. Much of philosophical disagreement can be seen as “Given A, proposition B (generally acceptable conclusion) implies C (controversial position I endorse)”, to which the response is “C is wrong, thus A (or B) is wrong as stated and needs to be refined or denied” - the logic is generally accepted, but which position is kept varies.

Since ethical disagreements are rarely resolved, it’s likely that the positions of professional philosophers, though more consistent, are also often driven by contingent and random factors. The process is not completely random - ethical ideas that are the least coherent, like the moral foundation of purity, tend to get discarded - but is certainly contingent. As before, I argue you should focus on the procedure P by which philosophers update their opinions, rather than the (hypothetical) R to which P may be supposed to converge.

Most people, however, will not have consistent meta-ethics, as they haven’t considered these questions. So their meta-opinions there will be even more subject to random influences than their base-level opinions.

Future preferences

There is an urgent question dividing the future world: should local FLOOBS be allowed to restrict use of BLARGS, or should ORFOILS instead pressure COLATS to agree to FLAPPLE the SNARFS?

OK, we don’t currently know what future political issues will be, but it’s clear there will be new issues (how do we know this? Because nobody cares today whether Richard the Lionheart and Philip Augustus of France failed in their feudal duties to each other, nor did the people of that period worry much about medical tort reform). And people will take positions on them, and they will be incorporated into moral frameworks, causing those frameworks to change, and eventually philosophers may incorporate enough change into new metaethical frameworks.

I think it’s fair to say that our current positions on these future issues are even more under-determined than most of our values.

Contingent means manipulable

If our future values are determined by contingent facts, then a sufficiently powerful and intelligent agent can manipulate our values, by manipulating those facts. However, without some sort of learning-processes-with-contingent-facts, our values are underdetermined, and hence an agent that wanted to maximise human values/reward wouldn’t know what to do.
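A minimal sketch of that manipulation, under loudly stated assumptions: the two experiences, the values they lead to, and the reward numbers are all hypothetical, chosen only to make the incentive visible.

```python
# Hypothetical learning process: the value a person ends up holding
# depends on which contingent experience they happen to have.
LEARNED_VALUE = {
    "experience_a": "value_x",
    "experience_b": "value_y",
}

# Assumed maximum reward the agent can deliver under each value.
ACHIEVABLE_REWARD = {
    "value_x": 1.0,
    "value_y": 10.0,
}


def reward_if_arranged(experience: str) -> float:
    """Reward the agent can reach if it arranges this experience."""
    return ACHIEVABLE_REWARD[LEARNED_VALUE[experience]]


def manipulative_choice() -> str:
    """An agent told to maximise "whatever the human ends up valuing"
    picks the experience that yields the most easily satisfied value."""
    return max(LEARNED_VALUE, key=reward_if_arranged)


print(manipulative_choice())
```

The agent here never acts against the human’s eventual values. It simply selects which values the human will come to have. Deleting `LEARNED_VALUE` does not help: without some such process the agent has no defined target at all.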

It was this realisation, that the agent could manipulate the values it was supposed to maximise, that caused me to look at ways of avoiding this.

Choices need to be made

We want a safe way to resolve the under-determination in human values, a task that gets more and more difficult as we move away from the usual world of today and into the hypothetical world that a superpowered AI could build.

But, precisely because of the under-determination, there are going to be multiple ways of resolving this safely. Which means that choices will need to be made as to how to do so. The process of making human values fully rigorous is not value-free.

(A minor example, that illustrated for me a tiny part of the challenge: does the way we behave when we’re drunk reveal our true values? And the answer: do you want it to? If there is a divergence in drunk and sober values, then accommodating drunk values is a decision - one that will likely be made sober.)


