Intelligent Agent Foundations Forum
Infinite ethics comparisons
post by Stuart Armstrong 803 days ago | Abram Demski likes this | 1 comment

Work done with Amanda Askell; the errors are mine.

It’s very difficult to compare utilities across worlds with infinite populations. For instance, it seems clear that world \(w_1\) is better than \(w_2\), if the numbers indicate the utilities of various agents:

  • \(w_1 = 1,0,1,0,1,0,1,0,1,0, \ldots\)
  • \(w_2 = 1,0,1,0,0,1,0,0,0,1, \ldots\)

However, up to relabelling of the agents, these two worlds are actually identical. For this post, we’ll only care about countable infinities of agents, and we’ll assume that all utilities must occupy a certain finite range. This means that \(\limsup\) and \(\liminf\) of utilities in a world are defined and finite, independently of the ordering of the agents in that world. For a world \(w\), label these as \(s(w)\) and \(i(w)\).
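As a minimal sketch of \(s(w)\) and \(i(w)\), assume a world takes only finitely many distinct utility values and is described by how many agents hold each value. The dict representation and the name `limsup_liminf` are illustrative and not from the post. Under that assumption, \(s(w)\) is the largest value held by infinitely many agents and \(i(w)\) is the smallest, so neither depends on how the agents are ordered.

```python
import math

# Hypothetical representation: utility value -> number of agents holding it
# (math.inf for infinitely many). Assumes finitely many distinct values.
def limsup_liminf(world):
    infinite_values = [u for u, count in world.items() if count == math.inf]
    if not infinite_values:
        raise ValueError("world must contain infinitely many agents")
    return max(infinite_values), min(infinite_values)

# w_1 and w_2 both contain infinitely many 1s and infinitely many 0s,
# so they are the same world up to relabelling.
w1 = {1: math.inf, 0: math.inf}
w2 = {1: math.inf, 0: math.inf}

s, i = limsup_liminf(w1)
print(s, i)        # 1 0
print(w1 == w2)    # True
```

Values held by only finitely many agents are ignored, since they cannot affect either limit.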

Unambiguous gains and losses

Then compare the following worlds, where \(\{a\}_{\omega}\) means there are infinitely many agents with utility \(a\):

  • \(w_3 = \{4\}_{\omega}, \{3\}_{\omega}, \{0\}_{\omega}\)
  • \(w_4 = \{4\}_{\omega}, \{2\}_{\omega}, \{0\}_{\omega}\)

It seems that \(w_3\) is better than \(w_4\), because the middle category is higher. But this is deceptive, as we’ll see.

Let’s restrict ourselves to actions that change the utilities of agents in worlds, without creating or removing any agents.

Given any such action \(a\), call \((k,k')\) the signature of \(a\) if \(k\) and \(-k'\) are the \(\limsup\) and \(\liminf\) of all utility changes caused by \(a\). If \(k'\leq 0\) and \(k>0\), we’ll call \(a\) an unambiguous gain; if \(k \leq 0\) and \(k'>0\), we’ll call \(a\) an unambiguous loss.

Then consider the action \(a\) that transforms \(w_4\) by moving all the agents at utility \(2\) to utility \(3\). This is certainly an unambiguous gain. But now consider a second action \(a'\) that sends all agents at utility \(2\) down to utility \(0\), and sends infinitely many agents at utility \(4\) down to utility \(3\) (while leaving infinitely many at utility \(4\)). This is certainly an unambiguous loss.

However, both actions will send \(w_4\) to \(w_3\). So it’s not clear at all which of these worlds is better than the other.
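The two actions can be checked mechanically. The sketch below assumes an action is a finite list of moves, each applied to some number of agents, and reads the signature as \((k,k')\) with \(k\) the \(\limsup\) and \(-k'\) the \(\liminf\) of the utility changes. The encoding and the names `signature`, `apply` and `classify` are hypothetical. The loss action starts from \(w_4\), sends the agents at utility \(2\) to \(0\), and sends infinitely many of the agents at \(4\) to \(3\).

```python
import math

# Hypothetical encoding of an action: a list of (old_utility, new_utility, count)
# moves, where count is the number of agents affected (math.inf for infinitely many).
# Assumes finitely many distinct moves, so the limsup/liminf of the utility changes
# are the max/min change among moves applied to infinitely many agents.
def signature(moves):
    changes = [new - old for old, new, count in moves if count == math.inf]
    k = max(changes)          # limsup of utility changes
    k_prime = -min(changes)   # minus the liminf of utility changes
    return k, k_prime

def apply(moves):
    """Return the resulting world as {utility: count}."""
    world = {}
    for _, new, count in moves:
        world[new] = world.get(new, 0) + count
    return world

def classify(moves):
    k, k_prime = signature(moves)
    if k > 0 and k_prime <= 0:
        return "unambiguous gain"
    if k <= 0 and k_prime > 0:
        return "unambiguous loss"
    return "neither"

inf = math.inf
# Both actions start from w_4: infinitely many agents at each of 4, 2 and 0.
gain = [(4, 4, inf), (2, 3, inf), (0, 0, inf)]
loss = [(4, 4, inf), (4, 3, inf), (2, 0, inf), (0, 0, inf)]

w3 = {4: inf, 3: inf, 0: inf}
print(apply(gain) == w3, signature(gain), classify(gain))
print(apply(loss) == w3, signature(loss), classify(loss))
```

Agents whose utility is unchanged count as a change of \(0\), which is why the gain has signature \((1,0)\) and the loss has signature \((0,2)\).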

Comparing infinite worlds

Define \(m(w)\) as \((s(w)+i(w))/2\), the average of \(\limsup\) and \(\liminf\) of \(w\). Then here are five ways \(<_i\) of comparing worlds, which allow finer and finer comparisons.

  1. If \(s(v) < i(w)\), then \(v <_1 w\).
  2. If \(s(w) \geq s(v)\), \(i(w) \geq i(v)\), and one of these inequalities is strict, then \(v <_2 w\).
  3. If \(m(v) < i(w)\), then \(v <_3 w\).
  4. If \(s(v) < m(w)\), then \(v <_{3'} w\).
  5. If \(m(v) < m(w)\), then \(v <_4 w\).

All these \(<_i\) are transitive preorders, with \(<_4\) being a total preorder. The \(<_2\) is a refinement of \(<_1\) (in that if \(v<_1 w\), then \(v <_2 w\)), as are the \(<_3\) and \(<_{3'}\). The \(<_4\) is a refinement of all of \(<_2\), \(<_3\), \(<_{3'}\), but none of these three is a refinement of another.
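As a rough sanity check of these refinement claims, the sketch below summarises each world by the pair \((i(w), s(w))\) and tests the implications by brute force on a small grid of sample intervals. The function names and the choice of grid are assumptions for illustration. A finite check of this kind can exhibit counterexamples, but it does not prove the positive claims.

```python
from itertools import product

# A world is summarised by the pair (i, s) = (liminf, limsup) of its utilities.
def m(w):
    i, s = w
    return (s + i) / 2

def lt1(v, w): return v[1] < w[0]
def lt2(v, w): return w[1] >= v[1] and w[0] >= v[0] and (w[1] > v[1] or w[0] > v[0])
def lt3(v, w): return m(v) < w[0]
def lt3p(v, w): return v[1] < m(w)
def lt4(v, w): return m(v) < m(w)

def refines(fine, coarse, worlds):
    """True if v coarse w implies v fine w on every pair of sample worlds."""
    return all(fine(v, w) for v, w in product(worlds, repeat=2) if coarse(v, w))

# Sample worlds: all integer intervals [i, s] with 0 <= i <= s <= 4.
worlds = [(i, s) for i in range(5) for s in range(i, 5)]

print(refines(lt2, lt1, worlds), refines(lt3, lt1, worlds), refines(lt3p, lt1, worlds))
print(refines(lt4, lt2, worlds), refines(lt4, lt3, worlds), refines(lt4, lt3p, worlds))
print(refines(lt3, lt2, worlds), refines(lt2, lt3, worlds))  # neither refines the other

w3 = w4 = (0, 4)   # both example worlds have liminf 0 and limsup 4
print(any(lt(w4, w3) or lt(w3, w4) for lt in (lt1, lt2, lt3, lt3p, lt4)))
```

The last line prints `False`: since \(w_3\) and \(w_4\) share the same \(\limsup\) and \(\liminf\), none of the five comparisons ranks one above the other.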

In fact, \(<_4\) is a minimal refinement of \(<_3\) and \(<_{3'}\). To see this, assume \(v <_4 w\), hence \(m(v) < m(w)\), and introduce \(u\), a world where everyone has a single utility at a value between \(m(v)\) and \(m(w)\). Then \(v <_{3} u <_{3'} w\).
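The construction of \(u\) can be made concrete. In this sketch worlds are again pairs \((i, s)\), and the constant utility of \(u\) is taken to be the midpoint of \(m(v)\) and \(m(w)\). That is one arbitrary choice, since any value strictly between them works. The example worlds and the name `intermediate_world` are hypothetical.

```python
def m(w):
    i, s = w
    return (i + s) / 2

def lt3(v, w):  return m(v) < w[0]   # m(v) < i(w)
def lt3p(v, w): return v[1] < m(w)   # s(v) < m(w)

def intermediate_world(v, w):
    """Constant-utility world u with m(v) < u < m(w); requires v <_4 w."""
    assert m(v) < m(w)
    c = (m(v) + m(w)) / 2     # the midpoint is one valid choice among many
    return (c, c)             # liminf and limsup coincide

v, w = (0, 4), (1, 5)        # m(v) = 2, m(w) = 3; worlds given as (i, s)
u = intermediate_world(v, w)
print(u, lt3(v, u), lt3p(u, w))   # (2.5, 2.5) True True
```

Here neither \(v <_3 w\) nor \(v <_{3'} w\) holds directly, yet the chain through \(u\) connects them.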

However, \(<_4\) is not a minimal refinement of \(<_2\) and \(<_3\) (or of \(<_2\) and \(<_{3'}\)). To see this, assume \(v <_3 u <_2 w\); then \(v <_3 w\), and similarly \(v <_2 u <_3 w\) implies \(v <_3 w\). Therefore the minimal refinement of \(<_2\) and \(<_3\) is simply the union of \(<_2\) and \(<_3\) (the same goes for \(<_2\) and \(<_{3'}\)).
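Spelling out the inequalities behind both implications, using only the definitions of \(<_2\) and \(<_3\) and the fact that \(v <_2 u\) forces \(m(v) < m(u)\):

\[ v <_3 u <_2 w \;\Rightarrow\; m(v) < i(u) \leq i(w), \]
\[ v <_2 u <_3 w \;\Rightarrow\; m(v) < m(u) < i(w). \]

In both cases \(m(v) < i(w)\), which is exactly \(v <_3 w\).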

Correspondence with actions and signatures

Those five methods of world comparisons correspond neatly to features of actions mapping between worlds. In particular:

  1. If \(v <_1 w\), then every action that changes \(v\) into \(w\) is an unambiguous gain.
  2. If \(v <_2 w\), then there is an unambiguous gain that changes \(v\) into \(w\), and no such action that is an unambiguous loss.
  3. If \(v <_3 w\), then \(i(v)<i(w)\), and if \(a\) is an action transforming \(v\) into \(w\) of signature \((k,k')\), then \(k>k'\).
  4. If \(v <_{3'} w\), then \(s(v)<s(w)\), and if \(a\) is an action transforming \(v\) into \(w\) of signature \((k,k')\), then \(k>k'\).
  5. If \(v <_4 w\), let \(K\) and \(K'\) be the minimums of \(k\) and \(k'\) over all actions \(a\) of signature \((k,k')\) that change \(v\) into \(w\); then \(K > K'\).

Note that \(i(v)<i(w)\) and \(s(v)<s(w)\) can be rephrased in terms of \(a\): in the first case, \(a\) must be an improvement for infinitely many of the lowest agents in \(v\); in the second case, infinitely many of the highest agents in \(w\) must have been improved by \(a\).
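The first correspondence can be checked directly. As a sketch, write \(\delta_x\) for the utility change of agent \(x\) under an action taking \(v\) to \(w\) (the symbols \(\delta_x\) and \(\epsilon\) are introduced here for illustration). Since utilities are bounded, for any \(\epsilon > 0\) all but finitely many agents have utility at most \(s(v)+\epsilon\) in \(v\) and at least \(i(w)-\epsilon\) in \(w\), so

\[ \liminf_x \delta_x \;\geq\; i(w) - s(v) \;>\; 0 \quad \text{whenever } v <_1 w. \]

With signature \((k,k')\) this gives \(-k' > 0\) and \(k \geq -k' > 0\), so the action is an unambiguous gain.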

These results can be displayed graphically here, with the blue line being the interval \([i(w),s(w)]\) for a world \(w\). The examples for \(<_1\) and \(<_2\) are pretty clear:

Here are \(<_3\) and \(<_{3'}\):

For \(<_4\), it helps to consider separately cases where the interval \([i(v),s(v)]\) is smaller or larger than \([i(w),s(w)]\):

Finer comparisons and unbounded utilities

We may be able to get finer comparisons than \(<_4\), for instance by looking at the fine structure of the utilities around \(s(w)\) and \(i(w)\).

Now let’s allow unbounded utilities for individual agents. If the utilities are unbounded in one direction only, then all \(<_i\) are still defined, as long as we allow infinity (or minus infinity) to be a valid value for \(s(w)\) (or \(i(w)\)), and take the average \(m(w)\) of an infinite value and a finite value to be equal to that infinite value.

If we allow unbounded utilities in both directions, then both \(s(w)\) and \(i(w)\) can become infinite. The \(<_1\) and \(<_2\) remain well defined, but not \(<_3\), \(<_{3'}\) or \(<_4\), since \(m(w)\) is not always defined. If we arbitrarily set \(m(w)\) to some fixed value when \(s(w) = \infty\), \(i(w)=-\infty\), then we can define all the \(<_i\).
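A minimal sketch of this extended \(m(w)\), using floating-point infinities. The `default` parameter stands for the arbitrary fixed value mentioned above. Its value of `0.0` is an assumption made for illustration, not a choice made in the post.

```python
import math

def m(i, s, default=0.0):
    """Midpoint of liminf i and limsup s, extended to infinite values.

    If exactly one endpoint is infinite, the midpoint is that infinite value.
    If i = -inf and s = +inf, return `default`, an arbitrary fixed value
    (0.0 here is an assumption for illustration).
    """
    if i == -math.inf and s == math.inf:
        return default
    return (i + s) / 2

print(m(0, 4))                    # 2.0
print(m(0, math.inf))             # inf
print(m(-math.inf, 3))            # -inf
print(m(-math.inf, math.inf))     # 0.0
```

The explicit guard is needed because `-math.inf + math.inf` evaluates to `nan`, which would make every comparison involving \(m(w)\) false.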



by Paul Christiano 802 days ago | Patrick LaVictoire likes this | link

It’s not clear whether we have or need to have preferences over worlds with countably infinitely many symmetric individuals.

At a minimum, it’s also worth noting that anthropic reasoning is problematic in such worlds (I guess equivalently). A framework which answers questions of the form “what do you expect to happen?” could also probably be used to answer ethical questions.

(This kind of exercise seems potentially worthwhile anyway. I’m not so sure whether it is particularly relevant to AI alignment, for orthogonal reasons—hoping to figure out our values in advance seems to be giving up the game, as does making the kind of structural assumptions about value that could come back to bite us if we were wrong about infinite ethics.)
