Strategies for coalitions in unit-sum games
post by Jessica Taylor 761 days ago | Patrick LaVictoire and Stuart Armstrong like this | 3 comments

I’m going to formalize some ideas related to my previous post about pursuing convergent instrumental goals without good priors and prove theorems about how much power a coalition can guarantee. The upshot is that, while non-majority coalitions can’t guarantee controlling a non-negligible fraction of the expected power, majority coalitions can guarantee controlling a large fraction of the expected power.


In a unit-sum game:

• there is some unknown environmental variable $$X \in \mathcal{X}$$ (in my previous posts, this would be the color of the sun)
• each of $$n$$ players submits an action $$A_i \in \mathcal{A}$$ (we could consider different action sets for each player but it doesn’t matter)
• player $$i$$ gets $$s_i(X, A_1, ..., A_n)$$ shares, where $$s_i$$ satisfies:
1. Non-negativity: $$s_i(X, A_1, ..., A_n) \geq 0$$
2. Unit sum: $$\sum_{i=1}^n s_i(X, A_1, ..., A_n) = 1$$

A unit-sum game is symmetric if, for any permutation $$p : \{1, ..., n\} \rightarrow \{1, ..., n\}$$, we have $$s_i(x, a_{p(1)}, ..., a_{p(n)}) = s_{p(i)}(x, a_1, ..., a_n)$$. That is, relabeling the players relabels their shares in the same way.
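To make the symmetry condition concrete, here is a brute-force checker for small finite games, written as a Python sketch. It assumes a game is given as a hypothetical function `shares(x, profile)` returning the list of all players' shares (players indexed from 0 rather than 1). For every permutation `p`, it builds the relabeled profile in which player `i` takes over player `p(i)`'s action and checks that player `i` then receives exactly the share player `p(i)` had. The two example games, `plurality` and `favor_first`, are illustrative assumptions, not taken from the post.

```python
from fractions import Fraction
from itertools import permutations, product


def is_symmetric(shares, xs, actions, n):
    """Brute-force check of the symmetry condition on a small finite game."""
    for x in xs:
        for a in product(actions, repeat=n):
            s = shares(x, list(a))
            for p in permutations(range(n)):
                # Relabeled profile: player i takes player p(i)'s action.
                b = [a[p[i]] for i in range(n)]
                t = shares(x, b)
                # Player i must now get the share player p(i) had before.
                if any(t[i] != s[p[i]] for i in range(n)):
                    return False
    return True


def plurality(x, profile):
    """Hypothetical symmetric game: split evenly among players of the most common action."""
    top = max(sorted(set(profile)), key=profile.count)  # ties go to the lowest action
    winners = profile.count(top)
    return [Fraction(1, winners) if a == top else Fraction(0) for a in profile]


def favor_first(x, profile):
    """Hypothetical non-symmetric game: the first player always gets everything."""
    return [Fraction(1)] + [Fraction(0)] * (len(profile) - 1)


print(is_symmetric(plurality, [0], (1, 2, 3), 3))  # True
print(is_symmetric(favor_first, [0], (1, 2), 3))   # False
```

The check is exponential in $$n$$, so it is only meant for sanity-checking toy games.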

A coalition in a unit-sum game is a set of players. If $$c \subseteq \{1, ..., n\}$$ is a coalition, then a policy for that coalition $$\pi : \Delta \mathcal{A}^c$$ is a probability distribution over joint actions, i.e. over assignments of an action to each player in that coalition. We will assume that the coalitions $$c_1, ..., c_m$$ partition the players: each player appears in exactly one coalition.

We will consider the expected amount of shares a coalition will get, based on the coalitions’ policies. Specifically, define $$r_j(x, \pi_1, ..., \pi_m) := \sum_{a_1 \in \mathcal{A}, ..., a_n \in \mathcal{A}} \left(\prod_{j'=1}^m \pi_{j'}(a_{c_{j'}}) \right) \sum_{i \in c_j} s_i(x, a_1, ..., a_n)$$ where $$a_{c_{j'}} : \mathcal{A}^{c_{j'}}$$ specifies the actions for the players in the coalition $$c_{j'}$$. In general, my goal in proving theorems will be to guarantee a high $$r_j$$ value for a coalition regardless of $$x$$.
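The definition of $$r_j$$ can be computed directly for small games by enumerating one joint action per coalition, as in the following Python sketch. Assumptions: players are indexed from 0, each policy is a dict from a tuple of actions (ordered like the coalition's member tuple) to its probability, and `shares(x, profile)` returns the list $$s_1, ..., s_n$$. The game `match_x` is a hypothetical example, not from the post.

```python
from fractions import Fraction
from itertools import product


def coalition_payoff(x, j, coalitions, policies, shares):
    """Compute r_j(x, pi_1, ..., pi_m) by exact enumeration.

    coalitions[j'] is a tuple of player indices (together a partition of 0..n-1);
    policies[j'] maps a tuple of actions for coalition j' to a probability.
    """
    n = sum(len(c) for c in coalitions)
    total = Fraction(0)
    # Picking one joint action per coalition fixes a_1, ..., a_n; its weight is
    # the product of the coalitions' probabilities. Zero-probability profiles
    # contribute nothing, so only the policies' supports are enumerated.
    for picks in product(*(pol.items() for pol in policies)):
        profile = [None] * n
        weight = Fraction(1)
        for members, (acts, p) in zip(coalitions, picks):
            weight *= p
            for player, a in zip(members, acts):
                profile[player] = a
        s = shares(x, profile)
        total += weight * sum(s[i] for i in coalitions[j])
    return total


def match_x(x, profile):
    """Hypothetical game: split evenly among players whose action equals x,
    or among everyone if nobody matches."""
    winners = [i for i, a in enumerate(profile) if a == x] or list(range(len(profile)))
    return [Fraction(1, len(winners)) if i in winners else Fraction(0)
            for i in range(len(profile))]


coalitions = ((0, 1), (2,))
policies = [{(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}, {(0,): Fraction(1)}]
print(coalition_payoff(0, 0, coalitions, policies, match_x))  # 1/3
print(coalition_payoff(0, 1, coalitions, policies, match_x))  # 2/3
```

Because the game is unit-sum, the $$r_j$$ values of all coalitions always add up to 1, which is a handy sanity check.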

A coalition containing a majority (>50%) of the players can, in some cases, gain an arbitrarily high fraction of the shares:

Theorem 1: For any $$\epsilon > 0$$ and $$n > 0$$, there exists a symmetric unit-sum game with $$n$$ players in which any coalition controlling a majority of the players can get at least $$1-\epsilon$$ expected shares.

Proof: Fix $$\epsilon > 0$$ and $$n > 0$$. Let $$k$$ be a positive integer such that $$1/k \leq \epsilon$$. Define the set of actions $$\mathcal{A} := \{1, ..., k\}$$. Define $$s_i$$ to split the shares evenly among the players who take the majority action, i.e. the action chosen by the most players (with ties being resolved towards lower actions). The variable $$X$$ is unused. This unit-sum game is clearly symmetric, since a player's shares depend only on its own action and on how many players take each action, not on the players' indices.

Let $$c_j$$ be a majority coalition. Consider the following policy for the coalition: select an action uniformly at random and have everyone in the coalition take that action. Since more than half of the players then take the same action, the action this coalition chooses will always be the majority action.

The coalition's action is uniformly random and independent of the actions of the players outside the coalition, so each player outside the coalition has exactly a $$1/k$$ chance of choosing the majority action. Upon choosing the majority action, a player outside the coalition gets at most $$\frac{2}{n}$$ shares, since more than $$\frac{n}{2}$$ players take that action. Since there are at most $$\frac{n}{2}$$ players outside the majority coalition, in expectation they get at most $$\frac{n}{2} \cdot \frac{1}{k} \cdot \frac{2}{n} = \frac{1}{k} \leq \epsilon$$ shares in total. So the majority coalition itself gets at least $$1-\epsilon$$ shares in expectation.

$$\square$$
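The construction in Theorem 1 can be checked numerically on a small instance with the following Python sketch, which computes the majority coalition's exact expected shares. Assumptions for illustration: the outsiders' joint behavior is given as a dict from tuples of their actions to probabilities (so it may be correlated), and the names `plurality_shares` and `majority_share` are hypothetical.

```python
from fractions import Fraction


def plurality_shares(profile, k):
    """Shares in the Theorem 1 game with actions 1..k: split evenly among the
    players who took the most common action, ties resolved towards lower actions."""
    counts = [profile.count(a) for a in range(1, k + 1)]
    top = counts.index(max(counts)) + 1  # index() finds the lowest tied action
    winners = profile.count(top)
    return [Fraction(1, winners) if a == top else Fraction(0) for a in profile]


def majority_share(coalition_size, k, outsider_policy):
    """Exact expected shares of a majority coalition whose members all play one
    action drawn uniformly from 1..k; outsider_policy maps each tuple of
    outsider actions to its probability."""
    total = Fraction(0)
    for shared in range(1, k + 1):  # each shared action has probability 1/k
        for outs, p in outsider_policy.items():
            profile = [shared] * coalition_size + list(outs)
            s = plurality_shares(profile, k)
            total += Fraction(1, k) * p * sum(s[:coalition_size])
    return total


# n = 5 players: a 3-player majority coalition and k = 4 actions (epsilon = 1/4).
always_one = {(1, 1): Fraction(1)}
uniform = {(a, b): Fraction(1, 16) for a in range(1, 5) for b in range(1, 5)}
print(majority_share(3, 4, always_one))  # 9/10
print(majority_share(3, 4, uniform))     # 141/160
```

Both values are at least $$1 - 1/k = 3/4$$, as the proof guarantees for any outsider behavior.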

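As a result of Theorem 1, we won’t be able to design good general strategies for non-majority coalitions: in the game from Theorem 1, a majority coalition can hold all other players to at most $$\epsilon$$ expected shares. Instead we will focus on good general strategies for majority coalitions.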

Theorem 2: In a symmetric unit-sum game, if a coalition $$j$$ has at least a $$\frac{k-1}{k}$$ fraction of the players (for integer $$k > 1$$), then given the policies for the other coalitions $$\pi_{-j}$$, coalition $$j$$ has a policy $$\pi_j$$ resulting in getting at least $$\frac{k-1}{k}$$ expected shares regardless of $$x$$, i.e. $$\forall x \in \mathcal{X}: r_j(x, \pi_1, ..., \pi_m) \geq \frac{k-1}{k}$$.

Proof: Without loss of generality, assume there are only 2 coalitions, with $$j = 1$$ and the other coalition having index 2 (the other coalitions can be merged into one whose policy is the product of their policies). Since $$|c_1| \geq \frac{k-1}{k} n$$, we have $$|c_2| \leq \frac{n}{k}$$ and hence $$(k-1)|c_2| \leq \frac{k-1}{k} n \leq |c_1|$$. So to define the majority’s policy $$\pi_1$$, we can divide the coalition $$c_1$$ into $$k-1$$ sub-coalitions of $$|c_2|$$ players each, plus leftover players (who take some arbitrary action). Each sub-coalition will independently select actions for its members according to the distribution $$\pi_2$$ (matching its members one-to-one with the members of $$c_2$$). Note that each sub-coalition is “equivalent” to $$c_2$$, so by symmetry of the unit-sum game, each sub-coalition and $$c_2$$ gets the same expected number of shares (regardless of $$x$$). These $$k$$ groups are disjoint, and shares are non-negative and sum to 1, so the coalition $$c_2$$ gets at most a $$\frac{1}{k}$$ expected fraction of the shares. Conversely, $$c_1$$ gets at least a $$\frac{k-1}{k}$$ expected fraction of the shares.

$$\square$$
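The copying policy from the proof of Theorem 2 can be checked exactly on a toy instance with the following Python sketch. Assumptions for illustration: `toy_shares` is a hypothetical symmetric unit-sum game (not from the post), the leftover members of $$c_1$$ all play action 0, and $$\pi_2$$ is given as a dict from tuples of actions for $$c_2$$ to probabilities. Players are laid out as $$c_2$$ followed by the $$k-1$$ copies and then the leftovers.

```python
from fractions import Fraction
from itertools import product


def toy_shares(x, profile):
    """Hypothetical symmetric unit-sum game: a player's raw weight depends only
    on x, its own action, and how many players share that action."""
    weights = [Fraction(1 + (a * x) % 3, profile.count(a)) for a in profile]
    total = sum(weights)
    return [w / total for w in weights]


def copy_strategy_shares(x, pi2, k, n_leftover, shares=toy_shares):
    """Exact expected shares of c_2 and of each of the k-1 copies of it.

    Each of c_2 and the k-1 sub-coalitions of c_1 independently samples a joint
    action from pi2; the leftover members of c_1 play the fixed action 0.
    """
    m = len(next(iter(pi2)))  # |c_2|
    group = [Fraction(0)] * k  # group 0 is c_2, groups 1..k-1 are the copies
    for draws in product(pi2.items(), repeat=k):  # k independent samples of pi2
        prob = Fraction(1)
        profile = []
        for acts, p in draws:
            prob *= p
            profile.extend(acts)
        profile.extend([0] * n_leftover)
        s = shares(x, profile)
        for g in range(k):
            group[g] += prob * sum(s[g * m:(g + 1) * m])
    return group


# n = 7: |c_2| = 2 and |c_1| = 5 >= (2/3) * 7, so k = 3 (two copies, one leftover).
pi2 = {(0, 1): Fraction(1, 2), (2, 2): Fraction(1, 3), (1, 0): Fraction(1, 6)}
for x in range(3):
    g = copy_strategy_shares(x, pi2, k=3, n_leftover=1)
    assert g[0] == g[1] == g[2]    # each copy earns exactly what c_2 earns
    assert g[0] <= Fraction(1, 3)  # so c_2 gets at most 1/k ...
    print(x, g[0], 1 - g[0])       # ... and c_1 at least (k-1)/k
```

Exact rational arithmetic is used so that the equality between $$c_2$$ and its copies, which follows from symmetry, holds exactly rather than up to floating-point error.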

## Spying

One issue with the formalism is that it seems easier for a small coalition to spy on a large one than for a large coalition to spy on a small one, which makes it implausible that a large coalition can have a shared source of randomness not available to small coalitions.

However, note that the policy defined in Theorem 2 does not rely on the majority coalition having more coordination than the opposing coalition. This is because the policy factors $$c_1$$ into $$k-1$$ independent sub-coalitions whose sizes are $$|c_2|$$, so shared sources of randomness are only needed within sub-coalitions of size $$|c_2|$$ (and this shared randomness is equivalent to the shared randomness within $$c_2$$ itself).

## Discussion

Theorem 2 is good news if we expect a large majority of powerful AI systems to be aligned with human values. It means that (under some assumptions) these AI systems can achieve a large expected fraction of the universe without having good priors about the random variable $$X$$.

To do this, it is necessary to know something about what the other coalitions’ strategies are, so that these strategies can be copied. A major problem with this is that, in the real world, the action one should take to gain resources depends on relative facts (e.g. one’s location), whereas the actions $$\mathcal{A}$$ are not context-dependent in this way. Therefore, the actions $$\mathcal{A}$$ should be interpreted as “ways of turning one’s context into a resource-gathering strategy”. It is not obvious how to interpret another agent’s policy as a “way of turning their context into a resource-gathering strategy” in a form that can be copied, and this seems like a useful topic for further thought.

by Stuart Armstrong 759 days ago | Interesting. But Theorem 2 may say less than it seems. If you subtract $$1/n$$ from every player’s shares, you get a zero-sum game, and then Theorem 2 seems to reduce to saying that a majority coalition can always expect to not lose in a symmetric zero-sum game.

by Jessica Taylor 758 days ago | I agree that Theorem 2 only says that the majority coalition expects to get a fraction of the universe proportional to its size, and does not say they get more. This fact is unsurprising.

by Stuart Armstrong 757 days ago | Ryan Carey, Jessica Taylor and Patrick LaVictoire like this | Actually, I was wrong: it is possible for a majority coalition to take a loss in a zero-sum game: http://lesswrong.com/r/discussion/lw/oj4/a_majority_coalition_can_lose_a_symmetric_zerosum/ A consequence of that is that your Theorem 2 is sharp: you can’t guarantee more than what you stated. In particular, there exist games in which coalitions of size arbitrarily close to $$2n/3$$ can’t get more than $$1/2$$ of the value.
