I’m going to formalize some ideas related to my previous post about pursuing convergent instrumental goals without good priors and prove theorems about how much power a coalition can guarantee. The upshot is that, while non-majority coalitions can’t guarantee controlling a non-negligible fraction of the expected power, majority coalitions can guarantee controlling a large fraction of the expected power.
In a unit-sum game:

- there is some unknown environmental variable \(X \in \mathcal{X}\). (In my previous posts, this would be the color of the sun)
- each of \(n\) players submits an action \(A_i \in \mathcal{A}\) (we could consider different action sets for each player, but it doesn’t matter)
- player \(i\) gets \(s_i(X, A_1, ..., A_n)\) shares, where \(s_i\) satisfies:
  - Nonnegativity: \(s_i(X, A_1, ..., A_n) \geq 0\)
  - Unit sum: \(\sum_{i=1}^n s_i(X, A_1, ..., A_n) = 1\)
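As a concrete illustration, here is a minimal Python sketch of one possible unit-sum game. The share function is a made-up example (not part of the formalism above): players whose action matches \(x\) split the pot evenly, and if no one matches, everyone splits it evenly.

```python
from fractions import Fraction

def shares(x, actions):
    """Hypothetical unit-sum share function: players whose action
    matches the environment variable x split the pot evenly; if no
    one matches, everyone splits it evenly."""
    winners = [i for i, a in enumerate(actions) if a == x]
    pool = winners if winners else list(range(len(actions)))
    return [Fraction(1, len(pool)) if i in pool else Fraction(0)
            for i in range(len(actions))]

s = shares(x=2, actions=[1, 2, 2, 3])
assert all(v >= 0 for v in s)   # nonnegativity
assert sum(s) == 1              # unit sum
```

Since the shares depend on the actions only through their values (not through which player played them), this game is also symmetric in the sense defined below.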
A unit-sum game is symmetric if, for any permutation \(p : \{1, ..., n\} \rightarrow \{1, ..., n\}\), we have \(s_i(x, a_1, ..., a_n) = s_{p(i)}(x, a_{p(1)}, ..., a_{p(n)})\).
A coalition in a unit-sum game is a set of players. If \(c \subseteq \{1, ..., n\}\) is a coalition, then a policy for that coalition \(\pi : \Delta \mathcal{A}^c\) is a distribution assigning an action to each player in that coalition. We will assume that there are coalitions \(c_1, ..., c_m\) such that each player appears in exactly one coalition.
We will consider the expected amount of shares a coalition will get, based on the coalitions’ policies. Specifically, define \[r_j(x, \pi_1, ..., \pi_m) := \sum_{a_1 \in \mathcal{A}, ..., a_n \in \mathcal{A}} \left(\prod_{j'=1}^m \pi_{j'}(a_{c_{j'}}) \right) \sum_{i \in c_j} s_i(x, a_1, ..., a_n)\] where \(a_{c_{j'}} : \mathcal{A}^{c_{j'}}\) specifies the actions for the players in the coalition \(c_{j'}\). In general, my goal in proving theorems will be to guarantee a high \(r_j\) value for a coalition regardless of \(x\).
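The definition of \(r_j\) can be computed directly by brute-force enumeration. Here is a small Python sketch (the function name and the trivial even-split share function in the example are my own, for illustration):

```python
import itertools
from fractions import Fraction

def expected_shares(x, coalitions, policies, shares, j):
    """Compute r_j(x, pi_1, ..., pi_m) by enumerating joint actions.
    coalitions[m] lists coalition m's members; policies[m] maps an
    action tuple for those members to its probability."""
    n = sum(len(c) for c in coalitions)
    total = Fraction(0)
    # enumerate every combination of coalition action-tuples
    for combo in itertools.product(*(pol.items() for pol in policies)):
        prob = Fraction(1)
        actions = [None] * n
        for (tup, pr), members in zip(combo, coalitions):
            prob *= pr                      # product of policy probabilities
            for i, a in zip(members, tup):
                actions[i] = a
        total += prob * sum(shares(x, actions)[i] for i in coalitions[j])
    return total

# Example: 3 players, coalitions {0, 1} and {2}; shares split evenly
# regardless of actions, so coalition 0 expects 2/3 of the pot.
even = lambda x, acts: [Fraction(1, 3)] * 3
pols = [{(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}, {(0,): Fraction(1)}]
assert expected_shares(None, [[0, 1], [2]], pols, even, 0) == Fraction(2, 3)
```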
A coalition containing a majority (>50%) of the players can, in some cases, gain an arbitrarily high fraction of the shares:
Theorem 1: For any \(\epsilon > 0\) and \(n > 0\), there exists a symmetric unit-sum game with \(n\) players in which any coalition controlling a majority of the players can get at least \(1-\epsilon\) expected shares.
Proof: Fix \(\epsilon > 0\), \(n > 0\). Let \(k\) be such that \(1/k \leq \epsilon\). Define the set of actions \(\mathcal{A} := \{1, ..., k\}\). Define \(s_i\) to split the shares evenly among the players who chose the most popular action (with ties being resolved towards lower actions). The variable \(X\) is unused. Clearly, this unit-sum game is symmetric.
Let \(c_j\) be a majority coalition. Consider the following policy for the coalition: select an action uniformly at random and have everyone take that action. Clearly, the action this coalition chooses will always be the majority action.
By symmetry among the different actions, any player outside the coalition has a \(1/k\) chance of choosing the majority action. Upon choosing the majority action, a player outside the coalition gets at most \(\frac{2}{n}\) shares, since the pot is split among at least the \(\frac{n}{2}\) coalition members who also chose it. Since there are at most \(\frac{n}{2}\) players outside the majority coalition, in expectation they get at most \(\frac{n}{2} \cdot \frac{1}{k} \cdot \frac{2}{n} = \frac{1}{k} \leq \epsilon\) shares in total. So the majority coalition itself gets at least \(1-\epsilon\) shares in expectation.
\(\square\)
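The construction can be checked numerically. Below is a Python sketch with made-up parameters (\(n = 5\), \(k = 4\), coalition \(\{0, 1, 2\}\)); the share function is the plurality-split game from the proof, and the outsiders are modeled as playing uniformly at random (any fixed outsider strategy matches the coalition's uniform pick with probability \(1/k\), so the bound is the same):

```python
import itertools
from collections import Counter
from fractions import Fraction

def plurality_shares(actions):
    """The proof's share function: players who chose the most popular
    action (ties broken toward lower actions) split the pot evenly."""
    counts = Counter(actions)
    best = min(a for a in counts if counts[a] == max(counts.values()))
    winners = [i for i, a in enumerate(actions) if a == best]
    return [Fraction(1, len(winners)) if i in winners else Fraction(0)
            for i in range(len(actions))]

n, k, m = 5, 4, 3   # 5 players, 4 actions, majority coalition {0, 1, 2}
total = Fraction(0)
for coal_action in range(k):                    # coalition's uniform pick
    for rest in itertools.product(range(k), repeat=n - m):
        s = plurality_shares([coal_action] * m + list(rest))
        total += Fraction(1, k) * Fraction(1, k ** (n - m)) * sum(s[:m])
assert total >= 1 - Fraction(1, k)   # coalition keeps at least 1 - 1/k
```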
As a result of Theorem 1, we won’t be able to design good general strategies for non-majority coalitions. Instead, we will focus on good general strategies for majority coalitions.
Theorem 2: In a symmetric unit-sum game, if a coalition \(j\) has at least a \(\frac{k-1}{k}\) fraction of the players (for integer \(k > 1\)), then given the policies for the other coalitions \(\pi_{-j}\), coalition \(j\) has a policy \(\pi_j\) resulting in getting at least \(\frac{k-1}{k}\) expected shares regardless of \(x\), i.e. \(\forall x \in \mathcal{X}: r_j(x, \pi_1, ..., \pi_m) \geq \frac{k-1}{k}\).
Proof: Without loss of generality, assume there are only 2 coalitions, \(j = 1\), and the other coalition has index 2. To define the majority’s policy \(\pi_1\), divide the coalition \(c_1\) into \(k-1\) sub-coalitions of \(|c_2|\) players each, plus leftover players (who take some arbitrary action). Each sub-coalition will independently select actions for its members according to the distribution \(\pi_2\). Note that each sub-coalition is “equivalent” to \(c_2\), so by symmetry of the unit-sum game, each sub-coalition and \(c_2\) get the same expected number of shares (regardless of \(x\)). Since these \(k\) equal expectations sum to at most 1, the coalition \(c_2\) gets at most a \(\frac{1}{k}\) expected fraction of the shares. Therefore, \(c_1\) gets at least a \(\frac{k-1}{k}\) expected fraction of the shares.
\(\square\)
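The copying strategy can be checked numerically in a small case. The following Python sketch uses made-up choices (\(n = 3\), \(k = 3\), a plurality-split share function, and an arbitrary minority policy): the majority \(\{0, 1\}\) splits into two size-1 sub-coalitions, each independently sampling the minority's policy, so all three players' actions are i.i.d. and the majority keeps exactly \(\frac{2}{3}\) of the pot in expectation.

```python
import itertools
from collections import Counter
from fractions import Fraction

def plurality_shares(actions):
    """Symmetric unit-sum game: plurality winners split the pot,
    ties broken toward lower actions."""
    counts = Counter(actions)
    best = min(a for a in counts if counts[a] == max(counts.values()))
    winners = [i for i, a in enumerate(actions) if a == best]
    return [Fraction(1, len(winners)) if i in winners else Fraction(0)
            for i in range(len(actions))]

# Minority coalition {2} plays an arbitrary distribution pi2 over actions.
pi2 = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
r1 = Fraction(0)
# Majority players 0 and 1 each independently sample pi2 (the copying
# strategy), as does the minority player 2.
for a0, a1, a2 in itertools.product(pi2, repeat=3):
    prob = pi2[a0] * pi2[a1] * pi2[a2]
    s = plurality_shares([a0, a1, a2])
    r1 += prob * (s[0] + s[1])
assert r1 >= Fraction(2, 3)   # Theorem 2's (k-1)/k bound with k = 3
```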
Spying
One issue with the formalism is that it seems easier for a small coalition to spy on a large one than for a large coalition to spy on a small one, which makes it implausible that a large coalition can have a shared source of randomness not available to small coalitions.
However, note that the policy defined in Theorem 2 does not rely on the majority coalition having more coordination than the opposing coalition. This is because the policy factors \(c_1\) into \(k-1\) independent sub-coalitions of size \(|c_2|\), so shared sources of randomness are only needed within sub-coalitions of size \(|c_2|\) (and this shared randomness is equivalent to the shared randomness within \(c_2\) itself).
Discussion
Theorem 2 is good news if we expect a large majority of powerful AI systems to be aligned with human values. It means that (under some assumptions) these AI systems can achieve a large expected fraction of the universe without having good priors about the random variable \(X\).
To do this, it is necessary to know something about what the other coalitions’ strategies are, such that these strategies can be copied. A major problem with this is that, in the real world, the action one should take to gain resources depends on relative facts (e.g. one’s location), whereas the actions \(\mathcal{A}\) are not context-dependent in this way. Therefore, the actions \(\mathcal{A}\) should be interpreted as “ways of turning one’s context into a resource-gathering strategy”. It is not obvious how to interpret another agent’s policy as a “way of turning their context into a resource-gathering strategy” such that it can be copied, and this seems like a useful topic for further thought.
