Intelligent Agent Foundations Forum
Anything you can do with n AIs, you can do with two (with directly opposed objectives)
post by Jessica Taylor 956 days ago | Patrick LaVictoire and Stuart Armstrong like this | 2 comments

Summary: For any normal-form game, it’s possible to cast the problem of finding a correlated equilibrium in this game as a 2-player zero-sum game. This seems useful because zero-sum games are easier to analyze and more resistant to collusion.


Consider the following class of games (equivalent to the class of normal-form games):

There are \(n\) players. Player \(i\)’s action set is \(\mathcal{A}_i\). After each player \(i\) chooses their action \(A_i\), the state \(X\) results from these actions (perhaps stochastically), and each player \(i\) receives utility \(U_i(X)\).
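
To make this class of games concrete, here is a minimal Python sketch; the names (`NormalFormGame`, `outcome`, `utilities`) are illustrative rather than anything fixed by the post, and the matching-pennies example simply takes the action profile itself as the state.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class NormalFormGame:
    """A game in the class above: n players with finite action sets, a
    (possibly stochastic) state resulting from the joint action, and one
    utility function per player."""
    action_sets: List[Sequence[int]]              # action_sets[i] is player i's action set A_i
    outcome: Callable[[Sequence[int]], object]    # maps (A_1, ..., A_n) to a state X (may be random)
    utilities: List[Callable[[object], float]]    # utilities[i](X) is U_i(X)

    @property
    def n_players(self) -> int:
        return len(self.action_sets)

# Toy example: matching pennies, with the action profile itself serving as the state X.
matching_pennies = NormalFormGame(
    action_sets=[(0, 1), (0, 1)],
    outcome=lambda actions: tuple(actions),
    utilities=[
        lambda x: 1.0 if x[0] == x[1] else -1.0,   # the first player wants to match
        lambda x: -1.0 if x[0] == x[1] else 1.0,   # the second player wants to mismatch
    ],
)
```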

Often, we are interested in finding a Nash equilibrium of a game like this. One strategy for this is to instantiate the \(n\) players as agents. However, this could cause collusion; see here and here for previous writing on collusion. Zero-sum games seem more resistant to collusion (although maybe not 100% resistant). Additionally, 2-player zero-sum games are typically easier to reason about than \(n\)-player games.

So we might be interested in finding a Nash equilibrium using a zero-sum game. I don’t actually know how to find a mixed Nash equilibrium this way, so instead I’ll present a strategy for finding a correlated equilibrium (correlated equilibria are a superset of Nash equilibria and are computationally easier to find). The zero-sum game is played between two agents, an actor and a critic. Here’s how it works (a minimal simulation sketch follows the list):

  1. The actor chooses actions \(A_1, ..., A_n\).
  2. The critic chooses a player index \(I\), observes the action \(A_I\), and suggests an alternative action \(A_I'\).
  3. Flip a fair coin. If it comes up heads, observe the state \(X\) that results from actions \(A_1, ..., A_n\), and give the actor utility \(U_I(X)\).
  4. If it comes up tails, observe the state \(X\) that results from actions \(A_1, ..., A_{I-1}, A_I', A_{I+1}, ..., A_n\), and give the actor utility \(-U_I(X)\).
  5. Either way, the critic’s utility is the negation of the actor’s utility.
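
Here is a minimal sketch of one round of this game, assuming the `NormalFormGame` representation above; the actor is given as a sampler over action profiles and the critic as two callbacks (one to pick the player index, one to suggest the alternative action). All of these names are illustrative.

```python
import random

def play_round(game, actor_sample, critic_pick_player, critic_suggest):
    """One round of the actor-critic zero-sum game. Returns the actor's
    utility; the critic's utility is its negation."""
    actions = list(actor_sample())             # 1. actor chooses A_1, ..., A_n
    i = critic_pick_player()                   # 2. critic picks a player index I ...
    a_alt = critic_suggest(i, actions[i])      #    ... observes A_I and suggests A_I'
    if random.random() < 0.5:                  # 3. heads: use the original action profile
        x = game.outcome(actions)
        return game.utilities[i](x)
    else:                                      # 4. tails: substitute A_I' for A_I
        modified = actions[:i] + [a_alt] + actions[i + 1:]
        x = game.outcome(modified)
        return -game.utilities[i](x)

# Example: a uniformly random actor against a critic that always inspects the
# first player and suggests flipping their action.
actor = lambda: (random.randrange(2), random.randrange(2))
print(play_round(matching_pennies, actor, lambda: 0, lambda i, a: 1 - a))
```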

It will be useful to use the concept of an \(\epsilon\)-correlated equilibrium. A correlated equilibrium is a joint distribution over action profiles under which no player can gain any expected utility by applying a strategy modification (a function from their own action to a replacement action); an \(\epsilon\)-correlated equilibrium is one under which no player can gain more than \(\epsilon\) expected utility by strategy modification.
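
This definition can be checked directly. Below is a minimal sketch, assuming the `NormalFormGame` representation above with a deterministic `outcome`, that computes the largest gain any single player can obtain from a strategy modification; comparing it to \(\epsilon\) is exactly the \(\epsilon\)-correlated-equilibrium test. The function name and the dict encoding of the joint distribution are illustrative conventions.

```python
def max_modification_gain(game, dist):
    """Largest expected-utility gain any single player can get by applying a
    strategy modification phi: A_i -> A_i, when the action profile is drawn
    from `dist` (a dict mapping action-profile tuples to probabilities).
    `dist` is an epsilon-correlated equilibrium iff this value is <= epsilon.
    Assumes `game.outcome` is deterministic."""
    best_gain = 0.0
    for i in range(game.n_players):
        # Player i's expected utility with no modification.
        base = sum(p * game.utilities[i](game.outcome(profile))
                   for profile, p in dist.items())
        # The best modification can be chosen pointwise: for each observed
        # action a_i, pick the replacement that maximizes i's expected utility
        # over the profiles in which player i actually plays a_i.
        modified = 0.0
        for a_i in game.action_sets[i]:
            modified += max(
                sum(p * game.utilities[i](game.outcome(profile[:i] + (alt,) + profile[i + 1:]))
                    for profile, p in dist.items() if profile[i] == a_i)
                for alt in game.action_sets[i]
            )
        best_gain = max(best_gain, modified - base)
    return best_gain

# Example: in the matching-pennies sketch above, independent uniform play is a
# correlated equilibrium, so no modification gains anything.
uniform = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
print(max_modification_gain(matching_pennies, uniform))  # 0.0
```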

Note that the critic’s policies correspond to mixtures of strategy modifications; the critic can be seen as jointly picking a player \(I\) and a strategy modification \(\phi : \mathcal{A}_I \rightarrow \mathcal{A}_I\) for the player. Furthermore, the critic’s expected utility is half the expected utility gained by the corresponding player for the average strategy modification in this mixture:

\[\text{expected critic utility} = \frac{1}{2}\left(\mathbb{E}[U_I(X) | \text{tails}] - \mathbb{E}[U_I(X) | \text{heads}]\right)\]

because the critic’s expected utility is half the difference between player \(I\)’s expected utility given strategy modification (\(\text{tails}\)) and player \(I\)’s expected utility given no strategy modification (\(\text{heads}\)). Some facts result:

  1. Suppose the actor chooses \(A_1, ..., A_n\) from some joint distribution that is an \(\epsilon\)-correlated equilibrium of the original game. Then the actor’s expected utility is at least \(-\epsilon/2\) regardless of the critic’s policy.
  2. Suppose the actor chooses \(A_1, ..., A_n\) from some joint distribution that is not an \(\epsilon\)-correlated equilibrium of the original game. Then the critic’s best response results in a utility of less than \(-\epsilon/2\) for the actor.
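
To spell out fact 1 (fact 2 is the same inequality run in the other direction): suppose the actor’s distribution is an \(\epsilon\)-correlated equilibrium, and let \(X'\) denote the state resulting from the strategy-modified action profile. For any player \(I\) and strategy modification \(\phi\) the critic might pick,

\[\text{expected critic utility} = \frac{1}{2}\left(\mathbb{E}[U_I(X')] - \mathbb{E}[U_I(X)]\right) \leq \frac{\epsilon}{2},\]

because the \(\epsilon\)-correlated-equilibrium condition says the modification gains player \(I\) at most \(\epsilon\). Mixtures over \(I\) and \(\phi\) can do no better than the best pure choice, so the actor’s expected utility is at least \(-\epsilon/2\).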

Correlated equilibria always exist, so at a Nash equilibrium in the zero-sum game, the actor always outputs a correlated equilibrium and gets expected utility 0.

Perhaps in real life, it is inconvenient to observe the state \(X\) resulting from actions \(A_1, ..., A_{I-1}, A_I', A_{I+1}, ..., A_n\), because we can only observe the state by outputting actions, and maybe we always want to output actions from a correlated equilibrium. In this case we could use counterfactual oversight to usually output \(A_1, ..., A_n\), but run the procedure above occasionally to gather training data. It’s not clear when it’s acceptable to occasionally output strategy-modified action profiles (instead of action profiles from a correlated equilibrium).
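
One way to sketch this, assuming the `play_round` helper above: act from the actor’s distribution by default, and only occasionally run the critic procedure to produce a training signal. The `oversight_prob` parameter is illustrative; the post doesn’t specify how often this should happen.

```python
import random

def act_or_train(game, actor_sample, critic_pick_player, critic_suggest,
                 oversight_prob=0.01):
    """Counterfactual-oversight-style wrapper: usually output the actor's action
    profile unchanged; with probability oversight_prob, run the actor-critic
    procedure above (whose output may be a strategy-modified profile) to gather
    a training reward for the actor and critic."""
    if random.random() < oversight_prob:
        reward = play_round(game, actor_sample, critic_pick_player, critic_suggest)
        return ("train", reward)
    return ("act", actor_sample())
```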



by Stuart Armstrong 950 days ago | link

We have to be careful that the game is really zero-sum. Some setups with reward signals seem zero-sum, but could become positive-sum if the AIs hack them.


by Jessica Taylor 950 days ago | link

This is true.



