Cooperative Oracles: Nonexploited Bargaining post by Scott Garrabrant 194 days ago | Vadim Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 6 comments In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation. (Part of the series started here.) Consider a game between $$n$$ players, $$P_1,\ldots,P_n$$. We denote $$A_i$$ as the finite set of actions available to player $$P_i$$, and $$\Delta_i$$ as the set of distributions on $$A_i$$. Let $$R$$ denote $$\Delta_1\times\ldots\times\Delta_n$$, the set of possible outcomes of this game, and given an $$r\in R$$, let $$r_i\in\Delta_i$$ be the $$i$$th component. Each player $$P_i$$ will be equipped with a “fair set” $$F_i\subseteq R$$ of outcomes in which they feel that they are not being exploited, as well as a utility function, a continuous function $$U_i:R\rightarrow[0,1]$$. Note that the use of the word fair here is a little weird. We will think of agents as identifying an outcome as fair if they do not feel that they have been exploited. They may find it perfectly fair to exploit others. $$F_i$$ cannot be an arbitrary set. There must be a set valued function $$f_i:R\rightarrow\mathscr{P}(\Delta_i)$$ with closed graph and $$f_i(r)$$ nonempty and convex for all $$r\in R$$, such that $$F_i$$ is exactly the set of $$r$$ such that $$r_i\in f_i(r)$$. i.e. $$F_i$$ should be the set of points for which a given Kakutani map fixes the $$\Delta_i$$ coordinate. This condition may seem arbitrary, but it is both exactly what we need for our bargaining procedure to work, and exactly what we would expect to get out of agents that have access to everyone’s actions with something like a reflective oracle. This condition in part implies that for every collection of distributions on actions that all the other players take, there must be some distribution on actions that you can take in which you consider yourself nonexploited. The restriction is slightly stronger that that, requiring a type of continuity in the way that your nonexploited responses are a function of the other players’ actions. Our Bargaining procedure is now simple. We take the set $$F$$ of fair points to be the intersection over all $$i$$ of $$F_i$$, and we choose an $$r\in F$$ which is Pareto optimal in $$F$$ (i.e. there is no other $$r^\prime\in F$$ with $$U_i(r^\prime)\geq U_i(r)$$ for all $$i$$ and $$U_i(r^\prime)> U_i(r)$$ for some $$i$$). There may be more than one Pareto Optimum. If so, we choose one arbitrarily. This procedure is not obviously well defined. In particular, it is not obvious that $$F$$ is nonempty, or that given that it is nonempty that it contains Pareto optimum. Claim: $$F$$ is nonempty and contains a Pareto optimum (over $$F$$). Proof: First, we show that $$F$$ is compact and nonempty. Since each $$f_i$$ is Kakutani, the product of them, $$r\mapsto f_1(r),\ldots,f_n(r)$$ is also Kakutani, and thus must have a nonempty compact set of fixed points. This set is exactly the intersection of the $$N_i$$. Now, since $$F$$ is compact, $$U_1$$ must achieve a maximum value on some nonempty compact set. Let $$S_1$$ be the subset of $$F$$ on which $$U_1$$ is maximized. Define $$S_{i}$$ similarly as the subset of $$S_{i-1}$$ on which $$U_i$$ is maximized. $$S_n$$ is nonempty, and every point in $$P_n$$ must be Pareto optimal in $$F$$. $$\square$$ Now, lets go through an couple examples. Example 1: Two players $$P_1$$ and $$P_2$$ are in a prisoner’s dilemma $$\Delta_1=\Delta_2=[0,1]$$ meaning the probability that the player cooperates. The utility functions are $$U_1(p,q)=2q-p$$ and $$U_2(p,q)=2p-q$$. Both players consider themselves to be exploited if the other player gets more utility than they do. Thus we have $$F_1=\{(p,q)\in R|p\leq q\}$$ and $$F_2=\{(p,q)\in R|p\geq q\}$$. We need to verify that these fair sets satisfy the condition of being the fixed points of a Kakutani map. Indeed, consider $$f_1(p,q)=\min(p,q)$$. $$f_1$$ is continuous, and thus Kakutani. and the set of $$(p,q)$$ for which $$f_1(p,q)=p$$ is exactly $$F_2$$. Similarly we can take $$f_2(p,q)=\min(p,q)$$. Since $$F_1$$ and $$F_2$$ were both valid fair sets, they must have nonempty intersection. In this case, their intersection is the set of $$(p,q)$$ with $$p=q$$, in which both player’s cooperate with the same probability. This set has a unique Pareto optimum, $$(1,1)$$ in which both players cooperate with certainty. Example 2: Two players are again in a prisoner’s dilemma with the same utilities as before. However this time each player thinks that they should cooperate only 90% of the time while the other player cooperates all the time. Thus each player thinks they should get 1.1 utility, while the other only gets 0.8. They refuse to let the other player take advantage of them and get more than 0.8 utility. They also want to make sure that the unique best utility for the other player is to agree to the 1.1, 0.8 split. To do this, they say that for every unit of utility that they get less than 1.1, the opponent should get 0.1 less utility than 0.8. In utility space, a pair of utilities is acceptable to $$P_1$$ are if $$0.8-U_2\geq .1(1.1-U_1)$$. Translating this to action space, we say that $$(p,q)\in F_1$$ if $$p\leq \frac{40q+23}{70}$$, and similarly $$(q,p)\in F_2$$ if $$q\leq \frac{40p+23}{70}$$. Again, you can see these are valid sets by considering $$f_1(p,q)=\min(p, \frac{40q+23}{70})$$ and $$f_2(p,q)=\min(q, \frac{40p+23}{70})$$. (Note that if we changed the numbers, these function might not have been valid, since they might not have mapped the unit interval into itself.) The intersection of $$F_1$$ and $$F_2$$ is a kite shape, with a unique Pareto Optimum at $$p=q=23/30$$ Example 3: Two players are again in a prisoner’s dilemma with the same utilities as before. Player 1 has the same fair set as in Example 1 ($$F_1=\{(p,q)\in R|p\leq q\}$$). Player 2 however is cooperate bot and thinks that a solution is fair if he cooperates ($$F_2=\{(p,q)\in R|q=1\}$$, $$f_2(p,q)=1$$). Now, $$F_2$$ is a subset of $$F_1$$, so the set of mutually fair points $$F$$ is just the set where $$P_2$$ cooperates. Every point in this set is a Pareto optimum, so the outcome is based on how we choose the Pareto optimum, which is under specified.

 by Stuart Armstrong 191 days ago | link This seems a proper version of what I was trying to do here: https://agentfoundations.org/item?id=513 reply
 by Scott Garrabrant 191 days ago | link Yeah. The original generator of these ideas was that we were trying to find (or prove impossible) an improvement on NicerBot: An agent with reflective oracles that cooperates with itself (regardless of what reflective oracle is chosen), but is never exploited in expectation (even by epsilon.) reply
 by Stuart Armstrong 191 days ago | link Every point in this set is a Pareto optimum, so the outcome is based on how we choose the Pareto optimum, which is under specified. Recurse, using the agent’s concepts of fairness, using the new Pareto set as the set of possibilities? And continue until the set stabilises? And then the agents will be truly indifferent between the outcomes. reply
 by Scott Garrabrant 191 days ago | Stuart Armstrong likes this | link The problem is that our original set was a product of the actions available to the players, so they were able to cut things off using their own actions. When you restrict to the Pareto frontier, this is no longer the case. reply
 by Stuart Armstrong 191 days ago | link Does this generalise to giving gradations of fairness, rather than boolean fair/unfair set memberships? reply
 by Scott Garrabrant 191 days ago | Stuart Armstrong likes this | link I don’t see how it would. The closest thing I can think of is letting agents choose randomly between different fair sets, but I don’t see what that would buy for you. reply

### NEW DISCUSSION POSTS

Indeed there is some kind of
 by Vadim Kosoy on Catastrophe Mitigation Using DRL | 0 likes

Very nice. I wonder whether
 by Vadim Kosoy on Hyperreal Brouwer | 0 likes

Freezing the reward seems
 by Vadim Kosoy on Resolving human inconsistency in a simple model | 0 likes

Unfortunately, it's not just
 by Vadim Kosoy on Catastrophe Mitigation Using DRL | 0 likes

>We can solve the problem in
 by Wei Dai on The Happy Dance Problem | 1 like

Maybe it's just my browser,
 by Gordon Worley III on Catastrophe Mitigation Using DRL | 2 likes

At present, I think the main
 by Abram Demski on Looking for Recommendations RE UDT vs. bounded com... | 0 likes

In the first round I'm
 by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

Fine with it being shared
 by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

I think the point I was
 by Abram Demski on Predictable Exploration | 0 likes

(also x-posted from
 by Sören Mindermann on The Three Levels of Goodhart's Curse | 0 likes

(x-posted from Arbital ==>
 by Sören Mindermann on The Three Levels of Goodhart's Curse | 0 likes

>If the other players can see
 by Stuart Armstrong on Predictable Exploration | 0 likes