Cooperative Oracles: Introduction post by Scott Garrabrant 764 days ago | Abram Demski, Jessica Taylor and Patrick LaVictoire like this | 1 comment This is the first in a series of posts introducing a new tool called a Cooperative Oracle. All of these posts are joint work Sam Eisenstat, Tsvi Benson-Tilsen, and Nisan Stiennon. Here is my plan for posts in this sequence. I will update this as I go. Introduction Nonexploited Bargaining Stratified Pareto Optima and Almost Stratified Pareto Optima Definition and Existence Proof Alternate Notions of Dependency In this post, I will give a sketchy advertisement of what is to come. Cooperative oracles are a refinement on reflective oracles. We consider the set of Turing machines with access to a reflective oracle, and we think of each Turing machine as having a utility function (perhaps written in a comment in the source code). This utility function is itself computable using the reflective oracle. Since the definition of a reflective oracle use a fixed point procedure that may have had multiple fixed points, there are actually many different reflective oracles. However, we will take something like a Pareto optimum in the class of reflective oracles. For example if two players in a prisoner’s dilemma use a reflective oracles to cooperate with exactly the probability the other player cooperates, there is a continuum of fixed points in which the player cooperate with the same probability. We want to take the fixed point in which the both cooperate. Unfortunately, we do not want to just take a standard Pareto Optimum, since we are working with the class of all Turing Machines, and for every machine with a utility function, there is another with the negation of that utility function, so all points will be Optima. For this, we will use a stratified version of Pareto optima. If a third party wants the two above players to defect against each other, we will still choose mutual cooperation. Since the players only reference each other, and not the third party, the third party is not considered in the notion of optimality for the output of the Oracle on the prisoners’ dilemma game. We will be able to build a cooperative oracle, which is a reflective oracle that (roughly) ensures that all of the outputs of the computations are Pareto optimal for all of the utility function of the machines involved it that computation. This Pareto optimality is only used to choose between multiple fixed points. We will still be restricted to have the computation follow its source code, and have the reflective oracle give honest data about the output of the computations. Using cooperative oracles, we will be able to set up a system similar to modal combat, but with reflective oracles. We will also be able to study bargaining. We will formalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation, and implement this style of bargaining with reflective oracles. This is interesting because it reduces the problem of finding a fair way to split gains from trade from a group endeavor (like in standard cooperative game theory) to an individual endeavor, where each player specifies what they consider to be a fair outcome for them, and the reflective oracle takes care of the rest. The problem of figuring out how to define what is fair to you is still wide open, however. The notion of cooperative oracles presented will be use a clunky syntactic notion of when one computation depends on another. If an algorithm calls the reflective oracle on the output of another, we say the first algorithm depends on the second, and we take the transitive closure of this property. There are many alternatives to this, some involving looking at the actual function the algorithms compute and checking how these functions depend on other values. Specifying better notions of computations depending on each other is a hard problem, and better solutions to this may yield better definitions of cooperative oracles.

 by Scott Garrabrant 668 days ago | link I have stopped working on this sequence, because a coauthor is trying to write it up as a more formal paper instead. reply

### NEW DISCUSSION POSTS

[Note: This comment is three
 by Ryan Carey on A brief note on factoring out certain variables | 0 likes

There should be a chat icon
 by Alex Mennen on Meta: IAFF vs LessWrong | 0 likes

Apparently "You must be
 by Jessica Taylor on Meta: IAFF vs LessWrong | 1 like

There is a replacement for
 by Alex Mennen on Meta: IAFF vs LessWrong | 1 like

Regarding the physical
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think that we should expect
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think I understand your
 by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

This seems like a hack. The
 by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

After thinking some more,
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yes, I think that we're
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

My intuition is that it must
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

To first approximation, a
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Actually, I *am* including
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yeah, when I went back and
 by Alex Appel on Optimal and Causal Counterfactual Worlds | 0 likes

> Well, we could give up on
 by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes