Intelligent Agent Foundations Forumsign up / log in
Cooperative Oracles: Introduction
post by Scott Garrabrant 16 days ago | Jessica Taylor and Patrick LaVictoire like this | discuss

This is the first in a series of posts introducing a new tool called a Cooperative Oracle. All of these posts are joint work Sam Eisenstat, Tsvi Benson-Tilsen, and Nisan Stiennon.

Here is my plan for posts in this sequence. I will update this as I go.

  1. Introduction
  2. Nonexploited Bargaining
  3. Stratified and Nearly Pareto Optima
  4. Definition and Existence Proof
  5. Alternate Notions of Dependency

In this post, I will give a sketchy advertisement of what is to come.

Cooperative oracles are a refinement on reflective oracles. We consider the set of Turing machines with access to a reflective oracle, and we think of each Turing machine as having a utility function (perhaps written in a comment in the source code). This utility function is itself computable using the reflective oracle.

Since the definition of a reflective oracle use a fixed point procedure that may have had multiple fixed points, there are actually many different reflective oracles. However, we will take something like a Pareto optimum in the class of reflective oracles.

For example if two players in a prisoner’s dilemma use a reflective oracles to cooperate with exactly the probability the other player cooperates, there is a continuum of fixed points in which the player cooperate with the same probability. We want to take the fixed point in which the both cooperate.

Unfortunately, we do not want to just take a standard Pareto Optimum, since we are working with the class of all Turing Machines, and for every machine with a utility function, there is another with the negation of that utility function, so all points will be Optima.

For this, we will use a stratified version of Pareto optima. If a third party wants the two above players to defect against each other, we will still choose mutual cooperation. Since the players only reference each other, and not the third party, the third party is not considered in the notion of optimality for the output of the Oracle on the prisoners’ dilemma game.

We will be able to build a cooperative oracle, which is a reflective oracle that (roughly) ensures that all of the outputs of the computations are Pareto optimal for all of the utility function of the machines involved it that computation. This Pareto optimality is only used to choose between multiple fixed points. We will still be restricted to have the computation follow its source code, and have the reflective oracle give honest data about the output of the computations.

Using cooperative oracles, we will be able to set up a system similar to modal combat, but with reflective oracles.

We will also be able to study bargaining. We will formalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation, and implement this style of bargaining with reflective oracles.

This is interesting because it reduces the problem of finding a fair way to split gains from trade from a group endeavor (like in standard cooperative game theory) to an individual endeavor, where each player specifies what they consider to be a fair outcome for them, and the reflective oracle takes care of the rest. The problem of figuring out how to define what is fair to you is still wide open, however.

The notion of cooperative oracles presented will be use a clunky syntactic notion of when one computation depends on another. If an algorithm calls the reflective oracle on the output of another, we say the first algorithm depends on the second, and we take the transitive closure of this property. There are many alternatives to this, some involving looking at the actual function the algorithms compute and checking how these functions depend on other values. Specifying better notions of computations depending on each other is a hard problem, and better solutions to this may yield better definitions of cooperative oracles.





The "benign induction
by David Krueger on Why I am not currently working on the AAMLS agenda | 0 likes

This comment is to explain
by Alex Mennen on Formal Open Problem in Decision Theory | 0 likes

Thanks for writing this -- I
by Daniel Dewey on AI safety: three human problems and one AI issue | 1 like

I think it does do the double
by Stuart Armstrong on Acausal trade: double decrease | 0 likes

>but the agent incorrectly
by Stuart Armstrong on CIRL Wireheading | 0 likes

I think the double decrease
by Owen Cotton-Barratt on Acausal trade: double decrease | 0 likes

The problem is that our
by Scott Garrabrant on Cooperative Oracles: Nonexploited Bargaining | 1 like

Yeah. The original generator
by Scott Garrabrant on Cooperative Oracles: Nonexploited Bargaining | 0 likes

I don't see how it would. The
by Scott Garrabrant on Cooperative Oracles: Nonexploited Bargaining | 1 like

Does this generalise to
by Stuart Armstrong on Cooperative Oracles: Nonexploited Bargaining | 0 likes

>Every point in this set is a
by Stuart Armstrong on Cooperative Oracles: Nonexploited Bargaining | 0 likes

This seems a proper version
by Stuart Armstrong on Cooperative Oracles: Nonexploited Bargaining | 0 likes

This doesn't seem to me to
by Stuart Armstrong on Change utility, reduce extortion | 0 likes

[_Regret Theory with General
by Abram Demski on Generalizing Foundations of Decision Theory II | 0 likes

It's not clear whether we
by Paul Christiano on Infinite ethics comparisons | 1 like


Privacy & Terms