Intelligent Agent Foundations Forumsign up / log in
Cooperative Oracles: Introduction
post by Scott Garrabrant 281 days ago | Abram Demski, Jessica Taylor and Patrick LaVictoire like this | 1 comment

This is the first in a series of posts introducing a new tool called a Cooperative Oracle. All of these posts are joint work Sam Eisenstat, Tsvi Benson-Tilsen, and Nisan Stiennon.

Here is my plan for posts in this sequence. I will update this as I go.

  1. Introduction
  2. Nonexploited Bargaining
  3. Stratified Pareto Optima and Almost Stratified Pareto Optima
  4. Definition and Existence Proof
  5. Alternate Notions of Dependency

In this post, I will give a sketchy advertisement of what is to come.

Cooperative oracles are a refinement on reflective oracles. We consider the set of Turing machines with access to a reflective oracle, and we think of each Turing machine as having a utility function (perhaps written in a comment in the source code). This utility function is itself computable using the reflective oracle.

Since the definition of a reflective oracle use a fixed point procedure that may have had multiple fixed points, there are actually many different reflective oracles. However, we will take something like a Pareto optimum in the class of reflective oracles.

For example if two players in a prisoner’s dilemma use a reflective oracles to cooperate with exactly the probability the other player cooperates, there is a continuum of fixed points in which the player cooperate with the same probability. We want to take the fixed point in which the both cooperate.

Unfortunately, we do not want to just take a standard Pareto Optimum, since we are working with the class of all Turing Machines, and for every machine with a utility function, there is another with the negation of that utility function, so all points will be Optima.

For this, we will use a stratified version of Pareto optima. If a third party wants the two above players to defect against each other, we will still choose mutual cooperation. Since the players only reference each other, and not the third party, the third party is not considered in the notion of optimality for the output of the Oracle on the prisoners’ dilemma game.

We will be able to build a cooperative oracle, which is a reflective oracle that (roughly) ensures that all of the outputs of the computations are Pareto optimal for all of the utility function of the machines involved it that computation. This Pareto optimality is only used to choose between multiple fixed points. We will still be restricted to have the computation follow its source code, and have the reflective oracle give honest data about the output of the computations.

Using cooperative oracles, we will be able to set up a system similar to modal combat, but with reflective oracles.

We will also be able to study bargaining. We will formalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation, and implement this style of bargaining with reflective oracles.

This is interesting because it reduces the problem of finding a fair way to split gains from trade from a group endeavor (like in standard cooperative game theory) to an individual endeavor, where each player specifies what they consider to be a fair outcome for them, and the reflective oracle takes care of the rest. The problem of figuring out how to define what is fair to you is still wide open, however.

The notion of cooperative oracles presented will be use a clunky syntactic notion of when one computation depends on another. If an algorithm calls the reflective oracle on the output of another, we say the first algorithm depends on the second, and we take the transitive closure of this property. There are many alternatives to this, some involving looking at the actual function the algorithms compute and checking how these functions depend on other values. Specifying better notions of computations depending on each other is a hard problem, and better solutions to this may yield better definitions of cooperative oracles.

by Scott Garrabrant 185 days ago | link

I have stopped working on this sequence, because a coauthor is trying to write it up as a more formal paper instead.






[Delegative Reinforcement
by Vadim Kosoy on Stable Pointers to Value II: Environmental Goals | 1 like

Intermediate update: The
by Alex Appel on Further Progress on a Bayesian Version of Logical ... | 0 likes

Since Briggs [1] shows that
by 258 on In memoryless Cartesian environments, every UDT po... | 2 likes

This doesn't quite work. The
by Nisan Stiennon on Logical counterfactuals and differential privacy | 0 likes

I at first didn't understand
by Sam Eisenstat on An Untrollable Mathematician | 1 like

This is somewhat related to
by Vadim Kosoy on The set of Logical Inductors is not Convex | 0 likes

This uses logical inductors
by Abram Demski on The set of Logical Inductors is not Convex | 0 likes

Nice writeup. Is one-boxing
by Tom Everitt on Smoking Lesion Steelman II | 0 likes

Hi Alex! The definition of
by Vadim Kosoy on Delegative Inverse Reinforcement Learning | 0 likes

A summary that might be
by Alex Appel on Delegative Inverse Reinforcement Learning | 1 like

I don't believe that
by Alex Appel on Delegative Inverse Reinforcement Learning | 0 likes

This is exactly the sort of
by Stuart Armstrong on Being legible to other agents by committing to usi... | 0 likes

When considering an embedder
by Jack Gallagher on Where does ADT Go Wrong? | 0 likes

The differences between this
by Abram Demski on Policy Selection Solves Most Problems | 1 like

Looking "at the very
by Abram Demski on Policy Selection Solves Most Problems | 0 likes


Privacy & Terms