Intelligent Agent Foundations Forum
Strategy Nonconvexity Induced by a Choice of Potential Oracles
discussion post by Alex Appel

One notable aspect of the extant cooperative oracle results is that they are in a different setting from the usual reflective oracle results.

The usual reflective oracle results show a fixed point in the space of potential oracles, \([0,1]^{M\times([0,1]\cap\mathbb{Q})}\times[0,1]^{M}\), where \(M\) is the set of Turing machines with oracle access. The first component is a function \(query:M\times([0,1]\cap\mathbb{Q})\to[0,1]\), which gives the probability of the potential oracle answering 1 (and 0 otherwise) when asked whether a given Turing machine outputs 1 with probability greater than some rational number. The second component is a function that maps each Turing machine using the oracle to its probability of outputting a 1.
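
For concreteness, here is a minimal Python sketch of a point in this function-space. The class name, the dictionary representation, and the string stand-in for machine identifiers are my own illustrative choices, not anything from the original results.

from fractions import Fraction
from typing import Dict, Tuple

# Illustrative stand-in for an identifier of an oracle-equipped Turing machine.
Machine = str

# query[(m, q)]: probability the potential oracle answers 1 to the question
# "does machine m output 1 with probability greater than q?"
Query = Dict[Tuple[Machine, Fraction], float]

# output_prob[m]: probability that machine m, run with this oracle, outputs 1.
OutputProb = Dict[Machine, float]

class PotentialOracle:
    """A point in [0,1]^(M x ([0,1] intersect Q)) x [0,1]^M, stored as two maps."""
    def __init__(self, query: Query, output_prob: OutputProb):
        self.query = query
        self.output_prob = output_prob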

The cooperative oracle results elsewhere on IAFF show a fixed point in the space of strategies for a game, \(\prod_{i\in I}\Delta(S_{i})\), where \(S_{i}\) is the set of strategies for the i’th player.

I was able to translate the cooperative oracle existence proof from strategy-space to function-space. However, the further step of taking an arbitrary fixed point in strategy-space and explicitly building a true reflective oracle in function-space that results in that fixed point being played in the associated game ran into some problems. Several assumptions were needed to get that to work out, and the most unrealistic by far was the requirement of strategy continuity.

As an example, in the game of matching pennies, if the probability of the other player outputting “heads” is 0.5, the set of acceptable strategies jumps from a single pure “heads” or “tails” to “any possible mixture of heads and tails”, while the continuity requirement mandates that the chosen strategy instead shift very rapidly (but continuously) from “all heads” to “all tails” in the vicinity of 0.5, as sketched below.
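
A hypothetical illustration of that continuity requirement, written from the matcher's point of view (the function names and the width of the transition window are arbitrary assumptions of mine, not part of the post):

def best_response_heads_prob(opponent_heads: float) -> float:
    # Discontinuous best response for the pennies-matcher: all heads if the
    # opponent favors heads, all tails if tails, and anything is acceptable
    # at exactly 0.5.
    if opponent_heads > 0.5:
        return 1.0
    if opponent_heads < 0.5:
        return 0.0
    return 0.5  # arbitrary selection from the acceptable set

def continuous_strategy(opponent_heads: float, width: float = 1e-6) -> float:
    # A continuous selection: ramps from all tails to all heads over a
    # window of size `width` around 0.5, instead of jumping.
    lo, hi = 0.5 - width / 2, 0.5 + width / 2
    if opponent_heads <= lo:
        return 0.0
    if opponent_heads >= hi:
        return 1.0
    return (opponent_heads - lo) / width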

At the heart of this insufficiently strong result is the question: just what does it mean to have a convex set of possible responses without one picked out in particular? One possible interpretation is that the convex set comes from the Turing machine/player having some probability of not halting, so there is a degree of freedom in what output distribution the oracle reports for it. This doesn’t seem realistic. An oracle-equipped player playing matching pennies against coin-flip bot has a “set of acceptable strategies” that includes all mixtures of heads and tails, but it doesn’t make sense to say “ah, well, the player just doesn’t ever halt and select a move, and they can only be said to play heads with 50% probability by consulting the oracle about their behavior, upon which the oracle hallucinates an arbitrary nonstandard play at the end of time.”

Well then, assuming halting, it’s possible to interpret the “convex set of possible responses” as being produced by the choice of oracle. (An “oracle” will be defined as something that has the sharp 0-1 transition behavior around the probability of another Turing machine outputting something, and accurately places that transition at the true probability. Further restrictions on the randomization behavior at the threshold are not imposed.)

In the previous example against coin-flip bot, the strategy “call the oracle on (coinflipbot,0.5) and respond with the opposite of what the oracle outputs” does indeed produce a convex set of possible responses, induced by how the oracle randomizes when you give it a query that is exactly on the threshold.

As a quick refresher, reflective oracles can randomize when you give them a query where the rational number is the exact probability of the machine outputting 1, and there’s a degree of freedom in what probability they randomize with.
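
Here is a minimal sketch of that threshold behavior and of the coin-flip-bot example, under the definition of “oracle” above. The function names and the dictionary of threshold probabilities are illustrative assumptions; the free parameter at_threshold is the degree of freedom being discussed.

import random

def make_oracle(true_prob, at_threshold):
    # Answers queries (machine, q): returns 1 if the machine's true probability
    # of outputting 1 exceeds q, 0 if it is below q, and randomizes with the
    # free probability at_threshold[(machine, q)] when q equals the true probability.
    def oracle(machine, q):
        p = true_prob[machine]
        if p > q:
            return 1
        if p < q:
            return 0
        return 1 if random.random() < at_threshold[(machine, q)] else 0
    return oracle

# The anti-coin-flip-bot player: ask about (coinflipbot, 0.5) and play the opposite.
# As at_threshold[("coinflipbot", 0.5)] ranges over [0,1], the induced play ranges
# over every mixture of heads and tails -- the convex set from the example above.
oracle = make_oracle({"coinflipbot": 0.5}, {("coinflipbot", 0.5): 0.3})
move = "tails" if oracle("coinflipbot", 0.5) == 1 else "heads"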

So, these “convex sets of possible responses” come from the choice of oracle, right? If you specify oracle access for a particular strategy, there are lots of different oracles that could be chosen, and when you plot all the responses produced by the possible oracle choices back in strategy-space, they make a convex set, don’t they? Well, no. Here’s an explicit counterexample: a player/Turing machine whose set of “possible responses” to another player (where a possible response is a strategy that could be played under an appropriate choice of oracle) is nonconvex in strategy-space.

Player A just flips a coin to decide their move, and Player B implements the following code.

IF(oracle(A,0.5)=1)
    IF(oracle(A,0.5)=1)
        Output Move_1
    ELSE
        Output Move_2
ELSE
    Output Move_3

Now, since player A flips a coin, the probability distribution over moves 1, 2, and 3 is dictated by how the oracle randomizes on (A,0.5) as an input. Writing \(q\) for the probability that the oracle answers 1 to that query (each call being independent), player B plays move 1 with probability \(q^{2}\), move 2 with probability \(q(1-q)\), and move 3 with probability \(1-q\). Select the oracle with \(q=0.5\): the distribution is 25% for move 1, 25% for move 2, and 50% for move 3. Now select the oracle with \(q=1\): the distribution is 100% for move 1. A 50/50 mixture of these two points is the distribution 62.5% for move 1, 12.5% for move 2, and 25% for move 3. To play move 1 with probability 62.5%, the oracle must output 1 on the query (A,0.5) with probability \(\sqrt{0.625}\approx 0.79\), which yields approximate probabilities of 62.5% for move 1, 16.6% for move 2, and 20.9% for move 3. So the 50/50 mixture of those two points does not have an associated oracle which could produce it.
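
A quick numerical check of that nonconvexity, as a sketch in Python. The function name is mine; the closed form \((q^{2},\,q(1-q),\,1-q)\) is just a restatement of the reasoning above.

def move_distribution(q):
    # Player B's move distribution when the oracle answers 1 to the query
    # (A, 0.5) with probability q on each independent call.
    return (q * q, q * (1 - q), 1 - q)   # (Move_1, Move_2, Move_3)

p_half = move_distribution(0.5)   # (0.25, 0.25, 0.5)
p_one = move_distribution(1.0)    # (1.0, 0.0, 0.0)
mixture = tuple(0.5 * a + 0.5 * b for a, b in zip(p_half, p_one))   # (0.625, 0.125, 0.25)

# The only q giving Move_1 probability 0.625 is sqrt(0.625), about 0.79...
q = 0.625 ** 0.5
print(move_distribution(q))   # roughly (0.625, 0.166, 0.209), not (0.625, 0.125, 0.25)

# ...so no single choice of oracle reproduces the 50/50 mixture: the set
# {(q^2, q(1-q), 1-q) : q in [0,1]} of achievable distributions is not convex.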

Ok, so what’s the takeaway?

The requirement that “the strategy of a player is continuous” in the theorem linking cooperative oracles in strategy-space to cooperative oracles in function-space can’t be disposed of until someone figures out an interpretation of “convex set of possible responses” that isn’t

“the player doesn’t halt” or

“the choice of oracle dictates which of the possible responses will be chosen”.


