Finding reflective oracle distributions using a Kakutani map
discussion post by Jessica Taylor 204 days ago | Vadim Kosoy likes this | discuss

(follow-up to: A correlated analogue to reflective oracles)

## Motivation

Suppose players are playing in a correlated equilibrium using a reflective oracle distribution. How does the equilibrium they play in vary as a function of the parameters of the game, or of the players’ policies? It turns out that the set of equilibria is a Kakutani map of the parameters to the game. This is a lot like it being a continuous map.

This might make it possible for players to reason about the effects of their policy on the equilibrium that they play (since the equilibrium is now a Kakutani map of the players’ policies).

## Definitions

Let $$k$$ be some natural number. We will consider reflective oracle distributions whose queries are parameterized on some vector in $$\theta \in \mathbb{R}^k$$.

To do this, let the Turing machines $$\mathcal{M}$$, instead of outputting a raw query, output a continuous function from $$\theta \in \mathbb{R}^k$$ to the query. (The details of representing continuous functions don’t seem that important). The reflectivity condition on oracle distributions is now relative to $$\theta$$ (since the queries depend on $$\theta$$).

Define the map

$\mathrm{ParamsToDistrs}(\theta) := \{D \in \mathcal{D} | D \text{ is reflective relative to \theta }\}$

which maps the parameters $$\theta$$ to the set of reflective oracle distributions for $$\theta$$.

Theorem: $$\mathrm{ParamsToDistrs}$$ is a Kakutani map.

Proof:

From the previous post, we have:

• For each $$\theta$$, $$\mathrm{ParamsToDistrs}(\theta)$$ is nonempty (Theorem 1)
• For each $$\theta$$, $$\mathrm{ParamsToDistrs}(\theta)$$ is convex (Theorem 2)

So it is sufficient to show that $$\mathrm{ParamsToDistrs}$$ has a closed graph.

Let $$\theta_1, \theta_2, ...$$ be an infinite sequence of $$\mathbb{R}^k$$ values converging to $$\theta_\infty$$. Let $$D_1, D_2, ...$$ be an infinite sequence of oracle distributions in $$\mathcal{D}$$ converging to $$D_\infty$$. Assume that for each natural $$n$$, $$D_n$$ is reflective relative to $$\theta_n$$. We will show that $$D_\infty$$ is reflective relative to $$\theta_\infty$$; this is sufficient to show that $$\mathrm{ParamsToDistrs}$$ has a closed graph.

Let $$M \in \mathcal{M}$$. Let $$a$$ be such that $$D_\infty(O(M) = a) > 0$$. Then $$D_\infty(O(M) = a) > \epsilon$$ for some $$\epsilon > 0$$. By convergence, there is some $$N$$ such that for all $$n \geq N$$, $$D_n(O(M) = a) > \epsilon$$. By reflectivity of each $$D_n$$ relative to $$\theta_i$$, we have, for each $$n \geq N$$,

$a \in \mathrm{Eval}(M)(\theta_n)(\mathrm{Condition}(D_n, M, a))$

Let $$q^\theta := \mathrm{Eval}(M)(\theta)$$. Rewriting the above statement:

$a \in \arg\max_{i \in \{1, ..., l(q)\}} \mathbb{E}_{D_n}[q^{\theta_n}_i(O)]$

Consider the set $$\{(\theta, D) | a \in \arg\max_{i \in \{1, ..., l(q)\}} \mathbb{E}_D[q^\theta_i(O)]\}$$. This is the intersection of a finite number of sets of the form $$\{(\theta, D) | \mathbb{E}_D[q^\theta_a(O) - q^\theta_i(O)] \geq 0\}$$. Each of these sets is closed because $$(\theta, D) \mapsto \mathbb{E}_D[q^\theta_a(O) - q^\theta_i(O)]$$ is continuous (so the preimage of the closed set $$\{x | x \geq 0\}$$ is closed). Therefore the set $$\{(\theta, D) | a \in \arg\max_{i \in \{1, ..., l(q)\}} \mathbb{E}_D[q^\theta_i(O)]\}$$ is closed.

In total, this is sufficient to show $$a \in \arg\max_{i \in \{1, ..., l(q)\}} \mathbb{E}_{D_\infty}[q^{\theta_\infty}_i(O)]$$. Therefore $$D_\infty$$ is reflective relative to $$\theta_\infty$$, as desired.

$$\square$$

An immediate consequence of this theorem is that the set of correlated equilibria of a normal-form game is a Kakutani map of the parameters of the game.

### NEW DISCUSSION POSTS

Indeed there is some kind of
 by Vadim Kosoy on Catastrophe Mitigation Using DRL | 0 likes

Very nice. I wonder whether
 by Vadim Kosoy on Hyperreal Brouwer | 0 likes

Freezing the reward seems
 by Vadim Kosoy on Resolving human inconsistency in a simple model | 0 likes

Unfortunately, it's not just
 by Vadim Kosoy on Catastrophe Mitigation Using DRL | 0 likes

>We can solve the problem in
 by Wei Dai on The Happy Dance Problem | 1 like

Maybe it's just my browser,
 by Gordon Worley III on Catastrophe Mitigation Using DRL | 2 likes

At present, I think the main
 by Abram Demski on Looking for Recommendations RE UDT vs. bounded com... | 0 likes

In the first round I'm
 by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

Fine with it being shared
 by Paul Christiano on Funding opportunity for AI alignment research | 0 likes

I think the point I was
 by Abram Demski on Predictable Exploration | 0 likes

(also x-posted from
 by Sören Mindermann on The Three Levels of Goodhart's Curse | 0 likes

(x-posted from Arbital ==>
 by Sören Mindermann on The Three Levels of Goodhart's Curse | 0 likes

>If the other players can see
 by Stuart Armstrong on Predictable Exploration | 0 likes