Finding reflective oracle distributions using a Kakutani map
discussion post by Jessica Taylor 142 days ago | Vadim Kosoy likes this | discuss

(follow-up to: A correlated analogue to reflective oracles)

## Motivation

Suppose players are playing in a correlated equilibrium using a reflective oracle distribution. How does the equilibrium they play in vary as a function of the parameters of the game, or of the players’ policies? It turns out that the set of equilibria is a Kakutani map of the parameters to the game. This is a lot like it being a continuous map.

This might make it possible for players to reason about the effects of their policy on the equilibrium that they play (since the equilibrium is now a Kakutani map of the players’ policies).

## Definitions

Let $$k$$ be some natural number. We will consider reflective oracle distributions whose queries are parameterized on some vector in $$\theta \in \mathbb{R}^k$$.

To do this, let the Turing machines $$\mathcal{M}$$, instead of outputting a raw query, output a continuous function from $$\theta \in \mathbb{R}^k$$ to the query. (The details of representing continuous functions don’t seem that important). The reflectivity condition on oracle distributions is now relative to $$\theta$$ (since the queries depend on $$\theta$$).

Define the map

$\mathrm{ParamsToDistrs}(\theta) := \{D \in \mathcal{D} | D \text{ is reflective relative to \theta }\}$

which maps the parameters $$\theta$$ to the set of reflective oracle distributions for $$\theta$$.

Theorem: $$\mathrm{ParamsToDistrs}$$ is a Kakutani map.

Proof:

From the previous post, we have:

• For each $$\theta$$, $$\mathrm{ParamsToDistrs}(\theta)$$ is nonempty (Theorem 1)
• For each $$\theta$$, $$\mathrm{ParamsToDistrs}(\theta)$$ is convex (Theorem 2)

So it is sufficient to show that $$\mathrm{ParamsToDistrs}$$ has a closed graph.

Let $$\theta_1, \theta_2, ...$$ be an infinite sequence of $$\mathbb{R}^k$$ values converging to $$\theta_\infty$$. Let $$D_1, D_2, ...$$ be an infinite sequence of oracle distributions in $$\mathcal{D}$$ converging to $$D_\infty$$. Assume that for each natural $$n$$, $$D_n$$ is reflective relative to $$\theta_n$$. We will show that $$D_\infty$$ is reflective relative to $$\theta_\infty$$; this is sufficient to show that $$\mathrm{ParamsToDistrs}$$ has a closed graph.

Let $$M \in \mathcal{M}$$. Let $$a$$ be such that $$D_\infty(O(M) = a) > 0$$. Then $$D_\infty(O(M) = a) > \epsilon$$ for some $$\epsilon > 0$$. By convergence, there is some $$N$$ such that for all $$n \geq N$$, $$D_n(O(M) = a) > \epsilon$$. By reflectivity of each $$D_n$$ relative to $$\theta_i$$, we have, for each $$n \geq N$$,

$a \in \mathrm{Eval}(M)(\theta_n)(\mathrm{Condition}(D_n, M, a))$

Let $$q^\theta := \mathrm{Eval}(M)(\theta)$$. Rewriting the above statement:

$a \in \arg\max_{i \in \{1, ..., l(q)\}} \mathbb{E}_{D_n}[q^{\theta_n}_i(O)]$

Consider the set $$\{(\theta, D) | a \in \arg\max_{i \in \{1, ..., l(q)\}} \mathbb{E}_D[q^\theta_i(O)]\}$$. This is the intersection of a finite number of sets of the form $$\{(\theta, D) | \mathbb{E}_D[q^\theta_a(O) - q^\theta_i(O)] \geq 0\}$$. Each of these sets is closed because $$(\theta, D) \mapsto \mathbb{E}_D[q^\theta_a(O) - q^\theta_i(O)]$$ is continuous (so the preimage of the closed set $$\{x | x \geq 0\}$$ is closed). Therefore the set $$\{(\theta, D) | a \in \arg\max_{i \in \{1, ..., l(q)\}} \mathbb{E}_D[q^\theta_i(O)]\}$$ is closed.

In total, this is sufficient to show $$a \in \arg\max_{i \in \{1, ..., l(q)\}} \mathbb{E}_{D_\infty}[q^{\theta_\infty}_i(O)]$$. Therefore $$D_\infty$$ is reflective relative to $$\theta_\infty$$, as desired.

$$\square$$

An immediate consequence of this theorem is that the set of correlated equilibria of a normal-form game is a Kakutani map of the parameters of the game.

### NEW DISCUSSION POSTS

Note that the problem with
 by Vadim Kosoy on Open Problems Regarding Counterfactuals: An Introd... | 0 likes

Typos on page 5: *
 by Vadim Kosoy on Open Problems Regarding Counterfactuals: An Introd... | 0 likes

Ah, you're right. So gain
 by Abram Demski on Smoking Lesion Steelman | 0 likes

> Do you have ideas for how
 by Jessica Taylor on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

I think I understand what
 by Wei Dai on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

>You don’t have to solve
 by Wei Dai on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

 by Vadim Kosoy on Delegative Inverse Reinforcement Learning | 0 likes

My confusion is the
 by Tom Everitt on Delegative Inverse Reinforcement Learning | 0 likes

> First of all, it seems to
 by Abram Demski on Smoking Lesion Steelman | 0 likes

> figure out what my values
 by Vladimir Slepnev on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

I agree that selection bias
 by Jessica Taylor on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

>It seems quite plausible
 by Wei Dai on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

> defending against this type
 by Paul Christiano on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

2. I think that we can avoid
 by Paul Christiano on Autopoietic systems and difficulty of AGI alignmen... | 0 likes

I hope you stay engaged with
 by Wei Dai on Autopoietic systems and difficulty of AGI alignmen... | 0 likes