Finding reflective oracle distributions using a Kakutani map
discussion post by Jessica Taylor 295 days ago | Vadim Kosoy likes this | discuss

(follow-up to: A correlated analogue to reflective oracles)

Motivation

Suppose players are playing in a correlated equilibrium using a reflective oracle distribution. How does the equilibrium they play in vary as a function of the parameters of the game, or of the players’ policies? It turns out that the set of equilibria is a Kakutani map of the parameters to the game. This is a lot like it being a continuous map.

This might make it possible for players to reason about the effects of their policy on the equilibrium that they play (since the equilibrium is now a Kakutani map of the players’ policies).

Definitions

Let $$k$$ be some natural number. We will consider reflective oracle distributions whose queries are parameterized on some vector in $$\theta \in \mathbb{R}^k$$.

To do this, let the Turing machines $$\mathcal{M}$$, instead of outputting a raw query, output a continuous function from $$\theta \in \mathbb{R}^k$$ to the query. (The details of representing continuous functions don’t seem that important). The reflectivity condition on oracle distributions is now relative to $$\theta$$ (since the queries depend on $$\theta$$).

Define the map

$\mathrm{ParamsToDistrs}(\theta) := \{D \in \mathcal{D} | D \text{ is reflective relative to \theta }\}$

which maps the parameters $$\theta$$ to the set of reflective oracle distributions for $$\theta$$.

Theorem: $$\mathrm{ParamsToDistrs}$$ is a Kakutani map.

Proof:

From the previous post, we have:

• For each $$\theta$$, $$\mathrm{ParamsToDistrs}(\theta)$$ is nonempty (Theorem 1)
• For each $$\theta$$, $$\mathrm{ParamsToDistrs}(\theta)$$ is convex (Theorem 2)

So it is sufficient to show that $$\mathrm{ParamsToDistrs}$$ has a closed graph.

Let $$\theta_1, \theta_2, ...$$ be an infinite sequence of $$\mathbb{R}^k$$ values converging to $$\theta_\infty$$. Let $$D_1, D_2, ...$$ be an infinite sequence of oracle distributions in $$\mathcal{D}$$ converging to $$D_\infty$$. Assume that for each natural $$n$$, $$D_n$$ is reflective relative to $$\theta_n$$. We will show that $$D_\infty$$ is reflective relative to $$\theta_\infty$$; this is sufficient to show that $$\mathrm{ParamsToDistrs}$$ has a closed graph.

Let $$M \in \mathcal{M}$$. Let $$a$$ be such that $$D_\infty(O(M) = a) > 0$$. Then $$D_\infty(O(M) = a) > \epsilon$$ for some $$\epsilon > 0$$. By convergence, there is some $$N$$ such that for all $$n \geq N$$, $$D_n(O(M) = a) > \epsilon$$. By reflectivity of each $$D_n$$ relative to $$\theta_i$$, we have, for each $$n \geq N$$,

$a \in \mathrm{Eval}(M)(\theta_n)(\mathrm{Condition}(D_n, M, a))$

Let $$q^\theta := \mathrm{Eval}(M)(\theta)$$. Rewriting the above statement:

$a \in \arg\max_{i \in \{1, ..., l(q)\}} \mathbb{E}_{D_n}[q^{\theta_n}_i(O)]$

Consider the set $$\{(\theta, D) | a \in \arg\max_{i \in \{1, ..., l(q)\}} \mathbb{E}_D[q^\theta_i(O)]\}$$. This is the intersection of a finite number of sets of the form $$\{(\theta, D) | \mathbb{E}_D[q^\theta_a(O) - q^\theta_i(O)] \geq 0\}$$. Each of these sets is closed because $$(\theta, D) \mapsto \mathbb{E}_D[q^\theta_a(O) - q^\theta_i(O)]$$ is continuous (so the preimage of the closed set $$\{x | x \geq 0\}$$ is closed). Therefore the set $$\{(\theta, D) | a \in \arg\max_{i \in \{1, ..., l(q)\}} \mathbb{E}_D[q^\theta_i(O)]\}$$ is closed.

In total, this is sufficient to show $$a \in \arg\max_{i \in \{1, ..., l(q)\}} \mathbb{E}_{D_\infty}[q^{\theta_\infty}_i(O)]$$. Therefore $$D_\infty$$ is reflective relative to $$\theta_\infty$$, as desired.

$$\square$$

An immediate consequence of this theorem is that the set of correlated equilibria of a normal-form game is a Kakutani map of the parameters of the game.

NEW DISCUSSION POSTS

[Delegative Reinforcement
 by Vadim Kosoy on Stable Pointers to Value II: Environmental Goals | 1 like

Intermediate update: The
 by Alex Appel on Further Progress on a Bayesian Version of Logical ... | 0 likes

Since Briggs [1] shows that
 by 258 on In memoryless Cartesian environments, every UDT po... | 2 likes

This doesn't quite work. The
 by Nisan Stiennon on Logical counterfactuals and differential privacy | 0 likes

I at first didn't understand
 by Sam Eisenstat on An Untrollable Mathematician | 1 like

This is somewhat related to
 by Vadim Kosoy on The set of Logical Inductors is not Convex | 0 likes

This uses logical inductors
 by Abram Demski on The set of Logical Inductors is not Convex | 0 likes

Nice writeup. Is one-boxing
 by Tom Everitt on Smoking Lesion Steelman II | 0 likes

Hi Alex! The definition of
 by Vadim Kosoy on Delegative Inverse Reinforcement Learning | 0 likes

A summary that might be
 by Alex Appel on Delegative Inverse Reinforcement Learning | 1 like

I don't believe that
 by Alex Appel on Delegative Inverse Reinforcement Learning | 0 likes

This is exactly the sort of
 by Stuart Armstrong on Being legible to other agents by committing to usi... | 0 likes

When considering an embedder
 by Jack Gallagher on Where does ADT Go Wrong? | 0 likes

The differences between this
 by Abram Demski on Policy Selection Solves Most Problems | 1 like

Looking "at the very
 by Abram Demski on Policy Selection Solves Most Problems | 0 likes