Minimax forecasting
post by Vadim Kosoy

This post continues the research programme of attacking the grain of truth problem by departing from the Bayesian paradigm. In the previous post, I suggested using Savage’s minimax regret decision rule, but here I fall back to the simple minimax decision rule. This is because the mathematics is considerably simpler, and minimax should be sufficient to get IUD play in general games and Nash equilibrium in zero-sum two-player games. I hope to build on these results to get analogous results for minimax regret in the future.

We consider “semi-Bayesian” agents following the minimax expected utility decision rule, in oblivious environments with full monitoring (a setting that we will refer to as “forecasting”). This setting is considered in order to avoid the need to enforce exploration, in preparation for the analysis of general environments. We show that such agents satisfy a certain asymptotic optimality theorem. Intuitively, this theorem means that whenever the environment satisfies an incomplete model that is included in the prior, the agent will eventually learn this model, i.e. extract at least as much utility as can be guaranteed for this model.


Appendix A contains the proofs of all results. Appendix B contains some theorems from the literature together with a corollary used in Appendix A.

Notation

Given \({X}\) a topological space, \({{\mathcal{P}}(X)}\) will denote the space of Borel probability measures on \({X}\). We regard it as a topological space using the weak\({^*}\) topology. Given \({x \in X}\), \({\delta_x \in {\mathcal{P}}(X)}\) is defined by \({\delta_x(A):=[[x \in A]]}\). Given \({\mu, \nu \in {\mathcal{P}}(X)}\), \({{\operatorname{d}_{\text{tv}}}(\mu, \nu)}\) stands for the total variation distance between \({\mu}\) and \({\nu}\). We denote \({{\mathcal{P}_{\operatorname{C}}}(X)}\) the set of non-empty convex compact subsets of \({{\mathcal{P}}(X)}\).
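
As an aside, for a finite \({X}\) the total variation distance is just half the \({\ell_1}\) distance between the probability vectors. The following minimal sketch (not part of the formal development) computes it for distributions given as dictionaries:

```python
def total_variation(mu, nu):
    """Total variation distance between two distributions on a finite set,
    given as dicts mapping outcomes to probabilities."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(k, 0.0) - nu.get(k, 0.0)) for k in support)

# Example: d_tv between a fair and a biased coin is |0.5 - 0.7| = 0.2.
print(total_variation({"H": 0.5, "T": 0.5}, {"H": 0.7, "T": 0.3}))
```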

Given \({\mathcal{X}}\) a finite set, \({\mathcal{X}^*}\) denotes the set of finite strings in alphabet \({\mathcal{X}}\), i.e. \({\mathcal{X}^*:=\bigsqcup_{n \in {\mathbb{N}}} \mathcal{X}^n}\). \({\mathcal{X}^\omega}\) denotes the set of infinite strings in alphabet \({\mathcal{X}}\). Given \({x \in \mathcal{X}^* \sqcup \mathcal{X}^\omega}\), \({{\lvert x \rvert} \in {\mathbb{N}}\sqcup \{\infty\}}\) is the length of the string. Given \({0 \leq n < {\lvert x \rvert}}\), \({x_n \in \mathcal{X}}\) is the \({n}\)-th symbol in \({x}\). Given \({0 \leq n \leq {\lvert x \rvert}}\), \({x_{<n}}\) is the prefix of \({x}\) of length \({n}\). Given \({x,y \in \mathcal{X}^* \sqcup \mathcal{X}^\omega}\), the notations \({x \sqsubset y}\), \({x \sqsubseteq y}\), \({x \not\sqsubset y}\) and \({x \not\sqsubseteq y}\) mean “\({x}\) is a proper prefix of \({y}\)”, “\({x}\) is a prefix of \({y}\)” and their negations. \({{\boldsymbol{\lambda}}_\mathcal{X} \in \mathcal{X}^*}\) is the empty string and \({\mathcal{X}^{+}:=\mathcal{X}^* \setminus \{{\boldsymbol{\lambda}}_\mathcal{X}\}}\).

Elementary properties of minimax

Fix \({S}\) and \({E}\) compact Polish spaces, \({S}\) representing the agent’s pure policies and \({E}\) representing the pure environments. Let \({u: S \times E \rightarrow {\mathbb{R}}}\) be a continuous utility function.

Proposition 1

Consider \({\Phi \subseteq {\mathcal{P}}(E)}\). There exists

\[{\pi^* \in {\underset{\pi \in {\mathcal{P}}(S)}{\operatorname{arg\,max}}\,} \inf_{\mu \in \Phi} \operatorname{E}_{\pi \times \mu}[u]}\]

Moreover, let \({\bar{\Phi}}\) be the closure of the convex hull of \({\Phi}\). Then, the same \({\pi^*}\) satisfies

\[{\pi^* \in {\underset{\pi \in {\mathcal{P}}(S)}{\operatorname{arg\,max}}\,} \min_{\mu \in \bar{\Phi}} \operatorname{E}_{\pi \times \mu}[u]}\]

We will say that such a \({\pi^*}\) is a minimax policy for \({\Phi}\).

Proposition 2

Consider \({\Phi, \Phi' \in {\mathcal{P}_{\operatorname{C}}}(E)}\) and \({p \in [0,1]}\). Then, there is \({\nu^* \in \Phi'}\) s.t. any minimax policy for \({p\Phi+(1-p)\Phi'}\) is a minimax policy for \({p\Phi+(1-p)\nu^*}\).


In particular, Proposition 2 implies that a minimax policy for \({\Phi}\) is an optimal policy for some \({\mu^* \in \Phi}\).
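
As an aside, in the simplest case where \({S}\) and \({E}\) are finite and \({\Phi = {\mathcal{P}}(E)}\), computing a minimax policy reduces to solving a zero-sum matrix game by linear programming. The following sketch (with a made-up payoff matrix) illustrates Proposition 1 concretely; the dual variables of the inequality constraints recover a \({\mu^*}\) against which \({\pi^*}\) is optimal, in line with the remark above.

```python
import numpy as np
from scipy.optimize import linprog

# Payoff matrix U[s, e] = u(s, e) for a small finite toy problem
# (hypothetical numbers, just for illustration).
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.6, 0.6]])
m, k = U.shape

# Variables: (pi_1, ..., pi_m, v). Maximize v = min_e sum_s pi_s U[s, e].
c = np.zeros(m + 1)
c[-1] = -1.0                      # linprog minimizes, so minimize -v

# v - sum_s pi_s U[s, e] <= 0 for every pure environment e
A_ub = np.hstack([-U.T, np.ones((k, 1))])
b_ub = np.zeros(k)

# sum_s pi_s = 1
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])

bounds = [(0, None)] * m + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
pi_star, value = res.x[:m], res.x[-1]
print("minimax policy:", pi_star, "guaranteed value:", value)
```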

Now we ask what policy is implemented by “subagents” created by a minimax policy. Suppose \({S = S_1 \times S_2}\), where \({S_2}\) represents the pure policies of the subagent. Assume that there is a Borel set \({A \subseteq E}\) (the condition for the creation of the subagent), a finite set \({T}\) (the actions taken before the creation of the subagent), a Borel measurable mapping \({\alpha: S_1 \rightarrow T}\) and continuous functions \({u_1: S_1 \times E \rightarrow {\mathbb{R}}}\) and \({u_2: S_2 \times T \times E \rightarrow {\mathbb{R}}}\) s.t.

\[{u(s_1,s_2,e)=u_1(s_1,e)+\chi_A(e) u_2(s_2,\alpha(s_1),e)}\]

Consider \({\Phi \in {\mathcal{P}_{\operatorname{C}}}(E)}\) and \({\pi^* \in {\mathcal{P}}(S)}\) minimax for \({\Phi}\). Denote by \({\operatorname{pr}_{1,2}: S \rightarrow S_{1,2}}\) the projection mappings. Define \({\pi_1^* \in {\mathcal{P}}(S_1)}\) by

\[\pi_1^* := \operatorname{pr}_{1*}\pi^*\]

Define \({\pi_2^*: T \rightarrow {\mathcal{P}}(S_2)}\) by

\[\pi_2^*(t) := \operatorname{pr}_{2*} (\pi^* \mid \alpha^{-1}(t) \times S_2)\]

If for some \({t_0 \in T}\) we have \({\pi^*(\alpha^{-1}(t_0) \times S_2)=0}\), then \({\pi^*_2(t_0)}\) can be arbitrary.

Proposition 3

\[\pi_2^* \in {\underset{\pi_2: T \rightarrow {\mathcal{P}}(S_2)}{\operatorname{arg\,max}}\,} \min_{\mu \in \Phi} (\operatorname{E}_{\pi_1^* \times \mu}[u_1] + \mu(A) \operatorname{E}_{t \sim \alpha_*\pi_1^*}[\operatorname{E}_{\pi_2(t) \times \mu \mid A}[u_2(t)]])\]


It should also be possible to make do without \({T}\) and the associated decomposition of \({u}\) by instead decomposing \({\pi^*}\) into \({\pi_1^*}\) and a Markov kernel from \({S_1}\) to \({S_2}\).

Asymptotic optimality

Fix finite sets \({{\mathcal{A}}}\) (actions) and \({{\mathcal{O}}}\) (observations). We now assume that \({E={{\mathcal{O}}^\omega}}\) and \({S={{\mathcal{O}}^* \rightarrow {\mathcal{A}}}}\) with the product topology. We fix \({\gamma: {\mathbb{N}}\rightarrow {\mathbb{R}}^{\geq 0}}\) a time discount function satisfying \({\sum_n \gamma(n) < \infty}\) and \({r: ({\mathcal{A}}\times {\mathcal{O}})^* \rightarrow {\mathbb{R}}}\) a bounded reward function. Given \({e \in {{\mathcal{O}}^\omega}}\) and \({s: {{\mathcal{O}}^* \rightarrow {\mathcal{A}}}}\), we define \({e^s \in ({\mathcal{A}}\times {\mathcal{O}})^\omega}\) by

\[{e^s_n:=(s(e_{<n}),e_n)}\]

The utility function is then given by

\[u(s,e)=\sum_{n \in {\mathbb{N}}} \gamma(n) r(e^s_{<n})\]
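
As a toy illustration (not part of the formal development), the following sketch evaluates a finite-horizon truncation of \({u(s,e)}\); since \({\sum_n \gamma(n) < \infty}\), the truncation error is controlled by the discount tail. The reward, discount and policy here are made-up examples.

```python
def utility(s, e, r, gamma, horizon):
    """Truncated approximation of u(s, e) = sum_n gamma(n) r(e^s_{<n}).
    s maps observation prefixes to actions, e is the observation sequence,
    r maps action-observation histories to rewards."""
    history = ()                             # e^s_{<0}
    total = 0.0
    for n in range(horizon):
        total += gamma(n) * r(history)       # gamma(n) * r(e^s_{<n})
        history += ((s(e[:n]), e[n]),)       # append e^s_n = (s(e_{<n}), e_n)
    return total

# Made-up example: reward 1 when the previous action "predicted" the previous
# observation, geometric discount, policy that repeats the last observation.
r = lambda h: 1.0 if h and h[-1][0] == h[-1][1] else 0.0
gamma = lambda n: 0.9 ** n
repeat_last = lambda prefix: prefix[-1] if prefix else "0"
print(utility(repeat_last, "0110100110" * 5, r, gamma, horizon=50))
```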

We will need a notation for a combination of two policies where the agent switches from one to another when observing some \({x \in {\mathcal{O}}^*}\). Given \({s_1: {{\mathcal{O}}^* \rightarrow {\mathcal{A}}}}\) and \({s_2: {{\mathcal{O}}^* \rightarrow {\mathcal{A}}}}\), we define \({s_1 \times_x s_2: {{\mathcal{O}}^* \rightarrow {\mathcal{A}}}}\) by

\[(s_1 \times_x s_2)(y):=\begin{cases}s_1(y) \text{ if } y \not\sqsupseteq x \\s_2(z) \text{ if } y=xz\end{cases}\]
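
Concretely, for deterministic policies this combination can be sketched as follows (observation prefixes represented as strings or tuples; `combine` is a hypothetical helper name reused in later sketches):

```python
def combine(s1, s2, x):
    """The policy s1 x_x s2: play s1 as long as the history does not extend x;
    once the history has the form y = x z, hand over to s2 applied to z."""
    def s(y):
        if y[:len(x)] == x:           # x is a prefix of y, i.e. y = x z
            return s2(y[len(x):])
        return s1(y)
    return s
```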

Consider any \({\pi \in {\mathcal{P}}({{\mathcal{O}}^* \rightarrow {\mathcal{A}}})}\) and \({\rho: {\mathcal{A}}^{{\lvert x \rvert}} \rightarrow {\mathcal{P}}({{\mathcal{O}}^* \rightarrow {\mathcal{A}}})}\). We define \({\pi \otimes_x \rho \in {\mathcal{P}}({\mathcal{O}}^* \rightarrow {\mathcal{A}})}\) as follows. For any Borel measurable \({B \subseteq {{\mathcal{O}}^* \rightarrow {\mathcal{A}}}}\):

\[(\pi \otimes_x \rho)(B):=\sum_{t \in {\mathcal{A}}^{{\lvert x \rvert}}} {\operatorname{Pr}_{\substack{s_1 \sim \pi \\ s_2 \sim \rho(t)}}}[s_1 \times_x s_2 \in B \land \forall n < {\lvert x \rvert}: t_n=s_1(x_{<n})]\]

It is easy to see the above indeed defines a probability measure.
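
Indeed, \({\pi \otimes_x \rho}\) describes a sampling procedure: draw \({s_1 \sim \pi}\), read off the actions \({t}\) that \({s_1}\) takes along the prefixes of \({x}\), draw \({s_2 \sim \rho(t)}\), and glue the two policies together. A sketch (reusing the hypothetical `combine` helper above; `sample_pi` and `sample_rho` stand for samplers of \({\pi}\) and \({\rho(t)}\)):

```python
def sample_pi_otimes(sample_pi, sample_rho, x):
    """Sample a policy from pi (x)_x rho."""
    s1 = sample_pi()                                  # s1 ~ pi
    t = tuple(s1(x[:n]) for n in range(len(x)))       # t_n = s1(x_{<n})
    s2 = sample_rho(t)                                # s2 ~ rho(t)
    return combine(s1, s2, x)                         # s1 x_x s2
```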

Note that changing the policy after a certain event cannot change the utility by more than \({O(1)}\) times the probability of the event times the corresponding tail of the time discount. That is, we have the following:

Proposition 4

\[{\lvert \operatorname{E}_{(\pi \otimes_x \rho) \times \mu}[u]-\operatorname{E}_{\pi \times \mu}[u] \rvert} \leq (\sup r - \inf r) \operatorname{Pr}_{e \sim \mu}[x \sqsubset e] \sum_{n > {\lvert x \rvert}} \gamma(n)\]
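
A quick Monte-Carlo sanity check of this bound (a sketch reusing the hypothetical `utility` and `combine` helpers above; the environment is i.i.d. uniform over a binary alphabet and all numbers are made up):

```python
import random

random.seed(0)
x = "01"                                   # switch policies after observing "01"
horizon = 40
gamma = lambda n: 0.9 ** n
r = lambda h: 1.0 if h and h[-1][0] == h[-1][1] else 0.0   # rewards in [0, 1]
always0 = lambda prefix: "0"
always1 = lambda prefix: "1"
sample_env = lambda: "".join(random.choice("01") for _ in range(horizon))

def avg_utility(policy, trials=5000):
    return sum(utility(policy, sample_env(), r, gamma, horizon)
               for _ in range(trials)) / trials

diff = abs(avg_utility(combine(always0, always1, x)) - avg_utility(always0))
prob_x = 0.25                              # Pr[e starts with "01"] under mu
bound = (1.0 - 0.0) * prob_x * sum(gamma(n) for n in range(len(x) + 1, horizon))
print(diff, "<=", bound)                   # truncated version of Proposition 4
```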


We are now ready to formulate the optimality theorem. Consider \({\Phi,\Phi' \in {\mathcal{P}_{\operatorname{C}}}({{\mathcal{O}}^\omega})}\) and \({p \in (0,1]}\). Denote \({\Psi = p \Phi + (1-p) \Phi'}\). We think of \({\Psi}\) as the prior, \({\Phi}\) as one of the models in the prior (assigned probability \({p}\)) and \({\Phi'}\) as the convex combination of all other models. Let \({\pi^* \in {\mathcal{P}}({{\mathcal{O}}^* \rightarrow {\mathcal{A}}})}\) be a minimax policy for \({\Psi}\). Define \({v: {\mathcal{O}}^* \rightarrow {\mathbb{R}}}\) by

\[v(x):=\max_{\rho \in {\mathcal{A}}^{{\lvert x \rvert}} \rightarrow {\mathcal{P}}({{\mathcal{O}}^* \rightarrow {\mathcal{A}}})} \min_{\mu \in \Phi} \operatorname{E}_{(\pi^* \otimes_x \rho) \times \mu}[u]\]

That is, \({v}\) is the expected utility that can be guaranteed by changing the policy following \({x}\), assuming that the true environment is in \({\Phi}\). We claim that if the true environment is in \({\Phi}\), then after sufficient time \({\pi^*}\) will achieve at least as much utility as can be guaranteed for \({\Phi}\) (guaranteed under the constraint of following \({\pi^*}\) until the point of making the current observations). That is, \({v}\) can only be greater than \({\operatorname{E}_{\pi^* \times \mu}[u]}\) by a quantity that goes to 0 with probability 1, even after normalizing according to Proposition 4:

Theorem 1

\[\forall \mu \in \Phi: \operatorname{Pr}_{e \sim \mu}[\lim_{n \rightarrow \infty} \max(\frac{v(e_{<n})-\operatorname{E}_{\pi^* \times \mu}[u]}{\operatorname{Pr}_{e' \sim \mu}[e_{<n} \sqsubset e'] \sum_{m > n} \gamma(m)},0)=0] = 1\]

Appendix A

We will repeatedly use the standard fact that, given \({X}\) a compact Polish space, \({{\mathcal{P}}(X)}\) is also a compact Polish space (it is metrizable by the Levy-Prokhorov metric, compact as a consequence of the Banach-Alaoglu theorem and the fact that any probability measure on a Polish space is Radon, and separability is also not hard to prove).

Proposition A.1

Consider \({X, Y}\) compact Polish spaces, \({f: X \times Y \rightarrow {\mathbb{R}}}\) continuous, \({x_\infty \in X}\) and a sequence \({x_n \rightarrow x_\infty}\). Define \({g_n: Y \rightarrow {\mathbb{R}}}\) by \({g_n(y):=f(x_n,y)}\) and \({g_\infty: Y \rightarrow {\mathbb{R}}}\) by \({g_\infty(y):=f(x_\infty,y)}\). Then, \({g_n}\) converges to \({g_\infty}\) uniformly.

Proof of Proposition A.1

Assume to the contrary that there is \({\epsilon > 0}\) and a subsequence \({\{x'_n \in X\}_{n \in {\mathbb{N}}}}\) of \({\{x_n\}}\) s.t.

\[{\max_{y \in Y} {\lvert f(x'_n,y)-f(x_\infty,y) \rvert} > \epsilon}\]

This implies we can choose \({\{y_n \in Y\}_{n \in {\mathbb{N}}}}\) s.t. \({{\lvert f(x'_n,y_n)-f(x_\infty,y_n) \rvert} > \epsilon}\). Let \({\{n_k \in {\mathbb{N}}\}_{k \in {\mathbb{N}}}}\) be an increasing sequence s.t. \({y_{n_k} \rightarrow y_\infty}\) for some \({y_\infty \in Y}\). We get \({{\lvert f(x'_{n_k},y_{n_k})-f(x_\infty,y_{n_k}) \rvert} > \epsilon}\), which is a contradiction because \({f(x'_{n_k},y_{n_k}) \rightarrow f(x_\infty,y_\infty)}\) and \({f(x_\infty,y_{n_k}) \rightarrow f(x_\infty,y_\infty)}\).

Proposition A.2

Consider \({X,Y}\) compact Polish spaces and \({f: X \times Y \rightarrow {\mathbb{R}}}\) continuous. Define \({\phi: X \times {\mathcal{P}}(Y) \rightarrow {\mathbb{R}}}\) by

\[{\phi(x,\nu):=\operatorname{E}_{y \sim \nu}[f(x,y)]}\]

Then, \({\phi}\) is continuous.

Proof of Proposition A.2

Consider any \({x_\infty \in X}\), \({\nu_\infty \in {\mathcal{P}}(Y)}\) and sequences \({x_n \rightarrow x_\infty}\), \({\nu_n \rightarrow \nu_\infty}\). We have

\[\phi(x_n,\nu_n)- \phi(x_\infty,\nu_\infty)=\operatorname{E}_{y \sim \nu_n}[f(x_n,y)-f(x_\infty,y)]+\operatorname{E}_{y \sim \nu_n}[f(x_\infty,y)]-\operatorname{E}_{y \sim \nu_\infty}[f(x_\infty,y)]\]

On the right hand side, the first term goes to 0 by Proposition A.1 and the second term goes to 0 by definition of the weak* topology.

Proposition A.3

Consider any \({X,Y}\) compact Polish and \({f: X \times Y \rightarrow {\mathbb{R}}}\) continuous. Define \({F: {\mathcal{P}}(X) \times {\mathcal{P}}(Y) \rightarrow {\mathbb{R}}}\) by

\[{F(\mu,\nu):=\operatorname{E}_{\mu \times \nu}[f]}\]

Then, \({F}\) is continuous.

Proof of Proposition A.3

Follows by applying Fubini’s theorem and using Proposition A.2 twice.

Proposition A.4

Consider any \({X,Y}\) compact Polish, \({A \subseteq Y}\) and \({f: X \times Y \rightarrow {\mathbb{R}}}\) continuous. Define \({f_A: X \rightarrow {\mathbb{R}}}\) by \({f_A(x):= \inf_{y \in A} f(x,y)}\). Then, \({f_A}\) is continuous.

Proof of Proposition A.4

Consider any \({x_\infty \in X}\) and a sequence \({x_n \rightarrow x_\infty}\). We have

\[{\lvert \inf_{y \in A} f(x_n,y) - \inf_{y \in A} f(x_\infty,y) \rvert} \leq \sup_{y \in A} {\lvert f(x_n,y) - f(x_\infty,y) \rvert}\]

By Proposition A.1, the right hand side converges to 0, therefore the left hand side also converges to 0.


In the following, we will use the notation \({U: {\mathcal{P}}(S) \times {\mathcal{P}}(E) \rightarrow {\mathbb{R}}}\), defined by \({U(\pi,\mu):=\operatorname{E}_{\pi \times \mu}[u]}\). By Proposition A.3, \({U}\) is continuous.

Proof of Proposition 1

Define \({U_\Phi: {\mathcal{P}}(S) \rightarrow {\mathbb{R}}}\) by \({U_\Phi(\pi):= \inf_{\mu \in \Phi} U(\pi,\mu)}\). By Proposition A.4, \({U_{\Phi}}\) is continuous and therefore, \({\pi^*}\) exists.

\({U}\) is continuous and affine in the 2nd argument (i.e. it sends convex linear combinations to convex linear combinations), and therefore \({\inf_{\mu \in \Phi} U(\pi,\mu)=\min_{\mu \in \bar{\Phi}} U(\pi,\mu)}\), implying the second part of the proposition.

Proof of Proposition 2

Define \({V: {\mathcal{P}}(E) \rightarrow {\mathbb{R}}}\) by \({V(\mu):=\max_{\pi \in {\mathcal{P}}(S)} U(\pi,\mu)=\max_{s \in S} U(\delta_s,\mu)}\). Denote \({\Psi:=p\Phi + (1-p)\Phi'}\). By Proposition A.4, \({V}\) is continuous and therefore, there exists

\[\xi^* \in {\underset{\mu \in \Psi}{\operatorname{arg\,min}}\,} V(\mu)\]

Choose \({\mu^* \in \Phi}\), \({\nu^* \in \Phi'}\) s.t. \({\xi^* = p \mu^* + (1-p) \nu^*}\).

\({{\mathcal{P}}(S)}\) and \({\Psi}\) are compact convex sets in the spaces of finite signed Borel measures on \({S}\) and \({E}\) respectively. Therefore, we can apply the minimax theorem to get

\[\max_{\pi \in {\mathcal{P}}(S)} \min_{\xi \in \Psi} U(\pi,\xi) = \min_{\xi \in \Psi} \max_{s \in S} U(\delta_s,\xi)\]

Denote the above number \({u^*}\). We have

\[\min_{\mu \in \Phi} U(\pi^*,p\mu + (1-p)\nu^*) \geq \min_{\xi \in \Psi} U(\pi^*,\xi)=u^*\]

On the other hand

\[\max_{\pi \in {\mathcal{P}}(S)} \min_{\mu \in \Phi} U(\pi,p\mu + (1-p)\nu^*) \leq \max_{\pi \in {\mathcal{P}}(S)} U(\pi,\xi^*)=u^*\]

Combining the inequalities, we get the desired result.

Definition A.1

In the setting of Proposition 3, consider any \({\pi_1 \in {\mathcal{P}}(S_1)}\) and \({\pi_2: T \rightarrow {\mathcal{P}}(S_2)}\). We define \({\pi_1 \otimes_\alpha \pi_2 \in {\mathcal{P}}(S_1 \times S_2)}\) as follows. For any Borel measurable \({B \subseteq S_1 \times S_2}\):

\[(\pi_1 \otimes_\alpha \pi_2)(B):=\sum_{t \in T} (\pi_1 \times \pi_2(t))(B \cap (\alpha^{-1}(t) \times S_2))\]

It is easy to see the above indeed defines a probability measure.

Proposition A.5

Consider any \({\pi_1 \in {\mathcal{P}}(S_1)}\) and \({\pi_2: T \rightarrow {\mathcal{P}}(S_2)}\) and \({f: S_1 \times S_2 \rightarrow {\mathbb{R}}}\) Borel measurable and bounded. Then:

\[\operatorname{E}_{\pi_1 \otimes_\alpha \pi_2}[f] = \operatorname{E}_{s_1 \sim \pi_1}[\operatorname{E}_{s_2 \sim \pi_2(\alpha(s_1))}[f(s_1, s_2)]]\]

Proof of Proposition A.5

\[\operatorname{E}_{\pi_1 \otimes_\alpha \pi_2}[f] = \sum_{t \in T} \int_{\alpha^{-1}(t) \times S_2} f(s_1, s_2) d(\pi_1 \times \pi_2(t))(s_2)\]

\[\operatorname{E}_{\pi_1 \otimes_\alpha \pi_2}[f] = \sum_{t \in T} \int_{\alpha^{-1}(t)} \int_{S_2} f(s_1, s_2) d\pi_2(t)(s_2) d\pi_1(s_1)\]

\[\operatorname{E}_{\pi_1 \otimes_\alpha \pi_2}[f] = \sum_{t \in T} \int_{\alpha^{-1}(t)} \operatorname{E}_{s_2 \sim \pi_2(t)}[f(s_1,s_2)] d\pi_1(s_1)\]

\[\operatorname{E}_{\pi_1 \otimes_\alpha \pi_2}[f] = \sum_{t \in T} \int_{\alpha^{-1}(t)} \operatorname{E}_{s_2 \sim \pi_2(\alpha(s_1))}[f(s_1,s_2)] d\pi_1(s_1)\]

\[\operatorname{E}_{\pi_1 \otimes_\alpha \pi_2}[f] = \int_{S_1} \operatorname{E}_{s_2 \sim \pi_2(\alpha(s_1))}[f(s_1,s_2)] d\pi_1(s_1)\]

\[\operatorname{E}_{\pi_1 \otimes_\alpha \pi_2}[f] = \operatorname{E}_{s_1 \sim \pi_1}[\operatorname{E}_{s_2 \sim \pi_2(\alpha(s_1))}[f(s_1, s_2)]]\]

Proposition A.6

Given \({\pi_1 \in {\mathcal{P}}(S_1)}\), \({\pi_2: T \rightarrow {\mathcal{P}}(S_2)}\) and \({\mu \in {\mathcal{P}}(E)}\)

\[U(\pi_1 \otimes_\alpha \pi_2, \mu) = \operatorname{E}_{\pi_1 \times \mu}[u_1] + \mu(A) \operatorname{E}_{t \sim \alpha_*\pi_1}[\operatorname{E}_{\pi_2(t) \times \mu \mid A}[u_2(t)]]\]

Proof of Proposition A.6

Applying Proposition A.5:

\[U(\pi_1 \otimes_\alpha \pi_2, \mu) = \operatorname{E}_{s_1 \sim \pi_1}[\operatorname{E}_{s_2 \sim \pi_2(\alpha(s_1))}[\operatorname{E}_{e \sim \mu}[u_1(s_1,e)+\chi_A(e) u_2(s_2,\alpha(s_1),e)]]]\]

\[U(\pi_1 \otimes_\alpha \pi_2, \mu) = \operatorname{E}_{\pi_1 \times \mu}[u_1] + \operatorname{E}_{s_1 \sim \pi_1}[\operatorname{E}_{s_2 \sim \pi_2(\alpha(s_1))}[\mu(A) \operatorname{E}_{e \sim \mu \mid A}[u_2(s_2,\alpha(s_1),e)]]]\]

\[U(\pi_1 \otimes_\alpha \pi_2, \mu) = \operatorname{E}_{\pi_1 \times \mu}[u_1] + \mu(A) \operatorname{E}_{t \sim \alpha_*\pi_1}[\operatorname{E}_{\pi_2(t) \times \mu \mid A}[u_2(t)]]\]

Proposition A.7

Consider any \({\pi \in {\mathcal{P}}(S_1 \times S_2)}\). Define \({\pi_1 \in {\mathcal{P}}(S_1)}\) and \({\pi_2: T \rightarrow {\mathcal{P}}(S_2)}\) by

\[\pi_1 := \operatorname{pr}_{1*}\pi\]

\[\pi_2(t) := \operatorname{pr}_{2*} (\pi \mid \alpha^{-1}(t) \times S_2)\]

If for some \({t_0 \in T}\) we have \({\pi(\alpha^{-1}(t_0) \times S_2)=0}\), then \({\pi_2(t_0)}\) can be arbitrary.

Then,

\[\forall \mu \in {\mathcal{P}}(E):U(\pi,\mu)=U(\pi_1 \otimes_\alpha \pi_2, \mu)\]

Proof of Proposition A.7

\[U(\pi, \mu) = {\operatorname{E}_{\substack{(s_1,s_2) \sim \pi \\ e \sim \mu}}}[u_1(s_1,e) + \chi_A(e) u_2(s_2, \alpha(s_1), e)]\]

\[U(\pi, \mu) = \operatorname{E}_{\pi_1 \times \mu}[u_1] + \mu(A) {\operatorname{E}_{\substack{(s_1,s_2) \sim \pi \\ e \sim \mu \mid A}}}[u_2(s_2, \alpha(s_1), e)]\]

\[U(\pi, \mu) = \operatorname{E}_{\pi_1 \times \mu}[u_1] + \mu(A) \sum_{t \in T} \operatorname{Pr}_{(s_1, s_2) \sim \pi}[\alpha(s_1)=t] {\operatorname{E}_{\substack{(s_1,s_2) \sim \pi \mid \alpha^{-1}(t) \times S_2 \\ e \sim \mu \mid A}}}[u_2(s_2, t, e)]\]

\[U(\pi, \mu) = \operatorname{E}_{\pi_1 \times \mu}[u_1] + \mu(A) \operatorname{E}_{t \sim \alpha_*\pi_1}[\operatorname{E}_{\pi_2(t) \times \mu \mid A}[u_2(t)]]\]

Applying Proposition A.6, we get the desired result.

Proof of Proposition 3

Consider any \({\pi_2: T \rightarrow {\mathcal{P}}(S_2)}\). By definition of \({\pi^*}\)

\[{\min_{\mu \in \Phi} U(\pi_1^* \otimes_\alpha \pi_2, \mu) \leq \min_{\mu \in \Phi} U(\pi^*, \mu)}\]

Applying Proposition A.7 to the right hand side

\[{\min_{\mu \in \Phi} U(\pi_1^* \otimes_\alpha \pi_2, \mu) \leq \min_{\mu \in \Phi} U(\pi_1^* \otimes_\alpha \pi_2^*, \mu)}\]

Applying Proposition A.6 to both sides, we get the desired result.


Given any \({x \in {\mathcal{O}}^*}\), we denote \({x{{\mathcal{O}}^\omega}:=\{xy \mid y \in {{\mathcal{O}}^\omega}\}}\) and \({\bar{x}{{\mathcal{O}}^\omega}:={{\mathcal{O}}^\omega}\setminus x{{\mathcal{O}}^\omega}}\).

Proof of Proposition 4

We have

\[\operatorname{E}_{\pi \times \mu}[u] = \mu(x{{\mathcal{O}}^\omega}) \operatorname{E}_{\pi \times \mu \mid x {{\mathcal{O}}^\omega}}[u] + (1 - \mu(x{{\mathcal{O}}^\omega})) \operatorname{E}_{\pi \times \mu \mid \bar{x} {{\mathcal{O}}^\omega}}[u]\]

\[\operatorname{E}_{(\pi \otimes_x \rho) \times \mu}[u] = \mu(x{{\mathcal{O}}^\omega}) \operatorname{E}_{(\pi \otimes_x \rho) \times \mu \mid x {{\mathcal{O}}^\omega}}[u] + (1 - \mu(x{{\mathcal{O}}^\omega})) \operatorname{E}_{\pi \times \mu \mid \bar{x} {{\mathcal{O}}^\omega}}[u]\]

\[\operatorname{E}_{(\pi \otimes_x \rho) \times \mu}[u]-\operatorname{E}_{\pi \times \mu}[u] = \mu(x{{\mathcal{O}}^\omega})(\operatorname{E}_{(\pi \otimes_x \rho) \times \mu \mid x {{\mathcal{O}}^\omega}}[u]-\operatorname{E}_{\pi \times \mu \mid x {{\mathcal{O}}^\omega}}[u])\]

Denote \({u_1(s,e):=\sum_{n \leq {\lvert x \rvert}} \gamma(n) r(e^s_{<n})}\), \({u_2(s,e):=\sum_{n > {\lvert x \rvert}} \gamma(n) r(e^s_{<n})}\). We have

\[\operatorname{E}_{\pi \times \mu \mid x {{\mathcal{O}}^\omega}}[u] = \operatorname{E}_{\pi \times \mu \mid x {{\mathcal{O}}^\omega}}[u_1] + \operatorname{E}_{\pi \times \mu \mid x {{\mathcal{O}}^\omega}}[u_2]\]

\[\operatorname{E}_{(\pi \otimes_x \rho) \times \mu \mid x {{\mathcal{O}}^\omega}}[u] = \operatorname{E}_{\pi \times \mu \mid x {{\mathcal{O}}^\omega}}[u_1] + \operatorname{E}_{(\pi \otimes_x \rho) \times \mu \mid x {{\mathcal{O}}^\omega}}[u_2]\]

\[\operatorname{E}_{(\pi \otimes_x \rho) \times \mu}[u]-\operatorname{E}_{\pi \times \mu}[u] = \mu(x{{\mathcal{O}}^\omega})(\operatorname{E}_{(\pi \otimes_x \rho) \times \mu \mid x {{\mathcal{O}}^\omega}}[u_2]-\operatorname{E}_{\pi \times \mu \mid x {{\mathcal{O}}^\omega}}[u_2])\]

The desired result now follows from \({(\inf r) \sum_{n > {\lvert x \rvert}} \gamma(n) \leq u_2 \leq (\sup r) \sum_{n > {\lvert x \rvert}} \gamma(n)}\).

Definition A.2

A splitting of \({(S,E,u)}\) is \({(S_1,S_2,A,T,\alpha,u_1,u_2)}\) s.t. \({S \cong S_1 \times S_2}\), \({A \subseteq E}\) is Borel, \({T}\) is a finite set, \({\alpha: S_1 \rightarrow T}\) is Borel measurable, \({u_1: S_1 \times E \rightarrow {\mathbb{R}}}\) and \({u_2: S_2 \times T \times E \rightarrow {\mathbb{R}}}\) are continuous and

\[u(s_1,s_2,e) = u_1(s_1,e) + \chi_A(e) u_2(s_2, \alpha(s_1), e)\]

Proposition A.8

Consider \({\Phi \in {\mathcal{P}_{\operatorname{C}}}(E)}\), \({\nu^* \in {\mathcal{P}}(E)}\) and \({p \in [0,1]}\), and denote \({\Psi:=p\Phi+(1-p)\nu^*}\). Suppose that \({(S_1,S_2,A,T,\alpha,u_1,u_2)}\) is a splitting of \({(S,E,u)}\). Assume \({\nu^*(A) > 0}\). Fix \({\pi \in {\mathcal{P}}(S_1)}\). Let \({\rho^*: T \rightarrow {\mathcal{P}}(S_2)}\) satisfy

\[\rho^* \in {\underset{\rho: T \rightarrow {\mathcal{P}}(S_2)}{\operatorname{arg\,max}}\,} \min_{\xi \in \Psi} U(\pi \otimes_\alpha \rho, \xi)\]

Denote \({\Gamma := \sup u_2 - \inf u_2}\). Then, for any \({\mu_0 \in \Phi}\) s.t. \({\mu_0(A) > 0}\)

\[U(\pi \otimes_\alpha \rho^*,\mu_0) \geq \max_{\rho: T \rightarrow {\mathcal{P}}(S_2)} \min_{\mu \in \Phi} U(\pi \otimes_\alpha \rho, \mu)-2\Gamma\mu_0(A){\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)\]

Proof of Proposition A.8

Let \({\tau^*: T \rightarrow {\mathcal{P}}(S_2)}\) satisfy

\[\tau^* \in {\underset{\rho: T \rightarrow {\mathcal{P}}(S_2)}{\operatorname{arg\,max}}\,} \min_{\mu \in \Phi} U(\pi \otimes_\alpha \rho, \mu)\]

Denote \({u^*:= \min_{\mu \in \Phi} U(\pi \otimes_\alpha \tau^*, \mu)}\). Consider any \({\mu \in \Phi}\) and denote \({\xi:=p\mu+(1-p)\nu^*}\). We have

\[U(\pi \otimes_\alpha \tau^*, \xi) = p U(\pi \otimes_\alpha \tau^*, \mu) + (1-p) U(\pi \otimes_\alpha \tau^*, \nu^*)\]

\[U(\pi \otimes_\alpha \tau^*, \xi) \geq pu^* + (1-p) U(\pi \otimes_\alpha \tau^*, \nu^*)\]

Applying Proposition A.6 to the right hand side

\[U(\pi \otimes_\alpha \tau^*, \xi) \geq pu^* + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] + \nu^*(A)\operatorname{E}_{t \sim \alpha_* \pi}[\operatorname{E}_{\tau^*(t) \times \nu^*}[u_2 \mid A]])\]

\[U(\pi \otimes_\alpha \tau^*, \xi) \geq pu^* + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] + \nu^*(A)(\operatorname{E}_{t \sim \alpha_* \pi}[\operatorname{E}_{\tau^*(t) \times \mu_0}[u_2 \mid A]]- \Gamma{\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)))\]

Also, we have

\[U(\pi \otimes_\alpha \tau^*, \mu_0) \geq u^*\]

Applying Proposition A.6 again

\[\operatorname{E}_{\pi \times \mu_0}[u_1] + \mu_0(A)\operatorname{E}_{t \sim \alpha_* \pi}[\operatorname{E}_{\tau^*(t) \times \mu_0}[u_2 \mid A]] \geq u^*\]

\[\operatorname{E}_{t \sim \alpha_* \pi}[\operatorname{E}_{\tau^*(t) \times \mu_0}[u_2 \mid A]] \geq \frac{u^* - \operatorname{E}_{\pi \times \mu_0}[u_1]}{\mu_0(A)}\]

Combining the inequalities

\[U(\pi \otimes_\alpha \tau^*, \xi) \geq pu^* + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] + \nu^*(A)(\frac{u^* - \operatorname{E}_{\pi \times \mu_0}[u_1]}{\mu_0(A)}- \Gamma{\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)))\]

\[U(\pi \otimes_\alpha \tau^*, \xi) \geq pu^* + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] + \frac{\nu^*(A)}{\mu_0(A)}(u^* - \operatorname{E}_{\pi \times \mu_0}[u_1]- \Gamma\mu_0(A){\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)))\]

\[U(\pi \otimes_\alpha \tau^*, \xi) \geq (p + (1-p) \frac{\nu^*(A)}{\mu_0(A)})u^* + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] - \frac{\nu^*(A)}{\mu_0(A)}(\operatorname{E}_{\pi \times \mu_0}[u_1]+ \Gamma\mu_0(A){\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)))\]

The above inequality holds for any \({\xi}\), therefore

\[\min_{\xi \in \Psi} U(\pi \otimes_\alpha \tau^*, \xi) \geq (p + (1-p) \frac{\nu^*(A)}{\mu_0(A)})u^* + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] - \frac{\nu^*(A)}{\mu_0(A)}(\operatorname{E}_{\pi \times \mu_0}[u_1]+ \Gamma\mu_0(A){\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)))\]

By definition of \({\rho^*}\), it follows that

\[\min_{\xi \in \Psi} U(\pi \otimes_\alpha \rho^*, \xi) \geq (p + (1-p) \frac{\nu^*(A)}{\mu_0(A)})u^* + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] - \frac{\nu^*(A)}{\mu_0(A)}(\operatorname{E}_{\pi \times \mu_0}[u_1]+ \Gamma\mu_0(A){\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)))\]

Denote \({\xi_0:= p \mu_0 + (1-p) \nu^*}\) and \({u:=U(\pi \otimes_\alpha \rho^*, \mu_0)}\). We have

\[U(\pi \otimes_\alpha \rho^*, \xi_0) = pu + (1-p) U(\pi \otimes_\alpha \rho^*, \nu^*)\]

Applying Proposition A.6 to the right hand side

\[U(\pi \otimes_\alpha \rho^*, \xi_0) = pu + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] + \nu^*(A)\operatorname{E}_{t \sim \alpha_* \pi}[\operatorname{E}_{\rho^*(t) \times \nu^*}[u_2 \mid A]])\]

\[U(\pi \otimes_\alpha \rho^*, \xi_0) \leq pu + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] + \nu^*(A)(\operatorname{E}_{t \sim \alpha_* \pi}[\operatorname{E}_{\rho^*(t) \times \mu_0}[u_2 \mid A]]+\Gamma {\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)))\]

On the other hand, Proposition A.6 implies that

\[\operatorname{E}_{\pi \times \mu_0}[u_1] + \mu_0(A)\operatorname{E}_{t \sim \alpha_* \pi}[\operatorname{E}_{\rho^*(t) \times \mu_0}[u_2 \mid A]] = u\]

\[\operatorname{E}_{t \sim \alpha_* \pi}[\operatorname{E}_{\rho^*(t) \times \mu_0}[u_2 \mid A]] = \frac{u-\operatorname{E}_{\pi \times \mu_0}[u_1]}{\mu_0(A)}\]

Substituting in the inequality, we get

\[U(\pi \otimes_\alpha \rho^*, \xi_0) \leq pu + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] + \nu^*(A)(\frac{u-\operatorname{E}_{\pi \times \mu_0}[u_1]}{\mu_0(A)}+\Gamma {\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)))\]

\[U(\pi \otimes_\alpha \rho^*, \xi_0) \leq pu + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] + \frac{\nu^*(A)}{\mu_0(A)}(u-\operatorname{E}_{\pi \times \mu_0}[u_1]+\Gamma \mu_0(A) {\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)))\]

\[U(\pi \otimes_\alpha \rho^*, \xi_0) \leq (p + (1-p)\frac{\nu^*(A)}{\mu_0(A)})u + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] - \frac{\nu^*(A)}{\mu_0(A)}(\operatorname{E}_{\pi \times \mu_0}[u_1]-\Gamma \mu_0(A) {\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)))\]

Observing that \({U(\pi \otimes_\alpha \rho^*, \xi_0) \geq \min_{\xi \in \Psi} U(\pi \otimes_\alpha \rho^*, \xi)}\) we can combine the inequalities and get

\[(p + (1-p)\frac{\nu^*(A)}{\mu_0(A)})(u-u^*)\geq 2 (1-p) ( - \frac{\nu^*(A)}{\mu_0(A)}\Gamma\mu_0(A){\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A))\]

If \({p = 1}\), we get \({u \geq u^*}\), as desired. If \({p < 1}\), we divide both sides by \({(1-p)\frac{\nu^*(A)}{\mu_0(A)}}\) and get

\[(\frac{p}{1-p}\frac{\mu_0(A)}{\nu^*(A)}+1)(u - u^*) \geq - 2\Gamma\mu_0(A){\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)\]

\[u \geq u^* - \frac{2\Gamma\mu_0(A){\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)}{\frac{p}{1-p}\frac{\mu_0(A)}{\nu^*(A)}+1}\]

\[u \geq u^* - 2\Gamma\mu_0(A){\operatorname{d}_{\text{tv}}}(\mu_0 \mid A, \nu^* \mid A)\]

Proposition A.9

In the setting of Proposition A.8, assuming \({p > 0}\) but without the assumption \({\nu^*(A) > 0}\):

\[U(\pi \otimes_\alpha \rho^*,\mu_0) \geq \max_{\rho: T \rightarrow {\mathcal{P}}(S_2)} \min_{\mu \in \Phi} U(\pi \otimes_\alpha \rho, \mu)-\frac{1-p}{p}\Gamma\nu^*(A)\]

Proof of Proposition A.9

We use the same notation as in the proof of Proposition A.8. As before, we have

\[\min_{\xi \in \Psi} U(\pi \otimes_\alpha \tau^*, \xi) \geq pu^* + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] + \nu^*(A)\operatorname{E}_{t \sim \alpha_* \pi}[\operatorname{E}_{\tau^*(t) \times \nu^*}[u_2 \mid A]])\]

\[U(\pi \otimes_\alpha \rho^*, \xi_0) = pu + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] + \nu^*(A)\operatorname{E}_{t \sim \alpha_* \pi}[\operatorname{E}_{\rho^*(t) \times \nu^*}[u_2 \mid A]])\]

Combining, we get

\[pu + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] + \nu^*(A)\operatorname{E}_{t \sim \alpha_* \pi}[\operatorname{E}_{\rho^*(t) \times \nu^*}[u_2 \mid A]]) \geq pu^* + (1-p) (\operatorname{E}_{\pi \times \nu^*}[u_1] + \nu^*(A)\operatorname{E}_{t \sim \alpha_* \pi}[\operatorname{E}_{\tau^*(t) \times \nu^*}[u_2 \mid A]])\]

\[u \geq u^* + \frac{1-p}{p} \nu^*(A)\operatorname{E}_{t \sim \alpha_* \pi}[\operatorname{E}_{\tau^*(t) \times \nu^*}[u_2 \mid A]-\operatorname{E}_{\rho^*(t) \times \nu^*}[u_2 \mid A]]\]

\[u \geq u^* - \frac{1-p}{p} \Gamma \nu^*(A)\]

Definition A.3

Denote \({x{\mathcal{O}}^*:=\{xy \mid y \in {\mathcal{O}}^*\}}\), \(\bar{x}{\mathcal{O}}^*:={\mathcal{O}}^* \setminus x{\mathcal{O}}^*\). Given any \({x \in {\mathcal{O}}^*}\), the splitting of \({({{\mathcal{O}}^* \rightarrow {\mathcal{A}}},{{\mathcal{O}}^\omega},u)}\) at \({x}\) is defined as follows.

\[S_1 := \bar{x}{\mathcal{O}}^* \rightarrow {\mathcal{A}}\]

\[S_2 := x{\mathcal{O}}^* \rightarrow {\mathcal{A}}\]

\[A := x{{\mathcal{O}}^\omega}\]

\[T := {\mathcal{A}}^{{\lvert x \rvert}}\]

\[\alpha(s)_n:=s(x_{<n})\]

Given \({s: \bar{x}{\mathcal{O}}^* \rightarrow {\mathcal{A}}}\) and \({e \in x{{\mathcal{O}}^\omega}}\), define \({e^s \in ({\mathcal{A}}\times {\mathcal{O}})^{{\lvert x \rvert}}}\) by \({e_n^s:=(s(e_{<n}),e_n)}\). Given \({e \in \bar{x}{{\mathcal{O}}^\omega}}\), define \({e^s \in ({\mathcal{A}}\times {\mathcal{O}})^\omega}\) by \({e_n^s:=(s(e_{<n}),e_n)}\). Define \({u_1}\) by

\[u_1(s,e):=\sum_{n = 0}^{{\lvert e^s \rvert}} \gamma(n) r(e^s_{<n})\]

Given \({t \in {\mathcal{A}}^{{\lvert x \rvert}}}\), \({s: x{\mathcal{O}}^* \rightarrow {\mathcal{A}}}\) and \({e \in x{{\mathcal{O}}^\omega}}\), define \({e^{ts} \in ({\mathcal{A}}\times {\mathcal{O}})^\omega}\) by

\[e^{ts}_n:=\begin{cases}(t_n,e_n) \text{ if } n < {\lvert x \rvert}\\(s(e_{<n}),e_n) \text{ if } n \geq {\lvert x \rvert}\end{cases}\]

Define \({u_2}\) by

\[u_2(s,t,e):=\begin{cases}\sum_{n = {\lvert x \rvert}+1}^{\infty} \gamma(n) r(e^{ts}_{<n}) \text{ if } x \sqsubset e\\0 \text{ otherwise}\end{cases}\]

It is easy to see that this is indeed a splitting.

Proof of Theorem 1

Fix \({\mu_0 \in \Phi}\). Let \({\nu^* \in \Phi'}\) be as in Proposition 2. Define \({M,N \subseteq {{\mathcal{O}}^\omega}}\) by

\[M:=\{e \in {{\mathcal{O}}^\omega}\mid \lim_{n \rightarrow \infty} {\operatorname{d}_{\text{tv}}}(\mu_0 \mid e_{<n}{{\mathcal{O}}^\omega}, \nu^* \mid e_{<n}{{\mathcal{O}}^\omega}) = 0\}\]

\[N:=\{e \in {{\mathcal{O}}^\omega}\mid \lim_{n \rightarrow \infty} \frac{\nu^*(e_{<n}{{\mathcal{O}}^\omega})}{\mu_0(e_{<n}{{\mathcal{O}}^\omega})} = 0\}\]

By Corollary B, \({\mu_0(M \cup N) = 1}\).

Consider any \({e \in M}\). By definition of \({M}\), for any \({n \in {\mathbb{N}}}\) we have \({\mu_0(e_{<n}{{\mathcal{O}}^\omega}) > 0}\) and \({\nu^*(e_{<n}{{\mathcal{O}}^\omega}) > 0}\). Therefore, we can apply Proposition A.8 to the splitting of \({({{\mathcal{O}}^* \rightarrow {\mathcal{A}}},{{\mathcal{O}}^\omega},u)}\) at \({e_{<n}}\), with \({\pi}\) the \({S_1}\)-marginal of \({\pi^*}\): by Propositions 2, 3 and A.7, \({\pi^*}\) induces a decomposition \({\pi \otimes_\alpha \pi_2^*}\) with the same utilities, in which \({\pi_2^*}\) satisfies the condition imposed on \({\rho^*}\) (with \({\Psi = p\Phi + (1-p)\nu^*}\)). This gives us

\[U(\pi^*,\mu_0) \geq v(e_{<n})-2\Gamma_n \mu_0(e_{<n}{{\mathcal{O}}^\omega}){\operatorname{d}_{\text{tv}}}(\mu_0 \mid e_{<n}{{\mathcal{O}}^\omega}, \nu^* \mid e_{<n}{{\mathcal{O}}^\omega})\]

Here, \({\Gamma_n \leq (\sup r - \inf r) \sum_{m > n} \gamma(m)}\).

\[\frac{v(e_{<n}) - U(\pi^*,\mu_0)}{\Gamma_n \mu_0(e_{<n}{{\mathcal{O}}^\omega})} \leq 2{\operatorname{d}_{\text{tv}}}(\mu_0 \mid e_{<n}{{\mathcal{O}}^\omega}, \nu^* \mid e_{<n}{{\mathcal{O}}^\omega})\]

By definition of \({M}\), this implies

\[\lim_{n \rightarrow \infty} \max(\frac{v(e_{<n}) - U(\pi^*,\mu_0)}{\Gamma_n \mu_0(e_{<n}{{\mathcal{O}}^\omega})},0) = 0 \]

Now, consider any \({e \in N}\). By definition of \({N}\), for any \({n \in {\mathbb{N}}}\) we have \({\mu_0(e_{<n}{{\mathcal{O}}^\omega}) > 0}\). Therefore, we can apply Proposition A.9 to the splitting of \({({{\mathcal{O}}^* \rightarrow {\mathcal{A}}},{{\mathcal{O}}^\omega},u)}\) at \({e_{<n}}\), with \({\pi}\) as above. This gives us

\[U(\pi^*,\mu_0) \geq v(e_{<n})-\frac{1-p}{p} \Gamma_n \nu^*(e_{<n}{{\mathcal{O}}^\omega})\]

\[\frac{v(e_{<n}) - U(\pi^*,\mu_0)}{\Gamma_n\mu_0(e_{<n}{{\mathcal{O}}^\omega})} \leq \frac{1-p}{p} \frac{\nu^*(e_{<n}{{\mathcal{O}}^\omega})}{\mu_0(e_{<n}{{\mathcal{O}}^\omega})}\]

By definition of \({N}\), this also implies

\[\lim_{n \rightarrow \infty} \max(\frac{v(e_{<n}) - U(\pi^*,\mu_0)}{\Gamma_n \mu_0(e_{<n}{{\mathcal{O}}^\omega})},0) = 0 \]

Since \({\Gamma_n \leq (\sup r - \inf r) \sum_{m > n} \gamma(m)}\), the same limits hold with the normalization used in the statement of Theorem 1, and since \({\mu_0(M \cup N) = 1}\) and \({\mu_0 \in \Phi}\) was arbitrary, the theorem follows.

Appendix B

The following theorem is adapted from Blackwell and Dubins (1962).

Theorem B.1

Consider any \({\mu, \nu \in {\mathcal{P}}({{\mathcal{O}}^\omega})}\). Assume that \({\mu}\) is absolutely continuous with respect to \({\nu}\). Then:

\[\operatorname{Pr}_{e \sim \mu}[\lim_{n \rightarrow \infty} {\operatorname{d}_{\text{tv}}}(\mu \mid e_{<n}{{\mathcal{O}}^\omega}, \nu \mid e_{<n}{{\mathcal{O}}^\omega}) = 0] = 1\]
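
As an illustrative sketch of this merging phenomenon (all parameters made up): take \({\mu}\) to be i.i.d. Bernoulli(0.7), \({\lambda}\) i.i.d. Bernoulli(0.3) and \({\nu = \frac{1}{2}\mu + \frac{1}{2}\lambda}\), so that \({\mu}\) is absolutely continuous w.r.t. \({\nu}\). Then \({\nu \mid e_{<n}{{\mathcal{O}}^\omega}}\) is the mixture \({w_n (\mu \mid e_{<n}{{\mathcal{O}}^\omega}) + (1-w_n)(\lambda \mid e_{<n}{{\mathcal{O}}^\omega})}\), where \({w_n}\) is the posterior weight of \({\mu}\), so \({{\operatorname{d}_{\text{tv}}}(\mu \mid e_{<n}{{\mathcal{O}}^\omega}, \nu \mid e_{<n}{{\mathcal{O}}^\omega}) \leq 1 - w_n}\); under \({e \sim \mu}\) the weight \({w_n}\) goes to 1 almost surely:

```python
import random

random.seed(0)
p_mu, p_lam, w = 0.7, 0.3, 0.5            # w = posterior weight of mu under nu
for n in range(1, 201):
    o = 1 if random.random() < p_mu else 0          # sample e_n ~ mu
    like_mu = p_mu if o else 1 - p_mu
    like_lam = p_lam if o else 1 - p_lam
    w = w * like_mu / (w * like_mu + (1 - w) * like_lam)
    if n % 50 == 0:
        print(n, "bound on d_tv of conditionals:", 1 - w)
```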

Theorem B.2

Consider any \({\mu, \nu \in {\mathcal{P}}({{\mathcal{O}}^\omega})}\). Define \({\{D_n: {{\mathcal{O}}^\omega}\rightarrow {\mathbb{R}}\sqcup \{+\infty\}\}_{n \in {\mathbb{N}}}}\) by

\[D_n(x):=\begin{cases}\frac{\mu(x_{<n}{{\mathcal{O}}^\omega})}{\nu(x_{<n}{{\mathcal{O}}^\omega})} \text{ if } \nu(x_{<n}{{\mathcal{O}}^\omega}) > 0\\+\infty \text{ otherwise}\end{cases}\]

Define \({D:= \limsup_{n \rightarrow \infty} D_n}\). Then, for any Borel \({A \subseteq {{\mathcal{O}}^\omega}}\):

\[\mu(A) = \int_A D(x) d\nu(x) + \mu(A \cap D^{-1}(+\infty))\]

Proof of Theorem B.2

This is almost a special case of Theorem 5.3.3 in Durrett (2010), except that we don’t assume \({\mu}\) is locally absolutely continuous w.r.t. \({\nu}\) (i.e. we don’t assume that \({\mu(x{{\mathcal{O}}^\omega})=0}\) whenever \({\nu(x{{\mathcal{O}}^\omega})=0}\)). Careful reading of the proof shows that in our case it doesn’t matter: \({D_n}\) (\({X_n}\) in Durrett’s notation) is no longer a \({\nu}\)-martingale but is still a \({\nu}\)-supermartingale, and the identity \({X_n = \frac{Y_n}{Z_n}}\) still holds if the fraction is understood to stand for \({+\infty}\) whenever \({Z_n = 0}\).
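
For intuition, here is a sketch of the process \({D_n}\) along a path sampled from \({\nu}\) in a simple made-up case: \({\mu}\) i.i.d. Bernoulli(0.7) and \({\nu}\) i.i.d. Bernoulli(0.5). Since both measures have full support on cylinders, \({D_n}\) is here a nonnegative \({\nu}\)-martingale; it converges \({\nu}\)-almost surely (to 0 in this case, since \({\mu}\) and \({\nu}\) are mutually singular on \({{{\mathcal{O}}^\omega}}\)):

```python
import random

random.seed(1)
p_mu, p_nu, D = 0.7, 0.5, 1.0
for n in range(1, 201):
    o = 1 if random.random() < p_nu else 0                       # path from nu
    D *= (p_mu if o else 1 - p_mu) / (p_nu if o else 1 - p_nu)   # update D_n
    if n % 50 == 0:
        print(n, "D_n =", D)
```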

Corollary B

Consider any \({\mu, \nu \in {\mathcal{P}}({{\mathcal{O}}^\omega})}\). Define

\[A:=\{e \in {{\mathcal{O}}^\omega}\mid \lim_{n \rightarrow \infty} {\operatorname{d}_{\text{tv}}}(\mu \mid e_{<n}{{\mathcal{O}}^\omega}, \nu \mid e_{<n}{{\mathcal{O}}^\omega}) = 0\}\]

\[B:=\{e \in {{\mathcal{O}}^\omega}\mid \lim_{n \rightarrow \infty} \frac{\nu(e_{<n}{{\mathcal{O}}^\omega})}{\mu(e_{<n}{{\mathcal{O}}^\omega})} = 0\}\]

Then, \({\mu(A \cup B) = 1}\).

Proof of Corollary B

Assume that \({\operatorname{E}_\nu[D] > 0}\). Then, we can define \({\mu_r \in {\mathcal{P}}({{\mathcal{O}}^\omega})}\) by

\[\mu_r(C):=\frac{\int_{C} D(x) d\nu(x)}{\int_{{{\mathcal{O}}^\omega}} D(x) d\nu(x)}\]

Obviously, \({\mu_r}\) is absolutely continuous w.r.t. \({\nu}\), therefore we can use Theorem B.1 to conclude

\[\operatorname{Pr}_{e \sim \mu_r}[\lim_{n \rightarrow \infty} {\operatorname{d}_{\text{tv}}}(\mu_r \mid e_{<n}{{\mathcal{O}}^\omega}, \nu \mid e_{<n}{{\mathcal{O}}^\omega}) = 0] = 1\]

By Theorem B.2, \({\mu_r}\) is dominated by \({\mu}\) and in particular is absolutely continuous w.r.t. \({\mu}\). Therefore, again by Theorem B.1:

\[\operatorname{Pr}_{e \sim \mu_r}[\lim_{n \rightarrow \infty} {\operatorname{d}_{\text{tv}}}(\mu_r \mid e_{<n}{{\mathcal{O}}^\omega}, \mu \mid e_{<n}{{\mathcal{O}}^\omega}) = 0] = 1\]

Using the triangle inequality for \({{\operatorname{d}_{\text{tv}}}}\), we get

\[\operatorname{Pr}_{e \sim \mu_r}[\lim_{n \rightarrow \infty} {\operatorname{d}_{\text{tv}}}(\mu \mid e_{<n}{{\mathcal{O}}^\omega}, \nu \mid e_{<n}{{\mathcal{O}}^\omega}) = 0] = 1\]

By definition of \({\mu_r}\) and \({A}\), this means that

\[\frac{\int_{A} D(x) d\nu(x)}{\int_{{{\mathcal{O}}^\omega}} D(x) d\nu(x)} = 1\]

\[\int_{A} D(x) d\nu(x) = \int_{{{\mathcal{O}}^\omega}} D(x) d\nu(x)\]

This holds even without the assumption \({\operatorname{E}_\nu[D] > 0}\), since if \({\operatorname{E}_\nu[D] = 0}\) then both sides vanish.

\({\frac{\nu(e_{<n}{{\mathcal{O}}^\omega})}{\mu(e_{<n}{{\mathcal{O}}^\omega})}}\) is a \({\mu}\)-supermartingale, therefore (by Doob’s convergence theorem) its limit exists \({\mu}\)-almost surely. If the limit exists for some \({e \in {{\mathcal{O}}^\omega}}\), then it vanishes iff \({D(e)=+\infty}\). It follows that

\[\mu(B \cap D^{-1}(+\infty)) = \mu(D^{-1}(+\infty))\]

By Theorem B.2

\[\mu(A \cup B) = \int_{A \cup B} D(x) d\nu(x) + \mu((A \cup B) \cap D^{-1}(+\infty))\]

\[\mu(A \cup B) \geq \int_{A} D(x) d\nu(x) + \mu(B \cap D^{-1}(+\infty))\]

\[\mu(A \cup B) \geq \int_{{{\mathcal{O}}^\omega}} D(x) d\nu(x) + \mu(D^{-1}(+\infty))\]

Applying Theorem B.2 to the right hand side:

\[\mu(A \cup B) \geq \mu({{\mathcal{O}}^\omega}) = 1\]





