 A measure-theoretic generalization of logical induction
discussion post by Vanessa Kosoy 919 days ago | Jessica Taylor and Scott Garrabrant like this

Logical induction is defined in terms of logical sentences and theories, but its principles are applicable in much greater generality and abstraction. Indeed, one such generalization was studied under the name “universal induction.” We proposed a slightly different generalization in order to model reasoning with incomplete models. Here, we describe a formalism that includes all these cases and many more, using the language of measure theory. This provides the following advantages:

• The formalism is applicable to event spaces substantially different from truth assignments or bit sequences, e.g. we can consider sequences of real numbers.

• The formalism treats probabilities and expectations on the same footing, rather than constructing expectations as in section 4.8 of the original paper. We consider this more convenient.

• In our opinion, this language is more mathematically natural than the original formalism, at least for applications unrelated to formal logic.

On the other hand, we ignore all computational considerations. Obviously these are often important, but in the study of purely information-theoretic questions the use of numerical approximations only serves to obscure.

All proofs are in the Appendix.

## Results

Fix $${X}$$ a compact Polish space. For example, $${X}$$ might be $${\mathcal{O}^\omega}$$ or the space of propositionally consistent truth assignments in some language or $${[0,1]^\omega}$$. The role of “pricings” is served by $${{\mathcal{P}}(X)}$$: the space of probability measures on $${X}$$. A market is thus a sequence $${\{\mu_n \in {\mathcal{P}}(X)\}_{n \in {\mathbb{N}}}}$$. The “deductive process” is replaced by a sequence of closed sets $${X = X_0 \supseteq X_1 \supseteq X_2 \supseteq \ldots}$$

A trading strategy is a continuous function $${\tau: {\mathcal{P}}(X) \rightarrow C(X)}$$, where $${{\mathcal{P}}(X)}$$ is equipped with the weak* topology (as before) and $${C(X)}$$ is the space of continuous functions from $${X}$$ to $${{\mathbb{R}}}$$ equipped with the uniform convergence topology. We denote the set of trading strategies by $${{\mathcal{T}(X)}}$$. Here, we should think of $${\tau(\mu)}$$ as the share portfolio acquired by the strategy given pricing $${\mu}$$, where the cost of the acquisition is understood to be $${\operatorname{E}_\mu[\tau(\mu)]}$$ while the ultimate value of the portfolio is $${\tau(\mu)(x)}$$ for some $${x \in \bigcap_n X_n}$$. In the following we will use the less clumsy notation $${\tau(\mu,x)}$$; in fact, we can equivalently define $${\mathcal{T}(X)}$$ as the set of continuous functions from $${\mathcal{P}}(X) \times X$$ to $${\mathbb{R}}$$.

A trader is a sequence $${\{T_n: {\mathcal{P}}(X)^n \rightarrow {\mathcal{T}(X)}\}_{n \in {\mathbb{N}}}}$$, where the functions can be arbitrary (they don’t have to be continuous in any sense). The argument of these functions refers to the market pricings on previous days.

Analogously to Lemma 5.1.1 in “Logical Induction” (existence of “market maker”), we have:

# Proposition 1

For any $${\tau \in {\mathcal{T}(X)}}$$, there is $${\mu \in {\mathcal{P}}(X)}$$ s.t.

$\operatorname{E}_{x \sim \mu}[\tau(\mu,x)] = \max_{x \in X} \tau(\mu,x)$
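To make the statement concrete, here is a minimal numerical sketch under strong simplifying assumptions that are not part of the post: $${X}$$ is the two-point space $${\{0,1\}}$$, so a pricing is a single number $${p = \mu(\{1\})}$$, and the strategy `tau` is a hypothetical example that holds $${0.7 - p}$$ shares of the event $${x = 1}$$. The proposition is non-constructive; the grid search below only illustrates what the pricing it promises looks like in this toy case.

```python
# Hypothetical two-point example: X = {0, 1}, so a pricing mu is just p = mu({1}).

def tau(p, x):
    """Illustrative strategy: hold (0.7 - p) shares of the event x == 1."""
    return (0.7 - p) * x

def gap(p):
    """max_x tau(p, x) - E_{x ~ mu}[tau(p, x)]; it is zero exactly at a Proposition 1 pricing."""
    expected = (1 - p) * tau(p, 0) + p * tau(p, 1)
    return max(tau(p, 0), tau(p, 1)) - expected

# Brute-force search over a grid of pricings (an assumption of this sketch,
# not a method taken from the post).
grid = [i / 1000 for i in range(1001)]
p_star = min(grid, key=gap)
print(p_star, gap(p_star))
```

In this toy case the gap vanishes only at $${p = 0.7}$$: at that price the strategy holds nothing, so no outcome is worth more than what was paid.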

Analogous to Definition 5.2.1 (“budgeter”), we have:

# Proposition 2

Given $${\tau \in {\mathcal{T}(X)}}$$, define $${{\operatorname{W}}\tau: {\mathcal{P}}(X) \rightarrow C(X)}$$ by

${\operatorname{W}}\tau(\mu,x):=\tau(\mu,x)-\operatorname{E}_{y \sim \mu}[\tau(\mu,y)]$

Fix a trader $${T}$$. Define $${{\operatorname{\Sigma W}}T_n: {\mathcal{P}}(X)^n \rightarrow C(X)}$$ by

${\operatorname{\Sigma W}}T_n(\{\mu_m\}_{m < n}, x):=\sum_{m<n} {\operatorname{W}}T_m(\{\mu_l\}_{l \leq m},x)$

Define $${{\operatorname{\Sigma W}_{\min}}T_n: {\mathcal{P}}(X)^n \rightarrow {\mathbb{R}}}$$ by

${\operatorname{\Sigma W}_{\min}}T_n(\{\mu_m\}_{m < n}):=\min_{x \in X_{n-1}} {\operatorname{\Sigma W}}T_n(\{\mu_m\}_{m < n}, x)$

(For $${n=0}$$, the above definition is understood to mean $$0$$)

Fix $${b > 0}$$. Assume $${n \in {\mathbb{N}}}$$ and $$\{\mu_m \in {\mathcal{P}}(X)\}_{m < n}$$ are s.t.

${\operatorname{\Sigma W}_{\min}}T_n(\{\mu_m\}_{m < n}) > -b$

Then, we can define $${{\operatorname{N}_b}T_n(\{\mu_m\}_{m < n}): {\mathcal{P}}(X) \rightarrow {\mathbb{R}}}$$ by

${\operatorname{N}_b}T_n(\{\mu_m\}_{m \leq n}) :=\max(1,\max_{x \in X_n} \frac{-{\operatorname{W}}T_n(\{\mu_m\}_{m \leq n}, x)}{b + {\operatorname{\Sigma W}}T_n(\{\mu_m\}_{m < n},x)})^{-1}$

Finally, define $${\{{\operatorname{B}_b}T_n: {\mathcal{P}}(X)^n \times {\mathcal{P}}(X) \rightarrow C(X)\}_{n \in {\mathbb{N}}}}$$ by

${\operatorname{B}_b}T_n(\{\mu_m\}_{m \leq n})=\begin{cases}0 \text{ if } \exists m \leq n: {\operatorname{\Sigma W}_{\min}}T_m(\{\mu_l\}_{l < m}) \leq -b, \\ {\operatorname{N}_b}T_n(\{\mu_m\}_{m \leq n}) \cdot T_n(\{\mu_m\}_{m \leq n}) \text{ otherwise} \end{cases}$

Then:

• $${{\operatorname{B}_b}T}$$ is also a trader (i.e. it is continuous in the last argument).

• If $${\{\mu_m\}_{m<n}}$$ is s.t. $$\forall m \leq n: {\operatorname{\Sigma W}_{\min}}T_m(\{\mu_l\}_{l<m}) > -b$$, then $${\forall m < n: {\operatorname{B}_b}T_m(\{\mu_l\}_{l<m}) = T_m(\{\mu_l\}_{l<m})}$$.

• $${\operatorname{\Sigma W}_{\min}}{\operatorname{B}_b}T_n \geq -b$$
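The budgeting rule can be sketched numerically under simplifying assumptions that are not in the post: $${X}$$ is finite, $${X_n = X}$$ for all $${n}$$, and the market is fixed in advance, so each day's portfolio is a plain vector indexed by $${x}$$. The function names below are hypothetical. The sketch tracks $${{\operatorname{\Sigma W}}T_n}$$ and $${{\operatorname{\Sigma W}}{\operatorname{B}_b}T_n}$$ side by side and relies on $${{\operatorname{W}}}$$ being linear in the portfolio, so scaling a portfolio by $${{\operatorname{N}_b}}$$ scales its net value by the same factor.

```python
def net_value(portfolio, mu):
    """W tau(mu, x) = tau(mu, x) - E_{y ~ mu}[tau(mu, y)], as a vector over x."""
    cost = sum(p * v for p, v in zip(mu, portfolio))
    return [v - cost for v in portfolio]

def budgeted_worths(portfolios, market, b):
    """Cumulative net worth of B_b T after each day (finite X, X_n = X for all n)."""
    size = len(portfolios[0])
    sigma_t = [0.0] * size   # Sigma W T_n(x): worth of the raw trader so far
    sigma_b = [0.0] * size   # Sigma W B_b T_n(x): worth of the budgeted trader
    busted = False
    history = []
    for tau, mu in zip(portfolios, market):
        # First case of B_b: the raw trader has already lost b in some world.
        if min(sigma_t) <= -b:
            busted = True
        w = net_value(tau, mu)
        if not busted:
            # N_b: shrink today's trade so the worth stays at or above -b everywhere.
            ratio = max(-wx / (b + sx) for wx, sx in zip(w, sigma_t))
            scale = 1.0 / max(1.0, ratio)
            sigma_b = [sb + scale * wx for sb, wx in zip(sigma_b, w)]
        sigma_t = [sx + wx for sx, wx in zip(sigma_t, w)]
        history.append(list(sigma_b))
    return history

# Toy run: two worlds, a constant 50/50 market, and a trader who keeps buying world 0.
days = 4
history = budgeted_worths([[2.0, 0.0]] * days, [[0.5, 0.5]] * days, b=1.5)
print(history)
```

In this run the trade on the first day passes through unchanged, the trade on the second day is scaled by one half, and afterwards the budgeted trader stops trading, so its worth in the losing world never drops below $${-b}$$.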

The analogue of Definition 5.3.2 (“trading firm”) is as follows:

# Proposition 3

Consider a family of traders $$\{T^k\}_{k \in {\mathbb{N}}}$$ and $${\zeta: {\mathbb{N}}\times {\mathbb{N}}\rightarrow [0,1]}$$ s.t.

$\sum_{k=0}^\infty \sum_{b=1}^\infty \zeta(k,b) b = b_\zeta < \infty$

Define $${T^\zeta_n}$$ as follows:

$T^\zeta_n = \sum_{k=0}^n \sum_{b=1}^\infty \zeta(k,b) {\operatorname{B}_b}T^k_n$

Then, the above sum is convergent and defines a trader. Moreover:

${\operatorname{\Sigma W}_{\min}}T^\zeta_n \geq -b_\zeta$
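One family of weights satisfying the summability condition is $${\zeta(k,b) = 2^{-(k+1)} 2^{-b} / b}$$. This is an illustrative choice and does not come from the post. It takes values in $${(0,1]}$$, is strictly positive, and gives $${b_\zeta = 1}$$, since both geometric series sum to $${1}$$. The sketch below checks the partial sums numerically.

```python
def zeta(k, b):
    """Illustrative weights: zeta(k, b) = 2^-(k+1) * 2^-b / b, for k >= 0 and b >= 1."""
    return 2.0 ** -(k + 1) * 2.0 ** -b / b

def partial_budget(k_max, b_max):
    """Partial sum of sum_k sum_b zeta(k, b) * b, truncated at k <= k_max and b <= b_max."""
    return sum(zeta(k, b) * b
               for k in range(k_max + 1)
               for b in range(1, b_max + 1))

# The partial sums increase towards b_zeta = 1 as the truncation grows.
print(partial_budget(5, 5), partial_budget(40, 40))
```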

The analogue of Definition 5.4.1 (logical inductor construction):

# Proposition 4

Consider the setting of Proposition 3. Then, there are $${\{\mu^*_n \in {\mathcal{P}}(X_n)\}_{n \in {\mathbb{N}}}}$$ s.t.

$\operatorname{E}_{x \sim \mu^*_n}[T^\zeta_n(\{i_{m*}\mu^*_m\}_{m \leq n},x)] = \max_{x \in X_n} T^\zeta_n(\{i_{m*}\mu^*_m\}_{m \leq n},x)$

Analogously to Definition 3.5.1 (“exploitation”) we have:

# Definition

A market $${\mu}$$ is said to dominate a trader $${T}$$ relative to $${\{X_n\}}$$ when

• Denoting $${i_n: X_n \rightarrow X}$$ the inclusion mapping, $${\mu_n}$$ is in the image of $${i_{n*}}$$ for every $${n}$$.

• The following set of real numbers is either unbounded from below or bounded from above:

${\mathcal{W}}(T,\mu):=\{{\operatorname{\Sigma W}}T_{n+1}(\{\mu_m\}_{m \leq n},x) \mid n \in {\mathbb{N}},\, x \in X_n\}$

Finally, the analogue of Theorem 5.4.2:

# Theorem

Consider the setting of Proposition 4, and assume that $${\forall k,b: \zeta(k,b) > 0}$$. Then, $${\{i_{n*}\mu^*_n\}_{n \in {\mathbb{N}}}}$$ dominates $${T^k}$$ for every $${k}$$.

## Appendix

# Proposition A.1

Given $${\tau \in {\mathcal{T}(X)}}$$, define $${E_\tau: {\mathcal{P}}(X) \times {\mathcal{P}}(X) \rightarrow {\mathbb{R}}}$$ by

$E_\tau(\nu,\mu):=\operatorname{E}_\nu[\tau(\mu)]$

Then, $${E_\tau}$$ is continuous.

# Proof of Proposition A.1

Consider $${\mu_i \rightarrow \mu}$$ and $${\nu_i \rightarrow \nu}$$. We have

$\operatorname{E}_{\nu_i}[\tau(\mu_i)] = \operatorname{E}_{\nu_i}[\tau(\mu)] + \operatorname{E}_{\nu_i}[\tau(\mu_i) - \tau(\mu)]$

${\lvert \operatorname{E}_{\nu_i}[\tau(\mu_i)] - \operatorname{E}_{\nu_i}[\tau(\mu)] \rvert} \leq \max{\lvert \tau(\mu_i) - \tau(\mu) \rvert}$

By continuity of $${\tau}$$, $${\tau(\mu_i) \rightarrow \tau(\mu)}$$ and therefore, $$\max{\lvert \tau(\mu_i) - \tau(\mu) \rvert} \rightarrow 0$$. We get

$\lim_{i \rightarrow \infty} {\lvert \operatorname{E}_{\nu_i}[\tau(\mu_i)] - \operatorname{E}_{\nu_i}[\tau(\mu)] \rvert} = 0$

Since $${\nu_i \rightarrow \nu}$$ and $${\tau(\mu)}$$ is continuous, we have $${\operatorname{E}_{\nu_i}[\tau(\mu)] \rightarrow \operatorname{E}_{\nu}[\tau(\mu)]}$$ and therefore

$\lim_{i \rightarrow \infty} \operatorname{E}_{\nu_i}[\tau(\mu_i)] = \operatorname{E}_{\nu}[\tau(\mu)]$

# Proof of Proposition 1

Define $${K \subseteq {\mathcal{P}}(X) \times {\mathcal{P}}(X)}$$ as follows:

$K:=\{(\mu,\nu) \mid \operatorname{E}_\nu[\tau(\mu)] = \max \tau(\mu)\}$

For any $${\mu}$$, denote $${K(\mu):=\{\nu \in {\mathcal{P}}(X) \mid (\mu,\nu) \in K\}}$$. $${K(\mu)}$$ is convex due to linearity of expected value. $${K(\mu)}$$ is non-empty because $${\tau(\mu)}$$ is continuous on the compact space $${X}$$ and therefore attains its maximum: given $${x^* \in {\underset{x \in X}{\operatorname{arg\,max}}\,} \tau(\mu,x)}$$, $${\delta_{x^*} \in K(\mu)}$$.

Consider $${\mu_i \rightarrow \mu}$$ and $${\nu_i \rightarrow \nu}$$ s.t. $${(\mu_i,\nu_i) \in K}$$. We have

$\operatorname{E}_{\nu_i}[\tau(\mu_i)] = \max \tau(\mu_i)$

By Proposition A.1, the left hand side converges to $${\operatorname{E}_{\nu}[\tau(\mu)]}$$. Since $${\tau(\mu_i) \rightarrow \tau(\mu)}$$, the right hand side converges to $${\max \tau(\mu)}$$. We get:

$\operatorname{E}_{\nu}[\tau(\mu)] = \max \tau(\mu)$

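Therefore, $${(\mu,\nu) \in K}$$ and $${K}$$ is a closed set. Since $${{\mathcal{P}}(X)}$$ is compact and convex, the Kakutani-Glicksberg-Fan theorem applies and yields $${\mu}$$ s.t. $${(\mu,\mu) \in K}$$, which is the desired result.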

# Proof of Proposition 2

$${{\operatorname{W}}\tau}$$ is continuous by Proposition A.1. Therefore, $${{\operatorname{N}_b}T}$$ is continuous in the last argument and $${{\operatorname{B}_b}T}$$ is also continuous in the last argument.

Assume $${\{\mu_m \in {\mathcal{P}}(X)\}_{m<n}}$$ is s.t.

$\forall m \leq n: {\operatorname{\Sigma W}_{\min}}T_m(\{\mu_l\}_{l<m}) > -b$

Then, for any $${m < n}$$ we are in the second case in the definition of $${{\operatorname{B}_b}T_m(\{\mu_l\}_{l<m})}$$. Moreover, we have

${\operatorname{\Sigma W}_{\min}}T_{m+1}(\{\mu_l\}_{l \leq m}) >\ -b$

$\forall x \in X_{m}: {\operatorname{\Sigma W}}T_{m+1}(\{\mu_l\}_{l \leq m}, x) >\ -b$

$\forall x \in X_{m}: {\operatorname{\Sigma W}}T_{m}(\{\mu_l\}_{l < m}, x) + {\operatorname{W}}T_{m}(\{\mu_l\}_{l \leq m}, x) >\ -b$

$\forall x \in X_{m}: b + {\operatorname{\Sigma W}}T_{m}(\{\mu_l\}_{l < m}, x) > -{\operatorname{W}}T_{m}(\{\mu_l\}_{l \leq m}, x)$

Using the assumption again, the left hand side is positive. It follows that

$\forall x \in X_{m}: 1 > \frac{-{\operatorname{W}}T_{m}(\{\mu_l\}_{l \leq m}, x)}{b + {\operatorname{\Sigma W}}T_{m}(\{\mu_l\}_{l < m}, x)}$

${\operatorname{N}_b}T_m(\{\mu_l\}_{l \leq m}) = 1$

${\operatorname{B}_b}T_m(\{\mu_l\}_{l \leq m}) = T_m(\{\mu_l\}_{l \leq m})$

Now, fix any $${\{\mu_m \in {\mathcal{P}}(X)\}_{m<n}}$$. Let $${m_0 \in {\mathbb{N}}}$$ be the largest number s.t. $${m_0 \leq n}$$ and

$\forall m \leq m_0: {\operatorname{\Sigma W}_{\min}}T_m(\{\mu_l\}_{l<m}) > -b$

For any $${m \leq m_0}$$, we have

${\operatorname{\Sigma W}_{\min}}{\operatorname{B}_b}T_m(\{\mu_l \}_{l<m})={\operatorname{\Sigma W}_{\min}}T_m(\{\mu_l \}_{l<m}) > -b$

(Note that the sum in the definition of $${{\operatorname{\Sigma W}_{\min}}{\operatorname{B}_b}T_m}$$ only involves $${{\operatorname{B}_b}T_l}$$ for $${l < m \leq m_0}$$)

For $${m=m_0+1}$$, we have

${\operatorname{\Sigma W}}{\operatorname{B}_b}T_{m_0+1}(\{\mu_l \}_{l \leq m_0},x) = {\operatorname{\Sigma W}}{\operatorname{B}_b}T_{m_0}(\{\mu_l \}_{l < m_0},x) + {\operatorname{W}}{\operatorname{B}_b}T_{m_0}(\{\mu_l \}_{l \leq m_0},x)$

The first term only involves $${{\operatorname{B}_b}T_l}$$ for $${l < m_0}$$, and we are still in the second case in the definition of the second term, therefore

${\operatorname{\Sigma W}}{\operatorname{B}_b}T_{m_0+1}(\{\mu_l \}_{l \leq m_0},x) = {\operatorname{\Sigma W}}T_{m_0}(\{\mu_l \}_{l < m_0},x) + {\operatorname{N}_b}T_{m_0}(\{\mu_l \}_{l \leq m_0}) {\operatorname{W}}T_{m_0}(\{\mu_l \}_{l \leq m_0},x)$

If $${x}$$ is s.t. $${{\operatorname{W}}T_{m_0}(\{\mu_l \}_{l \leq m_0},x) \geq 0}$$, then

${\operatorname{\Sigma W}}{\operatorname{B}_b}T_{m_0+1}(\{\mu_l \}_{l \leq m_0},x) \geq {\operatorname{\Sigma W}}T_{m_0}(\{\mu_l \}_{l < m_0},x)$

${\operatorname{\Sigma W}}{\operatorname{B}_b}T_{m_0+1}(\{\mu_l \}_{l \leq m_0},x) \geq {\operatorname{\Sigma W}_{\min}}T_{m_0}(\{\mu_l \}_{l < m_0}) > -b$

If $${x}$$ is s.t. $${{\operatorname{W}}T_{m_0}(\{\mu_l \}_{l \leq m_0},x) < 0}$$, then

${\operatorname{\Sigma W}}{\operatorname{B}_b}T_{m_0+1}(\{\mu_l \}_{l \leq m_0},x) \geq {\operatorname{\Sigma W}}T_{m_0}(\{\mu_l \}_{l < m_0},x) + \frac{b + {\operatorname{\Sigma W}}T_{m_0}(\{\mu_l\}_{l < {m_0}},x)}{-{\operatorname{W}}T_{m_0}(\{\mu_l\}_{l \leq {m_0}}, x)} {\operatorname{W}}T_{m_0}(\{\mu_l \}_{l \leq m_0},x)$

${\operatorname{\Sigma W}}{\operatorname{B}_b}T_{m_0+1}(\{\mu_l \}_{l \leq m_0},x) \geq -b$

We got that $${\operatorname{\Sigma W}}{\operatorname{B}_b}T_{m_0+1}(\{\mu_l \}_{l \leq m_0},x) \geq -b$$ for all $${x \in X_{m_0}}$$, and therefore

${\operatorname{\Sigma W}_{\min}}{\operatorname{B}_b}T_{m_0+1}(\{\mu_l \}_{l \leq m_0}) = \min_{x \in X_{m_0}} {\operatorname{\Sigma W}}{\operatorname{B}_b}T_{m_0+1}(\{\mu_l \}_{l \leq m_0},x) \geq -b$

Finally, consider $${m > m_0 + 1}$$.

${\operatorname{\Sigma W}}{\operatorname{B}_b}T_{m}(\{\mu_l \}_{l < m},x) = {\operatorname{\Sigma W}}{\operatorname{B}_b}T_{m - 1}(\{\mu_l \}_{l < m - 1},x) + {\operatorname{W}}{\operatorname{B}_b}T_{m-1}(\{\mu_l \}_{l < m},x)$

Now we are in the first case in the definition of $${{\operatorname{B}_b}T_{m-1}}$$, therefore the second term vanishes.

${\operatorname{\Sigma W}}{\operatorname{B}_b}T_{m}(\{\mu_l \}_{l < m},x) = {\operatorname{\Sigma W}}{\operatorname{B}_b}T_{m - 1}(\{\mu_l \}_{l < m - 1},x)$

By induction on $${m}$$, we conclude:

${\operatorname{\Sigma W}_{\min}}{\operatorname{B}_b}T_{m}(\{\mu_l \}_{l < m}) \geq {\operatorname{\Sigma W}_{\min}}{\operatorname{B}_b}T_{m - 1}(\{\mu_l \}_{l < m - 1}) \geq -b$

# Proposition A.2

If $${X,Y}$$ are compact Polish spaces and $${f: X \times Y \rightarrow {\mathbb{R}}}$$ is continuous, then $${F: X \rightarrow C(Y)}$$ defined by $${F(x)(y):=f(x,y)}$$ is continuous.

# Proof of Proposition A.2

We already proved an equivalent proposition: see “Proposition A.1” here.

# Proof of Proposition 3

The definition of $${{\operatorname{B}_b}}$$ implies that

${\lvert {\operatorname{B}_b}T^k_n(\{\mu_m \}_{m \leq n},x) \rvert} \leq {\lvert T^k_n(\{\mu_m \}_{m \leq n},x) \rvert}$

Observe that

$Z(k):=\sum_{b=1}^\infty \zeta(k,b) \leq \sum_{b=1}^\infty \zeta(k,b)b < \infty$

As a result, the definition of $${T^\zeta_n}$$ is pointwise absolutely convergent:

$\sum_{b=1}^\infty \zeta(k,b) {\lvert {\operatorname{B}_b}T^k_n(\{\mu_m \}_{m \leq n},x) \rvert} \leq Z(k) {\lvert T^k_n(\{\mu_m \}_{m \leq n},x) \rvert}$

Moreover, this series converges uniformly absolutely in $${\mu_n}$$ and $${x}$$:

$\sum_{b=b_0}^\infty \zeta(k,b) {\lvert {\operatorname{B}_b}T^k_n(\{\mu_m \}_{m \leq n},x) \rvert} \leq \max_{\substack{\nu \in {\mathcal{P}}(X) \\ y \in X}} {\lvert T^k_n(\{\mu_m \}_{m < n},\nu,y) \rvert}\sum_{b=b_0}^\infty \zeta(k,b) \underset{b_0 \rightarrow \infty}{\rightarrow} 0$

By the uniform limit theorem, the series defines a continuous function of $${\mu_n}$$ and $${x}$$. By Proposition A.2, it follows that $${T^\zeta}$$ is a trader.

Now, let’s examine $${{\operatorname{\Sigma W}}T^\zeta}$$:

${\operatorname{\Sigma W}}T^\zeta_n(\{\mu_m \}_{m < n},x)={\operatorname{\Sigma W}}\sum_{k=0}^n \sum_{b=1}^\infty \zeta(k,b) {\operatorname{B}_b}T^k_n(\{\mu_m \}_{m < n},x)$

Since convergence is uniform absolute, the sum commutes with $${{\operatorname{\Sigma W}}}$$.

${\operatorname{\Sigma W}}T^\zeta_n(\{\mu_m \}_{m < n},x)=\sum_{k=0}^n \sum_{b=1}^\infty \zeta(k,b) {\operatorname{\Sigma W}}{\operatorname{B}_b}T^k_n(\{\mu_m \}_{m < n},x)$

By Proposition 2:

${\operatorname{\Sigma W}}T^\zeta_n(\{\mu_m \}_{m < n},x)\geq -\sum_{k=0}^n \sum_{b=1}^\infty \zeta(k,b)b \geq -b_\zeta$

# Proof of Proposition 4

We construct $${\mu^*_n}$$ recursively in $${n}$$. Define $${\tau_n \in \mathcal{T}(X_n)}$$ by

$\tau_n(\nu,x):=T_n^\zeta(\{i_{m*}\mu^*_m\}_{m<n}, i_{n*} \nu, x)$

(pushforward is obviously continuous in the weak* topology)

Now construct $${\mu^*_n}$$ by applying Proposition 1 to $${\tau_n}$$.
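The recursion can be sketched in code under heavy simplifications that are assumptions of this example rather than part of the proof: $${X = \{0,1\}}$$ with $${X_n = X}$$ for all $${n}$$ (so the pushforwards are identities), a single hypothetical trader `toy_trader` standing in for $${T^\zeta}$$, and a grid search standing in for the non-constructive Proposition 1. Each day the past pricings are frozen, which leaves a strategy in today's pricing alone, and the market maker then picks a pricing against which that strategy cannot profit.

```python
def market_maker(tau, grid):
    """Return the grid pricing p that minimises max_x tau(p, x) - E_p[tau(p, .)]."""
    def gap(p):
        expected = (1 - p) * tau(p, 0) + p * tau(p, 1)
        return max(tau(p, 0), tau(p, 1)) - expected
    return min(grid, key=gap)

def toy_trader(history):
    """Hypothetical trader: targets 0.25 on day 0, then halfway between yesterday's price and 1."""
    target = 0.25 if not history else (history[-1] + 1) / 2
    return lambda p, x: (target - p) * x

def build_market(trader, days, steps=1024):
    """Construct the pricings day by day, as in the recursive definition of mu*_n."""
    grid = [i / steps for i in range(steps + 1)]
    history = []
    for _ in range(days):
        tau_n = trader(history)  # freeze the past pricings to get today's strategy
        history.append(market_maker(tau_n, grid))
    return history

print(build_market(toy_trader, 3))
```

Each pricing lands exactly on the trader's target for that day, because that is the only price at which the toy strategy holds nothing and so has nothing to gain.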

# Proof of Theorem

We have

${\operatorname{W}}T^\zeta_n(\{i_{m*}\mu^*_m\}_{m \leq n}, x) = T^\zeta_n(\{i_{m*}\mu^*_m\}_{m \leq n}, x) - \operatorname{E}_{y \sim \mu_n^*}[T^\zeta_n(\{i_{m*}\mu^*_m\}_{m \leq n}, y)]$

By definition of $${\mu^*}$$

${\operatorname{W}}T^\zeta_n(\{i_{m*}\mu^*_m\}_{m \leq n}, x) = T^\zeta_n(\{i_{m*}\mu^*_m\}_{m \leq n}, x) - \max_{y \in X_n} T^\zeta_n(\{i_{m*}\mu^*_m\}_{m \leq n}, y)$

$\forall x \in X_n: {\operatorname{W}}T^\zeta_n(\{i_{m*}\mu^*_m\}_{m \leq n}, x) \leq 0$

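Since $${X_n \subseteq X_m}$$ for $${m \leq n}$$, summing this inequality over all days up to $${n}$$ gives

$\forall x \in X_n: {\operatorname{\Sigma W}}T^\zeta_{n+1}(\{i_{m*}\mu^*_m\}_{m \leq n}, x) \leq 0$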

Fix $${k \in {\mathbb{N}}}$$ and assume $${\inf {\mathcal{W}}(T^k,i_*\mu^*) > -b}$$ for some integer $${b > 0}$$ (otherwise $${T^k}$$ is dominated). Define $${\xi: {\mathbb{N}}\times {\mathbb{N}}\rightarrow [0,1]}$$ by

$\xi(j,c):=\begin{cases}\zeta(j,c) \text { when } (j,c) \ne (k,b)\\0 \text{ when } (j,c)=(k,b)\end{cases}$

We get $${T^\zeta_n = T^\xi_n + [[n \geq k]] \zeta(k,b) {\operatorname{B}_b}T^k_n}$$ and therefore

$\forall x \in X_n: {\operatorname{\Sigma W}}T^\xi_{n+1}(\{i_{m*}\mu^*_m\}_{m \leq n}, x) + [[n \geq k]] \zeta(k,b) {\operatorname{\Sigma W}}{\operatorname{B}_b}T^k_{n+1}(\{i_{m*}\mu^*_m\}_{m \leq n}, x) \leq 0$

By Proposition 2 and the definition of $${b}$$, we can remove $${{\operatorname{B}_b}}$$ in the second term.

$\forall x \in X_n: {\operatorname{\Sigma W}}T^\xi_{n+1}(\{i_{m*}\mu^*_m\}_{m \leq n}, x) + [[n \geq k]] \zeta(k,b) {\operatorname{\Sigma W}}T^k_{n+1}(\{i_{m*}\mu^*_m\}_{m \leq n}, x) \leq 0$

$\forall x \in X_n: [[n \geq k]] \zeta(k,b) {\operatorname{\Sigma W}}T^k_{n+1}(\{i_{m*}\mu^*_m\}_{m \leq n}, x) \leq -{\operatorname{\Sigma W}}T^\xi_{n+1}(\{i_{m*}\mu^*_m\}_{m \leq n}, x)$

By Proposition 3, the right hand side is bounded from above by $${b_\xi}$$, therefore

$\forall x \in X_n: [[n \geq k]] \zeta(k,b) {\operatorname{\Sigma W}}T^k_{n+1}(\{i_{m*}\mu^*_m\}_{m \leq n}, x) \leq b_\xi$

$\sup {\mathcal{W}}(T^k,i_* \mu^*) \leq \max(\frac{b_\xi}{\zeta(k,b)} , \max_{\substack{n < k \\ x \in X_n}} {\operatorname{\Sigma W}}T^k_{n+1}(\{i_{m*}\mu^*_m\}_{m \leq n}, x))$
