post by Vadim Kosoy 1269 days ago

We introduce a variant of optimal predictor schemes where optimality holds within the space of random algorithms with logarithmic advice. These objects are also guaranteed to exist for the error space $$\Delta_{avg}^2$$. We introduce the class of generatable problems and construct a uniform universal predictor scheme for this class which is optimal in the new sense with respect to the $$\Delta^2_{avg}$$ error space. This is achieved by a construction similar to Levin’s universal search.

# New notation

Given $$n \in {\mathbb{N}}$$, $$ev_n: {\mathbb{N}}\times {{\{ 0, 1 \}^*}}^{n+1} \xrightarrow{alg} {{\{ 0, 1 \}^*}}$$ is the following algorithm. When $$ev_n^k(Q,x_1 \ldots x_n)$$ is computed, $$Q$$ is interpreted as a program and $$Q(x_1 \ldots x_n)$$ is executed for time $$k$$. The resulting output is produced.

The notation $$ev^k(Q,x_1 \ldots x_n)$$ means $$ev_n^k(Q,x_1 \ldots x_n)$$.
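As a toy model of time-bounded evaluation (hedged: the real $$ev_n$$ operates on encoded programs; here a "program" is a Python generator that yields its current output after each step, and all names are hypothetical):

```python
def ev(k, Q, *args):
    """Toy model of ev^k(Q, x_1 ... x_n): run the program Q for at
    most k steps and return whatever output it has produced so far."""
    out = ""
    steps = Q(*args)
    for _ in range(k):
        try:
            out = next(steps)
        except StopIteration:
            break  # Q halted before exhausting the time budget
    return out

# a program that reverses its input, one bit per step
def reverse_prog(x):
    acc = ""
    for ch in x:
        acc = ch + acc
        yield acc

assert ev(10, reverse_prog, "1101") == "1011"  # halts naturally
assert ev(2, reverse_prog, "1101") == "11"     # cut off after 2 steps
```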

$$\beta: {{\{ 0, 1 \}^*}}\rightarrow [0,1]$$ is the mapping from a binary expansion to the corresponding real number.
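As a small illustration (Python; the function name is ours), $$\beta$$ simply reads a word as the digits after the binary point:

```python
def beta(w: str) -> float:
    """Map a binary word w = b1 b2 ... bn to the real number
    0.b1 b2 ... bn in [0, 1], i.e. sum_i b_i * 2^-i."""
    return sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(w))

assert beta("") == 0.0
assert beta("1") == 0.5
assert beta("101") == 0.625  # 1/2 + 0/4 + 1/8
```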

Given $$\mu$$ a word ensemble, $$X$$ a set, $$Q: {{\{ 0, 1 \}^*}}^2 \xrightarrow{alg} X$$, $$T_Q^{\mu}(k,s)$$ stands for the maximal runtime of $$Q(x,y)$$ for $$x \in \operatorname{supp}\mu^k$$, $$y \in {{\{ 0, 1 \}^{s}}}$$.

Previous posts focused on prediction of distributional decision problems, which is the “computational uncertainty” analogue of probability. Here, we use the broader concept of predicting distributional estimation problems (functions), which is analogous to expectation value.

# Definition 1

A distributional estimation problem is a pair $$(f, \mu)$$ where $$f: {{\{ 0, 1 \}^*}}\rightarrow [0,1]$$ is an arbitrary function (even irrational values are allowed) and $$\mu$$ is a word ensemble.

# Definition 2

Given an appropriate set $$X$$, consider $$P: {\mathbb{N}}^2 \times {{\{ 0, 1 \}^*}}^3 \xrightarrow{alg} X$$, $$r: {\mathbb{N}}^2 \rightarrow {\mathbb{N}}$$ polynomial and $$a: {\mathbb{N}}^2 \rightarrow {{\{ 0, 1 \}^*}}$$. The triple $$(P, r, a)$$ is called an $$X$$-valued $$(poly, log)$$-bischeme when

1. The runtime of $$P(k, j, x, y, z)$$ is bounded by $$p(k, j)$$ with $$p$$ polynomial.

2. $$|a(k,j)| \leq c_1 + c_2 \log (k+1) + c_3 \log (j+1)$$ for some $$c_1,c_2,c_3 \in {\mathbb{N}}$$.

A $$[0,1]$$-valued $$(poly, log)$$-bischeme will also be called a $$(poly, log)$$-predictor scheme.

We think of $$P$$ as a random algorithm where the second word parameter represents its internal coin tosses. The third word parameter represents the advice and we usually substitute $$a$$ there.

We will use the notations $$P^{kj}(x,y,z):=P(k,j,x,y,z)$$, $$a^{kj}:=a(k,j)$$.
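For concreteness, a bischeme can be modeled as a triple of callables; the sketch below (all names hypothetical) checks the logarithmic advice bound of condition 2 on a finite grid:

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass
class Bischeme:
    """Toy model of an X-valued (poly, log)-bischeme (P, r, a):
    P(k, j, x, y, z) is the algorithm, r(k, j) the number of random
    bits, a(k, j) the advice string."""
    P: Callable[[int, int, str, str, str], object]
    r: Callable[[int, int], int]
    a: Callable[[int, int], str]

def advice_is_logarithmic(scheme, c1, c2, c3, bound=64):
    """Check |a(k,j)| <= c1 + c2*log(k+1) + c3*log(j+1) on a finite grid."""
    return all(
        len(scheme.a(k, j)) <= c1 + c2 * math.log2(k + 1) + c3 * math.log2(j + 1)
        for k in range(bound) for j in range(bound)
    )

# advice of |bin(k)| bits is within the bound; advice of k bits is not
toy = Bischeme(P=lambda k, j, x, y, z: 0.0,
               r=lambda k, j: k * j + 1,
               a=lambda k, j: "1" * k.bit_length())
assert advice_is_logarithmic(toy, c1=1, c2=1, c3=0)
```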

# Definition 3

Fix $$\Delta$$ an error space of rank 2 and $$(f, \mu)$$ a distributional estimation problem. Consider $$(P, r, a)$$ a $$(poly, log)$$-predictor scheme. $$(P, r, a)$$ is called a $$\Delta(poly, log)$$-optimal predictor scheme for $$(f,\mu)$$ when for any $$(poly, log)$$-predictor scheme $$(Q,s,b)$$, there is $$\delta \in \Delta$$ s.t.

$E_{\mu^k \times U^{r(k,j)}}[(P^{kj}(x,y,a^{kj})-f(x))^2] \leq E_{\mu^k \times U^{s(k,j)}}[(Q^{kj}(x,y,b^{kj})-f(x))^2] + \delta(k,j)$

# Note 1

The notation $$(poly, log)$$ is meant to remind us that we allow a polynomial quantity $$r(k,j)$$ of random bits and a logarithmic quantity $$|a^{kj}|$$ of advice. In fact, the definitions and some of the theorems can be generalized to other quantities of randomness and advice (see also Note B.1). Thus, predictor schemes from previous posts are $$(poly,poly)$$-predictor schemes, $$(poly,O(1))$$-predictor schemes are limited to $$O(1)$$ advice, $$(log,0)$$-predictor schemes use a logarithmic number of random bits and no advice, and so on. As usual in complexity theory, it is redundant to consider more advice than random bits, since advice is strictly more powerful.

$$\Delta(poly, log)$$-optimal predictor schemes satisfy properties analogous to those of $$\Delta$$-optimal predictor schemes. These properties are listed in Appendix A. The proofs of Theorems A.1 and A.4 are given in Appendix B. The other proofs are straightforward adaptations of the corresponding proofs with polynomial advice.

We also have the following existence result:

# Theorem 1

Consider $$(f,\mu)$$ a distributional estimation problem. Define $$\Upsilon: {\mathbb{N}}^2 \times {{\{ 0, 1 \}^*}}^3 \xrightarrow{alg} [0,1]$$ by

$\Upsilon^{kj}(x,y,Q) := \beta(ev^j(Q,x,y))$

Define $$\upsilon_{f,\mu}: {\mathbb{N}}^2 \rightarrow {{\{ 0, 1 \}^*}}$$ by

$\upsilon_{f,\mu}^{kj}:={\underset{|Q| \leq \log j}{\operatorname{arg\,min}}\,} E_{\mu^k \times U^j}[(\Upsilon^{kj}(x,y,Q)-f(x))^2]$

Then, $$(\Upsilon, j, \upsilon_{f,\mu})$$ is a $$\Delta_{avg}^2(poly, log)$$-optimal predictor scheme for $$(f,\mu)$$.

# Note 2

Consider a distributional decision problem $$(D, \mu)$$. Assume $$(D, \mu)$$ admits $$n \in {\mathbb{N}}$$, $$A: {\mathbb{N}}\times {{\{ 0, 1 \}^*}}^3 \xrightarrow{alg} \{0,1\}$$, $$a: {\mathbb{N}}\rightarrow {{\{ 0, 1 \}^*}}$$ and a function $$r: {\mathbb{N}}\rightarrow {\mathbb{N}}$$ s.t.

1. $$A(k,x,y,z)$$ runs in quasi-polynomial time ($$O(2^{\log^n k})$$).

2. $|a(k)| = O(\log^n k)$

3. ${\lim_{k \rightarrow \infty}{Pr_{\mu^k \times U^{r(k)}}[A(k,x,y,a(k)) \neq \chi_D(x)]}} = 0$

Then it is easy to see we can construct a $$(poly,log)$$-predictor scheme $$P_A$$ taking values in $$\{0,1\}$$ s.t. $$E[(P_A-\chi_D)^2] \in \Delta_{avg}^2$$. The implication doesn't work for larger amounts of time or advice. Therefore, the uncertainty represented by $$\Delta_{avg}^2(poly,log)$$-optimal predictor schemes is associated with the resource gap between quasi-polynomial time plus advice $$O(\log^n k)$$ and the resources needed to (heuristically) solve the problem in question.

The proof of Theorem 1 is given in Appendix C: it is a straightforward adaptation of the corresponding proof for polynomial advice. Evidently, the above scheme is non-uniform. We will now describe a class of problems which admits uniform $$\Delta_{avg}^2(poly, log)$$-optimal predictor schemes.

# Definition 4

Consider $$\Delta^1$$ an error space of rank 1. A word ensemble $$\mu$$ is called $$\Delta^1(log)$$-sampleable when there is $$S: {\mathbb{N}}\times {{\{ 0, 1 \}^*}}^2 \xrightarrow{alg} {{\{ 0, 1 \}^*}}$$ that runs in polynomial time in the 1st argument, $$a^S: {\mathbb{N}}\rightarrow {{\{ 0, 1 \}^*}}$$ of logarithmic size and $$r^S: {\mathbb{N}}\rightarrow {\mathbb{N}}$$ a polynomial such that

$\sum_{x \in {{\{ 0, 1 \}^*}}} |\mu^k(x) - Pr_{U^{r^S(k)}}[S^k(y,a^S(k))=x]| \in \Delta^1$

$$(S, r^S, a^S)$$ is called a $$\Delta^1(log)$$-sampler for $$\mu$$.
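For intuition, at toy sizes the defining sum can be computed exactly by enumerating all seeds (names hypothetical; a real sampler is a polynomial-time algorithm on encoded inputs, and only the behavior as $$k \rightarrow \infty$$ matters):

```python
import itertools
from collections import Counter

def tv_distance(mu, S, r_k, a_k=""):
    """The sum from Definition 4 for a single k:
    sum_x |mu(x) - Pr_{y ~ U^{r(k)}}[S(y, a) = x]|, computed exactly
    by enumerating every seed y of length r_k."""
    counts = Counter(S("".join(y), a_k)
                     for y in itertools.product("01", repeat=r_k))
    total = 2 ** r_k
    support = set(mu) | set(counts)
    return sum(abs(mu.get(x, 0.0) - counts[x] / total) for x in support)

# the identity map is an exact sampler for the uniform ensemble on {0,1}^2
mu = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
assert tv_distance(mu, lambda y, a: y, 2) == 0.0
```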

# Definition 5

Consider $$\Delta^1$$ an error space of rank 1. A distributional estimation problem $$(f, \mu)$$ is called $$\Delta^1(log)$$-generatable when there are $$S: {\mathbb{N}}\times {{\{ 0, 1 \}^*}}^2 \xrightarrow{alg} {{\{ 0, 1 \}^*}}$$ and $$F: {\mathbb{N}}\times {{\{ 0, 1 \}^*}}^2 \xrightarrow{alg} [0,1]$$ that run in polynomial time in the 1st argument, $$a^S: {\mathbb{N}}\rightarrow {{\{ 0, 1 \}^*}}$$ of logarithmic size and $$r^S: {\mathbb{N}}\rightarrow {\mathbb{N}}$$ a polynomial such that

1. $$(S, r^S, a^S)$$ is a $$\Delta^1(log)$$-sampler for $$\mu$$.

2. $E_{U^{r^S(k)}}[(F^k(y,a^S(k))-f(S^k(y,a^S(k))))^2] \in \Delta^1$

$$(S, F, r^S, a^S)$$ is called a $$\Delta^1(log)$$-generator for $$(f, \mu)$$.

When $$a^S$$ is the empty string, $$(S,F,r^S)$$ is called a $$\Delta^1(0)$$-generator for $$(f, \mu)$$. Such $$(f, \mu)$$ is called $$\Delta^1(0)$$-generatable.

# Note 3

The class of $$\Delta^1(0)$$-generatable problems can be regarded as an average-case analogue of $$NP \cap coNP$$. If $$f$$ is a decision problem (i.e. its range is $$\{0,1\}$$), words $$y \in {{\{ 0, 1 \}^{r^S(k)}}}$$ s.t. $$S^k(y)=x$$, $$F^k(y)=1$$ can be regarded as “proofs” of $$f(x)=1$$ and words $$y \in {{\{ 0, 1 \}^{r^S(k)}}}$$ s.t. $$S^k(y)=x$$, $$F^k(y)=0$$ can be regarded as “proofs” of $$f(x)=0$$.

# Theorem 2

There is an oracle machine $$\Lambda$$ that accepts an oracle of signature $$SF: {\mathbb{N}}\times {{\{ 0, 1 \}^*}}\rightarrow {{\{ 0, 1 \}^*}}\times [0,1]$$ and a polynomial $$r: {\mathbb{N}}\rightarrow {\mathbb{N}}$$ where the allowed oracle calls are $$SF^k(x)$$ for $$|x|=r(k)$$ and computes a function of signature $${\mathbb{N}}^2 \times {{\{ 0, 1 \}^*}}^2 \rightarrow [0,1]$$ s.t. for any $$(f, \mu)$$ a distributional estimation problem and $$G:=(S, F, r^S, a^S)$$ a corresponding $$\Delta_0^1(log)$$-generator, $$\Lambda[G]$$ is a $$\Delta_{avg}^2(poly,log)$$-optimal predictor scheme for $$(f,\mu)$$.

In particular if $$(f,\mu)$$ is $$\Delta_0^1(0)$$-generatable, we get a uniform $$\Delta_{avg}^2(poly,log)$$-optimal predictor scheme.

The following is the description of $$\Lambda$$. Consider $$SF: {\mathbb{N}}\times {{\{ 0, 1 \}^*}}\rightarrow {{\{ 0, 1 \}^*}}\times [0,1]$$ and a polynomial $$r: {\mathbb{N}}\rightarrow {\mathbb{N}}$$. We describe the computation of $$\Lambda[SF,r]^{kj}(x)$$ where the extra argument of $$\Lambda$$ is regarded as internal coin tosses.

We loop over the first $$j$$ words in lexicographic order. Each word is interpreted as a program $$Q: {{\{ 0, 1 \}^*}}^2 \xrightarrow{alg} [0,1]$$. We loop over $$jk$$ “test runs”. At test run $$i$$, we generate $$(x_i \in {{\{ 0, 1 \}^*}}, t_i \in [0,1])$$ by evaluating $$SF^k(y_i)$$ for $$y_i$$ sampled from $$U^{r(k)}$$. We then sample $$z_i$$ from $$U^j$$ and compute $$s_i:=ev^{j}(Q,x_i,z_i)$$. At the end of the test runs, we compute the average error $$\epsilon(Q):=\frac{1}{jk}\sum_{i} (s_i - t_i)^2$$. At the end of the loop over programs, the program $$Q^*$$ with the lowest error is selected and the output $$ev^{j}(Q^*,x,z)$$ is produced for $$z$$ sampled from $$U^j$$.
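The loop above can be sketched as follows. This is a toy model: `SF`, `ev` and the word enumeration are hypothetical stand-ins for the oracle, the time-bounded evaluator and the lexicographic enumeration of programs, and no attention is paid to the precise runtime bounds.

```python
import itertools
import random

def words(n):
    """The first n binary words in length-lexicographic order."""
    out, length = [""], 1
    while len(out) < n:
        out += ["".join(w) for w in itertools.product("01", repeat=length)]
        length += 1
    return out[:n]

def random_word(n):
    return "".join(random.choice("01") for _ in range(n))

def Lam(SF, r, ev, k, j, x):
    """Sketch of Lambda[SF, r]^{kj}(x): estimate each of the first j
    programs' squared error on jk generated test pairs, then run the
    empirically best program on x."""
    best_Q, best_err = None, float("inf")
    for Q in words(j):
        total = 0.0
        for _ in range(j * k):                    # jk test runs
            x_i, t_i = SF(k, random_word(r(k)))   # (x_i, t_i) from the generator
            s_i = ev(j, Q, x_i, random_word(j))   # time-bounded run of Q
            total += (s_i - t_i) ** 2
        err = total / (j * k)
        if err < best_err:
            best_Q, best_err = Q, err
    return ev(j, best_Q, x, random_word(j))

# toy instance: the target is constantly 1; among the first three
# programs ("", "0", "1") only "1" predicts it, so Lam selects it
SF = lambda k, y: (y, 1.0)
ev = lambda t, Q, x, z: 1.0 if Q == "1" else 0.0
assert Lam(SF, r=lambda k: k, ev=ev, k=2, j=3, x="00") == 1.0
```

The competition between all short programs under a shared time budget is what makes this a relative of Levin's universal search.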

The proof that this construction is $$\Delta_{avg}^2(poly,log)$$-optimal is given in Appendix C.

## Appendix A

Fix $$\Delta$$ an error space of rank 2.

# Theorem A.1

Suppose there is a polynomial $$h: {\mathbb{N}}^2 \rightarrow {\mathbb{N}}$$ s.t. $$h^{-1} \in \Delta$$. Consider $$(f, \mu)$$ a distributional estimation problem and $$(P,r,a)$$ a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f, \mu)$$. Suppose $$\{p_{kj} \in [0,1]\}_{k,j \in {\mathbb{N}}}$$, $$\{q_{kj} \in [0,1]\}_{k,j \in {\mathbb{N}}}$$ are s.t.

$\exists \epsilon > 0 \; \forall k,j: (\mu^k \times U^{r(k,j)})\{(x,y) \in {{\{ 0, 1 \}^*}}^2 \mid p_{kj} \leq P^{kj}(x,y,a^{kj}) \leq q_{kj}\} \geq \epsilon$

Define

$\phi_{kj} := {E_{\mu^k \times U^{r(k,j)}}[f(x)-P^{kj}(x,y,a^{kj}) \mid p_{kj} \leq P^{kj}(x,y,a^{kj}) \leq q_{kj}]}$

Assume that either $$p_{kj}, q_{kj}$$ have a number of digits logarithmically bounded in $$k,j$$ or $$P^{kj}$$ produces outputs with a number of digits logarithmically bounded in $$k,j$$ (by Theorem A.7 if any $$\Delta(poly,log)$$-optimal predictor scheme exists for $$(f, \mu)$$ then a $$\Delta(poly,log)$$-optimal predictor scheme with this property exists as well). Then, $$|\phi| \in \Delta$$.

# Theorem A.2

Consider $$\mu$$ a word ensemble and $$f_1, f_2: {{\{ 0, 1 \}^*}}\rightarrow [0,1]$$ s.t. $$f_1 + f_2 \leq 1$$. Suppose $$(P_1,r_1,a_1)$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f_1, \mu)$$ and $$(P_2,r_2,a_2)$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f_2, \mu)$$. Define $$P: {\mathbb{N}}^2 \times {{\{ 0, 1 \}^*}}^3 \xrightarrow{alg} [0,1]$$ by $$P^{kj}(x, y_1 y_2, (z_1, z_2)):=\eta(P^{kj}_1(x,y_1,z_1) + P^{kj}_2(x,y_2,z_2))$$ for $$|y_i|=r_i(k,j)$$. Then, $$(P,r_1+r_2,a_1 a_2)$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f_1 + f_2, \mu)$$.

# Theorem A.3

Consider $$\mu$$ a word ensemble and $$f_1, f_2: {{\{ 0, 1 \}^*}}\rightarrow [0,1]$$ s.t. $$f_1 + f_2 \leq 1$$. Suppose $$(P_1,r_1,a_1)$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f_1, \mu)$$ and $$(P,r_2,a_2)$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f_1 + f_2, \mu)$$. Define $$P_2: {\mathbb{N}}^2 \times {{\{ 0, 1 \}^*}}^3 \xrightarrow{alg} [0,1]$$ by $$P^{kj}_2(x, y_1 y_2, (z_1, z_2)):=\eta(P^{kj}(x,y_1,z_1) - P^{kj}_1(x,y_2,z_2))$$ for $$|y_i|=r_i(k,j)$$. Then, $$(P_2,r_1+r_2,a_1 a_2)$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f_2, \mu)$$.

# Theorem A.4

Fix $$\Delta^1$$ an error space of rank 1 s.t. given $$\delta^1 \in \Delta^1$$, the function $$\delta(k,j):=\delta^1(k)$$ lies in $$\Delta$$. Consider $$(f_1, \mu_1)$$, $$(f_2, \mu_2)$$ distributional estimation problems with respective $$\Delta(poly,log)$$-optimal predictor schemes $$(P_1,r_1,a_1)$$ and $$(P_2,r_2,a_2)$$. Assume $$\mu_1$$ is $$\Delta^1(log)$$-sampleable and $$(f_2, \mu_2)$$ is $$\Delta^1(log)$$-generatable. Define $$f_1 \times f_2: {{\{ 0, 1 \}^*}}\rightarrow [0,1]$$ by $$(f_1 \times f_2)(x_1,x_2)=f_1(x_1) f_2(x_2)$$ and $$(f_1 \times f_2)(y)=0$$ for $$y$$ not of this form. Define $$P: {\mathbb{N}}^2 \times {{\{ 0, 1 \}^*}}^3 \xrightarrow{alg} [0,1]$$ by $$P^{kj}((x_1,x_2), y_1 y_2, (z_1, z_2)):=P^{kj}_1(x_1,y_1,z_1) P^{kj}_2(x_2,y_2,z_2)$$ for $$|y_i|=r_i(k,j)$$. Then, $$(P,r_1 + r_2,(a_1,a_2))$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f_1 \times f_2, \mu_1 \times \mu_2)$$.

# Theorem A.5

Consider $$f: {{\{ 0, 1 \}^*}}\rightarrow [0,1]$$, $$D \subseteq {{\{ 0, 1 \}^*}}$$ and $$\mu$$ a word ensemble. Assume $$(P_D,r_D,a_D)$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(D, \mu)$$ and $$(P_{f \mid D},r_{f \mid D},a_{f \mid D})$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f, \mu \mid D)$$. Define $$P: {\mathbb{N}}^2 \times {{\{ 0, 1 \}^*}}^3 \xrightarrow{alg} [0,1]$$ by $$P^{kj}(x, y_1 y_2, (z_1, z_2)):=P^{kj}_D(x,y_1,z_1) P^{kj}_{f \mid D}(x,y_2,z_2)$$ for $$|y_i|=r_i(k,j)$$. Then $$(P, r_D+r_{f \mid D},(a_D, a_{f \mid D}))$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(\chi_D f, \mu)$$.

# Theorem A.6

Fix $$h$$ a polynomial s.t. $$2^{-h} \in \Delta$$. Consider $$f: {{\{ 0, 1 \}^*}}\rightarrow [0,1]$$, $$D \subseteq {{\{ 0, 1 \}^*}}$$ and $$\mu$$ a word ensemble. Assume $$\exists \epsilon > 0 \; \forall k: \mu^k(D) \geq \epsilon$$. Assume $$(P_D,r_D,a_D)$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(D, \mu)$$ and $$(P_{\chi_D f},r_{\chi_D f},a_{\chi_D f})$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(\chi_D f, \mu)$$. Define $$P_{f \mid D}: {\mathbb{N}}^2 \times {{\{ 0, 1 \}^*}}^3 \xrightarrow{alg} [0,1]$$ by

$P^{kj}_{f \mid D}(x,y_1 y_2, (z_1, z_2)) := \begin{cases}1 & \text{if } P^{kj}_D(x,y_2,z_2) = 0 \\ \eta(\frac{P^{kj}_{\chi_D f}(x,y_1,z_1)}{P^{kj}_D(x,y_2,z_2)}) & \text{rounded to }h(k,j)\text{ binary places if } P^{kj}_D(x,y_2,z_2) > 0 \end{cases}$

Then, $$(P_{f \mid D},r_D+r_{\chi_D f},(a_{\chi_D f}, a_D))$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f, \mu \mid D)$$.

# Definition A.1

Consider $$\mu$$ a word ensemble and $$\hat{Q}_1:=(Q_1,s_1,b_1)$$, $$\hat{Q}_2:=(Q_2,s_2,b_2)$$ $$(poly,log)$$-predictor schemes. We say $$\hat{Q}_1$$ is $$\Delta$$-similar to $$\hat{Q}_2$$ relative to $$\mu$$ (denoted $$\hat{Q}_1 \underset{\Delta}{\overset{\mu}{\simeq}} \hat{Q}_2$$) when $$E_{\mu^k \times U^{s_1(k,j)} \times U^{s_2(k,j)}}[(Q_1^{kj}(x,y_1,b^{kj}_1)-Q_2^{kj}(x,y_2,b^{kj}_2))^2] \in \Delta$$.

# Theorem A.7

Consider $$(f, \mu)$$ a distributional estimation problem, $$\hat{P}$$ a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f, \mu)$$ and $$\hat{Q}$$ a $$(poly,log)$$-predictor scheme. Then, $$\hat{Q}$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f, \mu)$$ if and only if $$\hat{P} \underset{\Delta}{\overset{\mu}{\simeq}} \hat{Q}$$.

# Note A.1

$$\Delta$$-similarity is not an equivalence relation on the set of arbitrary $$(poly,log)$$-predictor schemes. However, it is an equivalence relation on the set of $$(poly,log)$$-predictor schemes $$\hat{Q}$$ satisfying $$\hat{Q} \underset{\Delta}{\overset{\mu}{\simeq}} \hat{Q}$$ (i.e. the $$\mu$$-expectation value of the intrinsic variance of $$\hat{Q}$$ is in $$\Delta$$). In particular, for any $$f: {{\{ 0, 1 \}^*}}\rightarrow [0,1]$$ any $$\Delta(poly,log)$$-optimal predictor scheme for $$(f,\mu)$$ has this property.

# Definition B.1

Given $$n \in {\mathbb{N}}$$, a function $$\delta: {\mathbb{N}}^{2+n} \rightarrow {\mathbb{R}}^{\geq 0}$$ is called $$\Delta$$-moderate when

1. $$\delta$$ is non-decreasing in arguments $$3$$ to $$2+n$$.

2. For any collection of polynomials $$\{p_i: {\mathbb{N}}^2 \rightarrow {\mathbb{N}}\}_{i < n}$$, $$\delta(k, j, p_0(k,j) \ldots p_{n-1}(k,j)) \in \Delta$$

# Lemma B.1

Fix $$(f, \mu)$$ a distributional estimation problem and $$\hat{P}:=(P,r,a)$$ a $$(poly,log)$$-predictor scheme. Then, $$\hat{P}$$ is $$\Delta(poly,log)$$-optimal iff there is a $$\Delta$$-moderate function $$\delta: {\mathbb{N}}^4 \rightarrow [0,1]$$ s.t. for any $$k,j,s \in {\mathbb{N}}$$, $$Q: {{\{ 0, 1 \}^*}}^2 \xrightarrow{alg} [0,1]$$

$E_{\mu^k \times U^{r(k,j)}}[(P^{kj}(x,y,a^{kj})-f(x))^2] \leq E_{\mu^k \times U^s}[(Q(x,y)-f(x))^2] + \delta(k,j,T_Q^{\mu}(k,s),2^{|Q|})$

# Proof of Lemma B.1

Define

$\delta(k,j,t,u):=\max_{\substack{T_Q^{\mu}(k,s) \leq t \\ |Q| \leq \log u}} \{ E_{\mu^k \times U^{r(k,j)}}[(P^{kj}(x,y,a^{kj})-f(x))^2] - E_{\mu^k \times U^s}[(Q(x,y)-f(x))^2] \}$

# Note B.1

Lemma B.1 shows that the error bound for a $$\Delta(poly,log)$$-optimal predictor scheme is in some sense uniform with respect to $$Q$$. This doesn’t generalize to e.g. $$\Delta(poly,O(1))$$-optimal predictor schemes. The latter still admit a weaker version of Theorem A.1 and direct analogues of Theorems A.2, A.3, A.5, A.6 and A.7. Theorem A.4 doesn’t seem to generalize.

# Lemma B.2

Suppose there is a polynomial $$h: {\mathbb{N}}^2 \rightarrow {\mathbb{N}}$$ s.t. $$h^{-1} \in \Delta$$. Fix $$(f, \mu)$$ a distributional estimation problem and $$(P,r,a)$$ a corresponding $$\Delta(poly,log)$$-optimal predictor scheme. Consider $$(Q,s,b)$$ a $$(poly,log)$$-predictor scheme, $$M > 0$$, $$w: {\mathbb{N}}^2 \times {{\{ 0, 1 \}^*}}^3 \xrightarrow{alg} {\mathbb{Q}}\cap [0,M]$$ with runtime bounded by a polynomial in the first two arguments, and $$u: {\mathbb{N}}^2 \rightarrow {{\{ 0, 1 \}^*}}$$ of logarithmic size. Then there is $$\delta \in \Delta$$ s.t.

$E_{\mu^k \times U^{\max(r(k,j),s(k,j))}}[w^{kj}(x,y,u^{kj})(P^{kj}(x,y_{\leq r(k,j)},a^{kj})-f(x))^2] \leq E_{\mu^k \times U^{\max(r(k,j),s(k,j))}}[w^{kj}(x,y,u^{kj})(Q^{kj}(x,y_{\leq s(k,j)},b^{kj})-f(x))^2] + \delta(k, j)$

# Proof of Lemma B.2

Given $$t \in [0,M]$$, define $$\alpha^{kj}(t)$$ to be $$t$$ rounded within error $$h(k,j)^{-1}$$. Thus, the number of digits in $$\alpha^{kj}(t)$$ is logarithmic in $$k$$ and $$j$$. Denote $$q(k,j):=\max(r(k,j),s(k,j))$$. Consider $$\hat{Q}_t:=(Q_t, r+s, b_t)$$ the $$(poly,log)$$-predictor scheme defined by

$Q^{kj}_t(x,y,b^{kj}_t):= \begin{cases} Q^{kj}(x,y_{\leq s(k,j)},b^{kj}) & \text{if } w^{kj}(x,y_{\leq q(k,j)},u^{kj}) \geq \alpha^{kj}(t) \\ P^{kj}(x,y_{\leq r(k,j)},a^{kj}) & \text{if } w^{kj}(x,y_{\leq q(k,j)},u^{kj}) < \alpha^{kj}(t) \end{cases}$

$$\hat{Q}_t$$ satisfies bounds on runtime and advice size uniform in $$t$$. Therefore, Lemma B.1 implies that there is $$\delta \in \Delta$$ s.t.

$E_{\mu^k \times U^{r(k,j)}}[(P^{kj}(x,y,a^{kj})-f(x))^2] \leq E_{\mu^k \times U^{r(k,j)+s(k,j)}}[(Q^{kj}_t(x,y,b^{kj}_t)-f(x))^2] + \delta(k, j)$

$E_{\mu^k \times U^{r(k,j)+s(k,j)}}[(P^{kj}(x,y_{\leq r(k,j)},a^{kj})-f(x))^2-(Q^{kj}_t(x,y,b^{kj}_t)-f(x))^2] \leq \delta(k, j)$

$E_{\mu^k \times U^{q(k,j)}}[\theta(w^{kj}(x,y,u^{kj})-\alpha^{kj}(t))((P^{kj}(x,y_{\leq r(k,j)},a^{kj})-f(x))^2-(Q^{kj}(x,y_{\leq s(k,j)},b^{kj})-f(x))^2)] \leq \delta(k, j)$

$E_{\mu^k \times U^{q(k,j)}}[\int_0^M\theta(w^{kj}(x,y,u^{kj})-\alpha^{kj}(t))\,dt\,((P^{kj}(x,y_{\leq r(k,j)},a^{kj})-f(x))^2-(Q^{kj}(x,y_{\leq s(k,j)},b^{kj})-f(x))^2)] \leq M \delta(k, j)$

$E_{\mu^k \times U^{q(k,j)}}[w^{kj}(x,y,u^{kj})((P^{kj}(x,y_{\leq r(k,j)},a^{kj})-f(x))^2-(Q^{kj}(x,y_{\leq s(k,j)},b^{kj})-f(x))^2)] \leq M \delta(k, j) + h(k,j)^{-1}$

In the following proofs we will use shorthand notation that omits most of the symbols that are clear from the context. That is, we will use $$P$$ to mean $$P^{kj}(x,y,a^{kj})$$, $$f$$ to mean $$f(x)$$, $$E[\ldots]$$ to mean $$E_{\mu^k \times U^{r(k,j)}}[\ldots]$$, etc.

# Proof of Theorem A.1

Define $$w: {\mathbb{N}}^2 \times {{\{ 0, 1 \}^*}}^3 \xrightarrow{alg} \{0,1\}$$ and $$u: {\mathbb{N}}^2 \rightarrow {{\{ 0, 1 \}^*}}$$, where the advice $$u^{kj}$$ encodes the thresholds $$p_{kj}$$ and $$q_{kj}$$, by

$w:=\theta(P-p)\theta(q-P)$

We have

$\phi = \frac{E[w(f-P)]}{E[w]}$

Define $$\psi$$ to be $$\phi$$ truncated to the first significant binary digit. Denote $$I \subseteq {\mathbb{N}}^2$$ the set of $$(k,j)$$ for which $${\lvert \phi_{kj} \rvert} > h(k,j)^{-1}$$. Consider $$(Q,s,b)$$ a $$(poly,log)$$-predictor scheme satisfying

$\forall (k,j) \in I: Q^{kj}=\eta(P^{kj}+\psi_{kj})$

Such $$Q$$ exists since for $$(k,j) \in I$$, $$\psi_{kj}$$ has binary notation of logarithmically bounded size.

Applying Lemma B.2 we get

$\forall (k,j) \in I: E[w^{kj}(P^{kj}-f)^2] \leq E[w^{kj}(Q^{kj}-f)^2] + \delta(k,j)$

for $$\delta \in \Delta$$.

$\forall (k,j) \in I: E[w^{kj}((P^{kj}-f)^2-(Q^{kj}-f)^2)] \leq \delta(k,j)$

$\forall (k,j) \in I: E[w^{kj}((P^{kj}-f)^2-(\eta(P^{kj}+\psi_{kj})-f)^2)] \leq \delta(k,j)$

Obviously $$(\eta(P^{kj}+\psi_{kj})-f)^2 \leq (P^{kj}+\psi_{kj}-f)^2$$, therefore

$\forall (k,j) \in I: E[w^{kj}((P^{kj}-f)^2-(P^{kj}+\psi_{kj}-f)^2)] \leq \delta(k,j)$

$\forall (k,j) \in I: \psi_{kj} E[w^{kj}(2(f-P^{kj})-\psi_{kj})] \leq \delta(k,j)$

The expression on the left hand side is a quadratic polynomial in $$\psi_{kj}$$: using $$E[w^{kj}(f-P^{kj})] = \phi_{kj} E[w^{kj}]$$, it equals $$E[w^{kj}]\,\psi_{kj}(2\phi_{kj}-\psi_{kj})$$, which attains its maximum at $$\psi_{kj}=\phi_{kj}$$ and has roots at $$0$$ and $$2\phi_{kj}$$. $$\psi_{kj}$$ is between $$0$$ and $$\phi_{kj}$$, but not closer to $$0$$ than $$\frac{\phi_{kj}}{2}$$. Therefore, the inequality is preserved if we replace $$\psi_{kj}$$ by $$\frac{\phi_{kj}}{2}$$.

$\forall (k,j) \in I: \frac{\phi_{kj}}{2}E[w^{kj}(2(f-P^{kj})-\frac{\phi_{kj}}{2})] \leq \delta(k,j)$

Substituting the equation for $$\phi_{kj}$$ we get

$\forall (k,j) \in I: \frac{1}{2}\frac{E[w^{kj}(f-P^{kj})]}{E[w^{kj}]}E[w^{kj}(2(f-P^{kj})-\frac{1}{2}\frac{E[w^{kj}(f-P^{kj})]}{E[w^{kj}]})] \leq \delta(k,j)$

$\forall (k,j) \in I: \frac{3}{4}\frac{E[w^{kj}(f-P^{kj})]^2}{E[w^{kj}]} \leq \delta(k,j)$

$\forall (k,j) \in I: \frac{3}{4}E[w^{kj}]\phi_{kj}^2 \leq \delta(k,j)$

$\forall (k,j) \in I: \phi_{kj}^2 \leq \frac{4}{3}E[w^{kj}]^{-1}\delta(k,j)$

$\forall (k,j) \in I: \phi_{kj}^2 \leq \frac{4}{3}(\mu^k \times U^{r(k,j)})\{p_{kj} \leq P^{kj} \leq q_{kj}\}^{-1}\delta(k,j)$

Thus for all $$k,j \in {\mathbb{N}}$$ we have

$|\phi_{kj}| \leq h(k,j)^{-1} + \sqrt{\frac{4}{3}(\mu^k \times U^{r(k,j)})\{p_{kj} \leq P^{kj} \leq q_{kj}\}^{-1}\delta(k,j)}$

In particular, $$|\phi| \in \Delta$$.

# Lemma B.3

Consider $$(f, \mu)$$ a distributional estimation problem and $$(P,r,a)$$ a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f, \mu)$$. Then there are $$c_1,c_2 \in {\mathbb{R}}$$ and a $$\Delta$$-moderate function $$\delta: {\mathbb{N}}^4 \rightarrow [0,1]$$ s.t. for any $$k,j,s \in {\mathbb{N}}$$, $$Q: {{\{ 0, 1 \}^*}}^2 \xrightarrow{alg} {\mathbb{Q}}$$

$|E_{\mu^k \times U^s \times U^{r(k,j)}}[Q(P^{kj}-f)]| \leq (c_1 + c_2 E_{\mu^k \times U^s}[Q^2]) \delta(k,j,T_Q^{\mu}(k,s),2^{|Q|})$

Conversely, consider $$M \in {\mathbb{Q}}$$ and $$(P,r,a)$$ a $${\mathbb{Q}}\cap [-M,+M]$$-valued $$(poly,log)$$-bischeme. Suppose that for any $${\mathbb{Q}}\cap [-M-1,+M]$$-valued $$(poly,log)$$-bischeme $$(Q,s,b)$$ we have $$|E[Q(P-f)]| \in \Delta$$.

Define $$\tilde{P}$$ so that computing $$\tilde{P}^{kj}$$ amounts to computing $$\eta(P^{kj})$$ rounded to $$h(k,j)$$ digits after the binary point, where $$2^{-h} \in \Delta$$. Then, $$\tilde{P}$$ is a $$\Delta(poly,log)$$-optimal predictor scheme for $$(f, \mu)$$.

# Proof of Lemma B.3

Assume $$P$$ is a $$\Delta(poly,log)$$-optimal predictor scheme. Consider $$k,j,s \in {\mathbb{N}}$$, $$Q: {{\{ 0, 1 \}^*}}^2 \xrightarrow{alg} {\mathbb{Q}}$$. Define $$t := \sigma 2^{-a}$$ where $$\sigma \in \{ \pm 1 \}$$ and $$a \in {\mathbb{N}}$$. Define $$R: {{\{ 0, 1 \}^*}}^2 \xrightarrow{alg} [0,1]$$ to compute $$\eta(P + tQ)$$ rounded within error $$2^{-h}$$. By Lemma B.1

$E_{\mu^k \times U^{r(k,j)}}[(P^{kj}-f)^2] \leq E_{\mu^k \times U^{r(k,j)} \times U^s}[(R-f)^2] + \tilde{\delta}(k,j,T_R^\mu(k,r(k,j)+s),2^{|R|})$

where $$\tilde{\delta}$$ is $$\Delta$$-moderate. It follows that

$E_{\mu^k \times U^{r(k,j)}}[(P^{kj}-f)^2] \leq E_{\mu^k \times U^{r(k,j)} \times U^s}[(\eta(P + tQ)-f)^2] + \delta(k,j,T_Q^{\mu}(k,s),2^{|Q|})$

where $$\delta$$ is $$\Delta$$-moderate ($$a$$ doesn’t enter the error bound because of the $$2^{-h}$$ rounding). As in the proof of Theorem A.1, $$\eta$$ can be dropped.

$E_{\mu^k \times U^{r(k,j)} \times U^s}[(P^{kj}-f)^2 - (P^{kj}+tQ-f)^2] \leq \delta(k,j,T_Q^{\mu}(k,s),2^{|Q|})$

The expression on the left hand side is a quadratic polynomial in $$t$$. Explicitly:

$-E_{\mu^k \times U^s}[Q^2]t^2 - 2E_{\mu^k \times U^{r(k,j)} \times U^s}[Q(P^{kj}-f)]t \leq \delta(k,j,T_Q^{\mu}(k,s),2^{|Q|})$

Moving $$E_{\mu^k \times U^s}[Q^2]t^2$$ to the right hand side and dividing both sides by $$2|t|=2^{1-a}$$ we get

$-E_{\mu^k \times U^{r(k,j)} \times U^s}[Q(P^{kj}-f)]\sigma \leq 2^{a-1} \delta(k,j,T_Q^{\mu}(k,s),2^{|Q|}) + E_{\mu^k \times U^s}[Q^2] 2^{-a-1}$

$|E_{\mu^k \times U^{r(k,j)} \times U^s}[Q(P^{kj}-f)]| \leq 2^{a-1} \delta(k,j,T_Q^{\mu}(k,s),2^{|Q|}) + E_{\mu^k \times U^s}[Q^2] 2^{-a-1}$

Take $$a:=-\frac{1}{2}\log \delta(k,j,T_Q^{\mu}(k,s),2^{|Q|})+\phi(k,j)$$ where $$\phi(k,j) \in [-\frac{1}{2}, +\frac{1}{2}]$$ is the rounding error. We get

$|E_{\mu^k \times U^{r(k,j)} \times U^s}[Q(P^{kj}-f)]| \leq 2^{\phi(k,j)-1} \delta(k,j,T_Q^{\mu}(k,s),2^{|Q|})^{\frac{1}{2}} + E_{\mu^k \times U^s}[Q^2] 2^{-\phi(k,j)-1}\delta(k,j,T_Q^{\mu}(k,s),2^{|Q|})^{\frac{1}{2}}$

Conversely, assume that for any $${\mathbb{Q}}\cap [-M-1,+M]$$-valued $$(poly,log)$$-bischeme $$(R,t,c)$$ there is $$\delta \in \Delta$$ s.t.

$|E[R(P-f)]| \leq \delta$

Consider $$(Q,s,b)$$ a $$(poly,log)$$-predictor scheme. We have

$E[(Q - f)^2] = E[(Q - P + P - f)^2]$

$E[(Q - f)^2] = E[(Q - P)^2] + E[(P - f)^2] + 2E[(Q - P)(P - f)]$

$2E[(P - Q)(P - f)] = E[(P - f)^2] - E[(Q - f)^2] + E[(Q - P)^2]$

Taking $$R$$ to be $$P - Q$$ we get

$E[(P - f)^2] - E[(Q-f)^2] + E[(Q-P)^2] \leq \delta$

where $$\delta \in \Delta$$. Noting that $$E[(Q - P)^2] \geq 0$$ and $$(\eta(P) - f)^2 \leq (P - f)^2$$ we get

$E[(\eta(P) - f)^2] - E[(Q - f)^2] \leq \delta$

Observing that $$\tilde{P}-\eta(P)$$ is bounded by a function in $$\Delta$$, we get the desired result.

Theorems A.2 and A.3 follow trivially from Lemma B.3, so we omit their proofs.

# Proof of Theorem A.4

We have

$P(x_1,x_2)-(f_1 \times f_2)(x_1,x_2) = (P_1(x_1)- f_1(x_1))f_2(x_2) + P_1(x_1)(P_2(x_2)-f_2(x_2))$

Therefore, for any $${\mathbb{Q}}\cap [-1,+1]$$-valued $$(poly,log)$$-bischeme $$(Q,s,b)$$

$|E[Q(P-f_1 \times f_2)]| \leq |E[Q(x_1,x_2)(P_1(x_1)-f_1(x_1))f_2(x_2)]| + |E[Q(x_1,x_2)P_1(x_1)(P_2(x_2)-f_2(x_2))]|$

By Lemma B.3, it is sufficient to show an appropriate bound for each of the terms on the right hand side. Suppose $$(S_2,F_2,r^S_2,a^S_2)$$ is a $$\Delta^1(log)$$-generator for $$(f_2, \mu_2)$$. For the first term, we have

$|E_{\mu_1^k \times \mu_2^k \times U^{s(k,j)+r_1(k,j)}}[Q^{kj}(x_1,x_2)(P^{kj}_1(x_1)-f_1(x_1))f_2(x_2)]| \leq |E_{\mu_1^k \times U^{r^S_2(k)} \times U^{s(k,j)+r_1(k,j)}}[Q^{kj}(x_1,S^k_2)(P^{kj}_1(x_1)-f_1(x_1))F^k_2]| + \delta^1_2(k)$

where $$\delta^1_2 \in \Delta^1$$. Applying Lemma B.3 for $$P_1$$, we get

$|E_{\mu_1^k \times \mu_2^k \times U^{s(k,j)+r_1(k,j)}}[Q^{kj}(x_1,x_2)(P^{kj}_1(x_1)-f_1(x_1))f_2(x_2)]| \leq \delta_1(k,j) + \delta^1_2(k)$

where $$\delta_1 \in \Delta$$.

Suppose $$(S_1,r^S_1,a^S_1)$$ is a $$\Delta^1(log)$$-sampler for $$\mu_1$$. For the second term, we have

$|E_{\mu_1^k \times \mu_2^k \times U^{s(k,j)+r_1(k,j)}}[Q^{kj}(x_1,x_2)P^{kj}_1(x_1)(P^{kj}_2(x_2)-f_2(x_2))]| \leq |E_{U^{r^S_1(k)} \times \mu_2^k \times U^{s(k,j)+r_1(k,j)}}[Q^{kj}(S^k_1,x_2)P^{kj}_1(S^k_1)(P^{kj}_2(x_2)-f_2(x_2))]| + \delta^1_1(k)$

where $$\delta^1_1 \in \Delta^1$$. Applying Lemma B.3 for $$P_2$$, we get

$|E_{\mu_1^k \times \mu_2^k \times U^{s(k,j)+r_1(k,j)}}[Q^{kj}(x_1,x_2)P_1(x_1)(P^{kj}_2(x_2)-f_2(x_2))]| \leq \delta_2(k,j) + \delta^1_1(k)$

where $$\delta_2 \in \Delta$$. Again, we get the required bound.

# Proposition C.1

Consider a polynomial $$q: {\mathbb{N}}^2 \rightarrow {\mathbb{N}}$$. There is a function $$\lambda_q: {\mathbb{N}}^3 \rightarrow [0,1]$$ s.t.

1. $\forall k,j \in {\mathbb{N}}: \sum_{i \in {\mathbb{N}}} \lambda_q(k,j,i) = 1$

2. For any function $$\epsilon: {\mathbb{N}}^2 \rightarrow [0,1]$$ we have

$\epsilon(k,j) - \sum_{i \in {\mathbb{N}}} \lambda_q(k,j,i) \, \epsilon(k,q(k,j)+i) \in \Delta_{avg}^2$

# Proof of Proposition C.1

Given functions $$q_1,q_2: {\mathbb{N}}^2 \rightarrow {\mathbb{N}}$$ s.t. $$q_1(k,j) \geq q_2(k,j)$$ for $$k,j \gg 0$$, the proposition for $$q_1$$ implies the proposition for $$q_2$$ by setting

$\lambda_{q_2}(k,j,i):=\begin{cases}\lambda_{q_1}(k,j,i-q_1(k,j)+q_2(k,j)) & \text{if } i-q_1(k,j)+q_2(k,j) \geq 0 \\ 0 & \text{if } i-q_1(k,j)+q_2(k,j) < 0 \end{cases}$

Therefore, it is enough to prove the proposition for functions of the form $$q(k,j)=j^{m+\frac{n \log k}{\log 3}}$$ for $$m > 0$$.

Consider $$F: {\mathbb{N}}\rightarrow {\mathbb{N}}$$ s.t.

${\lim_{k \rightarrow \infty}{\frac{\log \log k}{\log \log F(k)}}} = 0$

Observe that

${\lim_{k \rightarrow \infty}{\frac{\log (m+\frac{n \log k}{\log 3})}{\log \log F(k) - \log \log 3}}} = 0$

${\lim_{k \rightarrow \infty}{\frac{\int\limits_{x=3}^{3^{m+\frac{n \log k}{\log 3}}} d(\log \log x)}{\log \log F(k) - \log \log 3}}} = 0$

Since $$\epsilon$$ takes values in $$[0,1]$$

${\lim_{k \rightarrow \infty}{\frac{\int\limits_{x=3}^{3^{m+\frac{n \log k}{\log 3}}} \epsilon(k,{\lfloor x \rfloor}) d(\log \log x)}{\log \log F(k) - \log \log 3}}} = 0$

Similarly

${\lim_{k \rightarrow \infty}{\frac{\int\limits_{x=F(k)}^{F(k)^{m+\frac{n \log k}{\log 3}}} \epsilon(k,{\lfloor x \rfloor}) d(\log \log x)}{\log \log F(k) - \log \log 3}}} = 0$

The last two equations imply that

${\lim_{k \rightarrow \infty}{\frac{\int\limits_{x=3}^{F(k)} \epsilon(k,{\lfloor x \rfloor}) d(\log \log x) - \int\limits_{x=3^{m+\frac{n \log k}{\log 3}}}^{F(k)^{m+\frac{n \log k}{\log 3}}} \epsilon(k,{\lfloor x \rfloor}) d(\log \log x)}{\log \log F(k) - \log \log 3}}} = 0$

Raising $$x$$ to a power is equivalent to adding a constant to $$\log \log x$$, therefore

${\lim_{k \rightarrow \infty}{\frac{\int\limits_{x=3}^{F(k)} \epsilon(k,{\lfloor x \rfloor}) d(\log \log x) - \int\limits_{x=3}^{F(k)} \epsilon(k,{\lfloor x^{m+\frac{n \log k}{\log 3}} \rfloor}) d(\log \log x)}{\log \log F(k) - \log \log 3}}} = 0$

${\lim_{k \rightarrow \infty}{\frac{\int\limits_{x=3}^{F(k)} (\epsilon(k,{\lfloor x \rfloor})-\epsilon(k,{\lfloor x^{m+\frac{n \log k}{\log 3}} \rfloor})) d(\log \log x)}{\log \log F(k) - \log \log 3}}} = 0$

Since $${\lfloor x^{m+\frac{n \log k}{\log 3}} \rfloor} \geq {\lfloor x \rfloor}^{m+\frac{n \log k}{\log 3}}$$ we can choose $$\lambda_q$$ satisfying condition (i) so that

$\int\limits_{x=j}^{j+1} \epsilon(k,{\lfloor x^{m+\frac{n \log k}{\log 3}} \rfloor}) d(\log\log x) = (\log\log(j+1)-\log\log j) \sum_i \lambda_q(k,j,i) \, \epsilon(k,j^{m+\frac{n \log k}{\log 3}}+i)$

It follows that

$\int\limits_{x=j}^{j+1} \epsilon(k,{\lfloor x^{m+\frac{n \log k}{\log 3}} \rfloor}) d(\log\log x) = \int\limits_{x=j}^{j+1} \sum_i \lambda_q(k,{\lfloor x \rfloor},i) \, \epsilon(k,{\lfloor x \rfloor}^{m+\frac{n \log k}{\log 3}}+i) d(\log\log x)$

${\lim_{k \rightarrow \infty}{\frac{\int\limits_{x=3}^{F(k)} (\epsilon(k,{\lfloor x \rfloor})-\sum_i \lambda_q(k,{\lfloor x \rfloor},i) \, \epsilon(k,{\lfloor x \rfloor}^{m+\frac{n \log k}{\log 3}}+i)) d(\log \log x)}{\log \log F(k) - \log \log 3}}} = 0$

${\lim_{k \rightarrow \infty}{\frac{\sum_{j=3}^{F(k)-1} (\log\log(j+1)-\log\log j)(\epsilon(k,j)-\sum_i \lambda_q(k,j,i) \, \epsilon(k,j^{m+\frac{n \log k}{\log 3}}+i))}{\log \log F(k) - \log \log 3}}} = 0$

$\epsilon(k,j) - \sum_{i \in {\mathbb{N}}} \lambda_q(k,j,i) \, \epsilon(k,q(k,j)+i) \in \Delta_{avg}^2$

# Lemma C.1

Consider $$(f, \mu)$$ a distributional estimation problem, $$(P,r,a)$$, $$(Q,s,b)$$ $$(poly,log)$$-predictor schemes. Suppose $$p: {\mathbb{N}}^2 \rightarrow {\mathbb{N}}$$ a polynomial and $$\delta \in \Delta_{avg}^2$$ are s.t.

$\forall i,k,j \in {\mathbb{N}}: E[(P^{k,p(k,j)+i}-f)^2] \leq E[(Q^{kj}-f)^2] + \delta(k,j)$

Then $$\exists \delta' \in \Delta_{avg}^2$$ s.t.

$E[(P^{kj}-f)^2] \leq E[(Q^{kj}-f)^2] + \delta'(k,j)$

# Proof of Lemma C.1

By Proposition C.1 we have

$\tilde{\delta}(k,j) := E[(P^{kj}-f)^2] - \sum_i \lambda_p(k,j,i) E[(P^{k,p(k,j)+i}-f)^2] \in \Delta_{avg}^2$

$E[(P^{kj}-f)^2] = \sum_i \lambda_p(k,j,i) E[(P^{k,p(k,j)+i}-f)^2] + \tilde{\delta}(k,j)$

$E[(P^{kj}-f)^2] \leq \sum_i \lambda_p(k,j,i) (E[(Q^{kj}-f)^2] + \delta(k,j)) + \tilde{\delta}(k,j)$

$E[(P^{kj}-f)^2] \leq E[(Q^{kj}-f)^2] + \delta(k,j) + \tilde{\delta}(k,j)$

where the last step uses $$\sum_i \lambda_p(k,j,i) = 1$$. Taking $$\delta' := \delta + \tilde{\delta} \in \Delta_{avg}^2$$ completes the proof.
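The final step of the proof only uses that the $$\lambda_p(k,j,\cdot)$$ are nonnegative weights summing to $$1$$: a convex combination of quantities each bounded by $$E[(Q^{kj}-f)^2] + \delta(k,j)$$ is itself so bounded. A toy numeric check of this convexity step (all names and values below are illustrative):

```python
import random

random.seed(0)

# Random convex weights standing in for lambda_p(k, j, .).
raw = [random.random() for _ in range(10)]
weights = [w / sum(raw) for w in raw]

B, delta = 0.4, 0.05   # stand-ins for E[(Q^{kj}-f)^2] and delta(k,j)
# Each term obeys errors[i] <= B + delta, like E[(P^{k,p(k,j)+i}-f)^2].
errors = [B + delta * random.random() for _ in weights]

combined = sum(w * e for w, e in zip(weights, errors))
assert abs(sum(weights) - 1.0) < 1e-12
assert combined <= B + delta + 1e-12   # convexity preserves the upper bound
```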

# Proof of Theorem 1

Define $$\epsilon(k,j)$$ by

$\epsilon(k,j) := E_{\mu^k \times U^j}[(\Upsilon^{kj}(x,y,\upsilon_{f,\mu}^{kj})-f(x))^2]$

It is easily seen that

$\epsilon(k,j) \leq \min_{\substack{|Q| \leq \log j \\ T_Q^{\mu}(k,j) \leq j}} E_{\mu^k \times U^j}[(Q(x,y)-f(x))^2]$

Therefore, there is a polynomial $$p: {\mathbb{N}}^3 \rightarrow {\mathbb{N}}$$ s.t. for any $$(poly,log)$$-predictor scheme $$(Q,s,b)$$

$\forall i,j,k \in {\mathbb{N}}: \epsilon(k,p(s(k,j),T_{Q^{kj}}^{\mu}(k,s(k,j)),2^{|Q|+|b^{kj}|})+i) \leq E_{\mu^k \times U^{s(k,j)}}[(Q^{kj}-f)^2]$

Applying Lemma C.1, we get the desired result.

# Proof of Theorem 2

Consider $$(P,r,a)$$ a $$(poly,log)$$-predictor scheme. Choose $$p: {\mathbb{N}}^2 \rightarrow {\mathbb{N}}$$ a polynomial s.t. evaluating $$\Lambda[G]^{k,p(k,j)}$$ involves running $$P^{kj}$$ until it halts “naturally” (such $$p$$ exists because $$P$$ runs in at most polynomial time and has at most logarithmic advice). Given $$i,j,k \in {\mathbb{N}}$$, consider the execution of $$\Lambda[G]^{k,p(k,j)+i}$$. The standard deviation of $$\epsilon(P^{kj})$$ with respect to the internal coin tosses of $$\Lambda$$ is at most $$((p(k,j)+i)k)^{-\frac{1}{2}}$$. The expectation value is $$E[(P^{kj}-f)^2]+\gamma_P$$ where $$|\gamma_P| \leq \delta(k)$$ for $$\delta \in \Delta_0^1$$ which doesn’t depend on $$i,k,j,P$$. By Chebyshev’s inequality,

$Pr[\epsilon(P^{kj}) \geq E[(P^{kj}-f)^2] + \delta(k) + ((p(k,j)+i)k)^{-\frac{1}{4}}] \leq ((p(k,j)+i)k)^{-\frac{1}{2}}$

Hence

$Pr[\epsilon(Q^*) \geq E[(P^{kj}-f)^2] + \delta(k) + ((p(k,j)+i)k)^{-\frac{1}{4}}] \leq ((p(k,j)+i)k)^{-\frac{1}{2}}$

The standard deviation of $$\epsilon(Q)$$ for any $$Q$$ is also at most $$((p(k,j)+i)k)^{-\frac{1}{2}}$$. The expectation value is $$E[(ev^{p(k,j)+i}(Q)-f)^2]+\gamma_Q$$ where $$|\gamma_Q| \leq \delta(k)$$. Therefore

$Pr[\exists Q < p(k,j)+i: \epsilon(Q) \leq E[(ev^{p(k,j)+i}(Q)-f)^2] - \delta(k) - k^{-\frac{1}{4}}] \leq (p(k,j)+i)(p(k,j)+i)^{-1}k^{-\frac{1}{2}} = k^{-\frac{1}{2}}$

The extra $$p(k,j)+i$$ factor comes from the union bound over the $$p(k,j)+i$$ candidate programs. Combining the two bounds, we get

$Pr[E[(ev^{p(k,j)+i}(Q^*)-f)^2] \geq E[(P^{kj}-f)^2] + 2\delta(k) + ((p(k,j)+i)^{-\frac{1}{4}} + 1) k^{-\frac{1}{4}}] \leq ((p(k,j)+i)^{-\frac{1}{2}} + 1) k^{-\frac{1}{2}}$

Since the squared error is bounded by $$1$$, the bad event contributes at most its probability to the expectation, so

$E[(\Lambda[G]^{k,p(k,j)+i} - f)^2] \leq E[(P^{kj}-f)^2] + 2\delta(k) + ((p(k,j)+i)^{-\frac{1}{4}} + 1) k^{-\frac{1}{4}} + ((p(k,j)+i)^{-\frac{1}{2}} + 1) k^{-\frac{1}{2}}$

$E[(\Lambda[G]^{k,p(k,j)+i} - f)^2] \leq E[(P^{kj}-f)^2]+ 2\delta(k) + (p(k,j)^{-\frac{1}{4}} + 1) k^{-\frac{1}{4}} + (p(k,j)^{-\frac{1}{2}} + 1) k^{-\frac{1}{2}}$

Applying Lemma C.1 we get the desired result.
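The proof's core mechanism, shared with Levin's universal search, is: enumerate candidate predictors, estimate each one's squared error empirically, and select the empirical minimizer, which by Chebyshev plus a union bound is close to the true optimum with high probability. Below is a minimal sketch of this selection step, with toy functions in place of the time-bounded programs run by $$\Lambda$$; all names and parameters are illustrative, not the actual construction.

```python
import random

random.seed(1)

f = lambda x: (x * 0.37) % 1.0             # toy target with values in [0,1]
candidates = [                              # stand-ins for the programs ev^{p(k,j)+i}(Q)
    lambda x: 0.5,
    lambda x: (x * 0.37) % 1.0,             # the "good" predictor
    lambda x: (x * 0.11) % 1.0,
]

def empirical_sq_error(pred, samples):
    """Monte Carlo estimate of E[(pred - f)^2] over the sampled inputs."""
    return sum((pred(x) - f(x)) ** 2 for x in samples) / len(samples)

N = 4000                                    # sample budget; larger N tightens the Chebyshev bound
samples = [random.random() for _ in range(N)]

# Select the empirical minimizer, as Lambda does among time-bounded programs.
scores = [empirical_sq_error(p, samples) for p in candidates]
best = min(range(len(candidates)), key=lambda i: scores[i])

# With high probability the selected predictor's true error is within
# roughly O(N^{-1/4}) of the optimum (Chebyshev + union bound over candidates).
true_errors = [empirical_sq_error(p, [random.random() for _ in range(200000)])
               for p in candidates]
assert true_errors[best] <= min(true_errors) + N ** -0.25
```

Here the exact predictor achieves empirical error $$0$$, so the selection provably picks it; in the construction above, the analogous guarantee is only probabilistic, which is why the proof pays the $$\delta(k)$$ and $$k^{-1/4}$$ terms.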

by Vadim Kosoy 1269 days ago: EDIT: Corrected Example 5.2 and added Note 2 (previous Note 2 renamed to Note 3).

by Vadim Kosoy 1213 days ago: EDIT: Deleted examples of generatable problems since they are wrong. They are in fact examples of a weaker notion which I think also admits a uniform OPS, but this should be explored elsewhere.
