Intelligent Agent Foundations Forum
Logical inductor limits are dense under pointwise convergence
post by Sam Eisenstat 500 days ago | Abram Demski, Patrick LaVictoire, Scott Garrabrant and Tsvi Benson-Tilsen like this

Logical inductors [1] are very complex objects, and even their limits are hard to get a handle on. In this post, I investigate the topological properties of the set of all limits of logical inductors.

I’ll use the language of universal inductors [2], but everything I say will also hold for general logical inductors by conditioning. Also, the proofs can be made to apply directly to general logical inductors with relatively few changes.

Universal inductors are measures on Cantor space (i.e. the space \(2^\omega\) of infinite bit-strings) at every finite time, as well as in the limit (this is a bit nicer than the analogous situation for general logical inductors, which are measures on the space of completions of a theory only in the limit). There are many topologies on spaces of measures, and we can ask about topological properties under any of these. Interestingly, the set of universal inductor limits is dense in the topology of weak convergence, but not in the topology of strong convergence (or, a fortiori, the total variation metric topology).

It’s not immediately clear how to interpret this: should we think of the space of measures as being full of universal inductor limits, or should we think of universal inductor limits as being relatively few, with big holes where none exist? A relatively close analogy is the relationship between computable sets (or, for a closer analogy, limit computable sets) and arbitrary subsets of \(\mathbb{N}\). Any subset of \(\mathbb{N}\) can be approximated pointwise by limit computable sets (indeed, even by finite sets), but it may be impossible to approximate under the \(\ell^1\) norm, i.e. the number of differing bits. Now, on with the proofs!

Theorem 1: The set of limits of universal inductors is dense in the topology of weak convergence on the space of measures \(\Delta(2^\omega)\).

Proof outline: We need to find a universal inductor limit in every neighborhood of an arbitrary measure \(\mu\). Since a basic neighborhood in the weak topology is determined by finitely many bounded continuous functions, this means that for any finite collection of continuous functions, we need a universal inductor that, in the limit, assigns to each such function an expectation close to its expectation under \(\mu\).

We do this with a modification of the standard universal inductor construction (section 5 of [1]), adding one additional trader to \(\mathtt{TradingFirm}\). This trader can buy and sell shares in events in order to keep these expectations within certain bounds, in a way similar to the Occam traders in the proof of theorem 4.6.4 in [1]. However, we give this trader a much larger budget than the others, allowing its control to be sufficiently tight.

Proof: It suffices to show that for any measure \(\mu\) on \(2^\omega\), any finite set of bounded continuous functions \(f_i : 2^\omega \to \mathbb{R}\) for \(i \in \{1, \dots, n\}\) and any \(\varepsilon > 0\), there is a universal inductor limit \(\nu\) such that \(\left|\mathbf{E}_\mu f_i - \mathbf{E}_\nu f_i\right| < \varepsilon\) for all \(1 \le i \le n\).

It’s easier to work with prefixes of bit-strings than with continuous functions, so we’ll pass to such a description. Consider some \(f_i\). For each world \(\mathbb{W} \in 2^\omega\), continuity of \(f_i\) gives a clopen neighborhood \(A \subseteq 2^\omega\) of \(\mathbb{W}\) such that \(\left|f_i(\mathbb{W}) - f_i(\widetilde{\mathbb{W}})\right| < \frac{\varepsilon}{3}\) for all \(\widetilde{\mathbb{W}} \in A\). By compactness, we can pick a finite subcover \((A_{i,j})_{j=1}^{m_i}\) for each \(i\), where \(A_{i,j}\) is the neighborhood chosen for some world \(\mathbb{W}_{i,j}\). Letting \(B_{i,j} = A_{i,j} \setminus \bigcup_{k=1}^{j-1} A_{i,k}\), we get a disjoint cover by clopen sets. Since \(B_{i,j} \subseteq A_{i,j}\), for all \(\widetilde{\mathbb{W}} \in B_{i,j}\) we have \(\left|f_i(\mathbb{W}_{i,j}) - f_i(\widetilde{\mathbb{W}})\right| < \frac{\varepsilon}{3}\), so we can define a simple function \(g_i\) that takes the constant value \(f_i(\mathbb{W}_{i,j})\) on each set \(B_{i,j}\), and then \(\left|f_i - g_i\right| < \frac{\varepsilon}{3}\) everywhere. Hence, for any measure \(\nu\), \begin{align*} \left|\mathbf{E}_\mu(f_i) - \mathbf{E}_\nu(f_i)\right| & \le \left|\mathbf{E}_\mu(f_i - g_i)\right| + \left|\mathbf{E}_\mu(g_i) - \mathbf{E}_\nu(g_i)\right| + \left|\mathbf{E}_\nu(g_i - f_i)\right| \\ & \le \left|\mathbf{E}_\mu(g_i) - \mathbf{E}_\nu(g_i)\right| + \frac{2}{3}\varepsilon, \end{align*}

so we need only control what our universal inductor limit \(\nu\) thinks of the clopen sets \(B_{i,j}\).
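To make the discretization step concrete, here is a minimal sketch under simplifying assumptions that are not part of the proof: a world is represented by a finite bit list of a hypothetical length `N` standing in for an infinite bit-string, `f` is an example continuous function (reading the bits as a binary expansion), and the finite cover is taken to be all cylinders of a single depth `d`. A single depth is always available for a continuous function on \(2^\omega\) by uniform continuity, but it is more uniform than the cover chosen above. The representative world `prefix + 000...` plays the role of \(\mathbb{W}_{i,j}\).

```python
import itertools
import random
from fractions import Fraction

N = 40  # hypothetical truncation length standing in for an infinite bit-string


def f(world):
    """Example continuous function on 2^omega: read the bits as a binary expansion in [0, 1]."""
    return sum(Fraction(bit, 2 ** (k + 1)) for k, bit in enumerate(world))


def depth_for(eps):
    """Smallest d with 2**-d < eps/3; f then varies by less than eps/3 on each depth-d cylinder."""
    d = 0
    while Fraction(1, 2 ** d) >= eps / 3:
        d += 1
    return d


def simple_approx(func, d):
    """Locally constant g: on the cylinder of each length-d prefix, g takes the value of func
    at the representative world prefix + 000... (playing the role of W_{i,j})."""
    table = {
        prefix: func(list(prefix) + [0] * (N - d))
        for prefix in itertools.product((0, 1), repeat=d)
    }
    return lambda world: table[tuple(world[:d])]


eps = Fraction(1, 10)
d = depth_for(eps)  # d = 5, since 2**-5 = 1/32 < 1/30
g = simple_approx(f, d)
random.seed(0)
worlds = [[random.randint(0, 1) for _ in range(N)] for _ in range(200)]
print(max(abs(f(w) - g(w)) for w in worlds) < eps / 3)  # True
```

For this `f`, the error on a depth-`d` cylinder is at most \(2^{-d} - 2^{-N}\), which is why the check above always passes.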

Pass to a single disjoint cover \((C_k)_{k=1}^p\) of \(2^\omega\) by nonempty clopen sets such that for each \(1 \le i \le n\) and \(1 \le j \le m_i\), the set \(B_{i,j}\) is a union of sets \(C_k\) (for instance, take the nonempty intersections \(\bigcap_{i=1}^n B_{i,j_i}\)). In particular, each \(g_i\) is constant on each \(C_k\), and we write \(g_i(C_k)\) for this value. Further, take \(M \in \mathbb{R}^+\) such that \(\left|g_i\right| \le M\) for all \(i\), and pick rational numbers \((a_k)_{k=1}^p\) in \((0,1) \cap \mathbb{Q}\) such that \(\sum_{k=1}^p a_k = 1\) and \(\left|\mu(C_k) - a_k\right| < \frac{\varepsilon}{3pM}\); note we can do this since \(\sum_{k=1}^p \mu(C_k) = 1\). If we can arrange \(\nu\) so that \(\nu(C_k) = a_k\) for all \(k\), we can conclude \begin{align*} \left|\mathbf{E}_\mu(g_i) - \mathbf{E}_\nu(g_i)\right| & \le \sum_{k=1}^p \left| \int_{C_k} g_i \, d\mu - \int_{C_k} g_i \, d\nu \right| \\ & = \sum_{k=1}^p \left| g_i(C_k) \cdot (\mu(C_k) - \nu(C_k)) \right| \\ & \le \sum_{k=1}^p M \left|\mu(C_k) - a_k\right| \\ & < \varepsilon/3 \end{align*}

as desired.
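The choice of the rationals \(a_k\) can be made explicit. The sketch below is one way to do it, not necessarily what the author had in mind: mix \(\mu(C_k)\) slightly with the uniform distribution so every entry is positive, round all but the last entry down to a grid of width \(1/D\), and let the last entry absorb the rounding error. It assumes the values \(\mu(C_k)\) are given exactly as rationals summing to 1 and that \(p \ge 2\); the function name `rational_targets` is hypothetical, and in the proof the tolerance would be \(\eta = \varepsilon/(3pM)\).

```python
import math
from fractions import Fraction


def rational_targets(mu, eta):
    """Rationals a_k in (0, 1) with sum 1 and |mu_k - a_k| < eta.

    mu: exact probabilities (Fractions) of the p >= 2 cells, summing to exactly 1.
    eta: tolerance in (0, 1]; in the proof, eta = eps / (3 * p * M).
    """
    mu = [Fraction(m) for m in mu]
    eta = Fraction(eta)
    p = len(mu)
    assert p >= 2 and sum(mu) == 1 and 0 < eta <= 1
    t = eta / 2
    # Mix with the uniform distribution: every entry becomes >= t/p > 0 and moves by less than t.
    mixed = [(1 - t) * m + t / p for m in mu]
    # Round all but the last entry down to a grid of width 1/D; D >= 2p/eta keeps them >= 1/D.
    D = math.ceil(2 * p / eta)
    a = [Fraction(math.floor(x * D), D) for x in mixed[:-1]]
    # The last entry absorbs the rounding: it is >= mixed[-1] > 0 and exceeds it by < (p-1)/D.
    a.append(1 - sum(a))
    return a


mu = [Fraction(0), Fraction(1, 3), Fraction(2, 3)]
a = rational_targets(mu, Fraction(1, 100))
print(a, sum(a))
```

Each entry ends up within \(1/D + \eta/2 < \eta\) of \(\mu(C_k)\) (or within \((p-1)/D + \eta/2 < \eta\) for the last one), and all entries are strictly positive, which the argument later needs since it divides by \(a_k\).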

We now need only find a universal inductor limit \(\nu\) such that \(\nu(C_k) = a_k\) for all \(1 \le k \le p\). As mentioned above, we modify \(\mathtt{TradingFirm}\) by adding a trader. For each \(C_k\), take a sentence \(\phi_k\) that holds exactly on \(C_k\), and define the trader \[ \mathrm{Vice}_n = \sum_{k=1}^p \left[\mathrm{Ind}_1 \left(\phi^{*n}_k < a_k\right) - \mathrm{Ind}_1 \left(\phi^{*n}_k > a_k\right)\right] \cdot (\phi_k - \phi^{*n}_k). \] Then, with \(S^\ell_n\) and \(\mathtt{Budgeter}\) as in section 5 of [1] (renaming the index \(k\) used there to \(\ell\), since \(k\) already indexes the sets \(C_k\)), we can define \[\mathtt{TradingFirm}_n \left(\mathbb{P}_{\le n-1}\right) = \mathrm{Vice}_n + \sum_{\ell \in \mathbb{N}^+} \sum_{b \in \mathbb{N}^+} 2^{-\ell-b} \mathtt{Budgeter}_n\left(b, S^\ell_{\le n}, \mathbb{P}_{\le n-1}\right),\] analogous to (5.3.2) in [1]. By an extension of the argument in [1], this is an \(n\)-strategy, so, as in the construction of \(\mathtt{LIA}\) there, we can define a belief sequence \[ \nu_n = \mathtt{MarketMaker}_n\left(\mathtt{TradingFirm}_n\left(\nu_{\le n-1}\right), \nu_{\le n-1}\right),\] and \(\overline{\nu}\) will not be exploitable by \(\mathtt{TradingFirm}\).
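As a small numerical illustration (a sketch, not part of the construction in [1]), the snippet below computes \(\mathrm{Vice}\)'s trade at a single step from given prices \(\nu_n(\phi_k)\) and targets \(a_k\). It assumes the piecewise-linear continuous indicator, which we take to match the definition in [1]: \(\mathrm{Ind}_\delta(x > y)\) is 0 for \(x \le y\), 1 for \(x \ge y + \delta\), and linear in between. With \(\delta = 1\) and prices in \([0,1]\), the net quantity of \(\phi_k\) bought works out to \(a_k - \nu_n(\phi_k)\), and the \(a\)-weighted value of the trade over the cells \(C_k\) is \(\sum_k (a_k - \nu_n(\phi_k))^2 \ge 0\), which is the nonnegativity used in the argument that follows. All function names are hypothetical.

```python
from fractions import Fraction


def ind_gt(x, y, delta=1):
    """Continuous indicator Ind_delta(x > y): 0 if x <= y, 1 if x >= y + delta, linear between."""
    return min(Fraction(1), max(Fraction(0), Fraction(x - y) / delta))


def ind_lt(x, y, delta=1):
    """Continuous indicator Ind_delta(x < y), defined as Ind_delta(y > x)."""
    return ind_gt(y, x, delta)


def vice_quantities(prices, targets):
    """Shares of each phi_k that Vice buys at step n (negative means it sells)."""
    return [ind_lt(p, a) - ind_gt(p, a) for p, a in zip(prices, targets)]


def vice_value_in_cell(k, prices, targets):
    """Value of Vice_n in any world of C_k, where phi_j is true exactly when j == k."""
    q = vice_quantities(prices, targets)
    return sum(qj * ((1 if j == k else 0) - pj) for j, (qj, pj) in enumerate(zip(q, prices)))


def a_weighted_value(prices, targets):
    """Integral of W(V_n) against the measure putting mass a_k on C_k."""
    return sum(a * vice_value_in_cell(k, prices, targets) for k, a in enumerate(targets))


targets = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # a_k
prices = [Fraction(3, 5), Fraction(1, 5), Fraction(1, 5)]   # nu_n(phi_k)
print(vice_quantities(prices, targets))   # equals [a_k - p_k]
print(a_weighted_value(prices, targets))  # 13/450 = sum of (a_k - p_k)**2
```

Note that the value in an individual cell can be negative (here it is \(-3/50\) in \(C_1\)); only the \(a\)-weighted average is guaranteed to be nonnegative.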

Next, we will use the fact that \(\mathtt{TradingFirm}\) does not exploit \(\overline{\nu}\) to investigate how \(\mathrm{Vice}\)’s holdings behave over time when trading on the market \(\overline{\nu}\). As in lemma 5.3.3 in [1], letting \(F_n = \mathtt{TradingFirm}_n(\nu_{\le n - 1})\), \(V_n = \mathrm{Vice}_n(\nu_{\le n - 1})\) and \(B_n^{b,\ell} = \mathtt{Budgeter}_n(b, S^\ell_{\le n}, \nu_{\le n-1})\), we have that in any world \(\mathbb{W} \in 2^\omega\) and at any step \(n \in \mathbb{N}^+\), \begin{align*} \mathbb{W}\left(\sum_{i \le n} F_i \right) & = \mathbb{W}\left(\sum_{i \le n} V_i \right) + \mathbb{W}\left(\sum_{i \le n} \sum_{\ell \in \mathbb{N}^+} \sum_{b \in \mathbb{N}^+} 2^{-\ell-b} B_i^{b,\ell} \right) \\ & \ge \mathbb{W}\left(\sum_{i \le n} V_i \right) - 2, \end{align*}

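so since \(\mathtt{MarketMaker}\) keeps \(\mathtt{TradingFirm}\)’s holdings bounded above uniformly in \(\mathbb{W}\) and \(n\) (this is how non-exploitation is obtained in [1]), \(\mathrm{Vice}\)’s holdings are bounded above as well, by at most 2 more; in particular, \(\mathrm{Vice}\) cannot exploit \(\overline{\nu}\) either.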
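The constant 2 in the bound above comes from the budgeters. Assuming, as in section 5 of [1], that each budgeted trader \(B^{b,\ell}\) never has holdings below \(-b\) in any world, the weighted sum of their holdings is at least \[ \sum_{\ell \in \mathbb{N}^+} \sum_{b \in \mathbb{N}^+} 2^{-\ell-b} \cdot (-b) = -\left(\sum_{\ell \in \mathbb{N}^+} 2^{-\ell}\right)\left(\sum_{b \in \mathbb{N}^+} b \, 2^{-b}\right) = -1 \cdot 2 = -2. \]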

Now, notice that, for fixed \(n \in \mathbb{N}^+\), the value \(\mathbb{W}(V_n)\) is constant as \(\mathbb{W}\) varies within any one set \(C_k\), since it depends on \(\mathbb{W}\) only through the truth values of the sentences \(\phi_{k'}\); write \(C_k(X)\) for the common value \(\mathbb{W}(X)\) of such an expression \(X\) at worlds \(\mathbb{W} \in C_k\). Regarding \((a_k)_{k=1}^p\) as a measure on the finite \(\sigma\)-algebra generated by \((C_k)_{k=1}^p\), it therefore makes sense to integrate \(\mathbb{W}(V_n)\) with respect to that measure. For each \(n\), this integral is \begin{align*} \sum_{k=1}^p C_k(V_n) a_k & = \sum_{k=1}^p \sum_{k'=1}^p C_k\left[ \left(\mathrm{Ind}_1 \left(\nu_n(\phi_{k'}) < a_{k'}\right) - \mathrm{Ind}_1 \left(\nu_n(\phi_{k'}) > a_{k'}\right)\right) \cdot (\phi_{k'} - \nu_n(\phi_{k'})) \right] \cdot a_k \\ & = \sum_{k'=1}^p \left(\mathrm{Ind}_1 \left(\nu_n(\phi_{k'}) < a_{k'}\right) - \mathrm{Ind}_1 \left(\nu_n(\phi_{k'}) > a_{k'}\right)\right) \sum_{k=1}^p \left[ C_k(\phi_{k'}) \cdot a_k - \nu_n(\phi_{k'}) \cdot a_k \right] \\ & = \sum_{k'=1}^p \left(\mathrm{Ind}_1 \left(\nu_n(\phi_{k'}) < a_{k'}\right) - \mathrm{Ind}_1 \left(\nu_n(\phi_{k'}) > a_{k'}\right)\right) \cdot \left(a_{k'} - \nu_n(\phi_{k'})\right) \\ & \ge 0, \end{align*} since for each \(k'\) the two factors are either both positive, both negative, or both zero. That is, the value of \(\mathrm{Vice}\)’s holdings according to the measure given by the numbers \(a_k\) is nondecreasing over time. Since \(a_k > 0\) for all \(k\), we can use the upper bound on \(\mathrm{Vice}\)’s cumulative holdings to derive a lower bound: if \(L \in \mathbb{R}\) is such that \(\widetilde{\mathbb{W}}\left(\sum_{i=1}^n V_i\right) \le L\) for all \(\widetilde{\mathbb{W}} \in 2^\omega\) and \(n \in \mathbb{N}^+\), then for any \(\mathbb{W}\), say, in \(C_k\), \begin{align*} \mathbb{W}\left(\sum_{i=1}^n V_i\right) & = \sum_{i=1}^n \mathbb{W}(V_i) \ge \sum_{i=1}^n -\frac{1}{a_k} \sum_{k' \ne k} C_{k'}(V_i) a_{k'} \\ & \ge -\frac{1}{a_k} \sum_{k' \ne k} L \cdot a_{k'}. \end{align*} We can conclude two things from this, which will finish the argument.
First, we get an analogue of lemma 5.3.3 from [1], from which it follows that \(\overline{\nu}\) is a universal inductor (ignoring minor differences between the definitions of universal and logical inductors; see [2]). This is because for any of the budgeters \(B^{b,\ell}\) defined above, \begin{align*} \mathbb{W}\left(\sum_{i=1}^n F_i\right) & = 2^{-b-\ell} \mathbb{W}\left(\sum_{i=1}^n B_i^{b,\ell}\right) + \mathbb{W}\left(\sum_{i=1}^n V_i\right) + \mathbb{W}\left(\sum_{i=1}^n \sum_{(b',\ell') \ne (b,\ell)} 2^{-b'-\ell'} B_i^{b',\ell'}\right) \\ & \ge 2^{-b-\ell} \mathbb{W}\left(\sum_{i=1}^n B_i^{b,\ell}\right) + \mathbb{W}\left(\sum_{i=1}^n V_i\right) - 2, \end{align*}

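so, since \(\mathtt{TradingFirm}\)’s holdings are bounded above and \(\mathrm{Vice}\)’s holdings are bounded below, \(B^{b,\ell}\) has its holdings bounded above uniformly in \(\mathbb{W}\) and \(n\).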

Second, defining \(\nu = \lim_{n \to \infty} \nu_n\), we get the desired property that \(\nu(C_k) = a_k\) for all \(k\). Suppose for contradiction that \(\nu(C_k) < a_k\) for some \(k\). Taking \(\delta > 0\) with \(\nu(C_k) + \delta < a_k\), there is some \(N \in \mathbb{N}^+\) such that if \(n \ge N\), then \(\left|\nu_n(C_k) - \nu(C_k)\right| < \frac{\delta}{2}\). Then, for all such \(n\), it follows that \[\nu_n\left(\mathrm{Ind}_1(\phi_k^{*n} < a_k)\right) > \frac{\delta}{2},\] and \[\nu_n\left(\mathrm{Ind}_1(\phi_k^{*n} > a_k)\right) = 0,\] so the \(k\)-th summand in the integral computed above is greater than \(\frac{\delta}{2} \cdot \frac{\delta}{2}\). Since every summand there is nonnegative, \[\sum_{k'=1}^p C_{k'}(V_n) a_{k'} > \frac{\delta^2}{4}\] for all \(n \ge N\), so the value of \(\mathrm{Vice}\)’s holdings according to the measure given by the numbers \(a_{k'}\), namely \(\sum_{k'=1}^p C_{k'}\left(\sum_{i=1}^n V_i\right) a_{k'}\), goes to infinity. But each \(C_{k'}\left(\sum_{i=1}^n V_i\right) \le L\) and \(\sum_{k'=1}^p a_{k'} = 1\), so this quantity is at most \(L\), giving the desired contradiction. By a symmetrical argument, we can’t have \(\nu(C_k) > a_k\), so \(\nu(C_k) = a_k\) as desired. \(\square\)

One thing I want to note about the above proof is that, to my knowledge, it is the first construction of a logical inductor for which the limiting probability of a particular independent sentence is known.

Theorem 2: The set of universal inductor limits is not dense in the topology of strong convergence of measures or the total variation distance topology on \(\Delta(2^\omega)\).

Proof: The total variation distance induces a finer topology than the topology of strong convergence, so it suffices to show that this set is not dense under strong convergence. Take any world \(\mathbb{W} \in 2^\omega\) that is not \(\Delta_2\), and let \(\mu \in \Delta(2^\omega)\) be a point mass on \(\mathbb{W}\). The singleton \(\{\mathbb{W}\}\) is a measurable set, so it suffices to show that for any universal inductor limit \(\nu\), we have \(\mu(\{\mathbb{W}\}) = 1\) but \(\nu(\{\mathbb{W}\}) = 0\).

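Suppose for contradiction that there was some universal inductor limit \(\nu\) with \(\nu(\{\mathbb{W}\}) > 0\). By the Lebesgue density theorem (or directly: the cylinders given by longer and longer prefixes of \(\mathbb{W}\) decrease to \(\{\mathbb{W}\}\), so their measures converge to \(\nu(\{\mathbb{W}\})\)), there is some clopen set \(A\) such that \(\mathbb{W} \in A\) and \[\frac{\nu(\{\mathbb{W}\})}{\nu(A)} > \frac{1}{2}.\] Then, it is possible to compute \(\mathbb{W}\) from \(\nu\) as follows. Identify worlds with subsets of \(\mathbb{N}\), so that each \(k \in \mathbb{N}\) corresponds to the clopen subbase element \(U_k = \{\widetilde{\mathbb{W}} \in 2^\omega : k \in \widetilde{\mathbb{W}}\}\) for the topology on \(2^\omega\). In order to determine whether \(k \in \mathbb{W}\), we can improve our estimates of \(\nu(U_k \cap A)\) and \(\nu(A)\) until we’ve determined either \(\nu(U_k \cap A)/\nu(A) > 1/2\) or \(\nu(U_k \cap A)/\nu(A) < 1/2\). Exactly one of these must be the case: if \(k \in \mathbb{W}\), then \(\nu(U_k \cap A) \ge \nu(\{\mathbb{W}\}) > \nu(A)/2\), and otherwise \(\nu(U_k \cap A) \le \nu(A) - \nu(\{\mathbb{W}\}) < \nu(A)/2\). We will eventually determine which one holds, since these are both open conditions, and the first holds exactly when \(k \in \mathbb{W}\). Thus, since \(\nu\) is \(\Delta_2\), the set \(\mathbb{W}\) is as well, contradicting the hypothesis. \(\square\)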
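The decoding step can be illustrated with a toy example. This sketch makes strong simplifying assumptions: the measure is a hypothetical mixture of a point mass on an example world and the fair-coin measure, its values on cylinders are available exactly (in the proof one only has approximations and must wait until a strict inequality is certified), and the example world is computable, so this shows only the mechanics of reading off bits, not the non-\(\Delta_2\) case. Cylinders are represented as dicts from bit index to required bit; all names are hypothetical.

```python
from fractions import Fraction


def mixture_measure(atom, weight):
    """Hypothetical test measure nu = weight * (point mass on atom) + (1 - weight) * fair coin.

    atom maps a bit index to a bit; a clopen cylinder is a dict {bit index: required bit}.
    Returns a function giving the exact value nu(cylinder).
    """
    def nu(cylinder):
        atom_inside = all(atom(i) == bit for i, bit in cylinder.items())
        return weight * (1 if atom_inside else 0) + (1 - weight) * Fraction(1, 2 ** len(cylinder))
    return nu


def decode_bit(nu, A, k):
    """Bit k of W, given a cylinder A containing W with nu({W}) / nu(A) > 1/2."""
    if k in A:
        return A[k]  # A already fixes bit k of W
    ratio = nu({**A, k: 1}) / nu(A)  # nu(U_k intersect A) / nu(A)
    # If bit k of W is 1, the ratio is >= nu({W}) / nu(A) > 1/2; if it is 0, the ratio is
    # <= 1 - nu({W}) / nu(A) < 1/2. So the comparison never hits exactly 1/2.
    return 1 if ratio > Fraction(1, 2) else 0


def W(i):
    """An example world, purely for testing: bits at multiples of 3 are set."""
    return 1 if i % 3 == 0 else 0


nu = mixture_measure(W, Fraction(1, 5))
A = {0: 1, 1: 0, 2: 0}  # nu({W}) / nu(A) = (1/5) / (3/10) = 2/3 > 1/2
print([decode_bit(nu, A, k) for k in range(12)])
```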

[1] Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, and Jessica Taylor. 2016. “Logical induction.” arXiv:1609.03543 [cs.AI].

[2] Scott Garrabrant. 2016. “Universal inductors.”




