Logical Inductor Lemmas
discussion post by Alex Appel

The final lemma in this sequence will be necessary for use in an upcoming post about logical inductors and correlated equilibria, so it will be proved here in order to not clutter up that post.

Much of the notation in this post is reused from the previous post, Two Notions of Best Response.

Let \(S_{1}\) be the set of actions available to player 1, with \(|S_{1}|=m\). Given a matrix \(M\), \(M_{j}\) refers to the \(j\)-th row vector of \(M\). Given some vector \(\vec{v}\), \(|\vec{v}|\) denotes its norm, which throughout this post is the \(L_{1}\)-norm. In this post, \(a_{j}\) is typically used to refer to some default action by player 1, and \(a_{j'}\) typically refers to some better action.

Consider the space \(\prod_{ij}\mathbb{R}\). This is the space of probabilities/prices assigned to the various joint outcomes by some logical inductor. It's possible to use expressible features to pick out fuzzy subsets of this space, by taking the prices of the various sentences as input and mapping them to \([0,1]\). (In practice, the prices will always lie in a hypercube, but it will be convenient not to have to deal with the special case where the prices are on an edge of the hypercube.)

The infinite sequence \(\{p_{t}\}_{t\in\mathbb{N}^{+}}\) of points in this space can be identified with the sequence of the logical inductor's prices, with \(p_{t}\) being the vector of prices on day \(t\).

Consider some fuzzy set/\(\mathbb{P}\)-generable region in that space, bounded in \([0,1]\), which we'll denote by \(C\).

Let \(\alpha_{C}(k)\) be the number in \([0,1]\) that the expressible feature \(\alpha\) which implements \(C\) outputs when given the market prices on day \(k\). By a slight abuse of notation, \(\alpha_{C}(p)\) refers to what the expressible feature would output if the market prices were given by the point \(p\).
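To make this concrete, here is a minimal sketch (in Python; all names here are made up for illustration, not taken from the posts) of a continuous feature implementing a fuzzy \(L_{1}\)-ball: it takes the vector of market prices and outputs a number in \([0,1]\), equal to 1 well inside the ball and 0 outside it.

```python
import numpy as np

def fuzzy_l1_ball(center, radius, margin):
    """Continuous indicator of an L1-ball: outputs 1 inside the ball,
    0 outside it, with a linear ramp of width `margin` in between
    (so the Lipschitz constant is 1/margin)."""
    def alpha(prices):
        dist = np.abs(np.asarray(prices) - center).sum()  # L1 distance to the center
        return float(np.clip((radius + margin - dist) / margin, 0.0, 1.0))
    return alpha

# alpha_C(p): evaluate the feature at a point p of the price space
alpha_C = fuzzy_l1_ball(center=np.full(4, 0.25), radius=0.1, margin=0.05)
print(alpha_C([0.25, 0.25, 0.25, 0.25]))  # 1.0: well inside the ball
print(alpha_C([0.60, 0.20, 0.10, 0.10]))  # 0.0: outside the ball
```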

\(C\in\mathcal{B}_{a_{j}}\) iff there’s a point \(p\in supp(C)\) such that \(p\in B_{a_{j}}\). In words, there’s a point in \(C\) where \(a_{j}\) is an evidential best response.

Given some point \(p\) where some action \(a_{j}\) has nonzero probability (so that its expected utility is well-defined), an action \(a_{j'}\) dominates \(a_{j}\) by \(\delta\) if \(EU_{lower}(a_{j'})=EU(a_{j})+\delta\).

Lemma 1:

When a point \(p\) has \(|P_{j}|\ge\epsilon\), all points \(q\) in an \(L_{1}\)-ball of size \(\delta\) around \(p\), where \(\delta\ll\epsilon\), will (approximately) have \(|EU_{q}(a_{j})-EU_{p}(a_{j})|\le 2\delta\).

Proof:

\[EU_{p}(a_{j})=\frac{U_{j}\cdot P_{j}}{|P_{j}|}=\frac{U_{j}\cdot P_{j}}{\epsilon}\simeq\frac{U_{j}\cdot P_{j}\pm\delta}{\epsilon\pm\delta}=EU_{q}(a_{j})\]

(Here, \(\pm\delta\) should be interpreted as any number in \([-\delta,\delta]\), and it doesn't necessarily have to be the same number in the numerator and the denominator. Also, in the next line, \(X\) will be used as an abbreviation for \(U_{j}\cdot P_{j}\), which is a number in \([0,\epsilon]\).)

\[\frac{U_{j}\cdot P_{j}\pm\delta}{\epsilon\pm\delta}\simeq\frac{(X\pm\delta)(\epsilon\pm\delta)}{\epsilon^{2}-\delta^{2}}\simeq\frac{\epsilon X\pm\delta X\pm\epsilon\delta}{\epsilon^{2}}=\frac{X\pm\delta\frac{X}{\epsilon}\pm\delta}{\epsilon}\simeq EU_{p}(a_{j})\pm 2\delta\]

QED.
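As a quick numeric spot-check of Lemma 1 (not a proof, and with an ad-hoc sampling scheme), one can perturb \(P_{j}\) by random vectors of \(L_{1}\)-size \(\delta\) and watch how far \(EU=\frac{U_{j}\cdot P_{j}}{|P_{j}|}\) moves:

```python
import numpy as np

rng = np.random.default_rng(0)

def eu(U_j, P_j):
    """EU_p(a_j) = U_j . P_j / |P_j|, with |.| the L1 norm."""
    return np.dot(U_j, P_j) / np.abs(P_j).sum()

eps, delta = 0.9, 0.001            # delta << eps
U_j = rng.uniform(size=5)          # utilities in [0, 1]
P_j = rng.uniform(size=5)
P_j *= eps / P_j.sum()             # scale the row so that |P_j| = eps

worst = 0.0
for _ in range(10_000):
    d = rng.uniform(-1, 1, size=5)
    d *= delta / np.abs(d).sum()   # random perturbation of L1-size delta
    worst = max(worst, abs(eu(U_j, P_j + d) - eu(U_j, P_j)))

print(worst, "<= (approximately)", 2 * delta)
```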

Now let \(\max_{n}(a_{j})\) be the \(n\)-th largest utility in the \(j\)-th row of the utility matrix; in the event that two entries attain the maximal utility, \(\max_{2}(a_{j})\) refers to the highest utility strictly less than the maximum. \(\min_{n}\) is defined similarly.

Lemma 2:

For all points \(p\), there exists a change in probabilities of size \(\epsilon\) (in the \(L_{1}\)-norm) for which the expected utility of the action \(a_{j}\) is increased by \((\max_{1}(a_{j})-\max_{2}(a_{j}))\frac{\epsilon}{2}\), or pushed to the highest attainable utility, whichever one is lower.

The same argument also establishes that the expected utility can be lowered by \((\min_{2}(a_{j})-\min_{1}(a_{j}))\frac{\epsilon}{2}\), or pushed to the lowest attainable utility, whichever one is higher.

Assume that the probability matrix \(P\) has \(P_{j}=\vec{0}\). Then there is an arbitrarily small change in probability that gives \(a_{j}\) the highest possible expected utility, and the lemma is proved.

Assume that \(P_{j}\) places less than \(\frac{\epsilon}{2}\) probability, total, on the outcomes \(i\) which correspond to non-maximal utilities in \(U_{j}\). Then there is an \(\epsilon\)-change in probabilities (take all the probability mass from the non-optimal outcomes and put it on the best outcome) that gives \(a_{j}\) the highest possible expected utility, and the lemma is proved.

Assume that \(P_{j}\) places \(\frac{\epsilon}{2}\) or greater probability, total, on the outcomes \(i\) which correspond to non-maximal utilities in \(U_{j}\). Then by taking \(\frac{\epsilon}{2}\) probability mass from the non-optimal outcomes and putting it on the optimal outcome, it is easy to verify from the expected utility equation that this produces at least a \((\max_{1}(a_{j})-\max_{2}(a_{j}))\frac{\epsilon}{2}\) increase in the expected utility.
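The mass-shifting move in this last case can be written down directly. A minimal sketch (a hypothetical helper, assuming utilities in \([0,1]\)): take \(\frac{\epsilon}{2}\) probability mass from the non-maximal outcomes, put it on a maximal one, and compare the gain in expected utility to \((\max_{1}(a_{j})-\max_{2}(a_{j}))\frac{\epsilon}{2}\).

```python
import numpy as np

def boost_action(U_j, P_j, eps):
    """Take eps/2 probability mass (or all of it, if less is available)
    from the outcomes with non-maximal utility, proportionally, and put
    it on a maximal-utility outcome. This is an eps-change in L1 norm:
    eps/2 removed plus eps/2 added."""
    Q_j = np.asarray(P_j, dtype=float).copy()
    best = int(np.argmax(U_j))
    others = [i for i in range(len(U_j)) if U_j[i] < U_j[best]]
    pool = Q_j[others].sum()
    take = min(eps / 2, pool)
    if pool > 0:
        Q_j[others] *= 1 - take / pool  # remove mass proportionally
    Q_j[best] += take                   # put it on the best outcome
    return Q_j

U_j = np.array([0.9, 0.5, 0.2])
P_j = np.array([0.1, 0.2, 0.1])
Q_j = boost_action(U_j, P_j, eps=0.1)
gain = np.dot(U_j, Q_j) / Q_j.sum() - np.dot(U_j, P_j) / P_j.sum()
print(gain, ">=", (0.9 - 0.5) * 0.1 / 2)  # at least (max1 - max2) * eps/2
```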

Lemma 3:

If there is a point \(p\) for which no action \(a_{j'}\) dominates \(a_{j}\) by a positive number, and there is a \(\mathbb{P}\)-generable fuzzy set \(C\not\in\mathcal{B}_{a_{j}}\), then \(p\) must be on the boundary of \(supp(C)\), or on the exterior of \(supp(C)\).

Assume the opposite: that this point is in the interior of \(supp(C)\). Due to the continuity of the fuzzy set, and the fact that \(a_{j}\) is not dominated by any positive amount, there is an arbitrarily small perturbation of the prices that makes \(a_{j}\) an evidential best response. This produces a point in the support of \(C\) where \(a_{j}\) is an evidential best response. However, we assumed that the support of \(C\) had no points where \(a_{j}\) was an evidential best response. Therefore, there is a contradiction, and \(p\) must either be on the boundary of \(supp(C)\) or in the exterior of \(supp(C)\).

In both cases, due to continuity, this point must have \(\alpha_{C}(p)=0\).

Lemma 4:

Given a \(\mathbb{P}\)-generable fuzzy set \(C\) such that \(C\not\in\mathcal{B}_{a_{j}}\), and some \(\epsilon\), let \(C':=\{p\mid\alpha_{C}(p)-\epsilon>0\}\). Then all points in \(C'\) have some action \(a_{j'}\) that dominates \(a_{j}\) by \(\delta\), for some positive constant \(\delta\).

To begin with, the function \(\alpha_{C}\) has some bounded Lipschitz constant \(L\). Let \(\epsilon':=\frac{\epsilon}{2L}\).

Given a point \(p\in C'\), associate each action with its value of \(EU_{lower,p}\), and select an arbitrary highest-value action to be \(a_{j'}\).

Consider the function \(F(p)\) that uses the highest-value action for \(p\) as \(a_{j'}\), and then outputs \[\min\left((\max_{1}(a_{j})-\max_{2}(a_{j}))\frac{\epsilon'}{2},(\min_{2}(a_{j'})-\min_{1}(a_{j'}))\frac{\epsilon'}{2}\right)\] when both of these terms are positive, and the term that is positive when the other term is 0. (we can ignore the case where both terms are 0, as we shall soon see)
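Given the \(\max_{n}\)/\(\min_{n}\) notation from earlier, \(F(p)\) is easy to compute. A sketch, where `eu_lower` is a stand-in for the vector of \(EU_{lower,p}\) values over actions (its definition comes from the previous post, so it's just an input here):

```python
import numpy as np

def max_n(U_row, n):
    """The n-th largest utility in a row of the utility matrix (1-indexed)."""
    return np.sort(U_row)[::-1][n - 1]

def min_n(U_row, n):
    """The n-th smallest utility in a row of the utility matrix (1-indexed)."""
    return np.sort(U_row)[n - 1]

def F(U, j, eu_lower, eps_prime):
    """Pick a highest-EU_lower action as a_j', then return the minimum of
    the two gap terms when both are positive, or whichever one is positive.
    (The case where both are 0 is ruled out in the text below.)"""
    jp = int(np.argmax(eu_lower))
    t1 = (max_n(U[j], 1) - max_n(U[j], 2)) * eps_prime / 2
    t2 = (min_n(U[jp], 2) - min_n(U[jp], 1)) * eps_prime / 2
    return min(t for t in (t1, t2) if t > 0)

U = np.array([[1.0, 0.5, 0.0],
              [0.8, 0.8, 0.2]])
print(F(U, j=0, eu_lower=[0.3, 0.6], eps_prime=0.1))  # min(0.025, 0.03) = 0.025
```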

To begin, assume that both terms are 0. That means that \(EU(a_{j'})\) and \(EU(a_{j})\) are both constants. If \(EU(a_{j'})>EU(a_{j})\), then the lemma is proved. If they are equal, then one of two situations applies. In case 1, \(a_{j'}\) is never selected as one of the best actions, in which case we can ignore it. In case 2, \(a_{j'}\) is selected as one of the best actions somewhere in \(C'\), and then no action would dominate \(a_{j}\) by a positive number in the interior of \(C\), which is impossible by Lemma 3.

Therefore, assuming the lemma hasn't been trivially proved yet, \(F(p)\) is defined for all \(p\in C'\). Also, note that \(F(p)\) is bounded below by a positive constant (because there are only finitely many actions, and each action's corresponding terms are constants, each either positive or 0).

Select an arbitrary point \(p\), and the corresponding action, \(a_{j'}\) such that

\[EU_{lower,p}(a_{j'})-EU_{upper,p}(a_{j})<F(p)\]

If there isn’t any, we’re done (and \(\delta=\min_{p\in C'}F(p)\)).

Otherwise, by Lemma 2, if we look at the \(L_{1}\) ball of \(\epsilon'\)-sized perturbations, one of two events will have occurred. In case 1, there’s a point in that ball where the “domination gap” given by \(F(p)\) has been closed completely. In case 2, the maximum possible utility for \(a_{j}\) is less than the minimum possible utility for \(a_{j'}\), and the ball of size \(\epsilon'\) has been ruled out, as all points in that ball have \(a_{j'}\) dominating \(a_{j}\) by a fixed constant \(\delta\).

In case 1, because the point \(q\) where the utility gap closes is \(\epsilon'\)-away from a point \(p\) where \(\alpha_{C}(p)>\epsilon\), and \(\epsilon'=\frac{\epsilon}{2L}\), we get \(\alpha_{C}(q)\ge\frac{\epsilon}{2}\) (due to the bounded Lipschitz constant of \(\alpha_{C}\)). Then there’s a point in the interior of \(supp(C)\) where \(a_{j}\) is not dominated, which by Lemma 3 is impossible.

Therefore, case 2 must apply. However, because the worst-case utility of action \(a_{j'}\) dominates the best-case utility of action \(a_{j}\) by \(\delta\), our assumption that the lemma was false is false.

Lemma 5:

Given some point \(p\) where \(a_{j'}\) dominates \(a_{j}\) by \(\delta\) or more, with \(|P_{j}|>\epsilon\gg\delta\), and where \(p\) is more than \(2(m-1)\epsilon+\frac{\delta}{3}\) (in the \(L_{1}\) norm) from any point in \(B_{a_{j}}\), the sequence of prices \(\{p_{t}\}_{t\in\mathbb{N}^{+}}\) will only finitely often be in the \(L_{1}\)-ball of size (approximately) \(\frac{\delta}{6}\) around \(p\).

We will consider two cases, and prove them separately.

In case 1, \(|P_{j'}|\ge\epsilon\).

By Lemma 1, the change in expected utility for \(a_{j}\) and \(a_{j'}\) over the \(\frac{\delta}{6}\) \(L_{1}\) ball will be about, or less than, \(\frac{\delta}{3}\). Because \(a_{j'}\) dominates \(a_{j}\) by \(\delta\) or more at \(p\), then in this ball, \(a_{j'}\) always dominates \(a_{j}\) by \(\frac{\delta}{3}\) or more.

Now, there is a convex \(\mathbb{P}\)-generable fuzzy set \(C\) which covers this ball (and a tiny region outside of it, due to the continuous indicator functions, but this doesn’t affect anything, because that region’s support still has \(a_{j'}\) dominating \(a_{j}\) by about \(\frac{\delta}{3}\) or more). Assume, for a contradiction, that there are infinitely many price points in the ball. Then by calibration, the empirical joint probabilities, averaged over all time, will be within the support of \(C\).

To recap the definition of expected value for a logical inductor,

\[\mathbb{E}_{t}(U_{t}|a_{t}=j)=\frac{1}{t}\sum_{i=0}^{t-1}\min\left(\frac{\mathbb{P}_{t}("a_{t}=j"\wedge "U_{t}>\frac{i}{t}")}{\mathbb{P}_{t}("a_{t}=j")},1\right)\]

By assumption, \(a_{t}=j\) has a positive probability around \(\epsilon\) or greater. By coherence, the logical inductor will eventually learn that the price on \(a_{t}=j\) should equal the sum of the prices over all the other joint outcomes, so in the limit, \(\mathbb{P}_{t}(a_{t}=j)\simeq_{t}|P_{j}|\). A similar argument applies to learning the utilities of the various outcomes in the top of the fraction, so in the limit, \(\mathbb{E}_{t}(U_{t}|a_{t}=j)\simeq_{t}EU(a_{j})\).
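For reference, that sum transcribes directly into code. A sketch, assuming a hypothetical `price` oracle that returns the day-\(t\) price \(\mathbb{P}_{t}\) of a sentence (the sentence encoding here is made up for illustration):

```python
def expected_utility(price, t, j):
    """E_t(U_t | a_t = j), computed as the discretized sum above.
    `price` is a hypothetical oracle mapping a sentence to the
    inductor's day-t price for it."""
    p_act = price(f"a_t={j}")
    total = 0.0
    for i in range(t):  # i = 0, ..., t-1
        p_joint = price(f"a_t={j} & U_t>{i}/{t}")
        total += min(p_joint / p_act, 1.0)
    return total / t
```

With a coherent price oracle and \(\mathbb{P}_{t}(a_{t}=j)\) bounded away from 0, this converges to \(EU(a_{j})\) as described above.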

Now, since \(a_{j'}\) also has probability bounded away from 0, the same argument applies to it as well: the expected utilities will eventually be learned. Because there’s a \(\frac{\delta}{3}\) or larger difference in expected utilities, \(a_{j}\) will be taken with limiting probability 0 when the prices are inside \(C\). However, due to calibration, the probability of \(a_{j}\) must be bounded away from 0, so we have a contradiction, and the sequence of prices \(\{p_{t}\}_{t\in\mathbb{N}^{+}}\) must have only finitely many elements in the \(L_{1}\)-ball.

Now for case 2. Assume case 1 is false, so at the point \(p\), all actions \(a_{j'}\) which dominate \(a_{j}\) by \(\delta\) or more have \(|P_{j'}|<\epsilon\).

Consider the \(L_{1}\)-ball of size \(\frac{\delta}{6}\) around \(p\). The improvement in \(EU(a_{j})\) over the whole ball can be at most \(\frac{\delta}{3}\), by Lemma 1. Let \(\delta'\) be the maximum increase over \(EU_{p}(a_{j})\) for \(a_{j}\), over the whole ball.

For each action \(a_{j'}\) that is not \(a_{j}\) and has \(EU_{lower,p}(a_{j'})>EU_{upper,p}(a_{j})\), there is a perturbation of size \(2\epsilon\) or less that will either drive their expected utility to \(EU_{upper,p}(a_{j})+\delta'\) or below, or drive their expected utility to the minimum possible.

If \(a_{j'}\) has probability less than \(\epsilon\) (as all the \(a_{j'}\) that \(\delta\)-dominate \(a_{j}\) do), then by taking \(\epsilon\) probability mass from that action (to drive it to probability 0) and putting it on \(a_{j}\) in a way that preserves expected value, that \(a_{j'}\) will be counted as having the lowest possible expected utility in the dominance calculation. If \(a_{j'}\) has probability \(\ge\epsilon\) (and has an expected utility within \(\delta\) of \(a_{j}\)), then Lemma 2 works to either drive the expected utility of \(a_{j'}\) below that of \(a_{j}\), or drive it to the minimum possible expected utility.

If the worst-case utility for all those actions is equal to or less than \(EU_{p}(a_{j})+\delta'\), then there’s a point in \(B_{a_{j}}\) that’s within \(2(m-1)\epsilon+\frac{\delta}{3}\) distance of \(p\) (apply the perturbations mentioned above to make all other actions lose), which violates one of the starting assumptions. Therefore, there must be some action \(a_{j'}\) that dominates \(a_{j}\) by more than \(\delta'\) at \(p\), using the worst-case utility, so it dominates \(a_{j}\) by a fixed amount over the whole \(L_{1}\) ball.

Now, we shouldn’t necessarily assume that, over the whole ball, accurate utilities for \(a_{j'}\) will be learned, because the ball may have regions where the probability of \(a_{j'}\) is arbitrarily close to 0. However, density-zero exploration lets the inductor learn a lower bound on the utility of \(a_{j'}\), as we’ll see.

Assume, for a contradiction, that the \(L_{1}\)-ball has infinitely many price points in it. Then we can again make a \(\mathbb{P}\)-generable convex set \(C\) with infinitely many points in it, and because of infinite exploration on divergent \(\mathbb{P}\)-generable subsequences (this is a slight strengthening of Scott’s density-zero exploration), the action \(a_{j'}\) will be taken infinitely often, and in finite time, the inductor will learn that \(\mathbb{E}_{t}(U_{t}|a_{t}=j')\) is close to, or greater than, the worst-case utility for \(a_{j'}\) when the prices are in the set \(C\).

Again, we get the logical inductor learning to never take action \(a_{j}\), which produces a contradiction with the calibration property in the same way as before, because in this region, the price of \(a_{j}\) is bounded away from 0.

Therefore, in case 2, there are still only finitely many price points in the ball around \(p\), and the lemma is proved.

Lemma 6:

One of the following two properties applies to all convex \(\mathbb{P}\)-generable sets of prices over joint outcomes:

1: \(C\in\mathcal{B}_{a_{j}}\)

2: \(\lim_{t\to\infty}\frac{1}{t}\sum_{k\le t}\alpha_{C}(k)\cdot ind(a_{j},k)=0\)

(I suspect it’s possible to strengthen condition 2 a little bit, but this more than suffices for the upcoming proof in the next post. Also, \(ind(a_{j},k)\) is the function that is 0 when \(a_{j}\) is not taken on turn \(k\), and 1 otherwise.)
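Condition 2 is just a running time-average, so for concreteness, here is a sketch that tracks it online; `alpha_values` and `took_action` are hypothetical per-day inputs (\(\alpha_{C}\) evaluated on each day's prices, and whether \(a_{j}\) was taken that day):

```python
def running_averages(alpha_values, took_action):
    """Partial averages (1/t) * sum_{k<=t} alpha_C(k) * ind(a_j, k).
    Condition 2 says these tend to 0 as t grows."""
    total, out = 0.0, []
    for t, (alpha_k, ind_k) in enumerate(zip(alpha_values, took_action), start=1):
        total += alpha_k * (1.0 if ind_k else 0.0)
        out.append(total / t)
    return out

# Toy check: if a_j is only ever taken on days when alpha_C is small,
# the running average stays small.
print(running_averages([0.9, 0.05, 0.9, 0.02], [False, True, False, True])[-1])  # 0.0175
```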

If 1 is true, the proof is over, so assume 1 is false. Also, assume there are infinitely many points in the interior of \(C\), because otherwise 2 is true and the proof is over. The space of prices is composed of the following three disjoint sets of points. Set \(\Gamma\) is composed of the points where \(\alpha_{C}(p)<\epsilon\). In sets \(\Delta\) and \(\Theta\), this doesn’t hold (and by Lemma 4, there must always be some action \(a_{j'}\) that dominates \(a_{j}\) by \(\delta\) or more).

Set \(\Delta\cup\Theta\) is convex, because \(C\) is. Set \(\Delta\) is the (convex) set of those points where the probability of \(a_{j}\) satisfies \(|P_{j}|<\frac{\epsilon}{L}\), and set \(\Theta\) is the (convex) set of those points where \(|P_{j}|\ge\frac{\epsilon}{L}\).

Lemma 5 can be applied to set \(\Theta\) to conclude that the sequence of prices will only be within it finitely often. Just identify \(\frac{\epsilon}{L}\) with the \(2(m-1)\epsilon+\frac{\delta}{3}\) in Lemma 5 to get the bounding-away from \(B_{a_{j}}\) (because that’s the minimum distance to travel to get outside of \(supp(C)\)), and identify \(\delta\), or something sufficiently smaller, with the \(\delta\) in Lemma 5 to get the domination condition.

We can split up \[\lim_{t\to\infty}\frac{1}{t}\sum_{k\le t}\alpha_{C}(k)\cdot ind(a_{j},k)\] into three sequences (with some overlap, but it won’t matter, because it produces an overestimate)

In the first sequence, the prices for the day are in set \(\Gamma\). In the second (fuzzy) sequence, the probability of \(a_{j}\) is less than \(\frac{\epsilon}{L}\) (plus a little bit, so an expressible feature can capture this); in particular, this fuzzy sequence is \(\mathbb{P}\)-generable. In the third sequence, the prices for the day are in set \(\Theta\). Every day is in at least one of these three sequences.

Over the first sequence, the time-averaged sum must limit to \(\epsilon\) or less, because \(\alpha_{C}(k)<\epsilon\), always.

Over the second (fuzzy) sequence, the long-term frequency of \(a_{j}\) being taken must limit to \(\frac{\epsilon}{L}\) or less, by calibration, so the time-averaged sum must limit to \(\frac{\epsilon}{L}\) or less.

Over the third sequence, because it has finitely many points in it, the time-averaged sum limits to 0.

Therefore, the time-averaged sum must limit to \(\epsilon(1+\frac{1}{L})\) or less. Because \(L\) is a fixed constant, and \(\epsilon\) can be as small as we want, the time-averaged sum must converge to 0.


