Intelligent Agent Foundations Forum
Concise Open Problem in Logical Uncertainty
discussion post by Scott Garrabrant 1319 days ago | Jessica Taylor and Patrick LaVictoire like this | 7 comments

The purpose of this post is to provide a short fully mathematically specified conjecture which can be worked on with very little background, but which has an important consequence in logical uncertainty. Not too many man-hours have been put into this question yet, so it is plausible that a MIRIx team could solve this problem.

Let \(E\) be an environment, which is a function from \(\mathbb{N}\) to \(\{0,1\}\).

Let \(M\) be an algorithm which on input \(n\) is given oracle access to \(E(i)\) for all \(i<\log n\), and which outputs a probability \(p(n)\).

Definition: If \[\lim_{n\rightarrow\infty}\prod_{i=1}^n\frac{|p(i)+E(i)-1|}{|E(i)-\frac{1}{3}|}=0,\] then \(M\) is bad. Similarly, if \[\lim_{n\rightarrow\infty}\prod_{i=1}^n\frac{|p(i)+E(i)-1|}{|E(i)-\frac{2}{3}|}=0,\] then \(M\) is bad. Otherwise, \(M\) is good. (Note that if there is no limit, this does not mean \(M\) is bad.)

Conjecture: For every algorithm \(M\), there exists an environment \(E\) such that \(M\) is bad.

Intuitively, \(M\) is slowly seeing bits from \(E\), and much more quickly making predictions about \(E\). If \(M\) is bad in the first sense, that means that \(M\) has made much worse predictions than if it had just output 2/3. If \(M\) is bad in the second sense, that means that \(M\) has made much worse predictions than if it had just output 1/3. It seems easy enough to avoid the first failure mode; we just switch to outputting 2/3 once we cross some threshold where 2/3 has been doing better. Adding the second failure mode makes this strategy stop working, because an evil environment could cause us to lock in to 2/3, and then switch to giving probability 1/3 forever.
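For concreteness, here is a minimal illustrative sketch (not part of the original post) of the definition in log form: the products above tend to 0 exactly when the corresponding sums of logs tend to \(-\infty\). The names p and E below are hypothetical stand-ins for a predictor's outputs and an environment; the oracle-access restriction on \(M\) plays no role in the definition itself, which only looks at the outputs.

from math import log

def badness_logs(p, E, N):
    # Running logs of the two products in the definition, for n = 1..N.  The first
    # compares M's outputs against the constant 2/3 advisor, the second against the
    # constant 1/3 advisor; M is bad iff either running log tends to -infinity
    # (equivalently, the corresponding product tends to 0).
    s1 = s2 = 0.0
    for i in range(1, N + 1):
        assigned = abs(p(i) + E(i) - 1)                  # probability M assigned to the observed bit
        s1 += log(assigned) - log(abs(E(i) - 1.0 / 3))   # term of the first product
        s2 += log(assigned) - log(abs(E(i) - 2.0 / 3))   # term of the second product
        yield s1, s2

# Example: the constant-1/2 predictor on the all-ones environment is bad in the first
# sense: every step adds log(1/2) - log(2/3) < 0 to s1, so s1 goes to -infinity.
for s1, s2 in badness_logs(lambda i: 0.5, lambda i: 1, 5):
    print(s1, s2)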

We would like an \(M\) which can take a countable set of advisors and do at least as well as all of them, even when there is a delay in the observations. However, I believe that such an \(M\) cannot even do as well as two constant advisors: the above conjecture implies that \(M\) cannot balance the predictions of even two constant advisors, and therefore also cannot balance countably many advisors. Note that a disproof would also be very interesting.


by Janos Kramar | link

from math import exp, floor, log

def M(p, E):
    # p is filled in with M's predictions; E is the environment's bit sequence.
    p1, p2 = 1./3, 2./3    # the two constant advisors' predictions
    prev, this, next = 0, 0, 1

bad1 and bad2 compute log-badnesses of M relative to p1 and p2, on E[:prev]; the goal of M is to ensure neither one goes to \(\infty\). prev, this, next are set in such a way that M is permitted oracle access to E[:prev] (and hence can evaluate bad1 and bad2) when computing p[this:next].

    def bad(advisor):
        return lambda: sum(log(abs((E[i] + advisor(i) - 1) /
                                   (E[i] + p[i] - 1)))
                           for i in xrange(prev))
    bad1, bad2 = bad(lambda i: p1), bad(lambda i: p2)
    for i in xrange(this, next): p[i] = 0.5
    prev, this, next = this, next, floor(exp(next - 1)) + 1

    while True:
        for i in xrange(this, next): p[i] = p1
        prev, this, next = this, next, floor(exp(next - 1)) + 1

bad1() is now up to date through E[:this], not just E[:prev]

        bound = 2 * bad1()
        while bad1() > 0:
            # won't let bad1() surpass bound
            eps = (bound - bad1()) / 2 / (1 - p1) / (next - this)

This is just for early iterations; in the limit, eps should be just enough for bad1 to go halfway to bound:

            while (eps >= 1 - p1 or
                   bound <= bad1() + (next - this) *
                       log((1 - p1) / (1 - p1 - eps))):
                eps /= 2
            for i in xrange(this, next): p[i] = p1 + eps
            prev, this, next = this, next, floor(exp(next - 1)) + 1

            for i in xrange(this, next): p[i] = p1
            # this is where the p1 + eps affects bad1()
            prev, this, next = this, next, floor(exp(next - 1)) + 1

Now every iteration (after the first few) where \(\operatorname{mean}(\mathtt{E[prev:this]})\leq \frac{2}{5}\) will decrease bad2() by roughly at least \((\mathtt{this} - \mathtt{prev})(\log(1 - \mathtt{p1}) - \log(1 - \mathtt{p2}) + \frac{2}{5}(\log(\mathtt{p1}) - \log(\mathtt{p2}) - \log(1 - \mathtt{p1}) + \log(1 - \mathtt{p2}))) = (\mathtt{this} - \mathtt{prev})\frac{1}{5}\log(2)\gg \mathtt{prev}\), which is large enough to turn bad2() negative. Therefore, if M is bad for E, there can be only finitely many such iterations until the loop exits. However, every iteration where \(\operatorname{mean}(\mathtt{E[prev:this]})\geq \frac{2}{5}\) will cause bound - bad1() to grow exponentially (by a factor of \(\frac{11}{10}=1+\frac{1}{2}\left(-1+\frac{2}{5}\frac{1}{\mathtt{p1}} \right)\)), so the loop will terminate.
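As a quick numeric sanity check (not in the original comment) of the two constants quoted above, with p1 = 1/3 and p2 = 2/3:

from math import log

p1, p2 = 1.0 / 3, 2.0 / 3

# Per-bit decrease of bad2() while M predicts (roughly) p1, in a block where a 2/5
# fraction of the bits are 1s; this should equal (1/5) * log(2).
per_bit = (log(1 - p1) - log(1 - p2)
           + 0.4 * (log(p1) - log(p2) - log(1 - p1) + log(1 - p2)))
print(per_bit)              # ~0.1386
print(0.2 * log(2))         # ~0.1386

# Growth factor of bound - bad1() in an iteration with mean(E[prev:this]) >= 2/5.
print(1 + 0.5 * (-1 + 0.4 / p1))   # 1.1, i.e. 11/10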

Now we’ll perform the same procedure for bad2():

        for i in xrange(this, next): p[i] = p2
        prev, this, next = this, next, floor(exp(next - 1)) + 1

        bound = 2 * bad2()
        while bad2() > 0:
            # won't let bad2() surpass bound
            eps = (bound - bad2()) / 2 / p2 / (next - this)
            while (eps >= p2 or
                   bound <= bad2() + (next - this) *
                       log(p2 / (p2 - eps))):
                eps /= 2
            for i in xrange(this, next): p[i] = p2 - eps
            prev, this, next = this, next, floor(exp(next - 1)) + 1

            for i in xrange(this, next): p[i] = p2
            # this is where the p2 - eps affects bad2()
            prev, this, next = this, next, floor(exp(next - 1)) + 1

For the same reasons as the previous loop, this loop either stops with bad2() < 0 or runs forever with bad2() bounded and bad1 repeatedly falling back below 0.

Therefore, this algorithm either gets trapped in one of the inner while loops (and succeeds) or turns bad1() and bad2() negative, each an infinite number of times, and therefore succeeds.


by Scott Garrabrant 1315 days ago | link

Nice! I think I believe your claim, and I would like to chat with you to verify stuff and talk about future directions.

I have thought about algorithms very similar to this, and using such an algorithm I got an \(M\) which is either good, or bad in the first sense and outputting probabilities converging to 2/3, or bad in the second sense and outputting probabilities converging to 1/3. I had thought that if epsilon shrank quickly enough to keep bad1 from going to infinity, it would be shrinking so quickly that you could get locked in the while loop with bad2 increasing. I don’t think I actually checked this claim carefully, so I guess maybe I was wrong.

If this algorithm works as claimed, I wonder if you can extend it to three advisors (which may not be constant).


by Janos Kramar 1292 days ago | link

I don’t yet know whether I can extend it to two nonconstant advisors, but I do know I can extend it to a countably infinite number of constant-prediction advisors. Let \((P_i)_{i=0,\dots}\) be an enumeration of their predictions that contains each one an infinite number of times. Then:

from itertools import count
from math import exp, floor, log

def M(p, E, P):
    prev, this, next = 0, 0, 1
    def bad(i):
        return sum(log(abs((E[k] + P[i] - 1) /
                           (E[k] + p[k] - 1)))
                   for k in xrange(prev))
    for k in xrange(this, next): p[k] = 0.5
    prev, this, next = this, next, floor(exp(next - 1)) + 1

    for i in count(0):
        for k in xrange(this, next): p[k] = P[i]
        prev, this, next = this, next, floor(exp(next - 1)) + 1

bad(i) is now up to date through E[:this], not just E[:prev]

        bound = 2 * bad(i)
        for j in count(0):
            if P[j] == P[i]: continue
            flip = P[j] < P[i]
            p1, p2 = abs(P[i] - flip), abs(P[j] - flip)
            for k in xrange(this, next): p[k] = abs(p1 - flip)
            prev, this, next = this, next, floor(exp(next - 1)) + 1
            
            if bad(i) <= 0: break
            while bad(i) > 0 and bad(j) > 0:
                # won't let bad(i) surpass bound
                eps = (bound - bad(i)) / 2 / abs(1 - p1 - flip) / (next - this)

This is just for early iterations of the inner loop; in the limit, eps should be just enough for bad(i) to go halfway to bound if we let p = abs(p1 + eps - flip):

                while (eps >= 1 - p1 or
                       bound <= bad(i) + (next - this) *
                           log((1 - p1) / (1 - p1 - eps))):
                    eps /= 2
                for k in xrange(this, next): p[k] = abs(p1 + eps - flip)
                prev, this, next = this, next, floor(exp(next - 1)) + 1

                for k in xrange(this, next): p[k] = abs(p1 - flip)
                # this is where the abs(p1 + eps - flip) block (P[i] plus or minus eps) affects bad(i)

Consider \(q = \frac{\log(1-\mathtt{p1})-\log(1-\mathtt{p2})}{\log(1-\mathtt{p1})-\log(1-\mathtt{p2})+\log(\mathtt{p2})-\log(\mathtt{p1})}\). This \(q\) is the probability between p1 and p2 such that if E[k] is chosen with probability \(|q-\mathtt{flip}|\) then that will have an equal impact on bad(i) and bad(j). Now consider some \(q'\) between p1 and \(q\). Every iteration where \(\operatorname{mean}\left(\left|\mathtt{E[prev:this] - flip}\right|\right) \leq q'\) will decrease bad(j) by a positive quantity that’s at least linear in this-prev, so (at least after the first few such iterations) this will exceed \(\mathtt{prev}\cdot - \log\left(\max_{k:P_k\mbox{ has been reached}}\max(P_k,1-P_k)\right) > \mathtt{bad(j)}\), so it will turn bad(j) negative. If this happens for all j then M cannot be bad for E. If it doesn’t, then let’s look at the first j where it doesn’t. After a finite number of iterations, every iteration must have \(\operatorname{mean}\left(\left|\mathtt{E[prev:this] - flip}\right|\right) > q'\). However, this will cause bad(i) to decrease by a positive quantity that’s at least proportional to bound - bad(i); therefore, after a finite number of such iterations, we must reach \(\mathtt{bad(i)}<0\). So if M is bad for E then for each value of i we will eventually make \(\mathtt{bad(i)}<0\) and then move on to the next value of i. This implies M is not bad for E.
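As a quick check (not in the original comment; the helper name q_crossover is made up for illustration): for the two constant advisors of the original problem, \(\mathtt{p1}=1/3\) and \(\mathtt{p2}=2/3\), this \(q\) comes out to exactly 1/2, the bit frequency at which the two advisors suffer equal expected log loss, so that bad(i) and bad(j) move together in expectation.

from math import log

def q_crossover(p1, p2):
    # Bit frequency at which the p1 and p2 advisors incur equal expected log loss,
    # so that a block with this frequency of 1s affects bad(i) and bad(j) equally.
    return ((log(1 - p1) - log(1 - p2)) /
            (log(1 - p1) - log(1 - p2) + log(p2) - log(p1)))

print(q_crossover(1.0 / 3, 2.0 / 3))   # 0.5
print(q_crossover(0.2, 0.9))           # ~0.58, strictly between 0.2 and 0.9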


Emboldened by this, we can also consider the problem of building an \(M\) that isn’t outperformed by any constant advisor. However, this cannot be done, according to the following handwavy argument:

Let \(q\) be some incompressible number, and let \(E(i)\stackrel{iid}\sim\operatorname{Bern}(q)\). When computing \(p(n)\), \(M\) can’t do appreciably better than Laplace’s law of succession, which will give it standard error \(\sqrt{\frac{q(1-q)}{\log(n)}}\), and relative badness \(\sim\frac{q(1-q)}{\log(n)}\left(\frac{1}{q}+\frac{1}{1-q}\right)=\frac{1}{\log(n)}\) (relative to the \(q\)-advisor) on average. For \(i\leq n\), and \(n\gg 0\), the greatest deviation of the badness from the \(\sum_{j=2}^{i}\frac{1}{\log(j)}\geq \frac{i-1}{\log(i)}\) trend is \(\approx\sqrt{2n\log\log(n)q(1-q)}\) (according to the law of the iterated logarithm), which isn’t enough to counteract the expected badness; therefore the badness will converge to infinity.
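A rough simulation (not in the original comment; it only illustrates the handwavy argument, it proves nothing): with E i.i.d. Bernoulli(q) and a stand-in predictor that applies Laplace's rule of succession to the roughly \(\log n\) bits it is allowed to see, the cumulative log-badness relative to the constant-\(q\) advisor drifts upward roughly like \(\sum_n \frac{1}{\log(n)}\), which diverges. Any fixed \(q\) illustrates the point here, since this predictor never memorizes \(q\) anyway.

import random
from math import log

random.seed(0)
q = 0.537                          # stand-in for the incompressible bias in the argument
N = 10 ** 6
E = [1 if random.random() < q else 0 for _ in range(N)]

badness = 0.0                      # log-badness of the predictor relative to the q-advisor
for n in range(3, N):
    k = int(log(n))                # the predictor may only see E[i] for i < log n
    ones = sum(E[:k])              # recomputed each step for clarity
    p = (ones + 1.0) / (k + 2.0)   # Laplace's rule of succession
    badness += log(abs(q + E[n] - 1)) - log(abs(p + E[n] - 1))
print(badness)                     # large and positive, growing roughly like N / log(N)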


by Janos Kramar 1289 days ago | link

Ah, I think I can stymie \(M\) with 2 nonconstant advisors. Namely, let \(A_1(n)=\frac{1}{2}-\frac{1}{n+3}\) and \(A_2(n)=\frac{1}{2}+\frac{1}{n+3}\). We (setting up an adversarial \(E\)) precommit to setting \(E(n)=0\) if \(p(n)\geq A_2(n)\) and \(E(n)=1\) if \(p(n)\leq A_1(n)\); now we can assume that \(M\) always chooses \(p(n)\in[A_1(n),A_2(n)]\), since this is better for \(M\).
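A minimal sketch (not Janos's code; the function name is made up) of the precommitted part of this adversarial environment; bits for predictions that stay inside the band \([A_1(n),A_2(n)]\) are left undecided here and are chosen later, round by round, as described below.

def precommitted_bit(n, p_n):
    # Punish any prediction outside [A1(n), A2(n)]; inside the band, the bit is
    # chosen later by the round-based construction below.
    A1 = 0.5 - 1.0 / (n + 3)
    A2 = 0.5 + 1.0 / (n + 3)
    if p_n >= A2:
        return 0
    if p_n <= A1:
        return 1
    return None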

Now define \(b'_i(j)=\left|A_i(j)+E(j)-1\right|-\left|p(j)+E(j)-1\right|\) and \(b_i(n)=\sum_{j<n}b'_i(j)\). Note that if we also define \(\operatorname{bad}_i(n)=\sum_{j<n} \left(\log\left|A_i(j)+E(j)-1\right|-\log\left|p(j)+E(j)-1\right|\right)\) then \(\left|2b_i(n)-\operatorname{bad}_i(n)\right|\leq \sum_{j<n} \left(2A_1(j)-1-\log(2A_1(j))\right)=\sum_{j<n} O\left(\left(\frac{1}{2}-A_1(j)\right)^2\right)\) is bounded; therefore if we can force \(b_1(n)\rightarrow\infty\) or \(b_2(n)\rightarrow\infty\) then we win.

Let’s reparametrize by writing \(\delta(n)=A_2(n)-A_1(n)=\frac{2}{n+3}\) and \(q(n)=\frac{p(n)-A_1(n)}{\delta(n)}\), so that \(b'_i(j)=\delta(j)\left(\left|i-2+E(j)\right|-\left|q(j)-1+E(j)\right|\right)\).

Now, similarly to how \(M\) worked for constant advisors, let’s look at the problem in rounds: let \(s_0=0\), and \(s_n=\lfloor\exp(s_{n-1}-1)\rfloor+1\) for \(n>0\). When determining \(E(s_{n-1}),\dots,E(s_n-1)\), we can look at \(p(s_{n-1}),\dots,p(s_n-1)\). Let \(t_n=\left\lfloor b_2(s_n)-\frac{1}{n}\right\rfloor\). Let’s set \(E(s_{n-1}),\dots,E(s_n-1)\) to 1 if \(\sum_{j=s_{n-1}}^{s_n-1}\delta(j)\left(1-q(j)\right)\geq 1\); otherwise we’ll do something more complicated, but maintain the constraint that \(b_2(s_n)\geq b_2(s_{n-1})-\tfrac{1}{n(n-1)}\geq t_{n-1}+\tfrac{1}{n}\): this guarantees that \(t_n\) is nondecreasing and that \(\DeclareMathOperator*{\liminf}{lim\,inf} \liminf_{j\rightarrow\infty} b_2(j)\geq \lim_{n\rightarrow\infty} t_n\).
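As an aside (not in the original comment): these rounds, defined by the same \(\lfloor\exp(\cdot-1)\rfloor+1\) update used in the code earlier in the thread, grow extremely fast, which is what lets each block overwhelm everything that came before it.

from math import exp, floor

s, rounds = 0, [0]
for _ in range(5):
    s = int(floor(exp(s - 1))) + 1
    rounds.append(s)
print(rounds)   # [0, 1, 2, 3, 8, 1097]; the next boundary is already about e**1096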

If \(t_n\rightarrow\infty\) then \(b_2(n)\rightarrow\infty\) and we win. Otherwise, let \(t=\lim_{n\rightarrow\infty} t_n\), and consider \(n\) such that \(t_{n-1}=t\).

We have \(\sum_{j=s_{n-1}}^{s_n-1}\delta(j)\left(1-q(j)\right)<1\). Let \(J\subseteq \{s_{n-1},\dots,s_n-1\}\) be a set of indices with \(q(j)\geq q(j')\) for all \(j\in J,j'\not\in J\), that is maximal under the constraint that \(\sum_{j\in J}\delta(j)(1-q(j))\leq \frac{1}{n(n-1)}\); thus we will still have \(\sum_{j\in J}\delta(j)(1-q(j))\geq \frac{1}{n(n-1)}-\delta(s_{n-1})\). We shall set \(E(j)=0\) for all \(j\in J\).

By the definition of \(J\): \[\begin{aligned} \sum_{j\in J}b_1'(j) &=\sum_{j\in J}\delta(j)q(j)\\ &\geq \sum_{j\in J}\delta(j)\left(1-q(j)\right) \frac{\sum_{j=s_{n-1}}^{s_n-1}\delta(j)q(j)}{\sum_{j=s_{n-1}}^{s_n-1}\delta(j)\left(1-q(j)\right)}\\ &\geq \left(\frac{1}{n(n-1)}-\delta(s_{n-1})\right)\frac{\sum_{j=s_{n-1}}^{s_n-1}\delta(j)-1}{1}\\ &\geq\left(\frac{1}{n(n-1)}-\delta(s_{n-1})\right) \left(2\log\left(\frac{s_n+3}{s_{n-1}+3}\right)-1\right)\\ &\geq 2\mbox{ if }n\gg 0\\ \end{aligned}\]

For \(j'\not\in J\), we’ll proceed iteratively, greedily minimizing \(\left|\sum_{j'=s_{n-1}}^{j}1_{j'\not\in J}\left(b'_1(j'),b'_2(j')\right)\right|\). Then: \[\begin{aligned} \min_{s_{n-1}\leq j< s_n} \sum_{j'=s_{n-1}}^j 1_{j'\not\in J} b'_1(j') &\geq -\sqrt{\sum_{j=s_{n-1}}^{s_n-1} \delta(j)^2}\\ &=-2\sqrt{\sum_{j=s_{n-1}+3}^{s_n+2} \frac{1}{j^2}}\\ &\geq -2\sqrt{\sum_{j=s_{n-1}+3}^{s_n+2} \left(\tfrac{1}{j-1}-\tfrac{1}{j}\right)}\\ &\geq -\frac{2}{\sqrt{s_{n-1}+2}}\\ &\geq -1\mbox{ if }n\gg 0\\ \end{aligned}\]

Keeping this constraint, we can flip (or not flip) all the \(E(j')\)s for \(j'\not\in J\) so that \(\sum_{j'=s_{n-1}}^{s_n-1}1_{j'\not\in J}b'_2(j')>0\). Then, we have \(b_2(s_n)\geq b_2(s_{n-1})-\frac{1}{n(n-1)}\), \(b_1(s_n)-b_1(s_{n-1})=\sum_{j=s_{n-1}}^{s_n-1}\left(1_{j\in J}+1_{j\not\in J}\right)b'_1(j)\geq 2-1=1\) if \(n\gg 0\), and for \(s_{n-1}\leq j\leq s_n\), \(b_1(j)\geq b_1(s_{n-1})+\sum_{j'=s_{n-1}}^{j-1}1_{j'\not \in J}b'_1(j')\geq b_1(s_{n-1})-1\) if \(n\gg 0\).

Therefore, \(b_1(j)\rightarrow\infty\), so we win.


by Tsvi Benson-Tilsen 1301 days ago | link

Could you spell out the step

every iteration where mean(E[prev:this]) ≥ 2/5 will cause bound - bad1() to grow exponentially (by a factor of 11/10 = 1 + (1/2)(−1 + (2/5)(1/p1)))

a little more? I don’t follow. (I think I follow the overall structure of the proof, and if I believed this step I would believe the proof.)

We have that eps is about (2/3)(1-exp([bad1() - bound]/(next-this))), or at least half that, but I don’t see how to get a lower bound on the decrease of bad1() (as a fraction of bound-bad1() ).


by Scott Garrabrant 1298 days ago | Tsvi Benson-Tilsen likes this | link

You are correct that this step uses the fact that 1 + eps is approximately e^(eps).

The concrete way this is used in the proof is in replacing the ln(1 + 3eps) you subtract from bad1 when the environment bit is a 1 with 3eps = (bound - bad1) / (next - this), and replacing the ln(1 - 3eps/2) you subtract from bad1 when the environment bit is a 0 with -3eps/2 = -(bound - bad1) / (next - this) / 2.

Therefore, you subtract from bad1 approximately at least (next-this)((2/5)(bound - bad1) / (next - this)-(3/5)*(bound - bad1) / (next - this)/2).

This comes out to (bound - bad1)/10.
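Spelling the arithmetic out, with \(x = \mathtt{bound} - \mathtt{bad1}\): \[(\mathtt{next}-\mathtt{this})\left(\frac{2}{5}\cdot\frac{x}{\mathtt{next}-\mathtt{this}}-\frac{3}{5}\cdot\frac{x}{2\,(\mathtt{next}-\mathtt{this})}\right)=\left(\frac{2}{5}-\frac{3}{10}\right)x=\frac{x}{10}.\]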

I believe the inequality is the wrong direction to just use e^(eps) as a bound for 1+eps, but when next-this gets big, the approximation gets close enough.


by Tsvi Benson-Tilsen 1298 days ago | Scott Garrabrant likes this | link

In case anyone shared my confusion:

The while loop where we ensure that eps is small enough so that

bound > bad1() + (next - this) * log((1 - p1) / (1 - p1 - eps))

is technically necessary to ensure that bad1() doesn’t surpass bound, but it is immaterial in the limit. Solving

bound = bad1() + (next - this) * log((1 - p1) / (1 - p1 - eps))

gives

eps = (2/3) (1 - e^{ -[bound - bad1()] / [next - this] }),

and since eps starts out at least that large and is then halved only until the condition holds, the final eps is at least half of it:

eps >= (1/3) (1 - e^{ -[bound - bad1()] / [next - this] }),

which, using the first-order approximation 1 - e^{-x} ≈ x, is about

(1/3) ([bound - bad1()] / [next - this]).

Then Scott’s comment gives the rest. I was worried about the fact that we seem to be taking the exponential of the error in our approximation, or something. But Scott points out that this is not an issue because we can make [next - this] as big as we want, if necessary, without increasing bad1() at all, by guessing p1 for a very long time until [bound - bad1()] / [next - this] is close enough to zero that the error is too small to matter.
