This post is part of the Asymptotic Logical Uncertainty series. In this post, I give a concrete example of how the Solomonoff Induction Inspired Aproach fails to converge to the correct probabilities when predicting the output of a simple Turing machine.
Let \(f:\mathbb{N}\rightarrow \{0,1\}\) be a function such that (for large \(n\)) \(f(n)\) can be computed in time \(2^n\), but there is no algorithm that can predict \(f(n)\) in time less than \(2^n\) consistently better than a coin which outputs 1 with probability \(1/3\).
Construct a Turing machine \(E\) as follows:
\(E(2k)=f(2k)\), and
\(E(2k+1)=f(2k+1)\) XOR \(f(10^k)\).
Imagine trying to predict \(E(k)\) with an algorithm which takes time at most \(k^2\). If \(k\) is even, then it is determined by an event indistinguishable in time \(k^2\) from a \(1/3\) coin, so the predictor should output 1 or 0 with probability \(1/3\). If \(k\) is odd, then we have the XOR of two bits that are indistinguishable from independently 1 with probability \(1/3\), so the predictor should output 1 with probability \(4/9\).
Therefore, if we let \(T(k)=k^3\) and \(R(k)=k^2\), if SIA worked correctly, then we would have that the limit of the probability that \(SIA(E,T,R)\) outputs 1 on the even bits would be \(1/3\).
However, this is not the case. In particular, the the probability that the \(10^k\)th bit output by \(SIA(E,T,R)\) is a 1 will not converge to \(1/3\), but rather switch between \(4/9\) and \(5/9\).
This is because at the time that a predictor needs to predict \(E(10^k)\), it will have already outputted a 1 with probability \(4/9\) for \(E(2k+1)\), and had enough time to compute \(f(2k+1)\). Therefore, if \(f(2k+1)=0\), then any predictor that does make its prediction for \(E(10^k)\) match its prediction for \(E(2k+1)\) will contradict itself.
The machines that will do well when sampled in \(SIA(E,T,R)\) will be the ones that output 1 independently with probability 1/3 for all even bits and 1 with probability \(4/9\) for all odd bits, EXCEPT when predicting a bit in a location that is a power of 10, in which case, it will copy (or negate) a previous bit, and output 1 with probability either \(4/9\) or \(5/9\), both of which are greater than the desired \(1/3\).
Further, a machine that gives a 1 with probability of \(1/3\) on \(E(10^k)\) for all \(k\) will have a likelyhood going to 0 relative to the machines which just copy or negate their old answers.
Note that the correct probability with which to believe \(E(10^k)=1\) really is \(1/3\), not \(4/9\) or \(5/9\). The prediction given to \(E(2k+1)\) was calculated from a state of ignorance about \(f(2n+1)\), and this prediction should have no effect on the prediction for \(E(10^k)\).
