Intelligent Agent Foundations Forum
Two Questions about Solomonoff Induction
post by Scott Garrabrant 1388 days ago | Sam Eisenstat, Vanessa Kosoy and Jessica Taylor like this | 4 comments

In this post, I ask two questions about Solomonoff Induction. I am not sure if these questions are open or not. If you know the answer to either of them, please let me know. I think that the answers may be very relevant to stuff I am currently working on in Asymptotic Logical Uncertainty.

Consider a universal Turing machine \(U\) which takes in an infinite stream of random bits, and outputs a finite or infinite string of bits. For any bit string \(w\), let \(P(w)\) denote the probability that \(w\) is a prefix of the output of \(U\).

Let \(E\) be a probabilistic environment which outputs an infinite bit string, and let \(e_i\) denote the first \(i\) bits output by \(E\).

If \(E\) assigns outputs according to any computable probability distribution, then with probability 1, \(P(e_{i}1)/P(e_i)\) quickly converges to the actual probability that \(E\) outputs a 1 at location \(i+1\) given that its first \(i\) bits were \(e_i\).

In particular, if \(E\) outputs an infinite stream of iid bits which are each 1 with some rational probability \(p\), then \(P(e_{i}1)/P(e_i)\) will converge to \(p\) with probability 1.
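To make the computable case concrete, here is a toy Python sketch. It is only a crude finite stand-in for the universal semimeasure (a complexity-weighted Bayesian mixture over dyadic-rational Bernoulli hypotheses, with made-up prior weights), but it shows the predictive ratio converging to \(p\):

```python
import random

# Crude stand-in for the universal semimeasure: a Bayesian mixture over
# Bernoulli(q) hypotheses, q = k / 2^d a dyadic rational, with prior
# weight 2^-(d + bit-length of k) as a rough proxy for 2^-K(q).
hypotheses = []
for d in range(1, 9):
    for k in range(1, 2 ** d, 2):  # odd k, so each q appears exactly once
        hypotheses.append([k / 2 ** d, 2.0 ** -(d + k.bit_length())])

p = 5 / 8  # true (rational, hence computable) bias of the environment E
random.seed(0)
for i in range(5000):
    # Predictive ratio P(e_i 1) / P(e_i): for a mixture, this is the
    # posterior-weighted average of q.
    total = sum(w for _, w in hypotheses)
    pred = sum(q * w for q, w in hypotheses) / total
    if i in (10, 100, 1000, 4999):
        print(f"prediction after {i} bits: {pred:.4f}")
    # Observe the next bit from E and do the Bayesian update.
    bit = 1 if random.random() < p else 0
    for h in hypotheses:
        h[1] *= h[0] if bit else 1.0 - h[0]
    # Renormalize to avoid floating-point underflow; this does not
    # change the predictive ratio.
    total = sum(w for _, w in hypotheses)
    for h in hypotheses:
        h[1] /= total
# The printed predictions approach p = 0.625 as more bits are seen.
```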

However, I am curious what happens if \(E\) is not computable. In the simplest case, \(E\) outputs an infinite stream of iid bits which are each 1 with some uncomputable real probability \(r\). In this case, will \(P(e_{i}1)/P(e_i)\) converge to \(r\) with probability 1?

To make things slightly more complicated, assume that the odd-indexed bits of \(E\) come from a fair coin, but the even-indexed bits come from an adversary who chooses them in an uncomputable way. (This adversary can see all the bits output by \(E\) so far, but not future bits.) In this case, will \(P(e_{2i}1)/P(e_{2i})\) converge to 1/2?



by Sam Eisenstat 747 days ago | Vanessa Kosoy and Abram Demski like this | link

Universal Prediction of Selected Bits solves the related question of what happens if the odd bits are adversarial but the even bits copy the preceding odd bits. Basically, the universal semimeasure learns to do the right thing, but the exact sense in which the result is positive is subtle and has to do with the difference between measures and semimeasures. The methods may also be relevant to the questions here, though I don’t see a proof for either question yet.

reply

by Jessica Taylor 1380 days ago | Scott Garrabrant likes this | link

My intuition is that, for the iid bits with uncomputable probability \(r\), there will exist an optimal (up to a constant) hypothesis of the form:

  1. compute some real number \(\hat{r}\) by running the shortest program that outputs it
  2. generate bits iid from \(\mathrm{Ber}(\hat{r})\)

If this is the case, then the optimal algorithm will choose \(\hat{r}\) to minimize \(K(\hat{r}) + n \cdot \mathrm{KL}(\mathrm{Ber}(r) \,\|\, \mathrm{Ber}(\hat{r}))\), where \(n\) is the number of bits observed. As \(n\) goes to infinity, minimizing this objective causes \(\hat{r}\) to converge to \(r\).
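To see this tradeoff numerically, here is a rough sketch. The stand-ins are loud assumptions: \(K(\hat{r})\) is approximated by twice the bit-length of a dyadic expansion of \(\hat{r}\), and the computable \(1/\pi\) plays the role of the uncomputable \(r\):

```python
import math

def kl_bernoulli(r, q):
    # KL(Ber(r) || Ber(q)) in bits, for r, q strictly between 0 and 1.
    return r * math.log2(r / q) + (1 - r) * math.log2((1 - r) / (1 - q))

r = 1 / math.pi  # computable stand-in for the uncomputable bias

for n in (10, 100, 10_000, 1_000_000):
    best = None
    for d in range(1, 15):  # bits of precision allowed for r_hat
        for k in range(1, 2 ** d):
            q = k / 2 ** d
            # Crude proxy: K(q) ~ 2d bits for a d-bit dyadic rational.
            cost = 2 * d + n * kl_bernoulli(r, q)
            if best is None or cost < best[0]:
                best = (cost, q, d)
    print(f"n = {n:>7}: best r_hat = {best[1]:.6f} ({best[2]} bits), true r = {r:.6f}")
# As n grows, the minimizer spends more bits on r_hat and approaches r.
```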

Now I’m not sure how to prove that an optimal (up to a constant) hypothesis of the form above exists. Basically this is saying that \(K(e_n) \geq c + \min_{\hat{r}} \left(K(\hat{r}) + n \cdot \mathrm{KL}(\mathrm{Ber}(r) \,\|\, \mathrm{Ber}(\hat{r}))\right)\) for some constant \(c\) with high probability. Possibly the argument goes something like: for any non-conditionally-iid way of assigning probabilities to the bits, there is a better iid way (by averaging the probabilities). And if you are flipping the bits iid, then at some point you compute the probability of each flip, so you might as well just optimally compress some \(\hat{r}\). But I couldn’t quite formalize this argument, because hypotheses can encode correct correlations through overfitting.

reply

by Vladimir Slepnev 1330 days ago | link

Well, Laplace’s rule of succession is a computable prior that will almost certainly converge to your uncomputable probability value, and I think the difference in log scores from the “true” prior will be finite too. Since Laplace’s rule is included in the Solomonoff mixture, I suspect that things should work out nicely. I don’t have a proof, though.
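A quick empirical check of the convergence claim (a sketch only: the computable \(1/e\) is substituted for the uncomputable bias, which no simulation can actually produce):

```python
import math
import random

random.seed(0)
r = 1 / math.e  # computable stand-in for the uncomputable bias
ones = 0
gap = 0.0       # cumulative log-score gap vs. the true Ber(r), in bits

N = 100_000
for i in range(N):
    pred = (ones + 1) / (i + 2)  # Laplace's rule of succession
    bit = 1 if random.random() < r else 0
    p_true = r if bit else 1.0 - r
    p_laplace = pred if bit else 1.0 - pred
    gap += math.log2(p_true) - math.log2(p_laplace)
    ones += bit

print(f"Laplace prediction after {N} bits: {(ones + 1) / (N + 2):.5f} (true r = {r:.5f})")
print(f"cumulative log-score gap: {gap:.2f} bits")
```

With this seed the gap comes out on the order of \(\tfrac{1}{2}\log_2 N\) bits, the classic redundancy of Laplace's rule against an iid source, so it grows, though only logarithmically; the constant overhead of finding Laplace's rule inside the Solomonoff mixture would come on top of that.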

reply

by Vanessa Kosoy 1014 days ago | link

Can you provide links to the solutions? I am curious, especially regarding the second question.

reply


