Kolmogorov complexity makes reward learning worse
discussion post by Stuart Armstrong 18 days ago

A putative new idea for AI control; index here.

In a previous post, I argued that Kolmogorov complexity/simplicity priors do not help when learning human values: some extreme versions of the reward or planner were of roughly equal complexity.

Here I’ll demonstrate that it’s even worse than that: the extreme versions are likely simpler than a “reasonable” one would be.

Of course, as with any statement about Kolmogorov complexity, this is dependent on the computer language used. But I’ll aim to show that for a “reasonable” language, the result holds.

So let $$(p, R)$$ be a reasonable planner-reward pair, one that encodes what we want to capture about human rationality and reward. It is compatible with the human policy $$\pi_H$$, in that $$p(R)=\pi_H$$.

Let $$(p_r, R_r)$$ be the compatible pair where $$p_r$$ is the rational Bayesian expected reward maximiser, with $$R_r$$ the corresponding reward so that $$p_r(R_r)=\pi_H$$.

Let $$(p_i, 0)$$ be the indifferent planner (indifferent to the choice of reward), chosen so that $$p_i(R')=\pi_H$$ for all $$R'$$. The reward $$0$$ is the trivial reward.
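To make the three definitions concrete, here is a minimal toy sketch. Everything in it is an assumption made for illustration: the three-state policy `PI_H`, the action set, the function names, and in particular the one-step greedy maximiser, which stands in for the full Bayesian expected reward maximiser $$p_r$$. The fitted reward simply pays 1 for the action $$\pi_H$$ takes, which is one possible choice of $$R_r$$, not necessarily the only one.

```python
from typing import Callable, Dict

State = str
Action = str
Policy = Dict[State, Action]
Reward = Callable[[State, Action], float]
Planner = Callable[[Reward], Policy]

ACTIONS = ["left", "right"]

# Hypothetical observed human policy pi_H on a toy three-state world.
PI_H: Policy = {"s0": "left", "s1": "right", "s2": "left"}


def zero_reward(state: State, action: Action) -> float:
    """The trivial reward 0."""
    return 0.0


def indifferent_planner(reward: Reward) -> Policy:
    """p_i: ignores the reward entirely and returns pi_H."""
    return dict(PI_H)


def rational_planner(reward: Reward) -> Policy:
    """Toy stand-in for p_r: picks the reward-maximising action in each state."""
    return {s: max(ACTIONS, key=lambda a: reward(s, a)) for s in PI_H}


def fitted_reward(state: State, action: Action) -> float:
    """One possible R_r: pays 1 exactly when the action matches pi_H."""
    return 1.0 if PI_H[state] == action else 0.0


# Both degenerate pairs are compatible with pi_H.
assert indifferent_planner(zero_reward) == PI_H
assert indifferent_planner(fitted_reward) == PI_H
assert rational_planner(fitted_reward) == PI_H
```

Note that in this sketch the only data either pair needs is `PI_H` itself; the planners are a few fixed lines that do not grow with the policy.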

# Information content present in each pair

The planner $$p_i$$ is simply a map to $$\pi_H$$, so the only information in $$p_i$$ (and $$(p_i, 0)$$) is the definition of $$\pi_H$$.

The policy $$\pi_H$$ and the brief definition of an expected reward maximiser $$p_r$$ are the only information content in $$(p_r, R_r)$$.

On the other hand, $$(p, R)$$ defines not only $$\pi_H$$ but also, at every action, the bias or inefficiency of $$\pi_H$$: the difference between the value of $$\pi_H$$ and that of the ideal $$R$$-maximising policy $$\pi_R$$. This is a large amount of information, including, for instance, every single human bias and example of bounded rationality.

None of the other pairs has this information: there is no such thing as bias for the flat reward $$0$$, nor for the expected reward maximiser $$p_r$$. So $$(p, R)$$ contains a lot more information than the other pairs, and we expect it to have higher Kolmogorov complexity.
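One way to put the comparison in symbols, as a sketch rather than a proof: write $$K(\cdot)$$ for Kolmogorov complexity in the chosen language, and assume that applying a planner to a reward is computable and that $$R_r$$ is built from $$\pi_H$$ by some short fixed recipe (for instance, rewarding exactly the actions $$\pi_H$$ takes). The constants $$c_i$$, $$c_r$$ and $$c$$ are notation introduced here; they depend on the language but not on $$\pi_H$$.

$$K(p_i, 0) \le K(\pi_H) + c_i, \qquad K(p_r, R_r) \le K(\pi_H) + c_r, \qquad K(\pi_H) \le K(p', R') + c \ \text{ for every compatible pair } (p', R').$$

Under these assumptions, both degenerate pairs sit within a constant of the lowest complexity any compatible pair can have, whereas $$(p, R)$$ is expected to pay extra for describing the bias. How large that extra cost is remains an expectation; the inequalities alone do not establish it.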
