Thoughts on Quantilizers
post by Stuart Armstrong 485 days ago | Ryan Carey and Abram Demski like this | discuss

A putative new idea for AI control; index here.

This post will look at some of the properties of quantilizers, when they succeed and how they might fail.

Roughly speaking, let $$f$$ be some true objective function that we want to maximise. We haven't been able to specify it fully, so we have instead a proxy function $$g$$. There is a cost function $$c=g-f$$ which measures how much $$f$$ falls short of $$g$$. Then a quantilizer will choose actions (or policies) randomly from the top $$n\%$$ of actions available, ranking those actions according to $$g$$.
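This definition can be sketched in a few lines of code. A minimal, hypothetical implementation, assuming a finite action set and a computable proxy `g` (the names `quantilize` and `top_fraction` are illustrative, not from the original):

```python
import random

def quantilize(actions, g, top_fraction=0.1):
    """Pick an action uniformly at random from the top fraction of
    actions, ranked by the proxy objective g.

    `actions` is a finite list, `g` scores each action, and
    `top_fraction` plays the role of the n% cut-off.
    """
    ranked = sorted(actions, key=g, reverse=True)  # best g-score first
    cutoff = max(1, int(len(ranked) * top_fraction))  # size of top n%
    return random.choice(ranked[:cutoff])
```

For example, with 100 actions scored by the identity function and `top_fraction=0.1`, the quantilizer returns one of the ten highest-scoring actions, chosen uniformly.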

It is plausible that for standard actions or policies, $$g$$ and $$f$$ are pretty similar. But when we push on maximising $$g$$, the tiny details where $$g$$ and $$f$$ differ will balloon, and the cost can grow very large indeed.

This could be illustrated roughly by figure I, where $$g$$ and $$c$$ are plotted against each other; imagine that $$c$$ is on a log scale.

The blue areas are possible actions that can be taken. Note a large bunch of actions that are not particularly good for $$g$$ but have low cost, a thinner tail of more optimised actions that have higher $$g$$ and still have low cost, and a much thinner tail that has even higher $$g$$ but high cost. The $$g$$-maximising actions with maximal cost are represented by the red star.

Figure I thus shows a situation ripe for some form of quantilization. But consider figure II:

In figure II, the only way to get high $$g$$ is to accept a high $$c$$. The situation is completely unsuited for quantilization: any $$g$$-maximiser, even a quantilizer, will score terribly under $$f$$. But that mainly means that we have chosen a terrible $$g$$.

Now, back to figure I, where quantilization might work, at least in principle. The ideal would be situation Ia; here blue represents actions below the top $$n\%$$ cut-off, green those above (which include the edge-case red-star actions, as before):

Here the top $$n\%$$ of actions all score a good value under $$g$$, and yet most of them have low cost.

But even within the broad strokes of figure I, quantilization can fail. Figure Ib shows a first type of failure:

Here the problem is that the quantilizer leaves in too many mediocre actions, so the expectation of $$g$$ (and $$f$$) is mediocre; with a smaller $$n\%$$, the quantilizer would do better.

Another failure mode is figure Ic:

Here the $$n\%$$ is too low: all the quantilized solutions have high cost.

# Another quantilizer design

An idea I had some time ago was that, instead of taking the top $$n\%$$ of actions, the quantilizer could instead choose among the actions whose $$g$$-value is within $$n\%$$ of the maximal $$g$$-value. Such a design would be less likely to encounter situations like Ib, but more likely to face situations like Ic.
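The difference between the two designs is that the threshold is set on the $$g$$-value itself rather than on the rank of the action. A minimal sketch of this alternative, with illustrative names (`value_quantilize`, `slack` standing in for the $$n\%$$ tolerance):

```python
import random

def value_quantilize(actions, g, slack=0.1):
    """Pick uniformly among actions whose g-score is within `slack`
    (as a fraction) of the best g-score, rather than among the
    top-ranked fraction of actions.
    """
    best = max(g(a) for a in actions)          # maximal g-value
    threshold = best - abs(best) * slack       # within n% of the max
    eligible = [a for a in actions if g(a) >= threshold]
    return random.choice(eligible)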

# What can be done?

So, what can be done to improve quantilizers? I’ll be posting some thoughts as they develop, but there are two ideas that spring to mind immediately. First of all, we can use CUA oracles to investigate the shape of the space of actions, at least from the perspective of $$g$$ ($$c$$, like $$f$$, cannot be calculated explicitly).

Secondly, there’s an idea that I had around low-impact AIs. Basically, it was to ensure that there was some action the AI could take that could easily reach some approximation of its goal. For instance, have a utility function that encourages the AI to build one paperclip, and cap that utility at one. Then scatter around some basic machinery to melt steel, stretch it, give the AI some manipulator arms, etc… The idea is to ensure there is at least one safe policy that gives the AI high expected utility. If there is one such policy, there are probably a large number of similar policies in its vicinity: safe policies with high expectation. Then it seems that quantilization should work, probably best in its ‘within $$n\%$$ of the maximal policy’ version (working well because we know the cap of the utility function, and hence have a cap on the maximal policy).

Now, how do we know that a safe policy exists? We have to rely on human predictive abilities, which can be flawed. But the reason we’re reasonably confident in this scenario is that we believe that we could figure out how to build a paperclip, given the stuff the AI has lying around. And the AI would presumably do better than us.
