Intelligent Agent Foundations Forum
Another view of quantilizers: avoiding Goodhart's Law
discussion post by Jessica Taylor 923 days ago | Abram Demski, Patrick LaVictoire and Stuart Armstrong like this | 1 comment

Goodhart’s law states:

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

One way of framing this is that, when you are solving some optimization problem, a metric that is correlated with a desired objective will often stop being correlated with the objective when you look at the extreme values of the metric. For example, although the number of paperclips a paperclip factory produces tends to be correlated with how useful the factory is for its owner’s values, a paperclip factory that produces an extremely high number of paperclips is likely to be quite bad for its owner’s values.

Let’s try to formalize this. Suppose you are finding some \(x \in \mathcal{X}\) that optimizes some unknown objective function \(f : \mathcal{X} \rightarrow \mathbb{R}\), and you have some estimate \(g : \mathcal{X} \rightarrow \mathbb{R}\) which you believe to approximate \(f\). Specifically, you have a guarantee that, for some base distribution \(\gamma \in \Delta \mathcal{X}\), \(g\) does not misestimate \(f\) by much on average:

\[\mathbb{E}_{X \sim \gamma}[|g(X) - f(X)|] \leq k\]
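To make this setup concrete, here is a minimal Python sketch. Every specific choice in it (a uniform base distribution over a finite \(\mathcal{X}\), a Gaussian objective, a small noise scale) is hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite action space X = {0, ..., n-1}.
n = 1000
gamma = np.full(n, 1.0 / n)            # base distribution over X (uniform here)
f = rng.normal(size=n)                 # true objective values f(x), unknown in practice
g = f + rng.normal(scale=0.1, size=n)  # proxy estimate g(x), correlated with f

# The guarantee: g does not misestimate f by much on average under gamma.
k = np.sum(gamma * np.abs(g - f))
print(f"E_gamma[|g(X) - f(X)|] = {k:.3f}")
```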

We might suppose that we only want to take an action if the resulting expected \(f\)-value is above zero; otherwise, it would be better to do nothing.

Given this, how do you pick an \(x\) that guarantees a good objective value \(f(x)\) across all objective functions \(f\) consistent with the guarantee above? Naively, you might pick \(x^* = \arg\max_{x \in \mathcal{X}} g(x)\); however, if this \(x^*\) has low probability under \(\gamma\), then \(g(x^*)\) can be much higher than \(f(x^*)\) without \(g\) overestimating \(f\) much on average.
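Continuing the hypothetical sketch above, here is that failure mode in code: an adversarially chosen \(f\) can agree with \(g\) everywhere except at the argmax of \(g\), where it spends the entire error budget. Because that single point has probability only \(1/n\) under \(\gamma\), the average-error constraint still holds while the naively chosen point becomes very bad:

```python
# Adversarial objective: equal to g everywhere except at the argmax of g,
# where all of the allowed estimation error is concentrated.
x_star = int(np.argmax(g))
f_adv = g.copy()
f_adv[x_star] = g[x_star] - k / gamma[x_star]

# The average-error constraint E_gamma[|g(X) - f_adv(X)|] <= k still holds ...
assert np.isclose(np.sum(gamma * np.abs(g - f_adv)), k)

# ... but the argmax of g is now far worse than its estimate suggests.
print(f"g(x*) = {g[x_star]:.2f}, f_adv(x*) = {f_adv[x_star]:.2f}")
```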

If \(f\) is chosen adversarially, the optimization problem to solve is:

\[\arg\max_{a \in [0, 1],\, p \in \Delta \mathcal{X}} ~ ~ ~ \min_{f : \mathcal{X} \rightarrow \mathbb{R},\ \mathbb{E}_{X \sim \gamma}[|g(X) - f(X)|] \leq k} a\,\mathbb{E}_{X \sim p}[f(X)]\]

where \(a\) is the probability that the agent takes an action at all, and \(p\) is the action distribution if it takes an action. Equivalently, since the most adversarial \(f\) values will never be above \(g\):

\[\arg\max_{a \in [0, 1],\, p \in \Delta \mathcal{X}} ~ ~ ~ \min_{f : \mathcal{X} \rightarrow \mathbb{R},\ \forall x\, f(x) \leq g(x),\ \mathbb{E}_{X \sim \gamma}[g(X) - f(X)] \leq k} a\,\mathbb{E}_{X \sim p}[f(X)]\]

Define the nonnegative cost \(c(x) = g(x) - f(x)\):

\[\arg\max_{a \in [0, 1],\, p \in \Delta \mathcal{X}} ~ ~ ~ \min_{c : \mathcal{X} \rightarrow \mathbb{R}^+,\ \mathbb{E}_{X \sim \gamma}[c(X)] \leq k} a\,\mathbb{E}_{X \sim p}[g(X) - c(X)]\]

Since \(a\) does not affect the inner minimization, it can be pulled outside:

\[\arg\max_{a \in [0, 1],\, p \in \Delta \mathcal{X}} ~ ~ ~ a \min_{c : \mathcal{X} \rightarrow \mathbb{R}^+,\ \mathbb{E}_{X \sim \gamma}[c(X)] \leq k} \mathbb{E}_{X \sim p}[g(X) - c(X)]\]

and since the objective is linear in \(a\), it is optimal to set \(a\) to either 0 or 1:

\[\arg\max_{a \in \{0, 1\},\, p \in \Delta \mathcal{X}} ~ ~ ~ a \min_{c : \mathcal{X} \rightarrow \mathbb{R}^+,\ \mathbb{E}_{X \sim \gamma}[c(X)] \leq k} \mathbb{E}_{X \sim p}[g(X) - c(X)]\]
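To see what the inner minimization does (assuming, as a simplification not stated above, that \(\mathcal{X}\) is finite and \(\gamma(x) > 0\) everywhere), note that for a fixed \(p\) the adversary's best move is to spend the entire cost budget on the single point where \(p(x)/\gamma(x)\) is largest:

\[\min_{c : \mathcal{X} \rightarrow \mathbb{R}^+,\ \mathbb{E}_{X \sim \gamma}[c(X)] \leq k} \mathbb{E}_{X \sim p}[g(X) - c(X)] = \mathbb{E}_{X \sim p}[g(X)] - k \max_{x \in \mathcal{X}} \frac{p(x)}{\gamma(x)}\]

So the outer problem trades off a high estimated value \(\mathbb{E}_{X \sim p}[g(X)]\) against keeping the ratio \(p/\gamma\) small everywhere, i.e. against concentrating probability where the base distribution puts little mass.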

In fact, when \(a = 1\), the solution to this optimization problem is a \(q\)-quantilizer with utility function \(g\) and base distribution \(\gamma\), for some \(q\). The proof can be found in the “Optimality of quantilizers under the cost constraint” section of the post about quantilizers. \(a\) will be set to 1 if and only if this quantilizer is guaranteed positive utility.
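For a concrete picture (using the hypothetical numerical sketch above), a \(q\)-quantilizer samples from \(\gamma\) conditioned on \(g(X)\) being in the top \(q\)-fraction by \(\gamma\)-mass. Since the resulting distribution \(p\) satisfies \(p(x) \leq \gamma(x)/q\) pointwise, any cost \(c \geq 0\) with \(\mathbb{E}_{\gamma}[c] \leq k\) can subtract at most \(k/q\) in expectation under \(p\), giving the worst-case guarantee \(\mathbb{E}_p[f] \geq \mathbb{E}_p[g] - k/q\). A rough Python sketch:

```python
def quantilize(g, gamma, q, rng):
    """Sample from gamma conditioned on g being in the top-q set (by gamma-mass)."""
    order = np.argsort(-g)                       # actions sorted by g, descending
    mass = np.cumsum(gamma[order])
    top = order[: np.searchsorted(mass, q) + 1]  # smallest prefix with gamma-mass >= q
    p = gamma[top] / gamma[top].sum()            # base distribution restricted and renormalized
    return top, p, rng.choice(top, p=p)

q = 0.05
top, p, action = quantilize(g, gamma, q, rng)

# Worst-case guarantee: since p(x) <= gamma(x)/q, any cost with E_gamma[c] <= k
# costs at most k/q in expectation under p, so E_p[f] >= E_p[g] - k/q.
lower_bound = np.dot(p, g[top]) - k / q
print(f"worst-case expected true utility of the quantilizer >= {lower_bound:.3f}")
```

Raising \(q\) toward 1 shrinks the \(k/q\) penalty but lowers \(\mathbb{E}_p[g]\); lowering \(q\) toward 0 approaches the naive argmax and its failure mode above.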

This provides another view of what quantilizers are doing. In effect, they are treating the “utility function” \(g\) as an estimate of the true utility function \(g - c\), one that tends to be accurate on average across the base distribution \(\gamma\), and conservatively optimizing given adversarial uncertainty about the true utility function \(g - c\).



by Sune Kristian Jakobsen 894 days ago | Jessica Taylor likes this

Not sure how closely related it is (I have not read through it), but here is another paper trying to fight Goodhart’s law: http://arxiv.org/abs/1506.06980



