The Three Levels of Goodhart's Curse
post by Scott Garrabrant 34 days ago

Goodhart's curse is a neologism by Eliezer Yudkowsky stating that "neutrally optimizing a proxy measure U of V seeks out upward divergence of U from V." It is related to many nearby concepts (e.g. the tails come apart, the winner's curse, the optimizer's curse, regression to the mean, overfitting, edge instantiation, Goodhart's law). I claim that there are three main mechanisms through which Goodhart's curse operates.

Goodhart's Curse Level 1 (regressing to the mean): We are trying to optimize the value of $$V$$, but since we cannot observe $$V$$, we instead optimize a proxy $$U$$, which is an unbiased estimate of $$V$$. When we select for points with a high $$U$$ value, we will be biased towards points for which $$U$$ is an overestimate of $$V$$.

As a simple example, imagine $$V$$ and $$E$$ (for error) are independently normally distributed with mean 0 and variance 1, and $$U=V+E$$. If we sample many points and take the one with the largest $$U$$ value, we can predict that $$E$$ will likely be positive for this point, and thus the $$U$$ value will predictably be an overestimate of the $$V$$ value. In many cases (like the one above), the best you can do without observing $$V$$ is still to take the largest $$U$$ value you can find, but you should still expect that this $$U$$ value overestimates $$V$$.

Similarly, if $$U$$ is not necessarily an unbiased estimator of $$V$$, but $$U$$ and $$V$$ are correlated, and you sample a million points and take the one with the highest $$U$$ value, you will end up with a $$V$$ value on average strictly less than if you could just take a point with a one-in-a-million $$V$$ value directly.

Goodhart's Curse Level 2 (optimizing away the correlation): Here, we assume $$U$$ and $$V$$ are correlated on average, but there may be different regions in which this correlation is stronger or weaker.
When we optimize $$U$$ to be very high, we zoom in on the region of very large $$U$$ values. This region could in principle have very small $$V$$ values.

As a very simple example, imagine $$U$$ is integer uniform between 0 and 1000 inclusive, and $$V$$ is equal to $$U$$ mod 1000. Overall, $$U$$ and $$V$$ are correlated. The point where $$U$$ is 1000 and $$V$$ is 0 is an outlier, but it is only one point and does not sway the correlation that much. However, when we apply a lot of optimization pressure, we throw away all the points with low $$U$$ values and are left with a small number of extreme points. Since this is a small number of points, the correlation between $$U$$ and $$V$$ says little about what value $$V$$ will take.

Another, more realistic example is that $$U$$ and $$V$$ are two correlated dimensions in a multivariate normal distribution, but we cut off the normal distribution to only include the disk of points in which $$U^2+V^2 \le r^2$$ for some radius $$r$$
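Both toy examples above are easy to check numerically. The following is a minimal simulation sketch (not from the original post; the sample size and random seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Level 1: U = V + E is an unbiased but noisy estimate of V.
# Selecting the point with the largest U preferentially picks points
# whose error E happens to be positive, so U overestimates V there.
n = 1_000_000
V = rng.standard_normal(n)
E = rng.standard_normal(n)
U = V + E
i = np.argmax(U)
print(f"winner: U = {U[i]:.2f}, V = {V[i]:.2f}")  # U[i] almost surely exceeds V[i]

# Level 2: U and V are highly correlated over the whole range,
# but hard optimization of U zooms in on the single outlier.
U2 = np.arange(0, 1001)   # integer uniform on 0..1000
V2 = U2 % 1000            # equal to U2 everywhere except U2 = 1000
print(f"correlation: {np.corrcoef(U2, V2)[0, 1]:.3f}")  # close to 1
print(f"V at argmax U: {V2[np.argmax(U2)]}")  # 0
```

The Level 2 half illustrates the key asymmetry: the overall correlation is nearly 1, yet the point that maximizes $$U$$ has the worst possible $$V$$.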
