Intelligent Agent Foundations Forum
Structural Risk Minimization
post by Abram Demski 836 days ago | Jessica Taylor and Patrick LaVictoire like this | 1 comment

Structural risk minimization, a concept from computational learning theory, is proposed as a satisficing framework.

The goal of this post is similar to Creating a Satisficer and Minimax as an approach to reduced-impact AI: constructing agents which are safe even with a utility function which isn’t quite what is actually wanted. The approach is similar to Paul Christiano’s Model-Free Decisions.

The philosophy behind this solution is to treat the problem as an instance of the optimizer’s curse: when we can’t quite measure what we want to optimize and are forced to optimize a proxy, then even when there is a strong correlation between what we want and what we optimize for, the tails come apart. This could lead to siren worlds (or, more realistically, marketing worlds, as described in that article).

This problem is faced by machine learning algorithms on a regular basis, where it is called overfitting. An algorithm can measure the fit of a model to existing data, but this is merely highly correlated with what is actually wanted: fit to future data. Over-optimizing this criterion predictably leads to terrible performance, so much so that techniques like cutting optimization short (“early stopping”) are actually better. Early stopping is just one example of regularization, the practice of preventing overfitting by purposefully constraining the optimization process.
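As an illustrative sketch (not from the post), early stopping can be written as a generic loop that halts once held-out error stops improving. The `train_step` and `val_error` callbacks here are hypothetical stand-ins for a real training procedure:

```python
def early_stopping(train_step, val_error, max_steps=1000, patience=5):
    """Illustrative early-stopping loop (a sketch, not the post's method):
    keep optimizing while held-out error improves, and stop once it has
    failed to improve for `patience` consecutive steps."""
    best_err, best_step, bad = float("inf"), 0, 0
    for step in range(1, max_steps + 1):
        train_step()          # one optimization step on the training data
        err = val_error()     # held-out error: a proxy for fit to future data
        if err < best_err:
            best_err, best_step, bad = err, step, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_step, best_err

# Toy stand-in: validation error falls until step 10, then rises (overfitting).
state = {"step": 0}
def train_step():
    state["step"] += 1
def val_error():
    return (state["step"] - 10) ** 2

stopped_at, best = early_stopping(train_step, val_error)
```

The optimization is deliberately cut short: the loop never sees validation error improve after step 10, so it stops shortly afterward rather than continuing to optimize the training objective.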

In the machine learning case, regularization can usually be interpreted as an approximation to Bayesian methods: like a Bayesian prior, it usefully constrains beliefs.

Smith & Winkler, as cited in Luke’s article mentioned earlier, seem to think that Bayesian modeling is also the solution to the general problem. Perhaps that’s true, and explicit uncertainty about what to value is enough to avoid problems resulting from over-optimization. Or, perhaps it isn’t. A useful approach which comes to mind is to try to extend the results from computational learning theory which deal with this problem, to see if they can say anything about the safety of improperly specified value functions.

Risk Framework

What follows is an admittedly somewhat unnatural attempt to fit AI risk into the structural risk framework. Hopefully it illustrates the idea well enough to indicate whether this might be a productive direction to research.

Consider the problem of constructing a value-learning agent which takes a series of \(m\) moral classifications (thought-experiments with correct decisions supplied) and returns a policy \(p\) (a function stipulating what action to take in different states of belief). For example, each moral classification might be a sequence of sense-data ending with a choice between two motor commands. The job of the policy \(p\) is to look at the sense-data and output a bit representing which of the two motor actions to take.

Define the error to be 1 for selecting an incorrect action and 0 for the correct action. There is a background distribution \(\mathbb{P}\) over moral problems, with respect to which expectations are taken.

The empirical risk \(\hat{R}(p)\) is the average error of a policy on moral problems so far. The generalization risk \(R(p)\) is the expected error across all moral problems (expectation taken in \(\mathbb{P}\)).

Theorem: Let \(P\) be a finite set of \(n\) policies. Then for any \(\delta>0\), with probability at least \(1-\delta\) we have:

\(\forall p \in P,\ R(p) \leq \hat{R}(p) + \sqrt{\frac{\log(n)+\log\frac{2}{\delta}}{2m}}\)

(See Theorem 2.2, Foundations of Machine Learning, Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar.)

This bounds the generalization risk (with high probability) in terms of the number of policies we optimize over. The implication, then, is that risk can be reduced by purposefully limiting options considered.
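To make the bound concrete, here is a minimal Python sketch that simply evaluates the right-hand side of the theorem; the empirical risk and class sizes below are made-up numbers for illustration:

```python
import math

def generalization_bound(empirical_risk, n_policies, m_samples, delta=0.05):
    """Upper bound on the generalization risk R(p), holding with probability
    at least 1 - delta, for a finite class of n_policies policies evaluated
    on m_samples training examples (Mohri et al., Theorem 2.2)."""
    penalty = math.sqrt(
        (math.log(n_policies) + math.log(2 / delta)) / (2 * m_samples)
    )
    return empirical_risk + penalty

# A policy with 10% empirical error, chosen from 1,000 policies on 500 examples:
bound_small = generalization_bound(0.10, n_policies=1_000, m_samples=500)
# The same empirical error, chosen from 10^9 policies: the bound loosens.
bound_large = generalization_bound(0.10, n_policies=10**9, m_samples=500)
```

Note that the penalty grows only logarithmically in the number of policies, but shrinks only as the square root of the number of training examples: searching a vastly larger class costs something, and more data buys the slack back slowly.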

Structural Risk Minimization

Empirical risk minimization (ERM) selects the \(p \in P\) with the lowest empirical risk. We can use the formula from the previous section to bound the generalization error with high probability. Realistically, however, this does not give very good bounds when \(P\) is large enough to be useful.

The structural risk minimization algorithm (SRM) considers hypothesis spaces of increasing size, but with a penalty for the increase in potential generalization error. In other words, we optimize over our upper bound on generalization risk. To spell that out: consider a sequence of policy sets, \(P_1 \subset P_2 \subset P_3...\), so that \(P_i\) has size \(n_i\). The score of each policy \(p\) is the generalization bound for the smallest \(P_i\) containing it:

\(\hat{R}(p) + \sqrt{\frac{\log(n_i)+\log\frac{2}{\delta}}{2m}}\)

We continue optimizing over larger \(P_i\) until the penalty term alone exceeds the best score found so far, at which point the bound can no longer be improved.

We can imagine that the policies \(p_1, p_2, p_3, ...\) are computer programs in lexicographic order, and take \(P_i = \{p_j | j\leq i \}\). This is like a soft version of early stopping: rather than halting the search at some \(n\), we make it harder and harder to choose policies at higher \(n\).
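The procedure above can be sketched directly. This is a minimal illustration, assuming policies are enumerated in lexicographic order so that the smallest class containing \(p_i\) has size \(i\); the empirical risks in the toy example are made up:

```python
import math

def srm_select(policies, empirical_risk, m, delta=0.05):
    """Structural risk minimization over nested classes P_i = the first i
    policies (lexicographic order, as in the text). `empirical_risk` maps a
    policy to its average training error; returns the policy minimizing the
    SRM score (empirical risk plus generalization penalty)."""
    best_policy, best_score = None, float("inf")
    for i, p in enumerate(policies, start=1):
        penalty = math.sqrt((math.log(i) + math.log(2 / delta)) / (2 * m))
        # The penalty grows with i, and every score is at least the penalty,
        # so once the penalty alone exceeds the best score so far, no later
        # policy can improve the bound: stop searching.
        if penalty >= best_score:
            break
        score = empirical_risk(p) + penalty
        if score < best_score:
            best_policy, best_score = p, score
    return best_policy, best_score

# Toy example with made-up empirical risks for three policies:
risks = {"p1": 0.5, "p2": 0.1, "p3": 0.05}
chosen, score = srm_select(["p1", "p2", "p3"], risks.get, m=100)
```

The early-exit condition is what makes this a "soft" early stopping: the search over later, larger classes terminates itself once the penalty can no longer be paid for by a reduction in empirical risk.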

This gives us a way of thinking about satisficing FAI. Like a value learner, this system makes decisions based on limited knowledge of the utility function. However, it limits its own optimization power based on the amount of such knowledge.


One unfortunate assumption is the fixed distribution \(\mathbb{P}\) over moral problems. This distribution defines both the training-sample distribution and the expected generalization, which means the risk bound relies on the assumption that the training problems look like the problems encountered later. For an agent acting in the world, this assumption does not really make sense.

The framework as described also does not allow the agent to keep learning. It is fed a number of training examples, and then it selects a policy. Perhaps it could be given more examples while running, but even if that were considered safe, it is much different from learning from its environment. Notice that this applies to all learning, not just value learning, unless the agent learns a policy sophisticated enough to learn directly from the environment, which apparently requires far too many training examples.

by Jessica Taylor 834 days ago

Cool, this seems similar to some stuff I’ve been thinking about lately. If I get the meaning of this right, then it’s essentially uniform convergence guarantees in statistical learning theory applied to moral problems, right?

The main problem here is generalization to a different distribution over problems. I don’t know what the best way to handle this is, but one idea is that you can also use similar techniques for detecting differences between the training and test sets. Specifically, if you have a number of hypotheses for what the test distribution \(\mathbb{P}\) is, then you can do null hypothesis significance testing to determine that the test \(\mathbb{P}\) is different from the training \(\mathbb{P}\) if you have enough data, and you could probably use this to bound the error.

Roughly, we want something like this:

  • We have some set of possible \(\mathbb{P}\) distributions
  • We get training data from some unknown distribution \(\mathbb{P}_{train}\), and observe the test problems (but not their answers, of course) from some unknown distribution \(\mathbb{P}_{test}\)
  • We test the null hypothesis that \(\mathbb{P}_{train} = \mathbb{P}_{test}\) against the alternative hypothesis that \(\mathbb{P}_{train} \neq \mathbb{P}_{test}\)
  • If the test says they’re different, alert the human and ask them to label the new data
  • If the test fails to show that they’re different, then the distributions are in fact not very different, so (I think) the old uniform convergence guarantees should mostly still apply
  • And depending on our \(\alpha\) value for NHST, we probably don’t alert the human very often when \(\mathbb{P}_{train} = \mathbb{P}_{test}\) (business as usual)

I haven’t worked out the math, but it seems like there should be something useful here.
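One way the detection step could be sketched is as a two-sample permutation test on some real-valued summary feature of the problems. The choice of feature, the sample sizes, and the Gaussian distributions below are all assumptions made purely for illustration:

```python
import random
from statistics import mean

def permutation_test(train_feats, test_feats, n_perm=2000, seed=0):
    """Two-sample permutation test on a real-valued summary feature of the
    problems (the feature itself is an assumption of this sketch). Returns a
    p-value for the null hypothesis P_train = P_test."""
    rng = random.Random(seed)
    observed = abs(mean(train_feats) - mean(test_feats))
    pooled = list(train_feats) + list(test_feats)
    k = len(train_feats)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel under the null and recompute the statistic
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Made-up feature samples: same distribution vs. a clearly shifted one.
rng = random.Random(1)
same_a = [rng.gauss(0, 1) for _ in range(100)]
same_b = [rng.gauss(0, 1) for _ in range(100)]
shifted = [rng.gauss(2, 1) for _ in range(100)]

p_same = permutation_test(same_a, same_b)    # large p: business as usual
p_shift = permutation_test(same_a, shifted)  # small p: alert the human
```

Comparing each p-value to the chosen \(\alpha\) then implements the alert-or-proceed decision in the bullet points above.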





