 Equilibria in adversarial supervised learning
post by Ryan Carey 1006 days ago | Vanessa Kosoy, Nate Soares and Patrick LaVictoire like this | discuss

Note: This describes an idea of Jessica Taylor’s.

# Introduction

An important aim in safe AI development is to make systems whose submodules analyze how they might go wrong. One model for this in academic machine learning research is training on adversarial examples.

In order to better understand the adversarial training process, we outline a specific adversarial learning model and consider what kind of equilibrium state it might possess.

# Setup

Consider a general adversarial learning scenario described by the tuple $$(\mathcal{X},\mathcal{Y}, \mathcal{W}, g, f, \ell)$$ where:

• $$\mathcal{X}$$ is a set of examples
• $$\mathcal{Y}$$ is a set of the labels for these examples
• $$\mathcal{W} \subset \mathbb{R}^{n}$$ is a non-empty, compact, and convex set of hypotheses that can be chosen by the learner
• $$g:\mathcal{X} \mapsto \Delta \mathcal{Y}$$ is a continuous mapping that nature uses to probabilistically assign ground truth labels to each example ($$\Delta \mathcal{Y}$$ represents a probability distribution over $$\mathcal{Y}$$, $$\Delta \mathcal{Y} := \{p:\mathcal{Y} \mapsto [0,\infty): \int_{y \in Y}p(y)dy = 1\}$$)
• $$f:\mathcal{W} \mapsto \Delta \mathcal{X}$$ is a continuous function that adversarially maps a hypothesis to some probability distribution over examples, allocating more probability mass to examples that will give the hypothesis difficulty.
• $$\ell:\mathcal{X} \times \mathcal{Y} \times \mathcal{W} \mapsto \mathbb{R}$$, is a general loss function that appraises the performance of some hypothesis on some labeled example. It is convex in its third argument.

The learner’s aim is to choose a hypothesis $$w \in \mathcal{W}$$ that achieves low expected loss $$\ell(x,y,w)$$, given that the labeled example is sampled from an adversarial distribution, $$x \sim f(w), y \sim g(x)$$.

# An equilibrium in expert selection

One plausible approach for the learner is to consider hypotheses that perform well on some adversarial distribution $$f(w)$$. The hypotheses that perform best on a some fixed distribution $$f(w)$$ are given by the set-valued function:

$h(w) = \text{argmin}_{w^{'} \in \mathcal{W}} \mathbb{E}_{x \sim f(w),y \sim g(x)}[\ell(x,y,w{'})]$

The difficulty with choosing a hypothesis is that the adversarial distribution is not fixed, but rather it varies according to which hypothesis is chosen. One response is to choose some hypothesis $$w^{*}$$ that is located at a fixed point, such that $$w^* \in h(w^{*})$$. There is always at least one such fixed point, $$w^*$$, and this can be shown by Kakutani’s fixed-point theorem.

To apply Kakutani’s fixed point theorem, we need to show that: i) the domain of $$h$$ is compact, convex and nonempty, ii) $$h$$ is convex-valued, and iii) $$h$$ is compact-valued and upper hemicontinuous.

First, by its definition, $$\mathcal{W}$$, the domain of $$h$$, is non-empty, closed, and compact. Second, $$h$$ is convex-valued because it minimizes a convex function. Remember that $$\ell$$ is convex in $$w^{'}$$. The expectation of $$\ell$$ is then a non-negative weighted combination of functions convex in $$w^{'}$$, so it is convex. Hence the values that minimize $$h$$ are some convex set. Finally, $$h$$ is compact-valued and upper hemicontinuous by Berge’s maximum theorem. This is because $$h$$ has a compact domain and outputs the arguments that minimize a continuous function. So by Kakutani’s fixed point theorem, $$h$$ will have at least one fixed point, $$w^*$$.

# Discussion

The motivation for choosing a hypothesis that lies at a fixed point is that this hypothesis need not be changed in response to the examples selected by an adversary. We should be clear though that this does not mean we can design an algorithm with this approach. Locating these fixed points would be computationally expensive and require lots of data in the general case. One open question is under what circumstances we will be able to find such an equilibrium efficiently.

Note that the equilibrium weight vectors do not necessarily minimize loss over all possible weight vectors; it might be possible to get a lower loss by selecting a non-equilibrium weight vector (on which the adversary $$f$$ will produce easier examples). In some applications, it will be desirable to meet the equilibrium condition, whereas in others, it will be sufficient to have no regret compared to alternative weight vectors.

It is worth noting that we can arrive at the same result by thinking of adversarial supervised learning as a continuous two player game. This game has two players: the learner and the adversary. The learner chooses from the strategies $$\mathcal{W}$$ in order to minimize the expectation of $$\ell$$ over $$x$$ and $$y$$, and the adversary chooses some distribution over some compact set of examples $$\Delta \mathcal{X}$$ in order to maximize some general objective function. The Nash equilibria of such a game correspond to the fixed points in our supervised learning setup. It follows that (at least) one fixed point will always exist. Of course, like the fixed points in adversarial supervised learning, Nash equilibria can be difficult to compute, and will not always give any particular agent an optimal score.

In future analysis, we may be able to think of other properties that the adversarial learning setup inherits from two player games.

### NEW DISCUSSION POSTS

[Note: This comment is three
 by Ryan Carey on A brief note on factoring out certain variables | 0 likes

There should be a chat icon
 by Alex Mennen on Meta: IAFF vs LessWrong | 0 likes

Apparently "You must be
 by Jessica Taylor on Meta: IAFF vs LessWrong | 1 like

There is a replacement for
 by Alex Mennen on Meta: IAFF vs LessWrong | 1 like

Regarding the physical
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think that we should expect
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think I understand your
 by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

This seems like a hack. The
 by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

After thinking some more,
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yes, I think that we're
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

My intuition is that it must
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

To first approximation, a
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Actually, I *am* including
 by Vanessa Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yeah, when I went back and
 by Alex Appel on Optimal and Causal Counterfactual Worlds | 0 likes

> Well, we could give up on
 by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes