Summary: this post reframes ideas from “Learning a concept using only positive examples” and “The AI, the best human advisor” in terms of a strong constraint for conservative behavior, and shows that quantilizers are optimal under this constraint.
Defining a quantilizer
A \(q\)quantilizer is an agent that, when faced with a decision problem, gives a random action in the top \(q\) proportion by expected utility. More formally, let \(A\) be the action set and \(p_A\) be some distribution over actions. For any \(0 < q \leq 1\), a \(q\)quantilizer returns an action \(a\) with probability
\[P(a) = 0 \text{ if } \sum_{a' \in A} p_A(a') [ E[U(a', W)] > E[U(a, W)] ] > q\] \[P(a) = p_A(a) / q \text{ if } \sum_{a' \in A} p_A(a') [ E[U(a', W)] \geq E[U(a, W)] ] < q\] \[P(a) = \frac{1}{q}\frac{p_A(a)}{\sum_{a' \in A, a' \text{ also falls into this case}} p_A(a')} \left( q  \sum_{a' \in A} p_A(a') [ E[U(a', W)] > E[U(a, W)] ]\right) \text{ otherwise}\]
where \(U\) is the quantilizer’s utility function. The weird logic in the third case is necessary for handling ties; the details are not especially important. Naturally, a 0quantilizer is just a maximizer. A 0.5quantilizer (which we could also call a 50percentilizer) samples a random aboveaverage action, a 0.01quantilizer samples a random action in the top 1%, and a 1quantilizer samples a random action from \(p_A\).
The conservative cost constraint
Why might we use a quantilizer? Suppose we believe there is some unknown cost function \(c : A \rightarrow \mathbb{R}^+\), which quantifies the negative contribution of the action to our utility function (e.g. we could define \(c(a) = \max\{0, \mathbb{E}[V(\text{do nothing}, W)]  \mathbb{E}[V(a, W)]\}\), where \(V\) is our true utility function). For example, an action that turns the universe into paperclips would be very costly. As explained previously in “Learning a concept using only positive examples”, the fact that an agent’s distribution over actions fits under \(p_A\) indicates that it does not accrue many times more expected cost than would be accrued by sampling from \(p_A\). It turns out that for \(0 < q \leq 1\), a \(q\)quantilizer maximizes expected utility subject to the conservative cost constraint
\[\forall (c : A \rightarrow \mathbb{R}^+) \frac{\sum_{a \in A} P(a) c(a)}{\sum_{a \in A} p_A(a) c(a)} \leq 1/q\]
The cost constraint states that, for any cost function, the expected cost of an action drawn from \(P\) is no more than \(1/q\) times the expected cost of an action drawn from \(p_A\).
Optimality of quantilizers under the cost constraint
From the cost constraint we can prove \[\forall (a \in A) P(a) \leq p_A(a) / q\]
Suppose this were not true. Then for some \(a^*\) we have \(P(a^*) > p_A(a^*) / q\). Now define \(c(a) = [a = a^*] / p_A(a^*)\). We have \[\sum_{a \in A} p_A(a) c(a) = p_A(a^*) c(a^*) = 1\] Also, \[\sum_{a \in A} P(a) c(a) = P(a^*) c(a^*) > p_A(a^*) c(a) / q = 1 / q\] Jointly, these statements contradict our cost constraint.
Since we cannot assign more that \(p_A(a) / q\) probability to any action \(a\), the best we can do is assign \(p_A(a) / q\) probability to the best actions and 0 to the rest. This gets us back the definition of a quantilizer.
Repeated quantilization of games is not quantilization of repeated games
Suppose we have 2 independent games. On game \(i\), we choose an action \(a\) and receive utility \(U(a, W_i)\), where each \(W_i\) independently comes from some distribution over worlds. We could \(q\)quantilize each game independently, so we choose from the best \(q\) propertion of actions in each game independently. Alternatively, we could \(q\)quantilize the pair of the games. Here we define \(W' = (W_1, W_2)\) and \(U'((a_1, a_2), (w_1, w_2)) = U(a_1, w_1) + U(a_2, w_2)\). The expected utility of an action pair is \(\mathbb{E}[U'((a_1, a_2), W')] = \mathbb{E}[U(a_1, W)] + \mathbb{E}[U(a_2, W)]\)
A \(q\)quantilizer will sample a random pair of actions \((a_1, a_2)\) from the best \(q\) proportion of the action pairs. For example, if each game consists of saying a number in the unit interval and then gaining that much utility, and \(p_A\) is uniform, then a 12.5percentilizer will sample uniformly from the set \(\{(a_1, a_2) \in [0, 1] \times [0, 1]  x + y \geq 1.5\}\). On the other hand, if we 12.5percentilize each game independently, we will sample our action pair uniformly from the set \([0.875, 1]^2\).
What could explain this difference? If all of our cost comes from action pairs in the set \([0.875, 1]^2\) (so \(c(a_1, a_2) = 0\) outside this set), then uniformly sampling from \([0.875, 1]^2\) will accrue a cost 64 times that of sampling from \(p_A\) twice independently. On the other hand, uniformly sampling from \(\{(a_1, a_2) \in [0, 1] \times [0, 1]  x + y \geq 1.5\}\) will accrue a cost only 8 times that of sampling from \(p_A\) twice independently.
The additive cost constraint
Suppose we added the assumption that \(c\) is additive over the two games, so \(c(a_1, a_2) = c'(a_1) + c'(a_2)\) for some \(c' : A \rightarrow \mathbb{R}^+\). Then we get the additive cost constraint \[ \forall (c' : A \rightarrow \mathbb{R}^+) \frac{\sum_{a_1, a_2 \in A} P(a_1, a_2) (c'(a_1) + c'(a_2))}{\sum_{a_1, a_2 \in A} p_A(a_1)p_A(a_2) (c'(a_1) + c'(a_2))} \leq 1/q \]
Then a repeated \(q\)quantilizer would get total expected cost \[\mathbb{E}_P[c(A_1, A_2)] = 2\mathbb{E}_P[c'(A_1)] \leq \frac{2}{q} \mathbb{E}_{p_A}[c'(A_1)] = \frac{1}{q} \mathbb{E}_{p_A}[c(A_1, A_2)]\] so repeated quantilization would be considered acceptable under the additive cost constraint.
However, adding this assumption may be dangerous. It is possible to create scenarios in which almost all of the cost comes from correlation between actions in different trials.
For example, say that we’re creating a stocktrading AI. Every minute, the AI either decides to buy or sell a single share of Google. We can set \(p_A\) to be a uniform distribution over \(\{ \text{buy}, \text{sell} \}\). For some reason, there is another computer somewhere that detects whether the AI bought or sold a share each minute appends a bit (0 for buy or 1 for sell) to a file accordingly. It runs this file (say, as a Python program) after collecting a million bits.
Now suppose the AI sells the stock only on minutes corresponding to 1 bits in a UFAI source code (which is less than a million bits long). This would cause the UFAI to be run. Since the probability that \(p_A\) would make any one of these decisions (to buy or sell a share in a particular minute) is 50%, if we assumed that the cost was additive over iterations, then we would conclude that this AI is only twice as costly as independent samples from \(p_A\) would be. While independent quantilization would not lead to this outcome, this hypothetical scenario proves that assuming that the cost function is additive can be very bad.
Questions for future research
Are there reasonable lessrestrictive constraints on cost that allow something like independent quantilization?
Intuitively, it seems like independent quantilization is usually not dangerous. Perhaps there is some cost constraint that allows independent quantilization (or something like it) while forbidding the UFAI encoder. We want something less restrictive than the original cost constraint, but more restrictive than the additive cost constraint.
Can we use this framework to define conservative classifiers (and other learners)?
We would like to create classifiers that learn from training data and, when deciding what to do with a new example, either classify it (if it looks like the training data) or refuse to classify it (if it looks quite different from the training data). For example, if we train an AI to classify smiling humans as positive and frowning humans as negative, perhaps we want it to refuse to classify molecular smiley faces as either positive or negative. These classifiers should usually classify points when they come from the same distribution as the training data, and they should obey some kind of conservative cost constraint. I am working on writing up a solution to this for a future forum post.
When do “ordinary” \(p_A\) distributions have high cost?
It could be that most ordinary actions that people take are either very good or very bad (and a superintelligence could predict which one it is), due to butterfly effects. In this case, the expected cost of sampling from fairly ordinary \(p_A\) distributions is actually quite high, but it is cancelled out by an equally high expected benefit. However, a quantilizer may preserve most of the expected cost of \(p_A\) without preserving the expected benefit. How much of a problem is this in practice, and how can it be reduced (e.g. is it possible to use false thermodynamic miracles to interfere with butterfly effects)?
Can we use something like a quantilizer to “win”?
How could an organization with access to superintelligent AI use a conservative agent (such as a quantilizer or a conservative classifier) to drastically reduce existential risk (e.g. by shutting down rogue AI projects)? It would be useful to know what kind of capabilities would be necessary to solve the strategic situation, so that we can see if a conservative agent can deliver these capabilities.
