Updateless decision theory was informally defined by Wei Dai in terms of logical conditional expected utility, where the condition corresponds to an algorithm (the agent) producing a given output (action or policy). This kind of conditional expected values can be formalised by optimal predictors. However, since optimal predictor systems which are required to apply optimal predictors to decision theory generally have random advice, we need counterfactuals well-defined for random algorithms i.e. algorithms that produce different outputs with different probabilities depending on internal coin tosses. We propose to define these counterfactuals by a generalization of the notion of conditional expected utility which amounts to linear regression of utility with respect to the probabilities of different outputs in the space of “impossible possible worlds.” We formalise this idea by introducing “relative optimal predictors,” prove the analogue of the conditional probability formula (which takes matrix form) and uniqueness theorems.
Motivation
We start by explaining the analogous construction in classical probability theory and proceed to defining the logical counterpart in the Results section.
Consider \(\zeta\) a probability measure on some space, a random variable \(u\) representing utility, a finite set \(\mathcal{A}\) representing possible actions and another random variable \(p\) taking values in \([0,1]^{\mathcal{A}}\) and satisfying \(\sum_{a \in \mathcal{A}} p_a = 1\) representing the probabilities of taking different actions. For a deterministic algorithm, \(p\) takes values \(\{0,1\}^{\mathcal{A}}\) allowing defining conditional expected utility as
\[u_a := \operatorname{E}_\zeta[u \mid p_a = 1] = \frac{\operatorname{E}_\zeta[u p_a]}{\operatorname{E}_\zeta[p_a]}\]
In the general case, it is tempting to consider
\[\operatorname{E}_{\zeta \ltimes p}[u \mid a] = \frac{\operatorname{E}_\zeta[u p_a]}{\operatorname{E}_\zeta[p_a]}\]
where \(\zeta \ltimes p\) stands for the semidirect product of \(\zeta\) with \(p\), the latter regarded as a Markov kernel with target \(\mathcal{A}\). However, this would lead to behavior similar to EDT since conditioning by \(a\) is meaningful even for a single “world” (i.e. completely deterministic \(u\) and \(p\)). Instead, we select \(u^* \in {\mathbb{R}}^{\mathcal{A}}\) that minimizes \(\operatorname{E}_\zeta[(u - p^t u^*)^2]\) (we regard elements of \({\mathbb{R}}^{\mathcal{A}}\) as column vectors so \(p^t\) is a row vector). This means \(u^*\) has to satisfy the matrix equation
\[\operatorname{E}_\zeta[p p^t] u^* = \operatorname{E}_\zeta[u p]\]
The solution to this equation is only unique when \(\operatorname{E}_\zeta[p p^t]\) is non-degenerate. This corresponds to requiring positive probability of the condition for usual conditional expected values. In case \(p\) takes values in \(\{0,1\}^{\mathcal{A}}\), \(u^*\) is the usual conditional expected value.
Preliminaries
|