If you’re thinking about the counterfactual world where you do X in the process of deciding whether to do X, let’s call that a first-person counterfactual. If you’re thinking about it in the process of deciding whether another agent A should have done X instead of Y, let’s call that a third-person counterfactual. The definition of, e.g., modal UDT uses first-person counterfactuals, but when we try to prove a theorem showing that modal UDT is “optimal” in some sense, then we need to use third-person counterfactuals.
UDT’s first-person counterfactuals are logical counterfactuals, but our optimality result evaluates UDT by using physical third-party counterfactuals: it asks, would another agent have done better, not, would a different action by the same agent have led to a better outcome? The former is easier to analyze, but the latter seems to be what we really care about. Nate’s recent post on “global UDT” points towards turning UDT into a notion of third-party counterfactuals, and describes some problems. In this post, I’ll give a fuller UDT-based notion of logical third-party counterfactuals, which at least fails visibly (returns an error) in the kinds of cases Nate describes. However, in a follow-up post I’ll give an example where this definition returns a non-error value which intuitively seems wrong.
Before I start, a historical side note: When Kenny Easwaran visited us for two days and we proved the UDT optimality result, the reason we decided to use physical counterfactuals wasn’t actually that we thought these were the better kind of counterfactuals. Rather, we actually thought explicitly about the problem of physical vs. logical third-person counterfactuals on the first morning of Kenny’s visit, and decided to look at the physical counterfactuals case because it seemed easier to reason about. That turned out to be a great decision because, to our surprise, we very quickly ended up proving the first version of what later became the modal UDT optimality result!
But today, let’s talk about logical counterfactuals. As Nate points out in his Global UDT post, there’s a sort of duality between first-person and third-person counterfactuals: given a good third-person notion of counterfactuals, you can try to turn it into a first-person notion by writing an agent that evaluates actions according to it, and given a first-person notion you can try to turn it into a third-person notion. So is there a way to turn, say, the first-person counterfactuals of modal UDT into a way to evaluate what would have happened in a certain universe if a certain agent had taken a different action?
Nate’s post describes an algorithm, GlobalUDT(U,A), which tries to tell you what agent \(A()\) should have done in order to achieve the best outcome in universe \(U()\). Here, I want to ask a more intermediate question: What would have happened if \(A()\) had chosen a different action? Of course, we can then say that the agent should have taken the action that leads to the best possible outcome in this sense, but one advantage of my proposal is that it sometimes says, “I don’t know what would have happened in that case”; in particular, in the cases Nate discusses in his post, my proposal would say that it doesn’t have an answer, rather than giving a wrong answer. (However, in a follow-up post I’ll show that there are cases in which my proposal gives an intuitively incorrect answer.)
So here’s my proposal. Suppose that \(\vec A\) is an \(m\)-action agent, that is, a “provably mutually exclusive and exhaustive” (p.m.e.e.) sequence of \(m\) closed modal formulas \((A_1,\dotsc,A_m)\), where \(A_i\) is interpreted as “the agent takes action \(i\)”. “P.m.e.e.” means that it’s provable that exactly one of the \(m\) formulas is true. Similarly, \(\vec U\) is an \(n\)-outcome universe, i.e., a p.m.e.e. sequence \((U_1,\dotsc,U_n)\) where \(U_j\) means “the \(j\)-th best outcome obtains”.
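Spelled out, one way to write the p.m.e.e. condition for \(\vec A\) is the following pair of claims. This is only an unpacking of “provably exactly one is true”, and it assumes that “provable” means provable in GL, as in the rest of this post:

\[\mathrm{GL}\vdash A_1\vee\dotsb\vee A_m \qquad\text{and}\qquad \mathrm{GL}\vdash\neg(A_i\wedge A_{i'})\ \text{ for all } i\neq i'.\]

The same two conditions, with \(U_j\) in place of \(A_i\) and \(n\) in place of \(m\), express that \(\vec U\) is p.m.e.e.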
We say that, according to this notion of counterfactuals, action \(i\) leads to outcome \(j\) if (i) \(\mathrm{GL}\vdash A_i\to U_j\), and (ii) \(\mathrm{GL}\nvdash \neg A_i\). So for every \(i\), there are three possible cases:
 If there’s exactly one \(j\) such that \(\mathrm{GL}\vdash A_i\to U_j\), then we say that action \(i\) leads to outcome \(j\).
 If \(\mathrm{GL}\vdash\neg A_i\), then we don’t know what would have happened if the agent had taken action \(i\), because we “don’t have enough counterfactuals”: there is no model of PA in which \(A_i\) is true (we can think of the models of PA as the “impossible possible worlds” we use to evaluate the impact of different actions). In particular, this is the case if we have both \(\mathrm{GL}\vdash A_i\to U_j\) and \(\mathrm{GL}\vdash A_i\to U_{j'}\), for \(j\neq j'\), since this implies \(\mathrm{GL}\vdash\neg A_i\) by the assumption that \(U_j\) and \(U_{j'}\) are provably mutually exclusive.
 If there’s no \(j\) such that \(\mathrm{GL}\vdash A_i\to U_j\), then we don’t know what would have happened if the agent had taken action \(i\), because we have “ambiguous counterfactuals”: there are some distinct \(j\) and \(j'\) such that there’s a model of PA in which \(A_i\wedge U_j\), and a different model in which \(A_i\wedge U_{j'}\). (We know that there is a model in which \(A_i\) is true, because otherwise we’d have \(\mathrm{GL}\vdash\neg A_i\), which would imply \(\mathrm{GL}\vdash A_i\to U_j\) for every \(j\).)
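The three cases can be sketched in code for small examples. The following is a minimal sketch, not part of the original proposal: it assumes the standard fact that a closed (letterless) modal formula is a theorem of GL iff it holds at every world of the linear frame \(0, 1, 2, \dotsc\) (where world \(w\) sees every \(k \lt w\)), and that checking worlds up to the box-nesting depth of the formula is enough. The tuple encoding and the names `holds`, `depth`, `gl_proves` and `counterfactual` are made up for illustration. It works with Kripke worlds rather than with models of PA.

```python
# Closed modal formulas as nested tuples (hypothetical encoding for this sketch).
TOP = ("top",)
BOT = ("bot",)

def NOT(f): return ("not", f)
def IMP(f, g): return ("imp", f, g)
def BOX(f): return ("box", f)

def holds(f, w):
    """Truth of f at world w of the linear frame 0, 1, 2, ... (w sees every k < w)."""
    op = f[0]
    if op == "top":
        return True
    if op == "bot":
        return False
    if op == "not":
        return not holds(f[1], w)
    if op == "imp":
        return (not holds(f[1], w)) or holds(f[2], w)
    if op == "box":
        return all(holds(f[1], k) for k in range(w))
    raise ValueError(f"unknown connective: {op}")

def depth(f):
    """Nesting depth of boxes in f."""
    if f[0] in ("top", "bot"):
        return 0
    inner = max(depth(g) for g in f[1:])
    return inner + 1 if f[0] == "box" else inner

def gl_proves(f):
    # Assumption: for letterless f, truth at worlds 0..depth(f) decides GL-provability.
    return all(holds(f, w) for w in range(depth(f) + 1))

def counterfactual(A, U, i):
    """Classify action i (1-based); assumes A and U are p.m.e.e. sequences."""
    a = A[i - 1]
    if gl_proves(NOT(a)):
        return "not enough counterfactuals"
    hits = [j for j, u in enumerate(U, start=1) if gl_proves(IMP(a, u))]
    if len(hits) == 1:
        return hits[0]  # action i leads to outcome hits[0]
    return "ambiguous counterfactuals"

# An agent that can never take action 1: GL proves "not bottom".
print(counterfactual((BOT, BOT, TOP), (TOP, BOT), 1))  # not enough counterfactuals

# Action 2 is possible, but neither outcome provably follows from it.
A2 = (BOX(BOT), NOT(BOX(BOT)))
U2 = (BOX(BOX(BOT)), NOT(BOX(BOX(BOT))))
print(counterfactual(A2, U2, 2))  # ambiguous counterfactuals
```

In the second example, action 1 leads to outcome 1, since \(\square\bot\to\square\square\bot\) is a theorem of GL.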
Now, for example, if we consider Nate’s example of an agent that has three possible actions, but always takes the third one (i.e., \(\vec A \equiv (A_1,A_2,A_3) \equiv (\bot,\bot,\top)\)), then it’s clear that our third-person counterfactuals will not fail silently, but rather give the reasonable answer that it’s hard to say what outcome the agent would have achieved if it had returned a different value: for example, say that \(U_{14} \equiv \top\wedge(\bot\vee\neg\top)\); are some of the \(\top\)’s and \(\bot\)’s in the definition of this universe invocations of the agent? Which ones? We might hope that there’s a notion of third-party counterfactuals which can answer questions like this about the real world, but presumably it would need to make more use of the more complicated structure of the real universe; as posed, the question doesn’t seem to have a good answer.
But when we apply this notion to modal UDT, it returns a non-error answer sufficiently often to allow us to show an at least superficially sensible (if rather trivial!) optimality result.
Let’s say that a pair \((\vec A,\vec U)\) is “fully informative” if every \(i\) leads to some \(j\) according to our notion of counterfactuals. Then, given a fully informative pair, we can say that \(\vec A\) is optimal (according to our notion of counterfactuals!) iff the outcome that \(\vec A\)’s actual action leads to is optimal among the outcomes achievable by any of the available actions.
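As a small illustration of these two definitions, here is a sketch that takes the “leads to” relation as given. The dictionary `leads_to` (action number to outcome number, or `None` when the counterfactual returns an error) and both function names are hypothetical; outcomes are numbered so that 1 is best, as in the post.

```python
def is_fully_informative(leads_to):
    """True iff every action leads to some outcome (no None entries)."""
    return all(j is not None for j in leads_to.values())

def is_optimal(leads_to, actual_action):
    """Optimality in the sense above; only defined for fully informative pairs."""
    if not is_fully_informative(leads_to):
        raise ValueError("optimality is only defined for fully informative pairs")
    # Outcome 1 is best, so the best achievable outcome has the smallest number.
    return leads_to[actual_action] == min(leads_to.values())

# Three actions: action 2 leads to the best achievable outcome.
leads_to = {1: 2, 2: 1, 3: 3}
print(is_optimal(leads_to, 2))  # True
print(is_optimal(leads_to, 1))  # False
```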
Now it’s rather straightforward to see that modal UDT is optimal, in this sense, on a universe \(\vec U\) whenever the pair \((\vec{\mathrm{UDT}}(\vec U),\vec U)\) is fully informative. Recall the way that modal UDT works:
 For every outcome \(j = 1\) to \(n\) (from best to worst):
   For every action \(i = 1\) to \(m\):
     If \(\square(A_i\to U_j)\), then take action \(i\).
 If you’re still here, take some default action.
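The search order above can be sketched as a plain loop. This is only an illustration under a strong simplifying assumption: the box \(\square(A_i\to U_j)\) is replaced by a hypothetical oracle `provable(i, j)` supplied by the caller, so the sketch does not capture the self-reference (the fixed point) in the actual definition of modal UDT. The function name and the `default` parameter are likewise made up here.

```python
from typing import Callable

def modal_udt_choice(m: int, n: int,
                     provable: Callable[[int, int], bool],
                     default: int = 1) -> int:
    """Return the first action i (scanning outcomes best-first) with provable(i, j)."""
    for j in range(1, n + 1):        # outcomes, from best (1) to worst (n)
        for i in range(1, m + 1):    # actions, in order
            if provable(i, j):       # stands in for "box(A_i -> U_j)"
                return i
    return default                   # no implication was found

# Hypothetical oracle: action 1 provably yields outcome 2, action 2 yields outcome 1.
table = {(1, 2), (2, 1)}
print(modal_udt_choice(2, 2, lambda i, j: (i, j) in table))  # 2
```

Because outcomes are scanned in the outer loop, the search commits to the best outcome for which any implication is found, regardless of the order of the actions.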
Clearly, in the fully informative case, this algorithm will take the optimal action (in the sense we use here): Suppose that \(j\) is optimal, and \(i\) leads to \(j\). The search will not find a proof of an implication \(A_{i'}\to U_{j'}\) with \(j' \lt j\), because then action \(i'\) would lead to the better outcome \(j'\) (full informativeness gives \(\mathrm{GL}\nvdash\neg A_{i'}\)), and \(j\) wouldn’t be optimal according to our definition; and the search will terminate when considering the pair \((j,i)\) at the latest; so modal UDT will return some action \(i^*\) for which \(\mathrm{GL}\vdash A_{i^*}\to U_j\).
I’d like to say that this covers all the cases in which we would expect modal UDT to be optimal, but unfortunately that’s not quite the case. Suppose that there are two actions, \(A_1\) and \(A_2\), and two outcomes, \(U_1\) and \(U_2\). In this case, it’s consistent that \(i = 1\) leads to \(j = 1\), but we don’t have enough counterfactuals about \(i = 2\), that is, \(\mathrm{GL}\vdash\neg A_2\) (implying that \((\vec A,\vec U)\) isn’t fully informative). This is because modal UDT doesn’t have an explicit “playing chicken” step that would make it take action \(A_2\) if it can prove that it doesn’t take this action. Now, if we did not have \(\mathrm{GL}\vdash A_1\to U_1\), then \(\mathrm{GL}\vdash\neg A_2\) would imply that the agent would take action \(2\) (because \(\neg A_2\) implies \(A_2\to U_1\)), which would lead to a contradiction (the agent takes an action that it provably doesn’t take), so we can rule out that case; but the case of \(\mathrm{GL}\vdash A_1\to U_1\) plus \(\mathrm{GL}\vdash\neg A_2\) is consistent.
So let’s say that a pair \((\vec A,\vec U)\) is “sufficiently informative” if either it’s fully informative or if there is some action \(i\) such that \(\mathrm{GL}\vdash A_i\to U_1\). Then we can say that \(\vec A\) is optimal if either (i) \((\vec A,\vec U)\) is fully informative and \(\vec A\) is optimal in the sense discussed earlier, or (ii) \(\mathbb{N}\vDash U_1\), that is, the agent actually obtains the best possible outcome. With these definitions, we can show that modal UDT is optimal whenever \((\vec{\mathrm{UDT}}(\vec U),\vec U)\) is sufficiently informative.
The reasoning is simple. In the fully informative case, our earlier proof works. In the other case, there’s some \(i\) such that \(\mathrm{GL}\vdash A_i\to U_1\), so the agent’s search is certainly going to stop when it considers \(A_i\to U_1\) at the latest; in other words, it’s going to stop at some \(i^* \le i\) such that \(\mathrm{GL}\vdash A_{i^*}\to U_1\), and the agent is going to output that action \(i^*\); i.e., we’ll have \(\mathbb{N}\vDash A_{i^*}\). But since GL is sound, we also have \(\mathbb{N}\vDash A_{i^*}\to U_1\), and hence \(\mathbb{N}\vDash U_1\), showing optimality in the extended sense.
It’s not surprising that modal UDT is “optimal” in this sense, of course! Nevertheless, as a conceptual tool, it seems useful to have this definition of logical third-person counterfactuals, to complement the first-person notion of modal UDT.
However, my not-so-secret agenda for going through this in detail is that in a follow-up post, I’ll show that there are universes \(\vec U\) such that \((\vec{\mathrm{UDT}}(\vec U),\vec U)\) is fully informative, but UDT still does the intuitively incorrect thing, because the notion of counterfactuals (and hence the notion of optimality) I’ve defined in this post doesn’t agree with intuition as well as we’d like. This failure turns out to be clearer in the context of the third-person counterfactuals described in this post than in modal UDT’s first-person ones.
