Optimal and Causal Counterfactual Worlds

post by Scott Garrabrant 1534 days ago | Sam Eisenstat, Abram Demski, Daniel Dewey, Nate Soares and Patrick LaVictoire like this | 3 comments

Let $$L$$ denote the language of Peano arithmetic. A (counterfactual) world $$W$$ is any subset of $$L$$. These worlds need not be consistent. Let $$\mathcal{W}$$ denote the set of all worlds. The actual world $$W_\mathbb{N}\in\mathcal{W}$$ is the world consisting of all sentences that are true about $$\mathbb{N}$$.

Consider the function $$C:L\rightarrow\mathcal{W}$$ which sends the sentence $$\phi$$ to the world we get by “correctly” counterfactually assuming $$\phi$$. The function $$C$$ is not formally defined, because we do not yet have a satisfactory theory of logical counterfactuals. Hopefully we all agree that $$\phi\in C(\phi)$$ and $$\phi\in W_\mathbb{N}\Rightarrow C(\phi)=W_\mathbb{N}$$.

Given an (infinite) directed acyclic graph $$G$$, and a map $$v$$ from sentences to vertices of $$G$$, we say that $$C$$ is consistent with $$G$$ and $$v$$ if $$C(\phi)=C(\psi)$$ whenever $$v(\phi)=v(\psi)$$, and whenever $$W_{\mathbb{N}}$$ and $$C(\phi)$$ disagree on a sentence $$\psi$$ there exists some causal chain $$\psi_1,\ldots,\psi_n$$ such that:

- $$v(\psi_1)=v(\phi)$$,
- $$\psi_n=\psi$$,
- $$W_{\mathbb{N}}$$ and $$C(\phi)$$ disagree on every $$\psi_i$$, and
- $$v(\psi_i)$$ is a parent of $$v(\psi_{i+1})$$.

These conditions give a kind of causal structure in which differences between $$W_{\mathbb{N}}$$ and $$C(\phi)$$ must propagate through the graph $$G$$.

Given a function $$f:\mathcal{W}\rightarrow\mathbb{R}$$, we say that $$C$$ optimizes $$f$$ if for every world $$W$$ and every sentence $$\phi\in W$$ with $$W\neq C(\phi)$$ we have $$f(C(\phi))>f(W)$$.

Many approaches to logical counterfactuals can be described either as choosing the optimal world (under some function) in which $$\phi$$ is true or as observing the causal consequences of setting $$\phi$$ to be true.
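
As a rough illustration of the definitions above (not part of the original post), here is a minimal Python sketch on a finite toy model. Everything in it is an assumption made for illustration: the three-sentence language `SENTENCES`, the actual world `ACTUAL`, the maps `C_good` and `C_bad`, the scores behind `f`, and the graph `EDGES` with vertex map `V`. The sketch does not verify that the graph is acyclic; the toy graph is acyclic by inspection.

```python
from itertools import chain, combinations

# Hypothetical toy language: three atomic "sentences" standing in for L.
SENTENCES = ("a", "b", "c")
ACTUAL = frozenset("a")  # toy stand-in for the actual world W_N


def all_worlds(sentences):
    """Every subset of the toy language (the toy analogue of the set of all worlds)."""
    subsets = chain.from_iterable(
        combinations(sentences, k) for k in range(len(sentences) + 1)
    )
    return [frozenset(s) for s in subsets]


def satisfies_basic_axioms(C, actual):
    """phi is in C(phi), and C(phi) is the actual world whenever phi is actually true."""
    return all(
        phi in world and (phi not in actual or world == actual)
        for phi, world in C.items()
    )


def optimizes(C, f, worlds):
    """f(C(phi)) > f(W) for every world W containing phi with W != C(phi)."""
    return all(f(C[phi]) > f(W) for W in worlds for phi in W if W != C[phi])


def consistent_with(C, actual, edges, v):
    """Check consistency of C with a graph given as a set of (parent, child) edges."""
    # Sentences sent to the same vertex must get the same counterfactual world.
    for phi in C:
        for psi in C:
            if v[phi] == v[psi] and C[phi] != C[psi]:
                return False
    # Every disagreement must be reachable by a chain of disagreeing sentences.
    for phi in C:
        disagree = actual ^ C[phi]  # sentences on which the two worlds differ
        reached = {psi for psi in disagree if v[psi] == v[phi]}
        frontier = list(reached)
        while frontier:
            cur = frontier.pop()
            for nxt in disagree - reached:
                if (v[cur], v[nxt]) in edges:
                    reached.add(nxt)
                    frontier.append(nxt)
        if reached != disagree:
            return False
    return True


WORLDS = all_worlds(SENTENCES)
SCORES = {frozenset("a"): 3, frozenset("ab"): 2, frozenset("abc"): 1}


def f(world):
    """Toy score: worlds not listed in SCORES get 0."""
    return SCORES.get(world, 0)


C_good = {"a": frozenset("a"), "b": frozenset("ab"), "c": frozenset("abc")}
C_bad = {"a": frozenset("a"), "b": frozenset("bc"), "c": frozenset("abc")}

# One vertex per world in the image of C_good; each sentence maps to its own world.
V = dict(C_good)
EDGES = {
    (frozenset("ab"), frozenset("a")),
    (frozenset("abc"), frozenset("a")),
    (frozenset("abc"), frozenset("ab")),
}

print(satisfies_basic_axioms(C_good, ACTUAL))
print(optimizes(C_good, f, WORLDS))
print(consistent_with(C_good, ACTUAL, EDGES, V))
print(optimizes(C_bad, f, WORLDS))
```

In this toy, `C_good` passes all three checks, while `C_bad` fails `optimizes` for this particular `f`: the world `{a, b, c}` contains `b` but scores higher than `C_bad["b"]`.
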
The purpose of this post is to prove that these frameworks are actually equivalent, and to provide a strategy for possibly showing that no attempt at logical counterfactuals which could be described within either framework could ever be what we mean by “correct” logical counterfactuals.

A nontrivial cycle in $$C$$ is a list of sentences $$\phi_1,\ldots,\phi_n$$ such that $$\phi_{i}\in C(\phi_{i+1})$$ for each $$i<n$$, $$\phi_n\in C(\phi_1)$$, and the worlds $$C(\phi_i)$$ are not all the same.

Given a partial order $$\succ$$, we say that $$C$$ optimizes $$\succ$$ if for every world $$W$$ and every sentence $$\phi\in W$$ with $$W\neq C(\phi)$$ we have $$C(\phi)\succ W$$.

Our main result is that the following are equivalent:

1. $$C$$ optimizes $$f$$ for some function $$f$$.
2. $$C$$ is consistent with $$G$$ and $$v$$ for some DAG $$G$$ and map $$v$$.
3. $$C$$ has no nontrivial cycles.
4. $$C$$ optimizes $$\succ$$ for some partial order $$\succ$$.

Proof:

1 $$\Rightarrow$$ 2: Construct the graph $$G$$ with a vertex for every world in the image of $$C$$. The map $$v$$ sends $$\phi$$ to the vertex associated with $$C(\phi)$$. Insert an edge to $$W_{\mathbb{N}}$$ from every other vertex. Insert an edge from the vertex associated with $$W_1\neq W_{\mathbb{N}}$$ to the vertex associated with $$W_2\neq W_{\mathbb{N}}$$ whenever $$f(W_1)<f(W_2)$$. This graph is acyclic, since $$f$$ strictly increases along every edge between vertices other than $$W_{\mathbb{N}}$$, and $$W_{\mathbb{N}}$$ has no outgoing edges. By construction, $$C(\phi)=C(\psi)$$ whenever $$v(\phi)=v(\psi)$$. Now suppose $$W_{\mathbb{N}}$$ and $$C(\phi)$$ disagree on a sentence $$\psi$$. If $$\psi\in W_{\mathbb{N}}$$, then $$C(\psi)=W_{\mathbb{N}}$$, so there is an edge from $$v(\phi)$$ to $$v(\psi)$$, which gives a length 1 path from $$v(\phi)$$ to $$v(\psi)$$. Otherwise $$\psi\in C(\phi)$$. If $$C(\psi)=C(\phi)$$, then $$v(\psi)=v(\phi)$$ and the chain consisting of $$\psi$$ alone suffices. If $$C(\psi)\neq C(\phi)$$, then since $$C$$ optimizes $$f$$ we have $$f(C(\psi))>f(C(\phi))$$. Again, this means that you get a length 1 path from $$v(\phi)$$ to $$v(\psi)$$. Therefore $$C$$ is consistent with $$G$$ and $$v$$.

2 $$\Rightarrow$$ 3: Consider a nontrivial cycle $$\phi_1,\ldots,\phi_n$$. If any of these sentences were in $$W_\mathbb{N}$$, then they would all be true in $$W_\mathbb{N}$$, since $$C(\phi)=W_\mathbb{N}$$ whenever $$\phi\in W_\mathbb{N}$$. Every $$C(\phi_i)$$ would then equal $$W_\mathbb{N}$$, which would contradict the fact that $$\phi_1,\ldots,\phi_n$$ is a nontrivial cycle. Otherwise, none of the $$\phi_i$$ is in $$W_\mathbb{N}$$. Since $$\phi_i\in C(\phi_{i+1})$$, the worlds $$C(\phi_{i+1})$$ and $$W_\mathbb{N}$$ differ on $$\phi_i$$, so there must be a path from $$v(\phi_{i+1})$$ to $$v(\phi_i)$$ in $$G$$.
Concatenating these paths together would give a cycle in $$G$$ unless the $$v(\phi_i)$$ are all the same vertex. However, if all of the $$v(\phi_i)$$ are the same vertex, then all of the $$C(\phi_i)$$ would be the same world, which would contradict the fact that $$\phi_1,\ldots,\phi_n$$ is a nontrivial cycle.

3 $$\Rightarrow$$ 4: Consider the partial order on the image of $$C$$ constructed by saying that $$W_1\succ W_2$$ if $$W_1\neq W_2$$ and $$W_1=C(\phi)$$ for some $$\phi\in W_2$$, and taking the transitive closure of these rules. If this were not a partial order, it would have to be because we created a cycle of worlds $$W_1,\ldots,W_n$$, such that each $$W_i=C(\phi_i)$$ with $$\phi_i\in W_{i+1}$$ and $$\phi_n\in W_{1}$$. This would be a nontrivial cycle in $$C$$. Extend this partial order to all of $$\mathcal{W}$$ by saying that if $$W_1$$ is in the image of $$C$$ and $$W_2$$ is not, then $$W_1\succ W_2$$. Note that $$C$$ optimizes this partial order by definition.

4 $$\Rightarrow$$ 1: Let $$C$$ optimize the partial order $$\succ$$. Consider the restriction of $$\succ$$ to the image of $$C$$. This is a partial order on a countable set. Enumerate the worlds in this partial order as $$W_1,W_2,\ldots$$. Embed the partial order into $$\mathbb{R}$$ by repeatedly defining $$f(W_n)$$ such that: $$f(W_n)>0$$, $$f(W_n)>f(W_i)$$ for all $$i<n$$ with $$W_n\succ W_i$$, and $$f(W_n)<f(W_i)$$ for all $$i<n$$ with $$W_i\succ W_n$$. Finally, set $$f(W)=0$$ for every world $$W$$ that is not in the image of $$C$$. Then $$C$$ optimizes $$f$$.

by Charlie Steiner 1532 days ago | Nate Soares and Patrick LaVictoire like this

This is an interesting way of thinking about logical counterfactuals. It all seems to come down to what desiderata you desiderate.

We might assign a DAG to $$W_\mathbb{N}$$ by choosing a reference theorem-prover, which uses theorems/syntactic rules to generate more theorems. We then draw an edge to each sentence in $$W_\mathbb{N}$$ from its direct antecedents in its first proof in the reference theorem-prover. One option is that $$C(\phi)$$ would only be allowed to disagree with $$W_\mathbb{N}$$ on sentences that are descendants of $$\neg \phi$$.
But this doesn’t specify your $$G$$, because it doesn’t assign a place in the graph to sentences not in $$W_\mathbb{N}$$. If we try a similar theorem-prover assignment to specify $$C(\phi)$$ with $$\phi$$ false under PA, we’ll get very silly things as soon as the theorem-prover proves $$\perp$$ and explodes; our graph will no longer follow the route analogous to the one for $$W_\mathbb{N}$$. Is there some way to enforce that analogy?

Ideally, what I think I desiderate is more complicated than only disagreeing on $$W_\mathbb{N}$$-descendants of $$\neg \phi$$. For example, if $$\phi$$ is a counterexample to a universal statement that is not a descendant of any individual cases, I’d like the universal statement to be counterfactually disproved.

by Sam Eisenstat 1463 days ago

Condition 4 in your theorem coincides with Lewis’ account of counterfactuals. Pearl cites Lewis, but he also criticizes him on the grounds that the ordering on worlds is too arbitrary. In the language of this post, he is saying that condition 2 arises naturally from the structure of the problem and that condition 4 derives from the deeper structure corresponding to condition 2.

I also noticed that the function $$f$$ and the partial order $$\succ$$ can be read as “time of first divergence from the real world” and “first diverges before”, respectively. This makes the theorem a lot more intuitive.

by Alex Appel 380 days ago

Yeah, when I went back and patched up the framework of this post to be less logical-omniscience-y, I was able to get $$2\to 3\to 4\to 1$$, but 2 is a bit too strong to be proved from 1, because my framing of 2 is just about probability disagreements in general, while 1 requires $$W$$ to assign probability 1 to $$\phi$$.
