Update: This version of the Trolljecture fails too; see the counterexample due to Sam.
In An Informal Conjecture on Proof Length and Logical Counterfactuals, Scott discussed a “trolljecture” from a MIRI workshop, which attempted to justify (some) logical counterfactuals based on the lengths of proofs of various implications. Then Sam produced a counterexample, and Benja pointed to another counterexample.
But at the most recent MIRI workshop, I talked with Jonathan Lee and Holger Dell about a related way of evaluating logical counterfactuals, and we came away with a revived trolljecture!
Let’s say we have a recursively enumerable set of axioms \(S\) (in the language of PA) and a statement \(\varphi\). Then we define the “distance” \(d_S(\varphi)\) as the length of the shortest proof of \(\varphi\) using the axioms \(S\). (We set \(d_S(\varphi)=\infty\) if there is no such proof.)
And now we define a valuation \[\nu_S(\varphi)=\frac{d_S(\neg\varphi)}{d_S(\neg\varphi)+d_S(\varphi)}\] which we will interpret as the “probability” of the logical counterfactual that \(S\) implies \(\varphi\). (This is undefined when \(\varphi\) is undecidable given \(S\); we could define it as a limit when searching over proofs of bounded length, in which case it would equal 0.5 in such cases. But as we shall see, this is not the case we care about.)
This valuation has the trivial properties that \(\nu_S(\varphi)=1\) if \(S\vdash\varphi\) and \(S\not\vdash\neg\varphi\), and similarly \(\nu_S(\varphi)=0\) if \(S\not\vdash\varphi\) and \(S\vdash\neg\varphi\). But what’s more interesting are the properties that this valuation has if \(S\) is inconsistent.
If there is a short proof of \(\varphi\) from \(S\), and a much longer proof of \(\neg\varphi\) from \(S\), then \(\nu_S(\varphi)\approx 1\). Moreover, we have an approximate coherence condition \[\nu_S(\varphi\wedge\psi) + \nu_S(\varphi\wedge\neg\psi) \approx \nu_S(\varphi)\] so long as either \(\varphi\) or \(\psi\) has a proof from \(S\) that is much shorter than the proof of \(\bot\) from \(S\). (See the notes below.)
And this gets some key examples “correct” if we add the axioms that we actually care about (and which make the problem truly counterfactual). For instance, let’s take Sam’s counterexample to the original Trolljecture:
def U():
if A() = 1:
if PA is consistent:
return 10
else:
return 0
else: return 5
def A():
if PA ⊢ A() = 1 → U() = 10:
return 1
else: return 2
In that case, \(A()\) actually returns 2 and \(U()\) returns 5, but PA proves the “incorrect” counterfactual that \(A()=1\to U()=0\) and does not prove the “correct” counterfactual that \(A()=1\to U()=10\). (Read the linked post if this puzzles you.)
But what if we consider reasoning from the inconsistent premises that PA is consistent and yet \(A()=10\)? In that case, I claim that \(\nu_{\textsf{PA+Con(PA)}+(A()=1)}(U()=10)\) is near 1 while \(\nu_{\textsf{PA+Con(PA)}+(A()=1)}(U()=0)\) is near 0.
To see this, note that there is a short proof from these axioms that \(U()=10\) (look at the definition of \(U\), apply the axiom that \(A()=1\), apply the axiom \(\textsf{Con(PA)}\), and conclude that \(U()=10\)), but the shortest proof from those axioms that \(U()=0\) is longer: we would need to prove Godel’s Second Incompleteness Theorem to show that \(\textsf{Con(PA)}\to A()=2\), thus \(\neg \textsf{Con(PA)}\), and thus \(U()=0\).
(Of course, I could be missing a much shorter proof, but I think this particular example goes through. The same reasoning also holds for the example Benja mentioned.)
Therefore, I will go out on a limb and propose the following:
Revived Trolljecture: If we consider a modal agent \(A()\) inside a modal universe \(U()\), then for sufficiently large \(N\), the “correct” logical counterfactuals of the form “if \(A()=a\), then \(U()=u\)” are the ones such that the valuations \(\nu_{\textsf{PA+N}+(A()=a)}(U()=u)\) are high. In particular, if for \(N\) sufficiently large, there is a short proof in \(\textsf{PA+N}\) of \(A()=a\to U()=u\) and no comparably short proof of \(A()\neq a\), then \(u\) is unique with this property, and this is the “correct” counterfactual utility for \(A()=a\).
I’m not very confident that this is true as stated, but I have some reasonable hope that either a counterexample will be enlightening about logical counterfactuals, or that an improved analogue of the valuation will in fact be quite useful.
Notes on this valuation:
This is a claim about “third person” counterfactuals, not those done by an agent itself. Obviously, an agent that makes decisions based on this valuation will not be definable in modal logic, and I don’t yet make any claims about what such an agent might do in the PA versions of such problems.
The approximate coherence is of the following form: if the shortest proof of contradiction from \(S\) has length \(N\), and one of the statements \(\varphi\), \(\neg\varphi\), \(\psi\), \(\neg\psi\) has a proof from \(S\) of length \(n\ll N\), then \[\nu_S(\varphi\wedge\psi) + \nu_S(\varphi\wedge\neg\psi) = \nu_S(\varphi) + O(n/N).\] I don’t know a cleverer proof than going through the cases, but each case is straightforward. For instance, if \(d_S(\varphi)=n\), then we must have \(Nn\leq d_S(\neg\varphi) \leq N+1\), and so it follows quickly that \(\nu_S(\varphi) = 1 + O(n/N)\); then, considering the three cases \(d_S(\psi)\lesssim n\), \(d_S(\neg\psi)\lesssim n\), and neither, in each case it is clear that \(\nu_S(\varphi\wedge\psi) = \nu_S(\psi) + O(n/N)\).
It does not, however, seem to be the case that \(\nu_S(\varphi\wedge\psi) + \nu_S(\varphi\wedge\neg\psi) \approx \nu_S(\varphi)\) whenever there is an easily provable relation between \(\varphi\) and \(\psi\). I am not entirely sure that I’m not missing a necessary logical relation, but it seems like we could consider two sentences \(\varphi\) and \(\theta\) whose proofs are “independent” of the contradiction in \(S\) and of each other, and then let \(\psi=\varphi\wedge\theta\) so that \(\psi\to\varphi\) is quickly provable from \(S\). Say that \(d_S(\bot)=N\), \(d_S(\varphi)=N/2\), \(d_S(\theta)=N/3\); if these are independent in that way, we should be able to get \(d_S(\neg\varphi)\approx N\), \(d_S(\neg\theta)\approx N\), \(d_S(\psi)\approx d_S(\varphi) + d_S(\theta)\approx 5N/6\), and \(d_S(\neg\psi)\approx N\). Then \(\nu_S(\varphi)\approx 2/3\), \(\nu_S(\varphi\wedge\psi)\approx \nu_S(\psi) \approx 6/11\), and \(\nu_S(\varphi\wedge\neg\psi)\) will be at least 1/4 since \(d_S(\varphi\wedge\neg\psi)\approx N\) and \(d_S(\neg\varphi\vee\psi)\geq N/3\), so \(\nu_S(\varphi\wedge\psi) + \nu_S(\varphi\wedge\neg\psi)  \nu_S(\varphi) > 6/11+1/42/3>0\). I don’t know whether this could be used to break the revived trolljecture.
If the shortest proof of contradiction from \(S\) has length \(N\), then for any \(\varphi\), \(\frac{1}{N+2}\leq \nu_S(\varphi) \leq \frac{N+1}{N+2}\); the valuations are bounded away from 0 and 1. This may be a feature and not a bug: a mathematician could reason decently confidently about what math would look like in the counterfactual where Fermat’s Last Theorem were false (presuming that its actual proof is quite long), but it is hard to say very much about what math would look like in the counterfactual where \(1+1=3\). However, there are some examples, like naive set theory before Russell’s Paradox, where human mathematicians reasoned mostly coherently despite using axioms that (they did not notice) led quickly to contradictions; the valuation does not capture that aspect of mathematical intuition.
I restricted the new trolljecture to modal universes because it is wellspecified how the stronger axiom schemas \(PA+N\) eventually decide the agent and the universe in the standard model of the natural numbers. If something like it holds up here, then there may be a more general version.
