by Patrick LaVictoire 608 days ago | link | parent Sam Eisenstat produced the following counterexample to a stronger version of the revived trolljecture (one using soundness rather than consistency in the escalated theories): Take a statement $$X$$ in the language of PA such that neither $$X$$ nor $$\neg X$$ has a short proof (less than length $$L$$) in PA+Sound(PA); $$X$$ can be provably true, provably false, or undecidable in that system. Define $$U$$ to equal 10 if $$X\wedge A()=a$$, 0 if $$\neg X\wedge A()=a$$, and 5 if $$A()\neq a$$. Define $$A$$ to equal $$a$$ if there is a short proof in PA (less than length $$L$$) that $$A()=a\to U()=10$$, and $$b$$ otherwise. Clearly $$A()=b$$ and $$U()=5$$. Now I claim that in the formal system PA+Sound(PA)+$$(A()=a)$$, there is a very short proof of $$U()=10$$. And this is the case whether $$X$$ is true, false, or undecidable! In order to prove $$U()=0$$, take the axiom $$A()=a$$. It follows from the definition of $$A$$ that there is a short proof in PA (less than length $$L$$) that $$A()=a\to U()=10$$. And it follows from the soundness of PA that in fact $$A()=a\to U()=10$$, and thus $$U()=10$$. On the other hand, we cannot quickly prove $$U()=0$$, for if we could, then we could prove $$\neg X$$. Things get tricky because we’re using the extra (inconsistent) axiom $$A()=a$$, but intuitively, if $$X$$ is some arithmetical assertion that has nothing to do with the decision problem, then there shouldn’t be any short proof of $$\neg X$$ in this system either, and so the shortest proof of $$U()=0$$ should be at least comparable to length $$L$$.

### NEW DISCUSSION POSTS

The AI defers to anything
 by Paul Christiano on Corrigibility thoughts II: the robot operator | 0 likes

Thus anything that can
 by Stuart Armstrong on Corrigibility thoughts II: the robot operator | 0 likes

Ah, thanks! That seems more
 by Stuart Armstrong on Loebian cooperation in the tiling agents problem | 0 likes

It doesn't mean computation
 by Vladimir Slepnev on Loebian cooperation in the tiling agents problem | 1 like

I'm not sure this would work,
 by Stuart Armstrong on Loebian cooperation in the tiling agents problem | 0 likes

>How can the short term
 by Stuart Armstrong on Humans are not agents: short vs long term | 0 likes

I expect a workable approach
 by Paul Christiano on Corrigibility thoughts II: the robot operator | 0 likes

 by Stuart Armstrong on Corrigibility thoughts II: the robot operator | 0 likes

It is ‘a preference for
 by Stuart Armstrong on Humans are not agents: short vs long term | 0 likes

Note that we don't need to
 by Paul Christiano on ALBA requires incremental design of good long-term... | 0 likes

If I want my boat to travel
 by Paul Christiano on Corrigibility thoughts II: the robot operator | 0 likes

I don't think it's much like
 by Abram Demski on An Approach to Logically Updateless Decisions | 0 likes

Yeah, I like tail dependence.
 by Sam Eisenstat on An Approach to Logically Updateless Decisions | 0 likes

This is basically the
 by Paul Christiano on Cooperative Oracles: Stratified Pareto Optima and ... | 1 like

I think AsDT has a limited
 by Abram Demski on An Approach to Logically Updateless Decisions | 2 likes