Intelligent Agent Foundations Forum
Waterfall Truth Predicates
post by Abram Demski 951 days ago | Benja Fallenstein, Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this | 2 comments

The waterfall-based approaches to the Löbian obstacle offer a way around the finitely terminating sequences of trust that we get by adding towers of soundness schemas to \(PA\). They create a kind of illusion of self-trust by way of a non-well-founded chain of trust.

Another familiar situation where we can normally construct arbitrarily high towers but not a single self-referential system is that of truth predicates. Tarski’s undefinability theorem blocks the existence of a full truth predicate within the same language as the one which it describes. Perhaps a similar waterfall construction can be applied, to get an infinite descending chain of languages.

Extend the language of \(PA\) with a family of truth predicates \(Tr_n\). A Tarski-style approach would assert a T-schema \(Tr_n(\ulcorner \phi \urcorner) \leftrightarrow \phi\) for \(\phi\) that contain only truth predicates indexed strictly lower than \(n\). (\(\ulcorner \phi \urcorner\) is the Gödel number of \(\phi\).) Here, we wish to flip this and assert a T-schema for \(\phi\) that contain only truth predicates indexed strictly higher than \(n\).

This brings to mind Yablo’s Paradox. A contradiction can likely be worked out along those lines, but instead I’ll note that this theory implies the naive soundness waterfall, in which we construct a sequence of theories \(T_{n} = T_{n+1} + Sound(T_{n+1})\). This is because we can use the truth predicate \(Tr_n\) to carry out a proof of soundness for the axioms and inference rules, with the exception of instances of the T-schema involving \(Tr_{m \leq n}\). The naive soundness waterfall is known to be inconsistent, so this theory is inconsistent as well. (Note that I have not checked this in detail, however.)

My idea for fixing the T-schema, then, is to introduce the same \(\psi(n)\) predicate which asserts that \(n\) is not the Gödel number of a proof of contradiction in \(ZFC\). We make the new schema:

  • \(\psi(n) \rightarrow \big[ Tr_n(\ulcorner \phi \urcorner) \leftrightarrow \phi \big]\), where \(\phi\) contains only \(Tr_{m>n}\).

Because we can prove any particular \(\psi(n)\), we can still apply the schema in specific cases. It seems likely that we can still carry out soundness arguments, as well, constructing the consistent version of the soundness waterfall. If so, the theory ends up being unsound as a result.

Here’s a proof that the theory is unsound. Consider again Yablo’s paradox. We construct an infinite sequence of statements, \(A_0, A_1, ...\), each of which asserts that the subsequent statements in the sequence are all false. Specifically: \[A_n \leftrightarrow \forall_{m>n}: \neg Tr_n(\ulcorner A_m \urcorner)\]

Considering any particular \(A_n\), we see that it implies \(\neg Tr_n(\ulcorner A_{n+1} \urcorner)\), and also \(\forall_{m>n+1}: \neg Tr_n(\ulcorner A_m \urcorner)\). By an application of the T-schema, however, these two statements are just the negations of each other. Therefore, the theory proves \(\neg A_n\). The choice of \(n\) was arbitrary, so the system proves every sentence in the sequence false. From outside, however, we can see that this implies that each of them is true.
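
To make the steps explicit, here is a sketch of the argument for a fixed \(n\). It reasons informally and assumes that the T-schema may be applied at levels \(n\) and \(n+1\) to every \(A_m\) at once. That uniform use is an assumption of this sketch, not something the schema (which is stated sentence by sentence) obviously licenses inside the system.

\[\begin{aligned}
&(1)\quad A_n \rightarrow \neg Tr_n(\ulcorner A_{n+1} \urcorner) && \text{instance } m = n+1 \\
&(2)\quad A_n \rightarrow \forall_{m>n+1}: \neg Tr_n(\ulcorner A_m \urcorner) && \text{restriction to } m > n+1 \\
&(3)\quad Tr_n(\ulcorner A_m \urcorner) \leftrightarrow A_m \leftrightarrow Tr_{n+1}(\ulcorner A_m \urcorner) && \text{T-schema at levels } n, n+1 \text{, for } m > n+1 \\
&(4)\quad A_n \rightarrow \forall_{m>n+1}: \neg Tr_{n+1}(\ulcorner A_m \urcorner) && \text{from (2), (3)} \\
&(5)\quad A_n \rightarrow A_{n+1} && \text{from (4), definition of } A_{n+1} \\
&(6)\quad A_n \rightarrow \neg A_{n+1} && \text{from (1), T-schema at level } n \\
&(7)\quad \neg A_n && \text{from (5), (6)}
\end{aligned}\]

Each use of the T-schema is within its stated restriction, since \(A_m\) mentions only \(Tr_m\) and \(m\) exceeds the level at which the schema is applied; the guards \(\psi(n)\) and \(\psi(n+1)\) are provable for the particular \(n\) under consideration.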

If the system proposed here can also carry out that reasoning, then it will be inconsistent.

Even if it is consistent, it’s still unsound, so it’s unlikely to be very useful. It would be interesting if truth-predicate versions of the other solutions to the Löbian obstacle could be constructed. (My intuition is that this won’t be possible for the consistency waterfall, but is likely possible for model polymorphism.)

by Benja Fallenstein 950 days ago | Patrick LaVictoire likes this

I would suggest changing this system by defining \(\psi(n)\) to mean that no \(m\le n\) is the Gödel number of a proof of an inconsistency in ZFC (instead of just asserting that \(n\) isn’t). The purpose of this is to make it so that if ZFC were inconsistent, then we only end up talking about a finite number of levels of truth predicate. More specifically, I’d define \(T_n\) to be PA plus the axiom schema

\[\forall m\ge n.\;\psi(m)\to\forall x.\;\mathrm{Tr}_m(\ulcorner\varphi(\overline x)\urcorner)\leftrightarrow\varphi(x).\]

Then, it seems that Jacob Hilton’s proof that the waterfalls are consistent goes through for this waterfall:

Work in ZFC and assume that ZFC is inconsistent. Let \(n\) be the lowest Gödel number of a proof of an inconsistency. Let \(M\) be the following model of our language: Start with the standard model of PA; it remains to give interpretations of the truth predicates. If \(m\ge n\), then \(\mathrm{Tr}_m(k)\) is false for all \(k\). If \(m<n\), then \(\mathrm{Tr}_m(k)\) is true iff \(k\) is the Gödel number of a true formula involving only \(\mathrm{Tr}_{m'}\) for \(m'>m\). Then, it’s clear that \(T_0\), and hence all \(T_m\) (since \(T_0\) is the strongest of the systems) is sound on \(M\), and therefore consistent.
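
Spelled out, the interpretation of the truth predicates in \(M\) can be sketched as follows, with \(n\) the least Gödel number of a proof of an inconsistency as above. The sketch treats "involving only \(\mathrm{Tr}_{m'}\) for \(m'>m\)" as a syntactic condition and so sets aside quantification over the subscript.

\[M \models \mathrm{Tr}_m(k) \iff m<n \text{ and } k = \ulcorner\varphi\urcorner \text{ for some sentence } \varphi \text{ involving only } \mathrm{Tr}_{m'},\ m'>m, \text{ with } M \models \varphi.\]

This is well defined by downward recursion on \(m\) from \(n-1\) to \(0\), since every level \(m\ge n\) is already fixed to be empty. For \(m\ge n\), the guard \(\psi(m)\) is false in \(M\) (some \(m'\le m\), namely \(n\), codes a proof of an inconsistency), so the corresponding axiom instance holds vacuously. For \(m<n\), the guard is true and the biconditional holds by the definition of \(\mathrm{Tr}_m\).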

Thus, we have proven in ZFC that if ZFC is inconsistent, then \(T_0\) is consistent; or equivalently, that if \(T_0\) is inconsistent, then ZFC is consistent. Stepping out of ZFC, we can see that if \(T_0\) is inconsistent, then ZFC proves this, and therefore in this case ZFC proves its own consistency, implying that it is inconsistent. Hence, if ZFC is consistent, then so is \(T_0\).

(Moreover, we can formalize this reasoning in ZFC. Hence, we can prove in ZFC (i) that if ZFC is inconsistent, then \(T_0\) is consistent, and (ii) that if ZFC is consistent, then \(T_0\) is consistent. By the law of the excluded middle, ZFC proves that \(T_0\) is consistent.)
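
In symbols, writing \(\mathrm{Con}(T)\) for the arithmetized consistency statement of \(T\) (notation introduced here only for this sketch), the argument of the last two paragraphs runs roughly as follows:

\[\begin{aligned}
&(1)\quad \mathrm{ZFC} \vdash \neg\mathrm{Con}(\mathrm{ZFC}) \rightarrow \mathrm{Con}(T_0) && \text{via the model } M \\
&(2)\quad \mathrm{ZFC} \vdash \neg\mathrm{Con}(T_0) \rightarrow \mathrm{Con}(\mathrm{ZFC}) && \text{contrapositive of (1)} \\
&(3)\quad T_0 \text{ inconsistent} \Rightarrow \mathrm{ZFC} \vdash \neg\mathrm{Con}(T_0) && \text{ZFC verifies the concrete proof} \\
&(4)\quad T_0 \text{ inconsistent} \Rightarrow \mathrm{ZFC} \vdash \mathrm{Con}(\mathrm{ZFC}) && \text{from (2), (3)} \\
&(5)\quad \mathrm{ZFC} \text{ consistent} \Rightarrow T_0 \text{ consistent} && \text{(4) and Gödel's second incompleteness theorem} \\
&(6)\quad \mathrm{ZFC} \vdash \mathrm{Con}(\mathrm{ZFC}) \rightarrow \mathrm{Con}(T_0) && \text{(3) to (5) formalized in ZFC} \\
&(7)\quad \mathrm{ZFC} \vdash \mathrm{Con}(T_0) && \text{from (1), (6), excluded middle}
\end{aligned}\]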


by Benja Fallenstein 950 days ago

We should be more careful, though, about what we mean by saying that \(\varphi(x)\) only depends on \(\mathrm{Tr}_{m}\) for \(m>n\), since this cannot be a purely syntactic criterion if we allow quantification over the subscript (as I did here). I’m pretty sure that something can be worked out, but I’ll leave it for the moment.





