Intelligent Agent Foundations Forum
Logical Inductors that trust their limits
post by Scott Garrabrant 1035 days ago | Jack Gallagher, Jessica Taylor and Patrick LaVictoire like this | 2 comments

Here is another open question related to Logical Inductors. I have not thought about it very long, so it might be easy.

Does there exist a logical inductor \(\{\mathbb P_n\}\) over PA such that for all \(\phi\):

  1. PA proves that \(\mathbb P_\infty(\phi)\) exists and is in \([0,1]\), and

  2. \(\mathbb{E}_n(\mathbb{P}_\infty(\phi))\eqsim_n\mathbb{P}_n(\phi)\)?


Note that \(\mathbb{P}_\infty(\phi)\) need not be computable, so this does not happen by default.

For example, consider the logical inductor described in the paper, with the extra condition that if the deductive state is ever discovered to be inconsistent, all probabilities are set to 0 forever. This clause will never be executed, since PA is consistent, but its mere presence lets PA prove that \(\mathbb P_\infty(\phi)\) exists. (PA proves that if PA is consistent, the limit exists using the proof in the paper, and if PA is inconsistent, the limit exists and equals 0.)
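As a toy illustration of this modification, here is a minimal Python sketch. All names and interfaces are hypothetical (they are not the paper's construction): sentences are modeled as Boolean functions of a truth assignment, consistency is checked by brute force over the atoms, and once an inconsistency is detected every later probability is clamped to 0.

```python
from itertools import product

def consistent(sentences, atoms):
    """Brute-force propositional consistency: search for a truth assignment
    to the atoms satisfying every sentence (each sentence is a function from
    an assignment dict to bool)."""
    for values in product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if all(s(assignment) for s in sentences):
            return True
    return False

def guarded_inductor(base_probs, deductive_states, atoms):
    """Wrap a base sequence of belief values: if the stage-n deductive state
    is ever found propositionally inconsistent, output 0 from then on.
    `base_probs[n]` and `deductive_states[n]` are hypothetical stage-n
    interfaces, not the ones defined in the paper."""
    broken = False
    out = []
    for p, d in zip(base_probs, deductive_states):
        if not consistent(d, atoms):
            broken = True  # the clause that PA can reason about
        out.append(0.0 if broken else p)
    return out
```

Since the deductive state of an inductor over a consistent theory is never inconsistent, the guard never fires in the intended run; its role is purely to make the limit provably exist.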

However, this logical inductor will not satisfy the second property. Consider the sentence \(\top\). \(\mathbb{P}_n(\top)\) will converge to 1, while \(\mathbb{E}_n(\mathbb{P}_\infty(\top))\) will converge to the probability according to \(\mathbb{P}_\infty\) that \(PA\) is consistent. (PA proves that if PA is consistent, the limit is 1 using the proof in the paper, and that if PA is inconsistent, the limit is 0.)

If a Logical Inductor with the above property is found, there are many follow-up questions you could ask.

Can you make an analogue of the self-trust property that works for \(\mathbb{P}_\infty\)? Does the above property imply that self-trust property?

Is there some simple extra condition that could be added to the definition of a Logical Inductor, that implies everything we could want about beliefs about \(\mathbb{P}_\infty\)?


A good place to start with this question might be to analyze a logical inductor that, in the spirit of the Demski prior, adds sentences to the deductive state only if they are propositionally consistent with all previous sentences. This way, PA will prove that the algorithm defines a logical inductor over some consistent propositional theory (even if it does not know whether that theory is PA).



by Devi Borg 1026 days ago | Abram Demski and Scott Garrabrant like this | link

I still feel like I don’t understand most things in the paper, but I think this construction should work:

Say we start with some deductive sequence \(\{D_n\}\) that is PA-complete. Now redefine the computation of this deductive sequence to include the condition that if ever at some stage \(n\) we find that PA is inconsistent (i.e. that \(D_n\) is propositionally inconsistent) we continue by computing \(\{D_m:m\geq n\}\) according to \(D_m=D_{m-1}\cup\{\phi_m\}\) where \(\phi_m\) is the lexicographically-first sentence propositionally consistent with \(D_{m-1}\).
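One stage of this patched deductive sequence could be sketched as follows (illustrative interfaces only, not those of the paper; sentences are Boolean functions of a truth assignment, `candidates` stands in for the lexicographic enumeration of sentences, and consistency is a brute-force check over the atoms):

```python
from itertools import product

def consistent(sentences, atoms):
    """Brute-force propositional satisfiability over the atoms; each
    sentence is a function from a truth assignment (dict) to bool."""
    for values in product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if all(s(assignment) for s in sentences):
            return True
    return False

def patched_deductive_step(prev_state, proposed, candidates, atoms):
    """One stage of the patched sequence: use the originally proposed state
    unless it is propositionally inconsistent, in which case extend the
    previous state with the first candidate (in the given lexicographic
    order) that is consistent with it."""
    if consistent(proposed, atoms):
        return proposed
    for phi in candidates:
        if consistent(prev_state + [phi], atoms):
            return prev_state + [phi]
    return prev_state  # theory already complete: nothing consistent to add
```

In the intended run the fallback branch is never taken, so the patched sequence is extensionally identical to the original one, matching the argument above.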

Since PA is consistent this deductive process is extensionally identical to the one we had before, but now PA proves that it’s a deductive process: assuming consistency it’s a PA-complete deductive process, and assuming inconsistency it’s over some very arbitrary, but complete and computable, propositional theory. We can now carry out the construction in the paper to get the necessary logical inductor over PA which we denote by \(\{\mathbb{P}_n\}\). Here PA proves the convergence of the logical inductor as in the paper.

Fix \(\phi\) and \(\epsilon>0\), and write \(X_n:=\mathbb{P}_n(\phi)\) and \(X:=\mathbb{P}_\infty(\phi)\). We want to say that eventually \(|X_n-\mathbb{E}_n(X)|<\epsilon\), but by self-trust and linearity it’s enough to show that eventually \(\mathbb{E}_n(|X_n-X|)<\epsilon\). This would follow from eventually having \(\mathbb{P}_n(|X_n-X|\geq\epsilon)<\epsilon\), since PA proves that \(X_n,X\in[0,1]\), which gives \(|X_n-X|\leq 1\).
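To spell out the last step: since \(|X_n-X|\leq 1\), splitting the expectation on the event \(|X_n-X|\geq\epsilon\) gives (modulo the usual \(\eqsim_n\) error terms for logical-inductor expectations)

\[\mathbb{E}_n(|X_n-X|)\leq\epsilon\cdot\mathbb{P}_n(|X_n-X|<\epsilon)+1\cdot\mathbb{P}_n(|X_n-X|\geq\epsilon)<\epsilon+\epsilon=2\epsilon,\]

so the probability bound suffices, up to replacing \(\epsilon\) by \(\epsilon/2\).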

[edit2: this paragraph is actually false as written; I have a modified proof that works for \(\mathsf{PA}+\neg\mathsf{Con(PA)}\), but it suggests that this construction is broken] From the proof that \(X_n\to X\) we have that there is some \(A\in[\mathbb{N}]^{<\omega}\) (a finite set of indices) such that PA proves that \(|X_n-X|<\epsilon\) for \(n\notin A\). Define \(\psi_n\) to be \(|X_n-X|<\epsilon\). This is an efficiently computable sequence of sentences of which all but finitely many are theorems, so we get that \(\mathbb{P}_n(\neg\psi_n)\to 0\) (consider starting the inductor at some later point, after all the non-theorems have passed), which means that there is some \(N\) where \(n>N\) implies \(\mathbb{P}_n(\neg\psi_n)<\epsilon\), which proves the result. [edit: some polishing to this paragraph]

Note, this ignores some subtleties arising from the fact that \(X\) could be irrational and therefore not a bona fide Logically Uncertain Variable. These concerns seem beside the point, as we can get the necessary behavior by formalizing Logically Uncertain Variables slightly differently, in terms of Dedekind cuts or, perhaps more appropriately, CDFs: Cumulative Distribution Formulae.


by Devi Borg 1026 days ago | Abram Demski and Scott Garrabrant like this | link

I initially played around with the idea of freezing the final probability assignments when you find a contradiction, but couldn’t get it to work. The difficulty was that if you assumed the inconsistency of PA you got that the limit was simply the assignment at some future time, but you couldn’t defer to this because the function was not computable according to PA (by the consistency results in the paper). Something along these lines might still work.

As I worked on this approach I noticed a statement that seems both true and interesting, which I delayed proving because I wasn’t sure how to get everything else to work. Let \(\chi:=\mathsf{Con(PA)}\); then it seems to me that for any deferral function \(f(n)\) we should have

\[\mathbb{E}_n(\mathbb{P}_{f(n)}(\phi)|\,\chi) \eqsim_n \mathbb{P}_n(\phi)\]

The reason why I think it’s true is that conditioning on \(\chi\), which is true, should only improve our estimate of \(\mathbb{P}_{f(n)}(\phi)\), but if this estimate is actually improved, it will be exploited by some poly-time trader. It would be very interesting if this weren’t the case, since then conditioning on a true statement might actually be bad for us; but if it is true, we automatically get the same for \(\neg\chi\) instead by some simple algebra (we could then say that infinitary consistency and inconsistency are irrelevant for determining probabilities at any computable times).

Edit, Proof: Let \(\{\mathbb{P}_n\}\) be constructed as in the paper. Then \(\mathsf{PA}+\mathsf{Con(PA)}\) proves that \(\{\mathbb{P}_n\}\) and \(\{\mathbb{P}_{f(n)}\}\) are logical inductors. We get that

\[\mathbb{E}_n(\mathbb{P}_{f(n)}(\phi)|\,\chi)\eqsim_n \mathbb{E}_n(\mathbb{P}_\infty(\phi)|\,\chi)\eqsim_n \mathbb{E}_n(\mathbb{P}_n(\phi)|\,\chi)\]

Here the final expression can be expanded out as

\[\sum_{k=1}^n\frac{1}{n}\mathbb{P}_n(\mathbb{P}_n(\phi)>k/n\land\chi)/\mathbb{P}_n(\chi)\]

Fix \(\epsilon>0\); we can pick \(n>1/\epsilon\) large enough that \(\mathbb{P}_n\) believes that \(\mathbb{P}_n(\phi)\) is within an interval of size \(\epsilon\) with probability at least \(1-\epsilon\). Then \(\mathbb{P}_n\) believes that \(\mathbb{P}_n(\phi)>k/n\land\chi\) is equivalent to \(\bot\) or \(\chi\) (depending on whether \(k\) is above or below some threshold) with probability at least \(1-\epsilon\). In both of these cases we get that \(\mathbb{P}_n(\mathbb{P}_n(\phi)>k/n\land\chi)=\mathbb{P}_n(\mathbb{P}_n(\phi)>k/n)\mathbb{P}_n(\chi)\) (or perhaps some extra epsilons here).
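Making the cancellation explicit: on the high-probability event where this factorization holds, each term of the sum simplifies as

\[\frac{\mathbb{P}_n(\mathbb{P}_n(\phi)>k/n\land\chi)}{\mathbb{P}_n(\chi)}=\frac{\mathbb{P}_n(\mathbb{P}_n(\phi)>k/n)\,\mathbb{P}_n(\chi)}{\mathbb{P}_n(\chi)}=\mathbb{P}_n(\mathbb{P}_n(\phi)>k/n),\]

up to the extra epsilons just noted.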

Now we can cancel and get that the expression is eventually within \(C\epsilon\) (for some \(C<10\)) of

\[\sum_k \frac{1}{n}\mathbb{P}_n(\mathbb{P}_n(\phi)>k/n)=\mathbb{E}_n(\mathbb{P}_n(\phi))\eqsim_n\mathbb{P}_n(\phi)\]



