Intelligent Agent Foundations Forum
Further Progress on a Bayesian Version of Logical Uncertainty
post by Alex Appel 17 days ago | Scott Garrabrant likes this | 1 comment

I’d like to credit Daniel Demski for helpful discussion.

So, the Eisenstat prior mentioned earlier seemed like a rather fundamental advance on the state of the art in logical uncertainty. Given that it was essentially a modification of the Demski prior, the obvious next question was “what’s the analogue of the modified Demski prior in this setting?”

As a quick refresher, the Demski prior sampled individual sentences, while the modified Demski prior sampled turing machines that would spit out a (potentially infinite) list of sentences. The key advance of the modified Demski prior is that it can assign positive probability to certain computable infinite sequences of sentences. The prototypical example was assigning positive probability to first-order PA with its infinite axiom schema, but this property also permitted the result in the logical induction paper where a turing machine could be asymptotically trusted to output only true things, even if the proof lengths of its statements grow ever further beyond our ability to check them.

The Demski and modified Demski prior, in their true forms, involve calling a halting oracle to determine whether a contradiction is present when a sentence/infinite sequence of sentences is added to the “sentence pool”.

A rather simple and general (though not very efficient) process for approximating the probability of an event under some prior is to specify a “prior-generating process”, Monte Carlo sample it a whole lot of times, and see what fraction of runs generated the event. In the case of the Demski and modified Demski priors, the use of a halting oracle means the prior-generating process can’t be emulated directly. However, since a halting oracle is limit-computable, throwing more and more computing resources at consistency checking yields a better and better emulation of what the true versions of the Demski/modified Demski priors are doing.
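The approximation scheme just described is ordinary Monte Carlo estimation. A generic sketch (with a stand-in sampler, since emulating the real prior-generating process would require the consistency checking discussed above; all names here are illustrative):

```python
import random

def estimate_probability(generate_world, event, n_samples=10_000, seed=0):
    """Estimate P(event) under a prior specified only as a sampling process:
    run the process many times and count the fraction of runs satisfying
    the event."""
    rng = random.Random(seed)
    hits = sum(bool(event(generate_world(rng))) for _ in range(n_samples))
    return hits / n_samples

# Stand-in "prior-generating process": not the Demski prior, just a biased
# coin, so the estimate should concentrate around 0.3.
p = estimate_probability(lambda rng: rng.random() < 0.3, event=lambda w: w)
```

The same shape works for any prior given as a sampler; the error shrinks like \(1/\sqrt{n}\) in the number of samples.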

The key property of the Eisenstat prior that makes it \(\varepsilon\)-approximable instead of just limit-computable is that there’s a particular time by which you’re nearly guaranteed to run into either \(\phi\) or \(\neg\phi\). Any sort of “modified Eisenstat prior” should preserve this property.

So, without further ado, I’ll present the algorithm, go into the philosophy of why it is the way it is, handwave at a few nice properties it probably has, and point out future directions.

DEFINITIONS:

\(\mu_{\Phi}\) is a probability measure over sentences, s.t. \(\forall n\in\mathbb{N}:\mu_{\Phi}((\neg)^{2n}\phi)=\mu_{\Phi}((\neg)^{2n+1}\phi)\). In short, you’re as likely to run into \(\phi\) as \(\neg\phi\).

\(\mu_{M}\) is a probability measure over turing machines.

If \(x\) is a string of sentences, \(x\phi\) is that string with \(\phi\) concatenated onto the end.

\(\lambda\) is the empty string.

\(y\sqsubseteq x\) means y is a prefix of x (a prefix in the sense of a substring of sentences, not a prefix of x interpreted as a bitstring).

\(r(t(x))\) is the runtime of turing machine \(t\) on input \(x\).

\(t(x)\) is the bitstring that is the output of turing machine \(t\) on input \(x\).

\(f(|x|)\) is an arbitrary function that bounds the runtime of a turing machine in terms of the length of the input sentence string as measured in bits (so a long sentence would increase \(|x|\) more than a short one would).
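The symmetry condition on \(\mu_{\Phi}\) can be met by construction: draw an unsigned sentence from any base measure, then flip a fair coin for the sign. A sketch under toy assumptions (atomic sentences as strings, ignoring iterated negations; names are illustrative):

```python
import random

def sample_signed_sentence(rng, base_atoms):
    """Sampler for mu_Phi: phi and its negation are equally likely, because
    the sign is an independent fair coin on top of the base draw."""
    return (rng.choice(base_atoms), rng.random() < 0.5)  # (atom, sign)

rng = random.Random(0)
draws = [sample_signed_sentence(rng, ["p", "q", "r"]) for _ in range(10_000)]
positive_fraction = sum(sign for _, sign in draws) / len(draws)
```

Whatever bias the base measure has over sentences, the coin flip guarantees you’re as likely to run into \(\phi\) as \(\neg\phi\).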

In short, we sample a turing machine, check to make sure it obeys our runtime bounds and accepts the past history of sentences as valid and hasn’t been blacklisted yet, and throw it into the pool of “possibly valid constraints”. Then we generate a sentence \(\phi\), test all turing machines in the pool of “possibly valid constraints” against \(x\phi\) and \(x\neg\phi\) and blacklist every TM that times out or rejects both extensions. Then we divide the remaining turing machines into “team A” and “team B”, which favor \(\phi\) and \(\neg\phi\) as the extensions, use the ratio of measures of the teams to add either \(\phi\) or \(\neg\phi\) as a sentence, and blacklist all members of the losing team. Then just iterate.

In pseudocode, this is:

\(T^{-}:=\emptyset\)

\(T:=\emptyset\)

\(x:=\lambda\)

FOR \(n\in\mathbb{N}\)

. . Sample a turing machine \(t\) from \(\mu_{M}\)

. . IF \(t\not\in T^{-}\)

. . . . Failurebool:=0

. . . . FOR all \(y\sqsubseteq x\)

. . . . . . IF \(r(t(y))>f(|y|)\vee t(y)\neq 1\)

. . . . . . . . Failurebool:=1, END

. . . . IF Failurebool=0

. . . . . . \(T:=T\cup\{t\}\)

. . . . ELSE \(T^{-}:=T^{-}\cup\{t\}\)

. . Sample a sentence \(\phi\) from \(\mu_{\Phi}\)

. . FOR all \(t\in T\)

. . . . IF \(r(t(x\phi))>f(|x\phi|)\vee r(t(x\neg\phi))>f(|x\neg\phi|)\vee(t(x\phi)\neq 1\wedge t(x\neg\phi)\neq 1)\)

. . . . . . \(T:=T\setminus\{t\}\)

. . . . . . \(T^{-}:=T^{-}\cup\{t\}\)

. . \(T_{A}:=\emptyset\)

. . \(T_{B}:=\emptyset\)

. . FOR all \(t\in T\)

. . . . IF \(t(x\phi)=1\wedge t(x\neg\phi)\neq 1\)

. . . . . . \(T_{A}:=T_{A}\cup\{t\}\)

. . . . ELSE IF \(t(x\neg\phi)=1\wedge t(x\phi)\neq 1\)

. . . . . . \(T_{B}:=T_{B}\cup\{t\}\)

. . FLIP with probability \(\frac{\sum_{t\in T_{A}}\mu_{M}(t)}{\sum_{t\in T_{A}\cup T_{B}}\mu_{M}(t)}\)

. . . . IF heads

. . . . . . \(x:=x\phi\)

. . . . . . \(T:=T\setminus T_{B}\)

. . . . ELSE

. . . . . . \(x:=x\neg\phi\)

. . . . . . \(T:=T\setminus T_{A}\)
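A minimal, runnable rendering of this pseudocode, under toy assumptions of my own: sentences are atomic (atom, sign) pairs, "turing machines" are Python predicates on sentence strings, and the runtime bounds \(f(|x|)\) are not modeled. None of the names below are from the post.

```python
import random

def sample_prior_string(machines, weights, atoms, n_steps, rng):
    """Run the prior-generating process once, returning a sampled string x."""
    pool = []          # T: indices of machines currently trusted
    blacklist = set()  # T^-: indices of machines kicked out
    x = []             # the growing string of signed sentences

    for _ in range(n_steps):
        # Sample a machine from mu_M; admit it to T if it is not blacklisted
        # and accepts every prefix of the history x (including x itself).
        i = rng.choices(range(len(machines)), weights=weights)[0]
        if i not in blacklist and i not in pool:
            if all(machines[i](x[:k]) for k in range(len(x) + 1)):
                pool.append(i)
            else:
                blacklist.add(i)

        # Sample a sentence phi from mu_Phi.
        phi = (rng.choice(atoms), True)
        neg_phi = (phi[0], False)

        # Blacklist every machine that rejects both extensions.
        for i in list(pool):
            if not machines[i](x + [phi]) and not machines[i](x + [neg_phi]):
                pool.remove(i)
                blacklist.add(i)

        # Team A favors phi, team B favors neg_phi; machines accepting both
        # extensions sit out the vote.
        team_a = [i for i in pool
                  if machines[i](x + [phi]) and not machines[i](x + [neg_phi])]
        team_b = [i for i in pool
                  if machines[i](x + [neg_phi]) and not machines[i](x + [phi])]
        mass_a = sum(weights[i] for i in team_a)
        mass_b = sum(weights[i] for i in team_b)
        p_phi = mass_a / (mass_a + mass_b) if mass_a + mass_b > 0 else 0.5

        # Flip, extend x, and blacklist the losing team.
        if rng.random() < p_phi:
            x.append(phi)
            losers = team_b
        else:
            x.append(neg_phi)
            losers = team_a
        for i in losers:
            pool.remove(i)
            blacklist.add(i)
    return x

# Example constraints: propositional consistency (for atomic sentences, just
# "no atom appears with both signs") and the trivial accept-everything machine.
def consistent(x):
    return not any((a, True) in x and (a, False) in x for a, _ in x)

x = sample_prior_string(
    machines=[consistent, lambda x: True],
    weights=[0.5, 0.5],
    atoms=["p", "q"],
    n_steps=30,
    rng=random.Random(0),
)
```

Monte Carlo sampling this function many times and counting how often a given sentence appears would give the \(\varepsilon\)-approximation of \(\mathbb{P}(\phi)\) discussed above.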

Ok, so why is this the way it is?

PHILOSOPHY:

Well, we want to have a certain time by which we are pretty much guaranteed to either run into \(\phi\) or \(\neg\phi\), as this property is what permits \(\varepsilon\)-approximation.

Consider the standard modified Demski prior, except that we just sample one turing machine and that’s it, and also it magically stops cold as soon as it hits the first sentence that’s propositionally inconsistent with all those that came before. Let \(\mathbb{P}(\phi)\) be the probability that \(\phi\) will be output by this process. Note that as described, \(\mathbb{P}(\phi)+\mathbb{P}(\neg\phi)<1\). Even in this really simple case, where you don’t have to worry about inconsistency testing or multiple turing machines, there are sufficiently small \(\varepsilon\) for which it’s impossible to \(\varepsilon\)-approximate these probabilities within a time bound known ahead of time, because of the (much greater than \(\varepsilon\)) probability of sampling a turing machine that outputs a propositionally consistent list of sentences, and also outputs \(\phi\) only after whatever time bound you named.

Thus, we seem to have a fully general problem with trusting in turing machines that enumerate infinite, or even very big, lists of sentences, because we have uncertainty about whether they’ll ever output some \(\phi\). Is this a fundamental obstacle to having a prior that assigns positive probability to some infinite list of sentences?

In my opinion, yes. Also, I don’t think the lack of this property is fatal, and if we look at what “positive probability to an infinite list of sentences” actually buys us, we can get something just about as good.

The “positive probability to an infinite list of sentences” property seems to cash out in practice as “if we verify a bunch of sentences spat out by a particular turing machine, we should boost our probability assigned to the rest of them”. It is what allows inductively learning that something in the environment has superior computing powers, and can be trusted.

Imagine the concrete case where you see a sentence get spat out by a trusted turing machine. You go from uncertainty about it to near-certainty that it’s true. You’ve just updated on the logical fact \(t(n)=\phi\), and it boosted your probability of \(\phi\). But you don’t know in advance what the turing machine will output on larger inputs \(n\). So it sure seems possible to believe that a turing machine outputs true things while the prior assigns 0 probability to the whole infinite list of sentences: since you don’t know in advance what it will output, you’d have to update on infinitely many things, i.e., the sequence of observations of the form “turing machine outputted \(\phi\)”, in order to assign positive probability to the whole conjunction.

You don’t actually need positive probability to infinite conjunctions to have abstract trust in the output of things with long runtime! Or at least it sure seems that way from our thought experiment.

Ok, but what sort of thing could the constraint “trust in turing machine t” correspond to? Well, maybe, in addition to a propositional consistency checker, we also have another checker that throws out all strings of theorems that contain both \(t(n)=\phi\) and \(\neg\phi\). This wouldn’t take much runtime at all to implement.
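A sketch of such a checker, under a toy sentence encoding of my own devising (tuples ("out", s) for “the trusted machine output sentence s”, and ("pos", s)/("neg", s) asserting s and its negation; none of this encoding is from the post):

```python
def trust_machine_constraint(x):
    """Constraint encoding 'trust machine t': throw out any string of
    sentences that asserts both that t output phi and that phi is false."""
    outputs = {s for kind, s in x if kind == "out"}
    negated = {s for kind, s in x if kind == "neg"}
    return not (outputs & negated)  # empty intersection -> accept
```

The check is a single pass over the string plus a set intersection, matching the observation that it wouldn’t take much runtime at all to implement.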

Wait, put that way, propositional consistency and trust in a turing machine both seem like the same sort of thing! The original Eisenstat prior was just the one where propositional consistency was the only constraint.

Also, if we’re using the “turing machine outputs a list of sentences” approach, we may be waiting arbitrarily long. But if we instead consider turing machines that take a string of sentences and accept or reject it, and operate within a known time (both constraints we came up with are of this form), we’re back to being able to generate strings of sentences with a known time by which we are nearly guaranteed to hit \(\phi\) or \(\neg\phi\)! So \(\varepsilon\)-approximation is preserved as long as all of your (finitely many) constraints have a known runtime bound.

We want to learn all constraints. So, the obvious next step is having a pool of constraints/turing machines that is added to by random sampling, and turing machines are kicked out of it for running over time, rejecting the past history of the string of sentences (this condition suggested by Daniel Demski), or rejecting both possible extensions of a string (this is to nearly guarantee hitting \(\phi\) or \(\neg\phi\) after known time).

And, by doing Bayesian updates on sentences that are more likely to be present when certain constraints are present, it’s possible to inductively learn which constraints apply to the true environment!

There is a bit of weirdness here, because in full generality these turing machines can care about the order of the sentences, unlike the propositional-consistency and turing-machine-trust constraints. Also, since the list of constraints gets added to over time, this prior expects the future to be more ordered/obey more constraints than the past. Also, since a “pseudoconstraint” that only times out or rejects both extensions in very rare circumstances can hang around for a long time before being blacklisted, this prior puts some probability mass on a certain constraint applying for a long time and then lifting.

Still to do: What happens when constraints disagree? (Daniel Demski pointed out that my initial stab at this was wrong, and I generated this next stretch as a fix)

Well, the intended use case of this is to update on a list of theorems. Imagine that you know that either \(\phi\) or \(\neg\phi\) is the next theorem coming in, and the constraints disagree. You can split them into three teams. Team A says that \(x\phi\) is the proper extension, team B says that \(x\neg\phi\) is the proper extension, and team C accepts both as valid, so we can just ignore it.

Although it steps outside the intended use case, if you imagine generating constraints much faster than the theorems come in, you should generically expect your constraints to disagree on the upcoming sentence, because you used enough search power to find constraints that got the entire past history right but disagree on what comes next. If you decided between the teams by flipping a fair coin, you’d get half of all such problems wrong, forever. So the decision is instead made by the probability mass of the turing machines on each team.

If we imagine updating on the true list of theorems, the true constraints that reality follows will stick around forever. Call that team A. And define team \(B_{n}\) as the turing machines that disagree with team A on turn n, and team \(B\) as the union of all of them. Each turn, everyone in team \(B_{n}\) dies. So the total probability mass that team \(B\) can muster limits to 0 as time goes on and more and more theorems get seen, while team A’s probability mass remains constant. Thus, in the limit, the constraints on team A will be trusted and used to generate extensions that follow the “rules of reality” for increasingly long times.

HANDWAVING:

It should be possible to \(\varepsilon\)-approximate the probability of a sentence in knowable time.

It should be possible to implement Bayesian updates on either sentences or constraints by the same path mentioned in the original post.

In the limit of updating on theorems, all “constraints” on sets of sentences that are computable in [insert time bound here] time that apply to reality should be learned by this prior.

I’m pretty dang sure you don’t need the full strength of propositional consistency to get a coherent distribution in the limit of updating on evidence. Every propositionally inconsistent world has a disproof of the form \(a\wedge b\wedge c\ldots\to\neg\phi\) with \(\phi\) true, so a sufficient (though maybe not minimal) constraint that imposes a coherent distribution would be “if the string contains \(a\wedge b\wedge c\ldots\to\neg\phi\), and \(a\), and \(b\)…, and \(\phi\), reject it”, because that should take out every propositionally inconsistent world in the limit.

Note that the above permits logical uncertainty over very complicated Booleans that would take a long time to evaluate, like hard 3SAT instances. Full propositional consistency renders logical uncertainty over those sorts of things impossible, while it very plainly exists in reality.
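The weakened checker can be sketched as follows, under an illustrative encoding of my own (literals as (name, sign) pairs, implications as ("imp", antecedents, conclusion) triples with antecedents a frozenset of literals; not from the post):

```python
def weak_consistency(x):
    """Weakened consistency check: reject a string only when it contains an
    implication, all of that implication's antecedent literals, and the
    negation of its conclusion -- a visible modus ponens contradiction."""
    literals = {s for s in x if s[0] != "imp"}
    for s in x:
        if s[0] == "imp":
            _, antecedents, (name, sign) = s
            if antecedents <= literals and (name, not sign) in literals:
                return False  # explicit contradiction found
    return True

# a AND b -> NOT p
rule = ("imp", frozenset({("a", True), ("b", True)}), ("p", False))
```

Unlike full propositional consistency, this never has to solve a SAT instance: it only reacts to contradictions that are spelled out in the string, which is what leaves room for uncertainty over hard Booleans.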

NEXT STEPS:

Find a hole in this thing if one is present.

How can this sort of framework be made to interact sensibly with sequence prediction? Does it already interact sensibly with sequence prediction, or do important modifications need to be made?

Actually, in general, sequence prediction and “what is true?” prediction seem quite hard to marry in this framework, because a near-guarantee that you’ll hit either \(\phi\) or \(\neg\phi\) after a known amount of time doesn’t interact well with “I believe that neither \(\phi\) nor \(\neg\phi\) will show up in my observed list of theorems unless an astronomical amount of time has gone by”. Some sort of resolution must be found.

What should it be named if it does turn out to be important? I was considering the “Eisenstat-Appel Prior”, or EA Prior for short, but apparently it’s bad taste to name things after yourself and someone probably has a better idea.



by Alex Appel 15 days ago

Intermediate update:

The handwavy argument about how you’d get propositional consistency in the limit of imposing the constraint “the string cannot contain \(a\wedge b\wedge c\ldots\to\neg\phi\) and \(a\) and \(b\) and… and \(\phi\)” is less clear than I thought. The problem is that, while the prior may learn that that constraint applies as it updates on more sentences, that particular constraint can get you into situations where adding either \(\phi\) or \(\neg\phi\) leads to a violation of the constraint.

So, running the prior far enough forward leads to the probability distribution being nearly certain that, while that particular constraint applied in the past, it will stop applying at some point in the future by vetoing both possible extensions of a string of sentences, and then less-constrained conditions will apply from that point forward.

On one hand, if you don’t have the computational resources to enforce full propositional consistency, it’s expected that most of the worlds you generate will be propositionally inconsistent, and midway through generating them you’ll realize that some of them are indeed propositionally inconsistent.

On the other hand, we want to be able to believe that constraints capable of painting themselves into a corner will apply to reality forevermore.

I’ll think about this a bit more. One possible line of attack is having \(\mathbb{P}(\phi)\) and \(\mathbb{P}(\neg\phi)\) not add up to one, because it’s possible that the sentence generating process will just stop cold before one of the two shows up, and renormalizing them to 1. But I’d have to check if it’s still possible to \(\varepsilon\)-approximate the distribution if we introduce this renormalization, and to be honest, I wouldn’t be surprised if there was a more elegant way around this.

EDIT: yes, it’s still possible to \(\varepsilon\)-approximate the distribution in known time if you have \(\mathbb{P}(\phi)\) refer to \(\frac{\mathbb{P}(\text{encounter }\phi\text{ first})}{1-\mathbb{P}(\text{halt first})}\), although the bounds are really loose. This is because, if most of the execution paths involve halting before the sentence is sampled, an \(\varepsilon\)-error in the probability of sampling \(\phi\) first gets blown up by the small denominator.

Will type up the proof later, but it basically proceeds by looking at the probability mass associated with “sample the trivial constraint that accepts everything, and sample it again on each successive round”, because this slice of probability mass has a near-guarantee of hitting \(\phi\), and then showing that even this tiny slice has substantially more probability mass than the cumulative probability of ever sampling a really rare sentence or not hitting any of \(\phi\), \(\neg\phi\), or the string terminating.
