Intelligent Agent Foundations Forum
Asymptotic Logical Uncertainty: A Modification to the Demski Prior
post by Scott Garrabrant 1249 days ago | Abram Demski, Jessica Taylor and Patrick LaVictoire like this

Part of the Asymptotic Logical Uncertainty series. In 2012, Abram Demski proposed a coherent probability assignment to logical sentences which is computably approximable. In this post, I will present a modification, developed last year by Benja Fallenstein, Abram Demski, and myself, which samples Turing machines rather than logical sentences. I will further give a concrete computable approximation of it which is a logical predictor.

Fix a UTM \(U\) with a read-only input tape containing an infinite string of 0s and 1s, and a one-way, write-only output tape on which it writes a possibly infinite string of 0s, 1s, and # symbols. We build a coherent probability distribution as follows.

Start with an empty set \(T\) of logical sentences (or a set containing some logical axioms you wish to start with). Sample a uniformly random infinite string \(w\) of 0s and 1s and use it as the input to \(U\). Interpret the output \(U(w)\) as a finite or infinite string of logical sentences separated by # symbols. If \(T\) is consistent with all the sentences in \(U(w)\) simultaneously, then modify \(T\) by adding all the sentences in \(U(w)\). Resample and repeat infinitely many times.
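As a structural illustration only, here is a minimal Python sketch of this (uncomputable) process. The helpers `random_infinite_bits`, `U`, and the halting-oracle stand-in `is_consistent` are hypothetical placeholders for the machines described above, not real implementations.

```python
def demski_prior_sample(U, is_consistent, random_infinite_bits):
    """One sample path of the idealized process. Uncomputable as stated:
    U(w) may run forever, and is_consistent needs a halting oracle."""
    T = set()  # the growing set of accepted sentences
    while True:  # resample and repeat infinitely many times
        w = random_infinite_bits()        # a fresh random input tape
        sentences = set(U(w))             # sentences U writes, split on '#'
        if is_consistent(T | sentences):  # oracle consistency check
            T |= sentences                # accept the whole batch at once
```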

The probability assigned to each logical sentence is the probability that it is eventually added to \(T\) in the above process. Let \(\mathbb{P}(\phi)\) denote the probability assigned to \(\phi\).

The step where we keep or throw out \(U(w)\) based on whether or not it is consistent with \(T\) requires a halting oracle. However, we can approximate the above process. The following procedure \(M\) takes in a time \(t\) and outputs a list of sentences. It has the property that the probability that \(M(t)\) outputs \(\phi\) converges to \(\mathbb{P}(\phi)\) as \(t\) goes to infinity.

On input \(t\), sample \(2^t\) binary words \(w_1,\ldots,w_{2^t}\). Run \(U\) for \(2^t\) time steps on each word to get \(2^t\) finite lists of sentences, \(\ell_1,\ldots,\ell_{2^t}\). Let \(K\) be a proof checker which takes a list of sentences and searches for a contradiction among them. Let \(T\) start as an empty set of sentences. For \(i=1,\ldots,2^t\), run \(K\) for \(2\uparrow\uparrow t\) time steps on \(T\cup\ell_i\). If it does not find a contradiction, modify \(T\) by adding all the sentences in \(\ell_i\).
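Here is a rough Python sketch of \(M\)'s control flow; `run_U` and `check_contradiction` are hypothetical helpers standing in for the time-bounded UTM and the proof checker \(K\), so this is a schematic under those assumptions rather than an implementation of any particular logic.

```python
import random

def tower(t):
    """Tetration: 2 up-arrow up-arrow t (a tower of t twos)."""
    result = 1
    for _ in range(t):
        result = 2 ** result
    return result

def M(t, run_U, check_contradiction):
    """One random run of the approximation procedure M on input t.

    run_U(bits, steps): run the UTM on the bit source for `steps` steps,
        returning the finite list of sentences written so far.
    check_contradiction(sentences, steps): run the proof checker K for
        `steps` steps; True iff it finds a contradiction in that time.
    Both helpers are hypothetical stand-ins for the machines in the text.
    """
    T = set()
    for _ in range(2 ** t):
        bits = lambda: random.getrandbits(1)  # lazy random input tape
        ell = set(run_U(bits, steps=2 ** t))
        if not check_contradiction(T | ell, steps=tower(t)):
            T |= ell
    return T
```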

We make three assumptions about \(K\). First, we assume that for any fixed list of sentences, \(K\) eventually finds a contradiction if and only if one exists. Second, we assume that for sufficiently large \(t\), \(K\) observes any propositional contradiction in the time we give it; that is, \(K\) should observe a contradiction whenever one can be proven using only the propositional calculus. Third, we assume that if \(K\) observes a contradiction in \(T\cup\phi\) and in \(T\cup\neg\phi\), then \(K\) observes a contradiction in \(T\) in the same amount of time. Only the first assumption is necessary for the approximation to converge to the correct value, but the other two will be useful for uniform coherence.

This approximation procedure only gives a random process whose output contains a given sentence \(\phi\) with probability converging to \(\mathbb{P}(\phi)\). We can turn this into a procedure which outputs probabilities. Let \(N\) be a machine which, on input \(t\), for each \(\phi\) which \(M(t)\) outputs with probability at least \(2^{-t}\), computes the probability that \(M(t)\) outputs \(\phi\) (to within an error of \(2^{-t}\)), and outputs the ordered pair containing \(\phi\) and this probability. This can be done by going through the possible execution paths of \(M(t)\) until the remaining paths have total probability less than \(2^{-t}\).
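The path enumeration can be sketched as follows. Here `run_M_on_bits` is a hypothetical deterministic version of \(M(t)\) that is fed an explicit finite bit string in place of its random tape and returns `None` when it needs more bits; the sketch shows only the probability bookkeeping.

```python
from collections import deque
from fractions import Fraction

def N(t, run_M_on_bits):
    """Approximate the output distribution of M(t) by enumerating the
    random-bit prefixes that determine its execution paths."""
    threshold = Fraction(1, 2 ** t)
    probs = {}                         # sentence -> accumulated probability
    frontier = deque([("", Fraction(1))])
    remaining = Fraction(1)            # mass of not-yet-resolved paths
    while frontier and remaining >= threshold:
        bits, p = frontier.popleft()
        result = run_M_on_bits(t, bits)
        if result is None:             # M(t) needs another random bit: branch
            frontier.append((bits + "0", p / 2))
            frontier.append((bits + "1", p / 2))
        else:                          # this path halted with output `result`
            remaining -= p
            for phi in result:
                probs[phi] = probs.get(phi, Fraction(0)) + p
    return [(phi, p) for phi, p in probs.items() if p >= threshold]
```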

Finally, we can turn this into a logical predictor, as defined in the previous post, by running \(N\) on input \(t\) for all natural numbers \(t\) in increasing order, and outputting a single # symbol between each pair of successive values of \(t\). In the next post, we will show that this logical predictor is uniformly coherent.
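A minimal sketch of the resulting predictor, assuming the `N` above: it is just an infinite stream interleaving the outputs of \(N(t)\) with single # separators.

```python
def logical_predictor(N, run_M_on_bits):
    """Infinite stream: the output of N(1), '#', the output of N(2), '#', ..."""
    t = 1
    while True:
        for pair in N(t, run_M_on_bits):
            yield pair                 # an ordered pair (sentence, probability)
        yield "#"                      # separator between successive values of t
        t += 1
```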


