Intelligent Agent Foundations Forumsign up / log in

I found an improved version by Pavel, that gives a way to construct a proof of \(\phi\) from \(\Box_{n}\phi\) that has a length of \(\mathcal{O}(n)\). The improved version is here.

There are restrictions to this result, though. One is that the C-rule must apply to the logic. This is just the ability to go from \(\exists x:\phi(x)\) to instantiating a \(c\) such that \(\phi(c)\). Pretty much all reasonable theorem provers have this.

The second restriction is that the theory must be finitely axiomatizable. No axiom schemas allowed. Again, this isn’t much of a restriction in practice, because NBG set theory, which proves the consistency of ZFC, is finitely axiomatizable.

The proof strategy is basically as follows. It’s shown that the shortest proof of a statement with quantifier depth n must have a length of \(\Omega(n^2)\), if the maximum quantifier depth in the proof is \(2n\) or greater.

This can be flipped around to conclude that if there’s a length-n proof of \(\phi\), the maximum quantifier depth in the proof can be at most \(\mathcal{O}(\sqrt{n})\).

The second part of the proof involves constructing a bounded-quantifier version of a truth predicate. By Tarski’s undefinability of truth, a full truth predicate cannot be constructed, but it’s possible to exhibit a formula for which it’s provable that \[qd(\overline{\psi})\le n\to(Sat_{n}(\overline{\psi},x)\leftrightarrow\Sigma(Sat_{n},\overline{\psi},x))\] (\(\Sigma\) is the formula laying out Tarski’s conditions for something to be a truth predicate). Also, if \(n\ge\) quantifier depth of \(\psi\), there’s a proof of \[Sat_{n}(\overline{\psi},x)\leftrightarrow\psi[x]\] (\(\psi[x]\) is the sentence \(\psi\) with its free variables substituted for the elements enumerated in the list \(x\)) Also, there’s a proof that \(Sat_{n}\) is preserved under inference rules and logical axioms, as long as everything stays below a quantifier depth of \(n\).

All these proofs can be done in \(\mathcal{O}(n^2)\) lines. One factor of \(n\) comes from the formula abbreviated as \(Sat_{n}(x,y)\) getting longer at a linear rate, and the other factor comes from having to prove \(Sat_{n}\) for each \(n\) seperately as an ingredient for the next proof.

Combining the two parts, the \(\mathcal{O}(\sqrt{n})\) bound on the quantifier depth and the \(\mathcal{O}(n^2)\) bound on how long it takes to prove stuff about the truth predicate, make it take \(\mathcal{O}(n)\) steps to prove all the relevant theorems about a sufficiently large bounded quantifier depth truth predicate, and then you can just go “the statement that we are claiming to have been proved must have \(Sat_{n}\) apply to it, and we’ve proved this is equivalent to the statement itself”

As a further bonus, a single \(\mathcal{O}(n)\)-length proof can establish the consistency of the theory itself for all \(n\)-length proofs.

It seems like a useful project to develop a program that will automatically write a proof of this form, to assess whether abstract unpacking of bounded proofs is usable in practice, but it will require digging into a bunch of finicky details of exactly how to encode a math theory inside itself.


Caught a flaw with this proposal in the currently stated form, though it is probably patchable.

When unpacking a proof, at some point the sentence \(\Box_{1}\phi\) will be reached as a conclusion, which is a false statement.


by Sam Eisenstat 17 days ago | link

I misunderstood your proposal, but you don’t need to do this work to get what you want. You can just take each sentence \(\square_n \phi \to \phi\) as an axiom, but declare that this axiom takes \(n\) symbols to invoke. This could be done by changing the notion of length of a proof, or by taking axioms \(\psi_{\phi,n} \to (\square_n \phi \to \phi)\) and \(\psi_{\phi,n}\) with \(\psi_{\phi,n}\) very long.


by Alex Appel 64 days ago | Abram Demski likes this | link | parent | on: Smoking Lesion Steelman

I think that in that case, the agent shouldn’t smoke, and CDT is right, although there is side-channel information that can be used to come to the conclusion that the agent should smoke. Here’s a reframing of the provided payoff matrix that makes this argument clearer. (also, your problem as stated should have 0 utility for a nonsmoker imagining the situation where they smoke and get killed)

Let’s say that there is a kingdom which contains two types of people, good people and evil people, and a person doesn’t necessarily know which type they are. There is a magical sword enchanted with a heavenly aura, and if a good person wields the sword, it will guide them do heroic things, for +10 utility (according to a good person) and 0 utility (according to a bad person). However, if an evil person wields the sword, it will afflict them for the rest of their life with extreme itchiness, for -100 utility (according to everyone).

good person’s utility estimates:

  • takes sword

    • I’m good: 10

    • I’m evil: -90

  • don’t take sword: 0

evil person’s utility estimates:

  • takes sword

    • I’m good: 0

    • I’m evil: -100

  • don’t take sword: 0

As you can clearly see, this is the exact same payoff matrix as the previous example. However, now it’s clear that if a (secretly good) CDT agent believes that most of society is evil, then it’s a bad idea to pick up the sword, because the agent is probably evil (according to the info they have) and will be tormented with itchiness for the rest of their life, and if it believes that most of society is good, then it’s a good idea to pick up the sword. Further, this situation is intuitively clear enough to argue that CDT just straight-up gets the right answer in this case.

A human (with some degree of introspective power) in this case, could correctly reason “oh hey I just got a little warm fuzzy feeling upon thinking of the hypothetical where I wield the sword and it doesn’t curse me. This is evidence that I’m good, because an evil person would not have that response, so I can safely wield the sword and will do so”.

However, what the human is doing in this case is using side-channel information that isn’t present in the problem description. They’re directly experiencing sense data as a result of the utility calculation outputting 10 in that hypothetical, and updating on that. In a society where everyone was really terrible at introspection so the only access they had to their decision algorithm was seeing their actual decision, (and assuming no previous decision problems that good and evil people decide differently on so the good person could learn that they were good by their actions), it seems to me like there’s a very intuitively strong case for not picking up the sword/not smoking.


by Alex Appel 77 days ago | link | parent | on: Musings on Exploration

A: While that is a really interesting note that I hadn’t spotted before, the standard formulation of exploration steps in logical inductor decision theory involve infinite exploration steps over all time, so even though an agent of this type would be able to inductively learn from what other agents do in different decision problems in less time than it naively appears, that wouldn’t make it explore less.

B: What I intended with the remark about Thompson sampling was that troll bridge functions on there being two distinct causes of “attempting to cross the bridge”. One is crossing because you believe it to be the best action, and the other is crossing because an exploration step occurred, and Thompson sampling doesn’t have a split decision criterion like this. Although now that you point it out, it is possible to make a Thompson sampling variant where the troll blows up the bridge when “crossing the bridge” is not the highest-ranked action.


Update: This isn’t really an issue, you just need to impose an assumption that there is some function \(f\) such that \(f(n)>n\), and \(f(n)\) is computable in time polynomial in \(f(n)\), and you always find out whether exploration happened on turn \(f(n)\) after \(\mathcal{O}(f(n+1))\) days.

This is just the condition that there’s a subsequence where good feedback is possible, and is discussed significantly in section 4.3 of the logical induction paper.

If there’s a subsequence B (of your subsequence of interest, A) where you can get good feedback, then there’s infinite exploration steps on subsequence B (and also on A because it contains B)

This post is hereby deprecated. Still right, just not that relevant.


by Alex Appel 97 days ago | Abram Demski likes this | link | parent | on: Distributed Cooperation

If you drop the Pareto-improvement condition from the cell rank, and just have “everyone sorts things by their own utility”, then you won’t necessarily get a Pareto-optimal outcome (within the set of cell center-points), but you will at least get a point where there are no strict Pareto improvements (no points that leave everyone better off).

The difference between the two is… let’s say we’ve got a 2-player 2-move game that in utility-space, makes some sort of quadrilateral. If the top and right edges join at 90 degrees, the Pareto-frontier would be the point on the corner, and the set of “no strict Pareto improvements” would be the top and the right edges.

If that corner is obtuse, then both “Pareto frontier” and “no strict Pareto improvements” agree that both line edges are within the set, and if the corner is acute, then both “Pareto frontier” and “no strict Pareto improvements” agree that only the corner is within the set. It actually isn’t much of a difference, it only manifests when the utilities for a player are exactly equal, and is easily changed by a little bit of noise.

The utility-approximation issue you pointed out seems to be pointing towards the impossibility of guaranteeing limiting to a point on the Pareto frontier (when you make the cell size smaller and smaller), precisely because of that “this set is unstable under arbitrarily small noise” issue.

But, the “set of all points that have no strict Pareto improvements by more than \(\delta\) for all players”, ie, the \(\delta\)-fuzzed version of “set of points with no strict pareto improvement”, does seem to be robust against a little bit of noise, and doesn’t require the Pareto-improvement condition on everyone’s ranking of cells.

So I’m thinking that if that’s all we can attain (because of the complication you pointed out), then it lets us drop that inelegant Pareto-improvement condition.

I’ll work on the proof that for sufficiently small cell size \(\epsilon\), you can get an outcome within \(\delta\) of the set of “no strict Pareto improvements available”

Nice job spotting that flaw.


Intermediate update:

The handwavy argument about how you’d get propositional inconsistency in the limit of imposing the constraint of “the string cannot contain \(a\wedge b\wedge c...\to\neg\phi\) and \(a\) and \(b\) and… and \(\phi\)

is less clear than I thought. The problem is that, while the prior may learn that that constraint applies as it updates on more sentences, that particular constraint can get you into situations where adding either \(\phi\) or \(\neg\phi\) leads to a violation of the constraint.

So, running the prior far enough forward leads to the probability distribution being nearly certain that, while that particular constraint applied in the past, it will stop applying at some point in the future by vetoing both possible extensions of a string of sentences, and then less-constrained conditions will apply from that point forward.

On one hand, if you don’t have the computational resources to enforce full propositional consistency, it’s expected that most of the worlds you generate will be propositionally inconsistent, and midway through generating them you’ll realize that some of them are indeed propositionally inconsistent.

On the other hand, we want to be able to believe that constraints capable of painting themselves into a corner will apply to reality forevermore.

I’ll think about this a bit more. One possible line of attack is having \(\mathbb{P}(\phi)\) and \(\mathbb{P}(\neg\phi)\) not add up to one, because it’s possible that the sentence generating process will just stop cold before one of the two shows up, and renormalizing them to 1. But I’d have to check if it’s still possible to \(\varepsilon\)-approximate the distribution if we introduce this renormalization, and to be honest, I wouldn’t be surprised if there was a more elegant way around this.

EDIT: yes it’s still possible to \(\varepsilon\)-approximate the distribution in known time if you have \(\mathbb{P}(\phi)\) refer to \(\frac{probability to encounter \phi first}{1-probability to halt first}\), although the bounds are really loose. This is because if most of the execution paths involve halting before the sentence is sampled, \(\varepsilon\)-error in the probability of sampling \(\phi\) first will get blown up by the small denominator.

Will type up the proof later, but it basically proceeds by looking at the probability mass associated with “sample the trivial constraint that accepts everything, and sample it again on each successive round”, because this slice of probability mass has a near-guarantee of hitting \(\phi\), and then showing that even this tiny slice has substantially more probability mass than the cumulative probability of ever sampling a really rare sentence or not hitting any of \(\phi\), \(\neg\phi\), or the string terminating.


A summary that might be informative to other people: Where does the \(\omega(\frac{2}{3})\) requirement on the growth rate of the “rationality parameter” \(\beta\) come from?

Well, the expected loss of the agent comes from two sources. Making a suboptimal choice on its own, and incurring a loss from consulting a not-fully-rational advisor. The policy of the agent is basically “defer to the advisor when the expected loss over all time of acting (relative to the optimal move by an agent who knew the true environment) is too high”. Too high, in this case, cashes out as “higher than \(\beta(t)^{-1}t^{-1/x}\)”, where t is the time discount parameter and \(\beta\) is the level-of-rationality parameter. Note that as the operator gets more rational, the agent gets less reluctant about deferring. Also note that t is reversed from what you might think, high values of t mean that the agent has a very distant planning horizon, low values mean the agent is more present-oriented.

On most rounds, the agent acts on its own, so the expected all-time loss on a single round from taking suboptimal choices is on the order of \(\beta(t)^{-1}t^{-1/x}\), and also we’re summing up over about t rounds (technically exponential discount, but they’re similar enough). So the loss from acting on its own ends up being about \(\beta(t)^{-1}t^{(x-1)/x}\).

On the other hand, delegation will happen on at most ~\(t^{2/x}\) rounds, with a loss of \(\beta(t)^{-1}\) value, so the loss from delegation ends up being around \(\beta(t)^{-1}t^{2/x}\).

Setting these two losses equal to each other/minimizing the exponent on the t when they are smooshed together gets you x=3. And then \(\beta(t)\) must grow asymptotically faster than \(t^{2/3}\) to have the loss shrink to 0. So that’s basically where the 2/3 comes from, it comes from setting the delegation threshold to equalize long-term losses from the AI acting on its own, and the human picking bad choices, as the time horizon t goes to infinity.


I don’t believe that \(x_{:n}^{!k}\) was defined anywhere, but we “use the definition” in the proof of Lemma 1.

As far as I can tell, it’s a set of (j,y) pairs, where j is the index of a hypothesis, and y is an infinite history string, rather like the set \(h^{!k}\).

How do the definitions of \(h^{!k}\) and \(x^{!k}_{:n}\) differ?


by Vadim Kosoy 178 days ago | link

Hi Alex!

The definition of \(h^{!k}\) makes sense for any \(h\), that is, the superscript \(!k\) in this context is a mapping from finite histories to sets of pairs as you said. In the line in question we just apply this mapping to \(x_{:n}\) where \(x\) is a bound variable coming from the expected value.

I hope this helps?


by Alex Appel 241 days ago | Abram Demski likes this | link | parent | on: Predictable Exploration

Hm, I got the same result from a different direction.

(probably very confused/not-even-wrong thoughts ahead)

It’s possible to view a policy of the form “I’ll compute X and respond based on what X outputs” as… tying your output to X, in a sense. Logical link formation, if you will.

And policies of the form “I’ll compute X and respond in a way that makes that output of X impossible/improbable” (can’t always do this) correspond to logical link cutting.

And with this, we see what the chicken rule in MUDT/exploration in LIDT is doing. It’s systematically cutting all the logical links it can, and going “well, if the statement remains correlated with me despite me trying my best to shake off anything that predicts me too well, I guess I”cause" it."

But some potentially-useful links were cut by this process, such as “having short abstract reasoning available that lets others predict what you will do” (a partner in a prisoner’s dilemma, the troll in troll bridge, etc..)

At the same time, some links should be cut by a policy that diagonalizes against predictions/calls upon an unpredictable process (anything that can be used to predict your behavior in matching pennies, evading Death when Death can’t crack your random number generator, etc…)

So I wound up with “predictable policy selection that forms links to stuff that would be useful to correlate with yourself, and cuts links to stuff that would be detrimental to have correlated with yourself”.

Predictably choosing an easy-to-predict policy is easy-to-predict, predictably choosing a hard-to-predict policy is hard-to-predict.

This runs directly into problem 1 of “how do you make sure you have good counterfactuals of what would happen if you had a certain pattern of logical links, if you aren’t acting unpredictably”, and maybe some other problems as well, but it feels philosophically appealing.


by Abram Demski 241 days ago | link

So I wound up with “predictable policy selection that forms links to stuff that would be useful to correlate with yourself, and cuts links to stuff that would be detrimental to have correlated with yourself”.


I’m reading this as “You want to make decisions as early as you can, because when you decide one of the things you can do is decide to put the decision off for later; but when you make a decision later, you can’t decide to put it earlier.”

And “logical time” here determines whether others can see your move when they decide to make theirs. You place yourself upstream of more things if you think less before deciding.

This runs directly into problem 1 of “how do you make sure you have good counterfactuals of what would happen if you had a certain pattern of logical links, if you aren’t acting unpredictably”, and maybe some other problems as well, but it feels philosophically appealing.

Here’s where I’m saying “just use the chicken rule again, in this stepped-back reasoning”. It likely re-introduces versions the same problems at the higher level, but perhaps iterating this process as many times as we can afford is in some sense the best we can do.


by Abram Demski 240 days ago | link

Thinking about this more, I think there’s an important disanalogy between trying to make policy decisions with earlier market states vs smaller proof-searches.

In Agent Simulates Predictor, we can use an earlier market state to decide our policy, because the earlier market state can trust the predictor to make the right predictions, even if the predictor is using a more powerful logic (since logical inductors can learn to boundedly trust more powerful logics).

However, with proof-based DTs, no analogous move is possible.

Consider a version of Agent Simulates Predictor in which Omega searches for a proof that you one-box in PA+Con(PA); if one is found, Omega fills the $1m box. Otherwise, not. Omega has \(T_1\) time to think. The agent has \(T_2\) time to think, \(T_2 >> T_1\). The agent reasons in PA.

If the agent refused to use all its time, and only ran for \(T_0 << T_1\) time, but still had enough time to find interesting proofs, then it could reason as follows: “If I one-box, then there is a short proof that I one-box which Omega can find. So I get $1M.” It may not know if PA+Con(PA) is sound, but that doesn’t matter; the agent just has to ensure that there is a proof which Omega will find. It wouldn’t find any proofs leading to higher utility that this, so it would one-box and get $1M.

Unfortunately, I don’t see any way to harness the shorter proof-search to choose a policy which would get the $1M in this case but choose to think longer in other cases where that’s beneficial.

We might want the agent to reason: “If I stop and one-box right now, Omega will be able to prove that I one-box, and I’ll get $1M. If I wait longer, Omega won’t be able to prove what I do, so I’ll at most be able to get $100. So, I’ll stop now and one-box.” However, this reasoning would have to take place at a proof-length in which several things hold at once:

  • The agent can prove that it’s still “early” enough that its action would be provable to Omega if it acted now.
  • It’s “late” enough that the agent can see that Omega’s predictions are sound (IE, it can check that Omega doesn’t reach false results in the limited time it has). This allows the agent to see that it’ll never get money from both boxes.

It seems very unlikely that there is a proof length where these can both be true, due to bounded Löb.

For logical induction, on the other hand, there’s quite likely to be a window with analogous properties.







I found an improved version
by Alex Appel on A Loophole for Self-Applicative Soundness | 0 likes

I misunderstood your
by Sam Eisenstat on A Loophole for Self-Applicative Soundness | 0 likes

Caught a flaw with this
by Alex Appel on A Loophole for Self-Applicative Soundness | 0 likes

As you say, this isn't a
by Sam Eisenstat on A Loophole for Self-Applicative Soundness | 1 like

Note: I currently think that
by Jessica Taylor on Predicting HCH using expert advice | 0 likes

Counterfactual mugging
by Jessica Taylor on Doubts about Updatelessness | 0 likes

What do you mean by "in full
by David Krueger on Doubts about Updatelessness | 0 likes

It seems relatively plausible
by Paul Christiano on Maximally efficient agents will probably have an a... | 1 like

I think that in that case,
by Alex Appel on Smoking Lesion Steelman | 1 like

Two minor comments. First,
by Sam Eisenstat on No Constant Distribution Can be a Logical Inductor | 1 like

A: While that is a really
by Alex Appel on Musings on Exploration | 0 likes

> The true reason to do
by Jessica Taylor on Musings on Exploration | 0 likes

A few comments. Traps are
by Vadim Kosoy on Musings on Exploration | 1 like

I'm not convinced exploration
by Abram Demski on Musings on Exploration | 0 likes

Update: This isn't really an
by Alex Appel on A Difficulty With Density-Zero Exploration | 0 likes


Privacy & Terms