Passing Troll Bridge

discussion post by Alex Appel | Abram Demski likes this

In an earlier discussion about the Troll Bridge problem, Abram mentioned a Löbian proof that a logical-induction-based agent would converge to not crossing the bridge. The proof looked rather sketchy, though, and further discussion by Paul in the comments revealed that Troll Bridge might be passable. (As a quick refresher: Troll Bridge is the decision problem where action 1, i.e., attempting to cross the bridge, results in crossing the bridge for a utility of 1, unless you tried to cross the bridge because your exploration step was active, in which case the troll blows up the bridge and you get a utility of 0. Refraining from attempting to cross the bridge, which is action 2, gives a utility of 0.5. More about this will be typed up soon.)

Results:

- The variant of logical induction that Abram mentioned, where the probabilities jump to 100% as soon as something gets proved, probably converges to failure on Troll Bridge.
- The vanilla variant of logical induction discussed in the Logical Induction paper has an unexpected and novel sort of Löbian loophole that appears to allow converging to failure on Troll Bridge, and might also be usable to introduce the spurious-counterfactual problem from proof-based decision theory into the logical-inductor decision-theory setting.
- This same loophole renders invalid the "proof" (mentioned here) that different logical inductors may converge to different actions in 5-and-10, and I strongly suspect that sufficiently adversarial proof orderings can be used to make any given logical inductor converge to taking either the 10 dollars or the 5 dollars.
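The payoff structure just described can be written down as a minimal sketch (the function name and the encoding of actions are mine, not from the post):

```python
# Hedged sketch of the Troll Bridge payoffs described above.
# action 1 = attempt to cross, action 2 = stay on this side;
# `explored` marks whether the attempt came from the exploration step,
# in which case the troll blows up the bridge.
def troll_bridge_utility(action, explored):
    if action == 1:
        return 0.0 if explored else 1.0
    return 0.5
```

Crossing deliberately beats staying put (1 > 0.5), but crossing via exploration is the worst outcome (0), which is what makes the problem tricky for exploration-based agents.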
There's a new variant of logical induction I came up with (a minor and somewhat hacky variant) that allows proofs of the form "you could have an enforcing trader that makes large bets on getting certain outcomes" to go through, and in particular it seems to allow victory on Troll Bridge if you set up the initial set of traders right (modulo a few very reasonable assumptions, and one thing that I wasn't quite able to prove but that seems extremely likely). It's a custom-designed and horribly hacky solution, and I'd expect the True Solution to involve (somehow) dropping exploration steps entirely, but the process of constructing it did turn up the unexpected loophole mentioned in the above paragraph, and was also good for training an ability to prove bounds on what a logical inductor does, so I might as well present it. Fair warning: the appendix involves a lot of digging into the fine details of the logical inductor algorithm.

Setup:

$$\newcommand{\wbest}[1]{sup_{\mathbb{W}\in PC(D_{#1})}}$$ $$\newcommand{\wworst}[1]{inf_{\mathbb{W}\in PC(D_{#1})}}$$ $$\newcommand{\bud}[2]{B#1_{#2}}$$ $$\newcommand{\totrade}{(\overline{\mathbb{P}})}$$

$$\mathcal{B}^{Tb}_{n}\totrade$$ is the $$n$$'th trade of the trader $$T$$ equipped with a "budget filter" of $$b$$, against the market $$\totrade$$. It is defined as follows: if $$\mathbb{W}(\sum_{i\le m}\mathcal{B}^{Tb}_{i}\totrade)=-b$$ for some $$m<n$$ and some $$\mathbb{W}\in PC(D_{n})$$, the budget is exhausted and all further trades are zeroed out; otherwise, the trade $$T_{n}\totrade$$ is scaled down just enough that the worst-case value of the total holdings across plausible worlds cannot fall below $$-b$$ (the exact scaling appears in the proof of Lemma 1 in the appendix).

The enforcer trader $$E$$, budgeted at $$X$$, trades as follows on day $$n$$, buying each component of $$\mathbb{E}_{n}(U_{n}|A_{n}=1)$$ whose conditional price is below 51 cents:

\[E_{n}\totrade=\sum_{i<n}\left(\frac{\mathbb{P}_{n}(U_{n}>\frac{i}{n}\wedge A_{n}=1)}{max(\mathbb{P}_{n}(A_{n}=1),\epsilon)}<.51\right)*\left((U_{n}>\frac{i}{n}\wedge A_{n}=1)-\frac{\mathbb{P}_{n}(U_{n}>\frac{i}{n}\wedge A_{n}=1)}{max(\mathbb{P}_{n}(A_{n}=1),\epsilon)}(A_{n}=1)\right)\]

$$A_{n}$$ is an indicator function that's 0 if $$n$$ is before an "awakening day" and 1 afterward. Let's go over some properties of $$E$$. Because it's budgeted at $$X$$, its starting score is $$X$$. By the way the budgeting operator works, if it makes a trade, the score drops by the worst-case value of that trade.
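A rough sketch of how such a budget filter can behave, assuming (as the case analysis in Lemma 1 of the appendix suggests) that the filter passes a trade through unchanged while the worst-case loss stays within budget and otherwise caps its magnitude; the function and argument names are hypothetical:

```python
def budget_filtered_magnitude(trade_mag, budget, worst_case_holdings):
    """Cap a trade's magnitude so the worst-case total loss stays within budget.

    worst_case_holdings: the minimum, over plausible worlds W in PC(D_n),
    of the value of the previously-made budgeted trades (can be negative).
    """
    # Remaining allowance: b + W(sum of past budgeted trades), floored at 0
    # because a trader whose budget is exhausted makes no further trades.
    allowance = budget + worst_case_holdings
    return min(trade_mag, max(allowance, 0.0))
```

For example, with a budget of 5 and past worst-case holdings of -4, only a magnitude-1 trade survives the filter.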
Looking at it, what $$E$$ does is start buying any components of $$\mathbb{E}_{n}(U_{n}|A_{n}=1)$$ that are priced below 51 cents, and it escalates to spending its entire starting endowment by the time the price of any of the components drops to 50.5 cents. Thus, if $$E$$ doesn't spend its entire budget or starting endowment (whichever is lower) on a turn, the price of all components is 50.5 cents or above, so $$\mathbb{E}_{n}(U_{n}|A_{n}=1)\ge.505$$. There is some day $$n$$ by which $$|\mathbb{E}_{n}(U_{n}|A_{n}=2)-.5|<.005$$ forevermore, so, assuming that this trader never escalates to spending all its money, and "wakes up" after that day (as implemented by the indicator function $$A_{n}$$), it will be able to ensure that the agent always attempts, and succeeds at, crossing the bridge (except on exploration steps). So, all that's left is showing that this trader will, with high probability, not escalate to spending all its money. Because we are taking an (approximate) fixed point of $$E$$ and the supertrader $$S$$, and at the fixed point the total number of shares bought and sold is (approximately) zero, the magnitude of the enforcer's trade is (approximately) equal to the magnitude of the supertrader's trade. (It's late at night and I don't feel like accounting for the approximation error, and it almost certainly doesn't make a difference.) As the enforcer will escalate to whatever level of trading is required to keep the expected value of crossing the bridge above .505, the supertrader can only force a loss/budget overdraw once its score (equivalently, its maximum allowable trade) rises above the enforcer's. Now, the game is as follows. On turn $$n$$, the enforcer and the supertrader have scores of \[X_{n}:=X+\wworst{n}\mathbb{W}\left(\sum_{m<n}E_{m}\totrade\right)\] and \[C_{n}:=C-\wbest{n}\mathbb{W}\left(\sum_{m<n}E_{m}\totrade\right)\] respectively, where $$C$$ is the supertrader's starting score (at the fixed point, $$S$$'s holdings are the mirror image of $$E$$'s, so both scores can be written in terms of $$E$$'s trades). The first thing to note is that, since the trades are purchases of shares in statements of the form $$U_{n}>k$$ for varying values of $$k$$, a trade that spends 5 dollars (say) has a lowest possible value of -5, and a highest possible value of the number of purchased shares minus 5, before an observation of the value of $$\phi$$ comes in, which immediately settles all the component bets.
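The enforcer's buy rule for a single component can be sketched as follows. The post only pins down the two endpoints (no buying at 51 cents and up, full endowment spent by 50.5 cents), so the linear escalation in between is my own simplification, and all names are hypothetical:

```python
def enforcer_component_trade(cond_price, awake, endowment):
    """How much the enforcer spends on one component of E(U_n | A_n = 1),
    given that component's conditional price. Hypothetical linear schedule
    between the two thresholds stated in the post."""
    if not awake or cond_price >= 0.51:
        return 0.0          # asleep, or the component is priced high enough already
    if cond_price <= 0.505:
        return endowment    # full escalation
    # linear ramp from 0 (at $0.51) up to the full endowment (at $0.505)
    return endowment * (0.51 - cond_price) / (0.51 - 0.505)
```

Any schedule with these endpoints would do for the argument; what matters is that spending escalates without bound in trading pressure as the conditional price approaches 50.5 cents.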
The second thing to note is that $$\delta_{n}$$ is defined as the set of days $$n-1$$ and before on which a trade was made whose value has now been revealed: an observation of the utility gained on that day made it to the theorem prover by day $$n+1$$, but had not done so by day $$n$$. It is then easy to verify that the change in score from day $$n$$ to day $$n+1$$ is as follows. \[\Delta X=\wworst{n+1}\mathbb{W}\left(\sum_{m\in\delta_{n}}E_{m}\totrade\right)-\wworst{n}\mathbb{W}\left(\sum_{m\in\delta_{n}}E_{m}\totrade\right)+\wworst{n+1}\mathbb{W}(E_{n}\totrade)\] \[\Delta C=-\wbest{n+1}\mathbb{W}\left(\sum_{m\in\delta_{n}}E_{m}\totrade\right)+\wbest{n}\mathbb{W}\left(\sum_{m\in\delta_{n}}E_{m}\totrade\right)-\wbest{n+1}\mathbb{W}(E_{n}\totrade)\] Now, we'll impose one further key assumption: the proof-searching process must be either unbiased or biased in favor of crossing. That is, the sequence of proofs of the form "on day n, the bridge was crossed/not crossed" is a pseudorandom sequence with probability $$\frac{\epsilon}{2}$$ or less of the proof being "an exploration step into the bridge occurred". It's possible to screw over the enforcer by deferring news of successful bridge-crossings to some really sparse subsequence while making proofs of bridge-explosions denser, so this assumption basically says that you get information about passing or not-passing the bridge in a way that isn't biased towards particular outcomes. Direct observation of the utility on day $$n$$, immediately after the $$n$$'th trade is made and the $$n$$'th action is selected, is a very easy way to get this property, but this generalization handles delays in feedback. So, now that all these preliminaries are out of the way, we can see that the reveal of theorems from day $$n$$ to day $$n+1$$ is always negative-expected-value for $$S$$ and positive-expected-value for $$E$$, and the most recent trade only locks up slightly more value from $$E$$ than from $$S$$.
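The bookkeeping behind $$\Delta X$$ and $$\Delta C$$ can be sketched directly: value the newly revealed holdings in each plausible world, move the enforcer's score with the worst case, and move the supertrader's with the negated best case. Worlds are modeled as functions from a trade to its value, and all names here are hypothetical:

```python
def delta_scores(revealed_holdings, latest_trade, worlds_old, worlds_new):
    """Score changes from day n to day n+1.

    revealed_holdings: trades in delta_n (value newly revealed on day n+1).
    latest_trade: the day-n trade E_n.
    worlds_old / worlds_new: plausible worlds PC(D_n) and PC(D_{n+1}),
    each a function mapping a trade to its value in that world.
    """
    def value(holdings, world):
        return sum(world(trade) for trade in holdings)

    def worst(holdings, worlds):
        return min(value(holdings, w) for w in worlds)

    def best(holdings, worlds):
        return max(value(holdings, w) for w in worlds)

    # Enforcer: worst-case valuation; supertrader: mirrored, best-case valuation.
    delta_x = (worst(revealed_holdings, worlds_new)
               - worst(revealed_holdings, worlds_old)
               + worst([latest_trade], worlds_new))
    delta_c = (-best(revealed_holdings, worlds_new)
               + best(revealed_holdings, worlds_old)
               - best([latest_trade], worlds_new))
    return delta_x, delta_c
```

When a theorem reveal prunes the worlds where the enforcer's holdings were worth the least, the worst-case valuation jumps up and $$\Delta X$$ is positive, matching the claim above.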
Conditional on a particular trade being in $$\delta_{n}$$, the expected value of the trade to $$E$$ (remember, we're assuming the pseudorandom sequence of exploration steps is truly random, and taking an expectation over that) is bounded below by $$mag(E_{m}\totrade)*(.495(1-\epsilon)-.51\frac{\epsilon}{2})$$, and the expected value of the trade to $$S$$ is bounded above by $$mag(E_{m}\totrade)*(.51\frac{\epsilon}{2}-.495(1-\epsilon))$$. (We don't account for the agent exploring into remaining on its side of the bridge, because the conditional contract that $$E$$ bought and $$S$$ sold has 0 value in that world.) Also note that if $$PC(D_{n+1})$$ settles the most recent trade, the same analysis applies to it, and if $$PC(D_{n+1})$$ doesn't settle the most recent trade, then $$\wworst{n+1}\mathbb{W}(E_{n}\totrade)$$ is at minimum $$-.51mag(E_{n}\totrade)$$ and $$-\wbest{n+1}\mathbb{W}(E_{n}\totrade)$$ is at maximum $$-.505mag(E_{n}\totrade)$$. This may seem quite complicated, but the essence of it is that these two traders are playing a game. The supertrader can sell shares in "the bridge will be successfully crossed" for at most 51 cents. However, until a share has its value assessed, the supertrader is only allowed to sell a finite amount of them to the enforcer (because score is tied to the maximum number of shares that can be moved). Also, there's a $$1-\epsilon$$ chance that the share will turn out to be worth 1 dollar, so this action is negative-expected-value for the supertrader. I'm not sure of the exact result required to prove this, but there are results under the name of "Gambler's Ruin" saying that any gambler playing a negative-expected-value game will eventually go broke regardless of strategy. I'm extremely confident that there's a similar result saying that any gambler playing a negative-expected-value game against another player with a sufficiently larger bankroll will (with very high probability) fail to ever attain the same wealth as the other player.
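The gambler's-ruin intuition can be checked numerically. Below is a small Monte Carlo sketch (all parameters and names are illustrative, not from the post): a gambler playing unit-stake rounds at win probability $$0.5-\text{edge}$$, starting below a target bankroll, rarely ever reaches it.

```python
import random

def catch_up_probability(edge, start, target, trials=2000, max_steps=5000, seed=0):
    """Estimate the chance that a gambler playing a game with per-round
    win probability (0.5 - edge) ever climbs from `start` up to `target`
    before going broke. Unit stakes; walks are truncated at max_steps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        wealth = start
        for _ in range(max_steps):
            wealth += 1 if rng.random() < 0.5 - edge else -1
            if wealth <= 0:
                break
            if wealth >= target:
                hits += 1
                break
    return hits / trials
```

Even a small edge against the gambler collapses the catch-up probability relative to the fair game, which is exactly the behavior the argument above relies on.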
Thus, we have shown that, assuming unbiased finding of theorems, and assuming the gambler's-ruin conjecture above goes through, there's a value of $$X$$ that can be assigned to the enforcer to ensure, with arbitrarily high "probability", that the bridge will be crossed forevermore after the enforcer wakes up, except on exploration steps.

Lemma 1: The maximum magnitude trade that $$S$$ can make on day $$n$$ is bounded above by its current score, $$C-\wbest{n}\mathbb{W}\left(\sum_{m<n}S_{m}\totrade\right)$$.

Proof: The budgeter $$\bud{T}{n}$$ is a $$\zeta$$-weighted mixture of budget-filtered copies of $$T$$, so \[mag(\bud{T}{n}\totrade)\le\sum_{b>d}\zeta(b)*mag(\mathcal{B}^{Tb}_{n}\totrade)\] Now, we can split this into two cases. In case one, for all $$\mathbb{W}\in PC(D_{n})$$, \[mag(T_{n}\totrade)\le b+\mathbb{W}\left(\sum_{i\le n-1}\mathcal{B}^{Tb}_{i}\totrade\right)\] and thus $$mag(\mathcal{B}^{Tb}_{n}\totrade)=mag(T_{n}\totrade)$$ by the definition of $$\mathcal{B}$$. In the second case, there is a $$\mathbb{W}\in PC(D_{n})$$ which minimizes $$b+\mathbb{W}(\sum_{i\le n-1}\mathcal{B}^{Tb}_{i}\totrade)$$ to less than $$mag(T_{n}\totrade)$$; writing $$s_{b}:=\mathbb{W}(\sum_{i\le n-1}\mathcal{B}^{Tb}_{i}\totrade)$$ for this minimizing world, the definition of $$\mathcal{B}$$ scales the trade down and we get \[mag(T_{n}\totrade)*\left(\frac{mag(T_{n}\totrade)}{b+s_{b}}\right)^{-1}=b+s_{b}\] Therefore, $$mag(\bud{T}{n}\totrade)=\sum_{b>d}\zeta(b)*min(s_{b}+b,mag(T_{n}\totrade))$$, and since $$min(s_{b}+b,mag(T_{n}\totrade))\le s_{b}+b$$, we get the bound \[mag(\bud{T}{n}\totrade)\le\sum_{b>d}\zeta(b)*(s_{b}+b)\]
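The quantity the lemma manipulates is just a $$\zeta$$-weighted sum of clipped magnitudes, which can be sketched as follows (function and parameter names are hypothetical):

```python
def budgeter_magnitude(trade_mag, terms):
    """Magnitude of the budgeter's trade per the lemma:
    sum over budget levels b of zeta(b) * min(s_b + b, mag(T_n)).

    terms: iterable of (zeta_b, b, s_b) triples, one per budget level b > d,
    where s_b is the minimizing world's valuation of past budgeted trades.
    """
    return sum(zeta_b * min(s_b + b, trade_mag) for zeta_b, b, s_b in terms)
```

When every budget level is slack (the min always picks mag(T_n)) and the weights sum to 1, the budgeter's trade coincides with the raw trade; tight budget levels only ever shrink it.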
