by Alex Appel 21 days ago | Vadim Kosoy likes this | on: Delegative Inverse Reinforcement Learning

A summary that might be informative to other people: where does the $$\omega(t^{2/3})$$ requirement on the growth rate of the “rationality parameter” $$\beta$$ come from?

The expected loss of the agent comes from two sources: making a suboptimal choice on its own, and incurring a loss from consulting a not-fully-rational advisor. The policy of the agent is basically “defer to the advisor when the expected all-time loss of acting (relative to the optimal move by an agent who knew the true environment) is too high”. “Too high”, in this case, cashes out as “higher than $$\beta(t)^{-1}t^{-1/x}$$”, where $$t$$ is the time discount parameter and $$\beta$$ is the level-of-rationality parameter. Note that as the operator gets more rational, the agent gets less reluctant about deferring. Also note that $$t$$ is reversed from what you might expect: high values of $$t$$ mean the agent has a very distant planning horizon, while low values mean the agent is more present-oriented.

On most rounds the agent acts on its own, so the expected all-time loss on a single round from taking suboptimal choices is on the order of $$\beta(t)^{-1}t^{-1/x}$$, and we’re summing over about $$t$$ rounds (technically exponential discount, but they’re similar enough), so the loss from acting on its own ends up being about $$\beta(t)^{-1}t^{(x-1)/x}$$. On the other hand, delegation happens on at most ~$$t^{2/x}$$ rounds, with a loss of $$\beta(t)^{-1}$$ value each, so the loss from delegation ends up being around $$\beta(t)^{-1}t^{2/x}$$. Setting these two losses equal to each other (equivalently, minimizing the exponent on $$t$$ when they are combined) gets you $$x=3$$, and then $$\beta(t)$$ must grow asymptotically faster than $$t^{2/3}$$ for the total loss to shrink to 0.
So that’s basically where the 2/3 comes from: it comes from setting the delegation threshold to equalize the long-term losses from the AI acting on its own and from the human picking bad choices, as the time horizon $$t$$ goes to infinity.
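The exponent-balancing step in the summary above can be checked mechanically. This is only an illustrative sketch of the arithmetic (the function name is mine, and the exponents are the ones from the summary, not anything taken directly from the paper):

```python
from fractions import Fraction

def balanced_x():
    """Find x where the acting-loss exponent (x-1)/x equals the
    delegation-loss exponent 2/x, i.e. where x - 1 = 2."""
    x = 3
    acting = Fraction(x - 1, x)      # exponent of t in beta(t)^-1 * t^((x-1)/x)
    delegating = Fraction(2, x)      # exponent of t in beta(t)^-1 * t^(2/x)
    assert acting == delegating      # both equal 2/3 at x = 3
    return x, acting

x, exponent = balanced_x()
print(x, exponent)  # prints: 3 2/3
```

Both losses then scale as $$\beta(t)^{-1}t^{2/3}$$, which is why $$\beta(t)=\omega(t^{2/3})$$ is exactly the condition for the total loss to vanish.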
by Alex Appel 21 days ago | on: Delegative Inverse Reinforcement Learning

I don’t believe that $$x_{:n}^{!k}$$ was defined anywhere, but we “use the definition” in the proof of Lemma 1. As far as I can tell, it’s a set of $$(j,y)$$ pairs, where $$j$$ is the index of a hypothesis and $$y$$ is an infinite history string, rather like the set $$h^{!k}$$. How do the definitions of $$h^{!k}$$ and $$x^{!k}_{:n}$$ differ?
by Vadim Kosoy 21 days ago

Hi Alex! The definition of $$h^{!k}$$ makes sense for any $$h$$; that is, the superscript $$!k$$ in this context is a mapping from finite histories to sets of pairs, as you said. In the line in question we just apply this mapping to $$x_{:n}$$, where $$x$$ is a bound variable coming from the expected value. I hope this helps.
by Abram Demski 84 days ago

“So I wound up with ‘predictable policy selection that forms links to stuff that would be useful to correlate with yourself, and cuts links to stuff that would be detrimental to have correlated with yourself’.”

Agreed! I’m reading this as “You want to make decisions as early as you can, because when you decide early, one of the things you can do is decide to put the decision off until later; but when you make a decision later, you can’t decide to move it earlier.” And “logical time” here determines whether others can see your move when they decide to make theirs. You place yourself upstream of more things if you think less before deciding. This runs directly into problem 1, of “how do you make sure you have good counterfactuals of what would happen if you had a certain pattern of logical links, if you aren’t acting unpredictably”, and maybe some other problems as well, but it feels philosophically appealing. Here’s where I’m saying “just use the chicken rule again, in this stepped-back reasoning”. It likely re-introduces versions of the same problems at the higher level, but perhaps iterating this process as many times as we can afford is in some sense the best we can do.
by Abram Demski 83 days ago

Thinking about this more, I think there’s an important disanalogy between trying to make policy decisions with earlier market states vs. with smaller proof-searches. In Agent Simulates Predictor, we can use an earlier market state to decide our policy, because the earlier market state can trust the predictor to make the right predictions, even if the predictor is using a more powerful logic (since logical inductors can learn to boundedly trust more powerful logics). However, with proof-based DTs, no analogous move is possible.

Consider a version of Agent Simulates Predictor in which Omega searches in PA+Con(PA) for a proof that you one-box; if one is found, Omega fills the $1M box. Otherwise, not. Omega has $$T_1$$ time to think. The agent has $$T_2$$ time to think, $$T_2 \gg T_1$$. The agent reasons in PA.

If the agent refused to use all its time, and only ran for $$T_0 \ll T_1$$ time, but still had enough time to find interesting proofs, then it could reason as follows: “If I one-box, then there is a short proof that I one-box which Omega can find. So I get $1M.” It may not know whether PA+Con(PA) is sound, but that doesn’t matter; the agent just has to ensure that there is a proof which Omega will find. It wouldn’t find any proofs leading to higher utility than this, so it would one-box and get $1M.

Unfortunately, I don’t see any way to harness the shorter proof-search to choose a policy which would get the $1M in this case but choose to think longer in other cases where that’s beneficial. We might want the agent to reason: “If I stop and one-box right now, Omega will be able to prove that I one-box, and I’ll get $1M. If I wait longer, Omega won’t be able to prove what I do, so I’ll at most be able to get $100. So, I’ll stop now and one-box.” However, this reasoning would have to take place at a proof length at which several things hold at once:

- The agent can prove that it’s still “early” enough that its action would be provable to Omega if it acted now.
- It’s “late” enough that the agent can see that Omega’s predictions are sound (i.e., it can check that Omega doesn’t reach false results in the limited time it has). This allows the agent to see that it’ll never get money from both boxes.

It seems very unlikely that there is a proof length where these can both be true, due to bounded Löb. For logical induction, on the other hand, there’s quite likely to be a window with analogous properties.
by Alex Appel 94 days ago | on: Smoking Lesion Steelman III: Revenge of the Tickle...

What does the Law of Logical Causality say about Con(PA) in Sam’s probabilistic version of the troll bridge? My intuition is that in that case, the agent would think Con(PA) is causally downstream of itself, because the distributions of actions conditional on Con(PA) and $$\neg$$Con(PA) are different.

Can we come up with any example where the agent thinking it can control Con(PA) (or any other thing that enables accurate predictions of its actions) actually gets it into trouble?
by Abram Demski 90 days ago

I agree; my intuition is that LLC asserts that the troll, and even Con(PA), is downstream. And it seems to get into trouble precisely because it treats these as downstream. I also suspect that Troll Bridge will end up formally outside the realm where LLC can be justified by the desire to make ratifiability imply CDT=EDT. (I’m working on another post which will go into that more.)
by Alex Appel 178 days ago | on: Cooperative Oracles: Stratified Pareto Optima and ...

It looks legitimate, actually. Remember, $$f$$ is set-valued, so if $$r_{i-1}=0$$, $$f(r)=[0,1]$$; in all other cases, $$f(r)=\{\frac{r_{i-1}}{2}\}$$. $$f$$ is a nonempty convex set-valued function, so all that’s left is to show the closed-graph property. If the limiting value of $$r_{i-1}$$ is something other than 0, the closed-graph property holds, and if the limiting value of $$r_{i-1}$$ is 0, the closed-graph property holds because $$0\in[0,1]$$.
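A tiny numerical sketch of that closed-graph check (the function name and the interval representation are mine, purely for illustration):

```python
def f(r_prev):
    """The set-valued map from the comment above, represented as an
    interval (lo, hi): the full interval [0, 1] when r_prev = 0,
    and the singleton {r_prev / 2} otherwise."""
    if r_prev == 0:
        return (0.0, 1.0)
    return (r_prev / 2, r_prev / 2)

# Closed-graph check along a sequence r_prev -> 0: the images r_prev / 2
# also tend to 0, and 0 lies in f(0) = [0, 1], so no limit point escapes.
points = [2.0 ** -n for n in range(1, 30)]
images = [f(r)[0] for r in points]
limit = images[-1]
lo, hi = f(0)
assert lo <= limit <= hi  # the limit of the images lands inside f(0)
```

Away from 0 the map is a continuous singleton, so the only interesting limit point is the one checked here.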
by Vadim Kosoy 161 days ago

Hi Alex! I agree that the multimap you described is Kakutani and gives the correct fair set, but in the OP it says that if $$r_{i-1}=0$$ then $$f(r)=r_i$$, not $$f(r)=[0,1]$$. Maybe I am missing something about the notation?
by Alex Appel 577 days ago | on: All Mathematicians are Trollable: Divergence of Na...

Quick question: it is possible to drive the probability of $$x$$ down arbitrarily far by finding a bunch of proofs of the form “$$x$$ implies $$y$$”, where $$y$$ is a theorem. But the exact same argument applies to $$\neg x$$. If the theorem-prover always finds a proof of the form “$$\neg x$$ implies $$y$$” immediately afterwards, the probability wouldn’t converge, but it would fluctuate within a certain range, which looks good enough. What conditions, if any, need to be imposed on the theorem prover to confine the probabilities assigned to an unprovable statement to a range narrower than $$(0,1)$$?
by Alex Appel 808 days ago | on: Proof Length and Logical Counterfactuals Revisited

I don’t know; that line of reasoning that $$U()=10$$ seems like a pretty clear consequence of PA+Sound(PA)+A()=a, and the lack of a counterfactual for “X is false” doesn’t violate any of my intuitions. It’s just reasoning backwards from “the agent takes action a” to the mathematical state of affairs that must have produced it (there is a short proof of X).

On second thought, the thing that broke the original trolljecture was reasoning backwards from “I take action a” to the mathematical state of affairs that produced it. Making inferences about the mathematical state of affairs in your counterfactuals using knowledge of your own decision procedure does seem to be a failure mode at first glance. Maybe use the counterfactual of “find-and-replace all instances of X’s source code in the universe program U with action a, and evaluate”? But that wouldn’t work for different algorithms that depend on checking the same math facts. There needs to be some way to go from “X takes action A” to “closely related algorithm Y takes action B”, but that’s just inferring mathematical statements from the combination of actions and knowledge of X’s decision rule.

I’ll stick with the trolljecture as the current best candidate for “objective” counterfactuals, because reasoning backwards from actions and decision rules a short way into math facts seems necessary for handling “logically related” algorithms, and this counterexample looks intuitively correct.
by Patrick LaVictoire 851 days ago

ASP has only been formalized in ways that don’t translate to modal universes, so it’s only the analogous extensions that could apply there. In the formulation here, it’s not enough that the agent one-box; it must do so provably in less than $$N$$ steps. I think that both the valuations $$\nu_{\textsf{ZFC+Sound(ZFC)}+A()=1}(U()=1M)$$ and $$\nu_{\textsf{ZFC+Sound(ZFC)}+A()=1}(U()=0)$$ will probably be about $$1/2$$, but I’m not sure.
by Alex Appel 990 days ago | Benja Fallenstein and Jessica Taylor like this | on: Reflective probabilistic logic cannot assign posit...

If you are looking for a weaker inner reflection principle, does $$\mathbb{P}((a < \mathbb{P}(\ulcorner \varphi \urcorner ) < b) \rightarrow \mathbb{P}(\ulcorner a-\epsilon < \mathbb{P}(\ulcorner \varphi \urcorner) < b+\epsilon \urcorner) = 1) = 1$$ for some finite $$\epsilon$$ seem viable, or are there fatal flaws with it?

This came about while trying to figure out how to break the proof in the probabilistic procrastination paper. Making the reflection principle unable to prove that $$\mathbb{P}(\text{eventually presses button})$$ is above $$1-\epsilon$$ came up as a possible way forward.
by Benja Fallenstein 987 days ago | Jessica Taylor likes this

If you replace the inner “$$= 1$$” by “$$> 1-\epsilon$$”, then the literal thing you wrote follows from the reflection principle: Suppose that the outer probability is $$< 1$$. Then $\mathbb{P}[(a < \mathbb{P}[\varphi] < b) \wedge \mathbb{P}[a-\epsilon < \mathbb{P}[\varphi] < b+\epsilon] \le 1 - \epsilon] > 0.$ Now, $$\mathbb{P}[a < \mathbb{P}[\varphi] < b] > 0$$ implies $$\mathbb{P}[a \le \mathbb{P}[\varphi] \le b] > 0$$, which by the converse of the outer reflection principle yields $$a \le \mathbb{P}[\varphi] \le b$$, whence $$a - \epsilon < \mathbb{P}[\varphi] < b + \epsilon$$. Now, by the forward direction of the outer reflection principle, we have $\mathbb{P}[a - \epsilon < \mathbb{P}[\varphi] < b + \epsilon] = 1 > 1 - \epsilon,$ which, by the outer reflection principle again, implies $\mathbb{P}[\mathbb{P}[a - \epsilon < \mathbb{P}[\varphi] < b + \epsilon] > 1 - \epsilon] = 1,$ a contradiction to the assumption that $$\cdots \le 1 - \epsilon$$ had outer probability $$> 0$$.

However, what we’d really like is an inner reflection principle that assigns probability one to the statement quantified over all $$a$$, $$b$$, and Gödel numbers $$\ulcorner\varphi\urcorner$$. I think Paul Christiano has a proof that this is impossible for small enough $$\epsilon$$, but I don’t remember how the details worked.
by Paul Christiano 987 days ago | Benja Fallenstein likes this

Here is the basic problem. I think that you can construct an appropriate liar’s sentence by using a Lipschitz function without an approximate fixed point. But someone might want to check that more carefully and write it up, to make sure and to see where the possible loopholes are. I think that it may not have ruled out this particular principle, just something slightly stronger (but the two were equivalent for the kinds of proof techniques we were considering).
