Computing an exact quantilal policy discussion post by Vadim Kosoy 13 days ago | discuss
Resource-Limited Reflective Oracles
post by Alex Appel 15 days ago | Abram Demski likes this | discuss

Reflective oracles accurately answer questions about what arbitrary halting probabilistic oracle machines output. It is possible to make a variant of a reflective oracle that accurately answers questions about what sufficiently short-running Turing machines with access to the same oracle output.

 No Constant Distribution Can be a Logical Inductor discussion post by Alex Appel 18 days ago | Abram Demski, Jessica Taylor and Stuart Armstrong like this | 1 comment
 Musings on Exploration discussion post by Alex Appel 23 days ago | Vadim Kosoy likes this | 4 comments
Quantilal control for finite MDPs
post by Vadim Kosoy 25 days ago | Ryan Carey, Alex Appel and Abram Demski like this | discuss

We introduce a variant of the concept of a “quantilizer” for the setting of choosing a policy for a finite Markov decision process (MDP), where the generic unknown cost is replaced by an unknown penalty term in the reward function. This is essentially a generalization of quantilization in repeated games with a cost independence assumption. We show that the “quantilal” policy shares some properties with the ordinary optimal policy, namely that (i) it can always be chosen to be Markov, (ii) it can be chosen to be stationary when time discount is geometric, and (iii) the “quantilum” value of an MDP with geometric time discount is a continuous piecewise rational function of the parameters, and it converges when the discount parameter $$\lambda$$ approaches 1. Finally, we demonstrate a polynomial-time algorithm for computing the quantilal policy, showing that quantilization is not qualitatively harder than ordinary optimization.
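For readers unfamiliar with the term, here is a minimal sketch of an ordinary one-shot quantilizer over a finite action set. It is not the quantilal MDP policy or the polynomial-time algorithm from the post. The names `quantilize`, `base`, `utility` and `q` are hypothetical and chosen for illustration only. The idea is to keep the top `q` fraction of the base distribution's probability mass, ranked by utility, and renormalize.

```python
def quantilize(base, utility, q):
    """Return the q-quantilizer distribution over a finite action set.

    base:    dict mapping action -> probability under the base distribution
    utility: dict mapping action -> utility estimate
    q:       fraction of base probability mass to keep, with 0 < q <= 1
    """
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    # Rank actions from highest to lowest utility.
    ranked = sorted(base, key=lambda a: utility[a], reverse=True)
    remaining = q
    result = {}
    for action in ranked:
        if remaining <= 1e-12:  # tolerance for floating-point residue
            break
        # Take as much of this action's base mass as still fits in the top q.
        mass = min(base[action], remaining)
        if mass > 0:
            result[action] = mass / q  # renormalize so the result sums to 1
        remaining -= mass
    return result


# Illustrative numbers only (assumed, not taken from the post).
base = {"a": 0.5, "b": 0.3, "c": 0.2}
utility = {"a": 1.0, "b": 3.0, "c": 2.0}
top_half = quantilize(base, utility, 0.5)  # approximately {"b": 0.6, "c": 0.4}
```

With `q = 1` the sketch returns the base distribution unchanged, and as `q` shrinks it concentrates on the highest-utility actions, which is the trade-off a quantilizer is meant to expose.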

 A Difficulty With Density-Zero Exploration discussion post by Alex Appel 30 days ago | 1 comment
Distributed Cooperation
post by Alex Appel 38 days ago | Abram Demski and Scott Garrabrant like this | 2 comments

Reflective oracles can be approximated by computing Nash equilibria. But is there some procedure that produces a Pareto-optimal equilibrium in a game, aka, a point produced by a Cooperative oracle? It turns out there is. There are some interesting philosophical aspects to it, which will be typed up in the next post.
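As a reminder of what "Pareto-optimal" means here, the following minimal sketch filters the pure outcomes of a small game down to the Pareto-optimal ones. It illustrates the definition only, not the procedure described in the post. The function name `pareto_optimal` and the Prisoner's Dilemma payoffs are assumptions made for the example.

```python
def pareto_optimal(outcomes):
    """Return the names of outcomes that no other outcome Pareto-dominates.

    outcomes: dict mapping outcome name -> tuple of payoffs, one per player
    """
    def dominates(u, v):
        # u dominates v if every player does at least as well under u
        # and at least one player does strictly better.
        return (all(x >= y for x, y in zip(u, v))
                and any(x > y for x, y in zip(u, v)))

    return [name for name, u in outcomes.items()
            if not any(dominates(v, u) for v in outcomes.values())]


# Hypothetical Prisoner's Dilemma payoffs (row player, column player).
prisoners_dilemma = {
    "CC": (2, 2),
    "CD": (0, 3),
    "DC": (3, 0),
    "DD": (1, 1),
}
efficient = pareto_optimal(prisoners_dilemma)
```

In this example mutual defection `"DD"` is the only Nash equilibrium and also the only outcome that is filtered out, which is the kind of gap between equilibrium and Pareto optimality that motivates the question above.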

The result is not original to me; it’s been floating around MIRI for a while. I think Scott, Sam, and Abram worked on it, but there might have been others. All I did was formalize it a bit, and generalize from the 2-player 2-move case to the n-player n-move case. With the formalism here, it’s a bit hard to intuitively understand what’s going on, so I’ll indicate where to visualize an appropriate 3-dimensional object.

 Passing Troll Bridge discussion post by Alex Appel 61 days ago | Abram Demski likes this | discuss
Why we want unbiased learning processes
post by Stuart Armstrong 64 days ago | discuss

Crossposted at Lesserwrong.

tl;dr: if an agent has a biased learning process, it may choose actions that are worse (with certainty) for every possible reward function it could be learning.

 Two Types of Updatelessness discussion post by Abram Demski 69 days ago | discuss
 Stable Pointers to Value II: Environmental Goals discussion post by Abram Demski 75 days ago | 1 comment
Further Progress on a Bayesian Version of Logical Uncertainty
post by Alex Appel 83 days ago | Scott Garrabrant likes this | 1 comment

I’d like to credit Daniel Demski for helpful discussion.

 Strategy Nonconvexity Induced by a Choice of Potential Oracles discussion post by Alex Appel 89 days ago | Abram Demski likes this | discuss
An Untrollable Mathematician
post by Abram Demski 92 days ago | Alex Appel, Sam Eisenstat, Vadim Kosoy, Jack Gallagher, Jessica Taylor, Paul Christiano, Scott Garrabrant and Vladimir Slepnev like this | 1 comment

Follow-up to All Mathematicians are Trollable.

It is relatively easy to see that no computable Bayesian prior on logic can converge to a single coherent probability distribution as we update it on logical statements. Furthermore, the non-convergence behavior is about as bad as could be: someone selecting the ordering of provable statements to update on can drive the Bayesian’s beliefs arbitrarily up or down, arbitrarily many times, despite only saying true things. I called this wild non-convergence behavior “trollability”. Previously, I showed that if the Bayesian updates on the provability of a sentence rather than updating on the sentence itself, it is still trollable. I left open the question of whether some other side information could save us. Sam Eisenstat has closed this question, providing a simple logical prior and a way of doing a Bayesian update on it which (1) cannot be trolled, and (2) converges to a coherent distribution.

Logical counterfactuals and differential privacy
post by Nisan Stiennon 93 days ago | Abram Demski and Scott Garrabrant like this | 1 comment

This idea was informed by discussions with Abram Demski, Scott Garrabrant, and the MIRIchi discussion group.

More precise regret bound for DRL
post by Vadim Kosoy 124 days ago | Alex Appel likes this | discuss

We derive a regret bound for DRL reflecting dependence on:

• Number of hypotheses

• Mixing time of MDP hypotheses

• The probability with which the advisor takes optimal actions

That is, the regret bound we get is fully explicit up to a multiplicative constant (which can also be made explicit). Currently we focus on plain (as opposed to catastrophe) and uniform (finite number of hypotheses, uniform prior) DRL, although this result can and should be extended to the catastrophe and/or non-uniform settings.

 Value learning subproblem: learning goals of simple agents discussion post by Alex Mennen 129 days ago | discuss
 Oracle paper discussion post by Stuart Armstrong 133 days ago | Vladimir Slepnev likes this | discuss
Being legible to other agents by committing to using weaker reasoning systems
post by Alex Mennen 143 days ago | Stuart Armstrong and Vladimir Slepnev like this | 1 comment

Suppose that an agent $$A_{1}$$ reasons in a sound theory $$T_{1}$$, and an agent $$A_{2}$$ reasons in a theory $$T_{2}$$, such that $$T_{1}$$ proves that $$T_{2}$$ is sound. Now suppose $$A_{1}$$ is trying to reason in a way that is legible to $$A_{2}$$, in the sense that $$A_{2}$$ can rely on $$A_{1}$$ to reach correct conclusions. One way of doing this is for $$A_{1}$$ to restrict itself to some weaker theory $$T_{3}$$, which $$T_{2}$$ proves is sound, for the purposes of any reasoning that it wants to be legible to $$A_{2}$$. Of course, in order for this to work, not only would $$A_{1}$$ have to restrict itself to using $$T_{3}$$, but $$A_{2}$$ would have to trust that $$A_{1}$$ had done so. A plausible way for that to happen is for $$A_{1}$$ to reach the decision quickly enough that $$A_{2}$$ can simulate $$A_{1}$$ making the decision to restrict itself to using $$T_{3}$$.

 Why DRL doesn't work for arbitrary environments discussion post by Vadim Kosoy 146 days ago | discuss
 Stable agent, subagent-unstable discussion post by Stuart Armstrong 148 days ago | discuss
Policy Selection Solves Most Problems
post by Abram Demski 148 days ago | Alex Appel and Vladimir Slepnev like this | 4 comments

It seems like logically updateless reasoning is what we would want in order to solve many decision-theory problems. I show that several of the problems which seem to require updateless reasoning can instead be solved by selecting a policy with a logical inductor that’s run a small amount of time. The policy specifies how to make use of knowledge from a logical inductor which is run longer. This addresses the difficulties which seem to block logically updateless decision theory in a fairly direct manner. On the other hand, it doesn’t seem to hold much promise for the kind of insights which we would want from a real solution.

 Catastrophe Mitigation Using DRL (Appendices) discussion post by Vadim Kosoy 154 days ago | discuss
 Where does ADT Go Wrong? discussion post by Abram Demski 159 days ago | Jack Gallagher and Jessica Taylor like this | 1 comment
Catastrophe Mitigation Using DRL

Previously we derived a regret bound for DRL which assumed the advisor is “locally sane.” Such an advisor can only take actions that don’t lose any value in the long term. In particular, if the environment contains a latent catastrophe that manifests with a certain rate (such as the possibility of a UFAI), a locally sane advisor has to take the optimal course of action to mitigate it, since every delay yields a positive probability of the catastrophe manifesting and leading to permanent loss of value. This state of affairs is unsatisfactory, since we would like to have performance guarantees for an AI that can mitigate catastrophes that the human operator cannot mitigate on their own. To address this problem, we introduce a new form of DRL where in every hypothetical environment the set of uncorrupted states is divided into “dangerous” (impending catastrophe) and “safe” (catastrophe was mitigated). The advisor is then only required to be locally sane in safe states, whereas in dangerous states certain “leaking” of long-term value is allowed. We derive a regret bound in this setting as a function of the time discount factor, the expected value of catastrophe mitigation time for the optimal policy, and the “value leak” rate (i.e. essentially the rate of catastrophe occurrence). The form of this regret bound implies that in certain asymptotic regimes, the agent attains near-optimal expected utility (and in particular mitigates the catastrophe with probability close to 1), whereas the advisor on its own fails to mitigate the catastrophe with probability close to 1.

