 1.  An Untrollable Mathematician   post by Abram Demski 538 days ago  Alex Appel, Sam Eisenstat, Vanessa Kosoy, Jack Gallagher, Jessica Taylor, Paul Christiano, Scott Garrabrant and Vladimir Slepnev like this  1 comment  
 Followup to All Mathematicians are Trollable.
It is relatively easy to see that no computable Bayesian prior on logic can converge to a single coherent probability distribution as we update it on logical statements. Furthermore, the nonconvergence behavior is about as bad as could be: someone selecting the ordering of provable statements to update on can drive the Bayesian’s beliefs arbitrarily up or down, arbitrarily many times, despite only saying true things. I called this wild nonconvergence behavior “trollability”. Previously, I showed that if the Bayesian updates on the provability of a sentence rather than updating on the sentence itself, it is still trollable. I left open the question of whether some other side information could save us. Sam Eisenstat has closed this question, providing a simple logical prior and a way of doing a Bayesian update on it which (1) cannot be trolled, and (2) converges to a coherent distribution.
 
   3.  Being legible to other agents by committing to using weaker reasoning systems   post by Alex Mennen 590 days ago  Stuart Armstrong and Vladimir Slepnev like this  1 comment  
 Suppose that an agent \(A_{1}\) reasons in a sound theory \(T_{1}\), and an agent \(A_{2}\) reasons in a theory \(T_{2}\), such that \(T_{1}\) proves that \(T_{2}\) is sound. Now suppose \(A_{1}\) is trying to reason in a way that is legible to \(A_{2}\), in the sense that \(A_{2}\) can rely on \(A_{1}\) to reach correct conclusions. One way of doing this is for \(A_{1}\) to restrict itself to some weaker theory \(T_{3}\), which \(T_{2}\) proves is sound, for the purposes of any reasoning that it wants to be legible to \(A_{2}\). Of course, in order for this to work, not only would \(A_{1}\) have to restrict itself to using \(T_{3}\), but \(A_{2}\) would have to trust that \(A_{1}\) had done so. A plausible way for that to happen is for \(A_{1}\) to reach the decision quickly enough that \(A_{2}\) can simulate \(A_{1}\) making the decision to restrict itself to using \(T_{3}\).  
  4.  Policy Selection Solves Most Problems   post by Abram Demski 594 days ago  Alex Appel and Vladimir Slepnev like this  4 comments  
 It seems like logically updateless reasoning is what we would want in order to solve many decision-theory problems. I show that several of the problems which seem to require updateless reasoning can instead be solved by selecting a policy with a logical inductor that’s run for a small amount of time. The policy specifies how to make use of knowledge from a logical inductor which is run longer. This addresses the difficulties which seem to block logically updateless decision theory in a fairly direct manner. On the other hand, it doesn’t seem to hold much promise for the kind of insights which we would want from a real solution.
 
  5.  Reflective oracles as a solution to the converse Lawvere problem   post by Sam Eisenstat 607 days ago  Alex Mennen, Alex Appel, Vanessa Kosoy, Abram Demski, Jessica Taylor, Scott Garrabrant and Vladimir Slepnev like this  discuss  
 1 Introduction
Before the work of Turing, one could justifiably be skeptical of the idea of a universal computable function. After all, there is no computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\mathbb{N}\) such that for all computable \(g\colon\mathbb{N}\to\mathbb{N}\) there is some index \(i_{g}\) such that \(f\left(i_{g},n\right)=g\left(n\right)\) for all \(n\). If there were, we could pick \(g\left(n\right)=f\left(n,n\right)+1\), and then \[g\left(i_{g}\right)=f\left(i_{g},i_{g}\right)+1=g\left(i_{g}\right)+1,\] a contradiction. Of course, universal Turing machines don’t run into this obstacle; as Gödel put it, “By a kind of miracle it is not necessary to distinguish orders, and the diagonal procedure does not lead outside the defined notion.” [1]
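The diagonal argument above can be sketched concretely. This is only an illustration: `candidate` is a hypothetical stand-in for a proposed universal total function \(f\), and any other total function of two arguments could be substituted. The function `diagonalize` builds \(g\left(n\right)=f\left(n,n\right)+1\), and the check confirms that \(g\) differs from row \(i\) of \(f\) at input \(i\).

```python
from typing import Callable


def diagonalize(f: Callable[[int, int], int]) -> Callable[[int], int]:
    """Return g(n) = f(n, n) + 1, which differs from row f(i, .) at n = i."""
    return lambda n: f(n, n) + 1


# Hypothetical candidate for a "universal" total function (an assumption
# made for illustration; the argument works for any total f).
def candidate(i: int, n: int) -> int:
    return i * n + i


g = diagonalize(candidate)

# For every index i, g(i) != candidate(i, i), so no single index i can
# satisfy candidate(i, n) == g(n) for all n.
disagreements = [g(i) != candidate(i, i) for i in range(100)]
print(all(disagreements))
```

Since `g` is computable whenever `candidate` is, the same construction defeats every total computable candidate.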
The miracle of Turing machines is that there is a partial computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\mathbb{N}\cup\left\{ \bot\right\}\) such that for all partial computable \(g\colon\mathbb{N}\to\mathbb{N}\cup\left\{ \bot\right\}\) there is an index \(i\) such that \(f\left(i,n\right)=g\left(n\right)\) for all \(n\). Here, we look at a different “miracle”, that of reflective oracles [2,3]. As we will see in Theorem 1, given a reflective oracle \(O\), there is a (stochastic) \(O\)-computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\left\{ 0,1\right\}\) such that for any (stochastic) \(O\)-computable function \(g\colon\mathbb{N}\to\left\{ 0,1\right\}\), there is some index \(i\) such that \(f\left(i,n\right)\) and \(g\left(n\right)\) have the same distribution for all \(n\). This existence theorem seems to skirt even closer to the contradiction mentioned above.
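One way to see why the stochastic statement need not be contradictory is to rerun the diagonal construction on distributions instead of values. The sketch below is an illustration under stated assumptions, not the proof of Theorem 1: it takes the diagonal function to be the bit flip \(g\left(n\right)=1-f\left(n,n\right)\), and writes \(p\) for the probability that \(f\left(i,i\right)\) outputs 1, where \(i\) is an index for \(g\). Matching distributions then requires \(p=1-p\), which has a solution.

```python
from fractions import Fraction


def flip(p: Fraction) -> Fraction:
    """Probability that 1 - X equals 1 when X is Bernoulli(p)."""
    return 1 - p


# Candidate values for p = P[f(i, i) = 1] on a coarse grid (illustrative only).
candidates = [Fraction(k, 10) for k in range(11)]

# The diagonal g(n) = 1 - f(n, n) has the same distribution as f(i, .) at
# n = i exactly when flip(p) == p.
consistent = [p for p in candidates if flip(p) == p]
print(consistent)
```

Only \(p=\frac{1}{2}\) survives the check, so a function that randomizes evenly on the diagonal is consistent with its own flipped diagonal, whereas the deterministic version \(g\left(n\right)=f\left(n,n\right)+1\) has no consistent value at all.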
We use this idea to answer “in spirit” the converse Lawvere problem posed in [4]. These methods also generalize to prove a similar analogue of the ubiquitous converse Lawvere problem from [5]. The original questions, stated in terms of topology, remain open, but I find that the model proposed here, using computability, is equally satisfying from the point of view of studying reflective agents. Those references can be consulted for more motivation on these problems from the perspective of reflective agency.
Section 3 proves the main lemma, and proves the converse Lawvere theorem for reflective oracles. In section 4, we use that to give a (circular) proof of Brouwer’s fixed point theorem, as mentioned in [4]. In section 5, we prove the ubiquitous converse Lawvere theorem for reflective oracles.
 
    8.  All Mathematicians are Trollable: Divergence of Naturalistic Logical Updates   post by Abram Demski 1168 days ago  Jessica Taylor, Patrick LaVictoire, Scott Garrabrant and Vladimir Slepnev like this  1 comment  
 The post on naturalistic logical updates left open the question of whether the probability distribution converges as we condition on more logical information. Here, I show that this cannot always be the case: for any computable probability distribution with naturalistic logical updates, we can show it proofs in an order which will prevent convergence. In fact, at any time, we can drive the probability of \(x\) up or down as much as we like, for a wide variety of sentences \(x\).
As an aid to intuition, I describe the theorem informally as “all mathematicians are trollable”. I was once told that there was an “all mathematicians go to Bayesian hell” theorem, based on the fact that a computable probability distribution must suffer arbitrarily large log-loss when trying to model mathematics. The idea here is similar. We are representing the belief state of a mathematician with a computable probability distribution, and trying to manipulate that belief state by proving carefully-selected theorems to the mathematician.
 
   10.  Welcome!   post by Benja Fallenstein 1715 days ago  Abram Demski, Luke Muehlhauser, Nate Soares, Patrick LaVictoire and Vladimir Slepnev like this  2 comments  
 Welcome to MIRI’s as-yet-unnamed new forum for technical Friendly AI research, whether or not it’s associated with MIRI! We want to provide a place for posting and discussing work on topics like the Löbian obstacle, updateless decision theory, and corrigibility. The LessWrong group blog has hosted some discussions on topics like these in the past, but Friendly AI research has never been entirely on topic there. By creating a forum focused entirely on research, we hope to make it easier to find, and have, interesting discussions of interesting new work.
The forum is world-readable, but posting and commenting will be invite-only. We have some ideas for a review process for submissions by non-members, but contributions from non-members will probably not be accepted until early 2015 at the earliest.
 
 

