  2.  The LearningTheoretic AI Alignment Research Agenda   post by Vadim Kosoy 84 days ago  Alex Appel and Jessica Taylor like this  36 comments  
 In this essay I will try to explain the overall structure and motivation of my AI alignment research agenda. The discussion is informal and no new theorems are proved here. The main features of my research agenda, as I explain them here, are
Viewing AI alignment theory as part of a general abstract theory of intelligence
Using desiderata and axiomatic definitions as starting points, rather than specific algorithms and constructions
Formulating alignment problems in the language of learning theory
Evaluating solutions by their formal mathematical properties, ultimately aiming at a quantitative theory of risk assessment
Relying on the mathematical intuition derived from learning theory to pave the way to solving philosophical questions
 
  3.  Logical Inductors Converge to Correlated Equilibria (Kinda)   post by Alex Appel 120 days ago  Sam Eisenstat and Jessica Taylor like this  1 comment  
 Logical inductors of “similar strength”, playing against each other in a repeated game, will converge to correlated equilibria of the oneshot game, for the same reason that players that react to the past plays of their opponent converge to correlated equilibria. In fact, this proof is essentially just the proof from Calibrated Learning and Correlated Equilibrium by Forster (1997), adapted to a logical inductor setting.
 
    6.  An Untrollable Mathematician   post by Abram Demski 242 days ago  Alex Appel, Sam Eisenstat, Vadim Kosoy, Jack Gallagher, Jessica Taylor, Paul Christiano, Scott Garrabrant and Vladimir Slepnev like this  1 comment  
 Followup to All Mathematicians are Trollable.
It is relatively easy to see that no computable Bayesian prior on logic can converge to a single coherent probability distribution as we update it on logical statements. Furthermore, the nonconvergence behavior is about as bad as could be: someone selecting the ordering of provable statements to update on can drive the Bayesian’s beliefs arbitrarily up or down, arbitrarily many times, despite only saying true things. I called this wild nonconvergence behavior “trollability”. Previously, I showed that if the Bayesian updates on the provabilily of a sentence rather than updating on the sentence itself, it is still trollable. I left open the question of whether some other side information could save us. Sam Eisenstat has closed this question, providing a simple logical prior and a way of doing a Bayesian update on it which (1) cannot be trolled, and (2) converges to a coherent distribution.
 
   8.  Reflective oracles as a solution to the converse Lawvere problem   post by Sam Eisenstat 311 days ago  Alex Mennen, Alex Appel, Vadim Kosoy, Abram Demski, Jessica Taylor, Scott Garrabrant and Vladimir Slepnev like this  discuss  
 1 Introduction
Before the work of Turing, one could justifiably be skeptical of the idea of a universal computable function. After all, there is no computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\mathbb{N}\) such that for all computable \(g\colon\mathbb{N}\to\mathbb{N}\) there is some index \(i_{g}\) such that \(f\left(i_{g},n\right)=g\left(n\right)\) for all \(n\). If there were, we could pick \(g\left(n\right)=f\left(n,n\right)+1\), and then \[g\left(i_{g}\right)=f\left(i_{g},i_{g}\right)+1=g\left(i_{g}\right)+1,\] a contradiction. Of course, universal Turing machines don’t run into this obstacle; as Gödel put it, “By a kind of miracle it is not necessary to distinguish orders, and the diagonal procedure does not lead outside the defined notion.” [1]
The miracle of Turing machines is that there is a partial computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\mathbb{N}\cup\left\{ \bot\right\}\) such that for all partial computable \(g\colon\mathbb{N}\to\mathbb{N}\cup\left\{ \bot\right\}\) there is an index \(i\) such that \(f\left(i,n\right)=g\left(n\right)\) for all \(n\). Here, we look at a different “miracle”, that of reflective oracles [2,3]. As we will see in Theorem 1, given a reflective oracle \(O\), there is a (stochastic) \(O\)computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\left\{ 0,1\right\}\) such that for any (stochastic) \(O\)computable function \(g\colon\mathbb{N}\to\left\{ 0,1\right\}\), there is some index \(i\) such that \(f\left(i,n\right)\) and \(g\left(n\right)\) have the same distribution for all \(n\). This existence theorem seems to skirt even closer to the contradiction mentioned above.
We use this idea to answer “in spirit” the converse Lawvere problem posed in [4]. These methods also generalize to prove a similar analogue of the ubiquitous converse Lawvere problem from [5]. The original questions, stated in terms of topology, remain open, but I find that the model proposed here, using computability, is equally satisfying from the point of view of studying reflective agents. Those references can be consulted for more motivation on these problems from the perspective of reflective agency.
Section 3 proves the main lemma, and proves the converse Lawvere theorem for reflective oracles. In section 4, we use that to give a (circular) proof of Brouwer’s fixed point theorem, as mentioned in [4]. In section 5, we prove the ubiquitous converse Lawvere theorem for reflective oracles.
 
       14.  The Ubiquitous Converse Lawvere Problem   post by Scott Garrabrant 530 days ago  Marcello Herreshoff, Sam Eisenstat, Jessica Taylor and Patrick LaVictoire like this  discuss  
 In this post, I give a stronger version of the open question presented here, and give a motivation for this stronger property. This came out of conversations with Marcello, Sam, and Tsvi.
Definition: A continuous function \(f:X\rightarrow Y\) is called ubiquitous if for every continuous function \(g:X\rightarrow Y\), there exists a point \(x\in X\) such that \(f(x)=g(x)\).
Open Problem: Does there exist a topological space \(X\) with a ubiquitous function \(f:X\rightarrow[0,1]^X\)?
 
            
Older 
 NEW POSTSNEW DISCUSSION POSTSThere should be a chat icon
Apparently "You must be
There is a replacement for
Regarding the physical
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
I think that we should expect
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
I think I understand your
by Jessica Taylor on The LearningTheoretic AI Alignment Research Agend...  0 likes 
This seems like a hack. The
by Jessica Taylor on The LearningTheoretic AI Alignment Research Agend...  0 likes 
After thinking some more,
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
Yes, I think that we're
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
My intuition is that it must
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
To first approximation, a
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
Actually, I *am* including
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
Yeah, when I went back and
by Alex Appel on Optimal and Causal Counterfactual Worlds  0 likes 
> Well, we could give up on
by Jessica Taylor on The LearningTheoretic AI Alignment Research Agend...  0 likes 
> For another thing, consider
by Jessica Taylor on The LearningTheoretic AI Alignment Research Agend...  0 likes 
