 1.  Distributed Cooperation   post by Alex Appel 6 days ago  Abram Demski and Scott Garrabrant like this  2 comments  
 Reflective oracles can be approximated by computing Nash equilibria. But is there some procedure that produces a Pareto-optimal equilibrium in a game, i.e., a point produced by a Cooperative oracle? It turns out there is. There are some interesting philosophical aspects to it, which will be typed up in the next post.
The result is not original to me; it’s been floating around MIRI for a while. I think Scott, Sam, and Abram worked on it, but there might have been others. All I did was formalize it a bit, and generalize from the 2-player 2-move case to the n-player n-move case. With the formalism here, it’s a bit hard to intuitively understand what’s going on, so I’ll indicate where to visualize an appropriate 3-dimensional object.
 
     5.  Reflective oracles as a solution to the converse Lawvere problem   post by Sam Eisenstat 128 days ago  Alex Mennen, Alex Appel, Vadim Kosoy, Abram Demski, Jessica Taylor, Scott Garrabrant and Vladimir Slepnev like this  discuss  
 1 Introduction
Before the work of Turing, one could justifiably be skeptical of the idea of a universal computable function. After all, there is no computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\mathbb{N}\) such that for all computable \(g\colon\mathbb{N}\to\mathbb{N}\) there is some index \(i_{g}\) such that \(f\left(i_{g},n\right)=g\left(n\right)\) for all \(n\). If there were, we could pick \(g\left(n\right)=f\left(n,n\right)+1\), and then \[g\left(i_{g}\right)=f\left(i_{g},i_{g}\right)+1=g\left(i_{g}\right)+1,\] a contradiction. Of course, universal Turing machines don’t run into this obstacle; as Gödel put it, “By a kind of miracle it is not necessary to distinguish orders, and the diagonal procedure does not lead outside the defined notion.” [1]
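The diagonal argument above can be made concrete for any total function presented as a table. A minimal sketch (the enumeration `rows` is a hypothetical illustration, not from the post): for any total \(f\), the function \(g(n)=f(n,n)+1\) disagrees with the row \(f(i,\cdot)\) at input \(i\), so no row of \(f\) can equal \(g\).

```python
# For ANY total f, the diagonal function g(n) = f(n, n) + 1 differs from
# row f(i, .) at input i -- so no total computable f can enumerate all
# total computable functions.
def make_diagonal(f):
    return lambda n: f(n, n) + 1

# A hypothetical finite enumeration of a few total functions.
rows = [lambda n: n, lambda n: 2 * n, lambda n: n * n]
f = lambda i, n: rows[i % len(rows)](n)

g = make_diagonal(f)
# By construction, g disagrees with row i exactly at input i.
```

Partial functions escape this because `g(i)` may fail to halt exactly where `f(i, i)` does, so the disagreement never materializes.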
The miracle of Turing machines is that there is a partial computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\mathbb{N}\cup\left\{ \bot\right\}\) such that for all partial computable \(g\colon\mathbb{N}\to\mathbb{N}\cup\left\{ \bot\right\}\) there is an index \(i\) such that \(f\left(i,n\right)=g\left(n\right)\) for all \(n\). Here, we look at a different “miracle”, that of reflective oracles [2,3]. As we will see in Theorem 1, given a reflective oracle \(O\), there is a (stochastic) \(O\)-computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\left\{ 0,1\right\}\) such that for any (stochastic) \(O\)-computable function \(g\colon\mathbb{N}\to\left\{ 0,1\right\}\), there is some index \(i\) such that \(f\left(i,n\right)\) and \(g\left(n\right)\) have the same distribution for all \(n\). This existence theorem seems to skirt even closer to the contradiction mentioned above.
We use this idea to answer “in spirit” the converse Lawvere problem posed in [4]. These methods also generalize to prove a similar analogue of the ubiquitous converse Lawvere problem from [5]. The original questions, stated in terms of topology, remain open, but I find that the model proposed here, using computability, is equally satisfying from the point of view of studying reflective agents. Those references can be consulted for more motivation on these problems from the perspective of reflective agency.
Section 3 proves the main lemma, and proves the converse Lawvere theorem for reflective oracles. In section 4, we use that to give a (circular) proof of Brouwer’s fixed point theorem, as mentioned in [4]. In section 5, we prove the ubiquitous converse Lawvere theorem for reflective oracles.
 
   7.  Resolving human inconsistency in a simple model   post by Stuart Armstrong 170 days ago  Abram Demski likes this  1 comment  
 A putative new idea for AI control; index here.
This post will present a simple model of an inconsistent human, and ponder how to resolve their inconsistency.
Let \(\bf{H}\) be our agent, in a turn-based world. Let \(R^l\) and \(R^s\) be two simple reward functions at each turn. The reward \(R^l\) is thought of as a ‘long-term’ reward, while \(R^s\) is a short-term one.
 
      12.  Conditioning on Conditionals   post by Scott Garrabrant 219 days ago  Abram Demski likes this  discuss  
 (From conversations with Sam, Abram, Tsvi, Marcello, and Ashwin Sah) A basic EDT agent starts with a prior, updates on a bunch of observations, and then has a choice between various actions. It conditions on each possible action it could take, and takes the action for which this conditional leads to the highest expected utility. An updateless (but non-policy-selection) EDT agent has a problem here. It wants to not update on the observations, but it wants to condition on the fact that it takes a specific action given its observations. It is not obvious what this conditional should look like. In this post, I argue for a particular way to interpret this conditioning on this conditional (of taking a specific action given a specific observation).  
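The basic EDT procedure described above can be sketched in a few lines. This is a hedged illustration with a hypothetical Newcomb-style joint prior and payoffs, not code from the post:

```python
# Basic EDT: condition the prior on each candidate action, then take the
# action with the highest conditional expected utility.
# Hypothetical joint prior over (action, prediction) pairs.
prior = {
    ("one-box", "predicted-one-box"): 0.49,
    ("one-box", "predicted-two-box"): 0.01,
    ("two-box", "predicted-one-box"): 0.01,
    ("two-box", "predicted-two-box"): 0.49,
}
utility = {
    ("one-box", "predicted-one-box"): 1_000_000,
    ("one-box", "predicted-two-box"): 0,
    ("two-box", "predicted-one-box"): 1_001_000,
    ("two-box", "predicted-two-box"): 1_000,
}

def conditional_eu(action):
    # E[utility | action], computed by renormalizing the prior.
    p_action = sum(p for (a, _), p in prior.items() if a == action)
    return sum(p / p_action * utility[(a, o)]
               for (a, o), p in prior.items() if a == action)

best = max({a for a, _ in prior}, key=conditional_eu)
```

With these (made-up) numbers, the correlation between action and prediction makes one-boxing come out ahead; the post's puzzle is what the analogous conditional should be when the agent refuses to update on its observations first.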
   14.  A cheating approach to the tiling agents problem   post by Vladimir Slepnev 267 days ago  Alex Mennen, Vadim Kosoy and Abram Demski like this  2 comments  
 (This post resulted from a conversation with Wei Dai.)
Formalizing the tiling agents problem is very delicate. In this post I’ll show a toy problem and a solution to it, which arguably meets all the desiderata stated before, but only by cheating in a new and unusual way.
Here’s a summary of the toy problem: we ask an agent to solve a difficult math question and also design a successor agent. Then the successor must solve another math question and design its own successor, and so on. The questions get harder each time, so they can’t all be solved in advance, and each of them requires believing in Peano arithmetic (PA). This goes on for a fixed number of rounds, and the final reward is the number of correct answers.
Moreover, we will demand that the agent must handle both subtasks (solving the math question and designing the successor) using the same logic. Finally, we will demand that the agent be able to reproduce itself on each round, not just design a custom-made successor that solves the math question with PA and reproduces itself by quining.
 
      19.  CIRL Wireheading   post by Tom Everitt 321 days ago  Abram Demski and Stuart Armstrong like this  3 comments  
 Cooperative inverse reinforcement learning (CIRL) generated a lot of attention last year, as it seemed to do a good job aligning an agent’s incentives with its human supervisor’s. Notably, it led to an elegant solution to the shutdown problem.  
  20.  Infinite ethics comparisons   post by Stuart Armstrong 323 days ago  Abram Demski likes this  1 comment  
 Work done with Amanda Askell; the errors are mine.
 It’s very difficult to compare utilities across worlds with infinite populations. For instance, it seems clear that world \(w_1\) is better than \(w_2\), if the numbers indicate the utilities of various agents:
 \(w_1 = 1,0,1,0,1,0,1,0,1,0, \ldots\)
 \(w_2 = 1,0,1,0,0,1,0,0,0,1, \ldots\)
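One way to see the intuition numerically is to compare running averages of the two utility streams. This is only an illustration of one possible comparison, not necessarily the post's method, and it assumes the 1s in \(w_2\) continue to thin out at triangular positions \(1, 3, 6, 10, \ldots\), matching the pattern shown:

```python
# Running-average comparison of the two worlds' utility streams.
def w1(n):                     # n is the 1-indexed position in the stream
    return 1 if n % 2 == 1 else 0

# Assumed pattern for w2: a 1 at each triangular position 1, 3, 6, 10, ...
triangulars = set()
t, k = 0, 1
while t < 10_000:
    t += k
    k += 1
    triangulars.add(t)

def w2(n):
    return 1 if n in triangulars else 0

def running_avg(w, N):
    # Average utility over the first N agents.
    return sum(w(n) for n in range(1, N + 1)) / N
```

The average of \(w_1\) stays at \(1/2\), while the average of \(w_2\) shrinks toward \(0\); yet both worlds contain infinitely many agents at utility 1, which is exactly what makes such comparisons delicate.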
 
    23.  Nearest unblocked strategy versus learning patches   post by Stuart Armstrong 393 days ago  Abram Demski likes this  9 comments  
 The nearest unblocked strategy problem (NUS) is the idea that if you program a restriction or a patch into an AI, then the AI will often be motivated to pick a strategy that is as close as possible to the banned strategy, very similar in form, and maybe just as dangerous.
For instance, if the AI is maximising a reward \(R\), and does some behaviour \(B_i\) that we don’t like, we can patch the AI’s algorithm with patch \(P_i\) (‘maximise \(R\) subject to these constraints…’), or modify \(R\) to \(R_i\) so that \(B_i\) doesn’t come up. I’ll focus more on the patching example, but the modified reward one is similar.
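The failure mode is easy to see in a toy argmax. A minimal sketch with hypothetical behaviours and reward values: banning the top-scoring behaviour just shifts the argmax to its near-identical neighbour.

```python
# Toy nearest-unblocked-strategy pattern: after we ban the best-scoring
# behaviour, the maximiser simply picks the next-closest one.
behaviours = {"B0": 10.0, "B1": 9.9, "B2": 3.0}   # hypothetical R values

def best(banned):
    # Maximise R subject to the constraint that banned behaviours are off-limits.
    return max((b for b in behaviours if b not in banned),
               key=behaviours.get)

# Unpatched: picks B0.  Patched to ban B0: picks the nearly-as-good B1,
# which may be just as dangerous as the behaviour we banned.
```

Each patch removes one point from the feasible set while leaving the incentive gradient intact, which is why the post contrasts patching with having the AI learn the intent behind the patches.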
 
  24.  Thoughts on Quantilizers   post by Stuart Armstrong 420 days ago  Ryan Carey and Abram Demski like this  discuss  
 A putative new idea for AI control; index here.
This post will look at some of the properties of quantilizers, when they succeed and how they might fail.
Roughly speaking, let \(f\) be some true objective function that we want to maximise. We haven’t been able to specify it fully, so we have instead a proxy function \(g\). There is a cost function \(c=f-g\) which measures how much \(g\) falls short of \(f\). Then a quantilizer will choose actions (or policies) randomly from the top \(n\%\) of actions available, ranking those actions according to \(g\).
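The sampling rule just described can be sketched directly. This is a minimal illustration, with a made-up action set and toy proxy \(g\), not the post's implementation:

```python
import random

def quantilize(actions, g, top_fraction=0.05, rng=random):
    """Sample uniformly from the top `top_fraction` of actions ranked by
    the proxy objective g (a minimal quantilizer sketch)."""
    ranked = sorted(actions, key=g, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return rng.choice(ranked[:k])

# Hypothetical usage: 100 candidate actions scored by a toy proxy.
actions = list(range(100))
choice = quantilize(actions, g=lambda a: -abs(a - 42), top_fraction=0.1)
```

Because the quantilizer only demands that an action be in the top \(n\%\) by \(g\), rather than the argmax, it limits how hard it can exploit the places where the cost \(c=f-g\) is large.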
 
  