1. Logical Inductor Tiling and Why it's Hard
post by Alex Appel 15 days ago | Sam Eisenstat and Abram Demski like this | discuss

(Tiling result due to Sam, exposition of obstacles due to me)

 2. A Loophole for Self-Applicative Soundness discussion post by Alex Appel 17 days ago | Abram Demski likes this | 4 comments
 3. Doubts about Updatelessness discussion post by Alex Appel 49 days ago | Abram Demski likes this | 2 comments
 4. Resource-Limited Reflective Oracles discussion post by Alex Appel 71 days ago | Sam Eisenstat, Abram Demski and Jessica Taylor like this | discuss
 5. No Constant Distribution Can be a Logical Inductor discussion post by Alex Appel 74 days ago | Sam Eisenstat, Abram Demski, Jessica Taylor and Stuart Armstrong like this | 1 comment
6. Quantilal control for finite MDPs
post by Vadim Kosoy 81 days ago | Ryan Carey, Alex Appel and Abram Demski like this | discuss

We introduce a variant of the concept of a “quantilizer” for the setting of choosing a policy for a finite Markov decision process (MDP), where the generic unknown cost is replaced by an unknown penalty term in the reward function. This is essentially a generalization of quantilization in repeated games with a cost independence assumption. We show that the “quantilal” policy shares some properties with the ordinary optimal policy, namely that (i) it can always be chosen to be Markov, (ii) it can be chosen to be stationary when the time discount is geometric, and (iii) the “quantilum” value of an MDP with geometric time discount is a continuous piecewise rational function of the parameters, and it converges when the discount parameter $$\lambda$$ approaches 1. Finally, we demonstrate a polynomial-time algorithm for computing the quantilal policy, showing that quantilization is not qualitatively harder than ordinary optimization.
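For readers unfamiliar with the base concept, here is a minimal sketch of an ordinary quantilizer over a finite action set. It is not the quantilal MDP policy or the polynomial-time algorithm described above. It assumes a known base distribution and utility; the name `quantilize` and the parameter `q` (the fraction of base-distribution mass retained) are illustrative choices. The q-quantilizer keeps the top q of the base distribution's probability mass, ranked by utility, and renormalizes it.

```python
from fractions import Fraction


def quantilize(base_probs, utility, q):
    """Return the q-quantilizer's action distribution (illustrative sketch).

    base_probs: dict mapping action -> probability under the base distribution.
    utility:    dict mapping action -> utility.
    q:          fraction of base-distribution mass to keep, 0 < q <= 1.
    """
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    remaining = q
    kept = {}
    # Walk through actions from highest to lowest utility, taking base
    # probability mass until a total of q has been taken.
    for action in sorted(base_probs, key=lambda a: utility[a], reverse=True):
        if remaining <= 0:
            break
        take = min(base_probs[action], remaining)
        if take > 0:
            kept[action] = take
            remaining -= take
    # Renormalize the kept mass so it sums to 1.
    return {action: mass / q for action, mass in kept.items()}


base = {a: Fraction(1, 4) for a in "abcd"}
util = {"a": 1, "b": 3, "c": 2, "d": 0}
print(quantilize(base, util, Fraction(1, 2)))  # b and c, each with probability 1/2
print(quantilize(base, util, Fraction(3, 8)))  # b: 2/3, c: 1/3
```

With q = 1 the quantilizer simply reproduces the base distribution; as q shrinks it behaves more like an optimizer.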

7. Distributed Cooperation
post by Alex Appel 95 days ago | Abram Demski and Scott Garrabrant like this | 2 comments

Reflective oracles can be approximated by computing Nash equilibria. But is there some procedure that produces a Pareto-optimal equilibrium in a game, that is, a point produced by a Cooperative oracle? It turns out there is. There are some interesting philosophical aspects to it, which will be typed up in the next post.

The result is not original to me, it’s been floating around MIRI for a while. I think Scott, Sam, and Abram worked on it, but there might have been others. All I did was formalize it a bit, and generalize from the 2-player 2-move case to the n-player n-move case. With the formalism here, it’s a bit hard to intuitively understand what’s going on, so I’ll indicate where to visualize an appropriate 3-dimensional object.
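As a toy illustration of the gap between an arbitrary equilibrium and a Pareto-optimal one, the sketch below brute-forces the pure-strategy Nash equilibria of a small two-player game and keeps those not Pareto-dominated by another equilibrium. This is not the distributed procedure from the post. The stag hunt payoffs and the helper names are hypothetical.

```python
from itertools import product


def pure_nash_equilibria(payoffs, n0, n1):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoffs maps a move profile (m0, m1) to a payoff pair (u0, u1);
    player 0 has n0 moves and player 1 has n1 moves.
    """
    equilibria = []
    for m0, m1 in product(range(n0), range(n1)):
        u0, u1 = payoffs[(m0, m1)]
        # Neither player can gain by unilaterally switching moves.
        no_gain0 = all(payoffs[(d, m1)][0] <= u0 for d in range(n0))
        no_gain1 = all(payoffs[(m0, d)][1] <= u1 for d in range(n1))
        if no_gain0 and no_gain1:
            equilibria.append((m0, m1))
    return equilibria


def pareto_optimal(profiles, payoffs):
    """Keep the profiles not Pareto-dominated by another profile in the list."""
    def dominates(p, q):
        up, uq = payoffs[p], payoffs[q]
        return (all(x >= y for x, y in zip(up, uq))
                and any(x > y for x, y in zip(up, uq)))
    return [p for p in profiles if not any(dominates(q, p) for q in profiles)]


STAG, HARE = 0, 1
stag_hunt = {
    (STAG, STAG): (4, 4),
    (STAG, HARE): (0, 3),
    (HARE, STAG): (3, 0),
    (HARE, HARE): (3, 3),
}
eqs = pure_nash_equilibria(stag_hunt, 2, 2)
print(eqs)                             # [(0, 0), (1, 1)]
print(pareto_optimal(eqs, stag_hunt))  # [(0, 0)]
```

Both profiles are equilibria, but only mutual stag hunting is Pareto-optimal among them; a plain Nash-equilibrium solver might return either one.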

 8. Passing Troll Bridge discussion post by Alex Appel 117 days ago | Abram Demski likes this | discuss
 9. Strategy Nonconvexity Induced by a Choice of Potential Oracles discussion post by Alex Appel 145 days ago | Abram Demski likes this | discuss
10. Logical counterfactuals and differential privacy
post by Nisan Stiennon 150 days ago | Abram Demski and Scott Garrabrant like this | 1 comment

This idea was informed by discussions with Abram Demski, Scott Garrabrant, and the MIRIchi discussion group.

11. Reflective oracles as a solution to the converse Lawvere problem
post by Sam Eisenstat 216 days ago | Alex Mennen, Alex Appel, Vadim Kosoy, Abram Demski, Jessica Taylor, Scott Garrabrant and Vladimir Slepnev like this | discuss

1 Introduction

Before the work of Turing, one could justifiably be skeptical of the idea of a universal computable function. After all, there is no computable function $$f\colon\mathbb{N}\times\mathbb{N}\to\mathbb{N}$$ such that for all computable $$g\colon\mathbb{N}\to\mathbb{N}$$ there is some index $$i_{g}$$ such that $$f\left(i_{g},n\right)=g\left(n\right)$$ for all $$n$$. If there were, we could pick $$g\left(n\right)=f\left(n,n\right)+1$$, and then $$g\left(i_{g}\right)=f\left(i_{g},i_{g}\right)+1=g\left(i_{g}\right)+1$$, a contradiction. Of course, universal Turing machines don’t run into this obstacle; as Gödel put it, “By a kind of miracle it is not necessary to distinguish orders, and the diagonal procedure does not lead outside the defined notion.” [1]
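The diagonal argument can be checked concretely on any particular total function. In the sketch below, `f` is an arbitrary, hypothetical two-argument function. `diagonal` builds $$g(n)=f(n,n)+1$$, and the loop confirms that $$g$$ disagrees with the row $$f(i,\cdot)$$ at input $$i$$ for every index checked. A finite check only illustrates the argument, which applies to every index.

```python
def diagonal(f):
    """Given a total function f(i, n), return g with g(n) = f(n, n) + 1."""
    return lambda n: f(n, n) + 1


def f(i, n):
    # A hypothetical "enumeration" of functions: row i is n -> i * n + i.
    return i * n + i


g = diagonal(f)
# Row i of f cannot equal g, since the two differ at input n = i.
for i in range(1000):
    assert g(i) != f(i, i)
print([g(n) for n in range(5)])  # [1, 3, 7, 13, 21]
```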

The miracle of Turing machines is that there is a partial computable function $$f\colon\mathbb{N}\times\mathbb{N}\to\mathbb{N}\cup\left\{ \bot\right\}$$ such that for all partial computable $$g\colon\mathbb{N}\to\mathbb{N}\cup\left\{ \bot\right\}$$ there is an index $$i$$ such that $$f\left(i,n\right)=g\left(n\right)$$ for all $$n$$. Here, we look at a different “miracle”, that of reflective oracles [2,3]. As we will see in Theorem 1, given a reflective oracle $$O$$, there is a (stochastic) $$O$$-computable function $$f\colon\mathbb{N}\times\mathbb{N}\to\left\{ 0,1\right\}$$ such that for any (stochastic) $$O$$-computable function $$g\colon\mathbb{N}\to\left\{ 0,1\right\}$$, there is some index $$i$$ such that $$f\left(i,n\right)$$ and $$g\left(n\right)$$ have the same distribution for all $$n$$. This existence theorem seems to skirt even closer to the contradiction mentioned above.

We use this idea to answer “in spirit” the converse Lawvere problem posed in [4]. These methods also generalize to prove a similar analogue of the ubiquitous converse Lawvere problem from [5]. The original questions, stated in terms of topology, remain open, but I find that the model proposed here, using computability, is equally satisfying from the point of view of studying reflective agents. Those references can be consulted for more motivation on these problems from the perspective of reflective agency.

Section 3 proves the main lemma and the converse Lawvere theorem for reflective oracles. In Section 4, we use that theorem to give a (circular) proof of Brouwer’s fixed point theorem, as mentioned in [4]. In Section 5, we prove the ubiquitous converse Lawvere theorem for reflective oracles.

 12. Should I post technical ideas here or on LessWrong 2.0? discussion post by Stuart Armstrong 259 days ago | Abram Demski likes this | 3 comments
13. Resolving human inconsistency in a simple model
post by Stuart Armstrong 259 days ago | Abram Demski likes this | 1 comment

A putative new idea for AI control; index here.

This post will present a simple model of an inconsistent human, and ponder how to resolve their inconsistency.

Let $$\mathbf{H}$$ be our agent, in a turn-based world. Let $$R^l$$ and $$R^s$$ be two simple reward functions at each turn. The reward $$R^l$$ is thought of as being a ‘long-term’ reward, while $$R^s$$ is a short-term one.

 14. Metamathematics and probability link by Alex Mennen 272 days ago | Abram Demski likes this | discuss
15. The Doomsday argument in anthropic decision theory
post by Stuart Armstrong 293 days ago | Abram Demski likes this | discuss

In Anthropic Decision Theory (ADT), behaviours that resemble the Self Sampling Assumption (SSA) derive from average utilitarian preferences (and from certain specific selfish preferences).

However, SSA implies the doomsday argument, and until now, I hadn’t found a good way to express the doomsday argument within ADT.

This post will remedy that hole, by showing how there is a natural doomsday-like behaviour for average utilitarian agents within ADT.
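For comparison, here is the standard doomsday update under SSA, the behaviour the post aims to recover within ADT; this sketch is not the ADT derivation itself. It assumes two hypothetical hypotheses about the total number of humans who will ever live, a uniform prior over them, and SSA's likelihood of $$1/N$$ for each birth rank $$r\le N$$.

```python
from fractions import Fraction


def ssa_posterior(priors, birth_rank):
    """Posterior over total population sizes N given one's birth rank, under SSA.

    priors maps each hypothesised total population N to its prior probability.
    SSA treats oneself as a uniform random sample from the N people, so the
    likelihood of a birth rank r is 1/N if r <= N, and 0 otherwise.
    """
    unnormalized = {
        n: p * (Fraction(1, n) if birth_rank <= n else 0)
        for n, p in priors.items()
    }
    total = sum(unnormalized.values())
    return {n: w / total for n, w in unnormalized.items()}


# Hypothetical numbers: "doom soon" (10 people in total) vs "doom late" (1000).
priors = {10: Fraction(1, 2), 1000: Fraction(1, 2)}
posterior = ssa_posterior(priors, birth_rank=5)
print(posterior[10])  # 100/101
```

An early birth rank shifts probability sharply toward the smaller total population, which is the doomsday effect.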

16. Density Zero Exploration
post by Alex Mennen 308 days ago | Abram Demski, Paul Christiano and Scott Garrabrant like this | discuss

The idea here is due to Scott Garrabrant. All I did was write it up.

17. Logical Induction with incomputable sequences
post by Alex Mennen 308 days ago | Abram Demski, Paul Christiano and Scott Garrabrant like this | discuss

In the definition of a logical inductor, the deductive process is required to be computable. This, of course, does not allow the logical inductor to use randomness or to predict uncomputable sequences. Given how traders were defined in the logical induction paper, this restriction was necessary, because the traders were not given access to the output of the deductive process.

18. Conditioning on Conditionals
post by Scott Garrabrant 308 days ago | Abram Demski likes this | discuss

(From conversations with Sam, Abram, Tsvi, Marcello, and Ashwin Sah) A basic EDT agent starts with a prior, updates on a bunch of observations, and then has a choice between various actions. It conditions on each possible action it could take, and takes the action for which this conditional leads to the highest expected utility. An updateless (but non-policy selection) EDT agent has a problem here. It wants to not update on the observations, but it wants to condition on the fact that it takes a specific action given its observations. It is not obvious what this conditional should look like. In this post, I argue for a particular way to interpret conditioning on this conditional (of taking a specific action given a specific observation).
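The basic (updateful) EDT agent described in the first two sentences can be sketched as follows, assuming its posterior after observations is given as a finite joint distribution over (action, outcome) pairs. The Newcomb-style numbers and the names `conditional_eu` and `edt_choice` are hypothetical, and the updateless variant the post is actually about is not modeled here.

```python
from fractions import Fraction


def conditional_eu(joint, utility, action):
    """E[U | A = action] under a joint distribution over (action, outcome)."""
    mass = sum(p for (a, _), p in joint.items() if a == action)
    value = sum(p * utility[o] for (a, o), p in joint.items() if a == action)
    return value / mass


def edt_choice(joint, utility):
    """Condition on each available action and pick the best conditional EU."""
    actions = sorted({a for a, _ in joint})
    return max(actions, key=lambda a: conditional_eu(joint, utility, a))


# Hypothetical Newcomb-like posterior: one-boxing correlates with the million.
joint = {
    ("one_box", "million"): Fraction(45, 100),
    ("one_box", "nothing"): Fraction(5, 100),
    ("two_box", "thousand"): Fraction(45, 100),
    ("two_box", "million_and_thousand"): Fraction(5, 100),
}
utility = {"million": 1_000_000, "nothing": 0, "thousand": 1_000,
           "million_and_thousand": 1_001_000}
print(edt_choice(joint, utility))  # one_box
```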

19. The Three Levels of Goodhart's Curse
post by Scott Garrabrant 308 days ago | Vadim Kosoy, Abram Demski and Paul Christiano like this | 2 comments

Note: I now consider this post deprecated and instead recommend this updated version.

Goodhart’s curse is a neologism by Eliezer Yudkowsky stating that “neutrally optimizing a proxy measure U of V seeks out upward divergence of U from V.” It is related to many nearby concepts (e.g. the tails come apart, winner’s curse, optimizer’s curse, regression to the mean, overfitting, edge instantiation, Goodhart’s law). I claim that there are three main mechanisms through which Goodhart’s curse operates.
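The quoted phrase “seeks out upward divergence of U from V” can be seen in a small simulation. The sketch assumes, hypothetically, that each option's true value V is standard normal and that the proxy is U = V + independent Gaussian noise. It picks the option with the highest U and reports the average of U - V for the chosen option. It illustrates only this one selection effect, not the three mechanisms the post distinguishes.

```python
import random


def selected_gap(n_options, noise_sd, trials, seed=0):
    """Average U - V for the option that maximizes the proxy U = V + noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        values = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        proxies = [v + rng.gauss(0.0, noise_sd) for v in values]
        best = max(range(n_options), key=lambda i: proxies[i])
        total += proxies[best] - values[best]
    return total / trials


# With one option there is no selection, so the gap averages out near 0;
# choosing the best of many options makes the proxy overestimate the value.
print(selected_gap(1, 1.0, 2000))
print(selected_gap(100, 1.0, 2000))
```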

20. A cheating approach to the tiling agents problem
post by Vladimir Slepnev 355 days ago | Alex Mennen, Vadim Kosoy and Abram Demski like this | 2 comments

(This post resulted from a conversation with Wei Dai.)

Formalizing the tiling agents problem is very delicate. In this post I’ll show a toy problem and a solution to it, which arguably meets all the desiderata stated before, but only by cheating in a new and unusual way.

Here’s a summary of the toy problem: we ask an agent to solve a difficult math question and also design a successor agent. Then the successor must solve another math question and design its own successor, and so on. The questions get harder each time, so they can’t all be solved in advance, and each of them requires believing in Peano arithmetic (PA). This goes on for a fixed number of rounds, and the final reward is the number of correct answers.

Moreover, we will demand that the agent handle both subtasks (solving the math question and designing the successor) using the same logic. Finally, we will demand that the agent be able to reproduce itself on each round, not just design a custom-made successor that solves the math question with PA and reproduces itself by quining.

21. Loebian cooperation in the tiling agents problem
post by Vladimir Slepnev 361 days ago | Alex Mennen, Vadim Kosoy, Abram Demski, Patrick LaVictoire and Stuart Armstrong like this | 4 comments

The tiling agents problem is about formalizing how AIs can create successor AIs that are at least as smart. Here’s a toy model I came up with, which is similar to Benya’s old model but simpler. A computer program X is asked one of two questions:

• Would you like some chocolate?

• Here’s the source code of another program Y. Do you accept it as your successor?

 22. Futarchy, Xrisks, and near misses discussion post by Stuart Armstrong 384 days ago | Abram Demski likes this | discuss
23. Why I am not currently working on the AAMLS agenda
post by Jessica Taylor 404 days ago | Ryan Carey, Marcello Herreshoff, Sam Eisenstat, Abram Demski, Daniel Dewey, Scott Garrabrant and Stuart Armstrong like this | 2 comments

(note: this is not an official MIRI statement, this is a personal statement. I am not speaking for others who have been involved with the agenda.)

The AAMLS (Alignment for Advanced Machine Learning Systems) agenda is a project at MIRI that is about determining how to use hypothetical highly advanced machine learning systems safely. I was previously working on problems in this agenda and am currently not.

24. Cooperative Oracles: Introduction
post by Scott Garrabrant 404 days ago | Abram Demski, Jessica Taylor and Patrick LaVictoire like this | 1 comment

This is the first in a series of posts introducing a new tool called a Cooperative Oracle. All of these posts are joint work with Sam Eisenstat, Tsvi Benson-Tilsen, and Nisan Stiennon.

Here is my plan for posts in this sequence. I will update this as I go.

1. Introduction
2. Nonexploited Bargaining
3. Stratified Pareto Optima and Almost Stratified Pareto Optima
4. Definition and Existence Proof
5. Alternate Notions of Dependency
post by Tom Everitt 410 days ago | Abram Demski and Stuart Armstrong like this | 3 comments

Cooperative inverse reinforcement learning (CIRL) generated a lot of attention last year, as it seemed to do a good job aligning an agent’s incentives with its human supervisor’s. Notably, it led to an elegant solution to the shutdown problem.
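The shutdown result mentioned here is presumably the CIRL-style off-switch game. Below is a minimal sketch of its core inequality under simplifying assumptions: a perfectly rational human who switches the agent off exactly when the action's utility U is negative, and a hypothetical discrete belief about U. Deferring to the human is worth E[max(U, 0)], which is never less than acting directly (E[U]) or switching off (0).

```python
from fractions import Fraction


def off_switch_values(belief):
    """Values of the agent's three options, given a belief over utility U.

    belief maps possible values of U to probabilities. Assumes a rational human
    who allows the action iff U >= 0 and otherwise switches the agent off.
    """
    act = sum(p * u for u, p in belief.items())            # E[U]
    switch_off = Fraction(0)                               # utility 0
    defer = sum(p * max(u, 0) for u, p in belief.items())  # E[max(U, 0)]
    return {"act": act, "switch_off": switch_off, "defer": defer}


belief = {-2: Fraction(1, 3), 1: Fraction(1, 3), 3: Fraction(1, 3)}
values = off_switch_values(belief)
print(values)  # act = 2/3, switch_off = 0, defer = 4/3
```

The advantage of deferring vanishes when the agent is certain about U, so in this simple model the incentive to accept shutdown comes from the agent's uncertainty.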


NEW DISCUSSION POSTS

I found an improved version
 by Alex Appel on A Loophole for Self-Applicative Soundness | 0 likes

I misunderstood your
 by Sam Eisenstat on A Loophole for Self-Applicative Soundness | 0 likes

Caught a flaw with this
 by Alex Appel on A Loophole for Self-Applicative Soundness | 0 likes

As you say, this isn't a
 by Sam Eisenstat on A Loophole for Self-Applicative Soundness | 1 like

Note: I currently think that
 by Jessica Taylor on Predicting HCH using expert advice | 0 likes

Counterfactual mugging
 by Jessica Taylor on Doubts about Updatelessness | 0 likes

What do you mean by "in full
 by David Krueger on Doubts about Updatelessness | 0 likes

It seems relatively plausible
 by Paul Christiano on Maximally efficient agents will probably have an a... | 1 like

I think that in that case,
 by Alex Appel on Smoking Lesion Steelman | 1 like

 by Sam Eisenstat on No Constant Distribution Can be a Logical Inductor | 1 like

A: While that is a really
 by Alex Appel on Musings on Exploration | 0 likes

> The true reason to do
 by Jessica Taylor on Musings on Exploration | 0 likes