1. Goodhart Taxonomy
link by Scott Garrabrant 344 days ago | discuss

2. Logical Updatelessness as a Robust Delegation Problem
post by Scott Garrabrant 408 days ago | discuss

(Cross-posted on Less Wrong)

3. Hyperreal Brouwer
post by Scott Garrabrant 429 days ago | Vadim Kosoy and Stuart Armstrong like this | 2 comments

This post explains how to view Kakutani’s fixed point theorem as a special case of Brouwer’s fixed point theorem with hyperreal numbers. This post is just math intuitions, but I found them useful in thinking about Kakutani’s fixed point theorem and many things in agent foundations. This came out of conversations with Sam Eisenstat.

4. Conditioning on Conditionals
post by Scott Garrabrant 479 days ago | Abram Demski likes this | discuss

(From conversations with Sam, Abram, Tsvi, Marcello, and Ashwin Sah) A basic EDT agent starts with a prior, updates on a bunch of observations, and then has a choice between various actions. It conditions on each possible action it could take, and takes the action for which this conditional leads to the highest expected utility. An updateless (but non-policy-selection) EDT agent has a problem here. It wants not to update on the observations, but it does want to condition on the fact that it takes a specific action given its observations. It is not obvious what this conditional should look like. In this post, I argue for a particular way to interpret conditioning on this conditional (of taking a specific action given a specific observation).
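As a rough illustration of the basic (updateful) EDT step described above, here is a minimal Python sketch. It assumes the agent's post-observation beliefs are already given as a finite joint distribution over hypothetical (action, outcome) pairs, with utility defined on those pairs; the function names and the Newcomb-style numbers in the usage note are our own, and the sketch does not attempt the updateless conditional the post is actually about.

```python
from typing import Dict, Tuple

# Hypothetical toy representation: beliefs (after updating on observations) are a
# joint distribution over (action, outcome) pairs, and utility is defined on pairs.
Pair = Tuple[str, str]

def conditional_expected_utility(
    joint: Dict[Pair, float], utility: Dict[Pair, float], action: str
) -> float:
    """E[U | A = action], obtained by conditioning the joint distribution on the action."""
    p_action = sum(p for (a, _), p in joint.items() if a == action)
    if p_action == 0:
        raise ValueError(f"cannot condition on zero-probability action {action!r}")
    weighted = sum(p * utility[(a, o)] for (a, o), p in joint.items() if a == action)
    return weighted / p_action

def edt_choice(joint: Dict[Pair, float], utility: Dict[Pair, float]) -> str:
    """Take the action whose conditional expected utility is highest."""
    actions = sorted({a for a, _ in joint})
    return max(actions, key=lambda a: conditional_expected_utility(joint, utility, a))
```

For example, with joint probabilities 0.45, 0.05, 0.05, 0.45 for ("one_box", "full"), ("one_box", "empty"), ("two_box", "full"), ("two_box", "empty") and utilities 1,000,000, 0, 1,001,000, 1,000 for the same pairs, conditioning gives 900,000 for one-boxing and 101,000 for two-boxing, so this agent one-boxes.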

5. The Three Levels of Goodhart's Curse
post by Scott Garrabrant 480 days ago | Vadim Kosoy, Abram Demski and Paul Christiano like this | 2 comments

Note: I now consider this post deprecated and instead recommend this updated version.

Goodhart’s curse is a neologism by Eliezer Yudkowsky stating that “neutrally optimizing a proxy measure U of V seeks out upward divergence of U from V.” It is related to many nearby concepts (e.g. the tails come apart, winner’s curse, optimizer’s curse, regression to the mean, overfitting, edge instantiation, Goodhart’s law). I claim that there are three main mechanisms through which Goodhart’s curse operates.
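One of the nearby concepts listed above, the optimizer's curse (closely tied to regression to the mean), is easy to see in simulation. The Python sketch below is our own toy model, not one of the post's three mechanisms: each candidate has a true value V and a proxy U = V + independent Gaussian noise, and we pick the candidate with the highest U. The function name, the normal distributions, and the parameter values are assumptions made for illustration.

```python
import random
import statistics

def mean_divergence_at_optimum(
    n_candidates: int, n_trials: int, noise_sd: float = 1.0, seed: int = 0
) -> float:
    """Average of U - V at the candidate that maximizes the proxy U.

    Toy model (an assumption, not from the post): true values V ~ N(0, 1) and
    proxy U = V + N(0, noise_sd^2), drawn independently for each candidate.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        candidates = []
        for _ in range(n_candidates):
            v = rng.gauss(0.0, 1.0)
            u = v + rng.gauss(0.0, noise_sd)
            candidates.append((u, v))
        u_best, v_best = max(candidates)  # select on the proxy U, not on V
        gaps.append(u_best - v_best)
    return statistics.mean(gaps)
```

With a single candidate per trial the average gap U - V stays near zero, since nothing is being selected; with 100 candidates per trial it becomes clearly positive, because the argmax of U preferentially lands on candidates whose noise happened to be large.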

6. Cooperative Oracles: Stratified Pareto Optima and Almost Stratified Pareto Optima
post by Scott Garrabrant 554 days ago | Vadim Kosoy, Patrick LaVictoire and Stuart Armstrong like this | 8 comments

In this post, we generalize the notions in Cooperative Oracles: Nonexploited Bargaining to deal with the possibility of introducing extra agents that have no control but have preferences. We further generalize this to infinitely many agents. (Part of the series started here.)

7. Cooperative Oracles: Nonexploited Bargaining
post by Scott Garrabrant 576 days ago | Vadim Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 6 comments

In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation. (Part of the series started here.)
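As we understand it, the phenomenon from the Yudkowsky post works like this: when the other agent proposes a split that gives it more than what you consider its fair share, you accept only with a probability low enough that its expected payoff falls strictly below that fair share. Demanding more therefore never pays, while you still sometimes collect a positive amount instead of nothing. The Python sketch below encodes that rule; it is our own paraphrase, not the formalization from the Cooperative Oracles post, and the function name and the `epsilon` margin are hypothetical.

```python
def acceptance_probability(
    their_demand: float, their_fair_share: float, epsilon: float = 1e-3
) -> float:
    """Probability of accepting a split in which the other agent claims `their_demand`.

    Hypothetical rule: offers at or below what we consider their fair share are
    always accepted. Greedier offers are accepted just rarely enough that the
    other agent's expected payoff, p * their_demand, stays strictly below
    their_fair_share (the margin `epsilon` is an arbitrary illustrative choice).
    """
    if their_demand <= their_fair_share:
        return 1.0
    return max(0.0, their_fair_share / their_demand - epsilon)
```

For example, splitting 10 with a fair share of 5 each, a demand of 6 is accepted with probability 5/6 - 0.001, giving the demander an expected 6 × (5/6 - 0.001) = 4.994, which is less than 5.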

8. Cooperative Oracles: Introduction
post by Scott Garrabrant 576 days ago | Abram Demski, Jessica Taylor and Patrick LaVictoire like this | 1 comment

This is the first in a series of posts introducing a new tool called a Cooperative Oracle. All of these posts are joint work with Sam Eisenstat, Tsvi Benson-Tilsen, and Nisan Stiennon.

Here is my plan for posts in this sequence. I will update this as I go.

1. Introduction
2. Nonexploited Bargaining
3. Stratified Pareto Optima and Almost Stratified Pareto Optima
4. Definition and Existence Proof
5. Alternate Notions of Dependency

9. Two Major Obstacles for Logical Inductor Decision Theory
post by Scott Garrabrant 601 days ago | Alex Mennen, Sam Eisenstat, Abram Demski, Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | 3 comments

In this post, I describe two major obstacles for logical inductor decision theory: untaken actions are not observable and no updatelessness for computations. I will concretely describe both of these problems in a logical inductor framework, but I believe that both issues are general enough to transcend that framework.

10. The Ubiquitous Converse Lawvere Problem
post by Scott Garrabrant 608 days ago | Marcello Herreshoff, Sam Eisenstat, Jessica Taylor and Patrick LaVictoire like this | discuss

In this post, I give a stronger version of the open question presented here, and give a motivation for this stronger property. This came out of conversations with Marcello, Sam, and Tsvi.

Definition: A continuous function $$f:X\rightarrow Y$$ is called ubiquitous if for every continuous function $$g:X\rightarrow Y$$, there exists a point $$x\in X$$ such that $$f(x)=g(x)$$.

Open Problem: Does there exist a topological space $$X$$ with a ubiquitous function $$f:X\rightarrow[0,1]^X$$?
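To make the definition concrete, the Python sketch below brute-forces it in the one setting where that is possible: finite discrete spaces, where every function is continuous, so checking ubiquity means enumerating all |Y|^|X| maps g. This toy setting is our own choice and is far from the spaces the open problem is about. In fact, whenever Y has at least two points one can pick g(x) ≠ f(x) at every x, so no f between finite discrete spaces is ubiquitous, and interesting examples cannot come from such spaces. The function name is hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def is_ubiquitous(f: Callable[[A], B], X: Sequence[A], Y: Sequence[B]) -> bool:
    """Check ubiquity of f: X -> Y, treating X and Y as finite discrete spaces.

    Every g: X -> Y is continuous in the discrete topology, so we enumerate all
    of them (as tuples of values, one per point of X) and look for one that
    disagrees with f at every point.
    """
    for values in product(Y, repeat=len(X)):
        if not any(f(x) == y for x, y in zip(X, values)):
            return False  # this g never meets f
    return True
```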

11. Formal Open Problem in Decision Theory
post by Scott Garrabrant 618 days ago | Marcello Herreshoff, Sam Eisenstat, Vadim Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 13 comments

In this post, I present a new formal open problem. A positive answer would be valuable for decision theory research. A negative answer would be helpful, mostly for figuring out what is the closest we can get to a positive answer. I also give some motivation for the problem, and some partial progress.

Open Problem: Does there exist a topological space $$X$$ (in some convenient category of topological spaces) such that there exists a continuous surjection from $$X$$ to the space $$[0,1]^X$$ (of continuous functions from $$X$$ to $$[0,1]$$)?
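For intuition about why the topology matters, recall the set-level version: with the discrete topology and [0,1] replaced by the two-point set {0,1}, Cantor's diagonal argument shows there is no surjection X → {0,1}^X, because g(x) = 1 - f(x)(x) differs from every f(x). That argument relies on t ↦ 1 - t having no fixed point on {0,1}. Lawvere's fixed point theorem rules out such surjections only via a fixed-point-free self-map of the target, and every continuous self-map of [0,1] has a fixed point (t ↦ 1 - t fixes 1/2), so the theorem is silent here. The Python sketch below checks the finite diagonal argument exhaustively; it is background intuition of our own, not the motivation or partial progress given in the post, and the function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List

Point = int
Function = Dict[Point, int]  # a map X -> {0, 1}, stored as a lookup table

def diagonal_escape(f: Callable[[Point], Function], X: List[Point]) -> Function:
    """Given f: X -> {0,1}^X, return g in {0,1}^X that f misses.

    g(x) = 1 - f(x)(x) disagrees with f(x) at the point x itself, so g != f(x)
    for every x. The trick needs a fixed-point-free map on {0,1} (t -> 1 - t).
    """
    return {x: 1 - f(x)[x] for x in X}

def no_surjection_exists(n: int) -> bool:
    """Exhaustively confirm, for X = {0, ..., n-1}, that no f: X -> {0,1}^X is onto."""
    X = list(range(n))
    all_functions = [dict(zip(X, vals)) for vals in product([0, 1], repeat=n)]
    for images in product(all_functions, repeat=n):
        table = dict(zip(X, images))
        g = diagonal_escape(table.__getitem__, X)
        if any(g == table[x] for x in X):
            return False  # would contradict the diagonal argument
    return True
```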

12. Prediction Based Robust Cooperation
post by Scott Garrabrant 655 days ago | Patrick LaVictoire likes this | 1 comment

In this post, we present a new approach to robust cooperation, as an alternative to the “modal combat” framework. This post is very hand-wavy. If someone would like to work on making it better, let me know.

13. Entangled Equilibria and the Twin Prisoners' Dilemma
post by Scott Garrabrant 666 days ago | Vadim Kosoy and Patrick LaVictoire like this | 2 comments

In this post, I present a generalization of Nash equilibria to non-CDT agents. I will use this formulation to model mutual cooperation in a twin prisoners’ dilemma, caused by the belief that the other player is similar to you, and not by mutual prediction. (This post came mostly out of a conversation with Sam Eisenstat, with additional contributions from Tsvi Benson-Tilsen and Jessica Taylor.)
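A toy calculation shows how a belief in similarity alone can make cooperation attractive. In the Python sketch below, an agent that thinks its twin will end up choosing the same action with probability p_same compares the two actions by expected payoff. The payoff numbers (T = 5, R = 3, P = 1, S = 0) and the single correlation parameter are our assumptions; this is simple evidential reasoning, not the entangled-equilibrium definition from the post.

```python
from typing import Dict, Tuple

# Row player's payoffs in a standard prisoners' dilemma (T=5 > R=3 > P=1 > S=0).
PAYOFF: Dict[Tuple[str, str], float] = {
    ("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1,
}

def expected_payoff(my_action: str, p_same: float) -> float:
    """Expected payoff if the twin's action matches mine with probability p_same."""
    other = "D" if my_action == "C" else "C"
    return p_same * PAYOFF[(my_action, my_action)] + (1 - p_same) * PAYOFF[(my_action, other)]

def best_action(p_same: float) -> str:
    """Choose the action with higher expected payoff under the similarity belief."""
    return max(["C", "D"], key=lambda a: expected_payoff(a, p_same))
```

With these payoffs, cooperating yields 3p and defecting yields 5 - 4p, so C is preferred exactly when p > 5/7. A CDT agent that holds the twin's action fixed instead finds D better against either fixed action.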

14. postCDT: Decision Theory using post-selected Bayes nets
post by Scott Garrabrant 763 days ago | Ryan Carey, Patrick LaVictoire and Paul Christiano like this | 1 comment

The purpose of this post is to document a minor idea about a new type of decision theory that works using a Bayes net. This is not a concrete proposal, since I will give no insight on which Bayes net to use. I am not that excited by this proposal, but think it is worth writing up anyway.

15. Updatelessness and Son of X
post by Scott Garrabrant 765 days ago | Ryan Carey, Abram Demski and Jessica Taylor like this | 8 comments

The purpose of this post is to discuss the relationship between the concepts of Updatelessness and the “Son of” operator.

16. A failed attempt at Updatelessness using Universal Inductors
discussion post by Scott Garrabrant 766 days ago | Jessica Taylor and Patrick LaVictoire like this | 1 comment

17. Transitive negotiations with counterfactual agents
discussion post by Scott Garrabrant 779 days ago | Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | discuss

18. The set of Logical Inductors is not Convex
post by Scott Garrabrant 803 days ago | Sam Eisenstat, Abram Demski and Patrick LaVictoire like this | 3 comments

Sam Eisenstat asked the following interesting question: Given two logical inductors over the same deductive process, is every (rational) convex combination of them also a logical inductor? Surprisingly, the answer is no! Here is my counterexample.

19. Logical Inductors contain Logical Inductors over other complexity classes
post by Scott Garrabrant 804 days ago | Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | discuss

In the Logical Induction paper, we give a definition of logical inductors over polynomial time traders. It is clear from our definition that our use of polynomial time is rather arbitrary, and we could define e.g. an exponential time logical inductor. However, it may be less clear that logical inductors over one complexity class actually contain logical inductors over other complexity classes.

20. Logical Inductors that trust their limits
post by Scott Garrabrant 809 days ago | Jack Gallagher, Jessica Taylor and Patrick LaVictoire like this | 2 comments

Here is another open question related to Logical Inductors. I have not thought about it very long, so it might be easy.

Does there exist a logical inductor $$\{\mathbb P_n\}$$ over PA such that for all $$\phi$$:

1. PA proves that $$\mathbb P_\infty(\phi)$$ exists and is in $$[0,1]$$, and

2. $$\mathbb{E}_n(\mathbb{P}_\infty(\phi))\eqsim_n\mathbb{P}_n(\phi)$$?

21. Universal Inductors
post by Scott Garrabrant 816 days ago | Sam Eisenstat, Jack Gallagher, Benja Fallenstein, Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | discuss

Now that the Logical Induction paper is out, I am directing my attention towards decision theory. The approach I currently think will be most fruitful is attempting to make a logically updateless version of Wei Dai’s Updateless Decision Theory. Abram Demski has posted here about this, but I think Logical Induction provides a new angle from which to attack the problem. This post will present an alternate way of viewing Logical Induction which I think will be especially helpful for building a logical UDT. (The Logical Induction paper is a prerequisite for this post.)

22. The many counterfactuals of counterfactual mugging
discussion post by Scott Garrabrant 971 days ago | Ryan Carey and Tsvi Benson-Tilsen like this | 2 comments

23. Another Concise Open Problem
post by Scott Garrabrant 1045 days ago | Nate Soares and Patrick LaVictoire like this | 1 comment

Given the success of the last open problem I posted (Janos Kramer disproved my conjecture), I decided to post another one. Again, I will skip the philosophy and just give the math. The reason I want this result is to solve the problem described here.

24. Second Failure of Inductive Learning: The Entangled Benford Test
post by Scott Garrabrant 1046 days ago | Nate Soares and Patrick LaVictoire like this | discuss

This is a follow-up to the concrete failure of the Solomonoff-induction-inspired approach to inductive learning with a delay in feedback. We do not yet have an algorithm that handles this second failure.

25. Concise Open Problem in Logical Uncertainty
discussion post by Scott Garrabrant 1093 days ago | Jessica Taylor and Patrick LaVictoire like this | 7 comments