1. Loebian cooperation in the tiling agents problem
post by Vladimir Slepnev 752 days ago | Alex Mennen, Vanessa Kosoy, Abram Demski, Patrick LaVictoire and Stuart Armstrong like this | 4 comments

The tiling agents problem is about formalizing how AIs can create successor AIs that are at least as smart. Here’s a toy model I came up with, which is similar to Benya’s old model but simpler. A computer program X is asked one of two questions:

• Would you like some chocolate?

• Here’s the source code of another program Y. Do you accept it as your successor?
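
The two questions above can be sketched as a toy program. This is a minimal illustration, not the post's construction: the names and the exact-match acceptance criterion are hypothetical. Accepting only a byte-identical copy of yourself is trivially safe but can never accept a smarter successor, which is exactly the gap that Loebian, proof-based acceptance tries to close.

```python
# Hypothetical sketch of program X (illustrative only, not the post's model).
# X accepts chocolate, and accepts a successor only if its source code is
# identical to X's own -- safe, but hopeless for self-improvement.

MY_SOURCE = "def answer(...): ..."  # stand-in for X's own source code

def answer(question, successor_source=None):
    if question == "Would you like some chocolate?":
        return "yes"  # X likes chocolate
    if question == "Do you accept this successor?":
        # Exact self-copy criterion: trivially trustworthy, overly restrictive.
        return "accept" if successor_source == MY_SOURCE else "reject"
    return "unknown question"

print(answer("Would you like some chocolate?"))            # yes
print(answer("Do you accept this successor?", MY_SOURCE))  # accept
print(answer("Do you accept this successor?", "other"))    # reject
```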

2. Cooperative Oracles: Stratified Pareto Optima and Almost Stratified Pareto Optima
post by Scott Garrabrant 774 days ago | Vanessa Kosoy, Patrick LaVictoire and Stuart Armstrong like this | 8 comments

In this post, we generalize the notions in Cooperative Oracles: Nonexploited Bargaining to deal with the possibility of introducing extra agents that have no control but have preferences. We further generalize this to infinitely many agents. (Part of the series started here.)

3. Cooperative Oracles: Nonexploited Bargaining
post by Scott Garrabrant 795 days ago | Vanessa Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 6 comments

In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation. (Part of the series started here.)

4. Cooperative Oracles: Introduction
post by Scott Garrabrant 795 days ago | Abram Demski, Jessica Taylor and Patrick LaVictoire like this | 1 comment

This is the first in a series of posts introducing a new tool called a Cooperative Oracle. All of these posts are joint work with Sam Eisenstat, Tsvi Benson-Tilsen, and Nisan Stiennon.

Here is my plan for posts in this sequence. I will update this as I go.

1. Introduction
2. Nonexploited Bargaining
3. Stratified Pareto Optima and Almost Stratified Pareto Optima
4. Definition and Existence Proof
5. Alternate Notions of Dependency

5. Generalizing Foundations of Decision Theory II
post by Abram Demski 814 days ago | Sam Eisenstat, Vanessa Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments

As promised in the previous post, I develop my formalism for justifying as many of the decision-theoretic axioms as possible with generalized dutch-book arguments. (I’ll use the term “generalized dutch-book” to refer to arguments with a family resemblance to dutch-book or money-pump.) The eventual goal is to relax these assumptions in a way which addresses bounded processing power, but for now the goal is to get as much of classical decision theory as possible justified by a generalized dutch-book.
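
The textbook money-pump (a standard illustration, not Demski's formalism) can be simulated in a few lines: an agent with cyclic preferences A > B > C > A will pay a small fee for each preferred swap and can be cycled back to its starting holding, strictly poorer.

```python
# Classic money-pump sketch (standard textbook example, not the post's
# generalized dutch-book): cyclic preferences let a bookie extract money.

prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # cyclic: A > B > C > A
FEE = 1  # price the agent pays for each preferred swap

def money_pump(start, offers):
    holding, paid = start, 0
    for offer in offers:
        if (offer, holding) in prefers:  # agent strictly prefers the offer
            holding = offer
            paid += FEE                  # ...and pays to swap
    return holding, paid

# One full cycle: A -> C -> B -> A, ending where it started but 3 poorer.
print(money_pump("A", ["C", "B", "A"]))  # ('A', 3)
```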

6. Two Major Obstacles for Logical Inductor Decision Theory
post by Scott Garrabrant 820 days ago | Alex Mennen, Sam Eisenstat, Abram Demski, Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | 3 comments

In this post, I describe two major obstacles for logical inductor decision theory: untaken actions are not observable, and there is no updatelessness for computations. I will describe both problems concretely in a logical inductor framework, but I believe both issues are general enough to transcend that framework.

7. Where's the first benign agent?
link by Jacob Kopczynski 822 days ago | Patrick LaVictoire and Paul Christiano like this | 15 comments

8. The Ubiquitous Converse Lawvere Problem
post by Scott Garrabrant 827 days ago | Marcello Herreshoff, Sam Eisenstat, Jessica Taylor and Patrick LaVictoire like this | discuss

In this post, I give a stronger version of the open question presented here, and give a motivation for this stronger property. This came out of conversations with Marcello, Sam, and Tsvi.

Definition: A continuous function $$f:X\rightarrow Y$$ is called ubiquitous if for every continuous function $$g:X\rightarrow Y$$, there exists a point $$x\in X$$ such that $$f(x)=g(x)$$.

Open Problem: Does there exist a topological space $$X$$ with a ubiquitous function $$f:X\rightarrow[0,1]^X$$?
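
As a sanity check on the definition (a standard fact, not from the post), there are simple spaces with ubiquitous self-maps: the identity on $$[0,1]$$ is ubiquitous as a map $$[0,1]\rightarrow[0,1]$$, by the intermediate value theorem.

```latex
% Standard example: the identity on [0,1] is ubiquitous.
% For any continuous g : [0,1] -> [0,1], set h(x) = g(x) - x.
% Then h(0) = g(0) >= 0 and h(1) = g(1) - 1 <= 0, so by the
% intermediate value theorem some x has h(x) = 0, i.e. g(x) = x.
\forall g \in C([0,1],[0,1])\ \exists x \in [0,1] :\ g(x) = x
```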

9. Formal Open Problem in Decision Theory
post by Scott Garrabrant 838 days ago | Marcello Herreshoff, Sam Eisenstat, Vanessa Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 13 comments

In this post, I present a new formal open problem. A positive answer would be valuable for decision theory research. A negative answer would also be helpful, mostly for figuring out how close we can get to a positive answer. I also give some motivation for the problem and some partial progress.

Open Problem: Does there exist a topological space $$X$$ (in some convenient category of topological spaces) such that there exists a continuous surjection from $$X$$ to the space $$[0,1]^X$$ (of continuous functions from $$X$$ to $$[0,1]$$)?
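
One easy observation (a standard Cantor-style diagonal argument, not from the post) shows why the topology is essential: no discrete space $$X$$ can work, so any solution must use continuity to cut down the function space $$[0,1]^X$$.

```latex
% If X is discrete, every function X -> [0,1] is continuous, so a
% continuous surjection phi : X -> [0,1]^X would hit ALL functions
% X -> [0,1]. Diagonalize: define
%   g(x) = 1 if phi(x)(x) <= 1/2,  and g(x) = 0 otherwise.
% Then g(x) != phi(x)(x) for every x, so g is not in the image of
% phi -- contradiction.
\exists g :\ g(x) \ne \varphi(x)(x) \quad \text{for all } x \in X
```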

10. Modal Combat for games other than the prisoner's dilemma
post by Alex Mennen 870 days ago | Patrick LaVictoire and Scott Garrabrant like this | 1 comment

11. Some problems with making induction benign, and approaches to them
post by Jessica Taylor 874 days ago | Nate Soares, Patrick LaVictoire and Stuart Armstrong like this | 4 comments

The universal prior is malign. I'll talk about a sequence of problems that cause it to be malign, and about possible solutions.

12. Maximally efficient agents will probably have an anti-daemon immune system
discussion post by Jessica Taylor 874 days ago | Ryan Carey, Patrick LaVictoire and Scott Garrabrant like this | 1 comment

13. All the indifference designs
discussion post by Stuart Armstrong 874 days ago | Patrick LaVictoire likes this | 1 comment

14. Prediction Based Robust Cooperation
post by Scott Garrabrant 875 days ago | Patrick LaVictoire likes this | 1 comment

In this post, we present a new approach to robust cooperation, as an alternative to the “modal combat” framework. This post is very hand-wavy. If someone would like to work on making it better, let me know.

15. Entangled Equilibria and the Twin Prisoners' Dilemma
post by Scott Garrabrant 886 days ago | Vanessa Kosoy and Patrick LaVictoire like this | 2 comments

In this post, I present a generalization of Nash equilibria to non-CDT agents. I will use this formulation to model mutual cooperation in a twin prisoners’ dilemma, caused by the belief that the other player is similar to you, and not by mutual prediction. (This post came mostly out of a conversation with Sam Eisenstat, as well as contributions from Tsvi Benson-Tilsen and Jessica Taylor.)
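
The twin-cooperation intuition can be checked numerically. This is a minimal sketch with standard prisoners' dilemma payoffs (the numbers are illustrative, not the post's formalism): since an exact twin necessarily plays the same move, only the diagonal outcomes are reachable, and mutual cooperation dominates mutual defection.

```python
# Hedged sketch: standard PD payoff numbers, chosen for illustration.
# With an exact twin, both players make the same move, so the only
# reachable outcomes are (C, C) and (D, D).

PAYOFF = {  # (my_move, twin_move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def twin_best_move():
    # Restrict to the diagonal and pick the move with the higher payoff.
    return max(["C", "D"], key=lambda m: PAYOFF[(m, m)])

print(twin_best_move())  # C: mutual cooperation (3) beats mutual defection (1)
```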

16. On motivations for MIRI's highly reliable agent design research
post by Jessica Taylor 903 days ago | Ryan Carey, Sam Eisenstat, Daniel Dewey, Nate Soares, Patrick LaVictoire, Paul Christiano, Tsvi Benson-Tilsen and Vladimir Nesov like this | 10 comments

(this post came out of a conversation between me and Owen Cotton-Barratt, plus a follow-up conversation with Nate)

17. Strategies for coalitions in unit-sum games
post by Jessica Taylor 905 days ago | Patrick LaVictoire and Stuart Armstrong like this | 3 comments

I’m going to formalize some ideas related to my previous post about pursuing convergent instrumental goals without good priors and prove theorems about how much power a coalition can guarantee. The upshot is that, while non-majority coalitions can’t guarantee controlling a non-negligible fraction of the expected power, majority coalitions can guarantee controlling a large fraction of the expected power.

18. Open problem: thin logical priors
discussion post by Tsvi Benson-Tilsen 916 days ago | Ryan Carey, Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this | 2 comments

19. My current take on the Paul-MIRI disagreement on alignability of messy AI
post by Jessica Taylor 935 days ago | Ryan Carey, Vanessa Kosoy, Daniel Dewey, Patrick LaVictoire, Scott Garrabrant and Stuart Armstrong like this | 40 comments

Paul Christiano and “MIRI” have disagreed on an important research question for a long time: should we focus research on aligning “messy” AGI (e.g. one found through gradient descent or brute force search) with human values, or on developing “principled” AGI (based on theories similar to Bayesian probability theory)? I’m going to present my current model of this disagreement and additional thoughts about it.

20. Uninfluenceable agents
post by Stuart Armstrong 950 days ago | Vanessa Kosoy and Patrick LaVictoire like this | 7 comments

A putative new idea for AI control; index here.

After explaining biased learning processes, we can now define influenceable (and uninfluenceable) learning processes.

Recall that the (unbiased) influence problem is due to agents randomising their preferences, as a sort of artificial ‘learning’ process, if the real learning process is slow or incomplete.

21. Counterfactuals on POMDP
discussion post by Stuart Armstrong 951 days ago | Patrick LaVictoire likes this | discuss

22. The universal prior is malign
link by Paul Christiano 958 days ago | Ryan Carey, Vanessa Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments

23. My recent posts
discussion post by Paul Christiano 959 days ago | Ryan Carey, Jessica Taylor, Patrick LaVictoire, Stuart Armstrong and Tsvi Benson-Tilsen like this | discuss

24. Predicting HCH using expert advice
post by Jessica Taylor 961 days ago | Ryan Carey, Patrick LaVictoire and Paul Christiano like this | 1 comment

Summary: in approximating a scheme like HCH, we would like some notion of “the best the prediction can be, given available AI capabilities”. There’s a natural notion of “the best prediction of a human we should expect to get”. In general this doesn’t yield good predictions of HCH, but it does yield an HCH-like computation model that seems useful.

25. (Non-)Interruptibility of Sarsa(λ) and Q-Learning
link by Richard Möhn 973 days ago | Jessica Taylor and Patrick LaVictoire like this | 5 comments