1. No Constant Distribution Can be a Logical Inductor discussion post by Alex Appel 381 days ago | Sam Eisenstat, Vadim Kosoy, Abram Demski, Jessica Taylor and Stuart Armstrong like this | 1 comment
2. Being legible to other agents by committing to using weaker reasoning systems
post by Alex Mennen 506 days ago | Stuart Armstrong and Vladimir Slepnev like this | 1 comment

Suppose that an agent $$A_{1}$$ reasons in a sound theory $$T_{1}$$, and an agent $$A_{2}$$ reasons in a theory $$T_{2}$$, such that $$T_{1}$$ proves that $$T_{2}$$ is sound. Now suppose $$A_{1}$$ is trying to reason in a way that is legible to $$A_{2}$$, in the sense that $$A_{2}$$ can rely on $$A_{1}$$ to reach correct conclusions. One way of doing this is for $$A_{1}$$ to restrict itself to some weaker theory $$T_{3}$$, which $$T_{2}$$ proves is sound, for the purposes of any reasoning that it wants to be legible to $$A_{2}$$. Of course, in order for this to work, not only would $$A_{1}$$ have to restrict itself to using $$T_{3}$$, but $$A_{2}$$ would have to trust that $$A_{1}$$ had done so. A plausible way for that to happen is for $$A_{1}$$ to reach the decision quickly enough that $$A_{2}$$ can simulate $$A_{1}$$ making the decision to restrict itself to using $$T_{3}$$.
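The commitment-and-simulation protocol can be sketched as a toy (all names here are illustrative; nothing below models the actual proof theory):

```python
# Toy sketch of the legibility scheme described above. The theories are
# represented only by name; the point is the protocol, not the logic.
# A1's commitment step is deliberately cheap, so A2 can verify it by
# direct simulation.

def a1_pick_theory(wants_legibility):
    """A1's policy: restrict itself to the weaker theory T3 whenever it
    wants its reasoning to be legible to A2."""
    return "T3" if wants_legibility else "T1"

def a2_trusts(a1_policy):
    """A2 simulates A1's (quick) commitment decision and trusts A1's
    conclusions iff A1 committed to T3."""
    return a1_policy(True) == "T3"

print(a2_trusts(a1_pick_theory))  # True
```

The asymmetry in costs is what makes the scheme work: committing must be cheap enough to simulate, even though the reasoning carried out inside $$T_{3}$$ afterwards may not be.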

3. The Happy Dance Problem
post by Abram Demski 522 days ago | Scott Garrabrant and Stuart Armstrong like this | 1 comment

Since the invention of logical induction, people have been trying to figure out what logically updateless reasoning could be. This is motivated by the idea that, in the realm of Bayesian uncertainty (IE, empirical uncertainty), updateless decision theory is the simple solution to the problem of reflective consistency. Naturally, we’d like to import this success to logically uncertain decision theory.

At a research retreat during the summer, we realized that updateless decision theory wasn’t so easy to define even in the seemingly simple Bayesian case. A possible solution was written up in Conditioning on Conditionals. However, that didn’t end up being especially satisfying.

Here, I introduce the happy dance problem, which more clearly illustrates the difficulty in defining updateless reasoning in the Bayesian case. I also outline Scott’s current thoughts about the correct way of reasoning about this problem.

4. Hyperreal Brouwer
post by Scott Garrabrant 564 days ago | Vadim Kosoy and Stuart Armstrong like this | 2 comments

This post explains how to view Kakutani’s fixed point theorem as a special case of Brouwer’s fixed point theorem with hyperreal numbers. This post is just math intuitions, but I found them useful in thinking about Kakutani’s fixed point theorem and many things in agent foundations. This came out of conversations with Sam Eisenstat.

5. Current thoughts on Paul Christiano's research agenda
post by Jessica Taylor 646 days ago | Ryan Carey, Owen Cotton-Barratt, Sam Eisenstat, Paul Christiano, Stuart Armstrong and Wei Dai like this | 15 comments

This post summarizes my thoughts on Paul Christiano’s agenda in general and ALBA in particular.

6. Loebian cooperation in the tiling agents problem
post by Vladimir Slepnev 668 days ago | Alex Mennen, Vadim Kosoy, Abram Demski, Patrick LaVictoire and Stuart Armstrong like this | 4 comments

The tiling agents problem is about formalizing how AIs can create successor AIs that are at least as smart. Here’s a toy model I came up with, which is similar to Benya’s old model but simpler. A computer program X is asked one of two questions:

• Would you like some chocolate?

• Here’s the source code of another program Y. Do you accept it as your successor?
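The toy model can be sketched in code (my own illustration, not from the post; a full treatment would have X search for a proof that Y is safe, and here a syntactic source-equality check stands in for that proof search):

```python
# Toy version of the two questions posed to program X. The trust
# criterion below -- accept Y iff Y's source is identical to X's own --
# is the crudest condition under which X "tiles" to itself; it stands in
# for the provability condition in the real tiling agents problem.

def respond(own_source, question, successor_source=None):
    if question == "chocolate":
        return "yes"  # X happily accepts chocolate
    if question == "successor":
        # Accept Y iff its source is syntactically identical to X's own.
        return "accept" if successor_source == own_source else "reject"
    raise ValueError("unknown question")

src = "def respond(own_source, question, successor_source=None): ..."
print(respond(src, "successor", successor_source=src))  # accept
```

The interesting versions of the problem replace the equality check with "X proves Y is safe", which is where Loebian obstacles appear.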

7. Cooperative Oracles: Stratified Pareto Optima and Almost Stratified Pareto Optima
post by Scott Garrabrant 689 days ago | Vadim Kosoy, Patrick LaVictoire and Stuart Armstrong like this | 8 comments

In this post, we generalize the notions in Cooperative Oracles: Nonexploited Bargaining to deal with the possibility of introducing extra agents that have no control but have preferences. We further generalize this to infinitely many agents. (Part of the series started here.)

8. Futarchy Fix
post by Abram Demski 693 days ago | Scott Garrabrant and Stuart Armstrong like this | 9 comments

Robin Hanson’s Futarchy is a proposal to let prediction markets make governmental decisions. We can view an operating Futarchy as an agent, and ask if it is aligned with the interests of its constituents. I am aware of two main failures of alignment: (1) since predicting rare events is rewarded in proportion to their rareness, prediction markets heavily incentivise causing rare events to happen (I’ll call this the entropy-market problem); (2) it seems prediction markets would not be able to assign probability to existential risk, since you can’t collect on bets after everyone’s dead (I’ll call this the existential risk problem). I provide three formulations of (1) and solve two of them, and make some comments on (2). (Thanks to Scott for pointing out the second of these problems to me; I don’t remember who originally told me about the first problem, but also thanks.)

9. Why I am not currently working on the AAMLS agenda
post by Jessica Taylor 710 days ago | Ryan Carey, Marcello Herreshoff, Sam Eisenstat, Abram Demski, Daniel Dewey, Scott Garrabrant and Stuart Armstrong like this | 2 comments

(Note: this is a personal statement, not an official MIRI statement; I am not speaking for others who have been involved with the agenda.)

The AAMLS (Alignment for Advanced Machine Learning Systems) agenda is a project at MIRI that is about determining how to use hypothetical highly advanced machine learning systems safely. I was previously working on problems in this agenda and am currently not.

10. Cooperative Oracles: Nonexploited Bargaining
post by Scott Garrabrant 711 days ago | Vadim Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 6 comments

In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation. (Part of the series started here.)

post by Tom Everitt 716 days ago | Abram Demski and Stuart Armstrong like this | 3 comments

Cooperative inverse reinforcement learning (CIRL) generated a lot of attention last year, as it seemed to do a good job aligning an agent’s incentives with its human supervisor’s. Notably, it led to an elegant solution to the shutdown problem.

12. Formal Open Problem in Decision Theory
post by Scott Garrabrant 753 days ago | Marcello Herreshoff, Sam Eisenstat, Vadim Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 13 comments

In this post, I present a new formal open problem. A positive answer would be valuable for decision theory research. A negative answer would also be helpful, mostly for figuring out how close we can get to a positive answer. I also give some motivation for the problem, and some partial progress.

Open Problem: Does there exist a topological space $$X$$ (in some convenient category of topological spaces) such that there exists a continuous surjection from $$X$$ to the space $$[0,1]^X$$ (of continuous functions from $$X$$ to $$[0,1]$$)?
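Two quick sanity checks (my own remarks, not from the post) locate where the difficulty lives:

```latex
% Hedged observations, not from the post.
% (1) Discrete spaces are ruled out by cardinality alone: if $X$ is
% discrete, every function $X \to [0,1]$ is continuous, so
\[
  |[0,1]^X| \;=\; |[0,1]|^{|X|} \;\geq\; 2^{|X|} \;>\; |X|
  \qquad (X \text{ discrete}),
\]
% and no surjection can exist even set-theoretically.
% (2) The Cantor-style diagonal argument does NOT rule out such an $X$:
% by Lawvere's fixed-point theorem, a point-surjective map
% $X \to [0,1]^X$ in a cartesian closed category only implies that every
% continuous $g \colon [0,1] \to [0,1]$ has a fixed point --- which is
% true, by the intermediate value theorem. So any answer, positive or
% negative, must genuinely use the topology.
```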

13. Some problems with making induction benign, and approaches to them
post by Jessica Taylor 789 days ago | Nate Soares, Patrick LaVictoire and Stuart Armstrong like this | 4 comments

The universal prior is malign. I’ll talk about a sequence of problems that cause it to be malign, and possible solutions.

 14. Censoring out-of-domain representations discussion post by Patrick LaVictoire 811 days ago | Jessica Taylor and Stuart Armstrong like this | 3 comments
15. Strategies for coalitions in unit-sum games
post by Jessica Taylor 820 days ago | Patrick LaVictoire and Stuart Armstrong like this | 3 comments

I’m going to formalize some ideas related to my previous post about pursuing convergent instrumental goals without good priors and prove theorems about how much power a coalition can guarantee. The upshot is that, while non-majority coalitions can’t guarantee controlling a non-negligible fraction of the expected power, majority coalitions can guarantee controlling a large fraction of the expected power.

 16. An impossibility result for doing without good priors discussion post by Jessica Taylor 823 days ago | Stuart Armstrong likes this | discuss
 17. Pursuing convergent instrumental subgoals on the user's behalf doesn't always require good priors discussion post by Jessica Taylor 844 days ago | Daniel Dewey, Paul Christiano and Stuart Armstrong like this | 9 comments
18. My current take on the Paul-MIRI disagreement on alignability of messy AI
post by Jessica Taylor 850 days ago | Ryan Carey, Vadim Kosoy, Daniel Dewey, Patrick LaVictoire, Scott Garrabrant and Stuart Armstrong like this | 40 comments

Paul Christiano and “MIRI” have disagreed on an important research question for a long time: should we focus research on aligning “messy” AGI (e.g. one found through gradient descent or brute force search) with human values, or on developing “principled” AGI (based on theories similar to Bayesian probability theory)? I’m going to present my current model of this disagreement and additional thoughts about it.

 19. My recent posts discussion post by Paul Christiano 875 days ago | Ryan Carey, Jessica Taylor, Patrick LaVictoire, Stuart Armstrong and Tsvi Benson-Tilsen like this | discuss
 20. Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences discussion post by Patrick LaVictoire 1039 days ago | Jessica Taylor and Stuart Armstrong like this | discuss
 21. Two problems with causal-counterfactual utility indifference discussion post by Jessica Taylor 1062 days ago | Patrick LaVictoire, Stuart Armstrong and Vladimir Slepnev like this | discuss
22. Anything you can do with n AIs, you can do with two (with directly opposed objectives)
post by Jessica Taylor 1084 days ago | Patrick LaVictoire and Stuart Armstrong like this | 2 comments

Summary: For any normal-form game, it’s possible to cast the problem of finding a correlated equilibrium in this game as a 2-player zero-sum game. This seems useful because zero-sum games are easy to analyze and more resistant to collusion.
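As a concrete illustration of the equilibrium object involved (the game and numbers below are my own, and this checks a correlated equilibrium directly rather than implementing the post's zero-sum reduction):

```python
# Sketch: verifying the correlated-equilibrium conditions for a 2-player
# game, here the game of Chicken (payoffs illustrative, not from the post).
# Actions: 0 = Dare, 1 = Chicken; payoffs indexed by (row, col) action.
U1 = {(0, 0): 0, (0, 1): 7, (1, 0): 2, (1, 1): 6}
U2 = {(0, 0): 0, (0, 1): 2, (1, 0): 7, (1, 1): 6}

def is_correlated_eq(p, eps=1e-9):
    """p maps (a1, a2) -> probability of the mediator recommending that
    joint action. Returns True iff no player gains in expectation by
    deviating from any recommended action."""
    for a in (0, 1):          # recommended action
        for b in (0, 1):      # candidate deviation
            if a == b:
                continue
            # Player 1: conditional on recommendation a, gain from playing b.
            gain1 = sum(p[(a, a2)] * (U1[(b, a2)] - U1[(a, a2)]) for a2 in (0, 1))
            # Player 2, symmetrically.
            gain2 = sum(p[(a1, a)] * (U2[(a1, b)] - U2[(a1, a)]) for a1 in (0, 1))
            if gain1 > eps or gain2 > eps:
                return False
    return True

# The classic correlated equilibrium of Chicken: the mediator recommends
# (D,C), (C,D), (C,C) with probability 1/3 each.
p = {(0, 0): 0.0, (0, 1): 1 / 3, (1, 0): 1 / 3, (1, 1): 1 / 3}
print(is_correlated_eq(p))  # True
```

These linear incentive constraints are what make the equilibrium set a polytope, which is what the reduction to a zero-sum game exploits.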

 23. Maximizing a quantity while ignoring effect through some channel discussion post by Jessica Taylor 1116 days ago | Patrick LaVictoire and Stuart Armstrong like this | 12 comments
24. What does it mean for correct operation to rely on transfer learning?
post by Jessica Taylor 1146 days ago | Daniel Dewey, Patrick LaVictoire, Paul Christiano and Stuart Armstrong like this | discuss

Summary: Some approaches to AI value alignment rely on transfer learning. I attempt to explain this idea more clearly.

 25. Another view of quantilizers: avoiding Goodhart's Law discussion post by Jessica Taylor 1200 days ago | Abram Demski, Patrick LaVictoire and Stuart Armstrong like this | 1 comment
