1. Being legible to other agents by committing to using weaker reasoning systems
post by Alex Mennen 111 days ago | Stuart Armstrong and Vladimir Slepnev like this | 1 comment

Suppose that an agent $$A_{1}$$ reasons in a sound theory $$T_{1}$$, and an agent $$A_{2}$$ reasons in a theory $$T_{2}$$, such that $$T_{1}$$ proves that $$T_{2}$$ is sound. Now suppose $$A_{1}$$ is trying to reason in a way that is legible to $$A_{2}$$, in the sense that $$A_{2}$$ can rely on $$A_{1}$$ to reach correct conclusions. One way of doing this is for $$A_{1}$$ to restrict itself to some weaker theory $$T_{3}$$, which $$T_{2}$$ proves is sound, for the purposes of any reasoning that it wants to be legible to $$A_{2}$$. Of course, in order for this to work, not only would $$A_{1}$$ have to restrict itself to using $$T_{3}$$, but $$A_{2}$$ would have to trust that $$A_{1}$$ had done so. A plausible way for that to happen is for $$A_{1}$$ to reach the decision quickly enough that $$A_{2}$$ can simulate $$A_{1}$$ making the decision to restrict itself to using $$T_{3}$$.
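The trust chain in this setup can be sketched schematically (with $$\mathrm{Sound}(\cdot)$$ abbreviating a soundness predicate): we have $$T_{1} \vdash \mathrm{Sound}(T_{2})$$ and $$T_{2} \vdash \mathrm{Sound}(T_{3})$$. If $$A_{2}$$ verifies, for instance by simulation, that $$A_{1}$$ only accepts conclusions $$\varphi$$ with $$T_{3} \vdash \varphi$$, then $$A_{2}$$ can conclude $$\varphi$$ itself, since its own theory $$T_{2}$$ proves $$T_{3}$$ sound.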

2. The Happy Dance Problem
post by Abram Demski 127 days ago | Scott Garrabrant and Stuart Armstrong like this | 1 comment

Since the invention of logical induction, people have been trying to figure out what logically updateless reasoning could be. This is motivated by the idea that, in the realm of Bayesian uncertainty (i.e., empirical uncertainty), updateless decision theory is the simple solution to the problem of reflective consistency. Naturally, we’d like to import this success to logically uncertain decision theory.

At a research retreat during the summer, we realized that updateless decision theory wasn’t so easy to define even in the seemingly simple Bayesian case. A possible solution was written up in Conditioning on Conditionals. However, that didn’t end up being especially satisfying.

Here, I introduce the happy dance problem, which more clearly illustrates the difficulty in defining updateless reasoning in the Bayesian case. I also outline Scott’s current thoughts about the correct way of reasoning about this problem.

3. Hyperreal Brouwer
post by Scott Garrabrant 169 days ago | Vadim Kosoy and Stuart Armstrong like this | 2 comments

This post explains how to view Kakutani’s fixed point theorem as a special case of Brouwer’s fixed point theorem with hyperreal numbers. This post is just math intuitions, but I found them useful in thinking about Kakutani’s fixed point theorem and many things in agent foundations. This came out of conversations with Sam Eisenstat.

4. Current thoughts on Paul Christiano's research agenda
post by Jessica Taylor 250 days ago | Ryan Carey, Owen Cotton-Barratt, Sam Eisenstat, Paul Christiano, Stuart Armstrong and Wei Dai like this | 15 comments

This post summarizes my thoughts on Paul Christiano’s agenda in general and ALBA in particular.

5. Loebian cooperation in the tiling agents problem
post by Vladimir Slepnev 272 days ago | Alex Mennen, Vadim Kosoy, Abram Demski, Patrick LaVictoire and Stuart Armstrong like this | 4 comments

The tiling agents problem is about formalizing how AIs can create successor AIs that are at least as smart. Here’s a toy model I came up with, which is similar to Benya’s old model but simpler. A computer program X is asked one of two questions:

• Would you like some chocolate?

• Here’s the source code of another program Y. Do you accept it as your successor?

6. Cooperative Oracles: Stratified Pareto Optima and Almost Stratified Pareto Optima
post by Scott Garrabrant 294 days ago | Vadim Kosoy, Patrick LaVictoire and Stuart Armstrong like this | 8 comments

In this post, we generalize the notions in Cooperative Oracles: Nonexploited Bargaining to deal with the possibility of introducing extra agents that have no control but have preferences. We further generalize this to infinitely many agents. (Part of the series started here.)

7. Futarchy Fix
post by Abram Demski 298 days ago | Scott Garrabrant and Stuart Armstrong like this | 9 comments

Robin Hanson’s Futarchy is a proposal to let prediction markets make governmental decisions. We can view an operating Futarchy as an agent, and ask if it is aligned with the interests of its constituents. I am aware of two main failures of alignment: (1) since predicting rare events is rewarded in proportion to their rareness, prediction markets heavily incentivise causing rare events to happen (I’ll call this the entropy-market problem); (2) it seems prediction markets would not be able to assign probability to existential risk, since you can’t collect on bets after everyone’s dead (I’ll call this the existential risk problem). I provide three formulations of (1) and solve two of them, and make some comments on (2). (Thanks to Scott for pointing out the second of these problems to me; I don’t remember who originally told me about the first problem, but also thanks.)

8. Why I am not currently working on the AAMLS agenda
post by Jessica Taylor 315 days ago | Ryan Carey, Marcello Herreshoff, Sam Eisenstat, Abram Demski, Daniel Dewey, Scott Garrabrant and Stuart Armstrong like this | 2 comments

(note: this is not an official MIRI statement, this is a personal statement. I am not speaking for others who have been involved with the agenda.)

The AAMLS (Alignment for Advanced Machine Learning Systems) agenda is a MIRI project on how to safely use hypothetical highly advanced machine learning systems. I previously worked on problems in this agenda but am not currently doing so.

9. Cooperative Oracles: Nonexploited Bargaining
post by Scott Garrabrant 315 days ago | Vadim Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 6 comments

In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation. (Part of the series started here.)

10. CIRL Wireheading
post by Tom Everitt 321 days ago | Abram Demski and Stuart Armstrong like this | 3 comments

Cooperative inverse reinforcement learning (CIRL) generated a lot of attention last year, as it seemed to do a good job aligning an agent’s incentives with its human supervisor’s. Notably, it led to an elegant solution to the shutdown problem.

11. Formal Open Problem in Decision Theory
post by Scott Garrabrant 358 days ago | Marcello Herreshoff, Sam Eisenstat, Vadim Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 13 comments

In this post, I present a new formal open problem. A positive answer would be valuable for decision theory research. A negative answer would also be helpful, mostly for figuring out how close we can get to a positive answer. I also give some motivation for the problem and some partial progress.

Open Problem: Does there exist a topological space $$X$$ (in some convenient category of topological spaces) such that there exists a continuous surjection from $$X$$ to the space $$[0,1]^X$$ (of continuous functions from $$X$$ to $$[0,1]$$)?
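A side note on why no Cantor-style diagonal argument immediately rules this out: in a cartesian closed category, Lawvere’s fixed point theorem says a surjection $$g : X \to [0,1]^X$$ forces every continuous $$f : [0,1] \to [0,1]$$ to have a fixed point. The diagonal sketch: $$h(x) = f(g(x)(x))$$ is continuous, so $$h = g(x_{0})$$ for some $$x_{0}$$ by surjectivity, and then $$g(x_{0})(x_{0}) = h(x_{0}) = f(g(x_{0})(x_{0}))$$ is a fixed point of $$f$$. Since $$[0,1]$$ already has the fixed-point property (by the intermediate value theorem), this yields no contradiction, unlike the analogous diagonal argument for $$\{0,1\}^X$$.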

12. Some problems with making induction benign, and approaches to them
post by Jessica Taylor 394 days ago | Nate Soares, Patrick LaVictoire and Stuart Armstrong like this | 4 comments

The universal prior is malign. I’ll talk about a sequence of problems that cause it to be malign, and possible solutions.

 13. Censoring out-of-domain representations discussion post by Patrick LaVictoire 416 days ago | Jessica Taylor and Stuart Armstrong like this | 3 comments
14. Strategies for coalitions in unit-sum games
post by Jessica Taylor 425 days ago | Patrick LaVictoire and Stuart Armstrong like this | 3 comments

I’m going to formalize some ideas related to my previous post about pursuing convergent instrumental goals without good priors and prove theorems about how much power a coalition can guarantee. The upshot is that, while non-majority coalitions can’t guarantee controlling a non-negligible fraction of the expected power, majority coalitions can guarantee controlling a large fraction of the expected power.

 15. An impossibility result for doing without good priors discussion post by Jessica Taylor 428 days ago | Stuart Armstrong likes this | discuss
 16. Pursuing convergent instrumental subgoals on the user's behalf doesn't always require good priors discussion post by Jessica Taylor 449 days ago | Daniel Dewey, Paul Christiano and Stuart Armstrong like this | 9 comments
17. My current take on the Paul-MIRI disagreement on alignability of messy AI
post by Jessica Taylor 455 days ago | Ryan Carey, Vadim Kosoy, Daniel Dewey, Patrick LaVictoire, Scott Garrabrant and Stuart Armstrong like this | 40 comments

Paul Christiano and “MIRI” have disagreed on an important research question for a long time: should we focus research on aligning “messy” AGI (e.g. one found through gradient descent or brute force search) with human values, or on developing “principled” AGI (based on theories similar to Bayesian probability theory)? I’m going to present my current model of this disagreement and additional thoughts about it.

 18. My recent posts discussion post by Paul Christiano 479 days ago | Ryan Carey, Jessica Taylor, Patrick LaVictoire, Stuart Armstrong and Tsvi Benson-Tilsen like this | discuss
 19. Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences discussion post by Patrick LaVictoire 644 days ago | Jessica Taylor and Stuart Armstrong like this | discuss
 20. Two problems with causal-counterfactual utility indifference discussion post by Jessica Taylor 667 days ago | Patrick LaVictoire, Stuart Armstrong and Vladimir Slepnev like this | discuss
21. Anything you can do with n AIs, you can do with two (with directly opposed objectives)
post by Jessica Taylor 688 days ago | Patrick LaVictoire and Stuart Armstrong like this | 2 comments

Summary: For any normal-form game, it’s possible to cast the problem of finding a correlated equilibrium in this game as a 2-player zero-sum game. This seems useful because zero-sum games are easy to analyze and more resistant to collusion.
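As a small illustration of why zero-sum games are easy to analyze (this is standard textbook material, not the post's construction), here is a closed-form solver for 2x2 zero-sum games with no saddle point:

```python
def solve_2x2_zero_sum(a, b, c, d):
    """Mixed-strategy solution of a 2x2 zero-sum game with row-player
    payoff matrix [[a, b], [c, d]], assuming no saddle point (which
    guarantees the denominator below is nonzero)."""
    denom = a - b - c + d
    p = (d - c) / denom          # probability the row player plays row 1
    q = (d - b) / denom          # probability the column player plays column 1
    v = (a * d - b * c) / denom  # value of the game to the row player
    return p, q, v
```

For matching pennies, `solve_2x2_zero_sum(1, -1, -1, 1)` gives the uniform mixed strategies `(0.5, 0.5)` and value `0.0`: the equilibrium is unique and computable in closed form, with no collusion possible since the players' interests are directly opposed.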

 22. Maximizing a quantity while ignoring effect through some channel discussion post by Jessica Taylor 721 days ago | Patrick LaVictoire and Stuart Armstrong like this | 12 comments
23. What does it mean for correct operation to rely on transfer learning?
post by Jessica Taylor 750 days ago | Daniel Dewey, Patrick LaVictoire, Paul Christiano and Stuart Armstrong like this | discuss

Summary: Some approaches to AI value alignment rely on transfer learning. I attempt to explain this idea more clearly.

 24. Another view of quantilizers: avoiding Goodhart's Law discussion post by Jessica Taylor 805 days ago | Abram Demski, Patrick LaVictoire and Stuart Armstrong like this | 1 comment
 25. Superrationality in arbitrary games discussion post by Vadim Kosoy 870 days ago | Jessica Taylor, Nate Soares, Patrick LaVictoire, Scott Garrabrant and Stuart Armstrong like this | 5 comments