Optimisation in manipulating humans: engineered fanatics vs yes-men discussion post by Stuart Armstrong 3 days ago | discuss
 An Approach to Logically Updateless Decisions discussion post by Abram Demski 8 days ago | Scott Garrabrant likes this | discuss
AI safety: three human problems and one AI issue
post by Stuart Armstrong 9 days ago | Ryan Carey and Daniel Dewey like this | 1 comment

There have been various attempts to classify the problems in AI safety research. These range from our old Oracle paper, which classified then-theoretical methods of control, to more recent classifications that grow out of modern, more concrete problems.

These all serve their purpose, but I think a more enlightening classification of the AI safety problems comes from looking at the issues we are trying to solve or avoid. And most of these issues are problems about humans.

Acausal trade: conclusion: theory vs practice
post by Stuart Armstrong 12 days ago | discuss

When I started this dive into acausal trade, I expected to find subtle and interesting theoretical considerations. Instead, most of the issues are practical.

 Acausal trade: being unusual discussion post by Stuart Armstrong 12 days ago | discuss
 Acausal trade: different utilities, different trades discussion post by Stuart Armstrong 12 days ago | discuss
 Acausal trade: trade barriers discussion post by Stuart Armstrong 12 days ago | discuss
 Value Learning for Irrational Toy Models discussion post by Patrick LaVictoire 13 days ago | discuss
 Acausal trade: full decision algorithms discussion post by Stuart Armstrong 13 days ago | discuss
 Acausal trade: universal utility, or selling non-existence insurance too late discussion post by Stuart Armstrong 13 days ago | discuss
Why I am not currently working on the AAMLS agenda
post by Jessica Taylor 16 days ago | Ryan Carey, Marcello Herreshoff, Sam Eisenstat, Abram Demski, Daniel Dewey, Scott Garrabrant and Stuart Armstrong like this | 1 comment

(Note: this is a personal statement, not an official MIRI statement; I am not speaking for others who have been involved with the agenda.)

The AAMLS (Alignment for Advanced Machine Learning Systems) agenda is a MIRI project aimed at determining how to safely use hypothetical, highly advanced machine learning systems. I was previously working on problems in this agenda and am currently not.

Cooperative Oracles: Nonexploited Bargaining
post by Scott Garrabrant 16 days ago | Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 6 comments

In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation.
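The core idea can be illustrated with a toy model (a sketch of the intuition from Yudkowsky's post, not the formalization developed here; the split-the-dollar framing and the `epsilon` margin are illustrative assumptions): concede to an over-demanding opponent just rarely enough that over-demanding yields a lower expected payoff than demanding the fair share.

```python
# Toy sketch (not this post's formalism): in a split-a-dollar game, an agent
# with fairness point `fair` accepts an excessive demand only with a
# probability chosen so the demander's expected take stays slightly below
# `fair`. Over-demanding is thus unprofitable, while the agent still
# cooperates often enough not to be a pure punisher.

def acceptance_probability(demand, fair, epsilon=0.05):
    """Probability of conceding to an opponent who demands `demand`
    of the surplus, when we consider `fair` their fair share."""
    if demand <= fair:
        return 1.0  # fair (or generous) demands are always accepted
    # Accept just rarely enough that demand * p < fair.
    return (fair / demand) * (1 - epsilon)

def expected_take(demand, fair):
    """Demander's expected share against this acceptance policy."""
    return demand * acceptance_probability(demand, fair)

# Over-demanding never beats simply demanding the fair share:
assert expected_take(0.9, fair=0.5) < expected_take(0.5, fair=0.5)
```

The `epsilon` slack is what makes exploitation strictly unprofitable rather than merely break-even.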

Cooperative Oracles: Introduction
post by Scott Garrabrant 16 days ago | Jessica Taylor and Patrick LaVictoire like this | discuss

This is the first in a series of posts introducing a new tool called a Cooperative Oracle. All of these posts are joint work with Sam Eisenstat, Tsvi Benson-Tilsen, and Nisan Stiennon.

Here is my plan for posts in this sequence. I will update this as I go.

1. Introduction
2. Nonexploited Bargaining
3. Stratified and Nearly Pareto Optima
4. Definition and Existence Proof
5. Alternate Notions of Dependency
 Acausal trade: double decrease discussion post by Stuart Armstrong 17 days ago | 2 comments
post by Stuart Armstrong 17 days ago | discuss

I’ve never really understood acausal trade. So in a short series of posts, I’ll attempt to analyse the concept sufficiently that I can grasp it - and hopefully so others can grasp it as well.

CIRL Wireheading
post by Tom Everitt 21 days ago | Abram Demski and Stuart Armstrong like this | 1 comment

Cooperative inverse reinforcement learning (CIRL) generated a lot of attention last year, as it seemed to do a good job aligning an agent’s incentives with its human supervisor’s. Notably, it led to an elegant solution to the shutdown problem.
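The shutdown result can be illustrated numerically (a toy sketch in the spirit of the off-switch analysis, not this post's model; the Gaussian belief over utilities is an assumption made for the example):

```python
import random

# Toy off-switch game: the robot is uncertain about the utility u of acting.
# A rational human overseer lets the action through iff u > 0. Deferring to
# the human therefore earns E[max(u, 0)], which is at least as good as both
# acting unilaterally (E[u]) and shutting down (0) -- so the robot has an
# incentive to keep the off switch usable.

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # robot's belief over u

act      = sum(samples) / len(samples)                       # act unilaterally: E[u]
shutdown = 0.0                                               # switch off: 0
defer    = sum(max(u, 0) for u in samples) / len(samples)    # human filters out u < 0

assert defer >= max(act, shutdown)
```

The gap between `defer` and the alternatives shrinks as the robot's uncertainty about `u` shrinks, matching the intuition that deference is driven by uncertainty about the supervisor's values.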

Infinite ethics comparisons
post by Stuart Armstrong 24 days ago | 1 comment

Work done with Amanda Askell; the errors are mine.

It’s very difficult to compare utilities across worlds with infinite populations. For instance, it seems clear that world $$w_1$$ is better than $$w_2$$, if the numbers indicate the utilities of the various agents:

• $$w_1 = 1,0,1,0,1,0,1,0,1,0, \ldots$$
• $$w_2 = 1,0,1,0,0,1,0,0,0,1, \ldots$$
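One way to make this intuition concrete (an illustration of the density intuition, not the post's actual proposal) is to compare average utility over ever-larger finite prefixes: in $$w_1$$ the average tends to 1/2, while in $$w_2$$ the 1s thin out and the average tends to 0.

```python
from itertools import count, islice

def w1():
    """Utilities 1, 0, 1, 0, 1, 0, ..."""
    while True:
        yield 1
        yield 0

def w2():
    """Utilities 1 followed by k zeros, for k = 1, 2, 3, ...
    i.e. 1,0,1,0,0,1,0,0,0,1,..."""
    for k in count(1):
        yield 1
        for _ in range(k):
            yield 0

def running_average(world, n):
    """Average utility of the first n agents."""
    return sum(islice(world, n)) / n

# w1's prefix average stays at 1/2; w2's goes to 0 -- one (density-based)
# sense in which w1 > w2, even though both contain infinitely many 1s.
print(running_average(w1(), 10_000))  # 0.5
print(running_average(w2(), 10_000))  # 0.014
```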
 Intertheoretic utility comparison: outcomes, strategies and utilities discussion post by Stuart Armstrong 25 days ago | discuss
 Finding reflective oracle distributions using a Kakutani map discussion post by Jessica Taylor 27 days ago | Vadim Kosoy likes this | discuss
A correlated analogue of reflective oracles
post by Jessica Taylor 27 days ago | Sam Eisenstat, Vadim Kosoy, Abram Demski and Scott Garrabrant like this | discuss

Summary: Reflective oracles correspond to Nash equilibria. A correlated version of reflective oracles exists and corresponds to correlated equilibria. The set of these objects is convex, which is useful.
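The convexity claim is easy to check in the ordinary game-theoretic setting (a sketch of the analogy only, not the reflective-oracle construction; the game and distributions are illustrative): correlated equilibria are defined by incentive constraints that are linear in the distribution, so any mixture of two correlated equilibria is again one.

```python
import itertools

# Chicken: actions 0 = Dare, 1 = Swerve. Payoffs are (row, col).
U = {
    (0, 0): (0, 0), (0, 1): (7, 2),
    (1, 0): (2, 7), (1, 1): (6, 6),
}

def is_correlated_equilibrium(p, tol=1e-9):
    """p maps action profiles to probabilities. Check that no player gains
    in expectation by deviating from any recommended action."""
    for player in (0, 1):
        for rec, dev in itertools.permutations((0, 1), 2):
            regret = 0.0
            for profile, prob in p.items():
                if profile[player] != rec:
                    continue
                deviated = list(profile)
                deviated[player] = dev
                regret += prob * (U[tuple(deviated)][player] - U[profile][player])
            if regret > tol:
                return False
    return True

ce1 = {(0, 1): 1/3, (1, 0): 1/3, (1, 1): 1/3}  # classic correlated eq. of Chicken
ce2 = {(0, 1): 1.0}                            # pure Nash eq. (Dare, Swerve)
mix = {k: 0.5 * (ce1.get(k, 0) + ce2.get(k, 0)) for k in U}

assert is_correlated_equilibrium(ce1)
assert is_correlated_equilibrium(ce2)
assert is_correlated_equilibrium(mix)  # convexity: a mixture of CEs is a CE
```

The same linearity is what makes the convexity of the correlated reflective-oracle set plausible and useful.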

Change utility, reduce extortion
post by Stuart Armstrong 30 days ago | 3 comments

EDIT: This method is not intended to solve extortion, just to remove the likelihood of extremely terrible outcomes (and slightly reduce the vulnerability to extortion).

 A permutation argument for comparing utility functions discussion post by Stuart Armstrong 31 days ago | 2 comments
 Intertheoretic utility comparison: examples discussion post by Stuart Armstrong 31 days ago | discuss
 Intertheoretic utility comparison: simple theory discussion post by Stuart Armstrong 31 days ago | Jessica Taylor likes this | 8 comments
Generalizing Foundations of Decision Theory II
post by Abram Demski 35 days ago | Sam Eisenstat, Vadim Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments

As promised in the previous post, I develop my formalism for justifying as many of the decision-theoretic axioms as possible with generalized dutch-book arguments. (I’ll use the term “generalized dutch-book” to refer to arguments with a family resemblance to dutch-book or money-pump.) The eventual goal is to relax these assumptions in a way which addresses bounded processing power, but for now the goal is to get as much of classical decision theory as possible justified by a generalized dutch-book.
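For readers new to the term, a money pump is easy to demonstrate (a generic textbook illustration, not Demski's formalism; the goods and fee are made up): an agent with cyclic preferences pays a small fee for each "upgrade" and ends up holding its original good, strictly poorer.

```python
# Money-pump sketch: an agent with cyclic (intransitive) preferences
# A > B > C > A will pay a fee for each preferred swap, so a bookie
# cycling through trades drains its money without ever leaving it
# with a better good.

PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}  # left is strictly preferred

def money_pump(start_good="C", fee=1.0, rounds=3):
    good, money = start_good, 0.0
    offers = ["B", "A", "C"] * rounds  # each offer beats the current holding
    for offer in offers:
        if (offer, good) in PREFERS:   # agent happily pays to "upgrade"
            good, money = offer, money - fee
    return good, money

good, money = money_pump()
print(good, money)  # C -9.0 : back where it started, nine fees poorer
```

Dutch-book arguments run the same way against incoherent beliefs instead of intransitive preferences, which is why the post treats the two as one generalized family.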

