Loebian cooperation in the tiling agents problem
post by Vladimir Slepnev 3 days ago | Alex Mennen and Abram Demski like this | 3 comments

The tiling agents problem is about formalizing how AIs can create successor AIs that are at least as smart. Here’s a toy model I came up with, which is similar to Benya’s old model but simpler. A computer program X is asked one of two questions:

• Would you like some chocolate?

• Here’s the source code of another program Y. Do you accept it as your successor?
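The two-question setup can be sketched in code. Everything below (the agent `x`, the `successor_question` wrapper, and especially the use of direct simulation where the real construction would use Löbian proof search) is my own hypothetical illustration, not the post's actual model:

```python
# Hypothetical sketch of the toy model (names and mechanics are my own
# illustration; the real construction uses provability, not execution).

CHOCOLATE = "Would you like some chocolate?"

def successor_question(y_source):
    """Package the second question: 'here is the source code of Y'."""
    return ("successor", y_source)

def x(question):
    """A simple agent: always takes chocolate, and accepts a successor
    if direct simulation (standing in for Löbian proof search) shows
    the successor would also take chocolate."""
    if question == CHOCOLATE:
        return True
    _, y_source = question
    y = eval(y_source)            # reconstruct the successor from source
    return y(CHOCOLATE) is True   # bounded check in place of a proof

y_source = "lambda q: q == CHOCOLATE"   # a chocolate-accepting successor
z_source = "lambda q: False"            # a successor that refuses everything

print(x(CHOCOLATE))                      # True
print(x(successor_question(y_source)))   # True: Y accepted
print(x(successor_question(z_source)))   # False: Z rejected
```

Simulating the successor sidesteps the interesting part of the problem (an agent reasoning about a successor at least as smart as itself cannot just run it), which is exactly where the Löbian machinery comes in.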

Humans are not agents: short vs long term
post by Stuart Armstrong 18 days ago | 2 comments

A putative new idea for AI control; index here.

This is an example of humans not being (idealised) agents.

Imagine a human who has a preference to not live beyond a hundred years. However, they want to live to next year, and it’s predictable that every year they are alive, they will have the same desire to survive till the next year.
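The tension is easy to see in a toy simulation (a hedged sketch of my own; only the 100-year limit and the predictable yearly preference come from the post):

```python
# Toy simulation: each year the human myopically prefers one more year,
# yet also prefers not to live past 100. Following the yearly preference
# indefinitely violates the long-term one.

LIMIT = 100   # the human prefers not to live beyond this age

age = 0
for year in range(150):        # simulate 150 yearly decisions
    wants_another_year = True  # predictably true every single year
    if wants_another_year:
        age += 1

print(age)           # 150
print(age > LIMIT)   # True: the yearly choices overran the stated limit
```

No single yearly choice is irrational, but the sequence of choices is not the behaviour of one agent with a consistent utility function.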

 New circumstances, new values? discussion post by Stuart Armstrong 21 days ago | discuss
Cooperative Oracles: Stratified Pareto Optima and Almost Stratified Pareto Optima
post by Scott Garrabrant 25 days ago | Patrick LaVictoire and Stuart Armstrong like this | 5 comments

In this post, we generalize the notions in Cooperative Oracles: Nonexploited Bargaining to deal with the possibility of introducing extra agents that have no control but have preferences. We further generalize this to infinitely many agents. (Part of the series started here.)

 Futarchy, Xrisks, and near misses discussion post by Stuart Armstrong 25 days ago | Abram Demski likes this | discuss
Futarchy Fix
post by Abram Demski 29 days ago | Scott Garrabrant and Stuart Armstrong like this | 9 comments

Robin Hanson’s Futarchy is a proposal to let prediction markets make governmental decisions. We can view an operating Futarchy as an agent, and ask if it is aligned with the interests of its constituents. I am aware of two main failures of alignment: (1) since predicting rare events is rewarded in proportion to their rareness, prediction markets heavily incentivise causing rare events to happen (I’ll call this the entropy-market problem); (2) it seems prediction markets would not be able to assign probability to existential risk, since you can’t collect on bets after everyone’s dead (I’ll call this the existential risk problem). I provide three formulations of (1) and solve two of them, and make some comments on (2). (Thanks to Scott for pointing out the second of these problems to me; I don’t remember who originally told me about the first problem, but also thanks.)
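A back-of-the-envelope illustration of the entropy-market incentive (the function and numbers are my own hedged example, not from the post): a contract paying $1 if an event occurs costs its market probability p, so an agent who can cause the event multiplies their stake by 1/p, and the profit grows as the event gets rarer.

```python
# Why rare events invite manipulation: a $1-payout contract on an event
# costs its market probability p, so causing the event turns a stake s
# into s/p, for a profit of s*(1-p)/p.

def manipulation_return(p, stake=1.0):
    contracts = stake / p     # number of $1-payout contracts bought
    payout = contracts * 1.0  # the manipulator makes the event happen
    return payout - stake     # profit

for p in [0.5, 0.1, 0.01]:
    print(p, manipulation_return(p))
# profit per $1 staked: 1.0 at p=0.5, 9.0 at p=0.1, 99.0 at p=0.01
```

The payoff per dollar staked scales like 1/p, which is why the problem bites hardest for exactly the rare catastrophic events a Futarchy most needs to price correctly.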

Divergent preferences and meta-preferences
post by Stuart Armstrong 29 days ago | discuss

A putative new idea for AI control; index here.

In simple graphical form, here is the problem of divergent human preferences:

 Optimisation in manipulating humans: engineered fanatics vs yes-men discussion post by Stuart Armstrong 33 days ago | discuss
 An Approach to Logically Updateless Decisions discussion post by Abram Demski 38 days ago | Sam Eisenstat, Jack Gallagher and Scott Garrabrant like this | 4 comments
AI safety: three human problems and one AI issue
post by Stuart Armstrong 39 days ago | Ryan Carey and Daniel Dewey like this | 1 comment

A putative new idea for AI control; index here.

There have been various attempts to classify the problems in AI safety research, from our old Oracle paper, which classified then-theoretical methods of control, to more recent classifications that grow out of modern, more concrete problems.

These all serve their purpose, but I think a more enlightening classification of AI safety problems is to look at which issues we are trying to solve or avoid. And most of these issues are problems about humans.

Acausal trade: conclusion: theory vs practice
post by Stuart Armstrong 42 days ago | discuss

A putative new idea for AI control; index here.

When I started this dive into acausal trade, I expected to find subtle and interesting theoretical considerations. Instead, most of the issues are practical.

 Acausal trade: being unusual discussion post by Stuart Armstrong 42 days ago | discuss
 Acausal trade: different utilities, different trades discussion post by Stuart Armstrong 42 days ago | discuss
 Acausal trade: trade barriers discussion post by Stuart Armstrong 43 days ago | discuss
 Value Learning for Irrational Toy Models discussion post by Patrick LaVictoire 43 days ago | discuss
 Acausal trade: full decision algorithms discussion post by Stuart Armstrong 43 days ago | discuss
 Acausal trade: universal utility, or selling non-existence insurance too late discussion post by Stuart Armstrong 43 days ago | discuss
Why I am not currently working on the AAMLS agenda
post by Jessica Taylor 46 days ago | Ryan Carey, Marcello Herreshoff, Sam Eisenstat, Abram Demski, Daniel Dewey, Scott Garrabrant and Stuart Armstrong like this | 2 comments

(Note: this is not an official MIRI statement; it is a personal statement. I am not speaking for others who have been involved with the agenda.)

The AAMLS (Alignment for Advanced Machine Learning Systems) agenda is a project at MIRI that is about determining how to use hypothetical highly advanced machine learning systems safely. I was previously working on problems in this agenda and am currently not.

Cooperative Oracles: Nonexploited Bargaining
post by Scott Garrabrant 46 days ago | Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 6 comments

In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation. (Part of the series started here.)

Cooperative Oracles: Introduction
post by Scott Garrabrant 46 days ago | Jessica Taylor and Patrick LaVictoire like this | discuss

This is the first in a series of posts introducing a new tool called a Cooperative Oracle. All of these posts are joint work with Sam Eisenstat, Tsvi Benson-Tilsen, and Nisan Stiennon.

Here is my plan for posts in this sequence. I will update this as I go.

1. Introduction
2. Nonexploited Bargaining
3. Stratified Pareto Optima and Almost Stratified Pareto Optima
4. Definition and Existence Proof
5. Alternate Notions of Dependency
 Acausal trade: double decrease discussion post by Stuart Armstrong 47 days ago | 2 comments
post by Stuart Armstrong 47 days ago | discuss

A putative new idea for AI control; index here.

I’ve never really understood acausal trade. So in a short series of posts, I’ll attempt to analyse the concept sufficiently that I can grasp it, and hopefully so that others can grasp it as well.

post by Tom Everitt 52 days ago | Abram Demski and Stuart Armstrong like this | 1 comment

Cooperative inverse reinforcement learning (CIRL) generated a lot of attention last year, as it seemed to do a good job aligning an agent’s incentives with its human supervisor’s. Notably, it led to an elegant solution to the shutdown problem.

Infinite ethics comparisons
post by Stuart Armstrong 54 days ago | 1 comment

Work done with Amanda Askell; the errors are mine.

It’s very difficult to compare utilities across worlds with infinite populations. For instance, it seems clear that world $$w_1$$ is better than $$w_2$$, if the numbers indicate the utilities of various agents:

• $$w_1 = 1,0,1,0,1,0,1,0,1,0, \ldots$$
• $$w_2 = 1,0,1,0,0,1,0,0,0,1, \ldots$$
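One way to make the intuition concrete is to compare running average utilities. This is my own hedged sketch, reading $$w_2$$'s pattern as each 1 being followed by one more 0 than the last:

```python
# Intuition behind w1 > w2: in w1 the density of utility-1 agents is 1/2;
# in w2 the gaps between 1s grow, so the density tends to 0.

def w1(n):
    """Utilities of the first n agents of w1: 1,0,1,0,..."""
    return [1 if i % 2 == 0 else 0 for i in range(n)]

def w2(n):
    """Utilities of the first n agents of w2: 1, one 0, 1, two 0s,
    1, three 0s, ... (assumed reading of the pattern in the post)."""
    out, gap = [], 1
    while len(out) < n:
        out.append(1)
        out.extend([0] * gap)
        gap += 1
    return out[:n]

n = 10_000
print(sum(w1(n)) / n)   # 0.5
print(sum(w2(n)) / n)   # 0.014, heading toward 0
```

Both worlds contain infinitely many agents at utility 1 and infinitely many at utility 0, so naive summation cannot distinguish them; the limiting density is one of the comparison tools the post examines.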
 Intertheoretic utility comparison: outcomes, strategies and utilities discussion post by Stuart Armstrong 55 days ago | discuss