Intelligent Agent Foundations Forum
An Approach to Logically Updateless Decisions
discussion post by Abram Demski 2 days ago | Scott Garrabrant likes this | discuss
AI safety: three human problems and one AI issue
post by Stuart Armstrong 4 days ago | Ryan Carey and Daniel Dewey like this | 1 comment

There have been various attempts to classify the problems in AI safety research. These range from our old Oracle paper, which classified then-theoretical methods of control, to more recent classifications that grow out of modern, more concrete problems.

These all serve their purpose, but I think a more enlightening way to classify AI safety problems is to look at the issues we are trying to solve or avoid. And most of these issues are problems about humans.

continue reading »
Acausal trade: conclusion: theory vs practice
post by Stuart Armstrong 6 days ago | discuss

When I started this dive into acausal trade, I expected to find subtle and interesting theoretical considerations. Instead, most of the issues are practical.

continue reading »
Acausal trade: being unusual
discussion post by Stuart Armstrong 6 days ago | discuss
Acausal trade: different utilities, different trades
discussion post by Stuart Armstrong 7 days ago | discuss
Acausal trade: trade barriers
discussion post by Stuart Armstrong 7 days ago | discuss
Value Learning for Irrational Toy Models
discussion post by Patrick LaVictoire 7 days ago | discuss
Acausal trade: full decision algorithms
discussion post by Stuart Armstrong 8 days ago | discuss
Acausal trade: universal utility, or selling non-existence insurance too late
discussion post by Stuart Armstrong 8 days ago | discuss
Why I am not currently working on the AAMLS agenda
post by Jessica Taylor 10 days ago | Ryan Carey, Marcello Herreshoff, Sam Eisenstat, Abram Demski, Daniel Dewey, Scott Garrabrant and Stuart Armstrong like this | discuss

(note: this is not an official MIRI statement; it is a personal statement. I am not speaking for others who have been involved with the agenda.)

The AAMLS (Alignment for Advanced Machine Learning Systems) agenda is a MIRI project aimed at determining how to use hypothetical, highly advanced machine learning systems safely. I was previously working on problems in this agenda, and currently am not.

continue reading »
Cooperative Oracles: Nonexploited Bargaining
post by Scott Garrabrant 10 days ago | Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 6 comments

In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation.
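To give a rough sense of the phenomenon (a minimal sketch of the idea from Yudkowsky's post, in a toy split-the-pie game of my own; the constants and function below are illustrative assumptions, not the formalism developed in this post): if the other agent demands more than what I consider fair, I accept only with a probability low enough that overreaching never pays.

    # Toy sketch, not the Cooperative Oracle formalism: two agents split a pie of 10.
    # My notion of fairness gives the other agent 5. If they demand more, I accept
    # with a probability chosen so that their expected payoff stays below 5 -- so
    # exploiting me never pays, while I still cooperate with agents whose notion
    # of fairness differs from mine.

    FAIR_SHARE_FOR_THEM = 5.0
    EPSILON = 0.1  # how much worse we make exploitation attempts

    def acceptance_probability(their_demand: float) -> float:
        """Probability of accepting a split that gives `their_demand` to the other agent."""
        if their_demand <= FAIR_SHARE_FOR_THEM:
            return 1.0  # at or below what I consider fair: always accept
        # Accept just rarely enough that their expected gain falls below the fair point.
        return max(0.0, (FAIR_SHARE_FOR_THEM - EPSILON) / their_demand)

    for demand in [4.0, 5.0, 6.0, 9.0]:
        p = acceptance_probability(demand)
        print(f"demand={demand}: accept with p={p:.2f}, expected payoff={p * demand:.2f}")

On this toy policy, demanding 6 or 9 both yield an expected payoff of 4.9, strictly worse than simply asking for the fair 5.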

continue reading »
Cooperative Oracles: Introduction
post by Scott Garrabrant 10 days ago | Jessica Taylor and Patrick LaVictoire like this | discuss

This is the first in a series of posts introducing a new tool called a Cooperative Oracle. All of these posts are joint work with Sam Eisenstat, Tsvi Benson-Tilsen, and Nisan Stiennon.

Here is my plan for posts in this sequence. I will update this as I go.

  1. Introduction
  2. Nonexploited Bargaining
  3. Stratified and Nearly Pareto Optima
  4. Definition and Existence Proof
  5. Alternate Notions of Dependency
continue reading »
Acausal trade: double decrease
discussion post by Stuart Armstrong 12 days ago | 2 comments
Acausal trade: introduction
post by Stuart Armstrong 12 days ago | discuss

I’ve never really understood acausal trade. So in a short series of posts, I’ll attempt to analyse the concept in enough detail that I can grasp it, and hopefully so that others can grasp it as well.

continue reading »
CIRL Wireheading
post by Tom Everitt 16 days ago | Abram Demski and Stuart Armstrong like this | 1 comment

Cooperative inverse reinforcement learning (CIRL) generated a lot of attention last year, as it seemed to do a good job aligning an agent’s incentives with its human supervisor’s. Notably, it led to an elegant solution to the shutdown problem.
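As a rough numerical illustration of the kind of shutdown result being referenced (my own toy version of the off-switch idea, not the model analysed in this post; the prior and payoffs are made up): a robot that is uncertain about the human's utility can prefer deferring to a possible shutdown command, because that command is evidence about the very utility it is optimizing.

    # Toy off-switch sketch (my own illustration; not this post's model).
    # The robot does not know the utility u of acting; the human does.
    #   act        -> payoff u
    #   shut down  -> payoff 0
    #   defer      -> the human shuts the robot down exactly when u < 0
    # Deferring therefore earns max(u, 0), which weakly dominates both
    # alternatives under any prior over u.
    import random

    random.seed(0)
    samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # robot's uncertainty over u
    e_act = sum(samples) / len(samples)
    e_shutdown = 0.0
    e_defer = sum(max(u, 0.0) for u in samples) / len(samples)
    print(f"E[act]={e_act:.3f}  E[shutdown]={e_shutdown:.3f}  E[defer]={e_defer:.3f}")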

continue reading »
Infinite ethics comparisons
post by Stuart Armstrong 18 days ago | 1 comment

Work done with Amanda Askell; the errors are mine.

It’s very difficult to compare utilities across worlds with infinite populations. For instance, it seems clear that world \(w_1\) is better than \(w_2\), if the numbers indicate the utilities of the various agents:

  • \(w_1 = 1,0,1,0,1,0,1,0,1,0, \ldots\)
  • \(w_2 = 1,0,1,0,0,1,0,0,0,1, \ldots\)
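One standard way to make this intuition concrete (a density argument of my own choosing, not necessarily the comparison method developed in the post): look at the fraction of agents with utility 1 in ever longer initial segments. In \(w_1\) every second agent has utility 1, while in \(w_2\) the \(k\)-th utility-1 agent sits at position \(k(k+1)/2\), so only about \(\sqrt{2n}\) of the first \(n\) agents have utility 1:

\[ \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} w_1(i) = \frac{1}{2}, \qquad \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} w_2(i) = 0. \]

This makes \(w_1\) look better, though such density comparisons depend on how the agents are ordered, which is part of what makes the general problem hard.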
continue reading »
Intertheoretic utility comparison: outcomes, strategies and utilities
discussion post by Stuart Armstrong 20 days ago | discuss
Finding reflective oracle distributions using a Kakutani map
discussion post by Jessica Taylor 21 days ago | Vadim Kosoy likes this | discuss
A correlated analogue of reflective oracles
post by Jessica Taylor 21 days ago | Sam Eisenstat, Vadim Kosoy, Abram Demski and Scott Garrabrant like this | discuss

Summary: Reflective oracles correspond to Nash equilibria. A correlated version of reflective oracles exists and corresponds to correlated equilibria. The set of these objects is convex, which is useful.
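As a reminder of why the convexity claim is plausible (a standard fact about correlated equilibria, stated here for context rather than as part of the post's construction): a correlated equilibrium is a distribution \(\mu\) over action profiles satisfying, for every player \(i\) and every pair of actions \(a_i, a_i'\),

\[ \sum_{a_{-i}} \mu(a_i, a_{-i})\,\bigl[u_i(a_i, a_{-i}) - u_i(a_i', a_{-i})\bigr] \;\ge\; 0. \]

These constraints are linear in \(\mu\), so any mixture of two correlated equilibria is again a correlated equilibrium; the set of Nash equilibria, by contrast, need not be convex.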

continue reading »
Change utility, reduce extortion
post by Stuart Armstrong 25 days ago | 3 comments

EDIT: This method is not intended to solve extortion, just to remove the likelihood of extremely terrible outcomes (and slightly reduce the vulnerability to extortion).

continue reading »
A permutation argument for comparing utility functions
discussion post by Stuart Armstrong 26 days ago | 2 comments
Intertheoretic utility comparison: examples
discussion post by Stuart Armstrong 26 days ago | discuss
Intertheoretic utility comparison: simple theory
discussion post by Stuart Armstrong 26 days ago | Jessica Taylor likes this | 8 comments
Generalizing Foundations of Decision Theory II
post by Abram Demski 30 days ago | Sam Eisenstat, Vadim Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments

As promised in the previous post, I develop my formalism for justifying as many of the decision-theoretic axioms as possible with generalized dutch-book arguments. (I’ll use the term “generalized dutch-book” to refer to arguments with a family resemblance to dutch-book or money-pump.) The eventual goal is to relax these assumptions in a way which addresses bounded processing power, but for now the goal is to get as much of classical decision theory as possible justified by a generalized dutch-book.
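For readers who have not seen a money-pump before, here is the classic toy version (a standard textbook illustration with made-up items and fees, not the generalized argument developed in the post): an agent with cyclic preferences can be charged a small fee for each "upgrade" and led back to where it started, strictly poorer.

    # Classic money-pump sketch (standard illustration, not the post's framework).
    # The agent strictly prefers A to B, B to C, and C to A. Offering it the item
    # it prefers to its current one, for a small fee, cycles it back to its
    # starting item while draining its money.
    FEE = 0.01
    PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y) means x is strictly preferred to y

    def run_money_pump(start: str, rounds: int) -> float:
        upgrade = {"C": "B", "B": "A", "A": "C"}  # each offer is strictly preferred to the holding
        item, paid = start, 0.0
        for _ in range(rounds):
            offer = upgrade[item]
            if (offer, item) in PREFERS:  # the agent accepts any strictly preferred swap
                item, paid = offer, paid + FEE
        return paid

    print(f"After 300 trades the agent holds its original item and has paid {run_money_pump('C', 300):.2f}")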

continue reading »
Two Major Obstacles for Logical Inductor Decision Theory
post by Scott Garrabrant 35 days ago | Alex Mennen, Sam Eisenstat, Abram Demski, Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | 1 comment

In this post, I describe two major obstacles for logical inductor decision theory: untaken actions are not observable, and there is no updatelessness for computations. I will concretely describe both of these problems in a logical inductor framework, but I believe that both issues are general enough to transcend that framework.
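A crude analogy for the first obstacle (my own toy, far simpler than logical inductors, with invented payoffs): an estimator that only updates on actions it actually takes can lock in a wrong estimate for an action it never tries, and nothing in its data will ever correct it.

    # Crude toy for "untaken actions are not observable" (my analogy; logical
    # inductors are far more subtle than this frequency estimator). A greedy
    # agent that never explores keeps its initial, wrong estimate of the
    # untaken action forever, because it never generates evidence about it.
    TRUE_PAYOFF = {"left": 5.0, "right": 10.0}
    estimate = {"left": 1.0, "right": 0.0}  # pessimistic initial guess about "right"

    for _ in range(1000):
        action = max(estimate, key=estimate.get)   # always exploit, never explore
        estimate[action] = TRUE_PAYOFF[action]     # only the taken action's payoff is observed

    print(estimate)  # {'left': 5.0, 'right': 0.0} -- "right" is never corrected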

continue reading »