Intelligent Agent Foundations Forum
Optimisation in manipulating humans: engineered fanatics vs yes-men
discussion post by Stuart Armstrong 3 days ago | discuss
An Approach to Logically Updateless Decisions
discussion post by Abram Demski 8 days ago | Scott Garrabrant likes this | discuss
AI safety: three human problems and one AI issue
post by Stuart Armstrong 9 days ago | Ryan Carey and Daniel Dewey like this | 1 comment

There have been various attempts to classify the problems in AI safety research, ranging from our old Oracle paper, which classified then-theoretical methods of control, to more recent classifications that grow out of modern, more concrete problems.

These all serve their purpose, but I think a more enlightening classification of the AI safety problems is to look at what issues we are trying to solve or avoid. And most of these issues are problems about humans.

continue reading »
Acausal trade: conclusion: theory vs practice
post by Stuart Armstrong 12 days ago | discuss

When I started this dive into acausal trade, I expected to find subtle and interesting theoretical considerations. Instead, most of the issues are practical.

continue reading »
Acausal trade: being unusual
discussion post by Stuart Armstrong 12 days ago | discuss
Acausal trade: different utilities, different trades
discussion post by Stuart Armstrong 12 days ago | discuss
Acausal trade: trade barriers
discussion post by Stuart Armstrong 12 days ago | discuss
Value Learning for Irrational Toy Models
discussion post by Patrick LaVictoire 13 days ago | discuss
Acausal trade: full decision algorithms
discussion post by Stuart Armstrong 13 days ago | discuss
Acausal trade: universal utility, or selling non-existence insurance too late
discussion post by Stuart Armstrong 13 days ago | discuss
Why I am not currently working on the AAMLS agenda
post by Jessica Taylor 16 days ago | Ryan Carey, Marcello Herreshoff, Sam Eisenstat, Abram Demski, Daniel Dewey, Scott Garrabrant and Stuart Armstrong like this | 1 comment

(Note: this is not an official MIRI statement; it is a personal statement. I am not speaking for others who have been involved with the agenda.)

The AAMLS (Alignment for Advanced Machine Learning Systems) agenda is a project at MIRI that is about determining how to use hypothetical highly advanced machine learning systems safely. I was previously working on problems in this agenda and am currently not.

continue reading »
Cooperative Oracles: Nonexploited Bargaining
post by Scott Garrabrant 16 days ago | Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 6 comments

In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation.

continue reading »
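A minimal sketch of the resisting-exploitation idea from the linked Yudkowsky post (not the formalism developed in this post; the acceptance rule and all payoff numbers below are illustrative assumptions): if the other agent demands a split more favourable to them than what I consider fair, I accept only with a probability chosen so that, in expectation, they do no better than they would at my fair point.

    # Hedged sketch: accept an over-fair demand only with probability low enough
    # that the demanding agent gains nothing, in expectation, by over-demanding.
    # All payoff numbers are illustrative assumptions, not taken from the post.
    def acceptance_probability(their_fair, their_demand, their_disagreement):
        """Probability of accepting a demand above my notion of the fair point."""
        if their_demand <= their_fair:
            return 1.0  # the demand is fair (or generous to me): always accept
        # Choose p so that p*demand + (1-p)*disagreement <= fair,
        # i.e. insisting on more than the fair point cannot pay.
        return (their_fair - their_disagreement) / (their_demand - their_disagreement)

    # Example: disagreement gives them 0, my fair point gives them 5, they demand 8.
    p = acceptance_probability(5.0, 8.0, 0.0)
    print(p, p * 8.0)  # 0.625 and 5.0: over-demanding yields no expected gain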
Cooperative Oracles: Introduction
post by Scott Garrabrant 16 days ago | Jessica Taylor and Patrick LaVictoire like this | discuss

This is the first in a series of posts introducing a new tool called a Cooperative Oracle. All of these posts are joint work with Sam Eisenstat, Tsvi Benson-Tilsen, and Nisan Stiennon.

Here is my plan for posts in this sequence. I will update this as I go.

  1. Introduction
  2. Nonexploited Bargaining
  3. Stratified and Nearly Pareto Optima
  4. Definition and Existence Proof
  5. Alternate Notions of Dependency
continue reading »
Acausal trade: double decrease
discussion post by Stuart Armstrong 17 days ago | 2 comments
Acausal trade: introduction
post by Stuart Armstrong 17 days ago | discuss

I’ve never really understood acausal trade. So in a short series of posts, I’ll attempt to analyse the concept sufficiently that I can grasp it - and hopefully so others can grasp it as well.

continue reading »
CIRL Wireheading
post by Tom Everitt 21 days ago | Abram Demski and Stuart Armstrong like this | 1 comment

Cooperative inverse reinforcement learning (CIRL) generated a lot of attention last year, as it seemed to do a good job aligning an agent’s incentives with its human supervisor’s. Notably, it led to an elegant solution to the shutdown problem.

continue reading »
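For context, the standard CIRL formalisation (sketched here from memory of the Hadfield-Menell et al. paper, so details may differ) is a two-player Markov game \(\langle S, \{A^H, A^R\}, T, \Theta, R, P_0, \gamma \rangle\): human and robot both act to maximise the same discounted reward \(R(s, a^H, a^R; \theta)\), but the reward parameter \(\theta \in \Theta\) is observed only by the human, so the robot must infer it from the human's behaviour. The shared reward is what produces the incentive alignment mentioned above.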
Infinite ethics comparisons
post by Stuart Armstrong 24 days ago | 1 comment

Work done with Amanda Askell; the errors are mine.

It’s very difficult to compare utilities across worlds with infinite populations. For instance, it seems clear that world \(w_1\) is better than \(w_2\), if the numbers indicate the utilities of various agents:

  • \(w_1 = 1,0,1,0,1,0,1,0,1,0, \ldots\)
  • \(w_2 = 1,0,1,0,0,1,0,0,0,1, \ldots\)
continue reading »
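One way to make that intuition concrete (my own illustration, not the post's comparison criterion) is to look at running averages of utility over the first \(n\) agents: in \(w_1\) the average tends to \(1/2\), while in \(w_2\) each 1 is followed by an ever longer run of 0s, so the average tends to 0.

    # Sketch (an illustrative assumption, not the post's method): partial averages.
    def w1(i):
        return 1 if i % 2 == 0 else 0          # 1,0,1,0,1,0,...

    def w2_prefix(n):
        seq, k = [], 0
        while len(seq) < n:                    # 1,0,1,0,0,1,0,0,0,1,...
            seq.append(1)
            seq.extend([0] * (k + 1))          # the k-th 1 is followed by k+1 zeros
            k += 1
        return seq[:n]

    for n in (10, 1000, 100000):
        print(n,
              sum(w1(i) for i in range(n)) / n,  # tends to 0.5
              sum(w2_prefix(n)) / n)             # tends to 0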
Intertheoretic utility comparison: outcomes, strategies and utilities
discussion post by Stuart Armstrong 25 days ago | discuss
Finding reflective oracle distributions using a Kakutani map
discussion post by Jessica Taylor 27 days ago | Vadim Kosoy likes this | discuss
A correlated analogue of reflective oracles
post by Jessica Taylor 27 days ago | Sam Eisenstat, Vadim Kosoy, Abram Demski and Scott Garrabrant like this | discuss

Summary: Reflective oracles correspond to Nash equilibria. A correlated version of reflective oracles exists and corresponds to correlated equilibria. The set of these objects is convex, which is useful.

continue reading »
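As a reminder of the objects being referenced (a standard textbook example, not anything from the post): a correlated equilibrium is a joint distribution over action profiles such that no player gains in expectation by deviating from their recommended action. The game of chicken and the candidate distribution below are illustrative assumptions.

    # Hedged sketch: check the correlated-equilibrium constraints for a 2x2 game
    # ("chicken"); the game and distribution are illustrative, not from the post.
    actions = ["D", "C"]                                    # Dare, Chicken
    payoff = {("D", "D"): (0, 0), ("D", "C"): (7, 2),
              ("C", "D"): (2, 7), ("C", "C"): (6, 6)}
    mu = {("D", "D"): 0.0, ("D", "C"): 1 / 3,
          ("C", "D"): 1 / 3, ("C", "C"): 1 / 3}             # candidate equilibrium

    def is_correlated_equilibrium(mu, payoff, actions, tol=1e-9):
        for player in (0, 1):
            for rec in actions:                             # recommended action
                for dev in actions:                         # possible deviation
                    gain = 0.0
                    for other in actions:
                        prof = (rec, other) if player == 0 else (other, rec)
                        devd = (dev, other) if player == 0 else (other, dev)
                        gain += mu[prof] * (payoff[devd][player] - payoff[prof][player])
                    if gain > tol:                          # deviating would pay
                        return False
        return True

    print(is_correlated_equilibrium(mu, payoff, actions))   # True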
Change utility, reduce extortion
post by Stuart Armstrong 30 days ago | 3 comments

EDIT: This method is not intended to solve extortion, just to remove the likelihood of extremely terrible outcomes (and slightly reduce the vulnerability to extortion).

continue reading »
A permutation argument for comparing utility functions
discussion post by Stuart Armstrong 31 days ago | 2 comments
Intertheoretic utility comparison: examples
discussion post by Stuart Armstrong 31 days ago | discuss
Intertheoretic utility comparison: simple theory
discussion post by Stuart Armstrong 31 days ago | Jessica Taylor likes this | 8 comments
Generalizing Foundations of Decision Theory II
post by Abram Demski 35 days ago | Sam Eisenstat, Vadim Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments

As promised in the previous post, I develop my formalism for justifying as many of the decision-theoretic axioms as possible with generalized dutch-book arguments. (I’ll use the term “generalized dutch-book” to refer to arguments with a family resemblance to dutch-book or money-pump.) The eventual goal is to relax these assumptions in a way which addresses bounded processing power, but for now the goal is to get as much of classical decision theory as possible justified by a generalized dutch-book.

continue reading »
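For readers who want a concrete picture of the simplest such argument (my own minimal illustration; the generalized dutch-books in the post are much broader): an agent with cyclic preferences \(A \succ B \succ C \succ A\) will pay a small fee for each preferred swap, and can therefore be walked around the cycle, ending with the original option and strictly less money.

    # Minimal money-pump illustration (an assumption-laden sketch, not the post's
    # formalism): cyclic preferences let a bookie charge a fee per "upgrade".
    prefers = {("A", "B"), ("B", "C"), ("C", "A")}   # (x, y): x strictly preferred to y
    fee = 1                                          # small fee charged per accepted swap

    def run_pump(start, offers):
        holding, money = start, 0
        for offered in offers:
            if (offered, holding) in prefers:        # agent strictly prefers the offer
                holding, money = offered, money - fee
        return holding, money

    # Offer C, then B, then A: each swap is accepted, three fees are paid,
    # and the agent ends up holding A again, strictly poorer with nothing gained.
    print(run_pump("A", ["C", "B", "A"]))            # ('A', -3)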

RECENT COMMENTS

The "benign induction
by David Krueger on Why I am not currently working on the AAMLS agenda | 0 likes

This comment is to explain
by Alex Mennen on Formal Open Problem in Decision Theory | 0 likes

Thanks for writing this -- I
by Daniel Dewey on AI safety: three human problems and one AI issue | 1 like

I think it does do the double
by Stuart Armstrong on Acausal trade: double decrease | 0 likes

>but the agent incorrectly
by Stuart Armstrong on CIRL Wireheading | 0 likes

I think the double decrease
by Owen Cotton-Barratt on Acausal trade: double decrease | 0 likes

The problem is that our
by Scott Garrabrant on Cooperative Oracles: Nonexploited Bargaining | 1 like

Yeah. The original generator
by Scott Garrabrant on Cooperative Oracles: Nonexploited Bargaining | 0 likes

I don't see how it would. The
by Scott Garrabrant on Cooperative Oracles: Nonexploited Bargaining | 1 like

Does this generalise to
by Stuart Armstrong on Cooperative Oracles: Nonexploited Bargaining | 0 likes

>Every point in this set is a
by Stuart Armstrong on Cooperative Oracles: Nonexploited Bargaining | 0 likes

This seems a proper version
by Stuart Armstrong on Cooperative Oracles: Nonexploited Bargaining | 0 likes

This doesn't seem to me to
by Stuart Armstrong on Change utility, reduce extortion | 0 likes

[_Regret Theory with General
by Abram Demski on Generalizing Foundations of Decision Theory II | 0 likes

It's not clear whether we
by Paul Christiano on Infinite ethics comparisons | 1 like
