Intelligent Agent Foundations Forum
1.No Constant Distribution Can be a Logical Inductor
discussion post by Alex Appel 74 days ago | Sam Eisenstat, Abram Demski, Jessica Taylor and Stuart Armstrong like this | 1 comment
2.Being legible to other agents by committing to using weaker reasoning systems
post by Alex Mennen 200 days ago | Stuart Armstrong and Vladimir Slepnev like this | 1 comment

Suppose that an agent \(A_{1}\) reasons in a sound theory \(T_{1}\), and an agent \(A_{2}\) reasons in a theory \(T_{2}\), such that \(T_{1}\) proves that \(T_{2}\) is sound. Now suppose \(A_{1}\) is trying to reason in a way that is legible to \(A_{2}\), in the sense that \(A_{2}\) can rely on \(A_{1}\) to reach correct conclusions. One way of doing this is for \(A_{1}\) to restrict itself to some weaker theory \(T_{3}\), which \(T_{2}\) proves is sound, for the purposes of any reasoning that it wants to be legible to \(A_{2}\). Of course, in order for this to work, not only would \(A_{1}\) have to restrict itself to using \(T_{3}\), but \(A_{2}\) would also have to trust that \(A_{1}\) had done so. A plausible way for that to happen is for \(A_{1}\) to reach the decision quickly enough that \(A_{2}\) can simulate \(A_{1}\) making the decision to restrict itself to using \(T_{3}\).

3.The Happy Dance Problem
post by Abram Demski 216 days ago | Scott Garrabrant and Stuart Armstrong like this | 1 comment

Since the invention of logical induction, people have been trying to figure out what logically updateless reasoning could be. This is motivated by the idea that, in the realm of Bayesian uncertainty (i.e., empirical uncertainty), updateless decision theory is the simple solution to the problem of reflective consistency. Naturally, we’d like to import this success to logically uncertain decision theory.

At a research retreat during the summer, we realized that updateless decision theory wasn’t so easy to define even in the seemingly simple Bayesian case. A possible solution was written up in Conditioning on Conditionals. However, that didn’t end up being especially satisfying.

Here, I introduce the happy dance problem, which more clearly illustrates the difficulty in defining updateless reasoning in the Bayesian case. I also outline Scott’s current thoughts about the correct way of reasoning about this problem.

4.Hyperreal Brouwer
post by Scott Garrabrant 258 days ago | Vadim Kosoy and Stuart Armstrong like this | 2 comments

This post explains how to view Kakutani’s fixed point theorem as a special case of Brouwer’s fixed point theorem with hyperreal numbers. This post is just math intuitions, but I found them useful in thinking about Kakutani’s fixed point theorem and many things in agent foundations. This came out of conversations with Sam Eisenstat.

5.Current thoughts on Paul Christiano's research agenda
post by Jessica Taylor 339 days ago | Ryan Carey, Owen Cotton-Barratt, Sam Eisenstat, Paul Christiano, Stuart Armstrong and Wei Dai like this | 15 comments

This post summarizes my thoughts on Paul Christiano’s agenda in general and ALBA in particular.

6.Loebian cooperation in the tiling agents problem
post by Vladimir Slepnev 361 days ago | Alex Mennen, Vadim Kosoy, Abram Demski, Patrick LaVictoire and Stuart Armstrong like this | 4 comments

The tiling agents problem is about formalizing how AIs can create successor AIs that are at least as smart. Here’s a toy model I came up with, which is similar to Benya’s old model but simpler. A computer program X is asked one of two questions:

  • Would you like some chocolate?

  • Here’s the source code of another program Y. Do you accept it as your successor?

7.Cooperative Oracles: Stratified Pareto Optima and Almost Stratified Pareto Optima
post by Scott Garrabrant 383 days ago | Vadim Kosoy, Patrick LaVictoire and Stuart Armstrong like this | 8 comments

In this post, we generalize the notions in Cooperative Oracles: Nonexploited Bargaining to deal with the possibility of introducing extra agents that have no control but have preferences. We further generalize this to infinitely many agents. (Part of the series started here.)

8.Futarchy Fix
post by Abram Demski 387 days ago | Scott Garrabrant and Stuart Armstrong like this | 9 comments

Robin Hanson’s Futarchy is a proposal to let prediction markets make governmental decisions. We can view an operating Futarchy as an agent, and ask if it is aligned with the interests of its constituents. I am aware of two main failures of alignment: (1) since predicting rare events is rewarded in proportion to their rareness, prediction markets heavily incentivise causing rare events to happen (I’ll call this the entropy-market problem); (2) it seems prediction markets would not be able to assign probability to existential risk, since you can’t collect on bets after everyone’s dead (I’ll call this the existential risk problem). I provide three formulations of (1) and solve two of them, and make some comments on (2). (Thanks to Scott for pointing out the second of these problems to me; I don’t remember who originally told me about the first problem, but also thanks.)
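
As a rough illustration of the entropy-market problem, here is a minimal sketch under a simplifying assumption that is not taken from the post: a binary contract that pays 1 unit if the event occurs and is bought at a market price equal to the event's perceived probability. The function name `profit_per_unit_staked` is hypothetical. The rarer the event looks to the market, the more a trader who can cause it stands to gain.

```python
def profit_per_unit_staked(price: float) -> float:
    """Profit per unit of currency staked on a binary contract.

    Assumption (for illustration only): the contract pays 1 if the event
    occurs and costs `price`, where `price` is the market probability.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    # One unit of currency buys 1/price contracts, each paying 1 on success.
    return (1.0 - price) / price


if __name__ == "__main__":
    # Lower market probability means a larger payoff for making the event happen.
    for price in (0.5, 0.1, 0.01):
        print(price, profit_per_unit_staked(price))
```

Under this toy model the incentive to cause an event grows without bound as its market probability approaches zero, which is the sense in which rare events are the most profitable to bring about.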

9.Why I am not currently working on the AAMLS agenda
post by Jessica Taylor 404 days ago | Ryan Carey, Marcello Herreshoff, Sam Eisenstat, Abram Demski, Daniel Dewey, Scott Garrabrant and Stuart Armstrong like this | 2 comments

(Note: this is not an official MIRI statement; it is a personal statement. I am not speaking for others who have been involved with the agenda.)

The AAMLS (Alignment for Advanced Machine Learning Systems) agenda is a project at MIRI that is about determining how to use hypothetical highly advanced machine learning systems safely. I was previously working on problems in this agenda and am currently not.

10.Cooperative Oracles: Nonexploited Bargaining
post by Scott Garrabrant 404 days ago | Vadim Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 6 comments

In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation. (Part of the series started here.)

11.CIRL Wireheading
post by Tom Everitt 410 days ago | Abram Demski and Stuart Armstrong like this | 3 comments

Cooperative inverse reinforcement learning (CIRL) generated a lot of attention last year, as it seemed to do a good job aligning an agent’s incentives with its human supervisor’s. Notably, it led to an elegant solution to the shutdown problem.

12.Formal Open Problem in Decision Theory
post by Scott Garrabrant 447 days ago | Marcello Herreshoff, Sam Eisenstat, Vadim Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 13 comments

In this post, I present a new formal open problem. A positive answer would be valuable for decision theory research. A negative answer would be helpful, mostly for figuring out what is the closest we can get to a positive answer. I also give some motivation for the problem, and some partial progress.

Open Problem: Does there exist a topological space \(X\) (in some convenient category of topological spaces) such that there exists a continuous surjection from \(X\) to the space \([0,1]^X\) (of continuous functions from \(X\) to \([0,1]\))?
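
For context, here is a standard observation, not taken from the excerpt above, about why a simple diagonal argument does not settle the question. For plain sets, Cantor's theorem rules out any surjection from \(X\) onto the set of all functions from \(X\) to \([0,1]\). With continuity required, the analogous argument (in the style of Lawvere's fixed-point theorem) only shows that every continuous self-map of \([0,1]\) has a fixed point, which is already true. The names \(\varphi\), \(f\), \(g\) and \(x_0\) are introduced here for illustration.

```latex
% Assumption: the ambient category of spaces is cartesian closed, so the
% evaluation map (h, x) -> h(x) is continuous and g below is continuous.
\begin{align*}
&\text{Suppose } \varphi : X \to [0,1]^X \text{ is a continuous surjection and }
  f : [0,1] \to [0,1] \text{ is continuous.} \\
&\text{Define } g(x) = f\bigl(\varphi(x)(x)\bigr), \text{ so that } g \in [0,1]^X. \\
&\text{By surjectivity, } g = \varphi(x_0) \text{ for some } x_0 \in X. \\
&\text{Then } \varphi(x_0)(x_0) = g(x_0) = f\bigl(\varphi(x_0)(x_0)\bigr),
  \text{ so } f \text{ has a fixed point.}
\end{align*}
```

If \([0,1]\) were replaced by the discrete space \(\{0,1\}\), taking \(f\) to be the map that swaps the two points would give a contradiction. For \([0,1]\), every continuous \(f\) has a fixed point, so this argument yields no obstruction, which is consistent with the problem being open.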

13.Some problems with making induction benign, and approaches to them
post by Jessica Taylor 483 days ago | Nate Soares, Patrick LaVictoire and Stuart Armstrong like this | 4 comments

The universal prior is malign. I’ll talk about a sequence of problems causing it to be malign and possible solutions.

14.Censoring out-of-domain representations
discussion post by Patrick LaVictoire 505 days ago | Jessica Taylor and Stuart Armstrong like this | 3 comments
15.Strategies for coalitions in unit-sum games
post by Jessica Taylor 514 days ago | Patrick LaVictoire and Stuart Armstrong like this | 3 comments

I’m going to formalize some ideas related to my previous post about pursuing convergent instrumental goals without good priors and prove theorems about how much power a coalition can guarantee. The upshot is that, while non-majority coalitions can’t guarantee controlling a non-negligible fraction of the expected power, majority coalitions can guarantee controlling a large fraction of the expected power.

16.An impossibility result for doing without good priors
discussion post by Jessica Taylor 517 days ago | Stuart Armstrong likes this | discuss
17.Pursuing convergent instrumental subgoals on the user's behalf doesn't always require good priors
discussion post by Jessica Taylor 538 days ago | Daniel Dewey, Paul Christiano and Stuart Armstrong like this | 9 comments
18.My current take on the Paul-MIRI disagreement on alignability of messy AI
post by Jessica Taylor 544 days ago | Ryan Carey, Vadim Kosoy, Daniel Dewey, Patrick LaVictoire, Scott Garrabrant and Stuart Armstrong like this | 40 comments

Paul Christiano and “MIRI” have disagreed on an important research question for a long time: should we focus research on aligning “messy” AGI (e.g. one found through gradient descent or brute force search) with human values, or on developing “principled” AGI (based on theories similar to Bayesian probability theory)? I’m going to present my current model of this disagreement and additional thoughts about it.

19.My recent posts
discussion post by Paul Christiano 568 days ago | Ryan Carey, Jessica Taylor, Patrick LaVictoire, Stuart Armstrong and Tsvi Benson-Tilsen like this | discuss
20.Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences
discussion post by Patrick LaVictoire 733 days ago | Jessica Taylor and Stuart Armstrong like this | discuss
21.Two problems with causal-counterfactual utility indifference
discussion post by Jessica Taylor 756 days ago | Patrick LaVictoire, Stuart Armstrong and Vladimir Slepnev like this | discuss
22.Anything you can do with n AIs, you can do with two (with directly opposed objectives)
post by Jessica Taylor 777 days ago | Patrick LaVictoire and Stuart Armstrong like this | 2 comments

Summary: For any normal-form game, it’s possible to cast the problem of finding a correlated equilibrium in this game as a 2-player zero-sum game. This seems useful because zero-sum games are easy to analyze and more resistant to collusion.
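
One well-known way to set up such a zero-sum game, which may differ from the construction in the full post, has a "proposer" choose a distribution over action profiles while an "auditor" chooses a triple (player, recommended action, deviation) and is paid the expected gain from that deviation. The distribution is a correlated equilibrium exactly when the auditor cannot get a positive payoff. The sketch below checks this for the game of Chicken; the function name `max_deviation_gain`, the payoff numbers, and the data layout are assumptions chosen for illustration.

```python
from itertools import product


def max_deviation_gain(payoffs, dist):
    """Best payoff the auditor can obtain against the distribution `dist`.

    payoffs: dict mapping an action profile (tuple) to a tuple of utilities,
             one per player.
    dist:    dict mapping action profiles to probabilities.
    """
    n_players = len(next(iter(payoffs)))
    actions = [sorted({a[i] for a in payoffs}) for i in range(n_players)]
    best = float("-inf")
    for i in range(n_players):
        for rec, dev in product(actions[i], repeat=2):
            if rec == dev:
                continue
            gain = 0.0
            for profile, prob in dist.items():
                if profile[i] != rec:
                    continue  # this deviation only applies when `rec` is recommended
                deviated = profile[:i] + (dev,) + profile[i + 1:]
                gain += prob * (payoffs[deviated][i] - payoffs[profile][i])
            best = max(best, gain)
    return best


# Chicken with illustrative payoffs: "D" = dare, "C" = chicken out.
CHICKEN = {
    ("D", "D"): (0, 0),
    ("D", "C"): (7, 2),
    ("C", "D"): (2, 7),
    ("C", "C"): (6, 6),
}

# Candidate correlated equilibrium: equal weight on every profile except (D, D).
CANDIDATE = {("C", "C"): 1 / 3, ("D", "C"): 1 / 3, ("C", "D"): 1 / 3}

if __name__ == "__main__":
    print(max_deviation_gain(CHICKEN, CANDIDATE) <= 0)           # equilibrium check
    print(max_deviation_gain(CHICKEN, {("C", "C"): 1.0}) <= 0)   # always (C, C)
```

Here a nonpositive value means no player benefits from deviating from any recommendation. Finding an equilibrium, rather than checking one, amounts to solving the proposer's side of this zero-sum game, for instance by linear programming.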

23.Maximizing a quantity while ignoring effect through some channel
discussion post by Jessica Taylor 810 days ago | Patrick LaVictoire and Stuart Armstrong like this | 12 comments
24.What does it mean for correct operation to rely on transfer learning?
post by Jessica Taylor 839 days ago | Daniel Dewey, Patrick LaVictoire, Paul Christiano and Stuart Armstrong like this | discuss

Summary: Some approaches to AI value alignment rely on transfer learning. I attempt to explain this idea more clearly.

25.Another view of quantilizers: avoiding Goodhart's Law
discussion post by Jessica Taylor 894 days ago | Abram Demski, Patrick LaVictoire and Stuart Armstrong like this | 1 comment
