Intelligent Agent Foundations Forum
1. Autopoietic systems and difficulty of AGI alignment
post by Jessica Taylor 332 days ago | Ryan Carey, Owen Cotton-Barratt and Paul Christiano like this | 13 comments

I have recently come to the opinion that AGI alignment is probably extremely hard. But it’s not clear exactly what AGI or AGI alignment are, and there are some forms of alignment of “AI” systems that are easy. Here I operationalize “AGI” and “AGI alignment” in several different ways and evaluate their difficulties.

2. Current thoughts on Paul Christiano's research agenda
post by Jessica Taylor 364 days ago | Ryan Carey, Owen Cotton-Barratt, Sam Eisenstat, Paul Christiano, Stuart Armstrong and Wei Dai like this | 15 comments

This post summarizes my thoughts on Paul Christiano’s agenda in general and ALBA in particular.

3. Why I am not currently working on the AAMLS agenda
post by Jessica Taylor 429 days ago | Ryan Carey, Marcello Herreshoff, Sam Eisenstat, Abram Demski, Daniel Dewey, Scott Garrabrant and Stuart Armstrong like this | 2 comments

(Note: this is a personal statement, not an official MIRI statement; I am not speaking for others who have been involved with the agenda.)

The AAMLS (Alignment for Advanced Machine Learning Systems) agenda is a MIRI project about determining how to safely use hypothetical, highly advanced machine learning systems. I was previously working on problems in this agenda but am not currently.

4. Finding reflective oracle distributions using a Kakutani map
discussion post by Jessica Taylor 440 days ago | Vadim Kosoy likes this | discuss
5. A correlated analogue of reflective oracles
post by Jessica Taylor 440 days ago | Sam Eisenstat, Vadim Kosoy, Abram Demski and Scott Garrabrant like this | discuss

Summary: Reflective oracles correspond to Nash equilibria. A correlated version of reflective oracles exists and corresponds to correlated equilibria. The set of these objects is convex, which is useful.
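
For background, these are standard definitions rather than notation from the post: a distribution \(\mu\) over joint action profiles is a correlated equilibrium if no player gains by deviating from a recommended action, i.e.

\[ \sum_{a_{-i}} \mu(a_i, a_{-i})\,\big(u_i(a_i, a_{-i}) - u_i(a_i', a_{-i})\big) \;\ge\; 0 \quad \text{for all players } i \text{ and actions } a_i, a_i'. \]

These are linear inequalities in \(\mu\), so the set of correlated equilibria is convex; Nash equilibria are the special case where \(\mu\) is a product distribution.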

6. Maximally efficient agents will probably have an anti-daemon immune system
discussion post by Jessica Taylor 508 days ago | Ryan Carey, Patrick LaVictoire and Scott Garrabrant like this | 1 comment
7. Are daemons a problem for ideal agents?
discussion post by Jessica Taylor 520 days ago | 1 comment
8. How likely is a random AGI to be honest?
discussion post by Jessica Taylor 521 days ago | 1 comment
9. Some problems with making induction benign, and approaches to them
post by Jessica Taylor 508 days ago | Nate Soares, Patrick LaVictoire and Stuart Armstrong like this | 4 comments

The universal prior is malign. I’ll talk about a sequence of problems that cause it to be malign, and possible solutions.
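
For reference, the universal prior in question is the Solomonoff prior over bit-strings,

\[ M(x) \;=\; \sum_{p \,:\, U(p) \text{ extends } x} 2^{-|p|}, \]

where \(U\) is a universal prefix machine and the sum ranges over programs whose output begins with \(x\). The malignancy concern is, roughly, that some of the shortest programs predicting our data may simulate consequentialist reasoners with their own goals.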

10. Strategies for coalitions in unit-sum games
post by Jessica Taylor 539 days ago | Patrick LaVictoire and Stuart Armstrong like this | 3 comments

I’m going to formalize some ideas related to my previous post about pursuing convergent instrumental goals without good priors and prove theorems about how much power a coalition can guarantee. The upshot is that, while non-majority coalitions can’t guarantee controlling a non-negligible fraction of the expected power, majority coalitions can guarantee controlling a large fraction of the expected power.
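
As a rough gloss of the setting (my paraphrase; the post gives the precise definitions): in a unit-sum game the players end up with power shares \(p_i \ge 0\) with \(\sum_i p_i = 1\), and a coalition \(C\) guarantees value \(v\) if it has a joint strategy under which \(\mathbb{E}\big[\sum_{i \in C} p_i\big] \ge v\) no matter how the players outside \(C\) play.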

11. An impossibility result for doing without good priors
discussion post by Jessica Taylor 542 days ago | Stuart Armstrong likes this | discuss
12. On motivations for MIRI's highly reliable agent design research
post by Jessica Taylor 538 days ago | Ryan Carey, Sam Eisenstat, Daniel Dewey, Nate Soares, Patrick LaVictoire, Paul Christiano, Tsvi Benson-Tilsen and Vladimir Nesov like this | 10 comments

(this post came out of a conversation between me and Owen Cotton-Barratt, plus a follow-up conversation with Nate)

13. Pursuing convergent instrumental subgoals on the user's behalf doesn't always require good priors
discussion post by Jessica Taylor 563 days ago | Daniel Dewey, Paul Christiano and Stuart Armstrong like this | 9 comments
14. My current take on the Paul-MIRI disagreement on alignability of messy AI
post by Jessica Taylor 569 days ago | Ryan Carey, Vadim Kosoy, Daniel Dewey, Patrick LaVictoire, Scott Garrabrant and Stuart Armstrong like this | 40 comments

Paul Christiano and “MIRI” have disagreed on an important research question for a long time: should we focus research on aligning “messy” AGI (e.g. one found through gradient descent or brute force search) with human values, or on developing “principled” AGI (based on theories similar to Bayesian probability theory)? I’m going to present my current model of this disagreement and additional thoughts about it.

15. Predicting HCH using expert advice
post by Jessica Taylor 595 days ago | Ryan Carey, Patrick LaVictoire and Paul Christiano like this | 1 comment

Summary: in approximating a scheme like HCH, we would like some notion of “the best prediction we can make given available AI capabilities”. There’s a natural notion of “the best prediction of a human we should expect to get”. In general this doesn’t yield good predictions of HCH, but it does yield an HCH-like computation model that seems useful.
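
For context, here is a minimal sketch of the classical multiplicative-weights learner that “expert advice” refers to; the experts, loss function, and names below are illustrative assumptions, not the post’s construction:

```python
# Sketch: prediction with expert advice via multiplicative weights.
# Experts are arbitrary predictors supplied by the caller; nothing
# here is specific to HCH.
import numpy as np

def multiplicative_weights(expert_preds, outcomes, eta=0.5):
    """expert_preds: (T, K) array, each expert's prediction per round.
    outcomes: (T,) array of realized outcomes.
    Returns the learner's per-round aggregate predictions."""
    T, K = expert_preds.shape
    w = np.ones(K)                  # one weight per expert
    learner = np.empty(T)
    for t in range(T):
        p = w / w.sum()             # current belief over experts
        learner[t] = p @ expert_preds[t]
        losses = (expert_preds[t] - outcomes[t]) ** 2  # squared loss
        w *= np.exp(-eta * losses)  # downweight poorly-performing experts
    return learner
```

For bounded losses, this learner’s cumulative loss exceeds the best single expert’s by only \(O(\sqrt{T \log K})\); the post asks what the analogous “best available prediction” guarantee looks like when the target is HCH.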

16. ALBA requires incremental design of good long-term memory systems
discussion post by Jessica Taylor 595 days ago | Ryan Carey likes this | 1 comment
17. Modeling the capabilities of advanced AI systems as episodic reinforcement learning
post by Jessica Taylor 696 days ago | Patrick LaVictoire likes this | 6 comments

Here I’ll summarize the main abstraction I use for thinking about future AI systems. This is essentially the same model that Paul uses. I’m not actually introducing any new ideas in this post; mostly this is intended to summarize my current views.

18. Generative adversarial models, informed by arguments
discussion post by Jessica Taylor 748 days ago | discuss
19. In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy
post by Jessica Taylor 765 days ago | Vadim Kosoy and Abram Demski like this | 3 comments

Summary: I define memoryless Cartesian environments (which can model many familiar decision problems), note the similarity to memoryless POMDPs, and define a local optimality condition for policies, which can be roughly stated as “the policy is consistent with maximizing expected utility using CDT and subjective probabilities derived from SIA”. I show that this local optimality condition is necessary but not sufficient for global optimality (UDT).
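
For reference, one standard way to state SIA in this kind of setting (my paraphrase of the background concept, not the post’s exact formalism): conditional on observing \(o\), the agent’s subjective probability of being a particular instantiation \(x\) in environment \(e\) is

\[ P(e, x \mid o) \;\propto\; P(e)\,P(x \text{ observes } o \mid e), \]

i.e. every possible instantiation with the right observation is weighted by its probability of existing; CDT then picks actions to causally maximize expected utility under this distribution over locations.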

20. Two problems with causal-counterfactual utility indifference
discussion post by Jessica Taylor 781 days ago | Patrick LaVictoire, Stuart Armstrong and Vladimir Slepnev like this | discuss
21. Anything you can do with n AIs, you can do with two (with directly opposed objectives)
post by Jessica Taylor 802 days ago | Patrick LaVictoire and Stuart Armstrong like this | 2 comments

Summary: For any normal-form game, it’s possible to cast the problem of finding a correlated equilibrium in this game as a 2-player zero-sum game. This seems useful because zero-sum games are easy to analyze and more resistant to collusion.
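
To make the object being computed concrete, here is a sketch that finds a correlated equilibrium of a small two-player game as a linear feasibility problem. This is the standard LP formulation; the post’s contribution is recasting it as a two-player zero-sum game. The example game and all names below are illustrative:

```python
# Sketch: a correlated equilibrium of a normal-form game via linear
# programming. Variables: a probability p[i, j] for each joint action.
import numpy as np
from scipy.optimize import linprog

# Example payoffs ("chicken"): rows are player 1's actions, columns player 2's.
U1 = np.array([[0.0, 7.0], [2.0, 6.0]])
U2 = np.array([[0.0, 2.0], [7.0, 6.0]])
m, n = U1.shape

A_ub = []
# Player 1: for each recommended action i and deviation i2, require
# sum_j p[i, j] * (U1[i, j] - U1[i2, j]) >= 0, negated for A_ub @ x <= 0.
for i in range(m):
    for i2 in range(m):
        if i2 != i:
            row = np.zeros((m, n))
            row[i, :] = U1[i2, :] - U1[i, :]
            A_ub.append(row.ravel())
# Player 2: symmetric incentive constraints over columns.
for j in range(n):
    for j2 in range(n):
        if j2 != j:
            row = np.zeros((m, n))
            row[:, j] = U2[:, j2] - U2[:, j]
            A_ub.append(row.ravel())

res = linprog(
    c=np.zeros(m * n),                      # feasibility only: any CE will do
    A_ub=np.array(A_ub), b_ub=np.zeros(len(A_ub)),
    A_eq=np.ones((1, m * n)), b_eq=[1.0],   # probabilities sum to 1
    bounds=[(0, 1)] * (m * n),
)
print(res.x.reshape(m, n))                  # a correlated equilibrium
```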

22. Lagrangian duality for constraints on expectations
post by Jessica Taylor 803 days ago | Patrick LaVictoire likes this | discuss

Summary: It’s possible to set up a zero-sum game between two agents so that, in any Nash equilibrium, one agent picks a policy that optimizes a particular objective subject to constraints on the expected features of the state resulting from the policy. This seems potentially useful for getting an approximate agent to maximize some objective subject to constraints.
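
Concretely, the standard Lagrangian form the title refers to: the constrained problem

\[ \max_\pi \; \mathbb{E}_\pi[f(s)] \quad \text{subject to} \quad \mathbb{E}_\pi[g_k(s)] \le c_k \]

becomes the zero-sum game

\[ \max_\pi \; \min_{\lambda \ge 0} \; \mathbb{E}_\pi[f(s)] - \sum_k \lambda_k\big(\mathbb{E}_\pi[g_k(s)] - c_k\big), \]

in which one player picks the policy \(\pi\) and the other picks the multipliers \(\lambda\); at a Nash equilibrium (under the usual duality conditions) the multipliers price the constraints and the policy solves the constrained problem.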

23. Maximizing a quantity while ignoring effect through some channel
discussion post by Jessica Taylor 835 days ago | Patrick LaVictoire and Stuart Armstrong like this | 12 comments
24. Rényi divergence as a secondary objective
discussion post by Jessica Taylor 831 days ago | Vadim Kosoy and Patrick LaVictoire like this | 1 comment
25. Informed oversight through an entropy-maximization objective
discussion post by Jessica Taylor 863 days ago | 9 comments