Intelligent Agent Foundations Forum
1. Meta: IAFF vs LessWrong
discussion post by Vadim Kosoy 15 days ago | Jessica Taylor likes this | 5 comments
2. The Learning-Theoretic AI Alignment Research Agenda
post by Vadim Kosoy 15 days ago | Alex Appel and Jessica Taylor like this | 36 comments

In this essay I will try to explain the overall structure and motivation of my AI alignment research agenda. The discussion is informal and no new theorems are proved here. The main features of my research agenda, as I explain them here, are

  • Viewing AI alignment theory as part of a general abstract theory of intelligence

  • Using desiderata and axiomatic definitions as starting points, rather than specific algorithms and constructions

  • Formulating alignment problems in the language of learning theory

  • Evaluating solutions by their formal mathematical properties, ultimately aiming at a quantitative theory of risk assessment

  • Relying on the mathematical intuition derived from learning theory to pave the way to solving philosophical questions

3. Logical Inductors Converge to Correlated Equilibria (Kinda)
post by Alex Appel 51 days ago | Sam Eisenstat and Jessica Taylor like this | 1 comment

Logical inductors of “similar strength”, playing against each other in a repeated game, will converge to correlated equilibria of the one-shot game, for the same reason that players that react to the past plays of their opponent converge to correlated equilibria. In fact, this proof is essentially just the proof from Calibrated Learning and Correlated Equilibrium by Foster and Vohra (1997), adapted to a logical inductor setting.
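
For readers unfamiliar with the target notion, here is a minimal sketch of what it means for a joint distribution over action profiles to be a correlated equilibrium of a one-shot game. It only checks the definition; it is not the convergence proof from the post. The payoff table (a standard game of "chicken") and the function name `is_correlated_equilibrium` are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical 2-player game of "chicken": C = chicken out, D = dare.
# PAYOFF[(row_action, col_action)] = (row_payoff, col_payoff)
ACTIONS = ("C", "D")
PAYOFF = {
    ("D", "D"): (0, 0),
    ("D", "C"): (7, 2),
    ("C", "D"): (2, 7),
    ("C", "C"): (6, 6),
}


def is_correlated_equilibrium(dist, tol=1e-9):
    """Check that no player gains by deviating from a recommended action.

    `dist` maps action profiles to probabilities. For each player and each
    (recommended, deviation) pair, the expected gain from switching, summed
    over the profiles in which that action was recommended, must be <= 0.
    """
    for player in (0, 1):
        for recommended, deviation in product(ACTIONS, repeat=2):
            gain = 0.0
            for profile, prob in dist.items():
                if profile[player] != recommended:
                    continue
                deviated = list(profile)
                deviated[player] = deviation
                gain += prob * (
                    PAYOFF[tuple(deviated)][player] - PAYOFF[profile][player]
                )
            if gain > tol:
                return False
    return True


# Equal weight on every profile except (D, D): neither player wants to deviate.
mixed = {("C", "C"): 1 / 3, ("D", "C"): 1 / 3, ("C", "D"): 1 / 3}
# All weight on (C, C): each player would rather dare.
always_chicken = {("C", "C"): 1.0}
```

With this payoff table, `is_correlated_equilibrium(mixed)` holds while `is_correlated_equilibrium(always_chicken)` does not, since a player told to play C against a certain C would gain by daring.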

4. Resource-Limited Reflective Oracles
discussion post by Alex Appel 96 days ago | Sam Eisenstat, Abram Demski and Jessica Taylor like this | 1 comment
5. No Constant Distribution Can be a Logical Inductor
discussion post by Alex Appel 100 days ago | Sam Eisenstat, Vadim Kosoy, Abram Demski, Jessica Taylor and Stuart Armstrong like this | 1 comment
6. An Untrollable Mathematician
post by Abram Demski 173 days ago | Alex Appel, Sam Eisenstat, Vadim Kosoy, Jack Gallagher, Jessica Taylor, Paul Christiano, Scott Garrabrant and Vladimir Slepnev like this | 1 comment

Follow-up to All Mathematicians are Trollable.

It is relatively easy to see that no computable Bayesian prior on logic can converge to a single coherent probability distribution as we update it on logical statements. Furthermore, the non-convergence behavior is about as bad as could be: someone selecting the ordering of provable statements to update on can drive the Bayesian’s beliefs arbitrarily up or down, arbitrarily many times, despite only saying true things. I called this wild non-convergence behavior “trollability”. Previously, I showed that if the Bayesian updates on the provability of a sentence rather than updating on the sentence itself, it is still trollable. I left open the question of whether some other side information could save us. Sam Eisenstat has closed this question, providing a simple logical prior and a way of doing a Bayesian update on it which (1) cannot be trolled, and (2) converges to a coherent distribution.

7. Where does ADT Go Wrong?
discussion post by Abram Demski 240 days ago | Jack Gallagher and Jessica Taylor like this | 1 comment
8. Reflective oracles as a solution to the converse Lawvere problem
post by Sam Eisenstat 242 days ago | Alex Mennen, Alex Appel, Vadim Kosoy, Abram Demski, Jessica Taylor, Scott Garrabrant and Vladimir Slepnev like this | discuss

1 Introduction

Before the work of Turing, one could justifiably be skeptical of the idea of a universal computable function. After all, there is no computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\mathbb{N}\) such that for all computable \(g\colon\mathbb{N}\to\mathbb{N}\) there is some index \(i_{g}\) such that \(f\left(i_{g},n\right)=g\left(n\right)\) for all \(n\). If there were, we could pick \(g\left(n\right)=f\left(n,n\right)+1\), and then \[g\left(i_{g}\right)=f\left(i_{g},i_{g}\right)+1=g\left(i_{g}\right)+1,\] a contradiction. Of course, universal Turing machines don’t run into this obstacle; as Gödel put it, “By a kind of miracle it is not necessary to distinguish orders, and the diagonal procedure does not lead outside the defined notion.” [1]
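
The diagonal argument above can be illustrated on a finite table of total functions. This is only a sketch under stated assumptions: the list `ROWS` is a hypothetical stand-in for an enumeration of computable functions, and `g` is defined only for indices that appear in the table.

```python
from typing import Callable, List

# Hypothetical finite "enumeration" of total functions N -> N.
ROWS: List[Callable[[int], int]] = [
    lambda n: 0,
    lambda n: n,
    lambda n: n * n,
    lambda n: 2 * n + 1,
]


def f(i: int, n: int) -> int:
    """Candidate universal function: run the i-th listed function on n."""
    return ROWS[i](n)


def g(n: int) -> int:
    """Diagonal function g(n) = f(n, n) + 1, defined for 0 <= n < len(ROWS)."""
    return f(n, n) + 1


def disagreements() -> List[bool]:
    """For each index i, report whether g differs from row i at input i."""
    return [g(i) != f(i, i) for i in range(len(ROWS))]
```

Every entry of `disagreements()` is true, so `g` cannot equal any row of the table: whichever index one proposes for `g`, it differs from that row on the diagonal. Partial functions avoid the contradiction because \(f\left(i_{g},i_{g}\right)\) is allowed to be undefined.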

The miracle of Turing machines is that there is a partial computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\mathbb{N}\cup\left\{ \bot\right\}\) such that for all partial computable \(g\colon\mathbb{N}\to\mathbb{N}\cup\left\{ \bot\right\}\) there is an index \(i\) such that \(f\left(i,n\right)=g\left(n\right)\) for all \(n\). Here, we look at a different “miracle”, that of reflective oracles [2,3]. As we will see in Theorem 1, given a reflective oracle \(O\), there is a (stochastic) \(O\)-computable function \(f\colon\mathbb{N}\times\mathbb{N}\to\left\{ 0,1\right\}\) such that for any (stochastic) \(O\)-computable function \(g\colon\mathbb{N}\to\left\{ 0,1\right\}\), there is some index \(i\) such that \(f\left(i,n\right)\) and \(g\left(n\right)\) have the same distribution for all \(n\). This existence theorem seems to skirt even closer to the contradiction mentioned above.

We use this idea to answer “in spirit” the converse Lawvere problem posed in [4]. These methods also generalize to prove a similar analogue of the ubiquitous converse Lawvere problem from [5]. The original questions, stated in terms of topology, remain open, but I find that the model proposed here, using computability, is equally satisfying from the point of view of studying reflective agents. Those references can be consulted for more motivation on these problems from the perspective of reflective agency.

Section 3 proves the main lemma, and proves the converse Lawvere theorem for reflective oracles. In section 4, we use that to give a (circular) proof of Brouwer’s fixed point theorem, as mentioned in [4]. In section 5, we prove the ubiquitous converse Lawvere theorem for reflective oracles.

9. Cooperative Oracles: Nonexploited Bargaining
post by Scott Garrabrant 429 days ago | Vadim Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 6 comments

In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation. (Part of the series started here.)

10. Cooperative Oracles: Introduction
post by Scott Garrabrant 429 days ago | Abram Demski, Jessica Taylor and Patrick LaVictoire like this | 1 comment

This is the first in a series of posts introducing a new tool called a Cooperative Oracle. All of these posts are joint work with Sam Eisenstat, Tsvi Benson-Tilsen, and Nisan Stiennon.

Here is my plan for posts in this sequence. I will update this as I go.

  1. Introduction
  2. Nonexploited Bargaining
  3. Stratified Pareto Optima and Almost Stratified Pareto Optima
  4. Definition and Existence Proof
  5. Alternate Notions of Dependency
11. Intertheoretic utility comparison: simple theory
discussion post by Stuart Armstrong 445 days ago | Jessica Taylor likes this | 8 comments
12. Generalizing Foundations of Decision Theory II
post by Abram Demski 449 days ago | Sam Eisenstat, Vadim Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments

As promised in the previous post, I develop my formalism for justifying as many of the decision-theoretic axioms as possible with generalized dutch-book arguments. (I’ll use the term “generalized dutch-book” to refer to arguments with a family resemblance to dutch-book or money-pump.) The eventual goal is to relax these assumptions in a way which addresses bounded processing power, but for now the goal is to get as much of classical decision theory as possible justified by a generalized dutch-book.

13. Two Major Obstacles for Logical Inductor Decision Theory
post by Scott Garrabrant 454 days ago | Alex Mennen, Sam Eisenstat, Abram Demski, Jessica Taylor, Patrick LaVictoire and Tsvi Benson-Tilsen like this | 3 comments

In this post, I describe two major obstacles for logical inductor decision theory: untaken actions are not observable and no updatelessness for computations. I will concretely describe both of these problems in a logical inductor framework, but I believe that both issues are general enough to transcend that framework.

14. The Ubiquitous Converse Lawvere Problem
post by Scott Garrabrant 461 days ago | Marcello Herreshoff, Sam Eisenstat, Jessica Taylor and Patrick LaVictoire like this | discuss

In this post, I give a stronger version of the open question presented here, and give a motivation for this stronger property. This came out of conversations with Marcello, Sam, and Tsvi.

Definition: A continuous function \(f:X\rightarrow Y\) is called ubiquitous if for every continuous function \(g:X\rightarrow Y\), there exists a point \(x\in X\) such that \(f(x)=g(x)\).

Open Problem: Does there exist a topological space \(X\) with a ubiquitous function \(f:X\rightarrow[0,1]^X\)?

15. Formal Open Problem in Decision Theory
post by Scott Garrabrant 472 days ago | Marcello Herreshoff, Sam Eisenstat, Vadim Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 13 comments

In this post, I present a new formal open problem. A positive answer would be valuable for decision theory research. A negative answer would be helpful, mostly for figuring out what is the closest we can get to a positive answer. I also give some motivation for the problem, and some partial progress.

Open Problem: Does there exist a topological space \(X\) (in some convenient category of topological spaces) such that there exists a continuous surjection from \(X\) to the space \([0,1]^X\) (of continuous functions from \(X\) to \([0,1]\))?
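
To see why such a surjection would be interesting, here is a sketch of the standard Lawvere-style diagonal argument. It is offered as an illustration rather than as the post's own derivation, and it assumes that the evaluation map \((x,y)\mapsto f(x)(y)\) is continuous, which is the kind of property a convenient category is meant to provide. Suppose \(f\colon X\to[0,1]^X\) is a continuous surjection and let \(h\colon[0,1]\to[0,1]\) be any continuous map. Define \(g\colon X\to[0,1]\) by \(g(x)=h\left(f(x)(x)\right)\). Under the assumption above \(g\) is continuous, so by surjectivity \(g=f(x_{0})\) for some \(x_{0}\in X\), and then \[f(x_{0})(x_{0})=g(x_{0})=h\left(f(x_{0})(x_{0})\right),\] so \(f(x_{0})(x_{0})\) is a fixed point of \(h\). A positive answer would therefore force every continuous self-map of \([0,1]\) to have a fixed point, which is consistent with the one-dimensional case of Brouwer's theorem.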

16. Learning incomplete models using dominant markets
post by Vadim Kosoy 486 days ago | Jessica Taylor likes this | discuss

This post is a formal treatment of the idea outlined here.

Given a countable set of incomplete models, we define a forecasting function that converges in the Kantorovich-Rubinstein metric with probability 1 to every one of the models which is satisfied by the true environment. This is analogous to Blackwell-Dubins merging of opinions for complete models, except that Kantorovich-Rubinstein convergence is weaker than convergence in total variation. The forecasting function is a dominant stochastic market for a suitably constructed set of traders.

17. Generalizing Foundations of Decision Theory
discussion post by Abram Demski 505 days ago | Ryan Carey, Vadim Kosoy, Jessica Taylor and Scott Garrabrant like this | 8 comments
18. Censoring out-of-domain representations
discussion post by Patrick LaVictoire 530 days ago | Jessica Taylor and Stuart Armstrong like this | 3 comments
19. A measure-theoretic generalization of logical induction
discussion post by Vadim Kosoy 547 days ago | Jessica Taylor and Scott Garrabrant like this | discuss
20. Open problem: thin logical priors
discussion post by Tsvi Benson-Tilsen 550 days ago | Ryan Carey, Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this | 2 comments
21. Towards learning incomplete models using inner prediction markets
discussion post by Vadim Kosoy 554 days ago | Jessica Taylor and Paul Christiano like this | 4 comments
22. The universal prior is malign
link by Paul Christiano 592 days ago | Ryan Carey, Vadim Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments
23. My recent posts
discussion post by Paul Christiano 593 days ago | Ryan Carey, Jessica Taylor, Patrick LaVictoire, Stuart Armstrong and Tsvi Benson-Tilsen like this | discuss
24. (Non-)Interruptibility of Sarsa(λ) and Q-Learning
link by Richard Möhn 607 days ago | Jessica Taylor and Patrick LaVictoire like this | 5 comments
25. An algorithm with preferences: from zero to one variable
discussion post by Stuart Armstrong 608 days ago | Ryan Carey, Jessica Taylor and Patrick LaVictoire like this | discuss