Generalizing Foundations of Decision Theory II
post by Abram Demski 1 day ago | Sam Eisenstat likes this | 1 comment

As promised in the previous post, I develop my formalism for justifying as many of the decision-theoretic axioms as possible with generalized Dutch-book arguments. (I’ll use the term “generalized Dutch book” to refer to arguments with a family resemblance to Dutch-book or money-pump arguments.) The eventual goal is to relax these assumptions in a way which addresses bounded processing power, but for now the goal is to justify as much of classical decision theory as possible by a generalized Dutch book.
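
The money-pump flavour of these arguments can be shown with a toy simulation (illustrative numbers and names, not from the post): an agent with the cyclic preference A ≻ B ≻ C ≻ A will pay a small fee for each “upgrade” and can be cycled back to its starting holding at a loss.

```python
# Toy money pump: an agent with the cyclic preference A > B > C > A
# pays a small fee for every "upgrade" and can be pumped indefinitely.
# (Illustrative numbers only.)

prefers = {("A", "C"): True, ("C", "B"): True, ("B", "A"): True}

def accepts_trade(current, offered):
    """The agent trades (and pays the fee) whenever the offered item
    is strictly preferred to its current holding."""
    return prefers.get((offered, current), False)

holding, money, fee = "A", 0.0, 0.01
for offered in ["B", "C", "A"] * 3:   # cycle the offers three times
    if accepts_trade(holding, offered):
        holding, money = offered, money - fee

print(holding, round(money, 2))  # back to "A", but 0.09 poorer
```

Each pass through the cycle returns the agent to its original holding while its money strictly decreases, which is the core of the generalized-Dutch-book style of argument.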

Two Major Obstacles for Logical Inductor Decision Theory
post by Scott Garrabrant 6 days ago | Alex Mennen, Sam Eisenstat, Abram Demski, Patrick LaVictoire and Tsvi Benson-Tilsen like this | discuss

In this post, I describe two major obstacles for logical inductor decision theory: untaken actions are not observable and no updatelessness for computations. I will concretely describe both of these problems in a logical inductor framework, but I believe that both issues are general enough to transcend that framework.

ALBA: can you be "aligned" at increased "capacity"?
post by Stuart Armstrong 10 days ago | 13 comments

I think that Paul Christiano’s ALBA proposal is good in practice, but has conceptual problems in principle.

Specifically, I don’t think it makes sense to talk about bootstrapping an “aligned” agent to one that is still “aligned” but that has an increased capacity.

The Ubiquitous Converse Lawvere Problem
post by Scott Garrabrant 13 days ago | Marcello Herreshoff, Sam Eisenstat, Jessica Taylor and Patrick LaVictoire like this | discuss

In this post, I give a stronger version of the open question presented here, and give a motivation for this stronger property. This came out of conversations with Marcello, Sam, and Tsvi.

Definition: A continuous function $$f:X\rightarrow Y$$ is called ubiquitous if for every continuous function $$g:X\rightarrow Y$$, there exists a point $$x\in X$$ such that $$f(x)=g(x)$$.

Open Problem: Does there exist a topological space $$X$$ with a ubiquitous function $$f:X\rightarrow[0,1]^X$$?
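
For the simpler case $$X = Y = [0,1]$$, the identity map is ubiquitous: by the intermediate value theorem, every continuous $$g:[0,1]\rightarrow[0,1]$$ has a fixed point, i.e. a point where it agrees with the identity. A numerical sketch (`agreement_point` is my own illustrative helper, found by bisection):

```python
# For X = Y = [0,1], the identity map is ubiquitous: any continuous
# g: [0,1] -> [0,1] has a fixed point (intermediate value theorem),
# i.e. a point where g agrees with the identity map.
import math

def agreement_point(g, lo=0.0, hi=1.0, tol=1e-9):
    """Find x with g(x) = x by bisecting h(x) = g(x) - x.
    Since g maps into [0,1], h(0) >= 0 and h(1) <= 0."""
    h = lambda x: g(x) - x
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = agreement_point(math.cos)  # cos maps [0,1] into [0,1]
print(round(x, 6))             # ~0.739085, the fixed point of cos
```

The open problem asks whether anything like this survives when the codomain is the function space $$[0,1]^X$$ itself rather than $$[0,1]$$.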

Agents that don't become maximisers
post by Stuart Armstrong 17 days ago | discuss

According to the basic AI drives thesis, (almost) any agent capable of self-modification will self-modify into an expected utility maximiser.

The typical examples are inconsistent utility maximisers, satisficers, and unexploitable agents, and it’s easy to think that all agents fall roughly into these broad categories. There’s also the observation that, when looking at full policies rather than individual actions, many biased agents become expected utility maximisers (unless they want to lose pointlessly).

Nevertheless… there is an entire category of agents that generically seem not to self-modify into maximisers. These are agents that attempt to maximise $$f(\mathbb{E}(U))$$, where $$U$$ is some utility function, $$\mathbb{E}(U)$$ is its expectation, and $$f$$ is a function that is neither monotonically increasing nor monotonically decreasing.
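
A minimal sketch of why such an agent’s choices come apart from straight expectation maximisation, assuming an illustrative $$f(e) = -(e-0.5)^2$$ that peaks at expected utility 0.5 (my numbers and lottery names, not the post’s):

```python
# Toy agent maximizing f(E[U]) with a non-monotone f, here
# f(e) = -(e - 0.5)**2, which peaks at expected utility 0.5.
# Its preferred lottery differs from the straight E[U] maximizer's.

lotteries = {
    "safe_half": [(0.5, 0.5), (0.5, 0.5)],    # (probability, utility)
    "all_or_nothing": [(0.5, 1.0), (0.5, 0.0)],
    "sure_win": [(1.0, 1.0)],
}

def expectation(lottery):
    return sum(p * u for p, u in lottery)

f = lambda e: -(e - 0.5) ** 2

eu_choice = max(lotteries, key=lambda k: expectation(lotteries[k]))
f_choice = max(lotteries, key=lambda k: f(expectation(lotteries[k])))

print(eu_choice, f_choice)  # sure_win vs. a lottery with E[U] = 0.5
```

Since no increasing transformation of $$\mathbb{E}(U)$$ reproduces these preferences, self-modifying into a plain $$\mathbb{E}(U)$$ maximiser would change the agent’s choices, which is exactly why it has no incentive to do so.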

Understanding the important facts
post by Stuart Armstrong 18 days ago | discuss

I’ve got a partial design for motivating an AI to improve human understanding.

However, the AI is rewarded for generic human understanding of many variables, most of them quite pointless from our perspective. Can we motivate the AI to ensure our understanding of the variables we find important? The presence of free humans, say, rather than the air pressure in Antarctica?

Formal Open Problem in Decision Theory
post by Scott Garrabrant 24 days ago | Marcello Herreshoff, Sam Eisenstat, Vadim Kosoy, Jessica Taylor, Patrick LaVictoire and Stuart Armstrong like this | 9 comments

In this post, I present a new formal open problem. A positive answer would be valuable for decision theory research. A negative answer would also be helpful, mostly for figuring out how close we can get to a positive answer. I also give some motivation for the problem, and some partial progress.

Open Problem: Does there exist a topological space $$X$$ (in some convenient category of topological spaces) such that there exists a continuous surjection from $$X$$ to the space $$[0,1]^X$$ (of continuous functions from $$X$$ to $$[0,1]$$)?
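
For discrete $$X$$ (where continuity is vacuous), the answer is negative by the usual Lawvere/Cantor diagonal argument: given any $$f$$, the function $$d(x) = (f(x)(x) + 0.5) \bmod 1$$ disagrees with $$f(y)$$ at the point $$y$$, so it is not in the image. A sketch on an illustrative finite $$X$$ (the particular candidate $$f$$ is arbitrary):

```python
# For discrete X, no map f: X -> [0,1]^X is surjective: the diagonal
# function d(x) = (f(x)(x) + 0.5) % 1 differs from every f(y) at y.
# Illustrative finite X; continuity is vacuous in the discrete case.

X = [0, 1, 2]

# An arbitrary candidate f: each point maps to a function X -> [0,1].
f = {
    0: lambda x: 0.1 * x,
    1: lambda x: 0.5,
    2: lambda x: 1.0 - 0.3 * x,
}

d = lambda x: (f[x](x) + 0.5) % 1  # diagonal: disagrees with f[y] at y

missed = all(any(abs(d(x) - f[y](x)) > 1e-12 for x in X) for y in X)
print(missed)  # True: d is not in the image of f
```

The interest of the open problem is therefore entirely in whether some *non-discrete* topology on $$X$$ can defeat this diagonalization by making $$d$$ discontinuous.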

 Low impact versus low side effects discussion post by Stuart Armstrong 31 days ago | Victoria Krakovna likes this | discuss
Learning incomplete models using dominant markets
post by Vadim Kosoy 37 days ago | Jessica Taylor likes this | discuss

This post is a formal treatment of the idea outlined here.

Given a countable set of incomplete models, we define a forecasting function that converges in the Kantorovich-Rubinstein metric with probability 1 to every one of the models which is satisfied by the true environment. This is analogous to Blackwell-Dubins merging of opinions for complete models, except that Kantorovich-Rubinstein convergence is weaker than convergence in total variation. The forecasting function is a dominant stochastic market for a suitably constructed set of traders.
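
To see that Kantorovich-Rubinstein (Wasserstein-1) convergence is strictly weaker than convergence in total variation, consider point masses $$\delta_{1/n}$$ versus $$\delta_0$$: the KR distance shrinks to zero while the total variation distance stays at 1. A quick sketch using the standard closed forms for point masses (my illustration, not from the post):

```python
# Kantorovich-Rubinstein (Wasserstein-1) convergence is weaker than
# total variation: the point masses delta_{1/n} converge to delta_0
# in KR distance, while their TV distance from delta_0 stays 1.
# Standard closed forms for point masses: W1(delta_a, delta_b) = |a - b|,
# and TV(delta_a, delta_b) = 1 whenever a != b.

def w1_point_masses(a, b):
    return abs(a - b)

def tv_point_masses(a, b):
    return 0.0 if a == b else 1.0

for n in (1, 10, 100, 1000):
    print(n, w1_point_masses(1 / n, 0), tv_point_masses(1 / n, 0))
# KR distance shrinks toward 0; TV distance is stuck at 1.
```

This is why the result is only analogous to, and not a strengthening of, Blackwell-Dubins merging of opinions.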

 HCH as a measure of manipulation discussion post by Patrick LaVictoire 44 days ago | 6 comments
Dominant stochastic markets
post by Vadim Kosoy 45 days ago | discuss

We generalize the formalism of dominant markets to account for stochastic “deductive processes,” and prove a theorem regarding the asymptotic behavior of such markets. In a following post, we will show how to use these tools to formalize the ideas outlined here.

Modal Combat for games other than the prisoner's dilemma
post by Alex Mennen 57 days ago | Patrick LaVictoire and Scott Garrabrant like this | 1 comment
 Generalizing Foundations of Decision Theory discussion post by Abram Demski 57 days ago | Ryan Carey, Vadim Kosoy, Jessica Taylor and Scott Garrabrant like this | 8 comments
Translation "counterfactual"
post by Stuart Armstrong 58 days ago | discuss

In a previous post, I briefly mentioned translations as one of three possible counterfactuals for indifference. Here I want to clarify what I meant there, because the idea is interesting.

Nearest unblocked strategy versus learning patches
post by Stuart Armstrong 59 days ago | 9 comments

The nearest unblocked strategy problem (NUS) is the idea that if you program a restriction or a patch into an AI, then the AI will often be motivated to pick a strategy that is as close as possible to the banned strategy, very similar in form, and maybe just as dangerous.

For instance, if the AI is maximising a reward $$R$$, and does some behaviour $$B_i$$ that we don’t like, we can patch the AI’s algorithm with patch $$P_i$$ (‘maximise $$R$$ subject to these constraints…’), or modify $$R$$ to $$R_i$$ so that $$B_i$$ doesn’t come up. I’ll focus more on the patching example, but the modified-reward one is similar.
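
A toy sketch of the dynamic, assuming a one-dimensional strategy space and an illustrative reward (my numbers, not the post’s): the constrained optimum lands exactly on the boundary of the banned region.

```python
# Toy nearest-unblocked-strategy dynamic: reward increases along a
# 1-D strategy axis, strategies above 0.8 are banned by a patch, and
# the constrained optimum hugs the boundary of the ban. Illustrative.

strategies = [i / 100 for i in range(101)]  # grid over [0, 1]
reward = lambda s: s ** 2                   # more "aggressive" pays more
banned = lambda s: s > 0.8                  # the patch: ban strategies above 0.8

best = max((s for s in strategies if not banned(s)), key=reward)
print(best)  # 0.8: as close to the banned region as the patch allows
```

The patch removes $$B_i$$ itself but leaves the AI’s incentive gradient pointing straight at the edge of the ban, which is the NUS problem in miniature.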

Some problems with making induction benign, and approaches to them
post by Jessica Taylor 60 days ago | Nate Soares, Patrick LaVictoire and Stuart Armstrong like this | 4 comments

The universal prior is malign. I’ll talk about a sequence of problems that cause it to be malign, and possible solutions.

 Maximally efficient agents will probably have an anti-daemon immune system discussion post by Jessica Taylor 60 days ago | Ryan Carey, Patrick LaVictoire and Scott Garrabrant like this | discuss
 All the indifference designs discussion post by Stuart Armstrong 61 days ago | Patrick LaVictoire likes this | 1 comment
Prediction Based Robust Cooperation
post by Scott Garrabrant 61 days ago | Patrick LaVictoire likes this | 1 comment

In this post, we present a new approach to robust cooperation, as an alternative to the “modal combat” framework. This post is very hand-wavy. If someone would like to work on making it better, let me know.

Counterfactually uninfluenceable agents
post by Stuart Armstrong 61 days ago | discuss

Techniques used to counter agents making biased decisions do not produce uninfluenceable agents.


 Indifference and compensatory rewards discussion post by Stuart Armstrong 67 days ago | discuss
 Are daemons a problem for ideal agents? discussion post by Jessica Taylor 72 days ago | 1 comment
Entangled Equilibria and the Twin Prisoners' Dilemma
post by Scott Garrabrant 72 days ago | Vadim Kosoy and Patrick LaVictoire like this | 2 comments

In this post, I present a generalization of Nash equilibria to non-CDT agents. I will use this formulation to model mutual cooperation in a twin prisoners’ dilemma, caused by the belief that the other player is similar to you, and not by mutual prediction. (This post came mostly out of a conversation with Sam Eisenstat, as well as contributions from Tsvi Benson-Tilsen and Jessica Taylor)
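
A minimal numerical contrast, assuming standard prisoners’ dilemma payoffs (illustrative values, not from the post): independent best response gives defection, while a player who believes the other is its twin, and so will play the same action, prefers cooperation.

```python
# Standard prisoners' dilemma payoffs for the row player (illustrative).
payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Against an opponent whose action is fixed independently of ours,
# defection strictly dominates, so the Nash/CDT-style choice is D.
nash_choice = (
    "D" if all(payoff[("D", b)] > payoff[("C", b)] for b in "CD") else "C"
)

# A "twin" who believes the other player will mirror its action
# compares only the diagonal outcomes, and so prefers to cooperate.
twin_choice = max("CD", key=lambda a: payoff[(a, a)])

print(nash_choice, twin_choice)  # D C
```

The belief in similarity, rather than any prediction of the opponent, is what moves the twin off the Nash outcome, which is the phenomenon the entangled-equilibria formalism is meant to capture.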

 How likely is a random AGI to be honest? discussion post by Jessica Taylor 73 days ago | 1 comment
 Minimizing Empowerment for Safety discussion post by David Krueger 74 days ago | 2 comments