Generalizing Foundations of Decision Theory II   post by Abram Demski, 1 day ago
As promised in the previous post, I develop my formalism for justifying as many of the decision-theoretic axioms as possible with generalized Dutch-book arguments. (I'll use the term "generalized Dutch book" to refer to arguments with a family resemblance to Dutch-book or money-pump arguments.) The eventual goal is to relax these assumptions in a way which addresses bounded processing power, but for now the goal is to justify as much of classical decision theory as possible by a generalized Dutch book.
 
The Ubiquitous Converse Lawvere Problem   post by Scott Garrabrant, 13 days ago
 In this post, I give a stronger version of the open question presented here, and give a motivation for this stronger property. This came out of conversations with Marcello, Sam, and Tsvi.
Definition: A continuous function \(f:X\rightarrow Y\) is called ubiquitous if for every continuous function \(g:X\rightarrow Y\), there exists a point \(x\in X\) such that \(f(x)=g(x)\).
Open Problem: Does there exist a topological space \(X\) with a ubiquitous function \(f:X\rightarrow[0,1]^X\)?
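As a toy illustration of the "ubiquitous" property (my own example, not from the post): the identity map on \([0,1]\) is ubiquitous as a function into \([0,1]\), since by the intermediate value theorem any continuous \(g:[0,1]\rightarrow[0,1]\) must cross it. A minimal numeric sketch, with the function name and the sample \(g\) being my choices:

```python
import math

def agreement_point(g, lo=0.0, hi=1.0, tol=1e-12):
    """Find x in [lo, hi] with x == g(x), i.e. a point where the
    identity agrees with g, by bisection on h(x) = g(x) - x.

    Works for any continuous g: [0,1] -> [0,1], since h(0) >= 0 and
    h(1) <= 0, so the intermediate value theorem guarantees a root.
    """
    h = lambda x: g(x) - x
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example: g = cos restricted to [0,1] (its image lies in [cos 1, 1]).
# The identity agrees with g at the fixed point of cos, roughly 0.739.
x = agreement_point(math.cos)
```

The open problem asks whether this phenomenon can happen one level up, with the codomain \([0,1]^X\) depending on the domain itself.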
 
Agents that don't become maximisers   post by Stuart Armstrong, 17 days ago
According to the basic AI drives thesis, (almost) any agent capable of self-modification will self-modify into an expected utility maximiser.
The typical examples are inconsistent utility maximisers, satisficers, and unexploitable agents, and it's easy to think that all agents fall roughly into these broad categories. There's also the observation that, when looking at full policies rather than individual actions, many biased agents become expected utility maximisers (unless they want to lose pointlessly).
Nevertheless… there is an entire category of agents that generically seem not to self-modify into maximisers. These are agents that attempt to maximise \(f(\mathbb{E}(U))\), where \(U\) is some utility function, \(\mathbb{E}(U)\) is its expectation, and \(f\) is a function that is neither wholly increasing nor decreasing.
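A minimal sketch of why such an agent resists becoming a maximiser (the lotteries, the particular \(f\), and all names are my illustrative choices, not from the post): with a non-monotonic \(f\), the agent prefers a middling expectation, so replacing itself with a plain \(\mathbb{E}(U)\) maximiser would change its choices, which it disprefers by its own lights.

```python
def expectation(lottery):
    """Lottery: list of (probability, utility) pairs."""
    return sum(p * u for p, u in lottery)

def f(x):
    # Non-monotonic: peaks at E(U) = 0.5 and falls off on either side.
    return -(x - 0.5) ** 2

lotteries = {
    "safe":   [(1.0, 0.5)],              # E(U) = 0.5
    "greedy": [(1.0, 1.0)],              # E(U) = 1.0
    "risky":  [(0.5, 0.0), (0.5, 1.0)],  # E(U) = 0.5
}

f_choice  = max(lotteries, key=lambda k: f(expectation(lotteries[k])))
eu_choice = max(lotteries, key=lambda k: expectation(lotteries[k]))
# The f(E(U)) agent picks a lottery with expectation 0.5, while the
# plain E(U) maximiser picks "greedy": their choices genuinely diverge.
```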
 
Understanding the important facts   post by Stuart Armstrong, 18 days ago
 I’ve got a partial design for motivating an AI to improve human understanding.
However, the AI is rewarded for generic human understanding of many variables, most of them quite pointless from our perspective. Can we motivate the AI to ensure our understanding of the variables we find important? The presence of free humans, say, rather than the air pressure in Antarctica?
 
Nearest unblocked strategy versus learning patches   post by Stuart Armstrong, 59 days ago
The nearest unblocked strategy (NUS) problem is the idea that if you program a restriction or a patch into an AI, the AI will often be motivated to pick a strategy that is as close as possible to the banned one: very similar in form, and maybe just as dangerous.
For instance, if the AI is maximising a reward \(R\), and does some behaviour \(B_i\) that we don't like, we can patch the AI's algorithm with patch \(P_i\) ('maximise \(R\) subject to these constraints…'), or modify \(R\) to \(R_i\) so that \(B_i\) doesn't come up. I'll focus more on the patching example, but the modified-reward one is similar.
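The dynamic can be sketched in a few lines (a toy model of my own: strategies are points on a line, reward rises toward the region we dislike, and each patch excludes only the single behaviour we noticed):

```python
def reward(s):
    # Reward increases with s; high-s behaviour is what we dislike.
    return s

strategies = [round(0.1 * i, 1) for i in range(11)]  # 0.0, 0.1, ..., 1.0
banned = set()

chosen = []
for _ in range(3):
    # The agent maximises reward over the strategies not yet patched out.
    best = max((s for s in strategies if s not in banned), key=reward)
    chosen.append(best)
    banned.add(best)  # patch P_i: ban the behaviour we just observed

# Each patch merely shifts the agent to the nearest unblocked strategy:
# barely different in form, and likely almost as bad.
```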
 
Entangled Equilibria and the Twin Prisoners' Dilemma   post by Scott Garrabrant, 72 days ago
In this post, I present a generalization of Nash equilibria to non-CDT agents. I will use this formulation to model mutual cooperation in a twin prisoners' dilemma, caused by the belief that the other player is similar to you, rather than by mutual prediction. (This post came mostly out of a conversation with Sam Eisenstat, with contributions from Tsvi Benson-Tilsen and Jessica Taylor.)
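The similarity-driven mechanism can be sketched numerically (a simplification of my own, not the post's formalism: you believe the twin mirrors your action with probability \(p\), and otherwise acts at random; the payoffs are the standard prisoners' dilemma values):

```python
PAYOFF = {  # (my move, their move) -> my payoff; C = cooperate, D = defect
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def expected_payoff(my_move, p):
    """p = probability the twin mirrors my move; otherwise uniform over C/D."""
    mirrored = PAYOFF[(my_move, my_move)]
    random_play = 0.5 * PAYOFF[(my_move, "C")] + 0.5 * PAYOFF[(my_move, "D")]
    return p * mirrored + (1 - p) * random_play

def best_move(p):
    return max(["C", "D"], key=lambda m: expected_payoff(m, p))

# E[C] = 3p + 1.5(1-p) and E[D] = p + 3(1-p), so cooperation wins
# whenever p > 3/7: a strong enough belief in similarity yields mutual
# cooperation with no prediction machinery at all.
```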
   