  2.  Being legible to other agents by committing to using weaker reasoning systems   post by Alex Mennen 289 days ago  Stuart Armstrong and Vladimir Slepnev like this  1 comment  
 Suppose that an agent \(A_{1}\) reasons in a sound theory \(T_{1}\), and an agent \(A_{2}\) reasons in a theory \(T_{2}\), such that \(T_{1}\) proves that \(T_{2}\) is sound. Now suppose \(A_{1}\) is trying to reason in a way that is legible to \(A_{2}\), in the sense that \(A_{2}\) can rely on \(A_{1}\) to reach correct conclusions. One way of doing this is for \(A_{1}\) to restrict itself to some weaker theory \(T_{3}\), which \(T_{2}\) proves is sound, for the purposes of any reasoning that it wants to be legible to \(A_{2}\). Of course, in order for this to work, not only would \(A_{1}\) have to restrict itself to using \(T_{3}\), but \(A_{2}\) would to trust that \(A_{1}\) had done so. A plausible way for that to happen is for \(A_{1}\) to reach the decision quickly enough that \(A_{2}\) can simulate \(A_{1}\) making the decision to restrict itself to using \(T_{3}\).  
  3.  The Happy Dance Problem   post by Abram Demski 305 days ago  Scott Garrabrant and Stuart Armstrong like this  1 comment  
 Since the invention of logical induction, people have been trying to figure out what logically updateless reasoning could be. This is motivated by the idea that, in the realm of Bayesian uncertainty (IE, empirical uncertainty), updateless decision theory is the simple solution to the problem of reflective consistency. Naturally, we’d like to import this success to logically uncertain decision theory.
At a research retreat during the summer, we realized that updateless decision theory wasn’t so easy to define even in the seemingly simple Bayesian case. A possible solution was written up in Conditioning on Conditionals. However, that didn’t end up being especially satisfying.
Here, I introduce the happy dance problem, which more clearly illustrates the difficulty in defining updateless reasoning in the Bayesian case. I also outline Scott’s current thoughts about the correct way of reasoning about this problem.
 
  4.  Hyperreal Brouwer   post by Scott Garrabrant 347 days ago  Vadim Kosoy and Stuart Armstrong like this  2 comments  
 This post explains how to view Kakutani’s fixed point theorem as a special case of Brouwer’s fixed point theorem with hyperreal numbers. This post is just math intuitions, but I found them useful in thinking about Kakutani’s fixed point theorem and many things in agent foundations. This came out of conversations with Sam Eisenstat.  
     8.  Futarchy Fix   post by Abram Demski 476 days ago  Scott Garrabrant and Stuart Armstrong like this  9 comments  
 Robin Hanson’s Futarchy is a proposal to let prediction markets make governmental decisions. We can view an operating Futarchy as an agent, and ask if it is aligned with the interests of its constituents. I am aware of two main failures of alignment: (1) since predicting rare events is rewarded in proportion to their rareness, prediction markets heavily incentivise causing rare events to happen (I’ll call this the entropymarket problem); (2) it seems prediction markets would not be able to assign probability to existential risk, since you can’t collect on bets after everyone’s dead (I’ll call this the existential risk problem). I provide three formulations of (1) and solve two of them, and make some comments on (2). (Thanks to Scott for pointing out the second of these problems to me; I don’t remember who originally told me about the first problem, but also thanks.)
 
    11.  CIRL Wireheading   post by Tom Everitt 499 days ago  Abram Demski and Stuart Armstrong like this  3 comments  
 Cooperative inverse reinforcement learning (CIRL) generated a lot of attention last year, as it seemed to do a good job aligning an agent’s incentives with its human supervisor’s. Notably, it led to an elegant solution to the shutdown problem.  
               
Older 
 NEW POSTSNEW DISCUSSION POSTSThere should be a chat icon
Apparently "You must be
There is a replacement for
Regarding the physical
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
I think that we should expect
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
I think I understand your
by Jessica Taylor on The LearningTheoretic AI Alignment Research Agend...  0 likes 
This seems like a hack. The
by Jessica Taylor on The LearningTheoretic AI Alignment Research Agend...  0 likes 
After thinking some more,
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
Yes, I think that we're
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
My intuition is that it must
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
To first approximation, a
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
Actually, I *am* including
by Vadim Kosoy on The LearningTheoretic AI Alignment Research Agend...  0 likes 
Yeah, when I went back and
by Alex Appel on Optimal and Causal Counterfactual Worlds  0 likes 
> Well, we could give up on
by Jessica Taylor on The LearningTheoretic AI Alignment Research Agend...  0 likes 
> For another thing, consider
by Jessica Taylor on The LearningTheoretic AI Alignment Research Agend...  0 likes 
