Welcome!
post by Benja Fallenstein 1598 days ago | Abram Demski, Luke Muehlhauser, Nate Soares, Patrick LaVictoire and Vladimir Slepnev like this | 2 comments

Welcome to MIRI’s as-yet-unnamed new forum for technical Friendly AI research—whether or not it’s associated with MIRI! We want to provide a place for posting and discussing work on topics like the Löbian obstacle, updateless decision theory, and corrigibility. The LessWrong group blog has hosted some discussions on topics like these in the past, but Friendly AI research has never been entirely on topic there. By creating a forum focused entirely on research, we hope to make it easier to find, and have, interesting discussions of interesting new work.

The forum is world-readable, but posting and commenting will be invite-only. We have some ideas for a review process for submissions by non-members, but contributions from non-members will probably not be accepted until early 2015 at the earliest.

I’m hoping that having a forum where people are already familiar with some of our background assumptions will help MIRI communicate new research results more quickly. Currently, there are a number of things we haven’t finished writing up, not because the technical results are all that hard to write up, but because it is difficult to motivate them and provide enough surrounding context for them to be useful as a stand-alone tech report. Here are four topics I plan to post about in the next two weeks:

• A formalism for logically omniscient, perfectly Bayesian agents, which live in and are able to reason about a world that contains other such agents, and in which such agents aren’t “special” (using classical game theory—i.e., mixed strategies and Nash equilibria—to get around diagonalization problems). (First post: Topological truth predicates.)

• Using this: A variant of AIXI that is able to reason about a world containing other (variant) AIXIs which are equally powerful.
• A formal model of Stuart Armstrong’s utility indifference and problems with it (these are in the Corrigibility paper MIRI recently released), and a new variant we’ve been considering which avoids these problems (but may introduce others; we don’t know yet!).

• A result that proof-based UDT is optimal in a certain sense: For every particular decision theory, there is a “fair” decision problem (in a certain formal sense) on which that decision theory fails, but for every provably “fair” decision problem, there is an $$N$$ such that UDT with any proof length $$\ge N$$ succeeds on this decision problem.

• An idea that might point the way towards “safe” oracles, which try to predict what will happen in the world but don’t have incentives to manipulate their human operators in order to make them easier to predict. (This has some known issues and probably more not-yet-known ones, but seems like a promising avenue for further investigation.)
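
As a rough reading aid for the UDT optimality item in the list above, its quantifier structure can be sketched as follows. The notation is assumed here for illustration and is not taken from the planned post: $$T$$ ranges over decision theories, $$P$$ over decision problems, $$\mathrm{UDT}_n$$ stands for proof-based UDT with proof length $$n$$, and $$\mathrm{Fair}$$, $$\mathrm{ProvablyFair}$$ and $$\mathrm{Succeeds}$$ are placeholder predicates for the formal notions the post would define.

$$\forall T \;\exists P \;\bigl[\mathrm{Fair}(P) \wedge \neg\,\mathrm{Succeeds}(T, P)\bigr]$$

$$\forall P \;\bigl[\mathrm{ProvablyFair}(P) \rightarrow \exists N \;\forall n \ge N \;\, \mathrm{Succeeds}(\mathrm{UDT}_n, P)\bigr]$$

Read this way, the two halves need not conflict: in the second statement $$N$$ is chosen after $$P$$, so no single proof length is claimed to work for every fair problem.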

I’m also hoping that Eliezer and Nate will occasionally post here, though currently they’re even busier than me with writing a series of introductory papers about Friendly AI research. And of course I hope that some people who don’t work for MIRI will post here as well!

As you can see, the software is still fairly minimal and a little rough around the edges (though we do have LaTeX support). We hope to improve quickly! If you want to help us, the code is on GitHub. And if you find bugs, we hope you’ll let us know!

Naming suggestions are also welcome :-)

# Acknowledgements

Thanks to Elliott Jin for helping me make this happen! And thanks to Vladimir Slepnev for providing the initial impetus.

by Abram Demski 1598 days ago | Benja Fallenstein, Daniel Dewey and Nate Soares like this

$$\LaTeX$$ support? $$N^I C_E$$! I think this forum will help to gather discussions which otherwise might get somewhat lost on lesswrong, or not find a particular place to get going. In my experience, highly mathematical discussions will typically lack the fast back-and-forth which is typical of forums or mailing lists, since mathematical proposals take more time to create, read, and digest. I would encourage people not to be too quickly discouraged if it seems like no one is reading or responding.

by Daniel Dewey 1597 days ago | Eliezer Yudkowsky likes this

Thanks Benja, Elliott, and Vladimir for creating the site – it looks great.
