Welcome to MIRI’s as-yet-unnamed new forum for technical Friendly AI research—whether or not it’s associated with MIRI! We want to provide a place for posting and discussing work on topics like the Löbian obstacle, updateless decision theory, and corrigibility. The LessWrong group blog has hosted some discussions on topics like these in the past, but Friendly AI research has never been entirely on topic there. By creating a forum focused entirely on research, we hope to make it easier to find, and have, interesting discussions of interesting new work.
The forum is world-readable, but posting and commenting will be invite-only. We have some ideas for a review process for submissions by non-members, but contributions from non-members will probably not be accepted until early 2015 at the earliest.
I’m hoping that having a forum where people are already familiar with some of our background assumptions will help MIRI communicate new research results more quickly. Currently, there are a number of things we haven’t finished writing up, not because the technical results are all that hard to write up, but because it is difficult to motivate them and provide enough surrounding context for them to be useful as a stand-alone tech report. Here are four topics I plan to post about in the next two weeks:
- A formalism for logically omniscient, perfectly Bayesian agents, which live in and are able to reason about a world that contains other such agents, and in which such agents aren’t “special” (using classical game theory, i.e., mixed strategies and Nash equilibria, to get around diagonalization problems). (First post: Topological truth predicates.)
  - Using this: A variant of AIXI that is able to reason about a world containing other (variant) AIXIs which are equally powerful.
- A formal model of Stuart Armstrong’s utility indifference and problems with it (these are in the Corrigibility paper MIRI recently released), and a new variant we’ve been considering which avoids these problems (but may introduce others; we don’t know yet!).
- A result that proof-based UDT is optimal in a certain sense: For every particular decision theory, there is a “fair” decision problem (in a certain formal sense) on which that decision theory fails, but for every provably “fair” decision problem, there is an \(N\) such that UDT with any proof length \(\ge N\) succeeds on this decision problem.
- An idea that might point the way towards “safe” oracles, which try to predict what will happen in the world but don’t have incentives to manipulate their human operators in order to make them easier to predict. (This has some known issues and probably more not-yet-known ones, but seems like a promising avenue for further investigation.)
I’m also hoping that Eliezer and Nate will occasionally post here, though currently they’re even busier than me with writing a series of introductory papers about Friendly AI research. And of course I hope that some people who don’t work for MIRI will post here as well!
As you can see, the software is still fairly minimal and a little rough around the edges (though we do have LaTeX support). We hope to improve quickly! If you want to help us, the code is on GitHub. And if you find bugs, we hope you’ll let us know!
Naming suggestions are also welcome :-)
Thanks to Elliott Jin for helping me make this happen! And thanks to Vladimir Slepnev for providing the initial impetus.