Intelligent Agent Foundations Forum

How to contribute

This is a publicly visible discussion forum for foundational mathematical research in artificial intelligence. The goal of this forum is to move toward a more formal and general understanding of "robust and beneficial" AI systems, as discussed in the Future of Life Institute's research priorities letter and the Machine Intelligence Research Institute's technical agenda.

Like Math Overflow, the Intelligent Agent Foundations Forum has a tiered system for becoming a member. If you make an account with a Facebook login, you can contribute comments and links, e.g. to an external post on Medium, Authorea, or on a personal blog. These comments and links will initially be visible to forum members; if your contribution acquires a few Likes from members, it will become visible to all site visitors.

If you frequently link to some good original content that you have written, the administrators will invite you to become a full member. The details of this system are still being worked out, and will change as we get a larger community of users.

What are the main topics of this forum?

Broadly speaking, the topics of this forum concern the difficulties of value alignment: the problem of ensuring that machine intelligences of various levels adequately understand and pursue the goals that their developers actually intended, rather than getting stuck on some proxy for the real goal or failing in other unexpected (and possibly dangerous) ways. Because these failure modes become more devastating the further machine intelligence advances, MIRI’s goal is to work today on the foundations of goal systems and architectures that would work even when the machine intelligence has general creative problem-solving ability beyond that of its developers, and has the ability to modify itself or build successors.

In that context, there are many interesting problems that come up. Here is a non-exhaustive list of relevant topics:
  • Decision theory: One class of topics comes from the distortions that arise when an agent predicts its environment, including its own future actions or the predictions of other agents, and tries to make decisions based on those. The tools of classical game theory and decision theory begin to make substandard recommendations on Newcomblike problems, blackmail problems, and other topics in this domain, and formal models of decision theories have brought up entirely unexpected self-referential failure modes. This has spurred the development of some new mathematical models of decision theory and counterfactual reasoning. (MIRI research agenda paper on decision theory)
  • Logical uncertainty: In the classical formalism of Bayesian agents, the agent updates on new evidence in a way that makes use of all logical consequences. In any interesting universe (even, say, the theory of arithmetic), this is actually an impossible assumption. Any bounded reasoner must have a satisfactory way of dealing with hypotheses that may in fact be determined from the data, but which have not yet been deduced either way. There are some interesting and analogous models of coherent (or locally coherent) probability distributions on the theory of arithmetic. (MIRI research agenda paper on logical uncertainty)
  • Reflective world-models: The distinction between an agent and its environment is a fuzzy one. Performing an action in the environment (e.g. sabotaging one’s own hardware) can predictably affect the agent’s future inferential processes. Furthermore, there are some models of intelligence and learning in which the correct hypotheses about the agent itself are not accessible to the agent. In both cases, there has been some progress on building mathematical models of systems that represent themselves more sensibly. (MIRI research paper on reflective world-models)
  • Corrigibility: Many goal systems, if they can reason reflectively and strategically, will seek to preserve themselves (because otherwise, their current goal state will be less likely to be reached). This gives rise to a potential problem with communicating human value to a machine intelligence: if the developers make a mistake in doing so, the machine intelligence may seek ways to avoid being corrected. There are several models of this, and a few proposals. (MIRI research paper on corrigibility)
  • Self-trust and Vingean reflection: Informally, if an agent self-modifies to become better at problem-solving or inference, it should be able to trust that its modified self will be better at achieving its goals. As it turns out, there is a self-referential obstacle with simple models of this (akin to Gödel's second incompleteness theorem: a sufficiently strong formal system that proves its own consistency must be inconsistent), and one method of fixing it results in the possibility of indefinitely deferred actions or deductions. (MIRI research paper on Vingean reflection)
  • Value learning: Since human beings have not succeeded at specifying human values (citation: look at the lack of total philosophical consensus on ethics), we may in fact need the help of a machine intelligence itself to specify values for a machine intelligence. This sort of “indirect normativity” presents its own interesting challenges. (MIRI research paper on value learning)
Again, this list is not exhaustive! Besides the topics mentioned above, other relevant subjects for this forum include groundwork for self-modifying agents, abstract properties of goal systems, tractable theoretical or computational models of the topics above, and anything else that is directly connected to MIRI’s research mission.
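
As a small illustration of the decision theory item in the list above, here is a toy sketch of Newcomb's problem. It is only meant to show how two ways of scoring the same two actions can disagree. The payoffs ($1,000,000 and $1,000) are the conventional ones from the literature and are assumed here, as are the function names and the single "accuracy" parameter for the predictor; none of these come from this page or from any MIRI paper.

```python
"""Toy Newcomb's problem: two ways of scoring the same two actions."""

# Assumed payoffs (conventional values, not taken from this page).
OPAQUE_PRIZE = 1_000_000   # placed in the opaque box only if one-boxing was predicted
TRANSPARENT_PRIZE = 1_000  # always present in the transparent box


def evidential_values(accuracy: float) -> dict[str, float]:
    """Expected payoffs when the action is treated as evidence about the prediction.

    `accuracy` is the assumed probability that the predictor guessed the action correctly.
    """
    return {
        "one-box": accuracy * OPAQUE_PRIZE,
        "two-box": (1 - accuracy) * OPAQUE_PRIZE + TRANSPARENT_PRIZE,
    }


def causal_values(prob_box_full: float) -> dict[str, float]:
    """Expected payoffs when the box contents are held fixed regardless of the action."""
    base = prob_box_full * OPAQUE_PRIZE
    return {"one-box": base, "two-box": base + TRANSPARENT_PRIZE}


def best_action(values: dict[str, float]) -> str:
    """Return the action with the highest expected payoff."""
    return max(values, key=values.get)


if __name__ == "__main__":
    for accuracy in (0.5, 0.9, 0.99):
        values = evidential_values(accuracy)
        print(f"evidential, accuracy={accuracy}: {best_action(values)}")
    for prob in (0.0, 0.5, 1.0):
        values = causal_values(prob)
        print(f"causal, P(box full)={prob}: {best_action(values)}")
```

Under these assumptions, the causal scoring prefers two-boxing for every fixed belief about the box contents, while the evidential scoring switches to one-boxing once the predictor is accurate enough. The sketch does not model the newer decision theories discussed on this forum.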

It’s important for us to keep the forum focused, though; there are other good places to talk about subjects that are more indirectly related to MIRI’s research, and the moderators here may close down discussions on subjects that aren’t a good fit for this forum. Some examples of subjects that we would consider off-topic (unless directly applied to a more relevant area) include general advances in artificial intelligence and machine learning, general mathematical logic, general philosophy of mind, general futurism, existential risks, effective altruism, human rationality, and non-technical philosophizing.

Contact Us

You can reach us at with any questions.

Thanks for reading, and we look forward to your contributions to this forum!




