Intelligent Agent Foundations Forum
1.Two Types of Updatelessness
discussion post by Abram Demski 36 days ago | discuss
2.Stable Pointers to Value II: Environmental Goals
discussion post by Abram Demski 43 days ago | 1 comment
3.An Untrollable Mathematician
post by Abram Demski 59 days ago | Alex Appel, Sam Eisenstat, Vadim Kosoy, Jack Gallagher, Jessica Taylor, Paul Christiano, Scott Garrabrant and Vladimir Slepnev like this | 1 comment

Follow-up to All Mathematicians are Trollable.

It is relatively easy to see that no computable Bayesian prior on logic can converge to a single coherent probability distribution as we update it on logical statements. Furthermore, the non-convergence behavior is about as bad as could be: someone selecting the ordering of provable statements to update on can drive the Bayesian’s beliefs arbitrarily up or down, arbitrarily many times, despite only saying true things. I called this wild non-convergence behavior “trollability”. Previously, I showed that if the Bayesian updates on the provability of a sentence rather than updating on the sentence itself, it is still trollable. I left open the question of whether some other side information could save us. Sam Eisenstat has closed this question, providing a simple logical prior and a way of doing a Bayesian update on it which (1) cannot be trolled, and (2) converges to a coherent distribution.

4.Where does ADT Go Wrong?
discussion post by Abram Demski 126 days ago | Jack Gallagher and Jessica Taylor like this | 1 comment
5.The Happy Dance Problem
post by Abram Demski 127 days ago | Scott Garrabrant and Stuart Armstrong like this | 1 comment

Since the invention of logical induction, people have been trying to figure out what logically updateless reasoning could be. This is motivated by the idea that, in the realm of Bayesian uncertainty (i.e., empirical uncertainty), updateless decision theory is the simple solution to the problem of reflective consistency. Naturally, we’d like to import this success to logically uncertain decision theory.

At a research retreat during the summer, we realized that updateless decision theory wasn’t so easy to define even in the seemingly simple Bayesian case. A possible solution was written up in Conditioning on Conditionals. However, that didn’t end up being especially satisfying.

Here, I introduce the happy dance problem, which more clearly illustrates the difficulty in defining updateless reasoning in the Bayesian case. I also outline Scott’s current thoughts about the correct way of reasoning about this problem.

6.Policy Selection Solves Most Problems
post by Abram Demski 116 days ago | Alex Appel and Vladimir Slepnev like this | 4 comments

It seems like logically updateless reasoning is what we would want in order to solve many decision-theory problems. I show that several of the problems which seem to require updateless reasoning can instead be solved by selecting a policy with a logical inductor that’s run for a small amount of time. The policy specifies how to make use of knowledge from a logical inductor which is run longer. This addresses the difficulties which seem to block logically updateless decision theory in a fairly direct manner. On the other hand, it doesn’t seem to hold much promise for the kind of insights which we would want from a real solution.

7.XOR Blackmail & Causality
discussion post by Abram Demski 129 days ago | discuss
8.Mixed-Strategy Ratifiability Implies CDT=EDT
post by Abram Demski 144 days ago | discuss

I provide conditions under which CDT=EDT in Bayes-net causal models.

9.Predictable Exploration
discussion post by Abram Demski 150 days ago | 5 comments
10.Smoking Lesion Steelman III: Revenge of the Tickle Defense
post by Abram Demski 170 days ago | Scott Garrabrant likes this | 2 comments

I refine the theory I put forward last time, locate it in the literature, and discuss the conditions under which this approach unifies CDT and EDT.

11.Smoking Lesion Steelman II
post by Abram Demski 175 days ago | Tom Everitt and Scott Garrabrant like this | 1 comment

After Johannes Treutlein’s comment on Smoking Lesion Steelman, and a number of other considerations, I had almost entirely given up on CDT. However, there were still nagging questions about whether the kind of self-ignorance needed in Smoking Lesion Steelman could arise naturally, how it should be dealt with if so, and what role counterfactuals ought to play in decision theory if CDT-like behavior is incorrect. Today I sat down to collect all the arguments which have been rolling around in my head on this and related issues, and arrived at a place much closer to CDT than I expected.

12.Comparing LICDT and LIEDT
post by Abram Demski 153 days ago | Alex Appel likes this | discuss

Attempted versions of CDT and EDT can be constructed using logical inductors, called LICDT and LIEDT. It is shown, however, that LICDT fails XOR Blackmail, and LIEDT fails Newcomb. One interpretation of this is that LICDT and LIEDT do not implement CDT and EDT very well. I argue that they are indeed forms of CDT and EDT, but stray from expectations because they also implement the ratifiability condition I discussed previously. Continuing the line of thinking from that post, I discuss conditions in which LICDT=LIEDT, and try to draw out broader implications for decision theory.

13.Stable Pointers to Value: An Agent Embedded in Its Own Utility Function
discussion post by Abram Demski 219 days ago | Tom Everitt, Scott Garrabrant and Vladimir Slepnev like this | discuss
14.Smoking Lesion Steelman
post by Abram Demski 265 days ago | Tom Everitt, Sam Eisenstat, Vadim Kosoy, Paul Christiano and Scott Garrabrant like this | 8 comments

It seems plausible to me that any example I’ve seen so far which seems to require causal/counterfactual reasoning is more properly solved by taking the right updateless perspective, and taking the action or policy which achieves maximum expected utility from that perspective. If this were the right view, then the aim would be to construct something like updateless EDT.

I give a variant of the smoking lesion problem which overcomes an objection to the classic smoking lesion, and which is solved correctly by CDT, but which is not solved by updateless EDT.

15.Futarchy Fix
post by Abram Demski 298 days ago | Scott Garrabrant and Stuart Armstrong like this | 9 comments

Robin Hanson’s Futarchy is a proposal to let prediction markets make governmental decisions. We can view an operating Futarchy as an agent, and ask if it is aligned with the interests of its constituents. I am aware of two main failures of alignment: (1) since predicting rare events is rewarded in proportion to their rareness, prediction markets heavily incentivise causing rare events to happen (I’ll call this the entropy-market problem); (2) it seems prediction markets would not be able to assign probability to existential risk, since you can’t collect on bets after everyone’s dead (I’ll call this the existential risk problem). I provide three formulations of (1) and solve two of them, and make some comments on (2). (Thanks to Scott for pointing out the second of these problems to me; I don’t remember who originally told me about the first problem, but also thanks.)
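To make the incentive behind the entropy-market problem concrete, here is a minimal arithmetic sketch. It assumes the simplest possible market, a binary contract bought at `price` that pays 1 if the event occurs; this is an illustrative assumption, not the specific market mechanism discussed in the post, and the function name `profit_if_event_occurs` is hypothetical.

```python
def profit_if_event_occurs(price: float, stake: float = 1.0) -> float:
    """Net profit from spending `stake` on binary contracts.

    Assumption (not from the post): each contract costs `price`
    and pays out 1 if the event occurs, 0 otherwise.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    contracts = stake / price   # number of contracts the stake buys
    return contracts - stake    # total payout minus what was spent


# The rarer the market believes the event to be (lower price),
# the larger the profit for someone who can make it happen.
for price in (0.5, 0.1, 0.01):
    print(f"price={price}: profit={profit_if_event_occurs(price):.2f}")
```

Under this assumption the profit per unit staked is (1 - price) / price, which grows without bound as the price approaches zero. That is the sense in which a trader who can cause a rare event is rewarded more the rarer the event was thought to be.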

16.An Approach to Logically Updateless Decisions
discussion post by Abram Demski 307 days ago | Sam Eisenstat, Jack Gallagher and Scott Garrabrant like this | 4 comments
17.Generalizing Foundations of Decision Theory II
post by Abram Demski 335 days ago | Sam Eisenstat, Vadim Kosoy, Jessica Taylor and Patrick LaVictoire like this | 4 comments

As promised in the previous post, I develop my formalism for justifying as many of the decision-theoretic axioms as possible with generalized Dutch-book arguments. (I’ll use the term “generalized Dutch-book” to refer to arguments with a family resemblance to Dutch-book or money-pump arguments.) The eventual goal is to relax these assumptions in a way which addresses bounded processing power, but for now the goal is to get as much of classical decision theory as possible justified by a generalized Dutch-book.

18.Generalizing Foundations of Decision Theory
discussion post by Abram Demski 391 days ago | Ryan Carey, Vadim Kosoy, Jessica Taylor and Scott Garrabrant like this | 8 comments
19.Questioning GLS-Coherence
discussion post by Abram Demski 643 days ago | discuss
20.Is logic epistemically appropriate?
discussion post by Abram Demski 668 days ago | Jessica Taylor likes this | discuss
21.You can't beat a troll by predicting it.
discussion post by Abram Demski 673 days ago | discuss
22.All Mathematicians are Trollable: Divergence of Naturalistic Logical Updates
post by Abram Demski 689 days ago | Jessica Taylor, Patrick LaVictoire, Scott Garrabrant and Vladimir Slepnev like this | 1 comment

The post on naturalistic logical updates left open the question of whether the probability distribution converges as we condition on more logical information. Here, I show that this cannot always be the case: for any computable probability distribution with naturalistic logical updates, we can show it proofs in an order which will prevent convergence. In fact, at any time, we can drive the probability of \(x\) up or down as much as we like, for a wide variety of sentences \(x\).

As an aid to intuition, I describe the theorem informally as “all mathematicians are trollable”. I was once told that there was an “all mathematicians go to Bayesian hell” theorem, based on the fact that a computable probability distribution must suffer arbitrarily large log-loss when trying to model mathematics. The idea here is similar. We are representing the belief state of a mathematician with a computable probability distribution, and trying to manipulate that belief state by proving carefully-selected theorems to the mathematician.

23.Naturalistic Logical Updates
post by Abram Demski 774 days ago | Patrick LaVictoire and Scott Garrabrant like this | 3 comments

Vadim pointed out in a comment to my post on logical counterfactuals that a very similar idea had been explained in a LessWrong post summarizing work by Vladimir Slepnev and Paul Christiano at a MIRI workshop in 2013. The algorithm which they suggested was called UDT 1.5. In fact, the essential idea is already argued by Vladimir Slepnev (cousin_it) in a post from 2012: Should logical probabilities be updateless too?

Here, I continue to develop these ideas and those in my logical dutch-book post. I present an alternate prior to the one which was used for UDT 1.5. I show that this new prior has naturalistic logical updates, a kind of improvement-endorsing property which seems likely to work well with UDT. This property also gets around Paul Christiano’s paradox of ignorance.

24.Slack Chat
discussion post by Abram Demski 786 days ago | Vadim Kosoy and Scott Garrabrant like this | discuss
25.Thoughts on Logical Dutch Book Arguments
post by Abram Demski 815 days ago | Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this | discuss

This post examines the application of Dutch Book arguments to logical uncertainty, as part of an attempt to fill out the ideas I speculated on in this post.





