Intelligent Agent Foundations Forum
A new proposal for logical counterfactuals
link by Jack Gallagher 864 days ago | Jessica Taylor, Patrick LaVictoire and Scott Garrabrant like this | 3 comments

by Patrick LaVictoire 859 days ago | Vadim Kosoy likes this | link

Can you define more precisely what you mean by “censoring contradictions”?


by Jack Gallagher 856 days ago | link

By censoring I mean a specific technique for forcing the consistency of a possibly inconsistent set of axioms.

Suppose you have a set of deduction rules \(D\) over a language \(\ell\). You can construct a function \(f_D : P(\ell) \to P(\ell)\) that takes a set of sentences \(S\) and outputs all the sentences that can be proved in one step using \(D\) and the sentences in \(S\). You can also construct a censored \(f'_D\) by letting \(f'_D(S) = \{\phi\ |\ \phi \in f_D(S) \wedge \neg \phi \not\in S\}\).
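As a rough illustration of the censoring construction above, here is a minimal Python sketch. The string encoding of sentences, the `~` prefix for negation, and the toy `modus_ponens` rule are all assumptions made for the example; only the filter in `censor` corresponds to the definition of \(f'_D\).

```python
from typing import Callable, FrozenSet

Sentence = str
Step = Callable[[FrozenSet[Sentence]], FrozenSet[Sentence]]

def neg(phi: Sentence) -> Sentence:
    """Syntactic negation in a toy encoding: prefix the sentence with '~'."""
    return "~" + phi

def censor(f_D: Step) -> Step:
    """Build f'_D from f_D: keep a derived phi only if ~phi is not already in S."""
    def f_prime(S: FrozenSet[Sentence]) -> FrozenSet[Sentence]:
        return frozenset(phi for phi in f_D(S) if neg(phi) not in S)
    return f_prime

def modus_ponens(S: FrozenSet[Sentence]) -> FrozenSet[Sentence]:
    """Toy one-step rule set (an assumption): from 'a' and 'a->b' derive 'b'."""
    derived = set()
    for s in S:
        if "->" in s:
            antecedent, consequent = s.split("->", 1)
            if antecedent in S:
                derived.add(consequent)
    return frozenset(derived)

# An inconsistent starting set: p, p->q, and ~q.
S = frozenset({"p", "p->q", "~q"})
uncensored = modus_ponens(S)          # derives q, contradicting ~q
censored = censor(modus_ponens)(S)    # q is dropped because ~q is in S
```

In this toy case the uncensored step derives `q` even though `~q` is already present, while the censored step derives nothing.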


by Sam Eisenstat 814 days ago | link

I’m copying over some (lightly edited) conversation about this post from Facebook.

Sam Eisenstat: In the original counterexample to the Trolljecture, I guess you’re proposing that \(\mathrm{Con}(\mathrm{PA})\) be logically prior to \(A() = 1\), so we can go back and replace \(A() = 1\) with \(A() = 2\), but still have \(\mathrm{Con}(\mathrm{PA})\) to use in order to derive \(A() = 2 \:\Box\negthickspace\!\rightarrow U() = 10\)? Here I’m just using “\(\Box\negthickspace\!\rightarrow\)” to mean “counterfactually results in” without meaning to imply that this should be defined modally.

I agree that this is intuitively sensible, but it is difficult to give a general definition that agrees with intuition; it’s hard to even have an intuition in the general case. There has been some work along these lines; see for example Scott’s forum post, though you may already know about it since you’ve been talking with Scott. I’m pessimistic about this direction in general, though I’d be happy to talk further about it.

Jack Gallagher: What’s the generator behind your being pessimistic?

Sam Eisenstat: Well, like Daniel suggests, we can ask whether the ordering is subjective/based on an agent’s epistemic state. Unlike Daniel, I think there are problems with this. The best examples are things like transparent Newcomb’s problem, where the agent needs to counterfact on things that it already knows the fact of the matter about. You can try to find loopholes in this, e.g. maybe the agent is doing something wrong if it isn’t ignorant of certain facts, but I haven’t been able to find something that’s satisfying (e.g. doesn’t fall to other thought experiments).

It thus seems that an agent should just counterfact on things even though it already knows them to be true, but this is a weird sort of thing to call “epistemic”. It doesn’t just depend on the agent’s probability assignments to various statements. Thus, to go farther here would seem to require a richer idea of “agent” than we presently have.

The alternative is an objective temporal ordering on logical statements. (This is a terrible use of the excluded middle, since “subjective” and “objective” are smuggling in a lot, so the place to look is probably somewhere that feels like a third option.) Anyway, it seems absurd for there to be a fact of the matter as to whether 417*596 = 248532 comes temporally before or after 251*622 = 156122.

Part of this is just a feeling of arbitrariness because even if one of those did come before the other, we don’t have the tools to know which one. Still, it seems the tools we do have are the wrong shape for building something that would work. For example, there seems to be an affinity between the ideas of being “easy”, of depending on few things but being depended on by many, and of being temporally “early”. Patrick’s revived trolljecture is one attempt at formalizing it, but there was nothing deep about my counterexample; it’s just easy to generate statements that are difficult to evaluate. (There are different versions of this, like independent statements in undecidable theories, or hard SAT instances in propositional logic.)

I used the word “we” in the last paragraph, but maybe you have type-theory tools that are more suited to this?

Daniel Satanove: It’s easier to come up with UDT when you have CDT. Is there a version of logical counterfactuals that works in more intuitive cases, but fails on stranger edge cases like transparent Newcomb’s?

Sam Eisenstat: The EDT of logical counterfactuals is just EDT with a prior over logic (e.g. bounded Demski, something more sophisticated). Proof-based UDT is a special case of this; it prescribes actions if any reasonable prior would agree, and is silent otherwise.

Unfortunately, as shown by the trolljecture counterexample, there are counterfactuals that proof-based UDT gets wrong (the trolljecture basically says that what proof-based UDT does should be read as counterfactuals), and therefore that EDT gets wrong given any reasonable prior, so it’s hard to define a “domain of validity” for this (not that that’s necessary for it to be informative).

One candidate for the CDT of logical counterfactuals would just be putting a causal DAG structure on logical statements, like discussed in Scott’s forum post that I linked above. The TDT paper is informal, and I haven’t looked at it in a while, but I think that you can read it as proposing exactly this.

I believe that in all thought experiments where people have an intuition about what should be done, they are using this sort of reasoning plus strategy selection (i.e. updatelessness). I would be interested in seeing any thought experiments that people do not analyze in this way.

There isn’t much in the direction of actually defining such causal networks. Back when the trolljecture looked plausible, we tried to interpret it in terms of causal networks, which didn’t quite work (see …). I expect something like Patrick’s revived trolljecture to be even harder to view in this way, since it’s inherently approximate; a notion of approximate causal network would be more appropriate here, if one could be found. I don’t know of any proposal that (1) actually defines a causal network and (2) works on some easy examples.

Well, that’s not entirely true. You can just pick any ordering that puts the agent’s decision at the beginning, and use the resulting fully-connected DAG. This works wherever proof-based UDT works, but fails on the trolljecture, exactly because it assumes anything that provably follows from your agent’s action is causally downstream of it.
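To make the "decision first, fully connected" construction concrete, here is a small Python sketch. The three statements are borrowed from the discussion above, but the particular ordering and the helper names `fully_connected_dag` and `downstream` are assumptions for illustration only.

```python
from itertools import combinations

def fully_connected_dag(ordering):
    """Edges (earlier, later) for every pair of statements in a total order."""
    return list(combinations(ordering, 2))

def downstream(edges, node):
    """Statements with an incoming edge from `node` (every later one, here)."""
    return {later for earlier, later in edges if earlier == node}

# Hypothetical ordering that puts the agent's decision first.
ordering = ["A() = 1", "Con(PA)", "U() = 10"]
edges = fully_connected_dag(ordering)
after_decision = downstream(edges, "A() = 1")
```

With this ordering, `Con(PA)` lands downstream of the decision along with everything else, which is the kind of placement the trolljecture counterexample exploits.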

Tsvi BT: “One candidate for the CDT of logical counterfactuals would just be putting a causal DAG structure on logical statements”

How does this interact with logical updatelessness? What happens to your DAG as you learn new facts?

Sam Eisenstat: I’m not sure if it’s obvious, but the idea is that each node has a full table of counterfactuals. An instance of this has more data than just an ordering.

It’s not intended to be updateless; it’s a CDT/TDT-style theory rather than UDT-style. You could layer strategy selection on top of it (though it could be computationally difficult if your DAG is hard to work with).

When you learn new facts you condition the probability distribution associated with the DAG on them, like for any Bayesian network.
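A minimal Python sketch of that conditioning step, on a two-node network \(X \to Y\) evaluated by brute-force enumeration. The node names and all probabilities are made up for the example, and nothing here is specific to logical statements.

```python
from fractions import Fraction as F

# Hypothetical network X -> Y with invented probabilities.
p_x = {True: F(1, 2), False: F(1, 2)}
p_y_given_x = {
    True: {True: F(9, 10), False: F(1, 10)},
    False: {True: F(1, 5), False: F(4, 5)},
}

def joint(x, y):
    """P(X = x, Y = y), factored along the single edge X -> Y."""
    return p_x[x] * p_y_given_x[x][y]

def posterior_x(y_observed):
    """P(X | Y = y_observed): restrict the joint to the observation, renormalize."""
    z = sum(joint(x, y_observed) for x in (True, False))
    return {x: joint(x, y_observed) / z for x in (True, False)}

post = posterior_x(True)
```

Learning that \(Y\) is true here moves the probability of \(X\) from 1/2 to 9/11; the DAG itself is unchanged, only the distribution attached to it.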

Tsvi BT: So then this is dynamically inconsistent in the counterfactual mugging with a logical coin.

Sam Eisenstat: Oh yeah, “layer strategy selection on top” doesn’t work for logical uncertainty. Anyway, this was in response to Daniel’s request for a theory that “works in more intuitive cases”; I hope I didn’t give the impression that this would be hard to break.

On the other hand, you could try to “just do strategy selection” with respect to a particular prior if you’re willing to treat everything you learn as an observation. This is easy if you pick a coherent prior, but it’s uncomputable and it doesn’t let you pay counterfactual muggers with logical coins. Abram’s GLS stuff tries to do this with priors that don’t already know all logical facts. What do you think of that sort of approach?

Tsvi BT: I don’t have much more concrete to say. I currently view decision theory as being the questions of how to do logical updatelessness, and how to do strategy selection with a bounded reasoner. The GLS prior might be close, but there are basic open questions about how it works. “Strategic uncertainty” (or something) may or may not just boil down to logical uncertainty; the reason it might not is that you might actually have to define an optimization target other than “what Ideal Me thinks is the best strategy”, to take into account that actual versions of you are bounded in different ways. These two “components” (I wildly speculate) could combine to give a reflectively stable decision theory.

Daniel Satanove: Proof lengths probably aren’t the only way of doing the time step thing. Take the proof length definition of observation at time k, except delay all proofs of statements of the form a + b = c by one time step. This, or something sort of like this, will probably also work. Also, any strictly increasing function on observation time should work.

The observation at time k should be an epistemic thing that describes how an agent learns of new theorems, rather than some fundamental property of math.
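The two alternative clocks mentioned above can be sketched in Python as follows. The statement strings, the proof lengths, the regular expression for "a + b = c", and the choice of \(g(k) = 2k + 1\) as the strictly increasing function are all assumptions for illustration; real proof lengths would come from an actual proof search.

```python
import re

# Hypothetical pattern for statements of the form "a + b = c" over numerals.
ADDITION = re.compile(r"\d+ \+ \d+ = \d+")

def observation_time(statement, proof_length):
    """Proof-length clock, with addition facts delayed by one time step."""
    delay = 1 if ADDITION.fullmatch(statement) else 0
    return proof_length + delay

def rescaled_time(statement, proof_length, g=lambda k: 2 * k + 1):
    """Apply a strictly increasing function g to the observation time."""
    return g(observation_time(statement, proof_length))

# Invented proof lengths for three statements.
proofs = {"2 + 2 = 4": 3, "Con(PA) -> Con(PA)": 3, "1 + 1 = 2": 2}
times = {s: observation_time(s, n) for s, n in proofs.items()}
rescaled = {s: rescaled_time(s, n) for s, n in proofs.items()}
```

The delay can reorder statements relative to raw proof length, whereas the strictly increasing rescaling leaves the order of observations unchanged.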

Jack Gallagher: What do you mean by “an epistemic thing”?

Daniel Satanove: “In order to break the cycles, we want some notion of logical time from which influence can only propagate forward. Proof length seems, at least to me, to be the most (only?) natural way to pull something like that off. A proof of length k can be thought of as a logical observation at time k.”

You are restricting yourself to agents who do a breadth first search through proofs rather than anything that can do a more directed approach. Saying that you want some notion of “logical time” kind of sounds like you want something that is a property of the math rather than a property of the agent.





