Intelligent Agent Foundations Forum
Logical Inductors Converge to Correlated Equilibria (Kinda)
post by Alex Appel 1 day ago | Jessica Taylor likes this | discuss

Logical inductors of “similar strength”, playing against each other in a repeated game, will converge to correlated equilibria of the one-shot game, for the same reason that players who react to the past plays of their opponent converge to correlated equilibria. In fact, this proof is essentially just the proof from Calibrated Learning and Correlated Equilibrium by Foster and Vohra (1997), adapted to a logical inductor setting.
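
As a reminder of what the convergence target is, here is a minimal sketch that checks the standard definition of a correlated equilibrium for a two-player game: no player can gain in expectation by deviating from a recommended action. It says nothing about logical inductors or the proof itself; the function name, the dictionary layout, and the Chicken payoffs are all illustrative assumptions.

```python
from itertools import product


def is_correlated_equilibrium(joint, payoffs, tol=1e-9):
    """Check the correlated-equilibrium inequalities for a 2-player game.

    joint[(a0, a1)]      -> probability of recommending that action profile
    payoffs[i][(a0, a1)] -> payoff to player i (must cover every profile)
    Both layouts are assumptions made for this sketch.
    """
    for i in range(2):
        # Every action player i could be told to play, or deviate to.
        actions = sorted({profile[i] for profile in payoffs[i]})
        for recommended, deviation in product(actions, repeat=2):
            gain = 0.0
            for profile, prob in joint.items():
                if profile[i] != recommended:
                    continue
                deviated = list(profile)
                deviated[i] = deviation
                gain += prob * (payoffs[i][tuple(deviated)] - payoffs[i][profile])
            # A strictly profitable deviation violates the definition.
            if gain > tol:
                return False
    return True


# Illustrative game of Chicken: "C" = swerve, "D" = dare.
chicken = [
    {("C", "C"): 6, ("C", "D"): 2, ("D", "C"): 7, ("D", "D"): 0},  # row player
    {("C", "C"): 6, ("C", "D"): 7, ("D", "C"): 2, ("D", "D"): 0},  # column player
]

# Equal weight on every profile except mutual daring.
mixed_signal = {("C", "C"): 1 / 3, ("C", "D"): 1 / 3, ("D", "C"): 1 / 3, ("D", "D"): 0.0}
# Always recommending mutual swerving: each player wants to dare instead.
always_swerve = {("C", "C"): 1.0}

print(is_correlated_equilibrium(mixed_signal, chicken))
print(is_correlated_equilibrium(always_swerve, chicken))
```

The first distribution satisfies every inequality while being neither a pure nor a product distribution, which is what separates correlated equilibria from Nash equilibria.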

Logical Inductor Lemmas
discussion post by Alex Appel 1 day ago | discuss
Two Notions of Best Response
post by Alex Appel 1 day ago | discuss

In game theory, there are two different notions of “best response” at play. Causal best-response corresponds to standard game-theoretic reasoning, because it assumes that the joint probability distribution over everyone else’s moves remains unchanged if one player changes their move. The second one, evidential best-response, can model cases where the actions of the various players are not subjectively independent, such as Death in Damascus, Twin Prisoner’s Dilemma, Troll Bridge, Newcomb, and Smoking Lesion, and will be useful for analyzing the behavior of logical inductors in repeated games. This is just a quick rundown of the basic properties of these two notions of best response.
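
The contrast can be sketched for a two-player game as follows. This is one possible reading, not the post's formal definitions: the causal version evaluates each action against the unchanged marginal over the opponent's move, while the evidential version conditions the opponent's move on one's own action. The function names, the joint-distribution layout, and the Twin Prisoner's Dilemma payoffs are assumptions for illustration.

```python
def causal_best_response(joint, payoff, my_actions):
    """Best action against the marginal over the opponent's move,
    which is held fixed whatever action is chosen."""
    marginal = {}
    for (_, theirs), prob in joint.items():
        marginal[theirs] = marginal.get(theirs, 0.0) + prob

    def value(action):
        return sum(prob * payoff[(action, theirs)] for theirs, prob in marginal.items())

    return max(my_actions, key=value)


def evidential_best_response(joint, payoff, my_actions):
    """Best action when the opponent's move is conditioned on one's own action."""

    def value(action):
        mass = sum(prob for (mine, _), prob in joint.items() if mine == action)
        if mass == 0:
            # Conditioning on a probability-zero action is undefined;
            # this sketch simply rules such actions out.
            return float("-inf")
        return sum(
            prob * payoff[(action, theirs)]
            for (mine, theirs), prob in joint.items()
            if mine == action
        ) / mass

    return max(my_actions, key=value)


# Twin Prisoner's Dilemma: the twin always mirrors my move.
twin_joint = {("C", "C"): 0.5, ("D", "D"): 0.5}
# My payoff for (my move, twin's move); the numbers are illustrative.
pd_payoff = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

print(causal_best_response(twin_joint, pd_payoff, ["C", "D"]))      # D
print(evidential_best_response(twin_joint, pd_payoff, ["C", "D"]))  # C
```

With perfectly correlated moves, the causal calculation still treats the twin as a 50/50 coin and defects, while the evidential calculation sees that cooperating goes with being cooperated with.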

Doubts about Updatelessness
discussion post by Alex Appel 24 days ago | Abram Demski likes this | 2 comments
Computing an exact quantilal policy
discussion post by Vadim Kosoy 45 days ago | discuss
Resource-Limited Reflective Oracles
post by Alex Appel 46 days ago | Abram Demski and Jessica Taylor like this | discuss

Reflective oracles accurately answer questions about what arbitrary halting probabilistic oracle machines output. It is possible to make a variant of a reflective oracle that accurately answers questions about what sufficiently short-running Turing machines with access to the same oracle output.

No Constant Distribution Can be a Logical Inductor
discussion post by Alex Appel 50 days ago | Abram Demski, Jessica Taylor and Stuart Armstrong like this | 1 comment
Musings on Exploration
discussion post by Alex Appel 55 days ago | Vadim Kosoy likes this | 4 comments
Quantilal control for finite MDPs
post by Vadim Kosoy 57 days ago | Ryan Carey, Alex Appel and Abram Demski like this | discuss

We introduce a variant of the concept of a “quantilizer” for the setting of choosing a policy for a finite Markov decision process (MDP), where the generic unknown cost is replaced by an unknown penalty term in the reward function. This is essentially a generalization of quantilization in repeated games with a cost independence assumption. We show that the “quantilal” policy shares some properties with the ordinary optimal policy, namely that (i) it can always be chosen to be Markov, (ii) it can be chosen to be stationary when the time discount is geometric, and (iii) the “quantilum” value of an MDP with geometric time discount is a continuous piecewise rational function of the parameters, and it converges when the discount parameter \(\lambda\) approaches 1. Finally, we demonstrate a polynomial-time algorithm for computing the quantilal policy, showing that quantilization is not qualitatively harder than ordinary optimization.
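
For background, the ordinary one-shot quantilizer that this post takes as its starting point can be sketched as below: restrict a base distribution over actions to its top-\(q\) probability mass by utility and renormalize. This is not the quantilal policy or the polynomial-time algorithm from the post; the function name and the example numbers are assumptions for illustration.

```python
def quantilize(base, utility, q):
    """Return the base distribution restricted to its top-q mass by utility.

    base[action]    -> probability under the base distribution
    utility[action] -> utility estimate for that action
    q               -> fraction of base mass to keep, with 0 < q <= 1
    """
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    remaining = q
    result = {}
    # Walk through actions from highest to lowest utility.
    for action in sorted(base, key=utility.get, reverse=True):
        if remaining <= 0:
            break
        # Keep the whole action, or only the slice that fits in the quantile.
        mass = min(base[action], remaining)
        if mass > 0:
            result[action] = mass / q
        remaining -= mass
    return result


base = {"a": 0.5, "b": 0.25, "c": 0.25}
utility = {"a": 1.0, "b": 3.0, "c": 2.0}

print(quantilize(base, utility, 0.5))  # {'b': 0.5, 'c': 0.5}
```

Setting `q` to 1 returns the base distribution unchanged, and shrinking `q` toward 0 approaches ordinary maximization.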

A Difficulty With Density-Zero Exploration
discussion post by Alex Appel 62 days ago | 1 comment
Distributed Cooperation
post by Alex Appel 70 days ago | Abram Demski and Scott Garrabrant like this | 2 comments

Reflective oracles can be approximated by computing Nash equilibria. But is there some procedure that produces a Pareto-optimal equilibrium in a game, aka, a point produced by a Cooperative oracle? It turns out there is. There are some interesting philosophical aspects to it, which will be typed up in the next post.

The result is not original to me; it’s been floating around MIRI for a while. I think Scott, Sam, and Abram worked on it, but there might have been others. All I did was formalize it a bit, and generalize from the 2-player 2-move case to the n-player n-move case. With the formalism here, it’s a bit hard to intuitively understand what’s going on, so I’ll indicate where to visualize an appropriate 3-dimensional object.

Passing Troll Bridge
discussion post by Alex Appel 93 days ago | Abram Demski likes this | discuss
Why we want unbiased learning processes
post by Stuart Armstrong 96 days ago | discuss

Crossposted at Lesserwrong.

tl;dr: if an agent has a biased learning process, it may choose actions that are worse (with certainty) for every possible reward function it could be learning.

Two Types of Updatelessness
discussion post by Abram Demski 101 days ago | discuss
Stable Pointers to Value II: Environmental Goals
discussion post by Abram Demski 107 days ago | 1 comment
Further Progress on a Bayesian Version of Logical Uncertainty
post by Alex Appel 115 days ago | Scott Garrabrant likes this | 1 comment

I’d like to credit Daniel Demski for helpful discussion.

Strategy Nonconvexity Induced by a Choice of Potential Oracles
discussion post by Alex Appel 121 days ago | Abram Demski likes this | discuss
An Untrollable Mathematician
post by Abram Demski 124 days ago | Alex Appel, Sam Eisenstat, Vadim Kosoy, Jack Gallagher, Jessica Taylor, Paul Christiano, Scott Garrabrant and Vladimir Slepnev like this | 1 comment

Follow-up to All Mathematicians are Trollable.

It is relatively easy to see that no computable Bayesian prior on logic can converge to a single coherent probability distribution as we update it on logical statements. Furthermore, the non-convergence behavior is about as bad as could be: someone selecting the ordering of provable statements to update on can drive the Bayesian’s beliefs arbitrarily up or down, arbitrarily many times, despite only saying true things. I called this wild non-convergence behavior “trollability”. Previously, I showed that if the Bayesian updates on the provability of a sentence rather than updating on the sentence itself, it is still trollable. I left open the question of whether some other side information could save us. Sam Eisenstat has closed this question, providing a simple logical prior and a way of doing a Bayesian update on it which (1) cannot be trolled, and (2) converges to a coherent distribution.

Logical counterfactuals and differential privacy
post by Nisan Stiennon 125 days ago | Abram Demski and Scott Garrabrant like this | 1 comment

Edit: This article has major flaws. See my comment below.

This idea was informed by discussions with Abram Demski, Scott Garrabrant, and the MIRIchi discussion group.

More precise regret bound for DRL
post by Vadim Kosoy 156 days ago | Alex Appel likes this | discuss

We derive a regret bound for DRL reflecting dependence on:

  • Number of hypotheses

  • Mixing time of MDP hypotheses

  • The probability with which the advisor takes optimal actions

That is, the regret bound we get is fully explicit up to a multiplicative constant (which can also be made explicit). Currently we focus on plain (as opposed to catastrophe) and uniform (finite number of hypotheses, uniform prior) DRL, although this result can and should be extended to the catastrophe and/or non-uniform settings.

Value learning subproblem: learning goals of simple agents
discussion post by Alex Mennen 161 days ago | discuss
Oracle paper
discussion post by Stuart Armstrong 165 days ago | Vladimir Slepnev likes this | discuss
Being legible to other agents by committing to using weaker reasoning systems
post by Alex Mennen 175 days ago | Stuart Armstrong and Vladimir Slepnev like this | 1 comment

Suppose that an agent \(A_{1}\) reasons in a sound theory \(T_{1}\), and an agent \(A_{2}\) reasons in a theory \(T_{2}\), such that \(T_{1}\) proves that \(T_{2}\) is sound. Now suppose \(A_{1}\) is trying to reason in a way that is legible to \(A_{2}\), in the sense that \(A_{2}\) can rely on \(A_{1}\) to reach correct conclusions. One way of doing this is for \(A_{1}\) to restrict itself to some weaker theory \(T_{3}\), which \(T_{2}\) proves is sound, for the purposes of any reasoning that it wants to be legible to \(A_{2}\). Of course, in order for this to work, not only would \(A_{1}\) have to restrict itself to using \(T_{3}\), but \(A_{2}\) would have to trust that \(A_{1}\) had done so. A plausible way for that to happen is for \(A_{1}\) to reach the decision quickly enough that \(A_{2}\) can simulate \(A_{1}\) making the decision to restrict itself to using \(T_{3}\).

Why DRL doesn't work for arbitrary environments
discussion post by Vadim Kosoy 178 days ago | discuss
Stable agent, subagent-unstable
discussion post by Stuart Armstrong 180 days ago | discuss






