Intelligent Agent Foundations Forumsign up / log in
Density Zero Exploration
post by Alex Mennen 310 days ago | Abram Demski, Paul Christiano and Scott Garrabrant like this | discuss

The idea here is due to Scott Garrabrant. All I did was write it.

Let’s say a logical induction-based agent is making an infinite sequence of decisions, and is using \(\varepsilon\)-exploration on each decision. There are two desirable criteria, which are somewhat in conflict:

First, we want there to be enough exporation that traders attempting to bet that good strategies would have bad outcomes (and thus prevent the good strategies from being tried, so that the bet never gets settled) will lose arbitrarily large amounts of money if they try doing that every time. This requires that in total, there is an infinite amount of exploration. For example, if the agent \(2^{-n}\)-explores on step \(n\), then it is possible for a sufficiently wealthy malicious trader to bet against a good strategy by enough that the agent will avoid it every time, without the trader losing all its money, because the actions it is discouraging only are taken anyway finitely many times. But if the agent \(\varepsilon\)-explores on step \(n\) for some fixed \(\varepsilon>0\), then this is not possible, because each action is taken infinitely many times no matter what any of the traders do, so no trader can consistently make some good action appear bad without losing all its money.

Second, we want there to be sufficiently little exploration that the agent does not sacrifice a nontrivial amount of value to it. If actions only have short-term effects, then it is enough for the probability of exploration to approach \(0\) as \(n\rightarrow\infty\), in order for the agent to behave optimally in the limit (if actions can have lasting consequences, then this is not enough; for instance, if there is an action that destroys all value forever if it is ever taken, then that action needs to never be taken; this directly conflicts with the first criterion). For example, if the agent \(\varepsilon\)-explores on step \(n\) for some fixed \(\varepsilon>0\), then it never gets any closer to acting optimally, but if it \(2^{-n}\)-explores on step \(n\), then its probability of acting optimally will approach \(1\) as \(n\rightarrow\infty\).

Fortunately, there are sequences that converge to \(0\) but whose sum diverges, like \(\frac{1}{n}\), so it is possible to satisfy both of these criteria. However, if there are important differences among what the agent should do for different steps, then this might not be enough. For example, if the agent \(\frac{1}{n}\)-explores on step \(n\), and it is particularly important what action the agent takes on steps that are powers of \(2\), then a wealthy malicious trader could bet against good actions on every step that is a power of \(2\), then on the \(k\)th time the malicious trader does this, the agent \(2^{-k}\)-explores, and the malicious trader will only lose a finite amount of money doing this. Thus we should strengthen the first criterion to ensure that a wealthy malicious trader cannot bet against a good strategy infinitely many times, rather than that it cannot bet against a good strategy every time. Thus, for every efficiently computable infinite subset \(X\subseteq\mathbb{N}\), we want an infinite amount of exploration to occur on steps in \(X\), so that no malicious trader can bet against a good strategy on every step in \(X\) without running out of money (we only need to consider efficiently computable subsets because only efficiently computable traders participate in the market, and they cannot pick out sets that are not efficiently computable).

To do this, pick some computable probability distribution \(p\) over all efficiently computable subsets of \(\mathbb{N}\), which does not assign probability \(0\) to any of them. This can be done computably by picking a probability distribution over programs that provably run in polynomial time. For each efficiently computable set \(X\subseteq\mathbb{N}\), we want to explore with probability at least \(\frac{p\left(X\right)}{k}\) on the \(k\)th element of \(X\). Formally, let \(X_{n}\) be \(\frac{1}{k}\) if \(n\) is the \(k\)th element of \(X\), and \(0\) if \(n\notin X\). On step \(n\), we explore with probability \(\sum_{X}p\left(X\right)X_{n}\). Since we explore with probability at least \(\frac{p\left(X\right)}{k}\) on the \(k\)th element of \(X\), this satisfies the strengthened version of the first condition. Since \(\sum_{X}p\left(X\right)=1\), \(X_{n}\leq1\) for every \(X\) and \(n\), and \(X_{n}\rightarrow0\) as \(n\rightarrow\infty\) for every \(X\), \(\sum_{X}p\left(X\right)X_{n}\rightarrow0\) as \(n\rightarrow\infty\), so the second condition is also satisfied.



NEW LINKS

NEW POSTS

NEW DISCUSSION POSTS

RECENT COMMENTS

I found an improved version
by Alex Appel on A Loophole for Self-Applicative Soundness | 0 likes

I misunderstood your
by Sam Eisenstat on A Loophole for Self-Applicative Soundness | 0 likes

Caught a flaw with this
by Alex Appel on A Loophole for Self-Applicative Soundness | 0 likes

As you say, this isn't a
by Sam Eisenstat on A Loophole for Self-Applicative Soundness | 1 like

Note: I currently think that
by Jessica Taylor on Predicting HCH using expert advice | 0 likes

Counterfactual mugging
by Jessica Taylor on Doubts about Updatelessness | 0 likes

What do you mean by "in full
by David Krueger on Doubts about Updatelessness | 0 likes

It seems relatively plausible
by Paul Christiano on Maximally efficient agents will probably have an a... | 1 like

I think that in that case,
by Alex Appel on Smoking Lesion Steelman | 1 like

Two minor comments. First,
by Sam Eisenstat on No Constant Distribution Can be a Logical Inductor | 1 like

A: While that is a really
by Alex Appel on Musings on Exploration | 0 likes

> The true reason to do
by Jessica Taylor on Musings on Exploration | 0 likes

A few comments. Traps are
by Vadim Kosoy on Musings on Exploration | 1 like

I'm not convinced exploration
by Abram Demski on Musings on Exploration | 0 likes

Update: This isn't really an
by Alex Appel on A Difficulty With Density-Zero Exploration | 0 likes

RSS

Privacy & Terms