Intelligent Agent Foundations Forum
A Difficulty With Density-Zero Exploration
discussion post by Alex Appel 62 days ago | 1 comment

Summary: If exploration rates decay to zero, the obvious way of ensuring that exploration occurs infinitely often (have a trader sell the sentence saying that exploration will happen) may fail when there are long delays before you get feedback on whether exploration happened, because slow feedback lets the trader go indefinitely far into (possible) debt. And if the trader budgets itself, it won't trade enough to extract unbounded money from the slowly decaying exploration rate.

So, Density-Zero Exploration was motivated by the following concern: a trader could mess up conditional utilities in a way that left the trader capable of taking the same action next turn, as detailed here. Of course, \(\epsilon\)-exploration takes care of this issue. Further, \(\epsilon\) doesn't need to remain constant over time: it can drop as \(\frac{1}{n}\) on the \(\overline{\mathbb{P}}\)-generable weighting that corresponds to the trades of the enforcer trader.

The obvious hope is that exploration would happen infinitely often along the subsequence, so any enforcer trader would eventually lose. Intuitively, that's how it works. But it's a bit harder to prove than I naively thought at first.

My first attempt at proving it (modulo some finicky details about \(\overline{\mathbb{P}}\)-generable weightings and how they aren’t always 1) was something along the lines of “in the limit, the probability of exploration on the \(n\)’th element of the subsequence will be very close to \(\frac{1}{n}\), and because of this, if there’s only finitely much exploration, it’s possible for a trader to get infinite money by selling stocks of the exploration sentence.”
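To make the first attempt concrete, here is a minimal sketch under simplifying assumptions that are not from the post: feedback is immediate, and the price of the sentence "exploration happens on step n" is exactly \(\frac{1}{n}\). The trader sells one share per step, and a share pays out 1 if exploration happened on that step. The function `seller_profit` and its parameters are hypothetical.

```python
def seller_profit(num_steps, exploration_steps):
    """Hypothetical trader: on step n (1-indexed) it sells one share of
    'exploration happens on step n' at an assumed price of exactly 1/n.
    A share pays out 1 if n is in exploration_steps, otherwise 0.
    Feedback is assumed to be immediate."""
    profit = 0.0
    for n in range(1, num_steps + 1):
        price = 1.0 / n
        payout = 1.0 if n in exploration_steps else 0.0
        profit += price - payout
    return profit


# Exploration happens only on steps 1, 2 and 3, then never again.
for num_steps in (10, 1_000, 100_000):
    print(num_steps, seller_profit(num_steps, {1, 2, 3}))
```

After the last exploration step, every further sale adds a positive \(\frac{1}{n}\), so the profit grows roughly like \(\ln N\) without bound; this is the "infinite money" the naive argument relies on.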

However, there’s a problem when you don’t get immediate feedback on whether the exploration step occurred. If you have to wait a very long time to hear whether the exploration step happened, then the strategy of “sell stocks in the exploration sentence” may leave the trader unboundedly in debt.
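The debt problem can be sketched with a crude proxy for the low end of the trader's possible holdings. The assumptions here are illustrative, not taken from the logical induction paper: the trader sells one share per day at price \(\frac{1}{n}\), feedback on day \(n\) arrives on day `feedback_day(n)` (a hypothetical delay function), every share resolved so far paid out 0, and every unresolved share is counted at its worst-case payout of 1.

```python
def worst_case_value(t, feedback_day):
    """Crude lower bound on the hypothetical seller's holdings at day t.

    On each day n <= t it sold one share of 'exploration happens on day n'
    at an assumed price of 1/n. Feedback on day n arrives on day feedback_day(n).
    Shares already resolved are assumed to have paid out 0 (no exploration);
    each unresolved share is counted at its worst-case payout of 1."""
    cash = sum(1.0 / n for n in range(1, t + 1))
    unresolved = sum(1 for n in range(1, t + 1) if feedback_day(n) > t)
    return cash - unresolved


for t in (10, 100, 1000):
    immediate = worst_case_value(t, lambda n: n)
    delayed = worst_case_value(t, lambda n: 2 ** n)
    print(t, round(immediate, 2), round(delayed, 2))
```

With immediate feedback (`lambda n: n`) the value stays positive and grows like the harmonic series. With exponentially delayed feedback (`lambda n: 2 ** n`) almost every share is still unresolved, so the worst case falls by roughly one unit per day.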

For most of the theorems in the logical induction paper, it was acceptable to take a very long time to exploit an arbitrage opportunity, because by assumption, arbitrage opportunities occurred infinitely often. Here, however, the frequency of exploration on the subsequence drops as \(\frac{1}{n}\), and the sum of \(\frac{1}{n}\) over a sufficiently sparse set of days is finite. So you can't get a contradiction with the logical inductor criterion if the trader only exploits a sparse subsequence of those days.
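A quick numeric check of the sparse-subsequence point, assuming the price on day \(n\) is exactly \(\frac{1}{n}\): a trader that only trades on days 1, 2, 4, 8, … collects less than 2 in total, while trading on every day collects an amount that grows without bound. The helper `income` is hypothetical.

```python
def income(days):
    """Total price collected by selling one share at price 1/n on each given day n."""
    return sum(1.0 / n for n in days)


sparse_days = [2 ** k for k in range(21)]  # days 1, 2, 4, ..., 2**20
all_days = range(1, 2 ** 20 + 1)           # every day up to 2**20

print(income(sparse_days))  # just under 2: a geometric series
print(income(all_days))     # about 14.44: a harmonic number, growing like ln(n)
```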

Therefore, with \(\frac{1}{n}\)-exploration and sparse feedback on exploration steps, the argument “ooh, there’s infinite money available by selling stocks in the exploration sentence” can’t guarantee that infinitely many exploration steps occur. The trader’s plausible value will either be unbounded below (because it sells lots of overpriced stocks faster than it gets feedback on whether they were worth anything), or bounded above (because waiting for feedback for budgeting purposes is slow enough that the trader cannot accumulate infinite money).
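The "bounded above" branch can be sketched in the same style. Assume, hypothetically, a budgeted trader that sells one share at price \(\frac{1}{n}\) on a trade day \(n\), then waits until feedback arrives on day `feedback_day(n)` before trading again, so it never holds more than one unresolved share. If feedback on day \(n\) arrives on day \(2n\), it only trades on days 1, 2, 4, 8, …, and its total income stays below 2.

```python
def budgeted_trader(feedback_day, horizon):
    """Hypothetical budgeted trader. On a trade day n it sells one share of
    'exploration happens on day n' at an assumed price of 1/n, then waits
    until feedback_day(n) before trading again, so at most one share is ever
    unresolved. Returns the trade days up to `horizon` and the total income."""
    trade_days = []
    day = 1
    while day <= horizon:
        trade_days.append(day)
        # Wait for feedback on this trade (and always move forward at least one day).
        day = max(feedback_day(day), day + 1)
    total_income = sum(1.0 / n for n in trade_days)
    return trade_days, total_income


days, total = budgeted_trader(lambda n: 2 * n, horizon=1000)
print(days)   # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
print(total)  # 1.998046875
```

Its debt is bounded, but so is its income; with immediate feedback (`lambda n: n`) the same trader would trade every day and collect the full harmonic series.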

I still very strongly expect that on any \(\overline{\mathbb{P}}\)-generable weighting of the sequence of days, there will be infinitely many exploration steps, but the obvious way of showing it fails.

by Alex Appel 61 days ago

Update: This isn’t really an issue. You just need to impose the assumption that there is some function \(f\) such that \(f(n) > n\), \(f(n)\) is computable in time polynomial in \(f(n)\), and you always find out whether exploration happened on turn \(f(n)\) within \(\mathcal{O}(f(n+1))\) days.
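A minimal sketch of checking the two numeric parts of this assumption on an initial segment. It reads the feedback requirement as "feedback on turn \(f(n)\) arrives by day `c * f(n + 1)` for some constant `c`"; that reading, the function name, and the parameters `feedback_day` and `c` are assumptions for illustration. Computability of \(f(n)\) in time polynomial in \(f(n)\) cannot be verified by a finite check like this, and passing for small \(n\) says nothing about larger \(n\).

```python
def satisfies_feedback_condition(f, feedback_day, c, n_max):
    """Check, for n = 1..n_max only, two numeric parts of the assumption:
    f(n) > n, and feedback on turn f(n) arrives by day c * f(n + 1).
    The requirement that f(n) be computable in time polynomial in f(n)
    cannot be tested this way."""
    for n in range(1, n_max + 1):
        if f(n) <= n:
            return False
        if feedback_day(f(n)) > c * f(n + 1):
            return False
    return True


def f(n):
    # Hypothetical choice of subsequence: turns 2, 4, 8, 16, ...
    return 2 ** n


print(satisfies_feedback_condition(f, lambda d: d, c=1, n_max=30))      # True
print(satisfies_feedback_condition(f, lambda d: d * d, c=4, n_max=30))  # False
```

The quadratic delay fails because \(d^2\) eventually outgrows any constant multiple of the next element of this exponential subsequence.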

This is just the condition that there’s a subsequence where good feedback is possible, which is discussed at length in section 4.3 of the logical induction paper.

If there’s a subsequence B (of your subsequence of interest, A) where you can get good feedback, then there are infinitely many exploration steps on subsequence B (and hence on A, since A contains B).

This post is hereby deprecated. Still right, just not that relevant.