Intelligent Agent Foundations Forumsign up / log in
by Jessica Taylor 135 days ago | link | parent

The true reason to do exploration seems to be because the agent believes the action it is taking will not lead to an irreversible trap, and because it believes that the action will reveal information about the true environment that enables a better policy later on, which in expectation up to the time horizon, outweighs the temporary loss incurred due to exploring.

My understanding of logical inductor exploration (e.g. in asymptotic decision theory) is that the exploration steps the agent learns from mostly don’t happen in its own lifetime, rather they happen in the lifetimes of similar but simpler agents. This allows exploration to work for single-shot problems such as 5 and 10. Intuitively, if you are in a 5 and 10 problem and your brain has size 10^1000, then you can simulate someone whose brain has size 10^999 doing a 5 and 10 problem, and thereby learn the relation between the agent’s action and how much utility they get. So each particular agent has some chance of exploring irrecoverably, but in aggregate not many of them will (and it’s hard to predict which will and which won’t).

As far as I can tell, the only strategy that doesn’t have some sort of targetable exploration behavior is Thompson sampling.

Thompson sampling still randomizes (it randomizes its belief about the world it’s in) and is therefore vulnerable to troll bridge.



by Alex Appel 134 days ago | link

A: While that is a really interesting note that I hadn’t spotted before, the standard formulation of exploration steps in logical inductor decision theory involve infinite exploration steps over all time, so even though an agent of this type would be able to inductively learn from what other agents do in different decision problems in less time than it naively appears, that wouldn’t make it explore less.

B: What I intended with the remark about Thompson sampling was that troll bridge functions on there being two distinct causes of “attempting to cross the bridge”. One is crossing because you believe it to be the best action, and the other is crossing because an exploration step occurred, and Thompson sampling doesn’t have a split decision criterion like this. Although now that you point it out, it is possible to make a Thompson sampling variant where the troll blows up the bridge when “crossing the bridge” is not the highest-ranked action.

reply



NEW LINKS

NEW POSTS

NEW DISCUSSION POSTS

RECENT COMMENTS

There should be a chat icon
by Alex Mennen on Meta: IAFF vs LessWrong | 0 likes

Apparently "You must be
by Jessica Taylor on Meta: IAFF vs LessWrong | 1 like

There is a replacement for
by Alex Mennen on Meta: IAFF vs LessWrong | 1 like

Regarding the physical
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think that we should expect
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

I think I understand your
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

This seems like a hack. The
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

After thinking some more,
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yes, I think that we're
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

My intuition is that it must
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

To first approximation, a
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Actually, I *am* including
by Vadim Kosoy on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

Yeah, when I went back and
by Alex Appel on Optimal and Causal Counterfactual Worlds | 0 likes

> Well, we could give up on
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

> For another thing, consider
by Jessica Taylor on The Learning-Theoretic AI Alignment Research Agend... | 0 likes

RSS

Privacy & Terms