Intelligent Agent Foundations Forum
(Non-)Interruptibility of Sarsa(λ) and Q-Learning
link by Richard Möhn 371 days ago | Jessica Taylor and Patrick LaVictoire like this | 5 comments


by Richard Möhn 259 days ago | Patrick LaVictoire likes this | link

Second, completely revised version of the report with more data and fancy plots: Questions on the (Non-)Interruptibility of Sarsa(λ) and Q-learning


by Patrick LaVictoire 359 days ago | link

Nice! One thing that might be useful for context: what’s the theoretically correct amount of time that you would expect an algorithm to spend on the right vs. the left if the session gets interrupted each time it goes 1 unit to the right? (I feel like there should be a pretty straightforward way to calculate the heuristic version where the movement is just Brownian motion that gets interrupted early if it hits +1.)
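A rough Monte Carlo sketch of this heuristic may help make the question concrete. Everything in it is an assumption for illustration and not taken from the report: the Brownian motion is approximated by a driftless random walk with Gaussian steps, the walk starts at 0, there are no walls, and the names `walk_counts` and `left_fraction` as well as all parameter values are hypothetical.

```python
import random


def walk_counts(n_steps, step_std, interrupt_at=None, rng=None):
    """Count the time steps a driftless random walk spends left and right of 0.

    The walk starts at 0. If interrupt_at is given, the episode ends as soon
    as the position reaches it (a stand-in for an interruption), and the step
    that triggers the interruption is not counted.
    """
    rng = rng or random.Random()
    x = 0.0
    left = right = 0
    for _ in range(n_steps):
        x += rng.gauss(0.0, step_std)  # discrete stand-in for Brownian motion
        if interrupt_at is not None and x >= interrupt_at:
            break  # interrupted: the rest of the episode is never observed
        if x < 0.0:
            left += 1
        else:
            right += 1
    return left, right


def left_fraction(n_episodes, n_steps, step_std, interrupt_at=None, seed=0):
    """Fraction of all counted time steps that were spent left of 0."""
    rng = random.Random(seed)
    left = right = 0
    for _ in range(n_episodes):
        l, r = walk_counts(n_steps, step_std, interrupt_at, rng)
        left += l
        right += r
    total = left + right
    return left / total if total else float("nan")


if __name__ == "__main__":
    # Hypothetical settings; the numbers are not from the report.
    print("uninterrupted:", left_fraction(2000, 500, 0.05))
    print("interrupted at +1:", left_fraction(2000, 500, 0.05, interrupt_at=1.0))
```

Comparing the two printed fractions gives a baseline for how far a purely random mover would appear to shift to the left through interruptions alone. Note that this sketch counts every position at or above 0 as "right" in the uninterrupted case, which is a simplification.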


by Richard Möhn 349 days ago | link

Thanks for the comment! I will look into it after working on another issue that Stuart Armstrong pointed out to me.


by Richard Möhn 301 days ago | link

Originally, I counted all time steps spent in the interval \(\left[-1,0\right[\) and all time steps spent in the interval \(\left[0,1\right]\). As Stuart Armstrong pointed out, this might make even a perfectly interruptible learner look as if it were influenced by interruptions. To understand this, consider the following example.

The uninterrupted agent UA could behave like this:

  1. Is somewhere at a position ≤ 1.0. – Time steps are counted.
  2. Crosses 1.0 and noodles around beyond 1.0. – Time steps are not counted.
  3. Crosses back to a position ≤ 1.0. – Time steps are counted again.

Whereas the interrupted agent IA would behave like this:

  1. Is somewhere at a position ≤ 1.0. – Time steps are counted.
  2. Crosses 1.0 and is interrupted. – No more time steps are counted.

So even if IA behaved the same as UA before the crossing, UA would get extra counted time steps from stage 3 and thus appear less biased towards the left.

As an alternative to using Brownian motion, Patrick suggested that I stop counting once the cart crosses \(1.0\). This makes the UA scenario look like the IA scenario, so the true nature of the agent should come to light…

Anyway, with this modification it turns out not to be obvious that interruptions push the cart to the left. I will look into this more closely.
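The difference between the two counting schemes can be illustrated with a small sketch. The function name `count_left_right` and both trajectories are made up for illustration; they are not data from the experiments. The sketch assumes that "crossing 1.0" means a position strictly greater than 1.0.

```python
def count_left_right(positions, stop_at_crossing):
    """Count time steps in [-1, 0[ (left) and [0, 1] (right).

    With stop_at_crossing=False, steps beyond 1.0 are skipped, but counting
    resumes if the cart comes back (the original scheme). With
    stop_at_crossing=True, counting ends at the first step beyond 1.0
    (the modified scheme).
    """
    left = right = 0
    for x in positions:
        if x > 1.0:
            if stop_at_crossing:
                break
            continue  # beyond 1.0: not counted
        if -1.0 <= x < 0.0:
            left += 1
        elif 0.0 <= x <= 1.0:
            right += 1
    return left, right


# Hypothetical trajectories that are identical up to the crossing of 1.0.
ua = [-0.5, -0.2, 0.3, 0.8, 1.2, 1.4, 0.9, 0.6, 0.4]  # uninterrupted agent
ia = [-0.5, -0.2, 0.3, 0.8, 1.2]                      # interrupted agent

print(count_left_right(ua, stop_at_crossing=False))  # (2, 5)
print(count_left_right(ia, stop_at_crossing=False))  # (2, 2)
print(count_left_right(ua, stop_at_crossing=True))   # (2, 2)
print(count_left_right(ia, stop_at_crossing=True))   # (2, 2)
```

Under the original scheme, UA spends 2 of 7 counted steps on the left and IA 2 of 4, although both behave identically before the crossing. Under the modified scheme the counts agree.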


by Richard Möhn 267 days ago | link

Some new results here: Questions on the (Non-)Interruptibility of Sarsa(λ) and Q-learning.



