by Vadim Kosoy 306 days ago

Your confusion arises because you are thinking about regret in an anytime setting. In an anytime setting, there is a fixed policy $$\pi$$; we measure the expected reward of $$\pi$$ over a time interval $$t$$ and compare it to the optimal expected reward over the same time interval. If $$\pi$$ has probability $$p > 0$$ of walking into a trap, regret has the linear lower bound $$\Omega(pt)$$.

On the other hand, I am talking about policies $$\pi_t$$ that explicitly depend on the parameter $$t$$ (I call this a “metapolicy”). Both the advisor and the agent policies are like that. As $$t$$ goes to $$\infty$$, the probability $$p(t)$$ of walking into a trap goes to $$0$$, so $$p(t)t$$ is a sublinear function of $$t$$.

A second difference from the usual definition of regret is that I use an infinite sum of rewards with geometric time discount $$e^{-1/t}$$ instead of a step function time discount that cuts off at $$t$$. However, this second difference is entirely inessential, and all the theorems work about the same with a step function time discount.
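To make the two points above concrete, here is a minimal numerical sketch. It is not taken from the original results: the function names and the schedule $$p(t) = t^{-1/2}$$ are assumptions chosen purely for illustration (the comment only requires that $$p(t) \to 0$$). It compares regret per unit time for a fixed trap probability with that of a $$t$$-dependent one, and checks that the total weight of the geometric discount $$e^{-1/t}$$ is close to $$t$$, which is one way to see why the step function cutoff at $$t$$ behaves similarly.

```python
import math

def anytime_trap_regret(p, t):
    # Fixed policy: trap probability p does not depend on t,
    # so the trap contributes regret on the order of p * t (linear in t).
    return p * t

def metapolicy_trap_regret(t, alpha=0.5):
    # Hypothetical schedule p(t) = t**(-alpha), an assumption for illustration;
    # the source only states that p(t) -> 0 as t -> infinity.
    p_t = t ** (-alpha)
    return p_t * t

def geometric_horizon(t):
    # Total weight of the discount e^{-1/t}: sum over n >= 0 of e^{-n/t}
    # equals 1 / (1 - e^{-1/t}); expm1 keeps this accurate for large t.
    return -1.0 / math.expm1(-1.0 / t)

for t in (10, 100, 1000, 10000):
    print(
        t,
        anytime_trap_regret(0.01, t) / t,   # stays constant at p
        metapolicy_trap_regret(t) / t,      # shrinks toward 0
        geometric_horizon(t) / t,           # approaches 1
    )
```

Under these assumptions, the second column stays constant while the third shrinks as $$t$$ grows, which is the sense in which $$p(t)t$$ is sublinear. The last column approaching $$1$$ reflects that the geometric discount has an effective horizon of roughly $$t$$.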
