The main issue I have with UDT is that it neglects the meta-reasoning problem: "how much should I think before I act?"
Is there anything I should read or know about regarding this?
What are people’s opinions on whether this is a serious issue, and how it could be resolved? What is the relation to logical updatelessness?
There’s generally an opportunity cost to deliberating.
Solving the UDT planning problem would take infinite compute, so it seems we should be considering agents that can start acting before they have solved it.
Maybe they should converge to doing what UDT would do. Alternatively, maybe it’s better to do empirical updates in this situation.
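The opportunity-cost point above can be made concrete with a toy value-of-computation rule: keep deliberating only while the expected marginal improvement in decision quality exceeds the per-step cost of delay. This is a minimal illustrative sketch, not anything from the UDT literature; the diminishing-returns gain function and the cost constant are hypothetical assumptions chosen for illustration.

```python
def deliberation_steps(gain_per_step, cost_per_step, max_steps=100):
    """Toy stopping rule: think for another step only while the marginal
    gain from thinking exceeds the opportunity cost of the delay.

    gain_per_step: function mapping steps-already-taken -> marginal gain
    cost_per_step: constant opportunity cost of one more step of thought
    """
    steps = 0
    while steps < max_steps and gain_per_step(steps) > cost_per_step:
        steps += 1
    return steps

# Hypothetical diminishing returns: the k-th extra step of thought
# improves the decision by 1/(k+1) units of expected value.
diminishing_gain = lambda k: 1.0 / (k + 1)

# With a delay cost of 0.2 per step, the agent stops once the
# marginal gain drops to the cost: it thinks for 4 steps, then acts.
print(deliberation_steps(diminishing_gain, cost_per_step=0.2))  # → 4
```

The sketch just dramatizes the question in the post: a realistic agent must cut deliberation off somewhere, and whatever rule it uses to do so is itself a decision UDT's idealized planning step does not account for.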