Note: This describes an idea of Jessica Taylor’s.
The usual training procedures for machine learning models are not well-equipped to avoid rare catastrophes. To maintain the safety of powerful AI systems, it will be important to have training procedures that can learn efficiently from such events.
We can model this situation as a problem of exploration-only online bandit learning. We will show that agents that allocate more of their attention to risky inputs can achieve low regret on this problem more efficiently.
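To make the intuition concrete, here is a minimal toy simulation (a sketch under assumed parameters, not the setup from the post itself: the input types, the one-query-per-round exploration budget, and the two strategy names are all hypothetical). An agent that spends its exploration budget on risky inputs first eliminates catastrophic mistakes much sooner than one that explores uniformly.

```python
import random

def run_trial(strategy, n_types=100, n_risky=5, rounds=1000, rng=None):
    """One trial of a toy exploration game.

    Input types 0..n_types-1 arrive uniformly at random; the first
    n_risky types are 'risky'.  Each round the agent may query one
    not-yet-learned type (exploration).  A catastrophe is counted
    whenever a risky type arrives before the agent has learned it.
    """
    rng = rng or random.Random()
    risky = set(range(n_risky))
    learned = set()
    catastrophes = 0
    for _ in range(rounds):
        arrival = rng.randrange(n_types)
        if arrival in risky and arrival not in learned:
            catastrophes += 1
        unlearned = [t for t in range(n_types) if t not in learned]
        if unlearned:
            if strategy == "risk-first":
                # Spend the exploration budget on risky inputs first.
                risky_unlearned = [t for t in unlearned if t in risky]
                pick = rng.choice(risky_unlearned or unlearned)
            else:
                # Uniform exploration over all unlearned inputs.
                pick = rng.choice(unlearned)
            learned.add(pick)
    return catastrophes

rng = random.Random(0)
uniform = sum(run_trial("uniform", rng=rng) for _ in range(200))
risk_first = sum(run_trial("risk-first", rng=rng) for _ in range(200))
print(uniform, risk_first)  # risk-first incurs far fewer catastrophes
```

In this toy model the risk-first strategy learns all risky types within the first few rounds, so it can only suffer catastrophes in that brief window, whereas uniform exploration leaves risky types unlearned for many more rounds on average.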