by Paul Christiano 450 days ago | link | parent

What kind of object is $$Q$$? (I assume it’s not a string.) Are you directly specifying a distribution of preferences conditioned on observations? Or are you specifying a distribution over observations conditioned on preferences and then using inference? I assume the second case.

So given that $$Q$$ is a predictive model, why wouldn’t you also use $$Q$$ as your model for planning? What is the advantage of using two separate models? Has anyone proposed using separate models in this way?

To the extent that your model $$Q$$ is bad, it seems like you are just doomed to perform badly, and then you either need to abandon the model-based approach or come up with a better model. Adding a second model $$P$$ doesn’t sound promising at face value.

It may be interesting or useful to have two models in this way, but I think it’s an unusual architecture that requires some discussion.
