My recent posts
discussion post by Paul Christiano 82 days ago | Ryan Carey, Jessica Taylor, Patrick LaVictoire, Stuart Armstrong and Tsvi Benson-Tilsen like this

Over at Medium, I’m continuing to write about AI control; here’s a roundup from the last month.

Many of these seem like interesting things to discuss here; would it be better to post each of these as a link when I write it?

# Strategy

• Prosaic AI control argues that AI control research should first consider the case where AI involves no “unknown unknowns.”
• Handling destructive technology tries to explain the upside of AI control if we live in a universe where we eventually need to build a singleton anyway.
• Hard-core subproblems explains a concept I find helpful for organizing research.

# Terminology and concepts
