Intelligent Agent Foundations Forum
by Jessica Taylor 317 days ago

  • For #5, it seems like “capable of pivotal acts” is doing the work of implying that the systems are extremely powerful.
  • For #4, I think that selection pressure does not constrain the goal much, since different terminal goals produce similar convergent instrumental goals. I’m still uncertain about this, though; it seems at least plausible (though not likely) that an agent’s goals are going to be aligned with a given task if e.g. their reproductive success is directly tied to performance on the task.
  • Agree on #2; I can kind of see it both ways too.
  • I’m also somewhat skeptical of #1. I usually think of it in terms of “how much of a competitive edge does general consequentialist reasoning give an AI project” and “how much of a competitive edge will safe AI projects have over unsafe ones, e.g. due to having more resources”.


by Owen Cotton-Barratt 317 days ago | Jessica Taylor and Nate Soares like this | link

For #5, OK, there’s something to this. But:

  • It’s somewhat plausible that stabilising pivotal acts will be available before world-destroying ones;
  • Actually, a supposition has already been smuggled in with “the first AI systems capable of performing pivotal acts”. Perhaps there will at no point be a single system capable of a pivotal act. I’m not quite sure whether it’s appropriate to talk about the collection of existing systems as together capable of pivotal acts if they will not act in concert. Perhaps we’ll have a collection of systems which, if aligned, would produce a win, or, if acting together towards an unaligned goal, would produce catastrophe. If they each have different unaligned goals, it’s unclear that we necessarily get catastrophe (though it’s certainly not a comfortable scenario).

I like your framing for #1.

by Jessica Taylor 317 days ago

I agree that things get messier when there is a collection of AI systems rather than a single one. “Pivotal acts” mostly make sense in the context of local takeoff. In nonlocal takeoff, one of the main concerns is that goal-directed agents not aligned with human values are going to find a way to cooperate with each other.
