Intelligent Agent Foundations Forum
by Jessica Taylor 484 days ago

  • For #5, it seems like “capable of pivotal acts” is doing the work of implying that the systems are extremely powerful.
  • For #4, I think that selection pressure does not constrain the goal much, since different terminal goals produce similar convergent instrumental goals. I’m still uncertain about this, though; it seems at least plausible (though not likely) that an agent’s goals will be aligned with a given task if, e.g., its reproductive success is directly tied to performance on that task.
  • Agree on #2; I can kind of see it both ways too.
  • I’m also somewhat skeptical of #1. I usually think of it in terms of “how much of a competitive edge does general consequentialist reasoning give an AI project” and “how much of a competitive edge will safe AI projects have over unsafe ones, e.g. due to having more resources”.


by Owen Cotton-Barratt 483 days ago | Jessica Taylor and Nate Soares like this

For #5, OK, there’s something to this. But:

  • It’s somewhat plausible that stabilising pivotal acts will be available before world-destroying ones;
  • Actually, a supposition has already been smuggled in with “the first AI systems capable of performing pivotal acts”. Perhaps there will at no point be a system capable of a pivotal act. I’m not quite sure whether it’s appropriate to describe the collection of existing systems as together capable of pivotal acts if they will not act in concert. Perhaps we’ll have a collection of systems which, if aligned, would produce a win, or, if acting together towards an unaligned goal, would produce catastrophe. It’s unclear whether we necessarily get catastrophe if they each have different unaligned goals (though it’s certainly not a comfortable scenario).

I like your framing for #1.

by Jessica Taylor 483 days ago

I agree that things get messier when there is a collection of AI systems rather than a single one. “Pivotal acts” mostly make sense in the context of local takeoff. In nonlocal takeoff, one of the main concerns is that goal-directed agents not aligned with human values are going to find a way to cooperate with each other.
