discussion post by Stuart Armstrong 128 days ago

A putative new idea for AI control; index here.

In a previous post, the marginal gain of one utility in terms of another was given by a crisp formula.

This is obviously unrealistic, but how unrealistic is it? This post will look at various features we might expect from utility functions and acausally trading agents, and what makes them easy or difficult to trade with.

# Independence

The ideal trade is something like agent $$A_1$$ likes staples, agent $$A_2$$ likes green objects, and they both decide to make green staples.

Here the two utilities are almost independent: increasing one costs almost nothing from the perspective of the other.
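As a toy illustration (with made-up utility functions, not anything from the post's formal setup), near-independence means a single action can raise both utilities at almost no cost to either:

```python
# Toy sketch (hypothetical utilities): agent A1 counts staples, agent A2
# counts green objects. The utilities are nearly independent, so making
# green staples serves both at once.

def u_staples(objects):
    """A1's utility: number of staples, of any colour."""
    return sum(1 for o in objects if o["kind"] == "staple")

def u_green(objects):
    """A2's utility: number of green objects, of any kind."""
    return sum(1 for o in objects if o["colour"] == "green")

world = [{"kind": "staple", "colour": "green"}] * 10
print(u_staples(world), u_green(world))  # 10 10
```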

In general, independence (which could also be called low trade-offs) is very desirable between different utilities. This is not a transitive property: staples and green colour can be pretty independent, staples and blue colour as well, but obviously green and blue are in direct tension.

It’s plausible that there will exist whole families of mutually independent utilities, coming together as a package that can get maximised together or separately.

# Popularity

An obvious point, but utilities that are very popular - very widely shared by different agents - are more likely to be traded with, since there are more agents willing to trade in return. This willingness is non-linear in "double decrease" situations, for instance in trades that do not derive from some sort of timeless, pre-existence deal.

# Complexity and coordination

Then there’s the complexity of the preference. Simple utilities, or those whose existence is simple to deduce, are much easier to figure out and hence maximise.

This is partially a question of popularity, but it also reflects the fact that agents may have different priors (or, even with the same priors, different posteriors after updating on their own existence). In all those situations, simple preferences are likely to come up often.

There's another type of complexity: the complexity of increasing that utility. Suppose $$U$$ counts the number of named nodes in a network, where every node knows the name of every other node. This utility is not hard to describe, but it is useless for acausal trade, as a second network would not be connected to the first, so its nodes could not know the names of the first network's nodes. So $$U$$ can easily be maximised by causal trade, but not by acausal trade.

# Total, average, diminishing returns

Total utilitarian-like utilities are particularly easy to trade with: simply create more of the desired object. Average utilitarian-like utilities are almost un-tradable: extra copies of the desired object can be actively pernicious, and the best is for a single agent to be maximising them.

In between, we have utilities that exhibit diminishing returns in the number of desired objects. For instance, a $$U$$ that counts the number of unique, flourishing humans will discount identical copies, which are likely if the number of humans created gets very large.

Utilities that exhibit diminishing returns also interact differently with uncertainty about the number of agents out there. For a total-style utility, a $$1/\Omega$$ chance of $$\Omega$$ agents maximising $$U$$ is the same as one agent maximising $$U$$ with certainty, while for utilities that exhibit diminishing returns, this can be very different.
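One way to see this difference numerically (a sketch assuming a square-root form for diminishing returns, which is just one possible choice): under a $$1/\Omega$$ chance of $$\Omega$$ maximised copies, the total-style expectation stays fixed while the diminishing-returns expectation shrinks as $$\Omega$$ grows.

```python
import math

def expected_value(utility, omega):
    """Expected utility of the lottery: probability 1/omega of omega
    maximised copies, otherwise nothing (utility 0)."""
    return (1 / omega) * utility(omega)

def total(n):        # total-utilitarian-like: linear in the number of copies
    return n

def diminishing(n):  # one assumed diminishing-returns form
    return math.sqrt(n)

for omega in (1, 100, 10_000):
    # Total-style value is always 1; the diminishing one falls as 1/sqrt(omega).
    print(omega, expected_value(total, omega), expected_value(diminishing, omega))
```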

# Negatives and extortion

So far, we’ve been considering utilities that would be independent, given infinite resources: the only trade-offs are those where different utilities want to maximise different things given the same resources.

But some utilities are more strongly opposed than that. Blue and green maximisers don’t oppose each other directly, but an agent that wants more green is opposed to one that wants less green. A “human flourishing” type utility would be opposed to a utility that values human suffering (or, more realistically, a different definition of flourishing that values some state of affairs that exhibits what we’d call suffering among human-like agents).

A utility has a negative side if its value can be decreased by the actions of a causally disconnected agent. The utilities $$U$$ and $$V$$ are opposed if each agent would prefer a trading partner to maximise nothing rather than maximise some mix $$aU+bV$$. If two utilities are opposed, then at least one must have a negative side.
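The opposition test can be sketched numerically. In this toy model (entirely assumed: the world state is a single "amount of green", and a partner maximising nothing is taken to leave the world at the zero baseline), no mix of the two utilities is acceptable to both sides:

```python
# Toy model (assumed): the world state s is the amount of "green" in [-1, 1];
# a partner that maximises nothing leaves s = 0, giving both agents utility 0.
states = [i / 100 for i in range(-100, 101)]

def U(s):  # wants more green
    return s

def V(s):  # wants less green
    return -s

def mutually_acceptable_mix_exists(U, V, states, steps=101):
    """True if some mix a*U + (1-a)*V, once maximised by a partner, leaves
    BOTH agents at least as well off as the do-nothing baseline of 0.
    If no such mix exists, U and V are opposed in the post's sense."""
    for k in range(steps):
        a = k / (steps - 1)
        best = max(states, key=lambda s: a * U(s) + (1 - a) * V(s))
        if U(best) >= 0 and V(best) >= 0:
            return True
    return False

print(mutually_acceptable_mix_exists(U, V, states))  # False: U and V are opposed
```

By contrast, running the same test on two independent (or identical) utilities returns True, since a partner maximising the mix helps both.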

Obviously, utilities that don't oppose others will find trading easier. Those that directly oppose others may find that what they are offered in trade is simply that agents don't maximise the opposed utilities.

This is also connected with issues of extortion. Only utilities with negative sides are vulnerable to extortion (though all utilities and agents are vulnerable to being extorted in the sense of getting a bad trade deal).

Because it’s likely that different agents will have different trading algorithms, large acausal trade networks will likely include agents whose concept of a fair deal differs from that of the majority of the network.

This could be implemented similarly to Scott’s idea here, or possibly by some alien and currently unimaginable fairness procedure.

In any case, agents whose conceptions of “fair” are the broadest are likely to be included in the largest trade networks (the cost being, of course, that they are likely to derive less profit per other agent in the network).

# Update rules and unlikely agents

Once an agent $$A_1$$ updates on their own existence, they have a distribution of other likely agents. They can therefore model what happens when other agents update on their own existence.

Suppose that $$A_1$$ predicts that the agents it’s most likely to be able to trade with are of type $$A_n$$; but suppose it also predicts that $$A_n$$ agents will conclude that $$A_1$$ agents are rare. Then it will have difficulty trading. In contrast, if $$A_1$$ expects that all its favourable trading partners predict its own existence, then it will trade advantageously. This phenomenon is also subject to the usual double decrease.
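A minimal sketch of this effect, under an assumed multiplicative model of the double decrease (the model is illustrative, not derived in the post): each side scales its commitment by its credence in the other, so the realised gain falls off as the product of the two probabilities.

```python
def expected_trade_gain(p_partner_exists, q_partner_believes_in_us,
                        gain_if_both_commit=1.0):
    """Assumed multiplicative 'double decrease': A1 discounts by its credence
    that A_n partners exist, and A_n discounts by its credence that A1
    exists, so the gain shrinks twice."""
    return p_partner_exists * q_partner_believes_in_us * gain_if_both_commit

# Mutual confidence versus being thought rare by one's best partners:
print(expected_trade_gain(0.9, 0.9))  # roughly 0.81
print(expected_trade_gain(0.9, 0.1))  # roughly 0.09
```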

Because of all the above, it’s possible that agents and utilities could be part of multiple acausal trade networks, joining them (in part) for different reasons. Or maybe, because of the double decrease problem or threats between trade networks, it will turn out to almost always be better to be a member of a single one.
