1. The Importance of AI Morality
The “artificial intelligence” programs in practical use today are sufficiently primitive that their morality (or otherwise) is not a serious issue. They are intelligent, in a sense, in narrow domains -- but they lack autonomy; they are operated by humans, and their actions are integrated into the sphere of human or physical-world activity directly via human actions. If such an AI program is used to do something immoral, some human is to blame for setting the program up to do such a thing.
Not too far in the future, however, things are going to be different. AI’s will possess true artificial general intelligence (AGI), not necessarily emulating human intelligence, but equaling and likely surpassing it. At this point, the morality or otherwise of AGI’s will become a highly significant issue.
The speculative notion of the “hard takeoff” makes the AGI morality issue even more critical. This term refers to a projected phenomenon in which a roughly-human-level AGI focuses some of its attention on modifying its own codebase, with the goal of making itself more intelligent. In theory, this kind of intelligence-increase-directed self-modification should lead to an exponential increase in intelligence. Of course, we have no way to estimate the exponent in this hypothesized growth curve, and there is always the possibility that some unforeseen limits to the intelligence achievable by intelligence-increase-directed self-modification will be discovered. In spite of these caveats, however, the “hard takeoff” scenario does seem to be a plausible one, and its mere plausibility gives some added urgency to the issue of AGI morality. The term “seed AI” is sometimes used to refer to an AGI that carries out intelligence-increase-directed self-modification, leading to a hard takeoff. Among believers in the plausibility of a hard takeoff, intuitive estimates of the actual speed of the envisioned takeoff vary. Some say minutes or hours, others say years.
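To make the growth-curve caveats concrete, here is a minimal Python sketch. It is not a model of any real system. It assumes that each self-modification step raises a scalar “intelligence level” by a fixed fraction of its current value. The function name simulate_takeoff, the rate parameter, and the optional ceiling are all illustrative assumptions, not quantities anyone knows how to estimate.

```python
def simulate_takeoff(initial, rate, steps, ceiling=None):
    """Return the hypothetical intelligence level after each self-modification step.

    Each step adds `rate * level`, so growth is proportional to the current level.
    If `ceiling` is given, the gain is damped by (1 - level / ceiling), a
    logistic-style stand-in for "unforeseen limits" on self-improvement.
    """
    levels = [initial]
    level = initial
    for _ in range(steps):
        gain = rate * level
        if ceiling is not None:
            # Damp the gain as the level approaches the assumed ceiling.
            gain *= max(0.0, 1.0 - level / ceiling)
        level += gain
        levels.append(level)
    return levels


# Illustrative numbers only: the rate and the ceiling are assumptions.
unbounded = simulate_takeoff(1.0, 0.5, 20)
bounded = simulate_takeoff(1.0, 0.5, 20, ceiling=100.0)
print(round(unbounded[-1], 1), round(bounded[-1], 1))
```

Without a ceiling the level grows geometrically, which is the discrete analogue of the exponential curve described above. With a ceiling the same starting point and rate level off, which is the “unforeseen limits” case. Nothing in the sketch says which regime is the realistic one.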
The “hard takeoff” ties in with the more general notion of “the Singularity” (Kurzweil, 2002; Yudkowsky, 2001) – a term which refers, roughly speaking, to a point in future history at which technological change occurs so fast that the individual human mind can no longer follow what’s happening even at a broad level. Plotting curves of technological progress in various areas shows that, if current trends continue, we will reach some sort of technological Singularity around 2040-2060. One may argue that the inertia of human systems will cause the rate of technological progress to flatten out at a certain point. On the other hand, if a point is reached at which most progress is carried out by AGI’s, then from that point on the “human inertia factor” would seem not to apply anymore. There are many uncertainties, but at the very least, I believe the notion of a technological Singularity driven by AGI’s discovering new technology and science is a plausible and feasible one. It certainly might happen, and personally, I hope very much that it will. But if, at some point in the future, most progress is going to be achieved by AGI’s, this makes it all the more important to think about the morality of these AGI’s that are going to lead us into the future.
I will introduce here the notion of a “value hierarchy” and a “value dual network,” and propose that value systems consistent with the general “dual network” structure are more likely to survive successive intelligence-increase-directed self-modifications. I will also present a specific set of values, discuss how it fits into the value dual network framework, and then how it might pragmatically be instilled in an AI via a combination of explicit programming and experiential teaching. The value system of my own Novamente AI system will be used briefly as an example (Goertzel, Pennachin and Bugaj, 2002).
Eliezer Yudkowsky has recently put forth a fairly detailed theory of what he calls “Friendly AI,” which is one particular approach to instilling AGI’s with morality (Yudkowsky, 2001a). The ideas presented here, in this (much briefer) essay, are rather different from Yudkowsky’s, but they are aiming at roughly the same goal. I will discuss Yudkowsky’s ideas in a little more detail in Section 4 below. My basic conclusion is that Yudkowsky’s approach involves the artificial creation of a value hierarchy that violates the basic dual network structure of mind, and hence is very unlikely to succeed at producing an AGI value system that is robust enough to survive successive, drastic intelligence-increase-directed self-modifications.
Of course, given that we don’t actually have any AGI systems yet, all the ideas discussed here are highly speculative and hypothetical. This essay should be considered as an early attempt by an AGI-focused researcher to come to grips with some very thorny issues, which he realizes will be much more thoroughly addressable once AGI research itself is further along.
2. Two Questions About AGI Morality
The study of AGI morality involves many related questions, of which I will here consider only two:
Question 1: Which moral systems are likely to survive over time among AGI’s?
Question 2: How can we create AGI’s so as to render them likely to act in accordance with some particular moral system that we consider desirable?
Regarding the second question, different people may well have different goals for the morality of AGI’s. For instance, a member of the Jewish faith might consider it highly important for an intelligent AGI to obey Jewish moral strictures.
Personally, I do not adhere to any particular, rigidly defined cultural or religious moral system. But as a human being with his own individual value system, involved in an attempt to create a “seed AI” with indefinitely-expanding artificial general intelligence, there are nevertheless general moral principles that I would like to see an AGI have.
For one thing, I would like an AGI to consider human beings as having a great deal of value. I would prefer, for instance, if the Earth did not become densely populated with AGI’s that feel about humans as most humans feel about cows and sheep – let alone as most humans feel about ants or bacteria, or instances of Microsoft Word. To see the potential problem here, consider the possibility of a future AGI whose intelligence is as much greater than ours, as ours is greater than that of a sheep or an ant or even a bacterium. Why should it value us particularly? Perhaps it can create creatures of our measly intelligence and complexity level with hardly any effort at all. In that case, can we really expect it to value us significantly? This is not an easy question.
Beyond my attachment to my own species, there are many general values that I hold, that I would like future AGI’s to hold. For example, I would like future AGI’s to place a significant value on:
Of course, these are not very precisely defined values, and in many cases they will contradict each other, giving rise to difficult pragmatic choices. But intuitively, I feel that an AGI with these values is going to be a positive force in the universe – where by “positive” I mean “in accordance with Ben Goertzel’s value system”.
The rest of this essay will refer frequently to the specific values I have just listed. Of course, this is not a complete listing of my own personal value system, and I realize that other humans, including many whom I respect, have somewhat different value systems. Much of the discussion here is not tied to the precise details of this list of values. However, in thinking about these issues, I found it necessary to have a particular list of values in mind.
Having articulated a set of values that I would like future AGI’s to possess, the two questions posited above may then be cast in more particular form:
Question 1: Suppose these values are embodied in an AGI; then how likely are these values to persist through the AGI’s natural, self-modification-driven “evolution”?
Question 2: How can one go about causing the AGI’s one creates to possess these values?
3. The Structure and Diversity of Human Value Systems
The values that I have articulated above are extremely general in nature, and this is intentional. My view on AGI morality arises from a philosophical distinction I like to make, between basic values and derived values.
There may of course be tremendous variation in derived values among minds sharing roughly the same basic values. Among humans, there is wide (though by no means universal or perfect) agreement on a set of basic values, but the derived values in use are highly culture-dependent, often with a very arbitrary feel to them.
Of course, this division into basic versus derived values is an approximation to the nature of real value systems. Real value systems are inevitably heterarchies as well as hierarchies, with “basic” values gathering some of their strength from “derived” values as well as vice versa. However, it is my hypothesis that the hierarchical structure of value systems is both significant and important, and it is this aspect of their structure that I choose to focus upon here.
The notion of a value hierarchy/heterarchy is a special case of the notion of a dual network (Goertzel, 1993, 1994, 1997), which I have proposed to be one of the essential high-level emergent structures of the mind. It is fairly clear that the dual network structure is important to the human mind; that it will be important to future AGI minds as well is a conjecture that I believe highly plausible. One aspect of the dual network structure will be important for the discussion of value systems to be presented here: generally, in a dual network, the higher-up elements in the hierarchy are more abstract. Another aspect is that, in a dual network, control relationships are roughly aligned with informational relationships. Thus, the part of the mind that deals with the concept A, which is an abstract superconcept of concepts B and C, will also likely have a high degree of control over the parts of the mind dealing with concepts B and C. The value hierarchy/heterarchy I am proposing here is a subnetwork of the overall mental dual network that I have proposed before; and the basic values are the values near the top of this particular subnetwork of the dual network.
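The two structural properties just described (higher elements are more abstract, and control follows the same links as information) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the ValueNode class, the numeric abstraction score, and the violations function are hypothetical names, and no method for actually measuring the abstractness of a concept is implied.

```python
from dataclasses import dataclass, field


@dataclass
class ValueNode:
    """A node in a toy hierarchy; `children` are both informed and controlled by it."""
    name: str
    abstraction: float  # hypothetical score: higher means more abstract
    children: list = field(default_factory=list)


def violations(node):
    """Return (parent, child) name pairs where a child is at least as abstract as its parent."""
    found = []
    for child in node.children:
        if child.abstraction >= node.abstraction:
            found.append((node.name, child.name))
        found.extend(violations(child))
    return found


# Concept A is an abstract superconcept of B and C, as in the text.
consistent = ValueNode("A", 0.9, [ValueNode("B", 0.4), ValueNode("C", 0.5)])

# The same concepts with a concrete one placed on top (scores are made up).
inverted = ValueNode("B", 0.4, [ValueNode("A", 0.9), ValueNode("C", 0.5)])

print(violations(consistent))  # no violations expected
print(violations(inverted))
```

Using a single tree for both the “is a subconcept of” relation and the “is controlled by” relation is the sketch’s way of encoding the alignment of control with information. A real dual network would also carry the heterarchical links that this toy tree leaves out.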
Articulating a list of the basic values of the human species is a large task, which over the next few decades will be best approached through a combination of brain science and cross-cultural anthropology. I feel sure that the basic “Ben Goertzel” values I have listed above are part of the story, though not all of it. Our basic values tie in closely, in some cases, with our biological heritage. For instance, “value and protect children” may be a basic human value; and if it is, it has a clear root in our biological heritage. Caring for our offspring during their tender years is important for us, as for all mammals; hence evolution wisely has supplied us with “value and protect children” as a basic value. Furthermore “value oneself extremely highly” is obviously a basic human value – all in all, we humans are very selfish creatures. When basic values like “value self” and “value children” collide, we have tough situations, of the sort that are frequently dramatized in our literature and cinema.
On the other hand, assembling a list of the derived values of various human cultures and subcultures would be a truly Herculean labor. The diversity that occurs here is tremendous. For instance, “valuing human life” is a basic value among nearly all humans. This basic value leads to a lot of particular moral rules, such as “Thou shalt not kill” and “Abortion is wrong.” The basic principle of valuing human life is common to all human cultures. But the particular moral rules emanating from this basic value are extremely various, and often quite arbitrary in appearance. For instance, in the US, capital punishment is considered acceptable by the majority of the population; in many other nations it’s considered barbaric and absolutely unacceptable. The strong opinions on either side of the capital punishment and abortion debates are an indication of the depth of the basic value of “value human life” in the human psyche, and also of the difficulty of interpreting such an abstract moral principle in practical terms.
We all know how strongly attached a human mind can become to its own culture’s moral system, often forgetting that humans from prior cultures, even deeply admired prior cultures, obeyed entirely different moral codes. For example, the basic value of “value and protect children” is connected, in our culture, with a very strong moral rule against pedophilia. To us, pedophilia is a terrible sin. And yet, homosexual sex with teenage and preteen boys was a common part of life in ancient Greece – in the time of Socrates, Plato, Democritus, Aristotle, and other early heroes of Western philosophy. Theirs was a different moral world – and yet, in a very broad sense, they did share the same basic values as we do. They just interpreted them differently.
Friedrich Nietzsche, in The Genealogy of Morals and other writings, posed a devastating critique of human moral codes, showing how in very many cases, these codes are effectively social regulations serving the function of maintaining social stability, and in particular, reinforcing class distinctions. The roots of the words for “good” in modern Western languages, he observed, are words pertaining to aristocracy. The roots of the words for “evil” tend to be words denoting the lower classes. In a similar vein, feminists have observed how many of our moral rules seem to exist primarily to enforce the patriarchal structure of society. For instance, why is adultery, in many cultures, considered a much worse moral crime for women than for men? There are sociobiological roots for this difference, but the reason why this particular sociobiological pattern has been so effectively enshrined in cultural mores throughout much of history probably does have something to do with its beneficial nature from the perspective of those in power in society (men). American human biology is roughly the same now as it was 150 years ago, yet female adultery is no longer considered nearly so direly immoral -- and this change appears connected with the generally greater value and power assigned to women in modern society. One culture or era’s morality is another’s outrageous, unethical oppression.
How is the tremendous variation in detailed moral codes among human cultures relevant to AGI morality? It speaks to Question 1 above: what kind of morality is likely to spontaneously emerge among AGI’s? Clearly, the answer is: if different human cultures have such different moral systems, then it would be bizarre and foolish to expect an AGI or a community of AGI’s not to have a very different moral system as well. Humans all have the same essential biology, and there are many commonalities across all human cultures, but even so, our moral systems vary tremendously. One would expect a human uploaded into a computer to rapidly undergo changes in moral orientation as a result. An AGI system built on totally different principles from the human brain cannot reasonably be expected to spontaneously share any of the specific moral rules associated with modern Western culture.
A subtle point that arises here is that one species’s basic value may be another one’s derived value. This goes along with the observation that the common set of basic values existing among humans, may not extend to AGI’s. For instance, to us humans, “value human life” is a basic value, and “value life” is a largely separate (and much weaker) basic value. How much, for instance, fetuses or cows (or cow fetuses!) are valued, is culturally and individually variant. For an AGI that has “value life” as a basic value, on the other hand, “value human life” may well be a specific, derived moral rule, with a similar status to “value fetuses” or “value cows” in human psychology and culture.
4. Wiring in Moral Rules, or Friendliness to Humans
Now, let us grapple with Question 2 above: How can we induce or encourage AGI’s to maintain value systems in accordance with our own?
If one is dealing with AGI’s resulting from direct emulation of the human brain in silico, then the job is fairly easy. An AGI with a human brain will start out with a human value system. It may drift over time, but at least the right starting point will be there. Of course, human morality obviously has its flaws, and these will also be inherited by our digital offspring, in this scenario. Some might view this as extremely dangerous. One may envision a digital creature with all our aggressive, selfish and immoral tendencies, in the possession of a system with direct mental control over all the world’s weapon systems.
Next, what if one is dealing with AGI’s evolved from simpler artificial organisms in an “artificial life”-like manner? Then one has the fascinating question of whether a self-centered value system is the inevitable consequence of evolution by natural selection. The more selfish aspects of the human value system are clearly tied to our origins in evolution by natural selection. The fittest survived, and those who valued themselves more, to a great extent, were fitter (i.e. better at surviving). But would it be possible to run an Alife evolutionary system in which compassion rather than selfishness was the fitness criterion? I don’t see why not. If a human or simple AGI were serving as the mating-master, determining who in the evolving population gets to reproduce, it would be simple enough for that mating-master to explicitly reward compassionate behavior. Of course, there could be subtle phenomena preventing this kind of process from leading to complex, intelligent evolved systems; contemporary science does not give us the tools to refute or verify such a conjecture.
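The mating-master idea can be sketched as an ordinary selection loop. The Python below is a toy under strong assumptions: each agent is reduced to a single number in [0, 1] standing for how compassionately it behaves, and the mating-master is assumed to observe that number directly. The function name evolve and all of its parameters are hypothetical. Measuring compassionate behavior in real evolved agents is exactly the part this sketch leaves out.

```python
import random


def evolve(pop_size=40, generations=30, mutation=0.05, seed=0):
    """Return the mean compassion score of the population at each generation.

    Selection is done by a "mating-master" that lets only the more
    compassionate half of the population reproduce.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    population = [rng.random() for _ in range(pop_size)]
    mean_history = [sum(population) / len(population)]
    for _ in range(generations):
        # The mating-master ranks agents by observed compassion.
        ranked = sorted(population, reverse=True)
        parents = ranked[: pop_size // 2]
        # Each parent leaves one offspring with a slightly mutated score,
        # clamped to the valid range [0, 1].
        children = [
            min(1.0, max(0.0, p + rng.gauss(0.0, mutation))) for p in parents
        ]
        population = parents + children
        mean_history.append(sum(population) / len(population))
    return mean_history


history = evolve()
print(round(history[0], 3), round(history[-1], 3))
```

In this toy setting the mean score rises simply because compassion is the selection criterion. The sketch says nothing about the open question raised above, namely whether such a process could also produce complex, intelligent systems.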
Thirdly, the case that interests me most – and the one I will focus on here -- is that in which AGI is explicitly engineered. My greater interest in this case is partly for selfish reasons: I myself am currently spending much of my time on an AGI-oriented engineering project. Also, I believe that engineered AGI is probably going to succeed more rapidly than the other two approaches. In this case, there are a variety of possible strategies one may take in order to induce one’s engineered AGI to possess values one considers desirable.
Most simply, one might try to force the issue, and explicitly program one’s AGI in a way that causes it to hold specific moral rules that are humanly important (or important in some specific human culture). This could be done in a variety of ways, some oversimplistic and totally unworkable, others more complex and possibly workable.
The most exaggeratedly simplistic approach, in this vein, would be to try to “program moral rules” into an AGI system. The problem with this is that an AGI is likely to be a highly complex, highly dynamic self-organizing system. It might delete the moral rules one programmed in, or spontaneously reinterpret the terms used in the moral rules, etc. It is of course conceivable that an AGI system could be created according to a design that would support the programming-in of rigid moral rules; but this doesn’t seem likely to me, because I believe that intelligence is all about fluidity and self-organization, and that it does not and cannot operate by the application of a set of fixed rules.
Furthermore, what happens when the AGI that has fixed moral wiring carries out extensive intelligence-increase-directed self-modification? What’s to stop it from self-modifying away the moral rules that were programmed into it?
Eliezer Yudkowsky has sketched a possible way around this objection. I cannot do justice to the breadth and depth of his thinking on the topic here, so the interested reader is directed to the online reference (Yudkowsky, 2001a). In brief, he wishes to create an AGI with a goal system that has Friendliness – essentially, the goal “Be good to humans” – as its top-level goal. The AGI he envisions will carry out all its actions with the Friendliness goal in mind. Other goals like creating new ideas, and keeping itself alive, will be viewed as subgoals of the Friendliness supergoal: e.g. it will keep itself alive because it reasons that keeping itself alive will help it be more Friendly in the future. He realizes that something as subtle as Friendliness cannot be programmed into an AGI as a set of rules; once a basic goal system of this nature is set up, he reckons, the AGI holding the goal system must be taught what is Friendly and what is not through interaction with humans. To avoid situations such as “Bad baby AI!! Dropping nuclear weapons is not Friendly!! Haven’t you learned anything yet??! No Tetris for you for a week!”, he suggests that AGI’s should be insulated from the outside world (including the Internet) until they have reached an adequate understanding of Friendliness.
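The supergoal and subgoal arrangement summarized above can be pictured as a tree in which every goal points to the goal that justifies it. The Python below is only a schematic reading of that summary. The Goal class and the justification function are hypothetical names introduced here, and nothing in it is taken from Yudkowsky’s own design documents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Goal:
    """A goal and the higher-level goal it serves (None for the supergoal)."""
    name: str
    parent: Optional["Goal"] = None


def justification(goal):
    """Return the chain of goal names from `goal` up to the top-level supergoal."""
    chain = []
    current = goal
    while current is not None:
        chain.append(current.name)
        current = current.parent
    return chain


# Goal names follow the summary in the text; the structure is an illustration.
friendliness = Goal("Friendliness")
stay_alive = Goal("keep itself alive", parent=friendliness)
create_ideas = Goal("create new ideas", parent=friendliness)

print(justification(stay_alive))
```

In this picture every chain of justification ends at the same place, so a subgoal such as self-preservation has no standing of its own apart from its service to the supergoal.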
Yudkowsky recognizes that self-modification could cause a system to drastically modify its goal system. However, he argues that if the system’s top-level goal is Friendliness, then the system is going to be modifying itself in ways that make it more Friendly. Of course, this does not rule out the possibilities that:
We do not currently possess a conceptual, scientific or mathematical framework adequate to assess the probabilities of these occurrences. Yudkowsky apparently intuitively estimates these probabilities as fairly low, whereas I intuitively estimate them as extremely high.
My essential problem with Yudkowsky’s approach is that it goes against the basic principles of the dual network. Of course, this is not a definitive rejection of his ideas by any means, since the dual network model of the mind is itself merely an hypothesis, bolstered by an intuitive piecing-together of diverse bits of information from various disciplines. But nonetheless, my objections seem to me at least as strongly founded as his arguments in favor of his position.
The dual network structure involves hierarchy/heterarchies of entities that become more abstract as one ascends the hierarchy. Yudkowsky’s proposed value hierarchy, on the other hand, has a very concrete value (“Friendliness” to humans) at the top, with more abstract and general values represented as derived values from this top-level value. To me, this is an intrinsically unworkable scheme. The dual network structure, with more abstract concepts at the top, seems to me a vastly more natural structure – it’s a structure that is seen throughout mind and nature. I believe that the dual network structure is a very general “attractor” for complex systems, and that self-modification will before long morph an AGI’s value system into a dual network form. Thus, I consider it unwise to initiate an AGI project by pressing a non-dual-network-consistent value hierarchy on an AGI. By doing so, in my view, one is virtually guaranteeing a total upheaval of the value hierarchy during future self-modifications. Whereas if one builds a dual-network-structured value hierarchy – then, even if this value hierarchy is not as ideal, at least one has a value hierarchy that has the right “form,” making it more likely that it will survive through numerous and copious self-modifications.
So, placing Friendliness at the top of a value hierarchy a la Yudkowsky violates the dual network principle. And furthermore, it violates it by placing at the top of an AGI’s value system, a particular value that is intuitively quite unnatural to any radically nonhuman AGI.
This gets back to Question 1 raised above: What moral systems are most likely to survive among AGI’s? I think that moral systems involving basic values that are “natural” for AGI’s are more likely to survive; and I think that value hierarchies consistent with the dual network are more likely to survive. If I am right, the goal of engineering AGI morality should then be to craft AGI’s with moral systems that are
5. Instilling an AGI with Basic Values
I believe that, if we are going to create a moral AGI with a set of values reasonably agreeable to our own, the best place to focus is on instilling the AGI with the right set of basic values – similar (though not necessarily identical) to some of the basic values that many humans share. The AGI can then be taught derived values like “be good to humans” in the context of its basic values. I think that the more general and abstract a value is, the more likely it is to survive through an ongoing process of intelligence-increase-directed self-modification. Even some of the relatively abstract “basic values” mentioned above seem to me to be sufficiently concrete that they will have to be taught rather than wired-in, and that they have only a modest chance of surviving successive intelligence-increase-driven self-modifications.
Above I outlined a specific “Ben Goertzel” value system. I said that I believe it is important to give an AGI a value system that causes it to carry out activities that:
Ideally, one would like these basic values to be somehow “wired into” an AGI system. In reality, however, there may be severe limitations on the extent to which this is possible; one focus of the remainder of this essay is on the exploration of these limitations. Values that cannot be wired into a system, may still be taught to a system. I believe that the teaching of values will be more effective when it involves teaching more specific values that are compatible with more general wired-in values. I.e., wire in the top levels of the value dual network, and teach the lower levels.
At one extreme, some AGI architectures may not support explicit wiring-in of any values, even very broad and general ones like this. Of course, from a moral perspective, AGI architectures that do not support some sort of wiring-in of basic values should be avoided -- unless, as AGI engineering science progresses, this really seems to be impossible, for reasons that we cannot now foresee. Fortunately, I consider it fairly unlikely that an AGI architecture in which no wiring-in of values is possible will emerge. All current approaches would seem to support some level of wiring-in of basic values. This includes logic-based systems (in which values may explicitly be represented), neural net systems (which may emulate the basic-value-embodying chemical and electrical reinforcement systems of the human brain, on an appropriate level of abstraction) and hybrid systems like Novamente. Having said this, an AGI architecture not supporting wiring-in of values would not necessarily lead to disaster; in theory it’s possible for a complete value system to be taught.
A couple of the basic values I’ve listed are worthy of specific comment. First, note that I have included here a “keep itself healthy” basic value. Humans clearly have a strong “selfishness” basic value but this is a direct result of our evolutionary heritage, and in theory, it seems to me it should be possible for an AGI to do quite well without it.
One could reason as follows: “The other basic values listed above will induce an AGI to be good to others and to value the creative and productive things it does. Based on these values it will come to value itself insofar as it observes itself being good to others and doing creative and useful things. Furthermore, keeping oneself healthy is not intrinsically a very abstract notion; it can effectively be represented in a dual network as a child of some more abstract values. Hence, it’s not clear that a basic anthropomorphic self-preservation ‘basic value’ is necessary.”
On the other hand, in our recent experimentation with prototype early-stage AGI systems, we have found it pragmatically necessary to give these systems the goal of keeping themselves healthy. For an early-stage AGI, it’s important that it knows automatically and immediately that using up all available memory is bad – as opposed to being able to derive, in principle, that using up all available memory is bad because it would be bad for other living and intelligent beings and for the universe, in the long run. Conceivably, the “keep oneself healthy” basic value is needed as a basic value only during the early stages of development of an AGI, and may be removed later on – this is the kind of issue we’re just not going to be able to resolve until we have a lot more experience with AGI’s and their relationships to their basic and derived values.
Next, Number 5 on the above list of values, “compassion,” is a particularly critical one. Compassion, broadly conceived, is the feature of the human mind that causes us to indirectly feel what others feel. This, I believe, is going to be the key to having future AGI’s treat humans well. One can imagine an AGI with no compassion whatsoever, one purely motivated by selfishness. Some humans appear to be made this way; but most humans are not, presumably because a nontrivial degree of compassion was evolutionarily desirable for humans during much of their early history. If an AGI has the basic value of compassion, it will generally desire other AGI’s, humans, and cows and ants as well to be happy.
Roughly speaking, the above-listed basic values may be divided into two categories: Easy and Hard. This categorization is based on my intuition as to which values will be easy to wire into an AGI system.
What is in common among the values in the Easy category is: these are very abstractly defined values. It is possible to give an AGI a purely mathematical approach to recognizing the extent to which these values are fulfilled at a given point in time, in a given situation.
On the other hand, the Hard values are in a sense less basic. For an AI, defining and recognizing “life” and “happiness” is a lot harder than defining and recognizing “my own health”, “diversity”, or “new patterns.”
I am fairly confident that it is possible to create an AGI, from the start, with an in-built value system encompassing the values on the Easy list. But what about the values on the Hard list? The problem is that, in its baby phase, an AGI will probably not have any concepts of “life” or “happiness”. My view is that these concepts are too subtle and multidimensional to “wire into” a system; these concepts almost surely have to be learned through experience. Of course, it is conceivable someone will build an AGI that comes into the world with a pre-supplied fund of concepts (see www.cyc.com for one attempt), but my view is that this approach has a very low probability of success. If I’m right about this, then there is no way to give an AGI the values on the Hard list until it has already grown its own concepts of life and happiness, through its own experience. And yet, once it has grown its own concepts of life and happiness, it will probably also have grown an extremely complex self-organizing mind, which may be very difficult for us to manipulate in a useful way.
This issue is tied in with the issue of “value drift.” Of course, there is a real possibility of “basic value drift” through successive self-modifications of an AI, even for the Easy values listed above. However, my intuition is that these relatively abstractly-defined basic values are less likely to drift significantly than the Hard values, and that the Hard values are less likely to drift significantly than narrower moral rules like “be good to humans” or (yet more narrowly) “abortion is wrong”.
In terms of the dual network structure, I think that the Easy values are more abstract than the Hard values, and hence a “cognitively natural” value system would be one in which the Easy values were further up in the value hierarchy than the Hard ones. This is the opposite of Yudkowsky’s approach, in which it is proposed to place one of the Hard values (“Friendliness” to humans) at the top of the value hierarchy.
These issues may be elucidated by considering how values are wired into my own AGI system, Novamente (Goertzel, Pennachin and Bugaj, 2002). The system contains an object called a PleasureNode, which is the loose equivalent of the chemical and electrical signals that feed pleasure into the human brain in appropriate circumstances. The simplest goal the system has is to increase the “satisfaction” of its PleasureNode. This simple goal ramifies automatically into more complex goals, because different activities may maximize PleasureNode satisfaction over different time scales. Learning how to maximize long-term PleasureNode satisfaction is a tough optimization problem which requires the creation of numerous subgoals, many of which have no transparent relationship to pleasure. The system’s “happiness” is different from its PleasureNode activation, and is a complex spatial and dynamical pattern of activity, but as in humans, mind-level happiness is closely related to body-level pleasure; the complex patterns constituting happiness will often involve the PleasureNode in one way or another.
In Novamente, the PleasureNode is wired so that its satisfaction increases when the Easy values mentioned above are satisfied: when new patterns are created, when a diverse set of patterns is known to the system, and when old patterns that have been of value to the system in some way are preserved. This is easy. But what about the Hard values mentioned above? These are trickier. There are basically two approaches: wiring and training.
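As a purely illustrative sketch of this kind of wiring (not Novamente's actual code), the three Easy values can be treated as simple measurements on a store of patterns. Everything here is an assumption made for the example: the class name ToyPleasureNode, the equal weights, the use of plain Python sets as the "pattern store", and the crude proxy of counting known patterns as a measure of diversity.

```python
from dataclasses import dataclass


@dataclass
class ToyPleasureNode:
    """Hypothetical stand-in for a PleasureNode; not Novamente's real design."""
    # Assumed weights for the three Easy values; the essay gives no numbers.
    w_creation: float = 1.0
    w_diversity: float = 1.0
    w_preservation: float = 1.0
    satisfaction: float = 0.0

    def update(self, before: set, after: set, valued: set) -> float:
        """Raise satisfaction according to how one step changed the pattern store."""
        created = len(after - before)   # new patterns created in this step
        diversity = len(after)          # crude proxy: number of distinct patterns known
        # Fraction of previously valued patterns that survived the step.
        preserved = len(valued & after) / len(valued) if valued else 1.0
        gain = (self.w_creation * created
                + self.w_diversity * diversity
                + self.w_preservation * preserved)
        self.satisfaction += gain
        return gain


node = ToyPleasureNode()
# One new pattern ("c") is created and the valued pattern "a" is kept.
gain = node.update(before={"a", "b"}, after={"a", "b", "c"}, valued={"a"})
print(gain)  # 1 created + 3 known + 1.0 preserved = 5.0
```

The point of the sketch is only that each Easy value can be computed from the system's own internal state, with no perception of the outside world required. That is what makes these values easy to wire in, and it is what the Hard values lack.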
The wiring approach is hard but not necessarily impossible. The reason that it’s hard should be fairly obvious. We cannot currently put code in the Novamente system telling it to experience pleasure when the humans it interacts with are happy, because it does not know how to recognize human happiness. We cannot put code into the system telling it to experience pleasure when it creates or experiences life forms, because it does not know what life forms are. Now, we humans are born with an intuitive sense of how to recognize life forms, and how to recognize when other humans are happy, which is why we can experience these things as basic values. But these intuitive senses that we have are tied in with a huge amount of perceptual wiring. For instance, our brains are packed with circuitry aimed specifically at improving our skills at human face recognition. This circuitry is very useful for recognizing whether other humans are happy or not, and hence for the practical exercise of compassion.
Now, perhaps one could tinker with the mind of a mature Novamente AGI, and explicitly link its Pleasure FeelingNode to the concepts that it had created, representing happiness of humans. However, my intuition is that this kind of tinkering is relatively likely to be undone via successive systemic self-modifications, as opposed to more basic values that are more directly and thoroughly integrated with the system’s overall functionality. The basic problem is that linking the Pleasure FeelingNode to the “concepts representing happiness of humans” is not an easy matter. Most concepts in Novamente are represented in a distributed way, by fuzzy sets of basic Novamente “nodes and links”, and the whole-system dynamic patterns that they induce. Accurately finding the concepts representing happiness of humans in the mind of a Novamente would not be an easy problem. Going into the system’s mind and artificially linking these concepts to the Pleasure FeelingNode could cause complex and tricky alterations in the concepts themselves.
One thing that’s fairly clear is that, even in the wiring approach, if we want to have even a moderately decent chance of getting the Hard basic values mentioned above into our AGI, we need to make sure our AGI has the ability to easily recognize life and to recognize the happiness of humans and other living and intelligent creatures. Otherwise the system could well achieve superhuman intelligence without ever learning how to tell if a human is happy or not, let alone a mouse. It could well achieve a superhuman intelligence that is oriented in completely different directions: say, toward mathematics, algorithms, and meteorological and astronomical data analysis, and interaction with other AGI’s.
My intuition is that the wiring approach will not prove viable for the Hard values, and that the training approach will prove more feasible. In this case, one simply creates a direct mechanism by which the system’s teachers can increase the amount of pleasure felt by the system’s Pleasure FeelingNode. Repeated instances of feeling pleasure when in the presence of living beings, and when in the presence of happy humans and other happy beings, will teach the system that these are valuable things. Of course, this is much more likely to work if the system has appropriate perceptual mechanisms, attuned for the recognition of cues indicating the presence of life and the emotional state of living beings. Currently Novamente does not have such mechanisms; however, I feel that the provision of such mechanisms (accurate visual and acoustic sensors, for example) is a smaller problem than other problems that we face in bringing our would-be AGI into being, so this perception-level work is being deferred for at least a couple of years.
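The training idea can be sketched as a toy associative learner. This is a minimal illustration under stated assumptions, not a description of how Novamente learns: the class ToyValueLearner, the cue labels, the learning rate, and the simple delta-rule update are all hypothetical. In particular, the sketch takes the perceptual cues as given, which is exactly the capability noted above as missing.

```python
from collections import defaultdict


class ToyValueLearner:
    """Hypothetical learner that associates perceived cues with teacher-given pleasure."""

    def __init__(self, learning_rate: float = 0.2):
        self.learning_rate = learning_rate        # assumed value, chosen for illustration
        self.learned_value = defaultdict(float)   # cue -> learned value, starts at 0.0

    def experience(self, cues, teacher_pleasure: float) -> None:
        """Move each present cue's learned value toward the pleasure the teacher delivered."""
        for cue in cues:
            old = self.learned_value[cue]
            self.learned_value[cue] = old + self.learning_rate * (teacher_pleasure - old)


learner = ToyValueLearner()
for _ in range(20):
    # The teacher raises pleasure whenever a happy human is perceived...
    learner.experience({"happy_human"}, teacher_pleasure=1.0)
    # ...and gives none when nothing of moral interest is present.
    learner.experience({"empty_room"}, teacher_pleasure=0.0)

print(learner.learned_value["happy_human"] > learner.learned_value["empty_room"])
```

After repeated pairings, the cue that co-occurred with teacher-delivered pleasure carries a high learned value and the other does not. Whether values acquired this way would survive self-modification is a separate question that a toy like this cannot address.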
This sort of explicit morality training is described in a fair amount of detail in (Yudkowsky, 2001a). However, Yudkowsky focuses on training the system for the value of Friendliness – “be good to humans.” I think that explicit Friendliness training may be worthwhile, but because of my intuition that more general values are more likely to be preserved through successive self-modifications, I would be inclined to focus at least as much on training regarding values like “life is valuable” and “happiness of intelligences and living beings is valuable.”
So: Suppose that the training approach works out for the Hard values, and the wiring-into-the-PleasureNode approach works out for the Easy values. What we have then is a system with a reasonably well-integrated basic value system: some very general values, and some more specific values that have been hammered into its head by its human teachers. The value system has a proper dual network structure, in which more abstract values are at the top of the control hierarchy as well as of the conceptual hierarchy.
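The structural condition described here, with more abstract values sitting above the values they control, can be written down as a small check on a tree. This is only a sketch: the Value class, the integer abstraction scale, and the example value names and levels are assumptions introduced for illustration, and a real dual network would be far richer than a tree of labels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Value:
    name: str
    abstraction: int                      # higher = more abstract (assumed scale)
    children: List["Value"] = field(default_factory=list)  # values this one controls


def is_abstract_on_top(root: Value) -> bool:
    """True if every value is strictly more abstract than each value it controls."""
    return all(
        child.abstraction < root.abstraction and is_abstract_on_top(child)
        for child in root.children
    )


# Easy value at the top, Hard value below it, narrower rule at the bottom.
natural = Value("create and preserve patterns", 3, [
    Value("life is valuable", 2, [
        Value("be good to humans", 1),
    ]),
])

# A narrower value placed above a more abstract one.
inverted = Value("be good to humans", 1, [
    Value("create and preserve patterns", 3),
])

print(is_abstract_on_top(natural), is_abstract_on_top(inverted))
```

The first hierarchy satisfies the check and the second does not. The check says nothing about stability under self-modification; it only makes explicit what "abstract values at the top" means structurally.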
What happens when the system revises itself over and over again, improving its intelligence until we can no longer control or understand it? Will it retain the values it has begun with? Realistically, this is anybody’s guess! My own guess is that the Easy values are more likely to be retained through successive drastic self-modifications – but the Hard values do have at least a prayer of survival, if they’re embedded in a robust value dual network with appropriate basic values at the top. Only time will tell.
In sum, the approach that I advocate for creating morally acceptable AGI’s is:
Of course, this approach provides no guarantee that once an AGI becomes 10000 times smarter than humans, it won’t throw out the basic values it started with, along with all that Friendly-to-humans nonsense. Perhaps it will come into contact with superintelligent transdimensional beings that will rewire its value system completely, to agree with their own value system, completely different from anything we humans are even capable of conceiving. But I think this approach stands a better chance of succeeding than wiring in particular moral rules, or explicitly creating an AGI with a Friendliness goal system, because I believe a value system which has a dual network structure, with relatively abstract and “AGI-natural” values at the top, is far more likely to be stable over time.
Finally, at risk of becoming tiresome, I will emphasize one more time that all these remarks are just speculations and intuitions. It is my belief that we will gain a much better practical sense for these issues when we have subhuman but still moderately intelligent AGI’s to experiment with. Until that time, any definitive assertions about the correct route to moral AGI would be badly out of place.
Many of the ideas presented here originated in discussions on the SL4 futurist e-mail discussion group, during the period 2000-2002, and in face-to-face discussions with Eliezer Yudkowsky and Peter Voss. Of course, these individuals do not fully agree with the ideas presented here; in fact, my interactions with these colleagues exemplify the way in which respectful disagreement can often be more stimulatory than agreement.
· Goertzel, Ben (1993). The Evolving Mind. New York: Gordon and Breach.
· Goertzel, Ben (1994). Chaotic Logic. New York: Plenum.
· Goertzel, Ben (1997). From Complexity to Creativity. New York: Plenum.
· Goertzel, Ben, Cassio Pennachin and Stephan Vladimir Bugaj (2002). The Novamente AGI Engine: An Artificial General Intelligence in the Making. www.realai.net/article.htm
· Kurzweil, Raymond (1999). The Age of Spiritual Machines. New York: Penguin Books.
· Kurzweil, Raymond (2002). The Singularity is Near. To appear. Excerpts available on www.kurzweilai.net/
· Yudkowsky, Eliezer (2001). Staring into the Singularity. www.sysopmind.com/singularity.html
· Yudkowsky, Eliezer (2001a). Creating Friendly AI. http://www.singinst.org/CFAI.html
 See (Kurzweil, 1999; 2002) for detailed arguments as to why AGI will exist and will likely surpass human intelligence.
 Some of these strictures, like avoiding the commission of adultery, would be easier for AGI’s than others.
 Of course, this kind of cross-species intelligence comparison is highly qualitative; even our current attempts to quantitatively compare intelligence within-species (IQ tests etc.) are still crude and dubious.
 In case the reader is considering calling the FBI to have the author arrested, please rest assured that the author is a heterosexual male, married with 3 children, and is just as personally outraged by pedophilia as anyone else. Unlike many others, I recognize that my personal outrage in this as in other instances is largely due to my cultural conditioning. Yet, this won’t stop me from smashing in the face of anyone who messes with my kids….
 About which Jello Biafra, the intensely moral leader of the band The Dead Kennedys, made (in the song “Halloween”) the immortal comment: “Why don’t you take these social regulations, shove ‘em up your ass!”
 I will use the phrase “wiring in” here in a colloquial sense, to mean “explicitly modifying the hardware of a customized AGI hardware system, or explicitly modifying the program code of an AGI software system”. In fact, here we are thinking most directly about AGI software rather than customized AGI hardware, though the discussion is relevant to either option.
 The intrepid reader is referred to the works of Terence McKenna, who reported frequent contact with machine-elves from another dimension (largely while under the influence of the psychedelic drug DMT).