On the Integration of Embodied and Unembodied Approaches
to Artificial General Intelligence
Human intelligence is hard to separate from human embodiment. We learn to think, as infants, largely in the course of learning to use our bodies. Even our most abstract goals can largely be viewed as sublimated versions of bodily goals. So many of our cognitive patterns and linguistic forms can be traced back to perception and action metaphors – from the inner visualizations with which many of us represent abstract knowledge, to the spatial relations implicit in prepositions like “above”, “by” and “through.”
A natural question, therefore, is: What about embodiment and artificial intelligence? More specifically, what about embodiment and artificial general intelligence (AGI)? It’s clear that some cognitive capabilities currently considered under the loose heading of AI – for instance, chess-playing – can be carried out just fine by unembodied software programs. But these are what Kurzweil (2000) calls “narrow AI” programs. What about the grand dream of AI – software programs that can autonomously learn about new problems and domains themselves, create new inventions and problem-solving strategies, reflect and spontaneously communicate? Suppose one accepts the “strong AI” claim that software programs, in principle, can achieve these things. Then the question still remains: Can these dreams be fulfilled by unembodied software programs, or do they require software programs embedded in sophisticated robot bodies, with sensors and actuators approaching, rivaling or exceeding the human body in diversity and precision?
Expert opinions on this have ranged all over the map. Some distinguished AI theorists and practitioners believe that embodiment is thoroughly unnecessary for AI; others argue that it’s absolutely critical, and any AI system that’s not embodied doesn’t have a prayer of approaching true AGI.
In this essay I’ll give my own opinion on the issue, which starts out somewhere in the middle of the two extremes, and ends up in the closer regions of outer space. First of all, I think embodiment is valuable but not necessary for mind in general, and for AGI in particular. I’ll give a detailed analysis of exactly why embodiment is incredibly useful for AGI. But then, I’ll argue that an exclusive and obsessive focus on embodiment is actually counterproductive for AGI. In fact, a lot of the technologies developed by the anti-embodiment crowd can be extremely useful for AGI – if they’re integrated into an embodiment-encompassing framework. This leads me to the notion of post-embodied mind – intelligence that possesses a body (or more than one), but also possesses knowledge not derived from its body’s sensorimotor capabilities in any way.
Post-embodied mind is not humanly natural, but I will argue that it’s actually a more effective approach to intelligence than the one embodied in our brains. Furthermore, it may actually be the condition that we’re evolving towards. A human with a special chip in their brain connecting them to the Internet and a massive network of interconnected databases – that would be a post-embodied human. It seems obvious that a post-embodied human would have a lot easier time learning and thinking than a plain old embodied human. Similarly, a post-embodied proto-AGI system is going to have a lot easier time getting trained to be really smart than a plain old embodied proto-AGI.
In addition to making these general points in some detail, I’ll also describe here – very loosely, as this is not a technical article – how this approach is being pursued in the context of my own current practical AGI project, the Novamente AI system (Goertzel et al, 2003; Looks et al, 2004). Basically, we’re pursuing a combination of natural language based teaching, loading-in of formally encoded knowledge, and control of a simple embodied agent. The AI architecture is specifically designed to enable this kind of broadly integrative, post-embodied approach. While most of the points I make in this essay are fairly abstract – and most of them seem pretty obvious to me now – some of them are ideas that occurred to me only in the course of my practical work on Novamente during the last couple years.
Before launching into my own views, I’ll give a quick overview of what others believe about embodiment and AI. I make no pretense to completeness – I’m just surveying the major perspectives, and giving a couple examples of well-known adherents to each.
First of all, among the “embodiment is unnecessary” crowd, there are several subspecies. The “knowledge encoding” crew believe that it’s possible to write a long list of all the “common sense” facts about the world that every human learns through their embodiment, and simply supply this list to an AI, to fill up its brain with the basic knowledge it would get if it had a humanlike body. There are various opinions about how to write the list. Some, most notably Doug Lenat and the Cyc team (see www.cyc.com or Lenat, 1995), believe that it should be done in a formal mathematical language like predicate logic. Others, such as Chris McKinstry, founder of mindpixel.com, believe it should be done in a natural language like English.
On the other hand, some believe that explicit encoding of commonsense knowledge is unnecessary, and an AI system can pick up all it needs to know about the world through linguistic means – through conversations with humans. Jason Hutchens’ HAL project at a-i.com (Graham-Rowe, 2001) was an example of this.
The most impassioned and articulate defense of the opposite position – that embodiment is critical for AI – was made by Hubert Dreyfus in his famous book “What Computers Can’t Do” (1979). Dreyfus drew on Continental philosophy to argue that human intelligence is fundamentally situated in the body, and that considering intelligence as separated from embodied-ness is about as sensible as considering cognition as separate from memory. He considered the separation between intelligence and embodiment to be an artificial distinction made by a flawed research programme, without basis in reality. In “The Embodied Mind” (1992), Francisco Varela and his collaborators deepened Dreyfus’s critique by connecting it with Eastern philosophies of mind and being.
In terms of practical modern research, the embodiment-focused approach to AI is, not surprisingly, closely associated with robotics. Rodney Brooks (1999) and Hugo de Garis (De Garis and Korkin, 2002) are examples of well-known robotics researchers whose goal is to begin with simple robots carrying out simple embodied cognition, and then gradually make the robots and the corresponding cognitions more complex.
An alternative strategy, however, is to work with AI systems that control simulated bodies. At the present time, simulation worlds don’t provide the kind of richness of sensation and action that the physical world does. On the other hand, they’re a lot easier to play with: work on robot cognition has a dangerous habit of getting bogged down in the engineering and tuning of robotic systems, and never exploring the “cognitive” aspect all that fully. John Santore’s (2003) PhD work used Stuart Shapiro’s SNePS AI system to control an agent called “Crystal Cassie” in the “Crystal Space” simulated environment (an open-source environment created for 3D gaming). Similarly, John Laird (2002) has initiated a project using the SOAR AI system to control agents in 3D gaming environments.
As hinted above, my view is an intermediate one. As is usually the case in AI, the truth lies somewhere between the two extremes. Clearly, embodiment is not really necessary for AGI, in a theoretical sense. However, equally clearly, embodiment makes the task of teaching a proto-AGI system a heck of a lot easier – to such a great extent that trying to create a totally unembodied AGI would be a foolish thing.
I’ve just used the phrase “teaching a proto-AGI system.” What I mean by this is as follows. I divide the task of creating an AGI into two parts: first building the initial software system, and then teaching this “baby mind” how to think, reason, feel, etc. and filling its mind with knowledge. The proto-AGI is this initial software system.
This bipartite division of the AGI-creation problem is valid for almost all, but not all, approaches to AGI. In some approaches to AGI, the first part is almost trivial, because it’s assumed that a very simple architecture can give rise to intelligence through repeated self-modification and self-analysis (Juergen Schmidhuber’s (2004) OOPS system is an example of this). In most approaches to AGI, however, both of the stages are substantial and important, and the initial software system prior to learning has much of the same structure and dynamics that the system will have after a substantial amount of learning has taken place. Learning, at first, just provides content, refinement to the initially given components, and additional components existing within the initially given components. Eventually of course an AGI system may learn enough to rewrite all its source code and become something totally different than its creators intended. However, in most approaches to AGI, it is assumed that this kind of total self-modification will occur only after the system has become a highly sophisticated general intelligence using its original architecture. In this context, what I mean by a proto-AGI is a software system that has the overall software structures and dynamics needed to support general intelligence – but lacks the specific knowledge needed to operate in the world, and lacks the specific control structures needed to operate in various (practical and cognitive) situations.
How then can one turn one’s proto-AGI into a genuine AGI? How does one turn an AI baby into an AI child? How critical is embodiment for this process?
Firstly, I think it’s clear that the knowledge encoding approach – taken on its own, without integration with other fundamentally different approaches -- is intrinsically problematic. The Cyc knowledge base (the biggest one out there) has well over a million predicate logic relationships in it, each representing a piece of commonsense knowledge, but it’s nowhere near complete enough to represent the knowledge in the mind of a small child (though it far exceeds the knowledge of most children or adults in various specialized areas such as geography, weights and measures, etc.). According to theoretical computer science, predicate logic has universal expressive power – so if the scope of human knowledge is finite (which is guaranteed, for instance, if the universe is computable or quantum computable), then it’s possible in principle to encode all human knowledge in predicate logic. However, this “in principle” observation doesn’t address the question of how to get all the knowledge out of human minds and into predicate logic. It may be that most of our commonsense knowledge is implicit, so that “we” (in the sense of our conscious linguistic minds) don’t even know what we know. In that case, producing a commonsense knowledge base may not be possible until neuroscience can scan the human brain with tremendous accuracy, and use a highly refined theory of brain function to read the commonsense knowledge out of the scanned images. Most likely the scanning technology will be there in a few decades (see Broderick, 2002 and Kurzweil, 2000 for reasoned estimates), but when such a refined theory of brain function will emerge is anyone’s guess.
On the other hand, the idea of teaching an AI system commonsense knowledge and cognition through the process of conversation is not fundamentally flawed, at least not in the obvious way that formal knowledge encoding is. Conversation may get across both explicit and implicit knowledge – often we say far more than we know we’re saying. My suspicion, however, is that teaching a proto-AGI system via conversation alone will be very, very painfully slow.
One factor that could speed up the teaching-by-talking process is the judicious use of formally encoded knowledge. Databases built by formal knowledge encoding may be very helpful to AI’s as they engage in conversations – providing explicit knowledge to help them anchor the implicit and explicit knowledge they obtain from their dialogues. For this purpose, it seems, commonsense knowledge bases built up using natural language will probably be more useful than knowledge bases like Cyc, created using formal logic. The reason is that a conversational AI system, to apply database knowledge in the context of conversation, must match conversationally-derived information with database information. If the database information is similar in form to the conversationally-derived information, then this matching problem will be relatively easy, and won’t require complex, computationally expensive and hard-to-tune inferencing. On the other hand, if the database information is in a form very different from the conversationally-derived information, then the matching problem can be quite subtle.
This is an issue my colleagues and I have already encountered in practice, in our work with the Novamente AI system. We have a system component that maps natural language sentences into “Novamente nodes and links” – Novamente’s internal knowledge representation. We can also load in knowledge from Cyc or other structured databases, translating their formal languages into Novamente nodes and links. However, the problem of matching natural language derived knowledge with database-derived knowledge turns out to be more irritating than one would initially think.
For instance, suppose one tells Novamente a simple sentence such as “Ben just gave Izabela the red ball.” One would like Novamente to be able to infer that, shortly after that, Izabela had the red ball. In order to make that inference, Novamente requires the commonsense knowledge that “After X gives Y to Z, then shortly after that, Z has Y.” This commonsense knowledge is implicit in Cyc, SUMO and some other available knowledge bases. However, extracting this knowledge from any of these databases in such a way as to provide value for language understanding, requires a substantial amount of logical inference – nothing Novamente can’t do, but more work than one would like one’s AI system to need to do to figure out something so very simple. On the other hand, if one simply supplies Novamente with the relevant commonsense knowledge by telling it the sentence “After X gives Y to Z, then shortly after that, Z has Y”, then Novamente will automatically map this sentence into nodes and links in a way that matches the way it maps the sentence “Ben just gave Izabela the ball” into nodes and links. The commonsense background knowledge will match the conversational linguistic knowledge in form, making the matching between the two of them almost immediate, as it should be. For this reason, in the Novamente project, we would have much more use for a Cyc-like compendium in simple English than for Cyc itself.
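The point about matching forms can be made concrete with a toy sketch. This is nothing like Novamente’s actual node-and-link representation; the tuple encoding and the `unify` and `apply_rule` functions are invented purely for illustration. The idea it shows: when a commonsense rule is stated in the same shape as the parsed sentence, deriving the consequence is a single pattern-match rather than a chain of inference.

```python
def unify(pattern, fact, bindings=None):
    """Match a pattern like ("give", "?x", "?y", "?z") against a concrete
    fact, returning a dict of variable bindings, or None on mismatch."""
    if bindings is None:
        bindings = {}
    if len(pattern) != len(fact):
        return None
    for p, f in zip(pattern, fact):
        if p.startswith("?"):            # pattern variable
            if bindings.get(p, f) != f:  # inconsistent rebinding
                return None
            bindings[p] = f
        elif p != f:                     # constant mismatch
            return None
    return bindings

def apply_rule(rule, fact):
    """If the fact matches the rule's antecedent, return the instantiated
    consequent; otherwise None."""
    antecedent, consequent = rule
    b = unify(antecedent, fact)
    if b is None:
        return None
    return tuple(b.get(term, term) for term in consequent)

# "After X gives Y to Z, shortly after, Z has Y" -- stated in the same
# shape that the parsed sentence takes, so matching is immediate.
give_rule = (("give", "?x", "?y", "?z"), ("has", "?z", "?y"))
parsed = ("give", "Ben", "red_ball", "Izabela")
print(apply_rule(give_rule, parsed))  # ('has', 'Izabela', 'red_ball')
```

If the rule instead lived in a database in some very different formalism, the same derivation would require translating between representations first, which is exactly the expensive matching problem described above.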
Now, you might argue that a Cyc-like compendium in English is unnecessary, because all that information is implicitly there in the vast amount of text already present on the Internet. But, while this is true, it’s also true that extracting commonsense knowledge from general texts requires sophisticated linguistic understanding – and sophisticated linguistic understanding, as much recent work in computational linguistics suggests, requires commonsense knowledge. So there is a chicken-and-egg problem here, which may be resolved by formally encoding commonsense knowledge in simple English. The Novamente system already “understands” simple English (in the sense of successfully mapping it into internal nodes and links, using an interactive user interface that allows a human helper to correct its mistakes), so it can understand a simple English database. This commonsense knowledge can then help it to expand its knowledge of English, which enables it to better understand free text, which builds up its knowledge base, etc.
One of the reasons often given for the necessity of embodiment in AI is the need for “symbol grounding” to aid with language understanding and cognition. The idea here is that, for a system without any sensors or actuators, the word “apple” is defined solely by its relations to other words, and to abstract entities such as database records. On the other hand, to a system with a body that can see, smell, hear, taste, grab and throw apples – or carry out some subset of these interactions with apples – the word “apple” will be associated with a host of nonlinguistically based patterns. The word “apple” may in other words be “grounded” (Harnad, 1990). This grounding, in essence, consists of a lot of commonsense facts about apples – not just abstract facts like “apples are usually red or green or yellow” and “apples are edible fruits”, but also a lot of more specific facts, such as information about the distribution of lengths and curvatures of apple stems, information about how bitter various apples have tasted at various stages of ripeness (and how similar their particular variety of bitterness has been to the bitterness of various other fruits), etc. In principle, all these facts could be encoded in a knowledge base using predicate logic or natural language, but it would be a heck of a pain – because most of these facts are implicit in the human mind, only articulated via great and unnatural effort. Most of these facts are things that emerge naturally and pre-verbally in any reasonably powerful pattern-recognition system supplied with a large number of examples of apples, and are cast into a verbal or otherwise easily-communicable form only awkwardly and with difficulty.
Creating an AI system that can ground concrete terms like “apple” isn’t such an amazingly difficult thing. At this stage, given the advanced state of various relevant sorts of narrow-AI research, it’s basically an exercise in integrated systems design and the tuning of statistical learning: one has to take a perception component and a linguistics component and hook them up together, via some kind of cognition component that’s able to recognize correlations between words and patterns among perceptions. There is nothing here that strains current technology. Research projects such as the Robot Brain Project (www.rdc.imi.i.u-tokyo.ac.jp/robotbrain/e/; see e.g. MacDorman et al, 2001) have sought to push in this direction.
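The “recognize correlations between words and patterns among perceptions” step can be caricatured in a few lines. The episodes, words and perceptual feature labels below are entirely invented for illustration (real grounding would operate over raw sensor data, not hand-labeled features), but the co-occurrence bookkeeping is the essential statistical idea:

```python
from collections import defaultdict

def ground_words(episodes):
    """Count, across experience episodes, how often each word co-occurs
    with each perceptual feature. Strong counts suggest a grounding."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, percepts in episodes:
        for w in words:
            for p in percepts:
                counts[w][p] += 1
    return counts

# Invented toy "experiences": (heard words, simultaneously perceived features)
episodes = [
    (["look", "an", "apple"], ["round", "red", "graspable"]),
    (["eat", "the", "apple"], ["round", "sweet", "graspable"]),
    (["throw", "the", "ball"], ["round", "blue", "graspable"]),
]

grounding = ground_words(episodes)
# Features most strongly associated with "apple" across episodes:
top = sorted(grounding["apple"].items(), key=lambda kv: -kv[1])
print(top)  # [('round', 2), ('graspable', 2), ('red', 1), ('sweet', 1)]
```

Note that with only three episodes, “round” and “graspable” are as strongly tied to “apple” as to “ball”; disentangling them is exactly why grounding needs the large number of examples the previous paragraph mentions.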
A question here is: How can you tell if your AI system has correctly grounded “apple”? Being able to correctly distinguish apples from non-apples isn’t really good enough. You need to test whether the system can draw useful conclusions about apples – and about other things, using apples as a metaphor – with roughly the same kind of fluidity as a human.
What’s a bit subtler is the grounding of linguistic “glue words” like prepositions. Words like “through,” “past”, “over” and “near” embody subtle patterns of spatial relationship, which are hard for humans to articulate yet easy for us to manipulate implicitly. The semantics of these words is a classic case of implicit knowledge. Cyc is fairly weak in this area in spite of giving numerous different senses for each preposition and attaching predicate logic expressions to each. Cyc gives 14 different kinds of “in”, each defined by different logic expressions, but the “essential meaning” of “in” seems not to be contained in any of these definitions exactly – it’s a fuzzier kind of implicit knowledge. It’s not that the “real definition” of “in” can’t, in principle, be expressed in logical form – of course, it can be. But the real definition is a mixture of abstract formal concepts like those given in Cyc, with a load of specific examples of in-ness, familiar to us all from everyday life, and a load of perception-and-action-heavy patterns abstracted from these specific examples. Enumerating all these cases, for a system without a library of embodied experiences in its mind, would be extremely laborious. On the other hand, for a system with a library of embodied experiences, it’s easier: there are thousands of examples of in-ness in storage, and plenty of patterns on different levels of abstraction emergent from these. A new in-ness situation may be understood by reference to these various examples and the low and mid-level patterns abstracted from them, not only by the very abstract patterns among these examples that are captured in formal, Cyc-style definitions of the senses of “in.”
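A minimal sketch of the exemplar-based alternative to Cyc-style definitions of “in”: store many experienced scenes as feature vectors and judge a new scene by similarity to stored examples, rather than by a single logical definition. The three features here (containment overlap, surface contact, horizontal separation) and all the exemplar values are invented for illustration:

```python
import math

# Stored embodied experiences: (feature vector, preposition that applied).
# Features: (containment overlap, surface contact, horizontal separation).
exemplars = [
    ((0.9, 0.2, 0.0), "in"),    # marble in a cup
    ((0.8, 0.1, 0.0), "in"),    # keys in a pocket
    ((0.0, 1.0, 0.0), "on"),    # book on a table
    ((0.1, 0.9, 0.0), "on"),    # cup on a saucer
    ((0.0, 0.0, 0.7), "near"),  # chair near a wall
]

def classify(scene):
    """Nearest-neighbor lookup over stored embodied examples."""
    return min(exemplars, key=lambda ex: math.dist(ex[0], scene))[1]

print(classify((0.85, 0.15, 0.0)))  # in  -- closest to the cup/pocket cases
```

The Cyc-style definitions would correspond to the top of the abstraction pyramid; a system like this one answers from the bottom, by raw proximity to remembered cases, and a full mind presumably uses both.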
Note, I’m not arguing that Cyc-style abstract definitions don’t exist in the human mind, nor that they shouldn’t exist in an AI mind. The point is that these definitions, in an embodied mind, are only the top of a pyramid of more and more abstract patterns emergent from sensorimotor data. Sometimes questions are best resolved by reference to the top of the pyramid, and sometimes by some of the patterns lower down, even sometimes very-low-level patterns close to the level of the original sensorimotor data.
And now we have reached another significant problem with unembodied AI. The network of patterns on various levels of abstraction, which emerges in an embodied system from sensorimotor data, is not only directly useful to a mind, it’s also useful as a metaphor for thinking about non-directly-sensorimotor‑related things – and as a training ground for teaching the system to manipulate hierarchical/heterarchical networks of patterns (this sort of pattern-network is called a “dual network” in (Goertzel, 1994, 1997)).
Regarding the “metaphor” issue: It’s no coincidence that so many of our prepositions are spatial metaphors – to a large extent, we’ve learned to think by learning to think about the spatiotemporal world all around us. But it’s not only that we humans learn the meaning of “through” by reference to a database of experienced examples of through-ness, but also that we learn how to manipulate and interrelate abstract relationships like “through”, “in” and “because” by learning to manipulate and interrelate their concrete instantiations in the perceived and enacted world. Without this “playground” for learning to interrelate such relationships, it will be hard for an AI system to get the knack. Not impossible, just substantially more difficult.
Cognition requires a host of heuristics, many of which are too subtle and implicit for us to explicitly program into our AI systems – our AI systems have got to learn them. Reasoning about time and space, about other minds, about our own goals and actions, about plans over the short term versus the long term, and dozens of other examples – all these sorts of reasoning involve generic inferential methods, but they also involve case-specific inference control heuristics, which must be either learned by experience, or supplied by some very subtle sort of cognitive/computing science that is nowhere near existence yet. There is plenty of “AI” work in these areas at present, but it’s all extremely crude compared to what human children do. Contemporary AI planning systems, for example (see e.g. Russell and Norvig, 2002), exceed most adults in planning the operations of a factory when all the factors and conditions involved are well-defined – but perform worse than the average young child in planning problems where the situations and entities involved are nebulously defined and/or poorly understood.
The example of planning hints at some other important points. Embodiment isn’t just about getting a rich field of sensorimotor data in which to ground one’s symbols, it’s also about having a body to move around and generally control. The process of controlling one’s body with simple multi-time-scale goals in mind is an excellent training ground for learning how to achieve goals by controlling systems in the context of diverse data rich in patterns on multiple scales. Furthermore, the process of modeling one’s own physical self is excellent practice for modeling one’s own mental self – and modeling one’s own mental self is critical, if one wishes to modify and improve oneself.
In none of these cases is embodiment strictly necessary for learning how to think – there are other approaches to learning planning, other approaches to learning introspection and self-control, other approaches to learning to balance patterns on multiple space and time scales, etc. The point is that embodiment provides a way of learning all these things, and more, in a richly interconnected way.
Another important point is that, since we are embodied, we have reasonable intuitions for the way various sorts of learning go in embodied systems. If a purely non-embodied learning mechanism is plausible – which I doubt, but I’m not sure about this – it would be more alien to us and hence would likely be harder for us to monitor and tune, and to fill with our various human insights into how the world operates and how thinking may usefully be done. In fact, even an AGI embodied in a humanlike body would be fairly alien to us due to its different cognitive architecture. Given the current state of robotic technology, the most likely case to come about is an AGI embodied in a not-very-humanlike body – which will obviously have a psychology highly different from ours, but hopefully similar enough that a real psychological connection can be built between it and its human teachers.
One way to realize that it’s a mistake to equate “embodiment” with “precisely human-style embodiment” is to remember cases like Helen Keller – a woman who was blind, deaf and had a very limited sense of smell. Mostly she experienced the world through touch, and her teacher “spoke” to her in language by tracing out the shapes of letters on her hand. Helen Keller was intelligent and articulate and, it would seem, basically developed a full understanding of the world. Of course, touch – together with kinesthesia, and inner feelings like hunger, thirst, pain, sexuality and so forth – provides a lot of bits of data at every point in time. But there’s nowhere near the complexity of the data coming in through vision. A key point to notice is that touch is the sense that truly provides the sense of embodiment. The skin divides the self from the other, and allows one to tell when one is touching some other object, to feel the nature of the environment in which one is immersed, etc. To embody an AI, I’d rather have a digital Helen Keller – a robot with a body with skin and sensations all around it – than a typical modern mobile robot with a camera eye and some bump sensors and sonar.
There are two aspects to “learning how to think” that are hard to fully distinguish from each other: learning how to think at all, versus learning how to think like a human. All humans share, to a certain extent, a common world-model. Parts of this world-model may be explicitly stored in the fetal brain, at an abstract level, but I doubt this is a very significant factor. More importantly, I suspect, a lot of the world-model comes out of the relationships between our in-built human drives and feelings: hunger, thirst, pain, sexuality, movement, heat, cold, etc. Some of the rest comes out of the relations between our particular sensors and actuators – for instance, a mind with sonar would tend to build different world-models, as would a mind that controlled wheels or wings instead of legs. And some of the rest comes from the social relationships that ensue from our biology, e.g. parent-child relationships, sexual relationships, issues of trust and deceit and the signals that go along with these, etc. In order to share the human world-model in the sense that a human would, an AI really would need to have a human-like body. Helen Keller shared most of the basic human world-model, because she had a human body (and also note that her brain was wired to think in terms of the common human senses even though some of her sensory input devices didn’t work – for instance, she still had her visual cortex to think with; though it couldn’t be applied directly to visual sensations, it could still be applied to think about other tasks in a vision-like way).
Human patterns of cognition are closely tied to the human world-model, not just to the fact of embodiment. For this reason, an embodied AGI, unless it has a body modeled very closely after the human body, is not going to think like us – not even if its head is filled with Cyc-ishly or conversationally imparted “human common sense.” It will take this human commonsense and integrate it with its own nonhumanly-embodied experience and come up with something different. The closer its embodiment is to ours, the more easily we’ll be able to guide it as it learns how to think.
This notion of the “human world-model” is described by Eric Baum (2004), from a computational-learning-theory perspective, as inductive bias: a predisposition to recognize certain types of patterns in the world. For instance, it’s fairly well demonstrated empirically that, although the human brain doesn’t come with any specific linguistic knowledge at birth, humans do have some strong inbuilt biases to recognize certain kinds of linguistic patterns (Pinker 2000; Calvin and Bickerton, 2001). If we create an AGI system and give it neither linguistic knowledge nor linguistic-pattern-oriented biases, and then try to teach it human language, then we are placing it at a severe disadvantage relative to humans – even if we’ve given it an accurately human-like body to move around in! For this sort of reason, Baum argues that real AGI is a long way off: he believes that we need to wait for neuroscience to fully interpret the brain, so we can read the inductive biases out of it and program them into an AI system. On the other hand, I believe that we can work around the “inductive bias” problem via a messy, creative, integrative, post-embodied approach.
So what’s an AGI designer to do?
I believe there is a strong argument for creating embodied AI, as opposed to taking a wholly unembodied approach. However, I’m not a member of the “AI should be driven by robotics” camp. In fact I’m currently involved in a lot of work – on things like unembodied natural language processing – that most hard-core embodied-mind enthusiasts believe is worthless so far as AGI is concerned.
The practical fact is that, right now, given the technologies at our disposal, embodying AI systems is a pain. Furthermore, modern robot technology focuses on things like vision processing and arm movement, which are “far from the essence” where embodied intelligence is concerned – there’s been disappointingly little work on robot skin, for example. For these reasons, there’s a strong real-world motivation to go as far as one can possibly go with unembodied methods – and integrate embodiment only where avoiding it would lead to truly absurd inefficiencies. With this in mind, I like to distinguish two approaches to embodied AI: pure embodiment, in which the AI knows only what it learns via its body (though it may of course have predispositions toward certain types of knowledge programmed into it from creation), and impure embodiment, in which the AI has one or more bodies but may also have other important sources of knowledge (generally, some sources present in it from creation, some interactively presenting it with new information as it learns). What I advocate is impure embodiment, or, to give it a more positively-spun name, post-embodiment.
Post-embodiment is a path of pragmatism – but interestingly (and this is one reason for the name), it’s also perhaps a more forward-looking perspective than the classical embodied-mind view. We humans are embodied, and our minds exemplify the usefulness of embodiment for cognition – but, we humans are obviously far from optimal intelligent systems. More and more, as culture and technology advance, we voyage far beyond our bodies, and gain knowledge from remote regions of the outside world. It’s not too long until we’ll be able to graft computer chips into our brains and tap directly into quantitative, relational, textual and multimedia databases of various kinds – thus gaining vast amounts of information that have no direct connection to our physical bodies … information coming directly into our brains, not through our senses. Furthermore, virtual reality technology will in time mature and allow us to have the sense of occupying multiple “physical bodies” in multiple “places.” In short, there is an argument to be made that human intelligence will become less embodied during the next decades as technology advances. And clearly this will be an enhancement rather than a regression of our intelligence.
Post-embodiment for humans is a futurist hypothesis, but for AGI’s, it can be the initial condition. AGI’s should have bodies, because embodiment provides a highly efficient medium for learning a lot of valuable things in an interconnected way. However, AGI’s should also use every other means of knowledge-gathering and learning-stimulation at their disposal.
What does this mean in practice? The most sensible course for AGI seems to be threefold.
First, one should embody one’s proto-AGI system in one or more worlds so it can have rich sensorimotor experience. Simulated worlds may be useful for grounding various relational concepts, and for learning self-control and self-modeling. However, given the simplicity of current simulated worlds, real-world perceptual data seems to be a very valuable complement: its greater richness leads to a richer hierarchical/heterarchical network of emergent patterns.
Second, one should converse with one’s proto-AGI system regarding all sorts of things, including what it sees and does in the worlds it interacts with.
Third, one should feed one’s proto-AGI system as much prepared knowledge as possible – preferably in simple natural language, but also in the form of quantitative, relational and logical data, since this is more readily available than databases expressed in simple natural language.
So for instance, if a post-embodied AI system is told “Ben gave Izabela the ball,” then it may know that shortly afterwards, Izabela has the ball, for one of two reasons. It may know this because someone explicitly told it that “If X gives Y to Z, then shortly afterwards, Z has Y.” Or it may know this because it observed a number of instances of real-world interaction associated with the word “give,” and noticed this as a pattern in several cases with actual X’s, Y’s and Z’s. The presence of formally-given commonsense knowledge may be useful, but this knowledge can only go so far, and eventually experientially-based learning has got to take over. For instance, how does the system know that, after the giving is done, Ben no longer has the ball? This is not part of the intrinsic nature of giving, because if Ben gives Izabela a cold, he may still have the cold; and if a wife gives her husband a child, she still has the child. Again, it may be that the AI system has explicitly been given knowledge of the form “Balls are solid physical objects” and “If X gives Y to Z, and Y is a solid physical object, then shortly afterwards, Z has Y and X does not have Y.” On the other hand, it may be that the system has learned this pattern from observing a number of situations with a number of real X’s, Y’s and Z’s. In any particular example, one can retroactively say: “If the system had this and that pieces of commonsense knowledge fed explicitly into it, it could figure that out.” But the problem is that there are far too many examples, and extracting all this implicit knowledge from the human mind is quite slow and difficult. The above formula regarding solid physical objects, though it may look fine-grained to the untutored, is in fact an overgeneralization, and needs to be replaced by more refined formulas – et cetera, ad nauseam.
The middle ground proposed here is well-illustrated by this simple example. A purely embodied approach doesn’t give the system any explicit background knowledge; it lets the AI system learn things like the relation between giving and having on its own, at the same time as it learns linguistic syntax, learns to move its body, and so forth. A purely unembodied approach tries to encode every last rule in some formal way (using e.g. predicate logic or natural language). A post-embodied approach tries to have its cake and eat it too – for instance, in this example, this might involve explicitly encoding “If X gives Y to Z, then shortly afterwards, Z has Y” and letting the system figure out “If X gives Y to Z, and Y is a solid physical object, then shortly afterwards, Z has Y and X does not have Y” from experience. Note that the experiential learning problem here is significantly easier than in the purely-embodied case, because all the system needs to learn is a modification of a pattern it’s already been told. Or, in the post-embodied approach, one might tell the system the latter, more complicated formula about giving and physical objects, and let it figure out the refinements on this based on experience. The important point isn’t exactly where the fed-in knowledge leaves off and the experientially-learned knowledge begins; the point is that both kinds of knowledge exist and are allowed to synergetically feed off each other.
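The hybrid strategy can be caricatured in a few lines of code. The sketch below is purely illustrative – the `Episode` and `GiveRule` structures and the `has(...)` predicate strings are invented for this essay, and bear no relation to any actual AGI system’s knowledge representation. It shows a seed rule fed in explicitly (“after a giving event, the recipient has the object”) being refined by an exception mined from observed episodes (“for solid objects, the giver no longer has the object”).

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    """One observed real-world giving event."""
    giver: str
    recipient: str
    obj: str
    giver_still_has: bool   # observed outcome after the event
    obj_is_solid: bool      # perceptual feature of the object

@dataclass
class GiveRule:
    # Seed knowledge, fed in explicitly: after give(X, Y, Z), Z has Y.
    exceptions: list = field(default_factory=list)

    def learn(self, episodes):
        # Experiential refinement: in every observed episode involving a
        # solid object, the giver lost the object (unlike colds or ideas).
        solid = [e for e in episodes if e.obj_is_solid]
        if solid and all(not e.giver_still_has for e in solid):
            self.exceptions.append("solid object: giver loses it")

    def infer(self, giver, recipient, obj, obj_is_solid):
        facts = {f"has({recipient}, {obj})"}        # seed rule always fires
        if obj_is_solid and self.exceptions:
            facts.add(f"not has({giver}, {obj})")   # learned refinement
        return facts

rule = GiveRule()
rule.learn([
    Episode("Ben", "Izabela", "ball", giver_still_has=False, obj_is_solid=True),
    Episode("Ann", "Bob", "cup", giver_still_has=False, obj_is_solid=True),
    Episode("Ben", "Izabela", "cold", giver_still_has=True, obj_is_solid=False),
])

facts = rule.infer("Ben", "Izabela", "ball", obj_is_solid=True)
# facts now contains has(Izabela, ball) from the seed rule and, via the
# learned exception, not has(Ben, ball)
```

The point of the toy is structural: the learner is not inducing the giving/having relation from scratch, only a modification of a rule it was already told, which is a far smaller search problem.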
For another example, consider Cyc’s 14 senses of the word “in” – which a careful analysis shows not to be sufficiently fine-grained to provide a thoroughly human-like understanding of the types of in-ness in the world. In a purely embodied approach, the AI would learn some rough equivalent of these senses – and also acquire a more refined understanding – through experience. In a purely unembodied approach, if the 14 senses are inadequate, they must be either improved by the creation of a larger database with 37 or 61 senses, or else improved via a series of conversations in which various examples of the many different shades of in-ness are discussed. In the post-embodied approach, we can begin with the notions of in-ness provided by resources like Cyc and online dictionaries, and allow them to be elaborated via insights into in-ness obtained by observing – and conversing with humans about – various examples of in-ness in the perceived, interacted-with world. It seems clear that, if it can be pulled off, the post-embodied approach provides a more efficient approach to learning – as it brings more information into the AGI’s mind more quickly.
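The same seed-then-elaborate pattern applies to word senses. The toy below is invented for illustration – the sense names and glosses are not Cyc’s actual “in” senses, and keyword overlap is a crude stand-in for real perceptual/linguistic pattern matching. It seeds a small inventory from an external resource, classifies observed usages against it, and adds a new sense only when no existing sense fits.

```python
STOPWORDS = {"a", "an", "the", "in", "of", "is", "my", "x", "y"}

def content_words(text):
    # Strip punctuation and stopwords; a crude proxy for semantic features.
    return {w.strip("().,") for w in text.lower().split()} - STOPWORDS

def best_sense(usage, senses):
    """Return (score, sense_name) for the best keyword-overlap match."""
    return max((len(content_words(usage) & content_words(gloss)), name)
               for name, gloss in senses.items())

# Seed inventory, fed in from an external resource (invented glosses):
senses = {
    "in-container": "enclosed by a physical container, like coffee in a cup",
    "in-region": "located within a spatial region, like a house in Brazil",
    "in-group": "a member of a group or team, like a player in a team",
}

# A usage the seed inventory already covers:
score, name = best_sense("there is coffee in my cup", senses)
assert score >= 2 and name == "in-container"

# A usage no seed sense covers triggers experiential elaboration:
score2, _ = best_sense("a crack in the vase", senses)
if score2 == 0:
    senses["in-flaw"] = "a break or discontinuity, like a crack in a vase"
```

In a real system the matching would run over emergent perceptual patterns rather than keyword overlap, but the control flow – classify against fed-in senses, extend the inventory when experience doesn’t fit – is the point being illustrated.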
This approach is “impure” because it doesn’t require that the proto-AGI learns everything from experience. Rather, it assumes that one’s proto-AGI system has a flexible enough knowledge representation, and a powerful enough set of cognitive dynamics, that it is able to integrate diverse forms of knowledge (formal, experiential and linguistic) and use each form to assist its understanding of other forms. More specifically, for instance, it assumes that the perception module of one’s proto-AGI system is set up so that the emergent patterns that form within it can be easily and naturally matched up (by the system’s “inference” module, whatever form that may take) with the patterns that form in the system upon interpretation of linguistic utterances.
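This matching requirement can be made concrete with a minimal sketch. Everything here is invented for illustration – the feature names and the dictionary-of-weights representation are hypothetical, not Novamente’s actual structures – but it shows the core assumption: patterns emerging from perception and patterns produced by language interpretation must live in a shared space, so that an inference process can link them by similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors (dicts)."""
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Emergent pattern from watching a giving event (hypothetical features):
perceived_give = {"transfer": 0.9, "contact": 0.7, "agent": 0.8, "object": 0.8}

# Patterns produced by interpreting linguistic utterances:
linguistic = {
    "give": {"transfer": 1.0, "agent": 0.9, "object": 0.9},
    "see": {"perceive": 1.0, "agent": 0.9, "object": 0.6},
}

# The "inference module" links the perceived event to the closest word-pattern.
best = max(linguistic, key=lambda w: cosine(perceived_give, linguistic[w]))
assert best == "give"
```

If the perception module and the language module produced patterns in incommensurable spaces, no such linkage could be computed – which is exactly the engineering risk discussed next.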
If there is a reasonable objection to this integrative approach, it’s that it may be too hard to get the emergent perception-patterns to match up to the linguistic-understanding patterns. In a purely embodied mind, the linguistic-understanding patterns will have largely emerged from perception-patterns, therefore this matching is almost guaranteed. In a post-embodied mind, this matching is not guaranteed, and the system must be engineered carefully to ensure that it occurs. Otherwise the kind of linkage shown in the above example – where perception-based learning elaborates patterns explicitly told to the system – won’t happen in practice, and the post-embodied approach doesn’t work.
I believe the representations and dynamics in the Novamente system are adequate to this task – and, over the next few years, we’ll find out if I’m right or not. At the moment we have a roughly half-complete proto-AGI Novamente, with the rest of the design worked out in moderate detail and awaiting implementation and testing. We’re just beginning the process of feeding the system knowledge in natural language, and transforming knowledge from Cyc and other knowledge repositories into Novamente’s knowledge representation. And a project to embody Novamente in a simulation world – based on Crystal Space (crystal.sourceforge.net), the same technology used for Crystal Cassie, and on the VOS framework (interreality.org) – began just last month, in May 2004.
The perspective on embodiment I’ve outlined here is highly controversial when considered in the context of contemporary, conventional AI theory. Even so, I consider it to be basically “just common sense.” However, at least it’s not sterile “common sense” – in the context of Novamente, it has led me to some fairly interesting insights regarding the useful fusion of linguistic, formal and perceptual knowledge – some of which I’ve hinted at above. So the ideas given here constitute a conceptual perspective that has proved useful for guiding practical work on at least one proto-AGI system.
But enough about Novamente -- my main point here isn’t the Novamente approach in particular – it’s the general approach, the concept of post-embodied AI. Embodiment is important, it’s incredibly useful as a learning mechanism for minds – but we shouldn’t get carried away with it and assume that all non-embodied mechanisms for getting information into AI’s are Bad Things. Rather, in a sufficiently flexible AGI framework, it’s possible to have embodiment and also utilize the approaches typically associated with anti-embodiment philosophies. This may have the effect of making both the pro- and anti-embodiment schools of thought unhappy with one’s work. However, it may also provide the maximum rate of progress toward actually creating AGI.
Thanks are due to Mark Waser, Izabela Freire, Moshe Looks and Deborah Duong for recent conversations on these issues.
Baum, Eric (2004). What Is Thought? MIT Press.
Broderick, Damien (2002). The Spike.
Brooks, Rodney (1999). Cambrian Intelligence: The Early History of the New AI. MIT Press.
Calvin, William and Derek Bickerton (2001). Lingua ex Machina. MIT Press.
DeGaris, Hugo and Michael Korkin (2002). "The CAM-Brain Machine (CBM): An FPGA-Based Hardware Tool which Evolves a 1000 Neuron Net Circuit Module in Seconds and Updates a 75 Million Neuron Artificial Brain for Real Time Robot Control." Neurocomputing 42, Issue 1-4.
Dreyfus, Hubert (1979). What Computers Can't Do. Harper-Collins.
Goertzel, Ben, Cassio Pennachin, Andre Senna, Thiago Maia and Guilherme Lamacie (2003). Novamente: An Integrative Approach to Artificial General Intelligence. In R. Sun, Ed., Papers of the IJCAI 2003 Workshop on Cognitive Modeling of Agents and Multi-Agent Interactions.
Goertzel, Ben (1997). From Complexity to Creativity. Plenum Press.
Goertzel, Ben (1994). Chaotic Logic. Plenum Press.
Harnad, Steven (1990). The Symbol Grounding Problem. Physica D 42, 335-346.
Kurzweil, Ray (2000). The Age of Spiritual Machines. Penguin.
Laird, John (2002). "Research in Human-Level AI Using Computer Games." Proceedings of the ACM.
Lenat, D. B. (1995). "Cyc: A Large-Scale Investment in Knowledge Infrastructure." Communications of the ACM 38, no. 11.
Looks, Moshe, Ben Goertzel and Cassio Pennachin (2004). Novamente: An Integrative Approach to Artificial General Intelligence. AAAI Workshop on Human-Level Intelligence, October 2004.
MacDorman, K. F., Tatani, K., Miyazaki, Y., Koeda, M. and Nakamura, Y. (2001). Protosymbol Emergence Based on Embodiment: Robot Experiments. ICRA 2001: Proceedings of the IEEE International Conference on Robotics and Automation, May 21-26.
Pinker, Steven (2000). The Language Instinct. Perennial.
Russell, Stuart and Peter Norvig (2002). Artificial Intelligence: A Modern Approach. Prentice-Hall.
Santore, John F. and Stuart C. Shapiro (2003). Crystal Cassie: Use of a 3-D Gaming Environment for a Cognitive Agent. In R. Sun, Ed., Papers of the IJCAI 2003 Workshop on Cognitive Modeling of Agents and Multi-Agent Interactions.
Schmidhuber, Juergen (2004). Optimal Ordered Problem Solver. Machine Learning, 54, 211-254.
Varela, Francisco, Evan Thompson and Eleanor Rosch (1992). The Embodied Mind. MIT Press.
 Though unfortunately, the Mindpixel database of commonsense knowledge facts appears to be mostly useless. It’s been entered in mainly by Web-surfers with childish senses of humor, and the collaborative-filtering rating mechanism on the site doesn’t seem to have worked well enough to ensure reasonable quality.
 In fact for this particular purpose (preposition meanings), the Novamente project has built our own knowledge resource which we call the Preposition WordNet.