Wild Computing -- copyright Ben Goertzel, 1999

Chapter 9:
Toward an Agent Interaction Protocol

1. Linguistics versus Pragmatics

Intelligent software agents on the Internet will, like humans, need to communicate via language. What language this will be, however, is not entirely clear. It might be English, or a computerized dialect of English; it might be a form of logic or mathematics; or it might be an evolved language totally incomprehensible to humans. Standardizing the discourse of intelligent agents is not something we should be trying to do at this stage. Standardizing the pragmatics of intelligent agent discourse, on the other hand, is an important task, and one that may well be a prerequisite for the emergence of Internet intelligence.

Humans communicate via a variety of different languages, all of which exist within the shared pragmatic/semantic space of physical discourse. If I were placed on a desert island with a person who spoke only Quechua, we would gradually learn each other's languages, by reference to physical objects (point at a tree and say "tree", etc.). Even if I were strapped against the wall in a dark room, and a person who spoke only Quechua were strapped against the wall on the other side of the room, we might still, after months or years, begin to make progress -- again, by virtue of the shared physical reality. I would whisper "quiet" and yell "loud", etc. It would be difficult, but possible: this is the power of pre-linguistic, pragmatic reality, the common stream of perceptual data shared by various linguistic and nonlinguistic beings.

The notion of an Agent Communication Language, as presented e.g. in Fikes and Farquhar (1998), embodies this distinction between linguistics and pragmatics. Agent communication is broken down into two layers: an outer "agent interaction protocol" (AIP) layer, and an inner "content communication" layer. The AIP layer plays the role of pragmatics, giving a basic framework in which linguistic communications are framed. The content communication layer plays the role of language. The dissociation of these two layers is important, because right now, different commercial and academic AI developers radically disagree on what kinds of languages should be used to communicate between intelligent agents. However, agreement on an outer layer of pragmatics, an agent interaction protocol, is more closely within reach.

Taking a slightly different tack, the importance of agent interaction protocol may be seen by analogy to the birth of the Internet. The emergence of the Net as we know it required two main things, engineering-wise:

The same may be true of the World Wide Brain. Proprietary AI engines acting over intranets and extranets will be the equivalent of ARPANET, but to achieve a truly global brain we will need an equivalent of TCP/IP to allow the emergent intelligent network to grow beyond the bounds of the customers of a particular firm.

2. An Agent Interaction Protocol

The most promising agent interaction protocol proposed to date is a very simple message-handling protocol called KQML (Knowledge Query and Manipulation Language; see Fikes and Farquhar, 1998). In this section I will describe KQML, and suggest some revisions that would simplify KQML and render it more general, less tied to the formal-logical reasoning systems that gave rise to it.

KQML contains the following basic "performatives":


Each of these takes a certain number of parameters, drawn from:

	:content <expression>
	:language <word>
	:ontology <word>
	:in-reply-to <expression>
	:force <word>
	:sender <word>
	:receiver <word>

This is a fairly general and powerful framework, but it is not quite perfect, and its imperfections reveal its roots in logic-based AI. KQML was designed to work with KIF, a knowledge representation format based on first-order logic, with the addition of some object-oriented primitives (class, subclass of, slot, slot-value-type, etc.). Though in principle KQML can work separately from KIF, it contains a number of features that are really not meaningful in the context of AI agents based primarily on self-organization rather than logical deduction.
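To make the performative-plus-parameters structure concrete, here is a minimal Python sketch that renders a message in the parenthesized, keyword-prefixed form that KQML messages conventionally take. The function name `format_kqml`, the agent names, the ontology name and the KIF-like content string are all illustrative assumptions, not part of any specification.

```python
def format_kqml(performative, **params):
    """Render a performative and its keyword parameters as a KQML-style expression."""
    parts = [performative]
    for name, value in params.items():
        # Python identifiers cannot contain hyphens, so in_reply_to becomes :in-reply-to.
        keyword = ":" + name.replace("_", "-")
        parts.append(f"{keyword} {value}")
    return "(" + " ".join(parts) + ")"


# Hypothetical example: agent-a asks agent-b whether a statement holds.
message = format_kqml(
    "ask-if",
    sender="agent-a",
    receiver="agent-b",
    language="KIF",
    ontology="furniture",                  # made-up ontology name
    content='"(is-a chair-17 chair)"',     # made-up KIF-like expression
)
print(message)
```

The sketch only builds the outer wrapper. What goes inside `:content` is left entirely to the language named in `:language`, which is the separation of layers described above.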

The presupposition of KIF is that knowledge is represented in terms of logical formulas, and that "Sets of expressions necessarily bottom out in expressions containing undefined (primitive) symbols" (Fikes and Farquhar, 1998). However, this is not a universal presupposition of intelligence: in a later chapter we will see that, in some AI systems, sets of expressions bottom out in highly complex combinations of nonsymbolic, nonlinguistic data, just as, in the human mind, "chair" bottoms out in a collection of perceptual data and perceptual-action schemas. This means that "chair" is a "primitive" symbol, but it does not mean that nothing can be communicated about "chair." Rather, it means that communication about chairs may take place on a pragmatic, pre-linguistic level as well as on the linguistic level. KIF does not allow for pre-linguistic discourse; it operates only on the level of logic. However, if intelligent agents are to communicate in a truly intelligent way, they must be allowed to interact prelinguistically and prelogically as well as on the level of logical formulas. Imagine trying to teach a child to speak, think and interact using only logical formulas!

For instance, how, in the KQML language, would one agent point to a picture and say "pretty"? There is a primitive for "tell" but no primitive for "show" -- a glaring omission given that showing is perhaps the most essential aspect of pragmatic, pre-linguistic communication.

Also, the inclusion of a parameter for "ontology" is fairly artificial. When two humans communicate, they do not specify the ontology to which they are referring: ontology is implicit for humans, and it is implicit for many types of AI systems too. A better name for the "ontology" parameter would be "context." A formal-logic-based ontology would be one example of a context, a document would be another.

To explore this issue a little more deeply, let us observe what is really meant by an "ontology" by members of the research group involved with designing this protocol. Fikes and Farquhar (1998), advocating the OKBC ontology protocol, which fits in naturally with KIF, state that
"We consider ontologies to be domain theories that specify a domain-specific vocabulary of entities, classes, properties, predicates and functions, and a set of relationships that necessarily hold among these vocabulary items."
Clearly, this is a highly restrictive notion of an ontology, and not all intelligent agents are going to operate in such terms. In fact, any system that does operate according to such a restrictive notion of ontology is guaranteed not to be very intelligent. A useful theory of a given domain of knowledge is not going to consist of a set of necessary logical relationships, but a fluid, self-organizing network of possible relationships. Recognizing this, Koller and Pfeffer (1998) are seeking to generalize the logic-based notion of ontology to deal with uncertainty; but this is really only a tiny step in the direction of psychological plausibility.

The inclusion of "force" as a parameter in the agent interaction protocol is probably worthwhile. The forcefulness of a statement is important: witness the importance of voice volume in human speech. Of course, there are many other parameters besides forcefulness that are important in establishing the pragmatic context for communication. Even in the case of myself and the Quechua-speaker strapped at opposite ends of a dark room, we have a lot more than force of expression to go on! But force has a very pragmatic use in the case of questions: it indicates how badly the questioner wants an answer. Furthermore, it is something that has meaning in a variety of different AI architectures: neural network based systems, for example, will reach different conclusions with different "activations", activation essentially being a version of "force" in the sense of AIP.

"Evaluate" might also seem to be a good thing to have -- one agent will often want another to evaluate something for it, although "evaluation" may not always be meant in the sense of logical evaluation. Something sent along for evaluation may be evaluated in many different ways, the result not necessarily being a truth value or logical expression, but possibly a whole body of information. But, evaluation in this sense is really a form of asking. The distinction between "evaluate" and "ask-if" is only meaningful in the context of logic-based AI systems.

Finally, "untell" is clearly not a generally necessary part of an AIP. What does it mean for a person to "untell" someone something, as opposed to denying that thing? This is a distinction that is meaningful only in the context of specific logical formalisms, not in general intelligent discourse.

On the other hand, since we have statements (tell) and questions (ask) in the protocol, it seems odd not to have a way to command an agent to do things. Commanding is clearly a much more natural behavior than untelling! Telling is informative; commanding is what the central machine in a local network does to the other machines in the local network. Computers do a lot of commanding in networks now, so an AIP should support this.

Finally, and more controversially, I would argue for one more additional performative in the Agent Interaction Protocol: mate. We have seen above the potential that exists in the "Internet as evolving ecology" concept. The best way to encourage this would be to have genetic-algorithm-type crossover as a standard operation of agent interaction. Of course, each agent is responsible for determining what it can mate with and what the result will be like. Radically different agents will not be able to mate.
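As a rough illustration of what a "mate" operation might involve, the following Python sketch applies one-point crossover, a standard genetic-algorithm operator, to two toy agents. Everything here is an assumption made for the example: the `Agent` class, the representation of an agent as a flat list of numbers, and the compatibility rule (same species label, same genome length). Real agents would decide compatibility and recombination in their own ways.

```python
import random


class Agent:
    """Toy agent whose 'mind' is a flat list of numbers (an assumed representation)."""

    def __init__(self, species, genome):
        self.species = species
        self.genome = list(genome)

    def can_mate_with(self, other):
        # Each agent decides for itself what it can mate with. The rule used
        # here is an assumption: same species label and same genome length.
        return self.species == other.species and len(self.genome) == len(other.genome)

    def mate(self, other, rng=random):
        """Return a child built by one-point crossover, or None if incompatible."""
        if not (self.can_mate_with(other) and other.can_mate_with(self)):
            return None
        # Pick a cut point; the child takes this agent's genes before it
        # and the other agent's genes from it onward.
        cut = rng.randrange(len(self.genome) + 1)
        return Agent(self.species, self.genome[:cut] + other.genome[cut:])


parent_a = Agent("filter-bot", [0, 0, 0, 0, 0, 0])
parent_b = Agent("filter-bot", [1, 1, 1, 1, 1, 1])
stranger = Agent("crawler", [1, 1, 1, 1, 1, 1])

child = parent_a.mate(parent_b)
print(child.genome)              # some zeros followed by some ones
print(parent_a.mate(stranger))   # None: radically different agents cannot mate
```

Both agents are consulted before crossover happens, which mirrors the point that each agent is responsible for determining what it can mate with.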

In conclusion, my preferred AIP would have the following performatives:


The omission of a few performatives from the KQML specification is not important, as it was not the intention of the designers of KQML that every agent use every performative. However, the addition of the "show" and "command" performatives is crucial. Without these, the protocol is really not complete. Show, tell, ask, reply and command are very basic features of communication, and the omission of showing and commanding in favor of untelling and denying really makes KQML inadequate as a general agent interaction protocol. Furthermore, the addition of "mate" is an explicit support of emergent intelligence that is not at all present in KQML.

Basically, KQML only allows agents to talk together and to evaluate logical expressions together. But this is not enough. We interact with each other in many ways beyond conversation, and artificial Net agents should have the same ability: at minimum, they should be allowed to show each other things on the Net, and to mate with each other. Showing, talking and mating are the basic operations out of which a society of intelligent agents may be formed, and may form the substrate of an emergent electronic mind.

Next, along these same lines, the parameters that I think should be generally used are:

	:content <expression>
	:data <array of pairs (data type, data item)>
	:language <word>
	:context <array of URLs>
	:in-reply-to <expression>
	:force <float>
	:chance <float>
	:sender <word>
	:receiver <word>

Note the changes here. "Ontology" is replaced by "context", and "content" is augmented by "data", which is not merely an expression but may be any collection of data (e.g. any MIME type). "Force" is supplemented by "chance" -- the amount of randomness that is desired in the response. In a mating operation "chance" gives the mutation rate; in a query processing context, "chance" gives the adventurousness of the reply. These changes are small but important. Showing does not work unless content can be drawn from beyond the realm of logical expressions. And, context must be expressible as a list of multiple relevant items, not merely as a pointer to a formal, explicit "ontology." The URLs that "context" points to may be ontologies, or anything else; and the recipient may do with them what it wishes. In general, it seems reasonable to allow specific agent types to use custom parameters within messages. As with tags in SGML, if an agent receives a parameter type that it does not know how to process, it may simply ignore it.

KQML with performatives and parameters thus modified I propose to call SAIP, for "Simple Agent Interaction Protocol."
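One possible in-memory representation of a SAIP message is sketched below in Python. The field names follow the parameter list above; the set of performatives is read off the surrounding prose (ask, tell, reply, show, command, mate). The class name `SaipMessage`, the function `parse_message`, the default values for `force` and `chance`, and the `:mood` parameter in the usage example are assumptions made for illustration. The parser drops any parameter it does not recognize, in the spirit of the SGML analogy.

```python
from dataclasses import dataclass, field, fields

# Performatives as named in the text; treating them as a closed set is an assumption.
PERFORMATIVES = {"ask", "tell", "reply", "show", "command", "mate"}


@dataclass
class SaipMessage:
    performative: str
    sender: str = ""
    receiver: str = ""
    content: str = ""
    data: list = field(default_factory=list)      # pairs of (data type, data item)
    language: str = ""
    context: list = field(default_factory=list)   # URLs of relevant material
    in_reply_to: str = ""
    force: float = 1.0    # assumed default
    chance: float = 0.0   # assumed default: no randomness requested


def parse_message(performative, params):
    """Build a SaipMessage from raw parameters, silently ignoring unknown ones."""
    if performative not in PERFORMATIVES:
        raise ValueError(f"unknown performative: {performative}")
    known = {f.name for f in fields(SaipMessage)} - {"performative"}
    kept = {}
    for key, value in params.items():
        name = key.lstrip(":").replace("-", "_")   # ":in-reply-to" -> "in_reply_to"
        if name in known:
            kept[name] = value
    return SaipMessage(performative=performative, **kept)


# One agent shows another a picture and says "pretty".
msg = parse_message("show", {
    ":sender": "agent-a",
    ":receiver": "agent-b",
    ":content": "pretty",
    ":data": [("image/jpeg", b"...image bytes...")],
    ":force": 0.8,
    ":mood": "cheerful",   # custom parameter this receiver does not understand
})
print(msg.performative, msg.content, msg.force)
```

Ignoring unknown parameters, rather than rejecting the message, is what lets specific agent types extend the protocol without breaking agents that have never heard of the extension.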

In the following section I will explore some interesting uses of SAIP, but first it should be observed that SAIP also has down-to-earth applications. It is general enough that it could be used as a wrapper around very mundane "agent" interactions such as Web browsing. What a Web browser sends to a Web server is, on the AIP level, a "show" message. In a distributed search system, on the other hand, a search engine might send an "ask" command to a Web server, asking for information about particular topics. An e-mail is a "tell" message; an e-mail with attachments is a "tell" message bundled with a "show" message. The inclusion of the "show" command makes apparent the continuity between the sophisticated agent interactions of the future Internet, and the simple but highly useful interactions between the programs that serve on the current Net as "agents" for human desires.
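The e-mail case can be sketched with Python's standard `email` package: the body of a message becomes a "tell", and any attachments are bundled into a single accompanying "show" whose `data` is a list of (data type, data item) pairs. The function name `email_to_messages` and the use of plain dictionaries as the message representation are assumptions made for this example; the addresses are placeholders.

```python
from email.message import EmailMessage


def email_to_messages(mail):
    """Map an e-mail to a 'tell' message, plus one 'show' message if it has attachments."""
    body = mail.get_body(preferencelist=("plain",))
    messages = [{
        "performative": "tell",
        "sender": mail["From"],
        "receiver": mail["To"],
        "content": body.get_content() if body is not None else "",
    }]
    # Each attachment becomes a (data type, data item) pair.
    attachments = [
        (part.get_content_type(), part.get_content())
        for part in mail.iter_attachments()
    ]
    if attachments:
        messages.append({
            "performative": "show",
            "sender": mail["From"],
            "receiver": mail["To"],
            "data": attachments,
        })
    return messages


mail = EmailMessage()
mail["From"] = "alice@example.org"
mail["To"] = "bob@example.org"
mail.set_content("Here is the picture I mentioned.")
mail.add_attachment(b"\x89PNG", maintype="image", subtype="png", filename="pic.png")

for message in email_to_messages(mail):
    print(message["performative"])
```

The attachment's MIME type travels with the data, so the recipient can decide for itself what, if anything, to do with what it has been shown.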

3. Four Levels of Communication

To make these ideas more concrete, I will now explore some specific types of interactions that agents on the Net will need to make in the near future. This list of projected interactions is systematized according to the four levels of being as introduced above. It is not intended to be an exhaustive list, merely to indicate some very-near-future possibilities.

Note that the different levels of being do not correlate in any straightforward way with the different performatives of the SAIP. Interactions on any level of being may involve ask, tell, show or reply messages, in various combinations. Only the mate message is special: it lives on the level of wild being.

Agents interacting on the level of static being means agents simply exchanging information. The action of a Web browser or an e-mail client falls into this category. Information is being pushed from here to there. Standard search queries also fall into this category -- they involve a search for what is there, what is present. Finally, "push technology" in its simplest form is of this type as well. One agent is pushing information that it judges relevant -- pushing it to an agent representing a human, such as an e-mail program or an "active desktop," pushing it to an AI system, or wherever.

Note that the static-ness of these interactions is dependent on the breadth with which one defines the "system", the overall context of consideration. These interactions are static when considered purely on the Internet level. An e-mail is a static object transferred from one Internet agent to another, and when it is received, it just sits there, rather than entering into any particular dynamic. On the other hand, if one defines the overall system to include human computer users as well as computer programs -- justifiably, as humans are the most flexible and powerful Internet agents existing today -- then things work out quite differently. The human who reads the e-mail may enter the e-mail into his mind, where it may well trigger off all sorts of complex dynamics. For instance, it may cause him to send more e-mails, in which case it was a "static being" interaction with the e-mail client, but a process being interaction with the human/e-mail client joint system.

Agents that trigger processes are quite common today. The Web browser, when it activates a CGI script, is triggering a server process. Standard client-server data mining software resides on this level as well. If one is simply accessing database data, then one is in a sense triggering a process -- a lookup process -- but the essence of the interaction is static, accessing what is there. But if one is asking for a clustering to be performed, or for the future value of a time series to be predicted, then one is asking for the on-the-fly creation of new information: it is really the process one is paying for, as much as the information itself. As online databases become larger and larger, information retrieval necessarily becomes more a matter of process being than one of static being. The abundance of information means that the premium is not on mere possession of information, but on filtering of information, an intelligent process.

Hyper being agent interaction goes beyond triggering of processes -- now we are talking about the seeding of creative self-organization. If the querying of a large database has a focus on the "mining" process, the querying of an intelligent, self-organizing database has a focus on the complex, information-building processes seeded by the query. Asking an intelligent information system for knowledge about "AI technology as related to bovine menstruation" doesn't merely trigger off a search process; it triggers off a learning process within the AI system, which may potentially change the AI system's dynamic memory for good.

Hyper being queries may be specific requests for the creation of new self-organizing systems -- as in the case of a manager who wants to build a simulation of his organization within an Internet AI system, knowing that the running of this simulation will structure the way the AI system gathers information from other systems, etc. Or they may be implicit, as in the case of an ordinary query to a self-organizing database. Either way, the point is that one wants to trigger not just a process but a complex process, one that results in a system of interlocked informational causes and effects and so creates a new subsystem of the target system.

And of course, hyper being interactions need not be queries. Telling another system something can seed off self-organization on its own. Preachers, teachers and marketeers are always in search of that magic formulation that will seed off self-organization in the minds of their listeners. But, one thing they lack is the ability to directly implant ideas into their audience's brain. Instead of pushing data to another agent, one Internet agent may push a small agent, intended to interact with the recipient agent and encourage it in certain directions. The "data" parameter in the Simple Agent Interaction Protocol supports this: part of the data passed along may be an agent itself, e.g. in the form of a Java applet.

What "wild being" agent interaction means is nothing short of mind meld. This is not so much a query as an undirected open-ness: "Allow my brain to merge with yours and let's see what happens." The "show" command is essential here. One agent may show another portions of its own mind, and in return obtain portions of another agent's mind. The self-organization processes on either end may result in an agent-transcending feedback that binds the two agents into a combined emergent intelligent system.

Among appropriately sophisticated agents, wild being interaction could be triggered by a simple "ask" message: Do you want to mind meld? If the "reply" were "Go for it", then a repeated exchange of data would ensue. The data exchanged would not be static information, but rather networks of complex processes. Each agent, having taken some of the other's complex processes into itself, would absorb the other's tacit ways of thinking as well as its data. Of course, this kind of interaction can only occur among agents that have sufficiently similar internal workings to be able to exchange interior segments.

And mating, of course, falls into this same category. Mating is a process by which parts of two or more agents come together to form new agents. This is an extreme version of mind meld: instead of one agent taking in a small part of another agent's mind, an agent is formed of roughly equal parts from two others. Again, mating implies that the two agents mating are of the same "species", that they are close enough that combinatory operations are defined. But this specificity requirement is no different from what is found in the biological world: different species coexist, but do not interbreed.

If we count humans into the system, wild being interaction happens from time to time on the Net already. E-mail listservers and discussion groups, and Usenet groups, at their best, are a kind of collective thinking. Most often they are collective stupidity, but on rare occasions they can achieve a true emergent intelligence; I have experienced this myself on several occasions.

Purely electronically, wild being agent interactions do not seem to be happening yet, because the agents out there are too unsophisticated. But, there are no fundamental obstacles in the way of this happening. All that is needed is the adoption of a sufficiently flexible agent interaction protocol, and the creation of various special-purpose-driven agents using the protocol to communicate. This is something that will arise in the next decade purely due to market pressures, rather than out of any desire on the part of agent engineers to create emergent global intelligence. Just as static being querying gradually gives way to process being querying, as databases become large enough that filtering is more important than raw information; so process being interactions will gradually give way to hyper being interactions, which will gradually give way to wild being interactions. All four levels will remain important, of course, just as they do in biological systems. But wild being interaction will occur in what now seem the most unlikely places.

For example, an e-mail/web-browser type client will in the future be a mini-mind, a self-organizing network of user-relevant information. The process of querying a database will be a process of melding the mind of the user's client with the mind of the database, and then submitting simple process or static queries into the resultant emergent information store. And the relevance of this data will be assured by the accuracy with which the client's mini-mind reflects the user's mind, an accuracy obtained through routine daily wild-being interaction between the user and his/her computer.