The Evolving Mind -- Copyright Gordon and Breach © 1993
The self-organizational view of evolution presented in Chapter 3 is sufficiently general to apply to many different complex systems: not just ecosystems, but immune systems, brains, minds and societies as well. However, every evolving system has its own peculiar self-organizing dynamics that constrain and interact with the processes of reproduction and natural selection. In this chapter we will explore, on a very general level, the self-organizing dynamics of mind.
There are dozens of competing theories of mental structure and dynamics. And there are dozens of competing theories of neural structure and dynamics. Nice as it would be to accommodate them all, it is impossible to do so. The only way to meaningfully explore the relation between evolution and the mind/brain is to adopt one particular point of view from the outset, and pursue its consequences. Here, therefore, we will begin with the theory of computational psychology developed in SI.
In SI I took pains to relate computational psychology to the technical psychology and AI literature, and along the way I also discussed the philosophy of neural network theory at length. I do not repeat those passages here; I simply introduce the required ideas without extensive discussion. The reader desirous of more background is referred to SI.
Intelligence and Mind
Let us define the structure St[S;(r,s)] of the system S on the interval (r,s) as the (fuzzy) set of patterns in the ordered set (S[r],...,S[s]), where S[t] denotes the state of S at time t (once again, we are glossing over coding considerations). This is the actual structure of the system S, as opposed to the structure of the system S's behavior. In the case where S is a human or some other organism, through psychology we only have access to the structure of S's behavior, but through biology we can also study St[S;(r,s)].
We may define a mind as the structure of an intelligent system. This means that a mind is not a physical entity but rather a Platonic, mathematical form: a system of functions. Mind is made of patterns rather than particles. In order to make this definition of mind complete, we now need only to define intelligence.
One should note that, although this definition of mind is highly general, it does exclude some things. For instance, one could conceivably uphold the view that the mind of a person is not inherent in that person, but is defined only relative to the society in which that person lives. In this view, the mind of a person would be best defined not as the structure of that person's brain, but as the structure of that person's brain plus the patterns emergent between that person's brain and the brains of the other people in the same society. Actually, one could adopt this point of view, and it would not contradict any of the analyses to follow. However, I will stick by the definition given in SI. Social factors are essential to mind, but not as central as inherent brain patterns; they are mainly essential insofar as they invade inherent brain patterns. A person's mind would remain basically the same if all other people were obliterated without her knowing it.
Let us now turn to intelligence. Intelligence is often associated with the I.Q. test, but virtually no one believes that the I.Q. test captures the essence of intelligence. The I.Q. test primarily gauges the ability to solve spatial and logical puzzles at high speeds. Clearly, intelligence involves much more than this. In SI I give a mathematical definition of intelligence, which builds on the intuitive idea that intelligence is the ability to achieve complex goals in environments which are unpredictable in detail but not in overall structure.
This intuitive idea is formalized by considering several different notions of predictability: L-sensitivity, the ability to predict the future state of a system from its past state; S-sensitivity, the ability to predict the future structure of a system from its past state; RS-sensitivity, the ability to predict the future state of a system from its past structure; and SS-sensitivity, the ability to predict the future structure of a system from its past structure (here "structure" is to be understood in the sense of the theory of pattern). It is proposed that real environments are strongly L, S and RS sensitive, but not so SS sensitive, so that one can predict future structure from past structure, but neither draw conclusions from nor make predictions regarding particular states. The intelligence of a system is then defined, roughly speaking, as the average over all "goal" functions A of [the complexity of A]*[the ability of the system to execute A in an environment that is strongly L, S and RS sensitive but not highly SS sensitive].
The I.Q. test measures the ability to achieve certain complex goals, which are sufficiently abstract to apply to a variety of different environments. However, as many psychologists have pointed out, it does not measure the ability to adapt one's thinking to changing environments, nor does it measure the ability to achieve those extremely complex goals that require sustained creative thought. Psychologists, most notably Robert Sternberg (1988), have developed alternate tests designed to measure these aspects of intelligence. The general definition given in SI encompasses these tests as well as the standard I.Q. test. It is a general (though highly abstract) gauge of ability at pattern recognition and appropriate-pattern creation.
It must be emphasized that, by providing formal definitions for intelligence and mind, I am not claiming to have solved the myriad puzzles which surround these concepts. I am simply making explicit one particular way of thinking about them. Some of the uses of this point of view will be seen in the following sections.
6.1 THE STRUCTURE OF MEMORY
According to the preceding definition, intelligence is essentially inductive: it requires recognizing patterns among past situations and modifying behavior accordingly. And induction presupposes memory: how can one predict the future if one doesn't remember the past?
More specifically, induction requires a memory that stores patterns. Recall that, in Section 3, we expressed a pattern in the form (y,z), where y is a program and z is a binary sequence. Many other representations are possible. However, this one has the strength of pointing out that what memory stores are algorithms for computing one thing from another.
This brings us to what Israel Rosenfeld, in The Invention of Memory (1989), calls a myth that has probably dominated human thought ever since human beings began to write about themselves: namely, that we can accurately remember people, places and things because images of them have been imprinted and permanently stored in our brains; and that, though we may not be conscious of them, these images are the basis of recognition and hence of thought and action.
Rosenfeld proposes an anti-hard-data theory of memory. He stresses that, rather than storing traces, memory stores procedures.
No one denies that part of memory consists of procedures. For instance, every time we form a spoken word from its syllables, we are applying certain phonological procedures. However, many contemporary psychologists would agree with Broca, who argued in 1861 that there is a crucial structural difference between the image-based memory responsible for the storage of words and their meanings, and the procedural "memory for the movements necessary for articulating words." Against such arguments, Rosenfeld summons an impressive variety of evidence. In each case, a phenomenon which at first appears to depend on image-based memory is seen to require procedural memory. For instance, he refers to David Marr's demonstration that shapes can be recognized as shapes without any reference to previous knowledge, merely by executing certain procedures on the appropriate visual stimuli. This shows that shape recognition probably does not depend upon searching a memory store of shapes until a match is found; it may, rather, be a matter of summoning appropriate procedures from the memory. But if shape recognition does not require a store of shapes, then why should memory contain such a store at all?
I take it for granted that memory is primarily a huge set of procedures. However, one may sidestep the question of whether there is any hard data at all, by considering a literal image as a "constant procedure": a function which gives the same output for every input.
So, long-term memory is a set of procedures; this does not tell us very much. The essential question is: is this set simply amorphous, or does it have some structure attached to it? How are the various elements of the set related to one another? In order to answer this question, we must ask: in what manner and for what purpose is memory utilized? In SI I suggest that the main use of memory is for analogical reasoning. It is argued that analogical reasoning is the mind's main tool for optimizing goal functions in complex environments.
Two different entities X and Y are defined to be analogous if there is some common pattern relating them: if St(X) and St(Y) have a substantial intersection, or Em(X,Y) is large, or finally if both X and Y are elements of St(Z) or Em(U,V) for some other entities Z, U or V. This point of view ties in closely with recent work on analogical reasoning in psychology and computer science. The basic idea is an obvious and old one: in reasoning about a given situation, the mind will look up what has been true of similar situations in the past.
Analogical reasoning, it is clear, requires a specific type of memory structure. It requires that analogous patterns be stored near each other. This leads us to the concept of a structurally associative memory.
It is worth pausing to comment on the primary role assigned to analogy, as opposed to deductive reasoning. In SI it is argued that deduction is essentially part of the structurally associative memory. A deductive system consists of a set of axioms and a set of rules for deriving propositions from these axioms. It is, therefore, an extremely compact way of storing a large number of propositions. According to this analysis, deductive systems are among the most effective procedures stored in the structurally associative memory, but other than this, deduction has no special role in intelligence. This is very different from the "logicist" point of view, fairly common among AI researchers, which holds that all thought is at bottom deduction.
Structurally Associative Memory
The concept of associative memory is both old and well-known. In its simplest form, it states simply that the mind, like a library or a supermarket, tends to store related entities near one another. The key problem with theories of associative memory lies in the term "related." Computer scientists create associative memories called Quillian networks (Goertzel, 1992 and references given therein), but they judge relatedness based on their own intuitions. The structurally associative memory is a model of memory which assumes that memory is associative, and takes relatedness to be defined in terms of common patterns. This approach ties in naturally with the theory of analogical reasoning sketched above: a structurally associative memory supplies pattern-based analogical reasoning with exactly what it needs, nothing more and nothing less.
Formally, a structurally associative memory is a graph, at each node of which lies either a Boolean function or a binary sequence (i.e. a constant Boolean function). These vertices are connected triadically, as shown in Figure 28; and each triadic connection has a certain weight.
A structurally associative memory is a network of emergence if its connections are drawn according to the definition of pattern: x, y and z are interconnected with intensity I, as shown in Figure 28, if (y,z) is a pattern in x with intensity I. In general, a structurally associative memory may be considered as an approximate network of emergence.
A network of emergence is a perfectly structured memory; like the frictionless plane, it is a mathematical entity which does not directly correspond to reality. A structurally associative memory, as just defined, is also an abstract mathematical construction. But it does, to a greater extent, correspond to reality.
More precisely, when, in the following, I speak of brains possessing structurally associative memories, I obviously do not mean that brains contain abstract graphs, or little dots and lines. What I mean is roughly as follows. Where p and q are two entities stored in the memory of a certain mind, let d(p,q) denote the average amount of time that a mental procedure, having just processed p, will have to wait before receiving q for processing. This is a measure of the practical proximity of p and q in the mind. Now, suppose that there is some structurally associative memory so that, for most p and q in the mind, the shortest path between p and q in the memory graph is roughly proportional to d(p,q). Then structurally associative memory is a significant element of the structure of the mind in question.
Natural Selection in the Memory Network
A mind is constantly receiving new data. Therefore it must constantly reorganize itself in order to maintain as close as possible an approximation to the network of emergence structure. In this sense, "memory" is not only a structure but a dynamic.
One possible algorithm for memory network reorganization, discussed in SI, is the following:
1) Assign vertices to any newly-introduced procedures, and single out a number of pre-existing vertices, in such a way that the vertices assigned and selected are approximately evenly distributed across the network (so that no two are too close to each other). Call these functions xi and their vertices N(xi).
2) Switch the vertex of each of these procedures xi with the vertex of one of the procedures y with which it shares a pattern (that is, assign xi vertex N(y) and y vertex N(xi)) and see if, according to the new arrangement, the total amount of pattern which the network indicates in xi and y is greater. If so, make the switch permanent, so that xi keeps the vertex formerly assigned to y and y keeps the vertex formerly assigned to xi, and apply Step 2 to xi again. If not, proceed to one of the other procedures y with which xi shares a pattern (say, proceeding through the y in order of decreasing intensity of shared pattern). Once all the procedures with which xi shares more than e (some small fixed number) worth of pattern have been exhausted, exit Step 2.
This algorithm moves procedures through the network toward their "proper position." The problem is that, even if xi is not in its optimal position, its position could well be better for it than the positions of its neighbors. Ideally, one might like to try each xi out in every possible position; but of course, this is not feasible. In SI an approximate solution to the dilemma is proposed, similar in spirit to Achi Brandt's (1985) multilevel algorithm for the Ising spin problem.
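Steps 1 and 2 above can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions, not the algorithm as specified in SI: procedures are represented by hypothetical feature sets, "shared pattern" is taken to be the number of common features, the network is a ring, and Step 1 is reduced to picking every second vertex. The names shared, score, settle and reorganize are invented for this sketch.

```python
def shared(a, b):
    """Toy stand-in for 'shared pattern': the number of common features."""
    return len(a & b)

def score(p, where, occupant, neighbors, features):
    """Pattern the network indicates in p: overlap with its current neighbors."""
    return sum(shared(features[p], features[occupant[v]])
               for v in neighbors[where[p]])

def swap(x, y, where, occupant):
    """Exchange the vertices of procedures x and y."""
    where[x], where[y] = where[y], where[x]
    occupant[where[x]], occupant[where[y]] = x, y

def settle(x, where, occupant, neighbors, features, eps=0):
    """Step 2 for one procedure x: keep any switch that raises the local score."""
    restart = True
    while restart:
        restart = False
        # Candidates y sharing more than eps with x, strongest overlap first.
        candidates = sorted(
            (y for y in features
             if y != x and shared(features[x], features[y]) > eps),
            key=lambda y: shared(features[x], features[y]),
            reverse=True,
        )
        for y in candidates:
            before = (score(x, where, occupant, neighbors, features)
                      + score(y, where, occupant, neighbors, features))
            swap(x, y, where, occupant)          # trial switch
            after = (score(x, where, occupant, neighbors, features)
                     + score(y, where, occupant, neighbors, features))
            if after > before:
                restart = True                   # keep it; apply Step 2 to x again
                break
            swap(x, y, where, occupant)          # undo the trial switch

def reorganize(where, occupant, neighbors, features, stride=2):
    """Step 1 (simplified assumption): select every stride-th vertex, then run Step 2."""
    chosen = [occupant[v] for v in sorted(occupant)[::stride]]
    for x in chosen:
        settle(x, where, occupant, neighbors, features)

def total(where, occupant, neighbors, features):
    """Sum of local scores over the whole network."""
    return sum(score(p, where, occupant, neighbors, features) for p in features)

# Six hypothetical procedures placed on a ring of six vertices.
features = {
    "cat": {"animal", "pet"}, "dog": {"animal", "pet"}, "wolf": {"animal", "wild"},
    "saw": {"tool", "cut"}, "axe": {"tool", "cut"}, "file": {"tool", "smooth"},
}
order = ["cat", "dog", "saw", "wolf", "axe", "file"]
where = {p: v for v, p in enumerate(order)}
occupant = {v: p for v, p in enumerate(order)}
neighbors = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}

before_total = total(where, occupant, neighbors, features)
reorganize(where, occupant, neighbors, features)
after_total = total(where, occupant, neighbors, features)
print(before_total, after_total, [occupant[v] for v in range(6)])
```

Because a switch is kept only when it strictly raises the combined local score, the total can never decrease and the loop must terminate. The sketch also exhibits the difficulty noted above: a procedure can stay in a poor position simply because none of the trial switches happens to look better.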
In any case, whether by a method like this or something totally different, the mind must reorganize its memory network in such a way that the probability of an entity staying in its current position is roughly proportional to the degree to which its neighbors are related to it (the degree to which it is a pattern in its neighbors, its neighbors are patterns in it, it and its neighbors share common patterns, or it and its neighbors are patterns in common entities). It is very interesting to ask what this means in terms of physical memory structures. For instance, in the language of the previous sections, what if we assume that each entity in the structurally associative memory is represented by some neural map? Then each map is, roughly, near related maps. And the probability of a map N surviving in its current position is going to be roughly proportional to Em(N,N1,...,Nk), where N1,...,Nk are its neighbors. Because if the network of maps closely mirrors a network of emergence in the vicinity of N, then this emergence will be large. But if this emergence is small, then there is no way that N can have common patterns with its neighbors, hence there is no way that the network can mirror a network of emergence in the vicinity of N.
This is extremely interesting in the light of the theory of evolution presented above. It implies that, in self-organizing to retain its structural associativity over time, a memory network causes the entities stored in it to evolve by natural selection. This constitutes a strong psychological argument in favor of the speculative neuroscientific hypothesis presented at the end of Chapter 1. If, as Edelman and Rosenfeld suggest, concepts are stored as neural maps, then the theory of structurally associative memory dictates that neural maps evolve by natural selection.
The Immune Network As An Associative Memory
Recall from Chapter Two that the immune system has a very long memory. Its impressive feats of memory are, it is believed, carried out partly by long-lived "memory B-cells" and partly by internal images. Internal images are what interest us here.
Let us use the "shape space" notation introduced in Chapter Two. Suppose one introduces Ag = 5,0,0,0,5 into the bloodstream, thus provoking proliferation of Ab1 = -5,0,0,0,-5. Then, after Ag is wiped out, a lot of Ab1 will still remain. The inherent learning power of the immune system may then result in the creation and proliferation of Ab2 = 5,0,0,0,5. For instance, suppose that in the past there was a fairly large population of Ab3 = 4,1,1,0,4. Then many of these Ab3 may mutate into Ab2. Ab2 is an internal image of the antigen.
Putting internal images together with immune networks leads easily to the conclusion hinted at by the title of this subsection: immune systems are structurally associative memories. For, suppose the antibody class Ab1 is somehow stimulated to proliferate. Then if Ab2 is approximately complementary to Ab1, Ab2 will also be stimulated. And then, if Ab3 is approximately complementary to Ab2, Ab3 will be stimulated; but Ab3, being complementary to Ab2, will then be similar to Ab1. To see the value of this, suppose
Ag = 5,0,0,0,5
Ab1 = -5,0,0,0,-5
Ab2 = 5,0,0,-6,0
Ab3 = 0,-4,0,6,0
Then the sequence of events described above is quite plausible, even though Ab3 itself will not be directly stimulated by Ag. The similarity between Ab3 and Ab1 refers to a different subsequence than the similarity between Ab1 and Ag. But proliferation of Ag nonetheless leads to proliferation of Ab3. This is the essence of analogical reasoning, of structurally associative memory. The immune system is following a chain of association not unlike the chains of free association that occur upon the analyst's couch. Here we have given a chain of length 3, but in theory these chains may be arbitrarily long. The computer simulations of Perelson and de Boer, and also those of John Stewart and Francisco Varela (1991), suggest that immune systems contain chains that are quite long indeed.
The big difference between the immune system and the brain is that, while the brain's structurally associative memory stores complex processes, the immune system just deals with regions of surfaces. Thus the patterns stored by the immune system are very simple, and the emergences between related elements of the immune system are very simple. The immune system, though vastly more intelligent than most systems of the body, is not really very smart at all. It solves a mathematically easy optimization problem (matching a given sequence of amino acids) and it does so under fairly predictable conditions. It is remarkable that, even when faced with a relatively simple task such as this, the body resorts to such a large portion of the abstract structure of intelligence.
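The chain of stimulation in this example can be traced mechanically. The sketch below is illustrative only: the complementarity measure (summing the products of coordinates of opposite sign) and the threshold of 20 are assumptions made for the illustration, not quantities taken from the shape-space model of Chapter Two, and the function names are hypothetical.

```python
from collections import deque

def complementarity(a, b):
    """Assumed toy measure: total strength of coordinates with opposite signs."""
    return sum(-x * y for x, y in zip(a, b) if x * y < 0)

def stimulation_chain(start, shapes, threshold):
    """Spread stimulation breadth-first through sufficiently complementary shapes.

    Returns a dict mapping each stimulated shape to its distance from start.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for name, shape in shapes.items():
            if name in depth:
                continue
            if complementarity(shapes[current], shape) >= threshold:
                depth[name] = depth[current] + 1
                queue.append(name)
    return depth

# The four shapes from the example above.
shapes = {
    "Ag":  (5, 0, 0, 0, 5),
    "Ab1": (-5, 0, 0, 0, -5),
    "Ab2": (5, 0, 0, -6, 0),
    "Ab3": (0, -4, 0, 6, 0),
}

depth = stimulation_chain("Ag", shapes, threshold=20)
direct = complementarity(shapes["Ag"], shapes["Ab3"])
print(depth, direct)
```

Under these assumptions Ab3 is reached only at the third step, through Ab1 and Ab2, while its direct complementarity with Ag is zero. This is the sense in which the antigen stimulates Ab3 by association rather than by direct match.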
6.2 THE MULTILEVEL MIND
According to the theory of mind presented in SI and outlined above, intelligence is induction, and induction requires analogy, which requires structurally associative memory, which gives rise to deduction as a special case. Using algorithmic information theory and the mathematical theory of pattern, one may make these connections mathematically rigorous.
But this picture leaves out two things: collecting patterns and deciding upon behaviors. Analogy only works once one has filled the structurally associative memory with patterns. And mind requires some specific algorithm for attaining complex goals by balancing out contradictory possibilities, for deciding what sorts of analogies to ask the structurally associative memory to look for.
In SI, it is suggested that perception and motor control are governed by the multilevel methodology (which was discussed above, on a general level). For example, it is proposed that the perception of a visual image P is controlled by a "perceptual hierarchy" of subroutines. Subroutines on the bottom level, level 0, output simple patterns recognized in the input image P. And, for i>0, subroutines on level i output: 1) patterns recognized in the output of level i-1 subroutines, 2) instructions to the subroutines on the level below it as to what sort of patterns to look for, and how to go about recognizing these patterns.
Furthermore, it is hypothesized that the multilevel hierarchy involved with choosing actions is analogous to the perceptual hierarchy, but operative in the opposite direction. In the motor control hierarchy, the lower levels deal directly with muscle movements, with bodily functions; whereas the higher levels deal with patterns in bodily movements, with schemes for arranging bodily movements. But the primary function of a processor on a given level is to instruct processors on the level immediately below as to what they should do next. The most crucial information transmission is top-down. Bottom-up information transmission is in general highly simplistic: it is of the form "I can do what you told me to do with such-and-such an estimated effectiveness."
For instance, to figure out how to throw a certain object, we start out with the motions familiar to us from throwing similar objects. Then, partly consciously but mainly unconsciously, we modify the "parameters" of the motions: we change the speed of our hand or the angle at which the object is tilted. Based on trial-and-error experimentation with various parameters, guided by intuition, we arrive at an optimal, or at least adequate, set of motions.
In SI, this theory of perception and motor control is discussed much more fully; numerous references to neurological data are given. However, the theory should not surprise the reader too much; it is not particularly radical. In Chapter 7 we shall consider the neurological theories of Jason Brown, and we shall see that he has proposed a "perceptual-motor" hierarchy rather similar in form to the one just discussed.
6.3 GENETIC OPERATORS IN MULTILEVEL PERCEPTION AND CONTROL
Now, having introduced the perceptual-motor hierarchy, we are ready to do what was promised in Chapter 5: to show how crossover operations can emerge automatically from simple psychological structures and dynamics.
Let us look a little more closely at this "multilevel methodology." In a multilevel hierarchy of programs, we may denote the programs on the kth level by Pk,1,...,Pk,n(k). Let us suppose that each Pk,i connects to a set of N(i,k) << n(k-1) processors Pk-1,j, and furthermore that this set can change over time. That is, each Pk,i has the option of connecting to a greater variety of Pk-1,j than it can connect to at any given time.
In this context, a process dependency graph is a labeled tree, defined inductively as follows. Every level 1 program P1,i defines a process dependency graph consisting of one vertex labeled i. And, for k>1, every level k program Pk,i defines a process dependency graph consisting of a "root" vertex labeled i, connected to the root vertices of the process dependency graphs of all level k-1 programs to which Pk,i is connected in the multilevel hierarchy.
Now, suppose that for k>1 each Pk,i is studying a number of Pk-1,j, learning what makes them more effective, and then modifying some of the Pk-1,j accordingly. Since most Pk-1,j are probably connected to more than one Pk,i, there may well be some conflicts here: Pk,i(1) will give Pk-1,j one instruction, but Pk,i(2) will give it another. For example, suppose that one of the strategies followed by the Pk,i is to take aspects of the most effective Pk-1,j which it has studied, and transfer them to some of the Pk-1,j which it controls. Then, if Pk-1,j is one of only a few (k-1)th level programs to which both Pk,i(1) and Pk,i(2) are connected, it is not unlikely that from their different experiences Pk,i(1) and Pk,i(2) will have gleaned different ideas about how to make Pk-1,j more effective.
When a program receives contradictory instructions, it must on some basis accept only one of them. Perhaps (as suggested in SI) it keeps a record of which higher-level programs have served it best in the past. In any event, let us analyze the consequences of this sort of decision. Suppose that both Pk,1 and Pk,2 control Pk-1,3, but Pk,1 and not Pk,2 controls Pk-1,1, and Pk,2 and not Pk,1 controls Pk-1,2. Furthermore, suppose that at a given time Pk-1,1 and Pk-1,2 are working very effectively, but Pk-1,3 is not. Then Pk,1 may recommend Pk-1,3 to mimic Pk-1,1, whereas Pk,2 may recommend it to mimic Pk-1,2. Suppose that Pk-1,3, for whatever reason, chooses to listen to Pk,1. Then in the immediate future, Pk,2 will have an imitation of Pk-1,1 as a subroutine. The process dependency graph of Pk-1,3 will now, temporarily at least, resemble the process dependency graph of Pk-1,1. Therefore, crossover has taken place, on the level of process dependency graphs. Given two process dependency graphs, a new one has been formed consisting of part of one and part of another.
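The inductive definition translates directly into a recursive function. In the sketch below, the dictionary of connections and the (label, children) tuple representation are assumptions chosen for illustration; the text does not fix a concrete data structure.

```python
def dependency_graph(level, index, connections):
    """Return the process dependency graph of program P(level, index).

    The tree is a pair (label, children). A level 1 program is a single
    vertex; a higher-level program is a root joined to the trees of the
    lower-level programs it is connected to.
    """
    if level == 1:
        return (index, [])
    children = [
        dependency_graph(level - 1, j, connections)
        for j in connections.get((level, index), [])
    ]
    return (index, children)

# Hypothetical hierarchy: P3,1 connects to P2,1 and P2,2;
# P2,1 connects to P1,1 and P1,3; P2,2 connects to P1,2 and P1,3.
connections = {
    (3, 1): [1, 2],
    (2, 1): [1, 3],
    (2, 2): [2, 3],
}

tree = dependency_graph(3, 1, connections)
print(tree)
```

Note that P1,3 appears twice in the resulting tree, once under each level 2 program that uses it. It is precisely such shared lower-level programs that can receive conflicting instructions from above.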
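The scenario just described can be played out on toy data. Everything concrete in this sketch is an assumption made for illustration: the leaf labels, the numerical trust record, and the rule that "mimicking" a program means copying its subtree.

```python
import copy

def graph_of(controller, controls, lower):
    """Process dependency graph of a level k program, as (label, children)."""
    return (controller, [lower[j] for j in controls[controller]])

def resolve(target, advice, trust, lower):
    """Let the target program obey its most trusted controller.

    The target keeps its own label but takes on a copy of the structure
    of the program it was told to mimic. Returns the winning controller.
    """
    winner = max(advice, key=lambda c: trust[c])
    label, _ = lower[target]
    _, children = lower[advice[winner]]
    lower[target] = (label, copy.deepcopy(children))
    return winner

# Dependency graphs of three level k-1 programs (hypothetical contents).
lower = {
    1: ("1", [("a", []), ("b", [])]),
    2: ("2", [("c", []), ("d", [])]),
    3: ("3", [("e", [])]),
}
controls = {"Pk,1": [1, 3], "Pk,2": [2, 3]}
advice = {"Pk,1": 1, "Pk,2": 2}        # whom each controller tells Pk-1,3 to mimic
trust = {"Pk,1": 0.8, "Pk,2": 0.5}     # assumed record of past usefulness

before = graph_of("Pk,2", controls, lower)
winner = resolve(3, advice, trust, lower)
after = graph_of("Pk,2", controls, lower)
print(winner)
print(before)
print(after)
```

After the conflict is resolved, the graph of Pk,2 consists of its own subtree for Pk-1,2 together with a subtree copied from Pk-1,1, which Pk,2 does not control. One part comes from each parent graph, which is the crossover described in the text.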
In Section 3 of this chapter it was suggested that crossover might often be effective in the context of multilevel perception. Here we have argued that, to a certain extent, the operation of crossover is inherent to multilevel perception and control. Put together, these two ideas indicate an intriguing similarity between biological evolution and psychological evolution. They provide a psychological justification for the hypothesis made earlier, that maps in the brain evolve by crossover.
Finally, let us tie this result in with the GSA algorithm, briefly mentioned above. It is clear that the process dependency graphs associated with multilevel perception and control reproduce by crossover. However, it is even more transparent that they reproduce by mutation. This follows from two simple facts: error, and self-organization. In biological systems, programs will change even when not explicitly instructed to, because of noisy components, and because biological programs are messy and not specifiable with mathematical precision. Therefore, to the extent that multilevel hierarchies evolve genetically, they do so by a combination of sexual and asexual reproduction. They are closer to the GSA algorithm than to the simple genetic algorithm, or to straightforward evolutionary mutation.
6.4 MIND AS A MULTIBOOLEAN FORM
In Section 5.6 we introduced simple, directed and generalized multiboolean forms, and related them with the process of epigenesis. In this section I will explain how these same constructions may be used to model mental process. This analysis provides a whole new way of looking at the perceptual-motor hierarchy.
Recall that, according to the theory outlined in SI and summarized above, mind is made of patterns. And patterns were defined as ordered pairs (y,z), where y is a function and z an argument.
In order to make the concept of pattern concrete, one must specify the spaces in which the entities y and z are contained. In preceding sections we have assumed that the functions y are computable maps from binary sequences to binary sequences, and left it at that. In this section, however, I will suggest a slightly different way of looking at things. Mathematically, this new view does not contradict the statement that the y are Boolean functions and the z binary sequences, but conceptually it leads one along quite different pathways.
I suggest that the mind may be usefully modeled as a collection of patterns whose component functions are set multiboolean forms. Define a first order set multiboolean form, or a 1-form, to be a map from finite sets of sequences to finite sets of sequences; and for k>1 define a kth order set multiboolean form, or k-form, to be a map from finite sets of set multiboolean forms of order less than k to finite sets of set multiboolean forms of order less than k. This is a substantial extension of the simple and generalized multiboolean forms introduced in the previous chapter. Note the new terminology: the k-forms defined here are different from the k-boolean forms defined above. Every k-boolean form is a k-form, but not vice versa.
This framework does not contradict the assumption made in preceding sections, according to which mind is made of Boolean functions. For one may construct a low-algorithmic-complexity "code map" which assigns every Boolean function a unique code sequence, and in this way all types of multiboolean forms may be represented as Boolean functions. But multiboolean forms provide a handier way of looking at certain Boolean functions. For example, instead of saying something like "Boolean functions that take in sequences encoding Boolean functions that take in sequences encoding Boolean functions," one may as well dispense with the formalism of coding and just say "3-boolean forms." So long as one does not lose sight of the fact that one is still, at bottom, just dealing with plain old computers, multiboolean forms are handy conceptual tools, and there is much to be gained by working with them.
How do multiboolean forms help us to understand the mind? Let us begin by taking binary sequences to represent input and output. In order to act intelligently, a system must be able to predict the input that will likely result from a given output, as well as the input that will likely follow a given input. Therefore it will have to effectively compute patterns in sets of binary sequences, and functions that map binary sequences to binary sequences, i.e. 1-forms. This "effective computation" may be accomplished by biochemical processes that have nothing to do with what we think of as computation; nonetheless, computation in the formal sense is being done.
But how does a system know how to predict the effects of its actions, or to predict the future from the past and present? It must somehow determine which 1-forms are appropriate. This requires it to recognize patterns between 1-forms and collections of sequences. In order to do this, it must compute 2-forms. These 2-forms will help it to select the 1-forms with which it predicts: they will tell it how to select 1-forms to use in the future, based on the past performance of various 1-forms. But the regress continues: how, then, does it know which 2-forms to use? 1-forms predict, 2-forms predict how best to predict. Similarly, a system may use 3-forms to tell it which 2-forms it should use in the future, and which ones it should use in which situations. 3-forms will tell it how best to predict how best to predict. A finite system cannot contain an infinite hierarchy of k-forms; for some K, the K-forms must be fixed in advance and not modified by (K+1)-forms. This infinite regress was, in essence, discovered by Plato; but needless to say, we have expressed it much more rigorously than he did.
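The first two rungs of this hierarchy can be illustrated with a small sketch, in which the regress is cut off at K = 2. The three predictors, the accuracy measure and the selection rule are all hypothetical choices made for the illustration. The 2-form here is also simplified: it returns a single 1-form rather than a finite set of them.

```python
# Three hypothetical 1-forms: each maps a history of bits to a predicted next bit.
def repeat_last(history):
    return history[-1]

def alternate(history):
    return 1 - history[-1]

def always_one(history):
    return 1

def accuracy(one_form, history):
    """Fraction of past bits the 1-form would have predicted from what preceded them."""
    hits = sum(one_form(history[:t]) == history[t]
               for t in range(1, len(history)))
    return hits / (len(history) - 1)

def select_best(one_forms, history):
    """A fixed 2-form: pick the 1-form with the best past performance."""
    return max(one_forms, key=lambda f: accuracy(f, history))

history = (0, 1, 0, 1, 0, 1)
candidates = [repeat_last, alternate, always_one]

chosen = select_best(candidates, history)
prediction = chosen(history)
print(chosen.__name__, prediction)
```

A 3-form, in the same spirit, would compare several selection rules like select_best against one another; in this sketch the single rule is fixed in advance, which is the role the text assigns to the K-forms.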
To better interpret the formalism, let us introduce the terminology of "levels of learning." 1-forms, in themselves, do not represent learning; they are instinctively present in all animals. No mobile creature could long survive without being able to predict the results of its actions, to some degree. 2-forms, then, represent the first level of learning: one is learning how events relate to each other, by learning what sorts of functions can predict one event from another and hence simplify the total set of events.
For instance, a child learns that thunder almost always follows lightning, and rain often follows thunder, but dinner does not habitually follow thunder. She learns that if she throws a frisbee in a certain way, the frisbee will ascend into the air and then return to her. These observed patterns are 1-forms, which are obtained by experimenting with various 1-forms. The process by which these 1-forms are experimented with is called learning, and it is executed by means of 2-forms.
A 3-form, then, represents the second level of learning, which is learning how to learn. If a child becomes a frisbee champion, she will no longer need so much trial and error in order to learn new ways of throwing a frisbee. She will not only be better at throwing frisbees, she will have more of a knack for picking up new tricks.
Similarly, as a person learns more and more mathematics, she requires less and less time in order to assimilate a new mathematical theory. For example, she learns that it is not wise to begin by reading every proof of every theorem in detail. One should rather begin by reading the major theorems, skipping most of the proofs but looking at a few important examples. Then, having obtained a basic intuitive understanding of what one is to be learning, one is ready to look at the details. This is a strategy for learning. To arrive at an effective strategy for learning, one must compare the efficacies of various strategies for learning. This requires that one compute functions of 2-forms; i.e. that one compute 3-forms.
The reader may have noticed that this "psychological hierarchy" of set multiboolean forms is an example of the multilevel methodology discussed above. Each level k form, k>1, selects strategies for certain level k-1 forms. The perceptual-motor hierarchy, as defined above, is a hierarchy of Boolean functions, each one taking as its input the output of some of those functions below it in the hierarchy. However, I suggest that, in this view, some of these "input/output" sequences must be considered coded versions of Boolean functions, or higher-order set multiboolean forms. In this sense, I propose, the perceptual-motor hierarchy consists partly of a hierarchy of k-forms of increasing order.
Gregory Bateson spent many pages vigorously promoting this same suggestion, albeit in a somewhat different language. In Steps to an Ecology of Mind (1972) he analyzed many different cases of learning how to learn in humans and other animals. "Learning how to learn" is an excellent description of the function of 3-forms in the hierarchy described above.
In fact, learning how to learn may have been Bateson's greatest preoccupation. One of his most interesting proposals regarding higher-level learning had to do with our apparently limited capacity for it. In Mind and Nature, Bateson (1980) suggested that while humans can learn, and learn how to learn, and even learn how to learn how to learn, anything past this level is beyond human grasp. In the present terminology, this means that the human brain cannot compute 5-forms.
This hypothesis seems reasonable to me; I cannot believe that it underestimates by any more than one or two. One hears a lot about the "magic number" 7 plus or minus 2, the number of entities which human consciousness ("short-term memory") can contain at one time. Although it is not so easy to measure, Bateson's number is no less essential. It is a fundamental limitation on human adaptability and intelligence.
Epigenesis and Thought
Finally, let us pause to observe the close connection between the mental model just presented and the epigenetic model of Section 5.6. There, epigenesis was modeled as the successive application of generalized multiboolean forms to sequences and to each other, to create new sequences and new DMF's or GMF's. In the special case where only simple multiboolean forms are involved, the end product of this process is a certain sequence. In general, the "end product" is a system of constantly interacting GMF's.
Similarly, here mental process was modeled as the successive application of multiboolean forms to each other, to create new output sequences and new multiboolean forms. The "end product" here is either the set of output sequences, or the total self-structuring system of interacting multiboolean forms. On this extremely abstract level, epigenesis and thought are seen to be similar processes. They may both be modeled as multiboolean dynamical systems.
This insight is relatively uninformative, since very little is known about multiboolean dynamical systems. However, when one introduces crossover into the picture, it implies that two apparently very different questions may have similar answers. In Section 5.6 we modeled the sexual reproduction of organisms in terms of genetic programming on the level of process dependency graphs of epigenetic programs. In Section 6.5, however, we showed that subgraphs of the perceptual-motor hierarchy might reproduce sexually, according to the logic of genetic programming. In both cases, there is an open question of how effective the crossover operation is, that is, how amenable to crossover the relevant structures are. Now we have seen that the two structures are related in an interesting way. Perhaps the two different questions have the same answer, or similar answers?
In Section 7.1 we will explore Jason Brown's hypothesis that the perceptual-motor hierarchy and part of the hierarchy of embryological processes are closely related as a matter of evolutionary principle. If this is indeed the case, then the musings of the previous paragraph are more than abstract speculation; they are a most reasonable hypothesis, one deserving of serious attention.
6.5 THE DUAL NETWORK OF MIND
We have looked at the mind from several different angles: as the analogue of an ecosystem which evolves by natural selection, as the seat of form creation by sexual reproduction, as a multilevel control structure.... Now it is time to explicitly draw these different threads together, into a unifying model.
The central hypothesis of SI is that the structure of every intelligent system contains a certain prominent pattern called the master network. As portrayed in Figure 29, this is not a network of neurons but a coarse-grained network of processes: induction, deduction, analogy, optimization, perception, motor control, and structurally associative memory. It is suggested that, in order for a system to be intelligent, it must contain all these processes, and these processes must be interconnected in a way which facilitates extremely rapid communication.
As is pointed out in SI, the fact that these processes may be analyzed separately does not imply that they are physically distinct in real systems. In fact, in the final chapter of SI the possibility is raised that they can never be physically distinct: that in order for suitably rapid communication between processes to occur, the different processes must actually occupy the same space. An example computer architecture called the nAMP is described, which contains all of the required processes; and in this architecture the different processes do physically overlap. In particular, in the nAMP architecture, the perceptual-motor hierarchy and the structurally associative memory are essentially identical: they are complementary control structures acting on the same network of processors.
I now feel that this feature of the nAMP is almost certainly a necessary one. The perceptual-motor hierarchy and the structurally associative memory are functionally separate but their physical substrates are the same. This conclusion makes a great deal of sense in the context of the brain. Neural maps serve both as the patterns in structurally associative memory, and the processes in the perceptual-motor hierarchy. This fits in perfectly with Rosenfeld's idea that memory stores processes: of course it does, since the processes which it stores are exactly the same processes that take care of the work of perceiving and doing.
But, you may be wondering, how can the same network of processes be structured in two different ways? The perceptual-motor hierarchy must be structured hierarchically; that's what the name says. In graph-theoretic terms, its graph of connections forms a tree, a cycle-less graph: each process has some processes that it controls and some that it's controlled by. And the structurally associative memory, on the other hand, is not a tree; its graph of connections has multiple loops. In the structurally associative memory a process is connected to those processes which relate to it according to the logic of pattern, so it will be quite common to have A connected to B, B connected to C, and C connected to A.
This puzzle can be easily solved, if one is willing to make a special assumption regarding the distribution of connections in the structurally associative memory. One must assume that, given almost any x in the structurally associative memory, the function d(x,y), the distance between x and y in the structurally associative memory, is fractally distributed. This means that ideas come in relatively small clusters, and that these clusters come in relatively small clusters, and that these second-order clusters come in relatively small clusters, etc.
It means that the structurally associative memory is ordered like the physical universe: solar systems, then galaxies, then galactic clusters, then clusters of galactic clusters. The solar system is mostly empty space, but the distance between planets in the same solar system is orders of magnitude less than the distance between solar systems in the same galaxy. Similarly, the galaxy is mostly empty space, but the distance between solar systems in the same galaxy is orders of magnitude less than the distance between galaxies in the same galactic cluster. And so on up; how many levels, we're not sure.
Just consider the library card catalog. This has a very definite fractal structure: books are divided into subjects, and within each subject there are a number of specialties, and within each specialty there are a number of sub-specialties. To the extent that the distribution of the subjects of published books represents the distribution of concepts in the mind, this is evidence in favor of a fractal mind.
An Experiment
More explicitly, suppose one gave a person a huge matrix of the form
            big    dog    pig    elephant    beauty    tire    ..
big         __     __     __     __          __        __
dog         __     __     __     __          __        __
pig         __     __     __     __          __        __
elephant    __     __     __     __          __        __
beauty      __     __     __     __          __        __
and asked her to fill each blank with a number between 1 and 10, indicating the relatedness of the concept at the top of the column containing the blank, and the concept at the left of the row containing the blank. "10" indicates maximally strong relatedness, and "1" indicates complete unrelatedness. For instance, the diagonal entries would all be "10"s, since every concept is maximally related to itself.
Given a completed matrix, for each 1<n<10 one could obtain a graph by assigning a vertex to each concept, and drawing an edge between two vertices if and only if the two are related to a degree greater than or equal to n. If the matrix was formed from a large number of randomly selected concepts, and if structurally associative memory is indeed fractally distributed, then this graph should have a definite fractal structure to it.
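The construction of one graph per threshold can be sketched as follows. The ratings below are invented purely for illustration and are not experimental data; the function names `threshold_graph` and `components` are hypothetical. Counting connected components at each threshold is one simple way to look for clusters nested within clusters, though it is only a crude stand-in for a real test of fractal structure.

```python
def threshold_graph(names, matrix, n):
    """Edges between every pair of distinct concepts related to degree >= n."""
    edges = set()
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if matrix[i][j] >= n:
                edges.add((names[i], names[j]))
    return edges


def components(names, edges):
    """Connected components of the graph, each returned as a set of concepts."""
    neighbours = {name: set() for name in names}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, result = set(), []
    for name in names:
        if name in seen:
            continue
        stack, group = [name], set()
        while stack:
            v = stack.pop()
            if v not in group:
                group.add(v)
                stack.extend(neighbours[v] - group)
        seen |= group
        result.append(group)
    return result


# Made-up relatedness ratings on the 1-10 scale (symmetric, diagonal = 10).
names = ["dog", "pig", "elephant", "big", "beauty"]
ratings = [
    [10, 8, 6, 3, 2],
    [8, 10, 6, 3, 1],
    [6, 6, 10, 9, 2],
    [3, 3, 9, 10, 1],
    [2, 1, 2, 1, 10],
]

for n in (8, 6, 2):
    clusters = components(names, threshold_graph(names, ratings, n))
    print(n, clusters)
```

Lowering the threshold merges the small, tight clusters into larger, looser ones, which is the nesting that a fractally distributed memory would be expected to show over many levels.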
This experiment would be very hard to do, since in order to see more than one or two levels of the fractal structure one would require at least several hundred concepts. From a hundred concepts one obtains ten thousand blanks; from a thousand concepts one obtains a million blanks. However, if one accepts that the judgement of relatedness is often more cultural than personal, then one might permit a number of subjects to collaborate on one matrix.
The Dual Network
So, suppose we accept the hypothesis that the structurally associative memory is fractally structured. Then it is no problem putting the structurally associative memory and the perceptual-motor hierarchy together into one graph. Define a level-0 cluster within the structurally associative memory as, simply, an element of the memory. Having done this, one may inductively define a level-k cluster within the structurally associative memory as a cluster of level-(k-1) clusters. And then one may simply identify each level-k cluster with a process on the kth level up in the perceptual-motor hierarchy.
This means that the subsidiaries of a process P in the perceptual-motor hierarchy deal with a subset of the region of structurally associative memory that P deals with. But this makes perfect sense in terms of everything said about the perceptual-motor hierarchy, here and in SI. The perceptual-motor hierarchy corresponds to the fractal structure of the structurally associative memory: its tree structure is in fact an emergent pattern in the whole structurally associative memory.
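The inductive definition of clusters, and the subset property of subsidiaries, can be sketched in a few lines. This is a minimal illustration under stated assumptions: memory elements are represented as strings, a cluster as a nested list, and the names `level` and `members` are hypothetical.

```python
from typing import List, Set, Union

# A level-0 cluster is a single memory element (a string);
# a level-k cluster is a list of level-(k-1) clusters.
Cluster = Union[str, List["Cluster"]]


def level(cluster: Cluster) -> int:
    """Level of a cluster, following the inductive definition."""
    if isinstance(cluster, str):
        return 0
    child_levels = {level(child) for child in cluster}
    if len(child_levels) != 1:
        raise ValueError("a level-k cluster must contain only level-(k-1) clusters")
    return child_levels.pop() + 1


def members(cluster: Cluster) -> Set[str]:
    """The region of memory (set of elements) that a cluster covers."""
    if isinstance(cluster, str):
        return {cluster}
    return set().union(*(members(child) for child in cluster))


# Hypothetical toy memory: two level-1 clusters inside one level-2 cluster.
shapes = [["circle", "ellipse"], ["square", "rectangle"]]

print(level(shapes))
for sub in shapes:
    # Each subsidiary covers a subset of the region covered by its parent.
    print(sorted(members(sub)), members(sub) <= members(shapes))
```

Reading each nested list as a process and its sub-lists as that process's subsidiaries gives the tree of the perceptual-motor hierarchy directly from the clustering of the memory.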
Let us take yet another intuitive "toy" example. A level 2 cluster might consist of processes that recognize shapes of various sorts in visual inputs, together with a network regulating these processes. This cluster of shape recognition processes would be organized according to the principle of structurally associative memory, so that e.g. the circle process and the ellipse process would be closer to each other than to the square process. This organization would permit the regulating process to execute systematic analogical search for a given shape: if in a given situation the circle process were seen to be fairly successful, but the square process not at all successful, then the next step would be to try out those processes near to the circle process.
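The search step performed by the regulating process can be sketched as follows. The distances and success scores are invented for illustration, and the names `DISTANCE`, `distance` and `next_to_try` are hypothetical; the only assumption taken from the text is that the next process tried should lie near the most successful process so far.

```python
# Made-up distances between shape-recognition processes in the memory.
DISTANCE = {
    frozenset({"circle", "ellipse"}): 1,
    frozenset({"square", "rectangle"}): 1,
    frozenset({"circle", "square"}): 4,
    frozenset({"ellipse", "rectangle"}): 4,
    frozenset({"circle", "rectangle"}): 5,
    frozenset({"ellipse", "square"}): 5,
}


def distance(a, b):
    """Distance between two processes in the structurally associative memory."""
    return 0 if a == b else DISTANCE[frozenset({a, b})]


def next_to_try(scores, candidates):
    """Pick the untried process nearest to the most successful one so far.

    `scores` maps each process already tried to its observed success.
    Returns None when every candidate has been tried.
    """
    best = max(scores, key=scores.get)
    untried = [c for c in candidates if c not in scores]
    if not untried:
        return None
    return min(untried, key=lambda c: distance(best, c))


shapes = ["circle", "ellipse", "square", "rectangle"]
# The circle process did fairly well, the square process not at all.
print(next_to_try({"circle": 0.7, "square": 0.0}, shapes))
```

With these numbers the regulator turns next to the ellipse process, the nearest neighbour of the circle process, rather than to the rectangle process, which sits close to the unsuccessful square.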
I chose a low-level example for ease of discussion. Instead of a hypothetical level 2 cluster for recognizing shapes, we might as well have taken a hypothetical level 8 cluster of "techniques for simplifying algebraic equations," or a hypothetical level 4 cluster of "processes for lifting items with one's hand." Any such example would illustrate the utility of a combined fractal-structurally-associative-memory/multilevel-perception-and-control-hierarchy, that is, the utility of a structure which evolves both by natural selection and by the multilevel methodology.
This, finally, completes our model of the brain. We have a collection of interacting, interpenetrating neural maps, a la Edelman. These maps are connected according to a hierarchical multilevel control structure, as hypothesized back in Chapter 4, but these connections are not the only ones there. There are also many other connections, and these represent the heterarchical control structure of the structurally associative memory. The process dependency graph of the brain is a dual network. The main goal of SI was to argue that the dual network structure is adequate for all forms of mental process: deduction, induction, analogy, perception, motor control, emotion, aesthetic sense, consciousness, and so on. Here, however, we are concerned more specifically with the role of evolutionary processes in the dual network. The heterarchical connections ensure evolution by natural selection in the sense of Chapter 3. But the hierarchical connections promote form creation by sexual reproduction in the sense of Chapter 5. We have a model of the mind/brain that incorporates both sexual and asexual reproduction, and differential survival based on fitness. However, these elements are interconnected in a unique way. Psychoneural dynamics are not a copy of ecosystem dynamics, but they do involve many of the same processes, perhaps because these processes are universal to all self-structuring systems.
6.6 DIFFERENT APPROACHES TO THE EVOLVING MIND/BRAIN
At the end of Chapter 3, we paused to compare the pattern-theoretic view of evolution with other theories of self-organizing evolution, namely those of Odum and Prigogine. In the same spirit, now that we have given a pattern-theoretic analysis of the evolving mind, let us see how it relates to other evolutionary psychologies.
As I have emphasized above, I think a great deal of Edelman's evolutionary neurobiology. But we have already discussed Edelman's speculations. Now, instead, let us turn to two of the grand old men of twentieth-century evolutionary philosophy, Karl Popper and Gregory Bateson.
We have said a great deal about the relation between evolution and thought. However, we have not yet mentioned what is perhaps the best-known attempt to relate these two processes: Karl Popper's "evolutionary epistemology." A brief discussion of evolutionary epistemology may shed some light on the distinctive features of the present theory.
The basic flavor of the theory of evolutionary epistemology may be gleaned from the following passage in Conjectures and Refutations:
The method of trial and error is applied not only by Einstein but also, in a more dogmatic fashion, by the amoeba also. The difference lies not so much in the trials as in a critical and constructive attitude towards errors; errors which the scientist consciously and cautiously tries to uncover in order to refute his theories with searching arguments, including appeals to the most severe experimental tests which his theories and ingenuity permit him to design. (p.52)
Popper himself has not done much detailed work in this direction. However, Campbell (1974) has explored Popper's point of view quite thoroughly. As he put it,
Evolutionary epistemologists contend, simply, that (exosomatic) scientific knowledge, as encoded in theories, grows and develops according to the same method as (and is, indeed, adaptationally continuous with) the embedded (endosomatic) incarnate knowledge shown ... in other organisms, including man. In the second case there is an increasing fit or adaptation between the organism and its environment.... In the first case there is an increasing fit or adaptation between theory and fact.
... The highest creative thought, like animal adaptation, is the product of blind variation and selective retention ... or, to use Popper's phrase ... conjecture and refutation.
At first, this might seem to be the strictest sort of strict Darwinism. After all, the difference between the evolution of "ideas" in an amoeba and the evolution of ideas in a human being lies not in some vague ethic or "attitude" but in the fact that the human nervous system has a certain structure. This structure permits the results of various different trial-and-error experiments to interact in a complex and productive way. By continually emphasizing "blind variation and selective retention," the evolutionary epistemologists appear to be neglecting the role of structure in evolution.
However, evolutionary epistemology does not quite ignore structure entirely. In fact, Campbell himself has devised a sort of perceptual-motor hierarchy, consisting of ten levels. From the bottom up: nonmnemonic problem solving, vicarious locomotor devices, habit, instinct, visually supported thought, mnemonically supported thought, observational learning and imitation, language, cultural cumulation, and science (Campbell, 1974). This hierarchy is endowed with a sort of pattern-oriented multilevel logic. As Plotkin (1982) put it,
If all knowledge processes are part of a nested hierarchical system, then information that has been laboriously gained by a blind variation ... at one ... level of the hierarchy may be fed upwards to some other ... level where it can immediately operate as preset or predetermined criteria....
[T]hese [are] short-cut processes.
A pattern, as defined in Chapter 1, is precisely a "short-cut process." What Plotkin, following Campbell, is saying here is that higher levels of the hierarchy work partly by recognizing patterns in lower levels of the hierarchy. This is a crucial insight. But the evolutionary epistemologists do not pursue it very far, perhaps because of their emphasis on "blind variation." They give only the vaguest idea of how this hierarchy might operate.
Ecology of Mind
We began Chapter 1 with Gregory Bateson's concept of the Metapattern, with the idea that the stuff of the biological world is pattern. But we have not yet said much of anything about the use to which Bateson put this ontological axiom. Let us now very briefly return to Bateson's evolutionary theory of mind.
Bateson argues that
both genetic change and the process called learning ... are stochastic processes. In each case there is, I believe, a stream of events that is random in certain aspects and in each case there is a nonrandom selective process which causes certain of the random components to "survive" longer than the others.
And he follows up this general suggestion with a particular set of criteria, a set of properties which, he proposes, are shared by minds and ecosystems alike:
1. A mind is an aggregate of interacting parts or components.
2. The interaction between parts of mind is triggered by difference, and difference is a nonsubstantial phenomenon not located in space and time; difference is related to negentropy and entropy rather than to energy.
3. Mental process requires collateral energy.
4. Mental process requires circular (or more complex) chains of determination.
5. In mental process, the effects of difference are to be regarded as transforms (i.e. coded versions) of events which preceded them. The rules of such transformations must be comparatively stable (i.e. more stable than the content) but are themselves subject to transformation.
6. The description and classification of these processes of transformation disclose a hierarchy of logical types immanent in the phenomena.
I shall argue that the phenomena which we call thought, evolution, ecology, life, learning and the like occur only in systems that satisfy these criteria.
These axioms are rather abstract, and the second half of Mind and Nature is devoted to elucidating them in concrete contexts. But the main point that I want to make here is that nothing in this list of criteria is contradicted by the ideas of the present book. In fact, Criterion 6, the "hierarchy of logical types," has been discussed in detail using the terminology of multiboolean forms. Unlike the evolutionary epistemologists, Bateson is making a serious attempt to understand the structure of complex evolving systems. However, his conclusions thrust in a rather different direction than those of Section 6.6.
The key difference between Bateson's approach and my own is as follows. My definition of mind, as well as my definition of natural selection, explicitly involved pattern. On the other hand, Bateson's criteria are rather physicalistic in emphasis, using thermodynamic concepts such as energy and entropy. The theory of the present chapter agrees with Bateson's six criteria, but it is much more specific, in no small part because it is based on explicit manipulations of the concept of pattern.
In other words, Bateson's criteria are certainly necessary conditions for a system to be a mind. But, as I see it, they are not sufficient. The necessary and sufficient condition, I believe, is the presence of an efficiently functioning dual network, incorporating evolution by natural selection and creativity by crossover. This condition cannot be stated without the formalism of pattern, as well as the logic of multilevel control, neither of which was part of Gregory Bateson's working conceptual vocabulary.
Psychology and Darwinism
So, in conclusion, what can we say about these different approaches to the relation between thought and evolution?
Both Bateson and the evolutionary epistemologists are asking the same question: what is the common structure underlying both brains and ecosystems, which allows them to harness trial-and-error experimentation in a progressive, form-creating way? Bateson gives an answer in the form of a list of axioms. On the other hand, the evolutionary epistemologists mention hierarchy and short-cut processes, but they do not pursue these topics very thoroughly. Instead, in true strict Darwinist form, they repeatedly refer to the power of blind variation and selection.
In this chapter we have asked the very same question and given a different answer, one complementary to Bateson's axiomatic scheme, to the strict Darwinism of the evolutionary epistemologists, and also to Edelman's neurophysiological ideas. Our modus operandi has been to identify certain important processes as common to both brains and ecosystems: a hierarchical structure, the process of form creation by sexual reproduction, and, perhaps most crucially, evolution by natural selection where fitness is defined in terms of emergent pattern.