The Evolving Mind -- Copyright Gordon and Breach 1993




The very first paragraph of this book, if you recall, was concerned with the ambiguity inherent in the phrase "the evolving mind." So far we have been considering the less obvious interpretation of the phrase, and studying the evolution of individual minds over short time periods. The previous chapter ended with a comprehensive model of mental/neural process as evolutionary process.

    In the present chapter, however, we shall briefly turn our attention to the more obvious reading of the phrase "the evolving mind." We shall consider the evolution of mind from generation to generation. This may appear to be a digression from our main line of argument, as summarized at the end of the previous chapter. However, Brown's (1988) concept that "microgeny recapitulates phylogeny" posits a logical connection between the evolutionary process involved in an individual's thoughts, and the evolution of mind from generation to generation. As we shall see in the final section, this connection has some very interesting consequences, which pertain quite directly to the conclusions of Chapter 6.

    The chapter is perhaps a little bit more sketchy than the ones which precede it. However, I would rather end with some suggestive questions than with a grand theoretical construction. Merely by framing the problem of the history of mind in terms of the model of Chapter 6, one arrives at significant new insights, and senses promising new directions for future investigation.

    The theory of the dual network allows us to break the question of the natural history of mind into two parts: how did the hierarchical network evolve, and how did the heterarchical network evolve? We shall begin by considering the evolution of the hierarchical network, from a loosely "recapitulationist" perspective. Then we shall argue that, while the evolution of the hierarchical network was probably gradual, the evolution of the heterarchical network probably involved a sudden phase transition, reminiscent of the phase transitions of physical chemistry (Prigogine and Stengers, 1984). Finally, this point will be combined with Brown's "microgenetic" theory to yield a novel theory of the phenomenon of sudden insight. A sudden insight, it will be suggested, is actually a recapitulation of the historical evolution of intelligence.


The main goal of this section is to discuss what I call Brown's microgenetic law, which states that microgeny (the development of individual thoughts) recapitulates phylogeny (the historical development of thought). But let us begin, not with microgeny, but with the relation between ontogeny and phylogeny.

    We all learned about it in school: the human fetus, at an early stage, looks sort of like a fish. And at a slightly later stage, it looks rather like an amphibian. We know that fish evolved into amphibians, which evolved - after a long process - into mammals and humans. In biological lingo, this is an example of a parallel between "ontogeny" and "phylogeny" - between the development of an embryo into an independent organism, and the evolution of species.

    There is no denying that ontogeny and phylogeny are parallel in a very crude and general sense - both begin with single-celled entities, proceed to multicellular entities, proceed through "fishy" and "amphibiany" stages, and so on. This observation dates back at least to Aristotle. But the interesting question is how far this parallel extends: is it merely a coarse analogy, like Rutherford's model of the atom as a miniature solar system, or is it a useful tool for investigating details?

    Today this is not one of the central questions of biology. In the early nineteenth century, however, the work of the "Naturphilosophen" gave the problem of ontogeny versus phylogeny a great deal of prominence. These thinkers, including such then-famous figures as Lorenz Oken and J.F. Meckel, viewed development as a process of successive addition: beginning with nothing, one entity is created, then another is added on, then another is added on, and so on. In this view, the parallel between ontogeny and phylogeny may be summarized as follows: 1) a more complex animal is, to within a high degree of approximation, a simpler animal with some extra organs added on; 2) this is how the embryo of a complex animal develops: first it grows into a simpler animal, then it grows the extra organs. The Naturphilosophen posited various universal laws of development, which were supposed to determine what sorts of structures were added on at what stages.

    This work paved the way for Ernst Haeckel, who separated the biological hypothesis of recapitulation from the more extravagant speculations of the Naturphilosophen, and marshalled a great amount of biological data in its favor. Haeckel put the doctrine of recapitulation in aphoristic form with his famous Biogenetic Law: "ontogeny recapitulates phylogeny." Throughout the second half of the nineteenth century, most biologists accepted Haeckel's views as fact. Note Balfour's confident tone in the following passage, written in 1880:

    [Why do animals] undergo in the course of their growth a series of complicated changes, during which they acquire organs which have no function, and which, after remaining visible for a short time, disappear without leaving a trace ... The explanation is obvious. The stage when the tadpole breathes by gills is a repetition of the stage when the ancestors of the frog had not advanced in the scale of development beyond a fish. (p. 702)

    Haeckel and his supporters openly admitted that there were exceptions to their "law," but these were plausibly explained away:

    The rapid and brief ontogeny is a condensed synopsis of the long and slow history of the stem (phylogeny): this synopsis is the more faithful and complete as palingenesis has been preserved by heredity and cenogenesis has not been introduced by adaptation.

    "Palingenesis" was defined as "the science of repeating forms," and "cenogenesis" was defined as "the science of supervening structures." Exceptions to the Biogenetic Law were called "cenogenesis" - and it was argued that "palingenesis" is the prevailing phenomenon. One of Haeckel's favorite examples of cenogenesis was the adaptation of free-swimming larvae to their own environment. But he also discussed a variety of more substantial examples, such as heterochrony, or displacement in time - the process by which vertebrate embryos develop eyes and brains earlier than the phylogenetic comparison would predict.

    The most penetrating early critic of recapitulation was Karl Ernst von Baer (writing slightly before Haeckel). As translated by Gould (1977), the key tenets of his theory of development were as follows:

    1. The general features of a large group of animals appear earlier in the embryo than the special features

    2. Less general characters are developed from the most general, and so forth, until finally the most specialized appear

    3. Each embryo of a given species, instead of passing through the stages of other animals, departs more and more from them

    4. Fundamentally, therefore, the embryo of a higher animal is never like the adult of a lower animal, but only like its embryo

    This critique of recapitulationism is a powerful one - even more powerful today than in Von Baer's time, since we now have genetics and the theory of natural selection. If the human embryo looks like a fish and then an amphibian, this is not because the embryo is repeating the evolutionary sequence adult fish -> adult amphibian -> adult human; it is rather because the evolutionary changes by which humans evolved from fish and then amphibians caused more changes in the later stages of embryonic development than in the earlier ones. What could be more sensible?

    Using the concepts of genetics and evolution, it is easy to make an intuitive argument as to why this would be the case. Let's consider an unbiological but suggestive "toy model." As a rough approximation, let us break the process of development down into three distinct stages - we may as well call them first trimester, second trimester and third trimester. And - here's the "toy" part - let us consider an organism whose DNA is broken up into three segments. The first segment controls those processes that take effect during the first trimester, the second segment controls those processes that take effect during the second trimester, and so on. Then we may write the development of the organism in three stages:

            X = product of segment 1

            Y = product of segment 2 acting on X

            Z = product of segment 3 acting on Y

This is the "ontogenetic sequence" that produces the organism Z.

    Now let us think about the effects of mutations. A change in segment 3 of the DNA will not affect Y; it will only affect the processes that act on Y to form Z. But a change in segment 2 of the DNA will affect Y, and also Z. And a change in segment 1 of the DNA will affect X, Y and Z - all three of them. So, if we assume that each of the three segments is equally likely to be mutated, it follows that every mutation will affect Z, but only two thirds of them will affect Y, and only one third of them will affect X. After a large number of mutations, X will change less than Y, which will change less than Z.
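    The toy model's mutation-counting argument is easy to check numerically. The following Python sketch is an illustration only - the three-segment "genome" and the tuple-based stand-ins for the developmental products X, Y and Z are invented for the purpose:

```python
import random

# Toy model of epigenesis: three DNA "segments," each controlling one
# developmental stage.  A stage depends on its own segment plus all
# earlier stages, so mutations in early segments propagate further.
def develop(genome):
    x = ("X", genome[0])        # first trimester: product of segment 1
    y = ("Y", genome[1], x)     # second trimester: segment 2 acting on X
    z = ("Z", genome[2], y)     # third trimester: segment 3 acting on Y
    return x, y, z

random.seed(0)
base = (1, 2, 3)
x0, y0, z0 = develop(base)

changed = {"X": 0, "Y": 0, "Z": 0}
trials = 3000
for _ in range(trials):
    seg = random.randrange(3)            # each segment equally likely to mutate
    mutant = list(base)
    mutant[seg] += random.randrange(1, 100)
    x, y, z = develop(tuple(mutant))
    changed["X"] += (x != x0)
    changed["Y"] += (y != y0)
    changed["Z"] += (z != z0)

print({k: round(v / trials, 2) for k, v in changed.items()})
```

As the argument predicts, every mutation alters Z, roughly two thirds alter Y, and roughly one third alter X.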

    It is easy to see how this argument could be generalized beyond this toy model of epigenesis. For instance, it could be formulated in the general context of Section 5.6 - in terms of multiboolean forms and GMF dynamical systems. But I will leave this as a straightforward exercise for the ambitious reader - we have seen enough formalism already.

    In its details this toy model argument has little in common with Von Baer's hundred-and-fifty-year-old arguments - but the concept is similar. The basic idea is as follows: ordinary evolution, combined with the normal embryological process, can be expected to leave the early embryonic human looking sort of like the early embryonic fish, but the later fetal human looking very little like the later fetal fish. This, Von Baer claimed, is the only explanation needed for those parallels that we observe between ontogeny and phylogeny. No additional "law" is needed. Embryonic humans resemble embryonic fish, not adult fish.

    Very convincing indeed. It is always satisfying to see a mysterious "coincidence" explained away as the consequence of well-understood, fundamental processes.

    However, as Gould (1977) has argued so forcefully, this is not the end of the story. Haeckel actually was onto something, although he drastically overstated his case. The phenomenon which Von Baer pointed out is a real one, and it explains a large percentage of the parallels between ontogeny and phylogeny. However, there are also cases in which embryonic organisms resemble adult, rather than embryonic, forms of some of the organisms from which they evolved.

    One of the favorite examples of Haeckel and his fellow recapitulationists was the development of more complex sutures on ammonites. Ammonites evolved more complex suture patterns over time, and this did indeed involve recapitulation - the embryonic ammonite really does pass through the suture patterns of its adult ancestors, not only the suture patterns of the embryos of its ancestors. This is because the processes that form suture patterns make more complex patterns when they run longer. So apparently, as ammonites evolved over millions of years, some rate genes were mutated - and the process of suture pattern formation was consequently sped up. As discussed in Chapter 3, this kind of mutation was championed by Goldschmidt (1940), who argued that most major evolutionary changes were caused by changes in the genes encoding the timing of some process.

    But, if true Haeckelian recapitulation is possible, so is its opposite, paedomorphosis. In his Ontogeny and Phylogeny, Gould (1977) gives dozens of examples in which adult organisms resemble juvenile versions of their ancestors. This phenomenon is particularly striking in the case of shellfish - the shell of an adult from one species actually looks like the shell from a juvenile of its ancestor species.

    As Gould puts it,

        All parallels between ontogeny and phylogeny fall into these two categories: If a feature appearing at a standard point of ancestral ontogeny arises earlier and earlier in descendants, we encounter a direct parallel producing recapitulation.... If a feature appearing at a standardized point of ancestral ontogeny arises later and later in descendants, we encounter an inverse parallel producing paedomorphosis....

        ... Evolutionary changes must appear in ontogeny, and they can arise only by the introduction of new features or by the displacement of features already present. The second process produces parallels between ontogeny and phylogeny; the first does not. Together, they describe the course of morphological evolution. The continued relevance to modern biology of the great historical theme of parallels between ontogeny and phylogeny rests entirely upon the relative frequency of evolution by displacement rather than by introduction.

        Moreover, since displacement involves no more than a change of timing for developmental stages already present in ancestors, its genetic basis probably resides in the regulatory system.... (1977, p. 214)

Gould also points out that "the collapse of Haeckel's law prompted a confusing variety of complex classifications for relations between ontogeny and phylogeny" (1977, p. 8). In place of these classificatory schemes, which focus on results rather than mechanisms, he proposes a two-fold categorization of processes: acceleration, when an ontogenetic process occurs earlier in an organism than in its ancestors, and retardation, when an ontogenetic process occurs later in an organism than in its ancestors. Acceleration leads most directly to recapitulation, but it can also lead to paedomorphosis - for instance, in some insects sexual maturation is accelerated, thus causing the insect to reach adulthood while still in an ancestrally juvenile form (a process called progenesis). And retardation leads most directly to paedomorphosis - this is the case with humans, for instance. We retain juvenile features of our ancestors into adulthood - this process is called neoteny, and it is responsible for our large brains and small jaws, the sparsity of our bodily hair, and the ventral pointing of the vaginal canal, among many other features. But retardation can also result in recapitulation - by hypermorphosis, or the delay of maturation.

    In general, an organism in which some ontogenetic processes are accelerated or retarded is said to experience heterochrony. The closest thing to a modern version of Haeckel's doctrine is the hypothesis that heterochrony is both common and important.


Now, what does all this have to do with the topic of the present chapter, which is the evolution of the mind? Jason Brown, in a series of groundbreaking papers collected in The Life of the Mind, has proposed a psychological theory loosely modeled after Haeckel's Biogenetic Law.

    In a very general way, Brown's theory of mind fits in well with the dual network model. As shown in Figure 31, he proposes a perceptual-motor hierarchy. And the control processes which he discusses, while they do not explicitly imply the multilevel methodology, do not contradict it either.

    In one important way, however, Brown goes far beyond the dual network model, and all other abstract, hierarchical models of mind. He proposes that lower levels of the perceptual-motor hierarchy are in some sense evolutionarily prior. He calls the process of percept- or object-formation microgenesis, and states that

        Microgenesis is a type of instantaneous evolution. Like evolution and maturation, it is assumed to proceed in a forward direction. Systems are entrained in cognitive activity in a sequence that mirrors their appearance in evolution. The idea of a unidirectional flow from archaic structures to those of evolutionary recency is a key element in microgenetic theory....

        Another basic element in microgenetic theory is the idea that subsurface events that immediately precede surface representations are part of the structure of the representation, and not purely anticipatory or preparatory stages, a concept identical to that in evolutionary theory....

For obvious reasons, I will call this hypothesis the microgenetic law.

    How does Brown's idea tie in with the preceding discussion of ontogeny and phylogeny? Well, suppose one posits that the human fetus's perceptual-motor hierarchy develops gradually from the bottom up - lowest level first, then next-lowest level, and so on. Then one may reason as in the toy model given above, and it follows that the lower levels of the perceptual hierarchy will have more in common with archaic mental forms than the upper levels. This is Von Baer-type "recapitulation," which I will call pseudorecapitulation from here on. (Departing from Gould's usage, however, I will still consider pseudorecapitulation to fall under the general category of recapitulation. When I want to emphasize that something is not merely pseudorecapitulation, I will call it true recapitulation.)

    But Brown, following Haeckel, proposes something stronger than mere pseudorecapitulation. "Systems are entrained in cognitive activity in a sequence that mirrors their appearance in evolution" - this is a very muscular statement. It seems to me that Brown's doctrine is actually rather similar to the early recapitulationism of the Naturphilosophen, who believed that ontogeny was a process of progressively adding more modern organs onto older structures. Brown's views suggest that the evolution of mind was, in part, a process of adding more sophisticated layers on top of older layers.

    The Naturphilosophen were flat out wrong, of course. But it is clear that there is at least an element of truth to the analysis of mental evolution as progressive addition. After all, a monkey is usually able to recognize the kinds of patterns that a dog recognizes - plus others, more sophisticated. And a human is generally able to recognize the kinds of patterns that a monkey recognizes - plus other, more sophisticated ones.

    Of course, neither Brown nor I would claim that this is all there is to it - the lower levels of our perceptual-motor hierarchy are not identical to levels found in the perceptual-motor hierarchies of our ancestors. But it does seem plausible that there is some sort of progressive addition going on. After all, many of our needs are the same as those of our ancestors: those qualities of mind that are uniquely human are supplements to, rather than replacements of, ancestral mental qualities. And, on the biological level, the crude global maps of brain function that we now possess show very clearly that we have a "reptilian" brain stem, an early mammalian brain, and then a vastly overdeveloped "modern" neocortex.

    Let's get more concrete. Brown's medical work involves patients with various deficits - for instance, speech disorders. Let us very briefly consider the fascinating speech disorder called jargonaphasia, and see what it has to teach us about mental evolution.

    A person with semantic jargonaphasia produces utterances like

    And I say, this is wrong. I'm going out and doing things and getting ukeleles taken every time and I think I'm doing wrong because I'm supposed to take everything from the top so that we do four flashes of four volumes before we get down low ... Face of everything. This guy has got to his thing made out in order to slash immediately to put all of the wind-pails... (1988, p.73)

Neologistic jargonaphasia, on the other hand, leads to comments like "Your patebelin like the mother ... and his mothers of to go in his stanchen," or "Those are waggots, they have to be fribbed in." Finally, there is phonemic jargonaphasia, in which the patient utters sounds in unrecognizable combinations, not even forming words.

    In terms of the perceptual-motor hierarchy, one would say that semantic jargonaphasia is a dysfunction on a higher level than is neologistic jargonaphasia, which is in turn a dysfunction on a higher level than is phonemic jargonaphasia. The first involves sentence meanings, the second involves individual words, and the last involves individual phonemes - and, after all, sentences are made of words, which are made of phonemes. The perceptual-motor hierarchy assembles words from phonemes and sentences from words according to the multilevel methodology.

    There is little doubt that, phylogenetically, the ability to produce phonemes developed first, followed by the ability to select appropriate words, followed by the ability to arrange words in meaningful sentences. To take a rough example, every dog owner knows that animals make sounds which have specific, often context-dependent, meanings - these sounds might be called "words," or "pre-words." Chimps, on the other hand, seem to have the ability to group words together (this has been shown by teaching chimps sign language). This behavior is more sentence-like than what dogs do - but still, chimps do not seem to have the ability to consistently form complete sentences. They are closer to humans, phylogenetically, than dogs are - and they have also developed more levels of the linguistic portion of the human perceptual-motor hierarchy.

    So, in this case, the correspondence between microgeny and phylogeny appears to be real. Phonemic, neologistic and semantic jargonaphasia may indeed be interpreted as "regressions" to ancestral stages of phylogeny - phonemic to an older stage than neologistic, neologistic to an older stage than semantic. Microgeny seems to recapitulate phylogeny, in this instance. It is not clear whether this is merely pseudorecapitulation, or whether it is actually the result of mental evolution by addition of extra levels. My bet, however, is with the latter hypothesis.

    Brown takes pains to distinguish his microgenetic theory from Hughlings Jackson's older theory of "regression." And there is indeed a major difference - Jackson never fully acknowledged that, just as humans have tailbones but do not retain their fetal tails, archaic mental forms are often integrated into more evolutionarily recent mental forms. But Brown's theory of linguistic, visual and motor dysfunction is a theory of regression nonetheless. The important point is that one never has true regression, because microgeny does not exactly recapitulate phylogeny - like ontogeny, it only incorporates certain aspects of phylogeny, mixed up with many other features.

Heterochrony in Microgeny: A Speculative Hypothesis

Modern biology rejects the old recapitulationist concept of ontogeny as progressive addition. But Gould has suggested that there is an element of truth in the writings of the old recapitulationists - namely, heterochrony, evolution by alteration of rate.

    We have seen that the evolution of mind as progressive addition is actually a plausible hypothesis. However, we have not yet discussed the possibility of heterochrony in mental evolution. In this section I will make a specific hypothesis as to one type of heterochrony that I think might plausibly exist. This hypothesis is somewhat speculative; the main point here, however, is not to "sell" a particular mechanism, but rather to indicate what kind of heterochronic phenomena one should look for.

    The main assumption underlying the hypothesis of this section is that, in every mind, there are certain responsibilities attendant on the highest level of abstraction at which rapid mental function can occur. Namely, those processes at the highest level of abstraction must be adept at making decisions under extremely uncertain conditions, because they - unlike their lower-level counterparts - do not have the option of passing the buck to a more abstract level.

    Let's begin with the basics. When an animal makes a decision as to what to do in a given situation, it responds based on induction with respect to previous situations (thus based on appropriately accessing its memory). But, as discussed in SI, induction is a tricky business. There are always many different analogies to past situations to be made; one wants to pick the strongest one. Lower levels of the perceptual-motor hierarchy are capable of picking out simpler, more local, less wide-ranging analogies. Higher levels are capable of picking out more global and intricate analogies. This can even be seen in the linguistic example considered above: when one hears a spoken passage, one can respond on the low level of sound (e.g. rhyme, alliteration, onomatopoeia), on the intermediate level of individual morphemes, or on the high level of overall sentence meanings. When one judges the similarity of a situation involving speech to other situations involving speech, one generally seeks similarities on all these levels.

    More precisely, before making a decision about a certain situation, a mind must in general adhere to the following guideline: either 1) check the higher levels of its perceptual hierarchy to see if they have detected anything; or 2) have a low-level process recognize that the situation falls into a category previously judged, by the high levels of the perceptual-motor hierarchy, not to be worthy of their attention.

    It is clear that in general it must often be worth a mind's while to check the higher levels of its perceptual hierarchy. The only exception to this rule would be an organism which evolved to survive in a difficult environment, and was then suddenly transported into an easy environment (this situation occurs in Jack Williamson's novel The Humanoids: infinitely benevolent but horribly overprotective robots arrange human lives so that higher-order thinking is no longer necessary). After all, one may argue that the higher levels would never have evolved in the first place unless they were of some use in making decisions. The higher levels of the perceptual-motor hierarchy certainly have no function other than to aid in making decisions, and it is hard to see how they would have arisen as a corollary to anything else.

    So, a mind often needs to check the higher levels of its perceptual-motor hierarchy before it makes a decision. But it can't keep pushing things upwards forever - in each particular case, the buck has to stop somewhere. I suggest that the levels near the top of the hierarchy will have certain characteristic differences from those levels that are generally allowed to pass difficult problems up to their superiors.

    A good analogy is the way the job of a company president differs from the job of a vice-president, an upper-level manager, a middle-level manager, and so on. All the lower levels get to pass their most difficult problems on up, but the top level does not get this privilege. Instead of appealing to higher-ups, the company president sometimes just makes a decision - based on a hunch, a prayer, a coin toss, whatever. This is the sort of thing that a lower-level manager will do much less frequently. Similarly, when the chain of command stops at a certain level of the perceptual-motor hierarchy (because higher levels are nonexistent or, more likely, inappropriately inefficient), that level just makes a decision. It must make decisions under greater uncertainty than the levels beneath it, since a lower-level process, in case of uncertainty, can pass the burden to its immediate superior.

    So, finally, here is an argument in favor of one example of microgenetic heterochrony:

    1) There are certain processes appropriate for making decisions under great uncertainty

    2) These processes are in the greatest demand on the highest levels of the perceptual-motor hierarchy that are routinely used

    3) The "highest levels" of the mind of an organism's ancestor may well correspond to lower levels of the organism's mind

    4) Therefore - choosing arbitrary numbers for concreteness - suppose that level 16 of a human brain corresponds in some ways to level 16 of a primate brain, but level 16 of a primate brain is roughly "highest" whereas level 20 of a human brain is roughly "highest." Then in the evolutionary process one aspect of level 16 of the primate brain is being pushed up to level 20 of the human brain. This aspect is, namely, the collection of processes that deal with decision under great uncertainty.

    Given the premises 1-3, the conclusion (4) follows easily. One might argue that nothing is being pushed up, that the evolution of humans from primates involved the re-evolution of processes for dealing with uncertainty. But how much simpler just to delay the timing of the installation of these processes, so that they are put in at the time that level 20 rather than level 16 is being formed!

    This hypothesis is perhaps excessively speculative. However, I hope that it can serve as a sort of exemplar of how heterochrony might play a role in mental evolution. The key point is that there is no paradox in a more intelligent organism pushing some aspect of its ancestors' perceptual-motor hierarchies up to a higher level of its own perceptual-motor hierarchy. The use of a process can only be understood relative to its context - if some process fits into a certain context that happens to reside on a higher level in the descendant than in the ancestor, there is no reason for acceleration not to take advantage of this. "Decision under uncertainty" is one possible example of such a level-specific process, but it is certainly not the only possible example.


We have dealt with the evolution of the hierarchical network. Now let us turn to the other side of the dual network: the heterarchical network, the structurally associative memory. In this section I will outline an intuitive argument that slight changes in the ability of an organism to recognize patterns can sometimes lead to huge changes in the usefulness of that organism's memory. The idea is that, until the pattern recognition ability of an organism passes a certain critical range, its associative memory network is almost certain to be too simplistic to support much intelligence. But once the pattern recognition ability becomes sufficiently high, its memory network all of a sudden becomes reasonably complex.

    This is very different from the situation with the hierarchical network, considered in the previous section. The hierarchical network was hypothesized to have evolved gradually, piece by piece, according to an evolutionary process involving pseudorecapitulation, acceleration and retardation. And there is no reason to believe that the effectiveness of the hierarchical network did not change just as gradually. But there is reason to believe, or so I will argue, that the effectiveness of the heterarchical network may have undergone a sudden jump at some stage in evolution.

    Incidentally, it should be noted that this thesis has important implications for artificial intelligence. It supports the view that the difficulty of writing intelligent programs for contemporary computers has more to do with lack of computing power than with the nature of intelligence. It suggests that once hardware advances permit us to implement sufficiently perceptive pattern recognition routines, we will be able to write programs that will "automatically" evolve useful associative memories.

The Structure of Random Graphs

In this subsection I will introduce a few well-known results from the theory of random graphs. Let me begin by saying exactly what a "random graph" is. Write C(n,k) = n!/[k!(n-k)!], and consider a graph with n labeled vertices. Such a graph has C(n,2) slots available for edges. If we arrange these slots in some linear order, we may, in that order, independently, one slot at a time, introduce into each slot an edge with probability p. When this process has been completed, a "random graph" will have been generated. Other probabilistic models are also studied (Palmer, 1985), but the results obtained are mainly qualitatively similar (if not quantitatively identical) to those obtained from this very simple model.

    Clearly, increasing p increases the expected number of edges. More precisely, the following is not difficult to show: if G has n vertices and q edges, the probability of G is given by P(G) = p^q (1-p)^(C(n,2)-q). This simple formula has some surprising consequences. For instance, Gilbert (1959) showed that for any nonzero p, the probability of connectivity approaches one as n becomes infinite.
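As a concrete illustration, the slot-by-slot construction and the probability formula above can be sketched in a few lines of Python (the function names and parameter values are mine, chosen for this illustration):

```python
import math
import random

def random_graph(n, p, rng):
    """Generate a G(n, p) random graph on n labeled vertices: each of
    the C(n,2) available slots receives an edge independently with
    probability p."""
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                edges.add((i, j))
    return edges

def graph_probability(n, q, p):
    """P(G) = p^q (1-p)^(C(n,2)-q): the probability that the process
    above produces one particular labeled graph with q edges."""
    return p**q * (1 - p)**(math.comb(n, 2) - q)

g = random_graph(100, 0.1, random.Random(42))
# The expected number of edges is p * C(n,2) = 0.1 * 4950 = 495.
```

Summing graph_probability over all labeled graphs on n vertices gives 1, as it must for a probability distribution.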

    Erdos and Renyi (1959) obtained the following crucial results:

Theorem 1 (Erdos and Renyi): Assume p=c/n. Then the following statements are true of almost all graphs:

    1) If 0<c<1: all components are trees or unicyclic, and the largest component is a tree of order about log n.

    2) If c=1: there are certain to be cycles, and the largest component has order about n^(2/3).

    3) If c>1: there is a unique "giant" component of order about G(c)n, where G(c) = 1 - [sum over k>=1 of k^(k-1) (c e^(-c))^k / k!]/c.

Theorem 2 (Erdos and Renyi): Assume p=c log n/n. Then the following statements are true of almost all graphs:

    1) If 0<c<1: the graph is disconnected.

    2) If c>1: the graph is connected and Hamiltonian.

    Theorem 1 tells us that, when p surpasses a certain point, the typical graph jumps from being "sparse" with only trees and unicyclic components, to having a "giant" component. As p is increased beyond this level, the giant grows larger and larger. Since log n/n > 1/n for large n, Theorem 2 informs us about a higher range of p, after the giant has already emerged. It tells us that up until p has a certain value, the graph will still be disconnected; but after p exceeds this value, the graph is all of a sudden connected with a Hamiltonian cycle.
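To make the jump at c = 1 tangible, here is a small Monte Carlo sketch in Python (the helper names, the seed, and the sample sizes are all my own illustrative choices). It samples G(n, p) at p = c/n on either side of the threshold, measures the largest component, and compares the supercritical case against the series for G(c) in Theorem 1:

```python
import math
import random

def random_adj(n, p, rng):
    """Adjacency lists for a G(n, p) random graph."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def largest_component(adj):
    """Order of the largest connected component, via iterative DFS."""
    seen = [False] * len(adj)
    best = 0
    for s in range(len(adj)):
        if seen[s]:
            continue
        stack, size, seen[s] = [s], 0, True
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        best = max(best, size)
    return best

def giant_fraction(c, terms=50):
    """G(c) = 1 - (1/c) * sum_{k>=1} k^(k-1) (c e^-c)^k / k!."""
    x = c * math.exp(-c)
    return 1 - sum(k**(k - 1) * x**k / math.factorial(k)
                   for k in range(1, terms + 1)) / c

rng = random.Random(7)
n = 2000
sub = largest_component(random_adj(n, 0.5 / n, rng))  # c = 0.5 < 1
sup = largest_component(random_adj(n, 3.0 / n, rng))  # c = 3 > 1
# sub is tiny (order log n); sup is close to G(3)*n, around 0.94*n.
```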

    Even more information is given by another classical theorem of Erdos and Renyi (1961), which has recently been generalized by Bollobas (1982):

Theorem 3 (Bollobas): Let X_d be the random variable that counts vertices of degree d. If E(X_d) is finite, then X_d has a Poisson distribution.

Theorem 4 (Erdos and Renyi): If p = log n/n + d loglog n/n + x/n + o(1/n), then X_d has a Poisson distribution with mean e^(-x)/d!.

From the latter result it follows that, if we take

    p = (log n/n)[1 + (d-1) loglog n/log n + w_n/log n]    (1)

where w_n goes to infinity arbitrarily slowly, almost every graph has minimum degree and connectivity d (Palmer, 1985). Furthermore, Shamir and Upfal (1981) proved the following:

Theorem 5 (Shamir and Upfal): Where p is given by (1), and a degree sequence d_1,...,d_n is given with 0 < d_i < d+1 for i=1,...,n, almost every graph contains a subgraph with this specified degree sequence.

    One could not ask for a much richer structure than this! All these results are asymptotic. But the physical networks that are modeled by graphs are, obviously, finite in size. It must be cautioned that no significant results on the rate of convergence to these asymptotic values are available in the graph theory literature, so that, strictly speaking, these theorems are of limited practical applicability. However, I suggest that the asymptotic results are at least qualitatively indicative of the behavior of large finite networks.

    To apply these results to memory, we may view the structurally associative memory as a graph: a Quillian network (Quillian networks were briefly mentioned earlier). In SI it is shown how every structurally associative memory naturally induces a certain Quillian network. Roughly, the similarity between x and y, used in determining whether there is a connection between x and y in the Quillian network, is defined as the amount of pattern relating x and y in the structurally associative memory.

    In order to use this mathematical apparatus to come to a psychological conclusion, we must make one simple assumption: that in order to be useful for intelligence, an associative memory should be able to express relations such as "A, B and C are similar in one respect, but B, C and D are similar in another respect". Expressing such overlapping similarity clusters requires connected components containing more than one cycle; it follows that an associative memory network which contains no connected components other than trees and unicyclic graphs is not adequate for intelligence.
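The inadequacy of trees and unicyclic components can be checked directly. If each similarity cluster is drawn as a triangle of connections, the two overlapping clusters {A,B,C} and {B,C,D} yield a connected component whose cycle rank (edges - vertices + 1, the number of independent cycles) is 2, whereas a tree has cycle rank 0 and a unicyclic graph rank 1:

```python
# Overlapping similarity clusters {A,B,C} and {B,C,D}, each drawn as a
# triangle of connections; the two triangles share the edge B-C.
vertices = {"A", "B", "C", "D"}
edges = {("A","B"), ("A","C"), ("B","C"), ("B","D"), ("C","D")}

# For a connected graph, cycle rank = edges - vertices + 1.
# Trees have rank 0; unicyclic graphs have rank 1.
cycle_rank = len(edges) - len(vertices) + 1
# cycle_rank == 2: this component is neither a tree nor unicyclic.
```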

    The next step is to ask: what determines p? One way to proceed is to make the a priori assumption that significant similarities are there to be recognized. As discussed in SI, in the absence of such an assumption there is no reason to believe that intelligence is possible. In this approach, one might set p=qr, where q is the a priori probability that two randomly selected entities x and y are significantly similar, and r is the probability that a randomly selected significant similarity is perceived.

    Another possibility is to make statistical assumptions about the entities stored in memory. In order to explore this possibility in a preliminary way, let us temporarily introduce an explicitly unrealistic simplifying assumption. Following Kanerva (1988), let us look at the Hamming distance d(x,y) = d_H(x,y), i.e. the number of entries in which x and y differ. As should be clear from the preceding chapters, I find the pattern distance much more appropriate. However, to get decent numerical estimates out of the pattern distance would be an heroic mathematical feat; by comparison the Hamming metric is a delight to work with.

    For instance, assume that the significance threshold T represents the recognitive ability of the system: the system can tell if d(x,y) < T, but is incapable of finding weaker similarities. And assume that the entities involved are drawn from a uniform distribution on S_n, the space of binary sequences of length n. In that case, where x = x_1...x_n and y = y_1...y_n, P(x_i = y_i) = 1/2. We are dealing with a sequence of Bernoulli trials, binomially distributed, and we have p = 2^(-n) [C(n,0)+...+C(n,T)].
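This cumulative binomial sum is easy to evaluate exactly, as in the following Python sketch (the function name is mine; following the formula above, the sum runs over all distances up to and including T):

```python
import math

def connection_probability(n, T):
    """p = 2^(-n) * [C(n,0) + ... + C(n,T)]: the probability that two
    uniformly random binary sequences of length n differ in at most T
    entries, i.e. count as significantly similar under threshold T."""
    return sum(math.comb(n, k) for k in range(T + 1)) / 2**n

# For n = 10, T = 5: p = (1+10+45+120+210+252)/1024 = 638/1024.
```

Raising T (a more perceptive system) raises p, so the graph-theoretic thresholds of the previous subsection translate directly into thresholds on recognitive ability.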

    The giant component will emerge when p = 1/n; the network will be connected when p = log n/n; the network will have minimum degree d when p is about (log n/n)[1 + (d-1) loglog n/log n]. So, in the context of this very rough model, the question is: for what values of T will p attain each of these values? How powerfully must the system detect similarities in order for its associative memory network to have each degree of richness? These questions are not analytically tractable for specific n, but in the limit one may obtain estimates using the normal approximation to the binomial distribution. For instance, one may show that to get p = c/n for large n, one must choose T close to the root of the equation

    x = c(2n)^(1/2) - 0.5 n^(5/2) exp[-0.5((2x - n)/n^(1/2))^2]

From this it is easily seen that (2n)^(1/2) is a fairly good approximation to the value of T for which p = 1/n. For large n, in order for its associative memory network to have a giant component, a system must be able to recognize similarities involving at least about (2n)^(1/2) bits of a binary sequence of length n.

The Phase Transition

This statistical analysis is only illustrative, since it involves the Hamming distance, which is of dubious psychological meaning. But intuitively, the general point should be clear. Better pattern recognition ability implies a higher probability p. And the qualitative structure of the memory network depends discontinuously on p, when p reaches 1/n. Therefore, an associative memory network has a phase transition with respect to pattern recognition ability. After p passes 1/n, the network gradually attains more and more edges.

    So, if it is accepted that intelligence requires a "sufficiently rich" associative memory, then it follows that both pattern recognition ability and the usefulness of associative memory undergo a discontinuous jump. Until they reach the stage where p = 1/n, they are probably not much use. But after the emergence of the giant component, their effectiveness probably increases steadily, perhaps roughly in proportion with the minimum degree.

    As mentioned above, the existence of such a phase transition, if accepted, has important implications for artificial intelligence. It suggests that the attempt to produce intelligence with contemporary hardware is doomed to failure. The small size of contemporary machines places severe limitations on their pattern recognition ability. It seems very likely that they lie below the phase transition which marks the emergence of the giant component.

    In the previous chapter, we proposed a fairly specific graph structure for real structurally associative memory networks: the fractal dual network structure. This structure is compatible with the apparent connectivity of real brains, and it is also compatible with the concept that the human memory is above the graph-theoretic phase transition discussed above. A dual network structure is connected or almost-connected (thus it has a giant component), and not only that, it has a minimum degree much greater than one (with possibly a few exceptions). These few exceptions do not violate the theorems of Erdos, Renyi, Palmer, Shamir and Upfal - neither in letter nor in spirit - because the theorems refer to asymptotically large graphs, whereas real memory graphs are finite.

The Evolution of the Dual Network

We have considered the evolution of the heterarchical network and the evolution of the hierarchical network as separate processes. However, according to the dual network model, the two networks are not really separate. They are interwoven and largely overlapping in practice, so they must also be closely connected in origin.

    On the one hand, the hierarchical network relies on the heterarchical network as an aid in making decisions. For when a process needs to know what to tell its subsidiary processes to do in a given situation, where does it look for information, for appropriate analogies with the past? It has to look to the structurally associative memory. An effective hierarchical network of necessity relies on an effective structurally associative memory, an effective heterarchical network. Therefore, if there was indeed some point in the course of evolution at which the structurally associative memory underwent a leap in effectiveness, it is to be expected that the hierarchical network made a subsequent leap in effectiveness shortly thereafter, as a consequence of this first leap.

    But, on the other hand, the heterarchical network relies on the hierarchical network to detect the similarities that determine its structure. So, as a general rule, it seems most likely that the two networks evolved together, slight improvements in one encouraging slight improvements in the other. However, at some point slight, gradual improvements in the hierarchical network pushed its similarity recognition ability above the critical point, and caused a phase transition in the heterarchical network - thus catapulting the whole dual network into a higher level of operation, on which the two networks then continued their gradual symbiotic evolution.

    Let's get a little more precise. The main threshold involved is at pn = 1, that is, when the probability of a connection between two memory elements, multiplied by the number of memory elements, passes one. So if we have a dual network structure, in which memory elements are also pattern-recognizers, then pn will naturally increase with n. For in general, the pattern recognition ability of a dual network will not decrease with its size. The only question is how rapidly the increase of pn occurs.

    A back-of-the-envelope calculation gives some meat to this idea. Suppose that one has a dual network whose hierarchical network is exactly pyramidal in structure: each process, except on the lowest level, supervising two other processes; and each process, except on the highest level, supervised by one process only. And suppose that each process on level k is "worth" at least c > 1 times as much as each process on level k-1 toward total pattern recognition ability. Then (taking c = 2, say, so that each of the roughly log n levels contributes equally) the pattern recognition ability of the whole network is on the order of n log n, so that pn is on the order of n^2 log n. Of course, the important factor here is the constant of proportionality hidden in the phrase "on the order of."
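The pyramid can be summed directly. In the sketch below I take c = 2, an illustrative choice under which every level contributes equally: level k, counting from the bottom, has 2^(L-k) processes, each worth c^k:

```python
import math

def pyramid_ability(L, c):
    """Total pattern recognition 'worth' of a binary pyramid with
    levels 0..L: level k has 2^(L-k) processes, each worth c^k."""
    return sum(2**(L - k) * c**k for k in range(L + 1))

L = 10
n = 2**(L + 1) - 1              # total number of processes in the pyramid
total = pyramid_ability(L, 2.0)  # c = 2: every level contributes 2^L
ratio = total / (n * math.log2(n))
# ratio is roughly 1/2: total ability grows like n log n,
# faster than linearly in the number of processes.
```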

    This calculation is only heuristic. But its message is clear. Suppose that, on the average, adding a single process high up in the hierarchical network augments general pattern recognition ability significantly more than adding a single process low down in the hierarchical network. And suppose that the hierarchical network, as it develops, does so in a roughly "pyramidal" way, rather than just getting "bottom-heavy" - this is suggested by the Microgenetic Law. Then pattern recognition ability should increase faster than linearly with network size, and the phase transition will be reached with a rapid rate of change of pn.

    This picture is, of course, rather speculative. However, it is conceptually very powerful, in the sense that it relies only on the general idea that the mind/brain is composed of two semi-independent, symbiotically interacting components: an associative memory, and an analogy-utilizing perception and control mechanism. The hypothesis of neural network crossover, the idea that the mind's associative memory is a structurally associative memory, the concept that perception and control take place according to a multilevel hierarchy - the argument of the present section uses none of these. It uses only an abstract, emasculated version of the hypothesis of the dual network, one which virtually anyone should be able to accept.

    To put all this in biological perspective, let us recall a point brought up in Chapter 3: the evolution of modern humans involved a gradual increase in brain size, and this increase was easily achieved by the strategy of neoteny. As the neoteny became more and more extreme, as the timing of developmental processes was pushed further and further ahead, the adult human brain became larger and larger - without fundamentally changing its structure. Eventually it became large enough that the "phase transition" process described above occurred. So, even though the evolution of brain size was probably gradual, the evolution of brain ability may well have been relatively sudden.

    Natural selection increased memory size, and the complex dynamics of memory turned this into a dramatic behavioral change. If this analysis is correct, it is perhaps the most striking possible example of structural instability and corresponding punctuated equilibrium - it certainly dwarfs the standard examples which we trotted out in Chapter 3.

    Let us close with a quotation. Regarding the relative ease of evolution of the large human brain, Stephen Jay Gould wrote:

    Perhaps the most amazing thing of all is a general property of complex systems, our brain prominent among them - their capacity to translate merely quantitative changes in structure into wondrously different qualities of function. (1984, p. 133)

I have argued that this "amazing" phenomenon can be understood in terms of certain properties of graphs.


We have discussed Brown's concept that microgeny recapitulates phylogeny, and we have shown that graph theory suggests a "phase transition" in associative memory structure, as pattern recognition ability increases gradually. But this does not complete our treatment of the natural history of mind. I have saved the most dramatic point for last. What happens when we put this phase transition together with the concept of recapitulation?

    On the one hand, we have the idea that, as the brain and its corresponding dual network grew bigger and bigger, at some point the effectiveness of this dual network jumped drastically upwards.

    On the other hand, we have the idea that the process leading up to (and hence constituting) a mental act is an approximate recapitulation of phylogenetic history: that its earlier stages are similar to mental acts of distant ancestors, its middle stages are similar to mental acts of less distant ancestors, and its later stages are similar to mental acts of recent ancestors.

    What do we get when we put these two concepts together? We get the hypothesis that, as a mental act is gradually forming, it experiences some sort of phase transition, some sort of "intuitive click."

    This rough idea may be refined somewhat. Consider the following argument:

    1) The structurally associative memory of a human being is vast and dense with connections (both previously recognized connections, and potential connections not yet recognized). But any particular mental process that "accesses" this memory is a) only aware of one small subnetwork of the memory, and b) only aware of a minuscule percentage of the connections within this subnetwork.

    2) Processes on higher levels of the perceptual-motor hierarchy are more sophisticated. Sometimes a higher-level process will be aware of more of the structurally associative memory than its lower-level counterparts - let us call this "quantitative superiority." Other times, it will be aware of the same subnetwork as its lower-level counterpart, but will detect a greater percentage of the connections within this subnetwork - let us call this "qualitative superiority."

    3) Suppose one has a hierarchy of processes, each one displaying qualitative superiority over its subsidiary processes. Then the graph-theoretic results discussed above imply that eventually, as one proceeds gradually up this hierarchy within the perceptual-motor hierarchy, one will arrive at a sort of "magic level" - a level below which processes experience acyclic or disconnected segments of the structurally associative memory, and above which processes experience densely connected segments of the structurally associative memory.

    4) When this "magic level" happens to coincide with consciousness, might it not be experienced as a sort of clicking into place of disparate ideas, a sort of sudden intuition - an instant realization of how all the elements of the cluster of ideas one has been working with are interrelated?
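Steps 1-3 can be put into a toy simulation (all the numbers here - network size, mean degree, detection fractions - are invented purely for illustration). A fixed "memory" graph is viewed by processes that each detect only a fraction of its connections; as that fraction grows with level, the largest visible component jumps from negligible to giant at the "magic level" where the visible mean degree passes one:

```python
import random

def visible_giant_fraction(edges, n, detect_prob, rng):
    """Fraction of the n vertices lying in the largest component of
    the subgraph formed by the connections a process detects."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        if rng.random() < detect_prob:
            adj[i].append(j)
            adj[j].append(i)
    seen = [False] * n
    best = 0
    for s in range(n):
        if seen[s]:
            continue
        stack, size, seen[s] = [s], 0, True
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        best = max(best, size)
    return best / n

rng = random.Random(1)
n = 1000
# Underlying memory: a random graph with mean degree about 10.
memory = [(i, j) for i in range(n) for j in range(i + 1, n)
          if rng.random() < 10.0 / n]
# Level k detects a fraction k/20 of the connections, so the visible
# mean degree 10*(k/20) crosses 1 between levels 2 and 3.
fractions = [visible_giant_fraction(memory, n, k / 20, rng)
             for k in range(1, 9)]
```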

    Note that this argument refers to a very special phenomenon - not to every thought, but only to certain particular hierarchies of processes within the perceptual-motor hierarchy. The "phase transition" described will occur only in a hierarchy of processes, each of which displays qualitative superiority over its subsidiaries. How common this situation is, we have no empirical way of determining at the present time. But philosophically and logically, we have explained how a sudden leap in understanding might occur, within the evolutionary model of mind presented in the previous chapter.

    In fact, this argument could have just as well been placed at the end of Chapter 6 - it was inspired by ideas regarding the natural history of mind, but it does not depend on them. However, there is something particularly satisfying about the correspondence between punctuated equilibrium in the historical evolution of mind, and sudden intuition in the evolution of individual thought processes. We have arrived at the startling conclusion that when a person has a "sudden feeling of insight," she may be, in a sense, re-living the ancestral evolution of intelligence!

    We have traveled rather far from contemporary biological data. Let us go no further - let us stop here, at the most radical and sketchy point of our arduous conceptual journey. If nothing else, I hope that these final explorations have left the reader with a sense that the evolutionary psychology of Chapters 1-6 is not a sterile body of ideas. It is, rather, pregnant with new suggestions for resolving old dilemmas, and new directions for research.