From Complexity to Creativity -- Copyright Plenum Press, © 1997
Part I. The Complex Mind/Brain
CHAPTER 1. DYNAMICS, EVOLUTION, AUTOPOIESIS
In this brief chapter, I will introduce a few preliminary ideas required for understanding the rest of the book: dynamics, attractors, chaos, genetic algorithms, magician systems, and algorithmic patterns in dynamics.
These concepts are diverse, but every one of them is a variation on the theme of the first one: dynamics. For "dynamics," most broadly conceived, is just change, or the study of how things change. Thus, for example, evolution by natural selection is a kind of dynamic. Comparing natural selection with physical dynamics, as portrayed e.g. by the Schrödinger equation, gives one a rough idea of just how broad the concept of dynamics really is.
Physics has focused on dynamics at least since the time of Newton. Psychology and the other human sciences, on the other hand, have tended to focus on statics, ignoring questions of change and process -- a fact which is probably due to the incredible dynamical complexity of human systems. Only now are we beginning to find mathematical and computational tools which are up to dealing with the truly complex dynamical systems that we confront in our everyday mental and social lives.
There is a vast and intricate body of mathematics called "dynamical systems theory." However, the bulk of the concrete results in this theory have to do with dynamical systems that are governed by very special kinds of equations. In studying very complex dynamical systems, as we will be doing here, one is generally involved with conceptual models and computer simulations rather than analytical mathematics. Whether this state of affairs will change in the future, as new kinds of mathematics evolve, is extremely unclear. But at present, some of the most interesting dynamical systems models are so complex as to almost completely elude mathematical analysis.
An example is the genetic algorithm, a discrete and stochastic mathematical model of evolution by natural selection. Another example is the autopoietic system, a model of the self-producing nature of complex systems. Both these dynamical systems models will be discussed in detail in later chapters, and briefly introduced in this chapter. It is these models, I believe, which are most relevant to modeling extremely complex systems like the mind/brain. In order to model psychological complexity, we must leave traditional dynamical systems theory behind, and fully confront the messiness, intricacy and emergent pattern of the real world.
In the last section of this chapter, the relation between dynamics and structure will be discussed, and dealt with in a formal way. The abstract theory of pattern, developed in my previous publications, will be introduced and used to give formal definitions of such concepts as "dynamical pattern" and "system complexity." The concept of pattern unifies all the various dynamics discussed in the previous sections. For these dynamics, like all others, are in the end just ways of getting to appropriate emergent patterns.
Finally, before launching into details, it may be worthwhile to briefly reflect on these topics from a more philosophical perspective. Change and structure -- becoming and being -- are elementary philosophical concepts. What we are studying here are the manifestations of these concepts within the conceptual framework of science. Any productive belief system, any framework for understanding the world, must come to terms with being and becoming in its own way. Science has, until recently, had a very hard time dealing with the "becoming" aspect of very complex systems, e.g. living and thinking systems. But this too is changing!
Mathematical dynamical systems theory tends to deal with very special cases. For instance, although most real-world dynamical systems are most naturally viewed in terms of stochastic dynamics, most of the hard mathematical results of dynamical systems theory have to do with deterministic dynamics. And even within the realm of deterministic dynamics, most of the important theorems involve quantities that are only possible to compute for systems with a handful of variables, instead of the hundreds, thousands or trillions of variables that characterize many real systems. A course in dynamical systems theory tends to concentrate on what I call "toy iterations" -- very simple deterministic dynamical systems, often in one or two or three variables, which do not accurately model any real situation of interest, but which are easily amenable to mathematical analysis.
The great grand-daddy of the toy iterations is the logistic map, x_{n+1} = rx_{n}(1-x_{n}); a large portion of Devaney's excellent book Chaotic Dynamical Systems is devoted to this seemingly simple iteration. In the realm of differential equations, the Lorenz equation is a classic "toy model" (its discretization, by which its trajectories are studied on the computer, is thus a toy iteration). Implicit in the research programme of dynamical systems theory is the assumption that the methods used to study these toy iterations will someday be generalizable to more interesting dynamical systems. But this remains a promissory note; and, for the moment, if one wishes to model real-world systems in terms of dynamical systems theory, one must eschew mathematical theorizing and content oneself with qualitative heuristics and numerical simulations.
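The logistic map is simple enough to experiment with in a few lines of code. The following sketch computes the first few states of an orbit; the parameter value r = 4.0 (the fully chaotic regime) and the initial condition are illustrative choices:

```python
# Iterating the logistic map x_{n+1} = r * x_n * (1 - x_n).

def logistic(r, x):
    return r * x * (1.0 - x)

def trajectory(r, x0, n):
    """Return the first n states of the orbit starting at x0."""
    states = [x0]
    for _ in range(n - 1):
        states.append(logistic(r, states[-1]))
    return states

orbit = trajectory(4.0, 0.1, 5)
# orbit[1] is 4 * 0.1 * 0.9 = 0.36, orbit[2] is 4 * 0.36 * 0.64 = 0.9216, ...
```

Plotting such orbits for different values of r is the quickest way to see the transition from fixed points through periodic cycles to chaos.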
But even if the deeper results of dynamical systems theory are never generalized to deal with truly complex systems, there is no doubt that the conceptual vocabulary of dynamical systems theory is useful in all areas of study. In the following pages we will repeatedly use the language of dynamical systems and attractors to talk about psychological systems, but it is worth remembering that these concepts did not (and probably could not have) come out of psychology. They were arrived at through years of painstaking experimentation with simple, physics-inspired, few-variable dynamical systems.
So, let us define some terms. A dynamical system, first of all, is just a mapping from some abstract space into itself. The abstract space is the set of "states" of the system (the set of states of the real-world system modelled by the mapping; or the abstract dynamical system implicit in the mapping). The mapping may be repeated over and over again, in discrete time; this is an iterative dynamical system. Or it may be repeated in continuous time, in the manner of a differential equation; this is a "continuous" dynamical system. In general, continuous dynamical systems are more amenable to mathematical analysis, but discrete dynamical systems are more amenable to computer simulation. For this reason, one often has cause to transform one kind of dynamical system into the other.
A trajectory of a dynamical system is the series of system states that follows from a certain initial (time zero) state. For a deterministic dynamical system, a trajectory will be a simple series; for a stochastic dynamical system, it will be a constantly forward-branching collection of system states. When doing computer simulations of dynamical systems, one computes particular sets of trajectories and takes them as representative.
The key notion for studying dynamical systems is the attractor. An attractor is, quite simply, a characteristic behavior of a system. The striking insight of dynamical systems theory is that, for many mathematical and real-world dynamical systems, the initial state of the system is almost irrelevant. No matter where the system starts from, it will eventually drift into one of a small set of characteristic behaviors, a small number of attractors. The concept of "attractor" is, beyond all doubt, the most important contribution of dynamical systems theory to the general vocabulary of science.
Some systems have fixed point attractors, meaning that they drift into certain "equilibrium" conditions and stay there. Some systems have periodic attractors, meaning that, after an initial transient period, they lock into a cyclic pattern of oscillation between a certain number of fixed states. And finally, some systems have attractors that are neither fixed points nor limit cycles, and are hence called strange attractors. An example of a two-dimensional strange attractor, derived from equation (3) below, may be found in Figure 1. The most complex systems possess all three kinds of attractors, so that different initial conditions lead not only to different behaviors, but to different types of behavior.
The formal definition of "strange attractor" is a matter of some contention. Rather than giving a mathematical definition, I prefer to give a "dictionary definition" that captures the common usage of the word. A strange attractor of a dynamic, as I use the term, is a collection of states which is: 1) invariant under the dynamic, in the sense that if one's initial state is in the attractor, so will be all subsequent states; 2) "attracting" in the sense that states which are near to the attractor but not in it will tend to get nearer to the attractor as time progresses; 3) not a fixed point or limit cycle.
The term "strange attractor" is itself a little strange, and perhaps deserves brief comment. It does not reflect any mathematical strangeness, for after all, fixed points and limit cycles are the exception rather than the rule. Whatever "strangeness" these attractors possess is thus purely psychological. But in fact, the psychological strangeness which "strange attractors" originally presented to their discoverers is a thing of the past. Now that "strange attractors" are well known they seem no stranger than anything else in applied mathematics! Nevertheless, the name sounds appealing, and it has stuck.
Strictly speaking, this classification of attractors applies only to deterministic dynamical systems; to generalize them to stochastic systems, however, one must merely "sprinkle liberally with probabilities." For instance, a fixed point of a stochastic iteration x_{n+1} = f(x_{n}) might be defined as a point which is fixed with probability one; or a p-fixed point might be defined as one for which P(f(x) = x) > p. In the following, however, we will deal with stochastic systems a different way. In the last section of this chapter, following standard practice, we will think about the IFS random iteration algorithm by transplanting it to an abstract space on which it is deterministic. And in Chapter Six I will study the genetic algorithm by approximating it with a certain deterministic dynamical system. While perhaps philosophically unsatisfactory, in practice this approach to stochasticity allows one to obtain results that would become impossibly complicated if translated into the language of true stochastic dynamical systems theory.
Chaos
A great deal of attention has been paid to the fact that some dynamical systems are chaotic, meaning that, despite being at bottom deterministic, they are capable of passing many statistical tests for randomness. They look random. Under some definitions of "strange attractor," dynamics on a strange attractor are necessarily chaotic; under my very general definition, however, this need not be the case. The many specialized definitions of "chaos" are even more various than the different definitions of "strange attractor."
Let us look at one definition in detail. In the context of discrete dynamical systems, the only kind that will be considered here, Devaney defines a dynamical system to be chaotic on a certain set if it displays three properties on that set: sensitive dependence on initial conditions, topological transitivity, and density of repelling periodic points.
I find it hard to accept "density of repelling periodic points" as a necessary aspect of chaos; but Devaney's other two criteria are clearly important. Topological transitivity is a rough topological analogue of the more familiar measure-theoretic concept of ergodicity; essentially what it means is that the dynamics thoroughly mix everything up, that they map each tiny region of the attractor A into the whole attractor. In technical terms an iteration f is topologically transitive on a set A if, given any two neighborhoods U and V in A, the iterates f^{n}(U) will eventually come to have a nonempty intersection with V.
Sensitivity to initial conditions, on the other hand, means that if one takes two nearby points within the attractor and uses them as initial points for the dynamic, the trajectories obtained will rapidly become very different. Eventually the trajectories may become qualitatively quite similar, in that they may have the same basic shape -- in a sense, this is almost guaranteed by topological transitivity. But they will be no more or less qualitatively similar than two trajectories which did not begin from nearby initial points.
Although Devaney did not realize this when he wrote his book, it has since been shown that topological transitivity and density of periodic points, taken together, imply sensitivity to initial conditions. Furthermore, for maps on intervals of the real line, topological transitivity and continuity, taken together, imply density of periodic points. So these criteria are intricately interconnected.
The standard tool for quantifying the degree of chaos of a mapping is the Liapunov exponent. Liapunov exponents measure the severity of sensitive dependence on initial conditions; they tell you how fast nearby trajectories move apart. Consider the case of a discrete-time system whose states are real numbers; then the quantity of interest is the ratio
R_{n} = |f^{n}(x) - f^{n}(y)|/|x-y| (1)
where f^{n} denotes the n-fold iterate of f. If one lets y approach x then the ratio R_{n} approaches the derivative of f^{n} at x. The question is: what happens to this derivative as the elapsed time n becomes large? The Liapunov exponent at x is defined as the limiting value for large n of the expression
log|(f^{n})'(x)|/n (2)
If the difference in destinations shrinks as n increases, then the trajectories are all zooming together, the ratio is less than one, and so the Liapunov exponent is negative. If the trajectories neither diverge nor contract, then the ratio stays near one, and the Liapunov exponent is the logarithm of one -- zero. Finally, and this is the interesting case, if the difference in destinations grows exponentially with n, then something funny is going on. Close-by starting points are giving rise to wildly different trajectories. The Liapunov exponent of the system tells you just how different these trajectories are. The bigger the exponent, the more different.
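For the logistic map this limit is easy to estimate numerically: by the chain rule, the derivative of f^{n} at x_0 is the product of f' along the orbit, so expression (2) is just the average of log|f'(x_k)| over the trajectory. A minimal sketch, with illustrative choices of parameter, starting point and orbit length:

```python
import math

# Estimate the Liapunov exponent of the logistic map f(x) = r*x*(1-x).
# Chain rule: (f^n)'(x0) = f'(x0) * f'(x1) * ... * f'(x_{n-1}),
# so log|(f^n)'(x0)|/n is the orbit average of log|f'(x_k)|.
# Here f'(x) = r*(1 - 2*x).

def liapunov(r, x0, n):
    x, total = x0, 0.0
    for _ in range(n):
        total += math.log(abs(r * (1.0 - 2.0 * x)))  # log|f'(x)|
        x = r * x * (1.0 - x)
    return total / n

lam = liapunov(4.0, 0.3, 100000)
```

At r = 4 the estimate comes out close to log 2, the known value for the fully chaotic logistic map; its positivity quantifies the sensitive dependence discussed above.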
A system in one dimension has one Liapunov exponent. A system in two dimensions, on the other hand, has two exponents: one for the x direction, computed using partial derivatives with respect to x; and one for the y direction, computed using partial derivatives with respect to y. Similarly, a system in three dimensions has three Liapunov exponents, and so forth.
A positive Liapunov exponent does not guarantee chaos, but it is an excellent practical indicator; and we shall use it for this purpose a little later, in the context of the plane quadratic iteration as given by:
x_{n+1} = a_{1} + x_{n}*(a_{2} + a_{3}*x_{n} + a_{4}*y_{n}) + y_{n}*(a_{5} + a_{6}*y_{n})
y_{n+1} = a_{7} + x_{n}*(a_{8} + a_{9}*x_{n} + a_{10}*y_{n}) + y_{n}*(a_{11} + a_{12}*y_{n})
(3)
This is an example of a very simple dynamical system which mathematics is at present almost totally unable to understand. Consider, for instance, a very simple question such as: what percentage of "parameter vectors" (a_{1},...,a_{12}) will give rise to strange attractors? Or in other words, how common is chaos for this iteration? The answer, at present, is: go to the computer and find out!
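A crude version of this "go to the computer" experiment might look as follows. The sketch samples random parameter vectors for iteration (3) and merely sorts orbits into those that escape to infinity and those that remain bounded; the coefficient range, escape radius, orbit length and starting point are all illustrative choices, not canonical values:

```python
import random

# One step of the plane quadratic iteration, equation (3).
def step(a, x, y):
    xn = a[0] + x*(a[1] + a[2]*x + a[3]*y) + y*(a[4] + a[5]*y)
    yn = a[6] + x*(a[7] + a[8]*x + a[9]*y) + y*(a[10] + a[11]*y)
    return xn, yn

def classify(a, n=1000, escape=1e6):
    """Label a parameter vector by the fate of one orbit."""
    x, y = 0.05, 0.05
    for _ in range(n):
        x, y = step(a, x, y)
        if abs(x) > escape or abs(y) > escape:
            return "escapes"
    return "bounded"

random.seed(1)  # reproducibility of the illustrative run
counts = {"escapes": 0, "bounded": 0}
for _ in range(200):
    a = [random.uniform(-1.2, 1.2) for _ in range(12)]
    counts[classify(a)] += 1
```

Distinguishing, among the bounded orbits, fixed points and limit cycles from genuine strange attractors requires a further test, such as estimating the Liapunov exponents numerically.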
Now let us take a great leap up in complexity, from simple polynomial iterations to the dynamic of evolution.
As originally formulated by Darwin and Wallace, the theory of evolution by natural selection applied only to species. As soon as the theory was published, however, theorists perceived that natural selection was in fact a very general dynamic. Perhaps the first to view evolution in this way was Herbert Spencer. Since the time of Darwin, Wallace and Spencer, natural selection has been seen to play a crucial role in all sorts of different processes.
For instance, Burnet's theory of clonal selection, the foundation of modern immunology, states that immune systems continually self-regulate by a process of natural selection. More speculatively, Nobel Prize-winning immunologist Gerald Edelman has proposed a similar explanation of brain dynamics, his theory of "neuronal group selection" or Neural Darwinism. In this view, the modification of connections between neuronal groups is a form of evolution.
The very origin of life is thought to have been a process of molecular evolution. Kenneth Boulding, among many others, has used evolution to explain economic dynamics. Extending the evolution principle to the realm of culture, Richard Dawkins has defined a "meme" as an idea which replicates itself effectively and thus survives over time. Using this language, we may say that natural selection itself has been a very powerful meme. Most recently, the natural selection meme has invaded computing, yielding the idea of evolutionary computation, most commonly referred to by the phrase "genetic algorithms."
These diverse applications inspire a view of evolution as a special kind of dynamic. The evolutionary dynamic is particularly useful for modeling extremely complex systems -- biological, sociological, cultural and psychological systems. Evolutionary dynamics has its own properties, different from other dynamics. All systems which embody adaptive evolution will display some of the characteristic properties of evolutionary dynamics -- along with the characteristics of other dynamics, such as the structure-preserving dynamic of "autopoiesis" or "ecology."
Evolutionary Computing
A particular focus will be laid here on evolutionary computing -- not only because evolutionary computer programs are interesting in their own right, but because of the light they shed on evolutionary dynamics in general. Computational models of evolution represent the evolutionary process stripped bare, with the intriguing but distracting intricacies of the biological world stripped away. They allow us to get at the essence of the evolutionary process in a way that is sometimes quite striking.
From a practical, computational perspective, the exciting thing about evolutionary computation is that it seems to work by "magic." The recipe is so simple. First, choose an appropriate representation from the standard repertoire -- i.e., represent the set of possible solutions to one's problem as a collection of binary strings, or floating-point vectors, or LISP programs. Next, formulate a fitness function: a function which assigns each potential solution a numerical measure of "quality." Supplied with standard default routines for crossover, mutation and selection, the genetic algorithm will do the rest. Starting from a random initial population, it lets the population reproduce differentially with respect to fitness for a number of generations. Then -- presto! out pops an answer. With luck, the answer is a good one: a maximizer or near-maximizer of the fitness function.
The detailed dynamics of an evolutionary program are for all practical purposes incomprehensible. Except in trivial cases, there is no workable way to get at "what the program is doing," on a step-by-step basis. One just has to watch, wait and see; and if the result is bad, one adjusts the parameters, the fitness criterion or the representation, and begins again. This mystery as to internal dynamics can be frustrating to those scientists who are accustomed to a greater degree of control. But the key point is that the frustration and the magic of evolutionary computation are exactly parallel to the frustration and the magic of real, non-computational evolution. Both real-world and simulated evolution produce a lot of duds, often for hard-to-understand reasons. And both produce a fair number of successes, sometimes slowly, sometimes quickly, and with the occasional appearance of miracle.
From a psychological point of view, the main interest of evolutionary computing is as a general conceptual and computational model of problem-solving and creativity. Psychological theorists going back to William James and Charles Peirce have viewed the process of learning and invention in evolutionary terms. Evolutionary computing is a concrete instantiation of their insight: it presents systems which actually do learn by evolution. Here we will primarily work with genetic algorithms, but with an eye always on the more general class of evolutionary learning dynamics.
Our interest in genetic algorithms, in this book, has multiple dimensions. First, we will be interested in the dynamics of evolutionary learning systems, and in exploring whether there is any commonality between these dynamics and the dynamics of human learning. We will be interested in the use of genetic algorithms to generate complex forms, rather than merely solving optimization problems. We will be interested in evolutionary computing systems as autopoietic systems, in the role which self-organization and self-production plays in the dynamics of evolution. Finally, we will be interested in whether it is possible to extend such systems as genetic algorithms, which are obviously very limited models of human creativity, into more complete and psychologically realistic models.
The Simple GA
Perhaps the most common form of evolutionary computation is the Simple Genetic Algorithm, or simple GA. The simple GA is an extremely crude model, both biologically and computationally, but as the name suggests, it has the virtue of simplicity. It leaves out a number of features that are absolutely essential to biological evolution -- most notably, spatial distribution, and ecology. Even so, however, it has provided us with a great deal of insight into the evolutionary process.
The simple GA, as I mean it here, consists of a population of fixed size N, each element of which is a binary string of length M. It begins with a random initial population and then, to get from one generation to the next, proceeds as follows:
1) Evaluate the objective "fitness" function value f(x) of each population element x. Sum these values to obtain the quantity SUMFITNESS. The probability of selecting x will then be given by f(x)/SUMFITNESS.
2) Select two elements of the population, call them x and y, and compute the crossover product C(x,y). Then mutate each bit of this crossover product with a certain independent probability (the mutation rate). Repeat this step until N new binary strings have been created.
3) Replace the old population with the new population created in Step 2, and return to Step 1.
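The three steps above can be sketched in a few dozen lines of code. This is a minimal illustration of the simple GA, not a production implementation; the fitness function used here ("count the ones") and all parameter values are illustrative choices:

```python
import random

def simple_ga(fitness, N=20, M=10, mutation_rate=0.01, generations=50):
    """Simple GA: population of N bit strings of length M."""
    pop = [[random.randint(0, 1) for _ in range(M)] for _ in range(N)]
    for _ in range(generations):
        # Step 1: fitness-proportional selection weights f(x)/SUMFITNESS.
        weights = [fitness(x) for x in pop]
        new_pop = []
        while len(new_pop) < N:
            # Step 2: select two parents, one-point crossover, bitwise mutation.
            x, y = random.choices(pop, weights=weights, k=2)
            r = random.randint(1, M - 1)  # cut-point
            child = x[:r] + y[r:]
            child = [b ^ 1 if random.random() < mutation_rate else b
                     for b in child]
            new_pop.append(child)
        # Step 3: replace the old population and repeat.
        pop = new_pop
    return max(pop, key=fitness)

random.seed(0)  # reproducibility of the illustrative run
best = simple_ga(sum)  # maximize the number of 1 bits
```

Even on this trivial "one-max" fitness function, one can watch the population converge toward the all-ones string over the generations.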
What is this thing called a "crossover operator"? On an abstract level, one might make the following definitions:
A crossover operator is any map C(,) from the space of pairs of length n bit strings to the space of probability distributions on length n bit strings.
A true crossover operator is any crossover operator which possesses the following property: if neither s nor t possess a B in the i'th position, then the probability that C(s,t) possesses a B in the i'th position must be zero (here B is either 0 or 1). In other words, a true crossover operator picks from among the bits provided by the "parents" it is crossing; it does not introduce anything new, but only recombines.
But this level of generality is almost never utilized in practice. Usually in practical and theoretical work with GA's, a bitwise or multi-point cut-and-paste operator is assumed.
Most simply, one may use a single point cut-and-paste operator, according to which, in order to cross a_{1}...a_{M} with b_{1}...b_{M}, one chooses at random a cut-point r between 1 and M, and then forms a_{1}...a_{r}b_{r+1}...b_{M}. Mutation then proceeds through the string bit by bit: supposing the mutation rate is .01, each bit is changed to its negation independently with probability .01.
For instance, suppose one has
00001111101
as the first parent, and
10101010101
as the second parent. If the cut-point is randomly chosen as r = 4, then the cut occurs after the fourth position, yielding
0000 | 1111101 parent 1
1010 | 1010101 parent 2
______________
0000 1010101 "pure crossover" child
and finally, perhaps
00001011101
as the mutated child.
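The worked example above can be reproduced in a couple of lines. The particular mutation shown (flipping the eighth bit) is of course just one of the many that could occur at random:

```python
def cross(p1, p2, r):
    """Single point cut-and-paste: first r bits of p1, remaining bits of p2."""
    return p1[:r] + p2[r:]

child = cross("00001111101", "10101010101", 4)
# child is "00001010101", the "pure crossover" child

# Flip the eighth bit (index 7) to reproduce the mutation shown above.
mutated = child[:7] + ("1" if child[7] == "0" else "0") + child[8:]
# mutated is "00001011101", the mutated child
```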
The specific mechanics of crossover are not essential to the function of the GA; any crossover operator that provides a broad-based search of the space of genotypes will do. The point is that crossover is a very flexible, "intelligent" kind of search. In the language of search algorithms, instead of merely searching from one point to other nearby points, it does a wide-ranging but subtly constrained stochastic search, using the information available in pairs of points and populations of points.
Beyond the Simple GA
The simple GA is ideal for mathematical analysis, but it can be cumbersome for practical GA work. One common modification is to use real vectors instead of bit strings as one's population elements. This is more natural for many applications, particularly since most modern programming languages offer a floating point data type. Numerous examples of the floating point GA are given in Michalewicz's recent book.
Crossover of two real vectors is easy to define. In the one cut-point model, for instance, one might cross (.33,.55,.44) with (.55,.34,.17) to obtain any of the vectors:
(.33,.55,.44)
(.33,.55,.17)
(.33,.34,.17)
(.55,.34,.17)
(note the noncommutativity of the crossover operation, here as in the bit string model).
Mutation of a real vector is a little trickier: it is not as simple as flipping between 0 and 1, as in the bit string case. Instead, to mutate an entry x in a vector, one selects a number from some probability distribution centered around x. So, if (.33,.55,.44) were crossed with (.55,.34,.17) to obtain the unmutated offspring (.33,.55,.17), the final, mutated offspring might look something like (.36,.50,.18). One can see from this that mutation plays a somewhat larger role in the floating point GA than in the bit string GA.
The role that the "mutation probability" plays in the bit string model is here played by the standard deviation of this probability distribution. A Gaussian distribution is the standard choice here, but there are also arguments for using something like a Cauchy distribution, which gives more weight to outlying values. Sometimes it is useful to gradually decrease this standard deviation from one generation to the next.
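A minimal sketch of floating-point crossover and Gaussian mutation; the standard deviation value used here is an illustrative choice:

```python
import random

def cross(u, v, r):
    """One cut-point crossover: first r entries of u, the rest of v."""
    return u[:r] + v[r:]

def mutate(vec, sigma):
    """Perturb each entry by a Gaussian centered on its current value."""
    return [x + random.gauss(0.0, sigma) for x in vec]

child = cross([.33, .55, .44], [.55, .34, .17], 2)  # the vector (.33,.55,.17)
mutated = mutate(child, sigma=0.03)
```

Shrinking sigma from one generation to the next, as mentioned above, turns the search gradually from coarse exploration into fine-tuning.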
Substituting vectors and their genetic operations for bit strings and their genetic operations, one obtains a "simple floating-point GA." The simple GA and simple floating-point GA are really very similar as evolutionary implementations go. Much more radical is, for instance, John Koza's "genetic programming" implementation, in which the representation is neither strings nor vectors but simple LISP programs. The LISP programs are represented as trees labeled with commands. Mutating a program involves replacing one command with another; crossing over two programs means taking a subtree from one program tree and putting it in place of some subtree from the other program tree. As will be briefly argued later, this kind of graph-based evolutionary computation would seem to have a deep relevance to cognitive science. For practical purposes, however, it is at the present time rather cumbersome.
The Dynamics of Genetic Optimization
What, finally, is the psychological meaning of the GA? This is a topic for later chapters, but a hint may be given here.
In Chapter Six we will study the dynamics of the genetic algorithm in the case of very large population size, using tools from standard dynamical systems theory. This investigation will lead to a number of interesting results, including a mathematical parallel between genetic algorithms and the dynamics of human and animal learning. Crossover, it seems, provides the kind of learning power that biological systems have. Mutation does not.
Also, we will look at the possibility of embedding the genetic algorithm in a spatially-extended, ecological model. This leads to much more interesting dynamics. The spatial organization displays typical properties of self-organizing systems, and one sees the emergence of a hierarchy of complex spatiotemporal attractors. This kind of genetic algorithm, it will be argued, has many interesting analogies with the psynet model of mind, and may well be able to serve as a computational substrate for the emergent structures described in the psynet model.
Finally, in the last chapter, the genetic algorithm will re-emerge as a kind of computational metaphor for a certain phase of human creativity. Typically, crossover of mental structures is strongly constrained by autopoietic thought-systems. But in the highly creative portions of the mind, this constraint is lessened, and one has a free-flowing recombination and mutation process quite similar to the GA.
Beyond continuous dynamical systems and genetic algorithms, we will require one further kind of complex systems model, the magician system. Philosophically, magician systems are largely equivalent to Maturana and Varela's "autopoietic systems," and Kampis's "component-systems" (see Varela, 1978; Kampis, 1992). However, the focus is different, and the precise mathematical formulation (as will be elaborated in later chapters) is different.
What are autopoietic and component systems? An autopoietic system, as defined by Maturana and Varela, is a system which produces itself. It is a system of components which interact according to some network of interrelationships, in such a way that the network of interrelationships is produced by the system components themselves. The paradigm case is the biological organism. Autopoietic systems are generally dissipative, i.e. they do not conserve energy. What they do conserve, however, is structure. A body is an open system, creating entropy, but what it uses its energy doing is maintaining its own structure.
A component-system, as defined by Kampis, is something only a little different: it is simply a collection of components which are capable of transforming each other, and of coming together to form larger components. In its emphasis on transformations and compounds, this vision of complex dynamics reflects Kampis's background as a chemist. However, Kampis does not give a formal mathematical definition of component-systems; he claims this is not possible using current mathematics.
The relation between these abstract, system-theoretic models and computation is somewhat problematic. Kampis has proclaimed component-systems to be fundamentally non-computational; Varela has made similar remarks about autopoietic systems. However, both of these scientists have studied their own system-theoretic models using computer simulations. Kampis presents a semi-rigorous "proof" that component-systems are uncomputable; however, the proof only applies to deterministic Turing machines, not stochastic computers or quantum computers (Deutsch, 1985). I have argued in CL that component-systems are stochastically computable; and I have also pointed out, in SI, that there is no way to tell stochastically computable systems from deterministically computable systems. In essence, "This system is stochastic" just means "This system has aspects which appear to me patternless." So, in my view, the computability issue turns out to be a non-issue.
In CL, I have given a mathematical formulation of component-systems in terms of hypersets, a concept from non-well-founded set theory; and I have argued that, for practical purposes, these hyperset component-systems can be effectively modeled in terms of computational systems. However, Kampis (personal communication) does not fully accept this reduction, and so in CL I introduce a new system-theoretic model called the Self-Generating System or, more compactly, magician system. The magician system captures largely the same insight as Kampis's component-systems, but in a more precisely specified way.
A magician system consists, quite simply, of a collection of entities called "magicians" which, by casting spells on each other, have the power to create and destroy magicians. Magician A can "cast a spell" transforming magician B into magician C; or magicians C and D can cooperatively "cast a spell" creating an anti-A magician, which annihilates magician A.
The existence of "antimagicians," magicians which have the power to annihilate other magicians, is necessary in order for magician systems to have the full computational power of Turing machines. These antimagicians are much like antiparticles in physics: when magician A and magician anti-A come into contact with each other, both are destroyed. In Chapter Seven magician systems will be formalized in terms of hypercomplex numbers and other abstract algebras, and the algebraic role of antimagicians will be clarified. For now, however, an informal discussion will suffice.
At each time step, the dynamics of a magician system consists of two stages. First the magicians in the current population act on one another, casting their spells on each other, producing a provisional new population. Then the magician/antimagician pairs in the provisional new population annihilate one another. The survivors are the new population at the next time step.
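This two-stage iteration can be sketched in code. Everything below is a toy: the magician labels and the action table are hypothetical, chosen only to illustrate spell-casting followed by pairwise annihilation, not taken from any formal specification in the text.

```python
# A toy magician system: magicians are labeled by strings, and an
# "anti-X" magician annihilates an "X" magician on contact.
# The action table is a hypothetical rule set, purely illustrative.
ACTIONS = {
    ("A", "B"): ["C"],        # A casts a spell turning B into C
    ("C", "D"): ["anti-A"],   # C and D cooperate to create anti-A
    ("B", "C"): ["B", "D"],   # a spell may yield several magicians
}

def step(population):
    """One time step: (1) the current magicians cast their spells,
    producing a provisional population; (2) magician/antimagician
    pairs annihilate; the survivors form the next population."""
    provisional = list(population)
    for caster in population:
        for target in population:
            provisional.extend(ACTIONS.get((caster, target), []))
    survivors = []
    for m in provisional:
        anti = m[5:] if m.startswith("anti-") else "anti-" + m
        if anti in survivors:
            survivors.remove(anti)   # mutual destruction
        else:
            survivors.append(m)
    return survivors

pop = ["A", "B", "C", "D"]
for t in range(3):
    pop = step(pop)
    print(t, sorted(pop))
```

Note that only the magicians present at the start of the step get to cast spells; the newly created magicians must wait for the next iteration, matching the synchronous two-stage dynamic described above.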
What I call an autopoietic subsystem of a magician system, then, is simply a magician system which is an attractor for this dynamic of inter-creation. This usage of the word "autopoiesis" may not accord precisely with the intentions of Maturana and Varela, but the meaning is very close: both of us are concerned with systems that produce themselves. It seems preferable to use the term "autopoiesis" with a slightly different shade of meaning, rather than to coin yet another new word.
For psychological modeling, the case where the magicians cast spells involving pattern-recognition processes is particularly interesting. In CL, special terminology is introduced to deal with this case: the magician system iteration is called the cognitive equation, and autopoietic subsystems are called structural conspiracies. A structural conspiracy is a set of patterns that produces itself by mutual pattern-recognition; an autopoietic subsystem is any set of magicians that produces itself by mutual transformation or "spell-casting."
If a system is a fixed-point attractor under the magician dynamic then it, very simply, produces itself. More generally, one may look at collections of magician systems which are limit cycles or strange attractors of the magician dynamic: these too, in a practical context, may be considered autopoietic systems.
There are many ways to vary the basic magician system dynamic; for instance, there is the question of which magicians get to act on one another at each time step. The pairs may be selected at random with certain probabilities (the "well-mixed" approximation), or else some spatial structure may be imposed, with each magician acting on its neighbors. Also, the restriction to pairs is somewhat arbitrary and unnecessary. One may have magicians which create other magicians all by themselves, or magicians which create by acting on groups of magicians. The product of a magician's action need not be a single magician; it may be a group of magicians. Finally, there may be a "filtering" operation which acts on the magician population as a whole, before it is turned into the next generation. However, these variations do not affect the basic ideas.
Needless to say, there are very many concrete instantiations of magician systems. For instance, one might point to enzymes and other substances in biochemistry: these substances act on each other to create each other. Or, more generally, one might speak of the whole array of cellular processes existing in the body. Here, however, we will focus on interpreting the dynamical system of mental processes making up an individual's mind as a magician system.
To apply the magician systems formalism to any particular situation, one must make specific commitments about the space of magicians. In the context of the psynet model, abstract discussions can be carried out without reference to the nature of this space. When one goes on to explore the model's instantiation in specific physical systems, however, one can no longer avoid the question of the nature of the individual magicians. If the reader desires a concrete model to keep in mind while reading the present chapter, it is convenient to define magician action by reference to a fixed universal Turing machine which takes two tapes instead of one: one tape for "program" and one tape for "data" (this construction was inspired by the work of Moshe Koppel; see (Koppel, 1987)). The action of A on B is then defined as the result of running the Turing machine with program A and data B.
For example, in the case of a pattern recognition system, the only programs A to be permitted are those which recognize patterns in their data B. Psychologically, we will think of our mental magicians as abstract pattern/processes, the "programs" of which serve to recognize patterns in their "data," and/or to create patterns in their environments (the "data" of other magicians).
The dynamics of magician systems are even less accessible to mathematical analysis than those of the genetic algorithm. For indeed, as will be shown later on, the genetic algorithm may be viewed as a special kind of magician system. The magician elements are genotypes (bit strings, or whatever), and the magician action is crossover plus mutation. Magician systems may thus be viewed as a kind of "generalized genetic algorithm," in which the standard crossover operator is replaced by a flexible, individualized operator. This kind of dynamical system is difficult to analyze, difficult to simulate, and difficult to understand. However, I will argue in the following chapter that this is also precisely the type of dynamical system we need in order to model the brain/mind.
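The correspondence can be made concrete in a sketch, with bit-string genotypes as magicians and crossover-plus-mutation as the magician action. The fitness function, string length, and population parameters below are illustrative assumptions, not anything fixed by the model.

```python
import random

random.seed(0)

def fitness(g):
    """Illustrative fitness: count of 1-bits in the genotype."""
    return g.count("1")

def act(a, b):
    """Magician action: one-point crossover of a on b, plus a single
    bit mutation. This is the sense in which the GA's crossover
    operator plays the role of a 'spell'."""
    cut = random.randrange(1, len(a))
    child = list(a[:cut] + b[cut:])
    i = random.randrange(len(child))
    child[i] = "1" if child[i] == "0" else "0"
    return "".join(child)

def generation(pop, size=20):
    """Fitness-proportional selection plays the role of the
    'filtering' operation on the magician population."""
    weights = [fitness(g) + 1 for g in pop]
    new = []
    for _ in range(size):
        a, b = random.choices(pop, weights=weights, k=2)
        new.append(act(a, b))
    return new

pop = ["".join(random.choice("01") for _ in range(16)) for _ in range(20)]
for _ in range(30):
    pop = generation(pop)
print(max(fitness(g) for g in pop))
```

The "individualized crossover" of a general magician system would replace the single fixed `act` with an action that depends on the casting magician itself.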
The relation between structure and dynamics is a question that lies at the very heart of science. Typically, physics has focused on dynamics, and has understood structures in a dynamical light; while chemistry and biology have understood structures as basic entities in their own right. Psychology, more than any other science, has exalted structure over dynamics, and has, except in the context of developmental psychology, almost entirely ignored processes of change.
On a more fundamental level, structure and dynamics are two different ways of looking at the world, two philosophical perspectives. From the one perspective, the world is composed of fixed entities, with aspects that change. From the other perspective, everything is always changing, and the world's many processes of change give rise to semi-permanent, stable "attractors" that we perceive as definite structures.
These general considerations suggest that it is very important for complexity science to come to grips with the relation between structure and dynamics -- and, most essentially, with the emergence of structure out of dynamics. Indeed, this is just a rephrasing of the title of this book, "From Complexity to Creativity." How, out of the intricate, chaotic complexity of micro-level dynamics, do we get the creation of beautiful, complicated macro-level structures?
In order to address this kind of question, I suggest, one has to shift the focus of attention from dynamics itself to emergent patterns -- emergent patterns in structures, and emergent patterns in dynamics. From the pattern-theoretic point of view, structure and dynamics are just different ways of getting to patterns.
This perspective has particular psychological importance, given that, in SI, I have argued that the mind is the collection of patterns in the brain. The relation between neurodynamics and mind is the relation between a substrate and the patterns emergent from it. This is a particularly natural philosophy of mind, which has roots in the pragmatist philosophy of Charles S. Peirce (1935).
A Formal Theory of Pattern
There are many ways to formalize the notion of "pattern." For example, algorithmic information theory (Chaitin, 1987) gives us a convenient way of studying pattern using the theory of universal Turing machines. However, the concept of pattern is arguably more basic than the theory of universal computation. In previous works I have given a very simple mathematical definition of "pattern," and used it to model numerous psychological and biological processes. Namely, one may define a pattern as "a representation as something simpler." In symbols, this means, roughly speaking, that a process p is a pattern in an entity e if: 1) the result of p is a good approximation of e, and 2) p is simpler than e.
More rigorously, let s be a "simplicity function" mapping the union of the space of entities and the space of processes into the nonnegative real numbers; and let d be a metric on the space of entities, scaled so that d(r_{p},e)/s(e) = 1/c represents an unacceptably large degree of dissimilarity. Then one reasonable definition of the intensity with which p is a pattern in e is given by the formula
[1 - c d(r_{p},e)/s(e)] [s(e) - s(p)] / s(e) (4)
The term [s(e)-s(p)]/s(e) = 1 - s(p)/s(e) gauges the amount of simplification or "compression" provided by using p to represent e. If p provides no compression, this term yields 0; in the limit where p is entirely simple (s(p) = 0), it yields its maximum value of 1 (100% compression). If, say, s(p) = .5 s(e), then one has 50% compression. Next, the term 1 - c d(r_{p},e)/s(e) allows for approximate matching or "lossy compression": it has a maximum of 1 when r_{p}, the result of carrying out process p, is exactly the same as e. The maximum intensity of a pattern, according to this formula, is 1; and anything with an intensity greater than zero may be considered a pattern. The set of all processes p that are patterns in e is called the structure of e; it is denoted St(e), and is a fuzzy set with degrees of membership determined by Equation (4).
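As a minimal sketch, the formula can be computed directly; the numbers below are arbitrary illustrations, not taken from any example in the text.

```python
def intensity(s_p, s_e, d, c=1.0):
    """Intensity with which process p is a pattern in entity e,
    following [1 - c*d(r_p,e)/s(e)] * [s(e) - s(p)] / s(e).
    s_p, s_e: simplicities of p and e; d: distance d(r_p, e).
    The max() clamp keeps the distance term nonnegative when d
    exceeds the unacceptability threshold s(e)/c."""
    return max(0.0, 1.0 - c * d / s_e) * (s_e - s_p) / s_e

# Exact reproduction (d = 0) with 50% compression:
print(intensity(s_p=50, s_e=100, d=0))      # 0.5
# The same compression, but lossy, at distance d = 20 with c = 1:
print(intensity(s_p=50, s_e=100, d=20))     # 0.4
```

The two factors behave as described: compression alone sets the ceiling, and the distance term discounts it in proportion to how badly r_{p} approximates e.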
The simplest way to express pattern computationally is to introduce a fixed universal Turing machine which takes two tapes instead of the usual one: one tape for "program" and one tape for "data." In this Turing machine model, the "entities" involved in the definition of pattern are binary sequences representing data, and the "processes" are binary sequences representing programs. The simplest course in this case is to define the simplicity of a binary sequence as its length. However, this is not the only possibility. There are other factors such as the program's running time, and the "crypticity" of the program (the difficulty of discovering the sequence in the first place). Many though not all of these possibilities are discussed in The Structure of Intelligence.
The reader schooled in modern theoretical computer science may be curious as to the relation between this general theory of pattern and algorithmic information theory (Chaitin, 1987). In fact, this relation is not at all difficult to determine. The key is to restrict attention to the special case where the metric d is defined so that d(x,y) is infinite unless x = y. Also, in algorithmic information theory, it is conventional to assume that all computer programs are "self-delimiting," i.e. contain a segment specifying their own length. Given these assumptions, if one defines the simplicity of a binary sequence as its length, one concludes that the algorithmic information I(x) is the simplicity of the simplest pattern in x. A straightforward example is the binary sequence
x = 100100100100100100100100100100100 ... 100100100100100100100100100100
obtained by repeating the string "100" over and over again. Suppose the sequence is 1000 bits long, and that the shortest program p computing it has length 200. Then s(p) = 200 and s(e) = 1000, so the intensity of p as a pattern in e comes out to [1000 - 200]/1000 = .8.
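In this exact-match case the distance term vanishes, and the intensity reduces to 1 - s(p)/s(e); a one-line sketch, with the simplicities above taken as given, confirms the arithmetic.

```python
def exact_match_intensity(s_p, s_e):
    """Pattern intensity when r_p = e exactly (d = 0): 1 - s(p)/s(e)."""
    return (s_e - s_p) / s_e

# A 1000-bit repetitive sequence, generated by a 200-bit program
# (both lengths assumed for illustration):
print(exact_match_intensity(200, 1000))   # 0.8
```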
It is crucial not to overlook the presence of the metric d in this equation. Algorithmic information theory assumes d away but, while this simplifies the mathematics, it is not adequate for real-world pattern recognition. Consider, for instance, the example of grammatical categories, which is a sort of paradigm case for categorization in general. Imagine a process that assigns words to categories, thus transforming an input stream of words into a stream of grammatical tags (e.g. "I snipped the fly into two pieces" into "PRO V DET N PREP ADJ N"). This process definitely loses some of the structure of the input stream: namely, it loses semantics, phonology and pragmatics, retaining only grammar. But it allows one to make predictions regarding the sequence. For instance, having seen "PRO V DET N PREP" one can predict that the next word in the input stream is very unlikely to be a V, a verb. This predictive power is a consequence of the fact that the tag sequence approximates the input sequence, to within a certain degree of precision. The tag sequence is a potential pattern in the input sequence, in the sense that it produces something simpler than the input sequence, which approximates the input sequence. Whether the degree of approximation is adequate to render the tag sequence a pattern depends on the definition of the metric d. In the human mind the metric d gauges semantic and pragmatic similarity; it is defined implicitly by a constellation of other linguistic and nonlinguistic processes.
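The tagging example can be sketched concretely. The lexicon and tag set below are a hypothetical toy, but they show how the tag stream, drawn from a far smaller alphabet than the word stream, is simpler yet still supports predictions such as "a verb is unlikely after a preposition."

```python
from collections import Counter

# A toy lexicon (hypothetical tags, for illustration only).
lexicon = {"i": "PRO", "snipped": "V", "the": "DET", "fly": "N",
           "into": "PREP", "two": "ADJ", "pieces": "N"}

words = "i snipped the fly into two pieces".split()
tags = [lexicon[w] for w in words]
print(tags)   # ['PRO', 'V', 'DET', 'N', 'PREP', 'ADJ', 'N']

# Bigram counts over the tag stream give crude next-tag predictions;
# in this (tiny) sample, V never follows PREP:
bigrams = Counter(zip(tags, tags[1:]))
print(bigrams[("PREP", "V")])   # 0
```

Whether this lossy representation counts as a pattern in the word stream depends, as the text says, on the metric d -- on how much the discarded semantics and phonology matter for the purpose at hand.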
This kind of approximate pattern has wide relevance beyond linguistics. For example, the Chaos Language Algorithm, to be described below, recognizes grammatical patterns underlying the trajectories of dynamical systems. The patterns which it recognizes are not patterns in the strict sense of algorithmic information, but they are valid and useful patterns nonetheless.
Static, Dynamic and Static/Dynamic Patterns
In general, when studying any kind of system, one is interested in patterns which are recognizable in the tuple
Hist(S) = (S(t),S(t+1),...,S(r)) (5)
These might be called "static/dynamic patterns"; they are patterns which incorporate information about both the "static" structure of a system at a given time, and the "dynamic" structure of a system's temporal trajectory.
Next, in order to understand the complexity of systems, it pays to introduce the further concept of emergence. Let e#f denote some kind of "join" of the two entities e and f (for instance, if e and f are two physical entities, the composite entity obtained by placing e and f next to each other; or, if e and f are two bit strings, the result of placing e and f end to end). Then, writing IN(p|e) for the intensity with which p is a pattern in e, a process p is an emergent pattern between e and f to the extent
IN(p|e#f) - [ IN(p|e) + IN(p|f) ] (6)
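One crude way to estimate this quantity is to let compression gain stand in for total pattern intensity, with concatenation as the join operation. This is only a stand-in sketch: zlib's deflate algorithm approximates one narrow class of patterns, not the full fuzzy structure St(e).

```python
import zlib

def simplicity_deficit(x: bytes) -> int:
    """Crude proxy for pattern intensity in x: bytes saved by
    compressing x at zlib's highest level."""
    return max(0, len(x) - len(zlib.compress(x, 9)))

def emergence(e: bytes, f: bytes) -> int:
    """Approximate IN(p|e#f) - [IN(p|e) + IN(p|f)], taking '#' as
    concatenation. Positive values suggest patterns in the join that
    are not present in either part alone."""
    return simplicity_deficit(e + f) - (simplicity_deficit(e)
                                        + simplicity_deficit(f))

e = b"abcabcabc" * 30
f = b"abcabcabc" * 30      # f duplicates e's structure
g = bytes(range(256))      # unrelated, nearly incompressible data
print(emergence(e, f), emergence(e, g))
```

When f repeats e's structure, compressing the join is cheaper than compressing the parts separately, so the estimate comes out positive; joining e with unrelated data yields little or no such gain.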
Now, suppose the system S(t) is composed of a collection of "component parts," {S_{i}(t), i=1,...,N}. Each component part leads to its own tuple Hist(S_{i}), and hence to its own static/dynamic patterns. A complex system is one in which a great number of emergent patterns arise as a consequence of interactions between the parts. In other words, the complexity of a system should be measured in terms of the size of
St(Hist(S)) - [St(Hist(S_{1})) + ... + St(Hist(S_{N}))] (7)
The measurement of the size of a fuzzy collection of patterns is a matter of some difficulty; one must subtract off for overlap among different patterns, and there is no entirely "fair" way to do so. This issue is discussed in SI.
It will also be useful to define more specific types of patterns in systems. What I call a "purely static pattern" is, quite simply, a pattern in the state S(t) of a system S at some given time t. A "purely dynamic pattern" is, on the other hand, a pattern which is observable in the way a system changes. This pattern need have nothing to do with the actual structure of the system at any given time; it emerges solely from observations of the way in which the system changes from one time to the next. Symbolic dynamics, a mathematical tool to be introduced in Chapter Five, deals with purely dynamic pattern. Much of traditional psychology deals only with static pattern: changes are hardly considered at all. The most interesting patterns of all, however, are the static/dynamic patterns, the ones that combine change over space with change over time. And some of the most interesting static/dynamic patterns, as it turns out, are the ones that serve to relate purely static and purely dynamic patterns. It is patterns in this class which allow us to observe the state of a system at a given time, and thus predict how the system will change; or else to observe the way a system changes, and thus make reasonable estimates of the nature of the system's internal structure.
The Structure-Dynamics Principle which is proposed below is a pattern of this type: it states, quite simply, that in many very complex systems there is a large overlap between the set of purely static patterns and the set of purely dynamic patterns. In other words the Principle says that Being and Becoming overlap; the reality of pattern and structure is deeper than the reality of space and time. As we probe the nature of structure and dynamics using tools from mathematics and computing, we will find the essential philosophical nature of these concepts is never distant, and never irrelevant.