Chapter Two:
PATTERN AND PREDICTION

Language, thought and reality form an inseparable triad. Each one is defined by the others; you can't understand any one of them until you have understood the other two. But in order to speak about this triad, I must noneless begin somewhere. I will begin with thought, mainly because this is the topic with which most of my previous writings have been concerned. In this chapter I will review the model of mind presented in my earlier works, embellishing where necessary and placing an emphasis on those aspects that are most relevant to the task ahead.

The key phrases for understanding this model of mind are pattern, process, and global structure. The mind is analyzed as a network of regularities, habits, patterns. Each pattern takes the form of a process for acting on other mental processes. And the avenues of access joining these processes adhere roughly to a specific global structure called a dual network. This is an abstract, computational way of looking at the mind. But it fits in well with the qualitative nature of current neurological data. And, as we shall see, it gives a great deal of insight into many concrete issues regarding the human mind.

2.1. THE LOGIC OF PATTERN

Pattern-symbolic expressions are exact, as mathematics is, but are not quantitative. They do not refer ultimately to number and dimension, as mathematics does, but to pattern and structure. Nor are they to be confused with the theory of groups orsymbolic logic, though they may be in some ways akin.

-- Benjamin Lee Whorf

Before I can talk about the structure of the mind, I must first develop an appropriate vocabulary. In this section and the next, following The Structure of Intelligence, I will present a general mathematical vocabulary for discussing structure. These ideas, while abstract and perhaps rather unexciting in themselves, are essential to the psychological ideas of the following sections.

Before getting formal, let us first take a quick intuitive tour through the main concepts to be discussed. The natural place to begin is with the concept of pattern. I define a pattern, very simply, as a representation as something simpler.

A good example is computer image generation. Suppose one wants to tell one's PC to put a certain picture up on the screen. There are many ways to do this. One is to write a program telling the computer exactly what color to make each little pixel (each dot) on the screen. But this makes for a very long program -- there are around twenty thousand pixels on the average screen.

A better way to do it is to come up with some algorithm that exploits the internal structure of the picture. For instance, if one is dealing with a figure composed of four horizontal stripes, alternatingly black and white, it is easy to tell the computer "fill in the top quarter white, the next quarter down black, the next quarter down white, and the bottom quarter white." This program will be much much shorter than the program giving a pixel-by-pixel description of the picture. It is a pattern in the picture.

The same approach works with more complicated pictures, even photographs of human faces. Michael Barnsley (1989), using fractal image compression techniques, has given very a short program which generates realistic portraits and landscapes. In general, computer graphics experts know how to write short programs to generate very close approximations to all ordinary pictures -- houses, people, dogs, clouds, molecules,.... All of these things have a certain internal structure, which the clever and knowledgeable programmer can exploit.

A screen filled with static, on the other hand, has no internal structure, and there is no short-cut to generating it. One can rapidly generate "pseudo-random" static that will look to the human eye like random static, but one will not be getting a close approximation to the particular screen of static in question.

In general, a pattern is a short-cut -- a way of getting some entity that is in some sense simpler than the entity itself. A little more formally, suppose the process y leads to the entity x. Then y is a pattern in x if the complexity of x exceeds the complexity of y.

2.1.1. Structure

From pattern, it is but one small step to structure. The structure of an entity may be defined as the set of all patterns in that entity. For instance, in a figure consisting of a circle next to a square, there are at least two patterns -- the program for generating the circle and the program for generating the square.

Next, consider a picture of 100 concentric circles. Cut the picture in half to form two parts, A and B. Neither of the two parts A or B contains a pattern involving concentric circles. But the combination of the two does! A pattern has emerged from the combination of two entities. In general, the emergence between two entities A and B may be defined as the set of all processes that are patterns in the combination of A and B but not in either A or B individually.

In all this talk about pattern, one technical point repeatedly arises. Two processes, that are both patterns in the same entity, may provide different degrees of simplification. The intensity of a process relative to a given entity, defined formally in the following section, is a measure of how much the process simplifies the entity -- how strongly the process is a pattern in the entity. It has to do with the ratio of the complexity of the process to the complexity of the entity.

If one considers each pattern to have an intensity, then the structure of an entity becomes what is known as a "fuzzy set." It contains all the patterns in the entity, but it contains some more "intensely" than others. And, similarly, the emergence between two entities becomes a fuzzy set.

The structural distance between two entities A and B may then be defined quite naturally as the total intensity of all the patterns that are either in A or notB, but in B or not A. This measures how much structure differentiates A from B. Thus, for instance, the structural distance between two random entities would be zero, since there would be no structure in either entity -- the amount of structure differentiating two structureless entities is zero.

These concepts may be used to measure the total amount of structure in an entity -- a quantity which I call the structural complexity. The definition of this quantity is somewhat technical, but it is not hard to describe the basic idea. If all the patterns in an entity were totally unrelated to one another (as, perhaps, with this picture of the square next to the circle discussed above), then one could define the structural complexity of an entity as the sum of the complexities of all its patterns. But the problem is, often all the patterns will not be totally unrelated to each other -- there can be "overlap." Basically, in order to compute the structural complexity of an entity, one begins by lining up all the patterns in the entity: pattern one, pattern two, pattern three, and so on. Then one starts with the complexity of one of the patterns in the entity, adds on the complexity of whatever part of the second pattern was not already part of the first pattern, then adds on the complexity of whatever part of the third pattern was not already part of the first or second patterns, and so on.

These concepts, as described here, are extremely general. Very shortly I will outline a very specific way of developing these concepts in the context of binary sequences computing machines. A few chapters later, this analysis of the complexity of sequences and machines will be extended to deal with mathematical entities called hypersets. However, these technical specifications should not cause one to lose sight of the extreme generality of the concepts of "pattern," "structure" and "emergence." These concepts, in and of themselves, have nothing to do with sequences, machines, or hypersets -- they are completely general and philosophical in nature. It is essential to have concrete models to work with, but one must always keep in mind that these models are only secondary tools.

Finally, one comment regarding is in order regarding complexity. I have been speaking of the complexity of an entity as though it were an "objectively" defined quantity, which an entity possessed in itself independently of any observer. But the story does not end here. One may define a process to be a pattern in A relative to a given observer if the result of the processis A, and if the process appears simpler to A relative to that observer.

2.2. PATTERN AND INFORMATION (*)

Now it is time to make the concept of pattern more precise -- to give a specific, "objective" measure of complexity. The best way to do this is with the obscure but powerful branch of mathematics known as algorithmic information theory.

The concept of algorithmic information was conceived in the 1960's, by Kolmogorov (1965), Chaitin (1974) and Solomonoff (1964). Where U is a universal Turing machine understood to input and output potentially infinite binary sequences, and x is a finite binary sequence, it may be defined as follows:

Definition: The algorithmic information I(x) contained in x is the length of the shortest self-delimiting program for computing x on U given the (infinite) input string ...000...

A self-delimiting program is, roughly speaking, a program which explicitly specifies its own length; this restriction to self-delimiting programs is desirable for technical reasons which we need not go into here (Chaitin, 1974). It is not hard to show, using simulation arguments, that as the length of x approaches infinity, the quantity I(x) becomes machine independent.

Bennett (1982) has criticized this definition, on the grounds that what it really measures is "degree of randomness" and not "degree of structure." It assigns a random sequence maximum complexity, and a completely repetitive sequence like ...000... minimum complexity. He defines the logical depth of a binary sequence x, relative to a universal Turing machine U, to be the running time on U of the shortest self-delimiting program which computes x on U from the (infinite) input ...000... .

The sequence consisting of the first billion digits of pi has low algorithmic information, but, apparently, high logical depth. It can be proved that, as n goes to infinity, the vast majority of binary sequences of length n have near-maximal algorithmic information and logical depth.

Moshe Koppel (1987) has formulated a third measure of complexity, which he calls "sophistication" or "meaningful complexity." He has shown that for large n its behavior is similar to that of logical depth. Anapproximate opposite of the sophistication of a sequence is given by the crudity defined as follows. Instead of simply considering a program y for computing the sequence x, let us consider a program y that computes the sequence x from the input sequence z. Then the crudity of a pair (y,z) may be defined as |z|/|y|, where |z| denotes the length of the sequence z and |y| denotes the length of the sequence y.

SI discusses in detail the qualitative properties of sophistication, algorithmic information, logical depth, crudity and a number of hybrid complexity measures. It also introduces a completely new measure of complexity, called the structural complexity. Structural complexity differs significantly from all of the algorithmic information based complexity measures discussed above. It does not refer to one distinguished way of computing a sequence -- the shortest, the most "sophisticated," etc. Rather, it considers all possible economical strategies for computing a sequence, where an economical strategy for computing x -- or more succinctly a pattern in x -- may be defined as follows, given a fixed universal Turing machine U.

Definition: A pattern in x is a self-delimiting program y which computes x on U from the input ...000z000... (it is understood that z extends to the right of the tape head of U), so that the length of y plus the length of z is less than the length of x. Where | | denotes length, this may be written

|y| + |z| < |x|

The intensity of (y,z) in x is the quantity

1 - (|y| + |z|)/ |x|

(note that intensity is always positive if (y,z) is actually a pattern in x, and it never exceeds one).

Note that no generality would be lost if z were set equal to 0, or some other constant value. However, in many applications the (y,z) notation is useful.

We have introduced algorithmic information as an "objective" complexity measure, which makes the theory of pattern concrete. But this "objective" measure may be used to generate other, "subjective" complexity measures. To see how this can be done, assume some standard "programming language" L, which assigns to each program y a certain binary sequence L(y). The specifics of L are irrelevant, so long as it is computable on a Turing machine, and it does not assign the same sequence to anytwo different programs. Where U is a universal Turing machine and v and w are binary sequences, one may then propose:

Definition: The relative information I(v|w), relative to U and L, is the length of the shortest self-delimiting program which computes v, on U, from input L(y), where y is a minimal-length self-delimiting program for computing v on U.

Obviously, if v and w have nothing in common, I(v,w)=I(v). And, on the other hand, if v and w have a large common component, then both I(v,w) and I(w,v) are very small. If one sets |y| = I(y|x), one has a measure of complexity relative to x.

2.2.1. Fuzzy Sets and Infons

Intuitively, a "fuzzy set" is a set in which membership is not "either/or" but gradual. A good example is the set of tall people. Being nearly six foot, I belong to the set of tall people to a somewhat higher degree than my friend Mike who is five nine, and to a much higher degree than Danny DeVito, but to a much lower degree than Magic Johnson.

Formally, a fuzzy subset of a given set E is defined as a function dE from E into some interval [0,a]. Where x is in E, I will write dE(x) for the degree of membership of x in E. dE(x)=0 means that x is not an element of E. Unless it is specified otherwise, the reader should assume that a=1, in which case dE(x)=1 means that x is completely an element of E. The usual algebra placed on fuzzy sets is

dE(x union y) = max{ dE(x), dE(y) },

dE(x intersect y) = min{ dE(x), dE(y) }

but I shall not require these operations (Kandel, 1986). The only operation I will require is the fuzzy set distance |E - F|, defined for finite sets as the sum over all x of the difference |dE(x)-dF(x)|.

In Chapter Three I will introduce a few ideas from situation semantics (Barwise and Perry, 1981; Barwise, 1989), which speaks about infons and situations. I will define an infon as a fuzzy set of patterns, and will make sporadic use of the following quasi-situation-theoretic notations:

s |-- i means that i is a fuzzy set of patterns in s

s |-- i //x means that i is a fuzzy set of patterns in s, where

complexity is measured relative to x, i.e. by I(,x)

s |-- i //x to degree a,

(s,i,x,a), and

d(s,i,x) = a all mean that the intensity of i in s, according to

the complexity measure I(,x), is d. Here the intensity of i in s may be defined as the average over all w in i of the product [intensity of w in s] * [degree of membership of w in i].

In later chapters, I will call a quadruple such as (s,i,x,a) a belief. x is the belief-holder, i is the entity believed, s is what i is believed about, and a is the degree to which it is believed.

2.2.2. Structure and Related Ideas

As in Section 2.1, having formulated the concept of pattern, the next logical step is to define the structure of an entity to be the set of all patterns in that entity. This may be considered as a fuzzy set -- the degree of membership of w in the structure of x is simply the intensity of w as a pattern in x. But for now I shall ignore this fuzziness, and consider structure as a plain old set.

The structural complexity of an entity, then, measures the size of the structure of an entity. This is a very simple concept, but certain difficulties arise when one attempts to formulate it precisely. An entity may manifest a number of closely related patterns, and one does not wish to count them all separately. In words: when adding up the sizes of all the patterns in x, one must adhere the following process: 0) put all the patterns in a certain order, 1) compute the size of the first pattern, 2) compute the size of that part of the second pattern which is not also part of the first pattern, 3) compute the size of that part of the third pattern which is not also part of the first or second patterns, etc. One may then define the size |S| of a set S as the average over all orderings of the elements of S, of the number obtained by the procedure of the previous paragraph.

Where St(x) is the set of all patterns in x, one may now define the structural complexity of x to be the quantity |St(x)|. This is the size of the set of all patterns in x -- or, more intuitively, the total amount of regularity observable in x. It is minimal for arandom sequence, and very small for a repetitive sequence like 000...0. It deems 0101010...10 slightly more complex than 000...0, because there are more different economical ways of computing the former (for instance, one may repeat 10's, or one may repeat 01's and then append a 0 at the end). It considers all the different ways of "looking at" a sequence.

For future reference, let us define the structure St(D;r,s) of a discrete dynamical system D on the interval (r,s) as the set of all approximate patterns in the ordered tuple [D(r),...,D(s)], where D(t) denotes the state of S at time t.

And, finally, let us define the emergence Em(x,y) of two sequences x and y as the set St(xy) - St(x) - St(y), where xy refers to the sequence obtained by juxtaposing x and y. This measures what might be called the gestalt of x and y -- it consists of those patterns that appear when x and y are considered together, but not in either x or y separately. This is an old idea in psychology and it is now popping up in anthropology as well. For instance, Lakoff (1987,p.486-87) has found it useful to describe cultures in terms of "experiential gestalts" -- sets of experiences that occurs so regularly that the whole collection becomes somehow simpler than the sum of its parts.

2.3. STRUCTURE AND CHAOS

Before diving into computational psychology, let us briefly return to a topic raised in the Introduction: the meaning of "chaos." In Chapter Three it will be shown that the concept of chaos is related quite closely with certain psychological matters, such as the nature of intelligence and induction.

In mathematics, "chaos" is typically defined in terms of certain technical properties of dynamical systems. For instance, Devaney (1988) defines a time-discrete dynamical system to be chaotic if it possesses three properties: 1) sensitivity to initial conditions, 2) topological transitivity, and 3) density of periodic points. On the other hand, the intuitive concept of chaos -- apparent randomness emergent from underlying determinism -- seems to have a meaning that goes beyond formal conditions of this sort. The mathematical definitions approximate the idea of chaos, but do not capture it.

In physical and mathematical applications of chaos theory, this is only a minor problem. One identifies chaos intuitively, then uses the formal definitions for detailed analysis. But when one seeks to apply chaos theory to psychological or social systems, the situation becomes more acute. Chaos appears intuitively to be present, but it is difficult to see the relevance of conditions such as topological transitivity and density of periodic points. Perhaps these conditions are met by certain low-dimensional subsystems of the system in question, but if so, this fact would seem to have nothing to do with the method by which we make the educated guess that chaos is present. "Chaos" has a pragmatic meaning that has transcends the details of point-set topology.

2.3.1. Structural Predictability

In this section I will outline an alternative point of view. For starters, I define a temporal sequence to be structurally predictable if knowing patterns in the sequence's past allows one to roughly predict patterns in the sequence's future. And I define a static entity to be structurally predictable if knowing patterns in one part of the entity allows one to predict patterns in other parts of the entity. This allows us to, finally, define an environment to be structurally predictable if it is somewhat structurally predictable at each time as well as somewhat structurally predictable over time.

One may give this definition a mathematical form, modeled on the standard epsilon-delta definition of continuity, but I will omit that here. The only key point is that, if an environment is structurally predictable, then patterns of higher degree have in a certain sense a higher "chance" of being found repeatedly. This shows that the assumption of a structurally predictable environment implies Charles S. Peirce's declaration that the world possesses a "tendency to take habits." The more prominent and rigid habits are the more likely to be continued.

It is interesting to think about the relationship between structural predictability and chaos. For example, one key element of chaotic behavior is sensitive dependence on initial conditions (or, in physicists' language, positive Liapunov exponent). Sensitive dependence means, informally, that slightly vague knowledge of the past leads to extremely vague knowledge of the future. In practical terms, if a system displays sensitive dependence, this means that it is hopeless to try to predict the exact value of its future state. Structural predictability is compatible with sensitive dependence. It is quite possible for a system to possess sensitive dependence on initial conditions, so that one can never accurately predict its future state, but still display enough regularity of overall structure that one can roughly predict future patterns. Intuitively, this appears to be the case with complex systems in the real world: brains, ecosystems, atmospheres. Exact prediction of these systems' behavior is impossible, but rough prediction of the regularities in their behavior is what we do every day.

But sensitive dependence does not, in itself, make chaos -- it is only one element of chaotic behavior. There are many different definitions of chaos, but they all center around the idea that a chaotic dynamical system is one whose behavior is deterministic but appears random.

A pattern-theoretic definition of chaos is as follows: an entity x is structurally chaotic if there are patterns in x, but if the component parts of x have few patterns besides those which are also patterns in the whole. For instance, consider the numerical sequence consisting of the first million digits of the pi: 3.1415926535... There are patterns in this sequence -- every mathematical scheme for generating the expansion of pi is such a pattern. But if one takes a subsequence -- say digits 100000 through 110000 -- one is unlikely to find any additional patterns there. There may be some extra patterns here and there -- say, perhaps, some strings of repeated digits -- but these won't amount to much.

Structural chaos is a weak kind of chaos. All the commonly studied examples of chaotic dynamical systems have the property that, if one records their behavior over time, one obtains a structurally chaotic series (the easiest way to see this is to use symbolic dynamics). But on the other hand, the interesting structurally predictable series are not structurally chaotic.

2.3.2. Attractors, Strange and Otherwise

To probe more deeply into the relation between chaos and prediction, one must consider the notion of an "attractor." Let us begin with the landmark work of Walter Freeman (1991) on the sense of smell. Freeman has written down a differential equations model of the olfactory cortex of a reptile (very similar to that of a human), and studied these equations via computersimulations. The result is that the olfactory cortex is a dynamical system which has an "attractor with wings." Recall that an attractor for a dynamical system is a region of the space of possible system states with the property that:

1) states "sufficiently close" to those in the attractor lead eventually to states within the attractor

2) states within the attractor lead immediately to other states within the attractor.

An attractor which consists of only one state is called a "fixed point." It is a "steady state" for the system -- once the system is close to that state, it enters that state; and once the system is in that state, it doesn't leave. On the other hand, an attractor which is, say, a circle or an ellipse is called a "limit cycle." A limit cycle represents oscillatory behavior: the system leaves from one state, passes through a series of other states, then returns to the first state again, and so goes around the cycle again and again.

And a "strange attractor," finally, is a kind of attractor which is neither a fixed point nor a limit cycle but rather a more complex region. Behavior of the system within the set of states delineated by the "strange attractor" is neither steady nor oscillatory, but continually fluctuating in a chaotic manner. More specific definitions of "strange attractor" can be found in the technical literature -- for instance, "a topologically transitive attractor" or "a topologically transitive attractor with a transversal homoclinic orbit." But, like the formal definitions of "chaos," these characterizations seem to skirt around the essence of the matter.

Freeman found that the olfactory cortex has a strange attractor -- a fixed set of states, or region of state space, within which it varies. But this strange attractor is not a formless blob -- it has a large number of "wings," protuberances jutting out from it. Each "wing" corresponds to a certain recognized smell. When the system is presented with something new to smell, it wanders "randomly" around the strange attractor, until it settles down and restricts its fluctuations to one wing of the attractor, representing the smell which it has decided it is perceiving.

This is an excellent intuitive model for the behavior of complex self-organizing systems. Each wing of Freeman's attractor represents a certain pattern recognized -- smell is chemical, it is just a matter of recognizing certain molecular patterns. In general, the states of a complex self-organizing systems fluctuatewithin a strange attractor that has many wings, sub-wings, sub-sub-wings, and so on, each one corresponding to the presence of a certain pattern or collection of patterns within the system. There is chaotic, pseudo-random movement within the attractor, but the structure of the attractor itself imposes a rough global predictability. From each part of the attractor the system can only rapidly get to certain other parts of the attractor, thus imposing a complex structural predictability that precludes structural chaos.

In other words, the structure of the dynamics of a complex system consists of the patterns in its strange attractor. The strange attractors which one usually sees in chaos texts, such as the Lorentz attractor, have very little structure to them; they are not structurally complex. But that is because these systems are fundamentally quite simple despite their chaos. A truly complex system has a highly patterned strange attractor, reflecting the fact that, in many cases, states giving rise to pattern X are more likely to lead to states giving rise to pattern Y than they are to states giving rise to pattern Z. The states within the attractor represent patterned states; the patterns of the attractor represent patterns of transition. And these two sets of patterns are not unrelated.

Chapter Two: PATTERN AND PREDICTION