Miraculous Mind Attractor Contents

Copyright Ben Goertzel 1995

THE MIRACULOUS MIND ATTRACTOR

Chaos, Complexity,

Artificial Intelligence and Mind:

A Dynamical Dialogue


ACKNOWLEDGEMENTS

This book, like all my others, is the result of a personal quest, and owes relatively little to any of my colleagues on an individual level. Of course I have taken many ideas from the research literature, and I have benefitted immensely from the general gestalt of the complexity science community, but this is a different sort of debt.

However, I would like to convey thanks to those individuals who have, over the past few years, engaged me in inspiring discussions: Kent Palmer, Sally Goerner, Allan Combs, Mark Randell, John Pritchard, and my father, Ted Goertzel. Also, my mother deserves mention, not only for giving birth to me, but more specifically and recently for urging me to present my ideas in a nontechnical way.

Finally, I must thank my wife Gwen -- for many discussions on these issues and, more importantly, for providing a loving and interesting home. And my sons Zar and Zeb, for being wonderful kids, and for providing such intriguing case studies of the developing mind.


CAST

DR. Z -- Middle-aged male professor; an eccentric mathematician with some expertise in computer science, psychology and philosophy

NAT -- Computer programmer

MELISSA -- Poet and science-fiction novelist; Nat's partner

JIMI -- Artificial intelligence program written by Nat, based on Dr. Z's ideas

JANINE -- Nat's ex-wife, a neurosurgeon


PRELUDE

ARTIFICIAL INTELLIGENCE AND HUMAN STUPIDITY

In a lecture hall at State University, Dr. Z is giving a talk to the undergraduate Computer Science Club on "The History and the Future of Artificial Intelligence." Nat, one of Dr. Z's students from several years ago, is sitting in the front row.

DR. Z: In an hour's talk I can't possibly tell the complete story of the history of artificial intelligence. What I want to do is just talk about some of the high points, and low points, of this very ambitious quest.

The high points, so far, haven't been so high; and some of the low points have been very, very low.... But, anyhow, I think it's a story that has lessons for all of us who are interested in computers or the mind ... or in any branch of science for that matter.

First, though, I'd like to encourage you to interrupt me as much as you like. I'm prone to ramble on and don't at all mind being prodded back on the track now and then....

Nat, I'm surprised to see you here -- are you back in school?

NAT: I'm just sitting in on Dr. Slomienka's graphics class this semester. I never had graphics when I was a student and some of the math is kind of forbidding.... So when I saw you were giving a talk, I....

DR. Z: You ran the other way as fast as possible.... But you got lost and wound up here by mistake.

NAT: Basically....

DR. Z: All right.

Well, let's get started. The first question I want to put to you is, is it at all possible to build a thinking machine?

For, after all, if the answer is no, then the whole project of artificial intelligence is totally worthless, and the history of AI is the history of a big mistake. Something like the history of alchemy.

I think the best way to approach this question is to consider a related question with less philosophical baggage attached to it: Is it possible to build a skyscraper five miles high?

This isn't really an engineering question -- it has nothing to do with the limitations of currently popular construction techniques. It's a query of a purely theoretical nature. Think about it: If the human race were willing to forego consumer goods for a decade, and devote all the money saved to the construction of a single building, then the deed could probably be done. Some tricky engineering problems would have to be solved along the way ... but the human race has solved plenty of difficult engineering problems before.

So, on the one hand, we could build a five mile high skyscraper using current engineering principles -- given the appropriate materials. No revolutionary innovations would be required. But, on the other hand, in practice no one is going to be building a five mile high skyscraper any time soon. It's too much money, too much trouble, and there's not enough point. Now, let's return to our original question: Is it possible to build a thinking machine? What I suggest is that the two questions are in the same category. They propose things which are implausible to do given current technological and economic realities ... but still quite possible, in a more abstract sense.

The argument is as follows. First: given any set of dynamical equations, we know how to program a general-purpose computer to simulate these equations, to within any desired degree of accuracy. And second: even though our understanding of the brain is at present imperfect, there is no reason to assume that we won't someday achieve a detailed scientific theory of the dynamics of brain cells.
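
To make the first half of that argument concrete, here is a minimal sketch, in Python, of how one steps a set of dynamical equations forward in time on an ordinary computer. The equations here describe a damped pendulum, chosen purely for illustration -- any system of equations could be dropped into deriv() in its place.

import math

# Simulate a dynamical system (here, a damped pendulum) by taking
# many small Euler steps.  Any equations could replace deriv().

def deriv(state):
    theta, omega = state
    # d(theta)/dt = omega ;  d(omega)/dt = -sin(theta) - 0.1*omega
    return (omega, -math.sin(theta) - 0.1 * omega)

def simulate(state, dt=0.001, steps=10000):
    for _ in range(steps):
        d = deriv(state)
        state = tuple(s + dt * ds for s, ds in zip(state, d))
    return state

print(simulate((1.0, 0.0)))    # the pendulum's state after ten simulated seconds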

In theory, therefore, we could build a thinking machine by programming a computer with the equations for the brain. And in fact, if the human race were collectively willing to sacrifice a few decades' worth of clothes, junk food and gasoline, we could quite possibly build a thinking machine right now, using the sum total of current brain theory and a network of a million supercomputers. And even if this is not true today, it may well be true in ten years, or twenty, or fifty. It is said that, over the past decade, computer power has doubled every eighteen months. Or something like that....

In practice, on the other hand, the construction of a thinking machine is a long way off. The cost is simply too high to justify. Just as the five mile high skyscraper will have to wait for better materials and/or more sophisticated automated construction techniques, the thinking machine will have to wait for more powerful computer chips and fully automated computer assembly technology.

But still, given the theoretical possibility of an intelligent computer, it is interesting to ask: what would such a machine be like? What are the algorithms of mind? This is the domain of computational psychology.

And, just as we persist in building higher and higher skyscrapers, it is interesting to see just how much intelligence we can squeeze out of our limited computational resources. How much mind can you get on a PC, a VAX or a CRAY? This is the domain of artificial intelligence.

Artificial intelligence and computational psychology are closely related. This should almost go without saying. But the fact is that, historically, most artificial intelligence projects have proceeded from a misguided and very, very shallow understanding of human psychology. In particular, they have ignored the intricate interdependence of the different aspects of the human mind. And the result of this, as we shall see, has been a strange amalgam of artificial intelligence and artificial idiocy.

AI theorists have designed programs which are expert, or at least competent, in certain domains. And yet, these programs display next to no general intelligence. They do not extrapolate from one situation to the next, not even as well as a profoundly retarded human. They are digital relatives of Dustin Hoffman's character in the movie Rain Man, who could count and calculate with amazing speed and precision despite an inability to deal with everyday life. They are artificial idiot savants. Only in recent years, with the aid of ideas from complex systems science, has AI begun to muddle toward an understanding of what makes our intelligence so flexible.

STUDENT: But wait a minute, Dr. Z. How can you compare the mind with a skyscraper? It seems to me the two aren't at all analogous. One is a physical entity and the other is an abstract thing -- I guess we don't even know what it is.... In Dr. Bowen's class on the philosophy of mind we went over about two dozen theories of what the mind is. Don't you have to know what something is before you can even talk about building it?

DR. Z: Ah, poor girl, fallen under the influence of philosophers....

No, I don't think you have to know what something is before you can even talk about building it. If that were the case then we would never build anything, let alone talk about building anything, because I don't think we really know what anything is, when you come right down to it. I'm made of immense numbers of little particles, the dynamics of which are substantially a mystery to me and everyone else.... So I don't even know what I am. So what?

Comparing the mind with a skyscraper may seem somehow blasphemous -- but really, when seen from the proper point of view, it's no more noteworthy than comparing 1 with 2. You're familiar with the "Church-Turing Thesis"?

STUDENT: Yes.

DR. Z: Probably the most basic principle of computer science. It states that anything that can be described at all can be expressed in computational language. In terms of logical operations on bits -- IF's, AND's, OR's, NOT's, 0's and 1's. In this perspective, a skyscraper is just a specific type of computational entity, and so is a mind.

NAT: What he's saying, Helen, is that the whole world is just a computer. According to this interpretation of the Church-Turing Thesis, there's no difference between the real world and a virtual reality inside a machine.

That right, Dr. Z?

DR. Z: Well ... yes, that would be correct, Nat. But it's not the topic of this seminar! The key point for now is that, rather than making the mind physical, or making the physical world mental, the Church-Turing Thesis brings mind and reality together on an equal footing -- it makes them both computational. This is a totally new view of the world, and I find it a very exciting one....

STUDENT: That's the same Turing from "Turing machines"...?

DR. Z: Right. Alan Turing was a brilliant British mathematician: he cracked German codes for the Allies in World War II, theorized about artificial intelligence, and pioneered the mathematical study of biological pattern formation. But his greatest contribution was the very simple concept of the Turing machine, which introduced to the world the notion of universal computation....

A Turing machine is a rather unimpressive-looking computer. It has no keyboard, no monitor, no disk drive, no lights or buttons or knobs. If it were put on display at a computer store, it would attract very little attention. It consists of a single processor box, attached to a very long tape which is marked off into little squares.

The Turing machine works like this. Each square of the tape contains either a 0 or a 1. The box is able to move the tape, read from the tape, and write on the tape; its "program" consists of a finite list of rules involving IF, AND, OR and NOT, for instance

IF this square contains a 1, AND the last square I read contains a 0 OR the square I read before that contains a 1, THEN move 3 squares to the left and mark a 0 there.
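
In fact, the whole contraption is simple enough to sketch in a few lines of Python. The rule table below is invented purely for illustration -- it just flips bits and halts when it runs off the written part of the tape -- but notice that the control structure is nothing more than table lookups and IF's.

# A toy Turing machine simulator.  The rule table maps
# (state, symbol read) to (symbol to write, move, next state).

def run(tape, rules, state="start", pos=0, max_steps=1000):
    tape = dict(enumerate(tape))            # the squares of the tape
    for _ in range(max_steps):
        symbol = tape.get(pos)              # None means a blank square
        if (state, symbol) not in rules:
            break                           # no applicable rule: halt
        write, move, state = rules[(state, symbol)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    return [tape[i] for i in sorted(tape)]

rules = {("start", 0): (1, "R", "start"),   # flip a 0 to a 1, move right
         ("start", 1): (0, "R", "start")}   # flip a 1 to a 0, move right
print(run([1, 0, 1, 1], rules))             # prints [0, 1, 0, 0]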

Turing proposed that this simple type of machine is capable of doing, in a word, everything. Any task, if it can be described with sufficient precision, can be programmed into a Turing machine. And any object, if one can describe its properties with sufficient precision, can be simulated on a Turing machine.

At first sight, this is an outlandish proposition. Look at all the complexity in the world -- cars, trees, calculus, molecules, clouds.... How could it all come down to nothing more than IF, AND, OR, NOT, 0, 1, move right and move left?

When Turing described his little machine to his friend Kleene, and suggested that it possessed universal computing ability, Kleene was more than a little bit skeptical. But then Kleene went home and tried a few mathematical examples. Every mathematical problem he came up with could be formulated on the Turing machine. He was convinced!

Furthermore, Turing was able to show that some Turing machines are "universal," in the sense that they can simulate any other Turing machine. This gives rise to the wonderfully simple and powerful notion of universal computation: a computer is a universal computer if it can simulate every other computer.

Put this together with the Church-Turing thesis, which says that everything can be expressed computationally. One arrives at the following conclusion: a universal computer can simulate anything. And a universal computer is not a special kind of machine; every computer worthy of the name is a universal computer. The tape and box setup is just for simplicity. A PC fits the bill just as well, or a Mac, or a Commodore 64, or a Cray supercomputer ... you get the idea.

So, what does this abstract idea mean in practice? Let's consider an example. In theory, at least, a PC is capable of simulating the entire city of New York. Hook up wires to the sight, hearing, smell, taste, touch and kinesthetic centers of the brain, hook up the other ends of the wires to the PC, load the New York Simulator program, and let 'er rip! The computer will generate a precise simulacrum of the experience of being in New York. Right? Well, sort of. What's the problem with this picture?

JASON [student in the audience]: Processing speed.

HELEN: And memory space.

DR. Z: Righty-o! To generate one second's worth of simulated New York might take a PC thousands of years. And, even if this difficulty were resolved, there would still be the problem of storage space. The data required for the New York Simulator would take up billions of floppy disks. In theory, the process of loading the disks in and out could be automated, but in actual practice it would take a whole city the size of New York just to warehouse them!

So: a PC can do anything whatsoever ... solve any problem, simulate any system or situation. After all, it's a universal computer! But the catch is, you have to supply it with a suitable number of floppy disks -- and then you have to wait. Everything the CRAY Y-MP supercomputer can do, the PC-XT can do too -- but in very many cases, the CRAY does it faster. Much faster. That's why we pay so much for better and better computers: not to get new computational capabilities, but to get new real-time capabilities.

These same conclusions apply to artificial intelligence. In theory, a PC is capable of simulating the human brain. The problem is that the program would take billions of floppy disks to store, and probably just as many years to run. Before the PC-brain finished one thought, the sun would go supernova and destroy it!

Think about it -- the brain contains somewhere around 100,000,000,000 neurons, each of which is a complex computational system -- plus glia and other cells. Unlike a typical PC or mainframe computer, which executes its internal operations one at a time, the brain does 10^11 things at once. There are "parallel processing" computers which seek to mimic this property of neural computing. But even the fanciest parallel processing computers -- the Connection Machine, say -- contain only tens of thousands of processors. Mere chickenscratch, compared to these things we carry around in our skulls! If AI practice is to catch up with AI theory, we will need much more powerful machines.

But still, despite these limitations, the theory of universal computation is very profound and very interesting. And it's doubly interesting in light of the observation that the brain itself is a universal computer. Just as one can program a general-purpose computer to be a brain, one could, given appropriate technology, program a brain to be a general-purpose computer. This observation dates back to the 1940's, to the classic work of Warren McCulloch and Walter Pitts.

In their landmark paper, McCulloch and Pitts introduced a simplified mathematical model of brain function, and showed that their model was capable of serving as a universal Turing machine. Their simulated brain is a collection of "formal neurons," each one receiving charge from certain neurons, and sending charge to certain other neurons. The network of neurons evolves through a series of discrete time steps: time step 1, time step 2, time step 3, etc. The details of their network are of more than just historical interest -- because they are not so different from the details underlying modern neural network computations. Each neuron in the McCulloch-Pitts network works according to the following simple rule:

If, at the last time step, I received at least T units of charge, then fire 1 unit to every neuron I'm connected to.

This "threshold law," drawn directly from the threshold behavior of real neurons, is the essence of neural network modeling ... every neural network model ever proposed embodies some approximation to it.

Just as each connection between two real neurons has a certain conductance, each connection between two simulated neurons has a simulated "conductivity," usually called a weight. For instance, if the weight of a connection is 1, then when 1 unit of charge goes into the connection, 1 unit of charge comes out. If the weight of a connection is .8, then when one unit of charge goes into the connection, .8 units come out. Negative weights are also possible; they represent connections which are "inhibitory" rather than "excitatory." Suppose the connection from Neuron A to Neuron B has weight -2; then when Neuron A shoots out a single unit of charge, Neuron B's total accumulation of charge is reduced by two.

For instance, suppose the threshold T is 14. Then, if a certain neuron receives 13 units of charge at time step 293, it will not fire at time step 294. But if one more neuron had fed it charge through a connection with weight 1, thus upping the total charge received to 14 units, then it would have fired. This sensitivity to small perturbations is what makes neural dynamics interesting.

The overall behavior of the neural network is given by the pattern of interconnections between the various neurons, and the values of the weights on the various connections. But, given the fussy nature of the threshold dynamics, there is no easy analytical way to determine the behavior that will result from a certain interconnection pattern and weight distribution. One simply has to set up the network, and let it run.
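
Here is a toy version of such a network in Python, just to show how little machinery is involved. The connection pattern, the weights and the threshold are all invented for illustration; the point is that even here, the only way to find out what the network does is to set it running.

# A toy McCulloch-Pitts style network with three neurons.
# weights[i][j] is the weight on the connection from neuron i to
# neuron j; a neuron fires if the charge it received at the last
# time step reaches the threshold T.  All the numbers are invented.

def step(firing, weights, T):
    n = len(weights)
    charge = [sum(weights[i][j] for i in range(n) if firing[i])
              for j in range(n)]
    return [1 if charge[j] >= T else 0 for j in range(n)]

weights = [[0.0, 1.0, -2.0],
           [0.8, 0.0, 1.0],
           [1.0, 0.0, 0.0]]
state = [1, 0, 0]                  # neuron 0 fires at time step 1
for t in range(5):
    state = step(state, weights, T=1)
    print("time step", t + 2, state)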

This may seem to be a rudimentary model, and indeed it is. But it was also a tremendous theoretical step. The historical import of the model lies not so much in what it included but in what it ignored: namely, chemistry and global brain structure.

Consider: one of the biggest steps in celestial physics was the decision to treat planets and stars as point masses: to ignore the obvious fact that planets have spatial extent, and to analyze them as if they were just dots of concentrated mass, taking up no volume at all. Philosophically, this makes no sense; but in proper mathematical context, it works wonders. The errors introduced by this approximation are small, and the simplifications are tremendous.

Similarly, McCulloch and Pitts decided to ignore all the chemistry of the nervous system, and to simultaneously ignore the division of the brain into different regions (right/left, cerebrum/cerebellum, etc.). At first sight this would seem an absurd approach. But in fact it is a simplification which has proved incredibly productive. The whole research programme of "neural network computation," which has become so popular over the past decade, owes its existence to McCulloch and Pitts.

So, using their simplified neural network model, McCulloch and Pitts proved that the brain is a universal computer in Turing's sense. And for the purposes of this result, the simplifications introduced in their model seem quite forgivable. After all, suppose one added more bells and whistles to the model: divided the network into different regions, replaced the simple neuron dynamics with something more realistic, etc. It seems plain that the computational ability of the network would not be decreased by this process; it would simply become more difficult to prove!

So, McCulloch and Pitts showed that, given enough external memory (e.g. a huge pad of paper and a lot of pencils) and enough time (virtual immortality), a brain can do anything at all. But the catch, of course, is the same as with the PC simulation of the brain. A human brain has a life expectancy of 75 years, give or take a few. Just because the brain is theoretically capable of computing, say, the 2987654321'st prime number, this doesn't mean it is really capable of doing this before it dies. In computing as in music, timing is everything.

The crucial role of time limitations in human psychology is so obvious that it is hardly worth mentioning. We human beings need to deal with the physical environment as it changes -- to a large extent, we need to think on the world's schedule, not our own. For instance, we need to maintain a real-time visual model of the world surrounding us. To do this, we have to be able to recognize lines, curves, shadows and so forth -- not merely eventually, but before they shift too much. In general, the rates of our perceptual-motor and cognitive processes are determined by the rates of change of significant aspects of our environment.

In fact, William Calvin, in his book The Ascent of Mind, has pushed this point even further: he has argued that the need for accurate timing was actually crucial in the development of human intelligence. Here is his story. In order to be able to kill animals by throwing hand axes at them, early man needed the capacity to release the axe from his hand at the exact right moment. A little too soon, or a little too late, and the throw misses the target. It wasn't enough to have the theoretical ability to determine the optimal moment for release; the brain had to be able to do it on the fly, while the arm was moving, and the arm moved a little differently every time. Consciousness, being essentially iterative and serial in nature, provided precisely the kind of sequencing ability required for accurate throwing.

So, in the end, what's the take on universal computation? Though truly a crucial idea, it must be taken with a grain of salt. It is only the beginning of the story. To really understand what's going on in the mind, we have to pay attention to the efficiency of computation. Timing may not really be everything, but as the saying goes, it's way ahead of whatever's in second place.

DR. Z [after a pause for a drink of water]: Right. So, now let's think about brains versus computers. On the one hand we have maybe a hundred billion simultaneously pulsing neurons, each one a complex chemical system, arranged by function in clusters and columns. And on the other, we have registers based on binary arithmetic, and random access memory which supplies one datum at a time. Given these differences, and the tremendous psychological importance of timing, it may seem puzzling that the project of practical artificial intelligence programming ever got off the ground. Who would be crazy enough to spend his or her career trying to make today's computers act like brains, when the two types of system are totally dissimilar? From the point of view of real-time computational efficiency, doing AI research with today's computers is sort of like trying to teach dolphins to tap dance.

But on the other hand, we humans have made a habit of doing the apparently impossible. Machines that fly, test-tube babies, pictures that travel through the air -- why not computers that think? Rather than dwelling on the vast differences between brains and modern computers, early AI researchers adopted a "can-do" attitude. We know these computers are nothing like brains, they said -- but hey, you never can tell till ya try!

In fact, if you think about it the right way, the prospects of AI can come to seem downright positive. After all, it takes us 100,000,000,000 or more neurons to add up 1289792354 and 2349879234, and we still take a long time to do it, and we are prone to make mistakes. A computer or a pocket calculator can do it much faster and more reliably, with vastly fewer processors. Maybe the brain just isn't such an efficient machine. Computers can add and find logarithms more easily than us -- maybe, just maybe, they can do everything else we do more easily too!

This was the attitude of most of the early AI researchers. Some of their brags have achieved legendary status. Marvin Minsky, one of the founders of AI, wrote in 1967 that "in a generation ... few compartments of intellect will remain outside the machine's realm -- the problem of creating 'artificial intelligence' will be solved."

Yet more dramatically, an article in the 1963 Chicago Tribune described Frank Rosenblatt's Perceptron algorithm as follows: "The development of a machine that can listen to any conversation and type out the remarks just like an office secretary was announced yesterday by a Cornell University expert on learning machines. The device is expected to be in operation by fall."

Now, Minsky was a tremendous computer scientist; his ideas have sparked advances in software worth hundreds of millions of dollars. But today, in late 1993, the problem of artificial intelligence is about as well solved as the problem of world peace.

And the Perceptron was a true AI breakthrough: it was the first time a McCulloch-and-Pitts style neural network was used to solve practical problems. But we're still waiting on the error-free secretary machine (though we do have robots capable of making coffee....).

These are not isolated examples; they're absolutely typical of published reports on AI research throughout the 1960's. They exemplify the attitude of the rank-and-file AI researchers of that era. But unfortunately, this optimism proved to be misplaced. Time after time, preliminary results failed to generalize, and researchers were stuck with amateurish programs, or programs successful only in limited domains.

What happened? The AI community, it appears, vastly overestimated the ease of automating conceptual generalization. They produced programs which were good at solving problems in very limited contexts -- e.g. good at playing chess, or good at recognizing letters, or good at doing calculus problems. In this way, they thought, they were constructing intelligent algorithms, which would then be able to turn their intelligence to other problems. Teaching a person chess or calculus improves their general powers of thought; why shouldn't the same be true of a computer?

But in fact these classic AI programs were idiot savants of the most extreme possible form -- programs with more specialized savvy and more general idiocy than any human being. The programs worked because they embodied rules for dealing with specific situations ... but they never achieved the ability to come into a new situation cold and infer the appropriate rules. Rather than artificial intelligence, what occurred was a perverse form of artificial stupidity.

When enough of the grandiose prophecies proved wrong, the nature of AI research quickly changed for the worse. All but a few out-of-the-mainstream researchers abandoned the problem of designing thinking machines for related technical problems. What had once been a thrilling intellectual and engineering endeavor, was now just another branch of computer science.

Today, there's an excellent journal called Artificial Intelligence -- it's full of top quality research, but it contains precious few articles directly related to intelligence and its mechanization. Rather, it presents mostly technical results on decision theory, nonmonotonic logic, expert system design, and other formal spin-offs of the problem of mechanizing thought.

But why did it happen this way? Why were computers so much better at arithmetic than at reasoning? What makes conceptual generalization so hard? The AI community was at a loss for an explanation; and to a large degree, it still is. But at least one person had a reasonable hypothesis: a philosopher by the name of Hubert Dreyfus.

In his 1972 tract What Computers Can't Do, Dreyfus preached the importance of body-centered learning, and the close connection between logic, emotion and intuition. Without a body, he argued, without feelings, there can be no real generalization of special-case ideas. Based on these philosophical considerations, he predicted that AI would be a failure. AI researchers laughed in his face, or worse.

Twenty years later, in 1992, Dreyfus rereleased the book with the title What Computers Still Can't Do. The new Introduction brims over with insolence. Everything Dreyfus said about the AI programs of the 60's and 70's turned out to be correct.

Dreyfus's critique of AI, in the first edition, comes off a little too strong: he appears to believe that detailed simulation of the brain is the only possible path to AI. But his arguments pose a serious challenge to AI theorists: how to design a machine that incorporates body-based, emotion-based conceptual generalization? Dreyfus appears to be correct that if this is impossible, AI just ain't gonna work.

JASON: Can you give an example of this artificial idiot savant-hood?

DR. Z: Sure. A good example, I guess, is Rosenblatt's Perceptron algorithm; I think I mentioned it already. The Perceptron is particularly interesting because it was a precursor to the "neural network" designs that are so popular today.

Let's say you want a neural network that will recognize human faces. McCulloch and Pitts showed it can be done, if you choose the right pattern of connectivity. But what is the right pattern? Frank Rosenblatt's idea was to look at hierarchical connectivity patterns. This was not only a clever computational trick, but also a small move toward biological realism: much of the brain appears to be organized in a hierarchical way, particularly those portions of the brain dealing with vision processing.

Rosenblatt set up a multilayered hierarchy of formal, artificial neurons. Each neuron was randomly connected to some neurons in the level below it, and some neurons in the level above it. Charge traveled up the network, from the bottom to the top. Learning was a matter of adjusting the weights of the connections: Rosenblatt varied these weights until the network responded to each input with the "correct" output.

He managed to train the network to recognize letters with fair accuracy. The lowest layer of the network took in a "pixelated" version of a letter. The highest layer consisted of output codes: neuron 1 was supposed to fire for an "A", neuron 2 for a "B", and so forth. By fiddling with the weights, Rosenblatt was able to get the network to recognize all the letters.
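
To fix the idea of learning by weight adjustment, here is a toy sketch: a single output neuron learning to tell two invented four-pixel patterns apart. The update rule used here is the standard perceptron learning rule -- nudge the weights toward the right answer after every mistake -- which is a tidier procedure than the trial-and-error variation I just described, but the spirit is the same.

# One output neuron learning to separate two invented 4-pixel
# patterns, using the standard perceptron learning rule.

def output(weights, pixels, threshold=0.5):
    total = sum(w * p for w, p in zip(weights, pixels))
    return 1 if total >= threshold else 0

examples = [([1, 1, 0, 1], 1),     # crude "T" -- the neuron should fire
            ([1, 0, 1, 1], 0)]     # crude "L" -- the neuron should stay quiet

weights = [0.0, 0.0, 0.0, 0.0]
for _ in range(20):                # a few passes over the examples
    for pixels, target in examples:
        error = target - output(weights, pixels)
        weights = [w + 0.1 * error * p for w, p in zip(weights, pixels)]

for pixels, target in examples:
    print(pixels, "->", output(weights, pixels), "  target:", target)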

But this was only intended as a beginning. The idea was that, by making bigger networks with more connections, one could program a Perceptron to recognize not only typed letters but handwritten letters, or handwritten words, or maybe even to read whole sentences. Why not? Once the ability to learn was there, it just had to be scaled up to more and more complex tasks, right?

But alas, it didn't work out that way. The Perceptron was stuck at the level of fairly accurate recognition of typed characters. Marvin Minsky and many other AI researchers blamed this on its brain-simulating architecture. The way toward useful AI programs, they claimed, was not brain simulation but high-level reasoning, applied logic. Why not just skip all the murky chemistry and go straight to the phenomena of interest: language, mathematics, science, conceptual problem-solving?

Marvin Minsky was something of a prodigy -- as an undergraduate at Harvard, he was already conducting pathbreaking research in psychology, engineering and computer science. He experimented with neural network modeling as early as the 1950's -- using a physical simulated neural network made of vacuum tubes, and trying to coax it into some kind of interesting collective behavior. But the project had its difficulties, and Minsky had other fish to fry.... By the sixties, he was completely disenchanted with neural network algorithms. Together with his colleague Seymour Papert, he wrote a book entitled Perceptrons, intended to debunk Rosenblatt's approach once and for all.

The book was full of mathematical results regarding the limitations of two-layer Perceptrons -- Perceptrons with no "hidden layers" intervening between the inputs and the outputs. Such Perceptrons, they showed, could never learn basic operations of logic, and could never perform simple perceptual tasks such as telling a connected figure from a disconnected one....

Of course, limitations on two-layer Perceptrons need not apply to three-, four-, or ten-thousand-layer Perceptrons. But Minsky and Papert's book effectively killed the neural network research programme. Instead of emphasizing the immense untapped potential of multilayer Perceptrons and more complex neural networks, they focused the attention of the AI community on the shortcomings of the simplest possible neural networks.

A few mavericks could still be seen waving the banner of Rosenblatt, McCulloch and Pitts, and pushing on toward greater computational efficiency and biological realism. For instance, Leon Cooper, a Nobel Prize-winning physicist, constructed a neural network model of a kitten's visual cortex, seeking to mimic the process by which an infant cat's brain learns to see. And the Finnish engineer Teuvo Kohonen developed a powerful new associative memory architecture -- a novel brain-inspired memory design in which "similar" memories are stored near each other. But, in their idiosyncratic isolation, these researchers were the "exceptions that proved the rule." It would be twenty years before AI journals and funding sources once again came to embrace the Perceptron programme.

And while neural network research was pulling its worldwide vanishing act, Minsky and other mainstream AI types were pushing on with a different point of view: rule-based AI. In rule-based AI programs, reasoning was done not by self-organizing networks of autonomous elements, but rather by systems of simple logical rules. Intelligence was reduced to following orders.

Minsky developed programs for solving calculus problems and "understanding" the English language; and he developed a theory of "frames" for representing complex real-world knowledge. And even more ambitiously, Allen Newell and Herbert Simon developed a program called GPS, which was supposed to imitate the way humans solve logic puzzles. The title of their paper is one of the biggest and emptiest brags of all time: "General Problem Solver: A Program that Simulates Human Thought."

Needless to say, GPS was a bust. It could solve simple problems like the Tower of Hanoi puzzle, and "cryptarithmetic" puzzles like

DONALD + GERALD = ROBERT

But in the overall scheme of intelligence, solving problems such as these is not all that different from computing logarithms or solving differential equations. A simple mathematical strategy suffices; no real learning need occur.
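
In fact a blind mechanical search cracks the puzzle. Here is a sketch -- not the way GPS actually went about it, but enough to show how little intelligence the problem really demands: try digit assignments until the arithmetic works. It grinds for a while, but it gets there.

# Brute-force search for DONALD + GERALD = ROBERT: try assignments
# of digits to the ten letters until the addition comes out right.

from itertools import permutations

letters = "DONALGERBT"             # the ten distinct letters involved

def value(word, assign):
    return int("".join(str(assign[c]) for c in word))

for digits in permutations(range(10)):
    assign = dict(zip(letters, digits))
    if (2 * assign["D"]) % 10 != assign["T"]:
        continue                   # units column: D + D must end in T
    if assign["D"] == 0 or assign["G"] == 0 or assign["R"] == 0:
        continue                   # no leading zeros, by convention
    if (value("DONALD", assign) + value("GERALD", assign)
            == value("ROBERT", assign)):
        print(assign)              # one valid assignment (D=5, T=0, ...)
        break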

Similarly unimpressive was Simon's "computer-scientist" program, BACON. This program was inspired by Sir Francis Bacon, who viewed science as a matter of recognizing patterns in tables of numerical data. But Bacon never adequately realized the psychological ramifications of the task of pattern recognition; and BACON fell into the same trap, albeit more embarrassingly.

The "ideal gas law" from thermodynamics states that

pV/nT = 8.32

where p is the pressure of the gas, V is the volume of the gas, T is the temperature in degrees Kelvin, and n is the quantity of the gas in moles. In practice, this relation cannot be expected to hold exactly, but for most real gases it is a very good approximation.

Given an appropriate table of numbers, BACON was able to induce this law, using rules such as:

If two columns of data increase together, or decrease together, then consider their quotient.

If one column of data increases, while another decreases, then consider their product.

Given a column of data, check if it has a constant value.

As pressure goes up, volume goes down, so BACON forms the product pV. Next, as the combined quantity pV goes up, so does the temperature -- thus BACON constructs the quotient pV/T. And as pV/T goes up, so does the number of moles -- hence the quotient (pV/T)/n = pV/nT is constructed. This quotient has a constant value of 8.32 -- so the ideal gas law is "discovered."
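
Here is that walkthrough rendered as a few lines of Python. The data values are invented (chosen to obey the law exactly), and the order in which the heuristics fire is simply written out by hand rather than found by BACON's own search -- so if anything this is even more mechanical than the real program.

# BACON-style induction on invented ideal-gas data obeying pV = 8.32*nT.

p = [83.2, 41.6, 83.2, 166.4]      # pressure
V = [1.0, 2.0, 2.0, 1.0]           # volume
n = [1.0, 1.0, 2.0, 2.0]           # moles
T = [10.0, 10.0, 10.0, 10.0]       # temperature, degrees Kelvin

pV = [a * b for a, b in zip(p, V)]          # p up, V down: form the product
pV_T = [a / b for a, b in zip(pV, T)]       # pV up, T up: form the quotient
pV_nT = [a / b for a, b in zip(pV_T, n)]    # divide by n as well

constant = max(pV_nT) - min(pV_nT) < 1e-9   # is the derived column constant?
print(pV_nT, "constant:", constant)         # [8.32, 8.32, 8.32, 8.32] True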

Very interesting, indeed. But how terribly far this is from what real scientists do! Most of the work of science is in determining what kind of data to collect, and figuring out creative experiments to obtain the data. Once a reliable set of data is there, finding the patterns is usually the easiest part. Often the pattern is guessed on the basis of terribly incomplete data -- and this intuitive guess is then used to guide the search for more complete data. But BACON is absolutely incapable of making an intuitive guess.

Simon has claimed that a four-to-five hour run of BACON corresponds to "not more than one human scientific lifetime." Douglas Hofstadter, in his book Metamagical Themas, has sarcastically expressed his agreement with this: one run of BACON, he suggests, corresponds to about one second of a human scientist's life work. I suggest that Hofstadter's estimate, though perhaps a little skimpy, is much closer to the mark. Only a very small percentage of scientific work is composed of BACON-style data crunching.

One should not deny rule-based AI its successes. But the point is that, without exception, these successes have resulted from specialized tricks, rather than flexible intelligence. For instance, rule-based computer chess programs have achieved an incredibly high level of play -- they can beat all but the grandest grandmasters. However, the methodology of these computer chess whizzes really cannot be called intelligent. It is based on searching out millions and millions of possibilities, guided by certain general heuristics. Humans play chess by intuition, by recognizing general patterns in positions ... not by running through a list of every plausible future outcome.

Similarly, rule-based AI programs have excelled in medical diagnosis, and in financial prediction. But that is because each of these fields consists of explicit rules that can be found in books. A rule-based AI program will never recognize a new disease ... it only looks for diseases using "IF-THEN" rules fed into it by a programmer. A rule-based finance program will never discover a new trading strategy, it will only follow the rules programmed into it -- quickly, infallibly and precisely.

The main problem with GPS, BACON, and all the other rule-based AI programs can be summed up in one word: brittleness. Or, to put it another way, remarkable literal-mindedness. Just like WordPerfect, DOS 6.0, or my fifteen dollar K-Mart calculator, they do what they're told, and not one whit more. If they're programmed to deal with one context, then that's what they'll deal with; not in a million years will they generalize their knowledge.

For instance, there was a celebrated knowledge representation program that contained logical definitions of everyday words. An "arch" was defined as "Three blocks, A, B and C, so that C is supported by A and B, and A and B do not touch." This is all very well for playing with blocks -- but what will the program do when it gets to Arches National Park in Utah?

On the other hand, show a three-year old child an arch made of blocks, and she'll immediately recognize a rock arch as a member of the "arch" category. It won't occur to her that a rock arch can't be naturally decomposed into three blocks A, B and C. Children, unlike expensive research computers, are anything but brittle -- even their bones are flexible!

JASON: What about Doug Lenat and his CYC project? Wasn't that supposed to encode so much knowledge that the brittleness problem would be overcome?

DR. Z [laughing]: Yes, it was supposed to. What happened?

Lenat has a theory of general heuristics -- problem-solving rules that are abstract enough to apply to any context whatsoever. I haven't worked with CYC because it's new, but I know his programs AM and EURISKO, which applied these general heuristics to mathematics and science respectively. Both of these programs were moderately successful, it's true -- but far from intelligent!

For example, EURISKO won a naval fleet design contest two years in a row, until the rules were changed to prohibit computer programs from entering. And it also received a patent for designing a three-dimensional semiconductor junction.

But still, when looked at carefully, even EURISKO's triumphs appear simplistic and mechanical. Consider EURISKO's most impressive achievement, the 3-D semiconductor junction. The novelty here is that the two logic functions

"Not both A and B"

and

"A or B"

are both done by the same junction, the same device. One could build a 3-D computer by appropriately arranging a bunch of these junctions in a cube.

But how did EURISKO make this invention? The crucial step was to apply the following general-purpose heuristic: "When you have a structure which depends on two different things, X and Y, try making X and Y the same thing." The discovery, albeit an interesting one, came right out of the heuristic. This is a far cry from the systematic intuition of a talented human inventor, which synthesizes dozens of different heuristics in a complex, situation-appropriate way.

Think about the Serbian-American inventor Nikola Tesla, probably the greatest inventor in recent history, who developed a collection of highly idiosyncratic thought processes for analyzing electricity. These led him to a steady stream of brilliant inventions, from alternating current to radio to robotic control. But not one of his inventions can be traced to a single "rule" or "heuristic." Each stemmed from far more subtle intuitive processes, such as the visualization of magnetic field lines, and the physical metaphor of electricity as a fluid. And each involved the simultaneous conception of many interdependent components.

EURISKO may have good general-purpose heuristics, but what it lacks is the ability to create its own specific-context heuristics based on everyday life experience. And this is precisely because it has no everyday life experience: no experience of human life, and no autonomously-discovered, body-centered digital life either. It has no experience with fluids, so it will never decide that electricity is like a fluid. It has never played with Lincoln Logs or repaired a bicycle or prepared an elaborate meal, nor has it experienced anything analogous in its digital realm ... so it has no experience with building complex structures out of multiple interlocking parts, and it will never understand what is involved in this.

EURISKO pushes the envelope of rule-based AI; it is just about as flexible as a rule-based program can ever get. But it is not flexible enough. In order to get programs capable of context-dependent learning, it seems to be necessary to write programs which self-organize -- if not exactly as the brain does, then at least as drastically as the brain does.

JASON: Right. But CYC is a huge database of everyday information. It's supposed to fill in the information needed to make real creative analogies....

DR. Z: Sure it is. The problem is, it fails to do that. You can't encode our intuitions about ourselves and the world in a database of information small enough to fit on a CD-ROM. No way! Anyone who thinks they can do that really is crazy.

Anyway, let me finish the story.... So rule-based AI led to a number of exciting and useful programs. But the bottom line is, it was one of very few outright failures to mar the brief history of computer science. And in the wake of this failure, in the 1980's and early 90's, AI researchers have returned in droves to neural network research a la Rosenblatt.

The motivations for this return are not hard to see. First of all, rule-based AI didn't work, and plausible paradigms for AI research aren't exactly a dime a dozen. Secondly, recent advances in neuroscience make the view of the brain as a complex self-organizing system a lot more palatable. And thirdly, computer hardware has now developed to the stage where it is easy to simulate large neural networks.

If this neural network renaissance can be traced back to any one place, it is to the work of Caltech physicist John Hopfield. In 1982, Hopfield published a brief paper in the Proceedings of the National Academy of Sciences entitled "Neural Networks and Physical Systems with Emergent Collective Computational Abilities." This paper, and several related articles appearing between 1980 and 1985, awakened a new generation of scientists to the possibility of doing computation by simulating the brain.

Unlike Rosenblatt's Perceptron, Hopfield's neural networks were not hierarchical in structure. In a conscious and serious departure from biological realism, they were fully connected -- every neuron was connected to every other neuron. But the most crucial feature of the Perceptron was carried over: the information stored in the network resided in the pattern of connection strengths.

Hopfield applied his fully-connected networks to a variety of problems: from solving combinatorial puzzles like the "Traveling Salesman Problem," to designing VLSI circuits, to simulating the properties of human long-term memory....

Say you're giving your sister Sue directions to get to your house. You tell her to make a right just after the sign saying "Family Mortgage Company." A thunderstorm breaks out, and the "t" on the sign is damaged, so by the time she gets there the sign reads instead "Family Morigage Company." What will Sue do?

If sister Sue has any sense, she'll realize what the sign was supposed to say, and make a right. In fact, she may not even notice the sign has been damaged. We've all had the experience of reading a sign at a distance, then moving closer to find out the words are totally different than what we originally thought. "Sick mind Lover" becomes "Steak and Lobster" ... what at first looked like "Main Street" soon becomes "Moon Street." We fill in the gaps, associating the input provided to our senses with the closest reasonable match provided by our memory.

This is the human memory: very flexible, very resilient, able to deal perfectly well with imperfect cues. But typical computer memories couldn't be more different. Anyone who has used MS-DOS (the standard operating system on modern personal computers) is well aware of this. "DiskCopy" tells the computer to copy the contents of one floppy disk to another one. But tell the computer "DinkCopy" and it will respond with an uncomprehending remark like "Bad Command or File Name." More annoyingly, "C:\UTIL" refers to the "UTIL" directory on disk drive C. But if you type in "C:/UTIL" -- with the slash the wrong way -- guess what the computer says? That's right -- the dreaded "Bad Command or File Name". You'd think a machine capable of finding logarithms to nine thousand places, running complex AI programs and sorting hundreds of documents would be able to overlook a mis-angled slash -- but no.

It's true, one could program a PC to respond more intelligently to mis-angled slashes ... to respond to "C:/UTIL" with the question "Did you really mean C:\UTIL?" But the point is that human memory does this sort of thing automatically, without thinking twice or even thinking at all. Human memory is simply structured differently. The Random Access Memory model of a standard computer bears very little resemblance to the memory centers of the human brain. Hopfield's idea was to build a computer memory that mimicked the essential properties of human memory: one that stored memories holistically, one that could deal with errors in its input, one that could respond to an unfamiliar cue by remembering something related to the cue.
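
That sort of patch, by the way, is easy enough to write. Here is a sketch using Python's standard difflib module -- ordinary string matching, nothing neural about it, and the command list is invented -- just to illustrate the behavior we're after, not the mechanism.

# Find the stored command closest to a garbled cue, by string matching.

import difflib

commands = ["diskcopy", "format", "chkdsk", "mkdir"]
cue = "dinkcopy"
print(difflib.get_close_matches(cue, commands, n=1))   # ['diskcopy']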

Hopfield was not concerned with mimicking the details of the brain's neural wiring ... only with using neuron-like elements to simulate some of the properties of the brain. And, in line with this philosophy, the model which he came up with embodies serious departures from biological realism. First of all, each neuron is connected, at least potentially, to every other neuron. Next, the amount of "charge" in a neuron at any given time is restricted to be either plus one or minus one. The neurons have no memory; the iteration of the network takes place on a totally moment-by-moment basis. And, finally, the connections are symmetric: the weight of the connection from neuron 1 to neuron 2 is the same as the weight of the connection from neuron 2 to neuron 1. The brain just don't do that!

But despite its rudimentary nature, the Hopfield net has played -- and still does play -- an extremely important role in the history of neural network theory. Although some really creative researchers were studying neural networks throughout the 60's and 70's, a surprising number of scientists and engineers received their introduction to the field through Hopfield's articles.... And many important concepts in the theory of neural networks have received their earliest formulations in the context of Hopfield nets.

HERBERT [student in the audience]: But how does it store the memories? I don't understand....

DR. Z: Well, I can't explain that, and stay within the time limitations.... If you want to let me go overtime....

Okay. Let's suppose you wanted a Hopfield net to remember which type of beer was preferred by which one of your friends. Formally speaking, this means that what you want the net to remember is a collection of pairs:

Mike Bass Ale

Harold Budweiser

Gwen Milwaukee's Best

Richard Colt 45

Now, a Hopfield net knows nothing about words; it just consists of a network of interconnected neurons, each one with a charge of either +1 or -1. To get these pairs into a Hopfield net, you have to represent each one as a sequence of +1's and -1's.

To go from words to ones and minus ones, you need some sort of "translation dictionary" assigning each word a sequence of +1's and -1's. For simplicity, it is easiest to denote +1 simply by +, and -1 by -. Then, for example, one might have the following dictionary [writes on blackboard]:

Mike Bass Ale

+-+++-+++-+++-++ ++-+++-+++-+++-+

Harold Budweiser

---------------- -+---+---+---+--

Gwen Milwaukee's Best

++--++--++--++-- --++--++--++--++

Richard Colt 45

-+-+-+-+-+-+-+-+ +--++--++--++--+

In practice one generates dictionaries systematically, but for illustrative purposes anything will do....

The association of Mike with Bass Ale is then represented by the sequence formed by running the Mike sequence and the Bass Ale sequence together:

SEQUENCE 1: +-+++-+++-+++-++++-+++-+++-+++-+

The association of Harold with Budweiser is represented by running the Harold sequence and the Budweiser sequence together:

SEQUENCE 2: -----------------+---+---+---+--

The association of Gwen with Milwaukee's Best by

SEQUENCE 3: ++--++--++--++----++--++--++--++ ;

the association of Richard with Colt 45 by

SEQUENCE 4: -+-+-+-+-+-+-+-++--++--++--++--+

How are these sequences encoded in a neural network? One makes thirty-two neurons, one for each position in the sequences, and then one forms the weights of the connections between the neurons from the sequences according to a special rule. For instance, to find the connection between neuron 1 and neuron 7, one looks at the first and seventh positions of the sequences.

STUDENT: How do you get the weights?

DR. Z: It's sort of complicated. I'll explain it after the lecture if you want to know. Okay? It's just a mechanical procedure.

The most important thing to know about the weight-building process is that every sequence plays a role in the construction of every neural weight. This makes the Hopfield network incredibly holistic -- much more holistic than the brain. If you change just one sequence, then you change every single weight in the network ... and thus, potentially, the whole dynamics of the system.

So then, after you've built it, how does the memory work? This is the easy part! Just start the network by giving charges to some of the neurons -- and let 'er roll. Charge pulses from one neuron to another, to another, to another, through excitatory and inhibitory connections, until eventually, hopefully, the process comes to a stop. It comes to a halt and the result is a fixed point of the neural dynamics, an equilibrium, a steady state. In this fixed state, each neuron has charge +1 or -1, so one may read off the state of the whole network as a sequence such as ---++--+.

And what are these sequences corresponding to equilibria of the network dynamics? Why, lo and behold, if the network is working right, they're nothing but the sequences you encoded in the first place! In our example, the steady states of the dynamics are precisely sequences 1, 2, 3 and 4, corresponding to our four pairs of associated words. This is the central theorem of Hopfield net theory ... that if you set up the weights according to the proper formula, then under appropriate conditions the equilibria of the network are the encoded memories. The proof of this is a nice bit of calculus, not sophisticated but certainly clever.

What good is this arrangement? Well, suppose you start the network, not with a random assignment of charges to the neurons, but with charges based on part of a stored memory. Then, if things work out right, the network will converge to the whole stored memory. For instance, the sequence for Mike is

+-+++-+++-+++-++

... suppose one starts the network with the sequence

+-+++-+++-+++-++----------------

Then the idea is that the network should converge to the equilibrium

+-+++-+++-+++-++++-+++-+++-+++-+

corresponding to the pair

Mike Bass Ale

The initial sequence +-+++-+++-+++-++---------------- corresponds to the "word pair"

Mike ???

which is not an attractor. In other words, ---------------- has no interpretation and has not been programmed into neurons 17-32 of the network, which hold the second half of a memory sequence.

But suppose one started the network with an incorrect pair like

Mike Budweiser

corresponding to the sequence

+-+++-+++-+++-++-+---+---+---+--

Then it's a toss-up. The network could converge to

Mike Bass Ale

or it could converge to

Harold Budweiser

Only a careful analysis of the dynamics would allow you to predict which.

The beauty of this approach to memory is that error tolerance is seen to come from the same root as association learning. If one starts the network with

+-+++-+++-+++-++++-+++-+++-++---

which differs in only a few places near the end from the sequence for

Mike Bass Ale

then the network dynamics will immediately converge to

+-+++-+++-+++-++++-+++-+++-+++-+

which corresponds exactly to

Mike Bass Ale

This is precisely what MS-DOS doesn't do when it can't understand that the input "DinkCopy" is supposed to mean "DiskCopy." But this ability to deal with imperfect cues is just an outgrowth of the network's ability to learn pairs of associated stimuli. Learning and flexibility are seen to be part of the same package. This property reflects biological reality extremely well, even though the details of the Hopfield net dynamics do not.
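
Since I waved my hands at the weight formula, here is a toy sketch of the whole arrangement in Python. It uses the standard outer-product recipe for the weights -- the "mechanical procedure" I mentioned -- and patterns only eight positions long, rather than the thirty-two of our example, purely to keep it readable.

# A toy Hopfield network: build the weights from two short patterns
# by the standard outer-product rule, then recall a pattern from a
# damaged cue by running the threshold dynamics.

import random

patterns = [[ 1,  1,  1,  1, -1, -1, -1, -1],   # stands in for "Mike / Bass Ale"
            [ 1, -1,  1, -1,  1, -1,  1, -1]]   # stands in for "Harold / Budweiser"

n = len(patterns[0])
W = [[0.0] * n for _ in range(n)]
for p in patterns:
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] += p[i] * p[j] / n      # every memory touches every weight

def recall(state, steps=200):
    state = list(state)
    for _ in range(steps):
        i = random.randrange(n)                 # update one neuron at a time
        field = sum(W[i][j] * state[j] for j in range(n))
        state[i] = 1 if field >= 0 else -1
    return state

cue = [1, 1, 1, -1, -1, -1, -1, 1]    # the first pattern with two positions flipped
print(recall(cue))                    # should settle back to the first pattern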

One way to think about the behavior of Hopfield nets is to introduce the idea of energy. The network, if it were built rather than computer simulated, would have a physical energy ... but to compute this, one would have to do some real physics. So instead, a quantity called the computational energy is defined -- and it is proved that, as the neural network evolves, the computational energy decreases. Finally, when the system reaches an equilibrium point (corresponding, hopefully, to a memory), the energy reaches a "local minimum" -- a state with the property that any change in the charge of any one neuron would increase the energy.

Each memory, in this view, may be associated with a certain basin in energy space.... The process of remembering is a process of winding down from the initial state to the bottom of the basin that the initial state is located in. There may be a problem here, with situations where there are small, shallow basins all over the place. But this can be avoided by introducing an element of controlled randomness into the dynamics. Just give the charge coming into each neuron a "random nudge" every now and then, and gradually decrease the amount of randomness as time goes on. This strategy is called "simulated annealing," because of its similarity to the process by which a hot metal anneals toward thermal equilibrium (or, in other words, cools down).
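
To see the energy bookkeeping in miniature: here is a three-neuron example with an invented symmetric weight matrix, showing that a single threshold update leaves the computational energy no larger than before.

# Computational energy E = -1/2 * sum of W[i][j]*s[i]*s[j] for a
# small invented symmetric network; one threshold update lowers it.

W = [[0, 1, -2],
     [1, 0, 1],
     [-2, 1, 0]]                   # symmetric weights, zero diagonal

def energy(s):
    return -0.5 * sum(W[i][j] * s[i] * s[j]
                      for i in range(3) for j in range(3))

s = [1, -1, 1]
print("energy before:", energy(s))              # 4.0
field = sum(W[0][j] * s[j] for j in range(3))   # update neuron 0 by the threshold rule
s[0] = 1 if field >= 0 else -1
print("energy after: ", energy(s))              # -2.0 -- it can only go down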

In practice, the Hopfield associative memory seems to work pretty well -- so long as the number of memories is not too large. If the number of memories is less than fifteen percent of the number of neurons in the network, then the process works pretty much as planned. But if one tries to overload the memory, then things get ugly. The network starts to invent fake attractors like

Gwen Colt 45

and it generates an abundance of attractors which are just plain nonsense. Our example had 4 memories and 32 neurons, which is 12.5% ... safely below the fifteen percent limit! But one more memory pattern might just blow it....

Anyway, that's the take on Hopfield networks....

STUDENT: Thanks....

DR. Z: Now, I don't have much time left. Let me wind things up.

The Hopfield net was tremendously different from the Perceptron -- and indeed, this may have been part of the secret of its success. But once Hopfield set off the neural network renaissance, it didn't take long for Perceptrons to resume their place in the public eye.

Yes, Nolan Ryan pitched a no-hitter at age 44 ... and, after a twenty-year slump, Perceptron-type networks are back and looking better than ever! As it turns out, Rosenblatt's main problem (besides a penchant for making excessive claims) was an insufficiently sophisticated approach to neural network training. If one experiments with the neural connections in a more structured way, then one can tease out some fairly impressive learning behavior. In particular, two new strategies for Perceptron training have come to prominence over the past few years: backpropagation and genetic programming. Genetic programming is an innovation of truly historical proportions; it deserves a whole talk to itself.... Backpropagation, however, is not nearly so radical -- it is something that Rosenblatt himself could have conceived, had he been so inclined. It represents a joining of neural networks with the branch of mathematics called optimization theory.

Many branches of mathematics and applied science involve optimization problems ... problems of the form "Find the minimum or maximum value of this quantity." Intuitively, this sort of problem may be visualized in terms of a landscape, much like the "energy landscape" of the Hopfield network. If one is trying to minimize something, then one is looking for the lowest valley in the landscape. But the problem is that, when one is down at the bottom of a certain valley, one has no way of knowing what lies over the next hill. It's very easy to get stuck on "local minima" -- valleys which look lower than anything else around, but are actually much higher than the overall, "global" minimum.

The oldest and most straightforward technique for solving optimization problems is the method of steepest descent. To illustrate this algorithm, imagine you're stuck on the side of a mountain in a heavy fog. You can't see more than three feet in front of you, but you desperately need to get back to the lodge, which is at the bottom of the valley. You have absolutely no idea what direction the lodge is in, nor does the terrain give any clues.

What will you do? If you have any sense, you'll pick the direction in which the descent of the mountainside seems to be steepest, and walk that way a while. Then maybe you'll stop again to get your bearings, and do the same thing: pick the direction of steepest descent, and keep on going a while. Et cetera. This is exactly the method of steepest descent.
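
In code, steepest descent is only a few lines. Here is a sketch on a made-up one-dimensional landscape with two valleys; the function and the step size are invented purely for illustration.

    def f(x):     return x**4 - 3*x**2 + x      # a landscape with two valleys
    def slope(x): return 4*x**3 - 6*x + 1       # its derivative

    x, step = 2.0, 0.01
    for _ in range(1000):
        x -= step * slope(x)                    # always walk in the downhill direction
    print(round(x, 2))   # settles in the nearest valley (about 1.13), never finding
                         # the deeper valley near -1.3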

If there are no local minima -- no valleys other than the one with the lodge in it -- then the method of steepest descent works just fine. Otherwise it needs to be replaced with something trickier. In the case of being lost in the fog, the best long-term strategy is "If at first you don't succeed, try, try again." Keep going down, then if you get to a false valley, go up again till you reach the top of the hill, and start going down again.... This is a viable mathematical technique which was explored by the mathematician Gabor Szego in the 1970's ... it is reliable but has the disadvantage of being very slow. Another possible strategy is to randomize your search. At first, wander around largely at random, with a general tendency toward lower directions ... then, the longer you walk, decrease the amount of randomness and increase the tendency toward lower directions. This might seem a psychotic way to do things, but it makes sense if you have reason to believe you might be stuck in a local valley. If you know you're stuck in a local valley, then you know you don't want to go down ... but if you just suspect you're stuck in a local valley, then you have no way to tell whether going up or going down is a better strategy. Random motion is a sort of compromise behavior. At any rate, this is the algorithm called simulated annealing; it is a fairly good model of the cooling-down of metal from a very high temperature. A metal seeks the minimum-energy configuration, and temperature-induced randomness keeps it from settling into a false minimum of energy, keeps it moving along toward the real thing.
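
And here is the same two-valley landscape searched with a toy simulated-annealing loop. The cooling schedule, step sizes and acceptance rule are just one common way of doing it, chosen for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    x, best = 2.0, 2.0
    for t in range(20000):
        temp = max(2.0 * (1 - t / 20000), 0.01)      # the "temperature" slowly cools
        candidate = x + rng.normal(0, 0.2 + temp)    # a partly random step
        drop = f(x) - f(candidate)
        # Downhill moves are always taken; uphill moves are taken with a probability
        # that shrinks as the temperature falls, so early on the walker can climb
        # back out of a false valley.
        if drop > 0 or rng.random() < np.exp(drop / temp):
            x = candidate
        if f(x) < f(best):
            best = x
    print(round(best, 2))    # usually lands near the deeper valley, around -1.3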

The core idea of the backpropagation algorithm is to use the steepest descent method to train a neural network -- to determine the network's weights. This might seem to be a silly idea ... after all, we have just said that steepest descent is generally a lousy optimization strategy. But, as the saying goes, nothing succeeds like success. Determining the weights of a neural network based on steepest descent seems to work much better than Rosenblatt-style random adjustment, especially for multilayer networks.

Backpropagation is a completely unbiological method. It involves a very formal and rigid exploitation of the Perceptron's hierarchical structure: it adjusts the weights of level 1 based on the errors passed back from level 2, the weights of level 2 based on the errors passed back from level 3, and so forth.... This is the "backwards propagation" that gives the algorithm its name, and it is a dynamic that has nothing whatsoever to do with the modification of synaptic connections in the brain.
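
In a short program, that backward pass is just a couple of lines. Here is a minimal sketch of a little network learning exclusive-or by backpropagation; the layer sizes, learning rate and number of training passes are arbitrary illustrative choices, not the specific networks discussed in this talk.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)      # exclusive-or targets

    W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)      # input layer  -> hidden layer
    W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)      # hidden layer -> output layer
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    rate = 1.0                                           # the learning rate

    for _ in range(10000):
        # Forward pass: activity flows input -> hidden -> output.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass: the output error is propagated back to the hidden layer.
        d_out = (out - y) * out * (1 - out)
        d_hid = (d_out @ W2.T) * h * (1 - h)
        # Steepest descent on every weight and threshold.
        W2 -= rate * h.T @ d_out;  b2 -= rate * d_out.sum(axis=0)
        W1 -= rate * X.T @ d_hid;  b1 -= rate * d_hid.sum(axis=0)

    print(out.round(2).ravel())    # should end up close to [0, 1, 1, 0]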

But be this as it may, the algorithm is easy enough to program into a computer simulation. And, these days, that's exactly what it takes to make an idea catch on. While backpropagation is nowhere near as commonplace as fractal graphics, it did make it on the Tonight Show! Johnny Carson interviewed the neuroscientist Terry Sejnowski and his program NetTalk, a neural network which had learned to speak comprehensible English words using the backpropagation algorithm.

Marvin Minsky, arch-enemy of neural network research, is on record saying that he saw NetTalk on TV, and couldn't understand it very well. But this is quintessential sour grapes ... I could understand NetTalk better than some of the math teachers I had in graduate school! The program was a smashing success, exactly what Rosenblatt desperately needed in 1968. If NetTalk had existed back then, it might well have defused Minsky's mathematical and rhetorical attacks, and kept the Perceptron programme alive.

Let me give another example. OCR -- optical character recognition -- was a particular interest of Frank Rosenblatt's. And indeed, it seems to be tailor-made for backpropagation. The input is just pixelated letters, and the output is one of 26 responses -- a, b, c, d,.... The technical obstacles are far fewer than those involved in a project like NetTalk.

In the context of OCR, the limitations of single-layer perceptrons are very easy to see. OCR researchers conceptualize their algorithms in terms of letter space -- an abstract multidimensional space in which each letter is represented as a single point. A single layer perceptron divides letter space linearly, which is not sufficient ... the relations between different letters are too complex for that. A multilayer perceptron, on the other hand, divides letter space according to piecewise linear boundaries, boundaries made of a lot of little lines. This means that, to within any degree of approximation, multilayer perceptrons can recognize any possible boundary between letters in letter space. One only needs a sufficient number of neurons, arrayed in a sufficient number of levels. The problem is, how big this "sufficient number" really is.

Rule-based OCR systems tend to work a little differently: instead of dividing up letter space with line segments, they divide it up with circles. Typically, a rule-based AI system will begin with a collection of "prototype" letters -- a few prototype "a"'s, a few prototype "b"'s, and so forth. Then, given an unidentified input letter, it will identify which of the prototypes give the best match to the input. The collection of all inputs which match "a" to a given degree is represented as a circle in letter space, with its center at "a."

Of course, by intersecting circles in complex ways, one can approximate any shape one wishes. Really there is no reason to prefer line segments over circles, or vice versa. The trick is in getting the right collection of circles, or lines -- dividing up letter space in the appropriate way. In rule-based AI the programmer typically has to determine this division, and explicitly encode it in the OCR program. Neural network OCR is based on the premise that the program can discover the division for itself.
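
A toy version of the prototype idea can be sketched in a few lines of Python. The three-by-three "letters," the prototypes and the radius here are all invented, purely to show the matching step.

    import numpy as np

    prototypes = {
        "a": np.array([0, 1, 0,  1, 0, 1,  1, 1, 1], dtype=float),
        "b": np.array([1, 0, 0,  1, 1, 0,  1, 1, 1], dtype=float),
    }

    def classify(pixels, radius=2.0):
        # Find the nearest prototype; the input counts as that letter only if it
        # falls inside the prototype's "circle" in letter space.
        best = min(prototypes, key=lambda k: np.linalg.norm(pixels - prototypes[k]))
        return best if np.linalg.norm(pixels - prototypes[best]) <= radius else "?"

    smudged_a = np.array([0, 1, 0,  1, 0, 1,  1, 0, 1], dtype=float)   # one pixel off "a"
    print(classify(smudged_a))    # -> a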

As it turns out, in practice, a standard-issue hierarchical neural network just won't do a good job of OCR, not even with typewritten characters. One needs to modularize -- to construct a network divided into subnetworks, one for each type of letter. Then these subnetworks can be constructed by backpropagation, and their results can be combined by other subnetworks, also constructed by backpropagation. This is cheating, in a way -- because it is the programmer who determines how to break up the network into subnetworks. But the subnetworks themselves are still doing a lot of learning.

Using this idea, a few years ago, the engineer Sabourin and his colleagues trained a backpropagation network to recognize typewritten characters with around 97% accuracy -- just barely better than current programs written with rule-based techniques. Sabourin's network consists of 20 subnetworks, each one specialized to deal with some easily recognized feature. For instance, the subnetwork marked "Descender, Genus 1" contains 161 neurons, 6106 weights, and 97 different threshold values; it recognizes the letters and letter combinations g, p, q, ng, rg and yp, with about 99.6% accuracy. This is somewhat more accurate than the network as a whole, reflecting the fact that these letters and combinations are relatively easy to recognize.

One can train a single network to recognize a given feature -- say a descender, or an ascender coupled with a great deal of whitespace, or a collection of letters with little whitespace and no ascenders or descenders. But it is hard to train a single network to do several different things -- say, to recognize letters with ascenders only, letters with descenders only, letters with both ascenders and descenders, and letters with neither. Thus, instead of one large network, it pays to break things up into a collection of smaller networks in a hierarchical architecture.
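
The modular idea can be sketched very crudely: a handful of specialist scorers, plus a combining stage that picks the overall winner. The two "subnetworks" below are stand-in functions with hard-wired scores -- in a real system each would be a small backpropagation net trained on its own group of letters.

    # Stand-in specialists; their outputs are fixed here only to show the plumbing.
    def descender_net(pixels):            # would be trained on g, p, q, y, ...
        return {"g": 0.8, "q": 0.1}

    def plain_net(pixels):                # would be trained on a, c, e, o, ...
        return {"a": 0.6, "o": 0.3}

    def classify(pixels):
        # Each specialist scores the input; the combining stage picks the winner.
        scores = {}
        for subnet in (descender_net, plain_net):
            scores.update(subnet(pixels))
        return max(scores, key=scores.get)

    print(classify([0, 1, 0,  1, 0, 1,  1, 0, 1]))    # -> g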

And all this is just for typewritten characters. What about handwriting? No backpropagation network has yet been trained to read handwriting as well as a person. But some reasonable efforts have been made....

In truth, it seems clear that backpropagation networks are much cleverer than the brain -- on a per-neuron basis. The brain, as reviewed above, seems to use a very crude method for synaptic modification. Perhaps this is why it needs so many billions of neurons to do what backpropagation does with dozens or hundreds or thousands. The trick of human intelligence lies in the overall organization of neurons and neuronal clusters -- not in the method for updating individual neural connections.... But, clever as it is, there are ways of making backpropagation even cleverer. One of the most intriguing of these is to use chaos to make the learning go faster. Some of you will have learned about chaos in differential equations class, no?

Okay, three or four of you, anyway....

Chaos is apparently random behavior that comes out of deterministic systems. Formal neural network models display mathematical chaos, as do the most accurate mathematical models of brain function. Furthermore, EEG analysis reveals rampant chaotic dynamics in the human brain. But the question is, what purpose does this chaos have?

Backpropagation networks, despite their limited biological realism, provide an ideal context for exploring this sort of question. A computer scientist named Paul Verschure has shown that chaos can play a crucial role in guiding the learning process of a backpropagation network. And as it turns out, the general ideas of Verschure's demonstration are easily transferable to the context of real human learning. This is a perfect example of neural networks as an exploratory research tool.

The backpropagation algorithm contains a parameter called the learning rate, which controls the speed with which the network weights adjust themselves. If the learning rate is too low, then the system will take a very long time to arrive at a useful weight configuration ... and once it gets there, due to its "stupidity," it may just wander off again. But on the other hand, if the learning rate is too high, then the system is bound to be "jumpy"; it may well overshoot the correct weight configuration, in its eagerness to find something better.

In fact, Verschure has shown that when the learning rate is too high, a backpropagation network exhibits chaos. For instance, suppose one has a network with two input neurons, a layer of "hidden" neurons, and one output neuron. And suppose one wants to train the network to compute the "exclusive or" function, meaning that the output neuron should fire when and only when just one of the input neurons receives input.

When the learning rate is low, between 1 and 1.25, the network converges to a stable solution -- the learning process works. But once the learning rate surpasses 1.25, chaos sets in. Then, between 1.6 and 2.3, some orderly behavior is observable ... the network may vary between one possibility and another, but does not always jump around chaotically. Finally, after 2.3, chaos sets in again, and stays. The diagnosis of chaos here is not merely qualitative -- it can be verified by computation of the Liapunov exponent.

These specific numbers, obviously, are dependent on the architecture of the neural network in question. But the same general phenomenon is observable in all backpropagation networks. So what's the moral? Just avoid making the learning rate too high, right? Well, no ... this sort of conservatism just produces overly slow convergence. The trick, according to Verschure, is to gradually increase learning rate until one reaches the chaotic region. Then, once one has induced chaotic behavior, one lowers the learning rate again. And then one repeats the whole cycle: order, chaos, order, chaos, order, chaos,.... Empirically, in many situations (such as the exclusive-or function), this results in more effective learning than one would obtain from any fixed value of the learning rate.
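
The schedule itself is trivial to write down. Here is a sketch of the order-chaos-order cycling of the learning rate; the particular low and high values and the length of the ramp are assumptions for illustration, and train_step stands in for one backpropagation pass of whatever network is being trained.

    import itertools

    def order_chaos_cycle(low=1.0, high=2.5, ramp=200):
        # Creep upward from the orderly region toward the chaotic one, then snap
        # back down and start over.
        while True:
            for t in range(ramp):
                yield low + (high - low) * t / ramp

    # Hypothetical use, with a backpropagation step like the exclusive-or sketch above:
    #   for rate, batch in zip(order_chaos_cycle(), training_batches):
    #       train_step(network, batch, learning_rate=rate)

    print([round(r, 2) for r in itertools.islice(order_chaos_cycle(ramp=4), 9)])
    # nine rates: they ramp up toward the chaotic region, drop back, and ramp up again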

The vision underlying this "chaos-based learning" algorithm is a beautiful one: chaos as a tool by which complex systems explore their own state space. By inducing chaotic fluctuations in itself, a complex system can "jounce" itself out of an ineffective configuration, thus permitting itself to self-organize into a more effective state.

This is a variation on the old idea that creativity is "rules plus randomness" -- an idea first developed forty years ago in Ross Ashby's groundbreaking book Design for a Brain.... But the difference is that here the randomness, or rather pseudo-randomness, comes out of the rules. Therefore the randomness is in principle controllable: it comes into play when it is needed, when things aren't working well enough. And then, when things are running smoothly, it hides away.

The proof that chaos-based learning is effective lies in Verschure's computer simulations, which use the backpropagation rule. But the idea is obviously much more general: it applies to brains, formal neural networks of all stripes, and in general any complex problem-solving system which has the ability to adjust its own parameters. If chaos-based learning turns out to be a general phenomenon, this will be a very large feather in the cap of the neural network research programme.

NAT [waking up from a brief sleep]: Hey, Dr. Z. I think some of these students have another class in five minutes.

DR. Z: Okay. I'll sum things up.

So these algorithms -- Hopfield nets, backpropagation, chaos-based learning -- are scientifically revolutionary. But one thing is certain: they are not miniature brains. Neural networks display an impressive ability to induce, to pick up on the nuances of particular situations. But this ability is coupled with some very severe limitations.

Like rule-based AI programs, neural network programs are incapable of dealing with the real world, on a second-by-second basis. They have to be spoon-fed their input. Thus, in a very concrete sense, they are tested on toy problems. These problems may have more real-world applicability than the "microworlds" of rule-based AI, but they are contrived nonetheless.

It is assumed that whatever works on these contrived problems will somehow "scale up" to deal with the real problems confronted by actual intelligent systems. But this assumption is problematic: it hides a very sharp distinction. On the one hand, when programming a connectionist application, one must figure out how to best represent the data in order to present it to the program. For instance, in the "beer" example of the Hopfield network given above, we decided how to encode words into plusses and minuses. But on the other hand, truly intelligent systems possess the ability to choose appropriate representations for themselves.

Obviously, one could automate the process of assigning plusses and minuses to words. But then what would the network do when you asked it to remember something other than words ... say, faces? It would have to come up with some other coding scheme. This is exactly what our brains do automatically! It is, one suspects, a long way from generalization ability in simple, appropriately represented domains to generalization ability across a variety of complex, un-preprocessed domains. Until we have an algorithm that figures out on its own how it should represent the data presented by a novel situation, we will be nowhere near capturing the fluidity of human intelligence. It may well be that a sufficiently large neural network would be able to determine its own representations in a flexible way ... to perform induction on representation space. But the backpropagation experiments reviewed above suggest that big is not enough -- if one has a difficult induction problem, one needs neural networks with an appropriate global structure.

So, what kind of global structure is appropriate for neural network induction of representations? The very language of neural network theory makes this sort of question difficult to discuss. But I want to end this pessimistic lecture on an optimistic note. My personal opinion is that not only is AI possible, it is possible today or in the near future -- the next ten or twenty years. But we will need to go beyond not only the old logic-based approaches, but also the neural network approaches. We will need totally new ideas, ideas that confront the full complexity of the brain and mind. This is the area of my current research....

DR. VERNOR: I think we'll have to skip the question session since the talk has run so far over. Let's thank Dr. Z for a most stimulating seminar. The next seminar is in two weeks -- Dr. While on VLSI design.

NAT [approaching Dr. Z after the others have left]: Hi, Dr. Z.

DR. Z: Hi. So what have you been up to? Still contract programming? I see you still can't stay awake through one of my lectures.

NAT: Uh, sorry about that.... Yup, I'm still programming. But I'm very curious about this current research of yours. Have you really made any progress towards understanding how to build thinking machines?

DR. Z: Actually, I think I have. But it's not a simple story.... I've fallen head over heels into this complexity science stuff. Once you accept that the mind is a very complex system, you start to focus on its emergent properties, and its commonalities with other complex systems -- things start to make a little more sense, you know....

NAT: Hmmmm....

DR. Z: I'd love to talk about it sometime. But I'm never here anymore. I can't stand to be around this university, actually. You're not a student any more, so I can tell you that. It has a kind of deadening effect on the brain....

NAT [laughing]: Is that a fact?

DR. Z: Of course not, don't be silly. But seriously, if you'd like to come by my house some day, I'd love to chat about this stuff with you. Any day you like, except Tuesdays and Thursdays, I have to come in here and teach.... And bring that beautiful young wife of yours....

NAT: You mean Janine? We're divorced....

DR. Z: Oh. Well, don't bring her then. I don't believe I ever spoke to her anyway. Have you remarried?

NAT: No, but I'm living with someone. A sci-fi writer. Melissa Bainbridge.

DR. Z: Well, bring her along if you like. I love science fiction, you know that! Everything I do is science fiction.... Bring all your friends and relatives. The more the merrier. We'll penetrate the secrets of the mind....

INTERLUDE

Melissa and Nat are walking down the street, toward Dr. Z's house....

MELISSA: So how well do you know this Dr. Z person anyway, Nat?

NAT: Not that well. I hadn't seen him for three years before I saw his talk the other day. I took one of his classes when I was in college -- but that was years ago. He was a pretty good teacher, I guess; usually ill-prepared for class, but full of interesting stuff to say.... He was always sort of reclusive. Worked on oddball types of research problems.

MELISSA: What do you mean?

NAT: Oh, there were all sorts of stories spread about him. They said he had a drawer in his file cabinet full of brains in jars, sitting in all sorts of weird chemicals....

MELISSA: Oh, come on!

NAT: I don't know. That's just what they say. Anyway, he always seemed like a pretty nice guy. He liked me, he thought I was one of his best students. He was pretty disappointed I didn't want to go on to grad school....

MELISSA: Well, I can understand that. I don't see how you can stand that programming work. It must be so tedious.

NAT: You know I don't mind it. Everything has its challenges. Try it sometime, you'll see what I mean.... Besides, not everyone has the creative spark....

Anyway, he wanted me to come by so he could tell me about this new theory of mind he's been developing. He's thinking I can somehow use it to make some new kind of artificial intelligence program.... Or, I don't know if he really thinks that. Mostly I think he just wants someone to talk to. No one pays much attention to him at the university anymore.... Which is a shame, as he's really quite brilliant. So I told him my girlfriend wrote some science fiction and he invited me to bring you along too. Apparently he has a weakness for sci-fi....

MELISSA: Cool.... Well, I'm certainly eager to meet him. He sounds like quite a character....

Look! What's that?

NAT [stoops down to pick something up]: It's a very unusual beetle. Look at it.

MELISSA [taking the beetle]: Why, it's not a beetle at all! It seems to be some kind of robot thing! It's made of metal!

NAT: Put it in your purse. There's something strange about this.... Come on, there's Dr. Z's house, over there.

MELISSA: I expected it to look different -- like a haunted castle or something....

NAT: Don't be silly -- he's just another eccentric professor. Come on, let's go in....


1

FRACTAL SELF, CHAOTIC MIND...

Dr. Z's living room.

NAT: So, Dr. Z, you've invited us here to talk about this new way of thinking about the mind you've been developing. I guess we should start off by having you give a quick summary of what it is you want to talk about..... Then we can start quizzing you about specifics.

That sound all right to you, Melissa?

MELISSA: Sure. Anything's cool with me.

DR. Z: Right. Well ...

It's not really the sort of thing you can summarize. It's a whole point of view....

I don't really know where to start. There's not really any natural starting point, you could enter into it anyplace. Hmmm....

NAT [quietly, after an awkward silence]: I saw a talk you gave at the university three years ago, Dr. Z, when I was just graduating. I didn't understand much, but I remember the title: "Fractal Self, Chaotic Mind." Maybe you could start by explaining what that was all about....

DR. Z: Ah, so someone was paying attention in that lecture, were they? Fancy that! You learn something new every day....

Fractal self, chaotic mind, huh? Okay.... Guess that's as good a place to start as any....

Well, fractal self means that the self is not unified but multiple. Each of us has numerous subselves, interacting with each other and containing images of each other in a fractal hierarchy....

On the simplest level, you can talk like Jung did about the anima and animus, the male and female subselves within. I'm a man, but I have an inner woman, that is, an inner self-organizing system of female thought-patterns, learned from my mother in early youth. You're a woman, but similarly, you have an inner man, an autonomous collection of male thought-patterns picked up from your father and other males you interacted with. And maybe your inner man, if it is a fair replica of some of your father's behavior patterns, contains its own inner woman....

It's well-known that we display state-dependent memory: when you're drunk, for example, you immediately go back to all the ideas that were in your head the last time you were drunk. "Drunk-Nat" might not be a complete personality, because you don't drink very often, but I've known people who had a whole separate personality while intoxicated.

NAT [looking at Melissa]: I see what you mean there. When I was married to Janine, I had a whole separate personality around her than around other people. There were bad bits -- I was a bit more easily angered around her ... rotten-tempered, you know. Though not as rotten-tempered as she was! But I was also way more open and emotionally expressive than normal. Everything that popped to mind, I could say to her. It was like a whole different system of patterns, a whole different way of being.... Even though I'm really close with you, Melissa, it's not quite the same as the vibe you get after being together with the same person four or five years.... Of course, what finally happened was ... well, that doesn't matter.

DR. Z: Right.... So, we each have these subselves, these autonomous systems of thought, perception and behavior. Which is a complicated idea, that would take a while to develop in full. This idea is becoming big in psychotherapy circles now, by the way. The fractal self idea is simply the recognition that each of these selves has to contain models of the other selves that surround it. This holds on an interpersonal level: each of us models each other, models each others' models of each other, and so on. And it holds within a single brain too: each of our subselves models each other in the same manner. So we have selves, and little model selves within selves, and littler model selves within these....

MELISSA: I get a picture of a family of Russian dolls, each one of which contains not one doll but a bunch of dolls, which are smaller copies of its siblings....

NAT: Yeah, I like that. Only these dolls, these selves, aren't really static things, they're self-organizing systems of processes, right? They just have the appearance of solidity.

DR. Z: Right.

MELISSA: But what do you mean? Are these subselves inside us like real people? Could the three of us just be subselves of some greater mind? I don't quite get it.

DR. Z: Subselves aren't quite as coherent as regular people. The further down the hierarchy you go the less coherence and detail there are.

But you should realize that people aren't nearly as coherent as they are usually thought to be. We're all just part of the social network, defined by our interactions; just as our subselves are part of their interactions. What appear to be solid wholes and objects are just nexuses of interrelation -- particularly concentrated regions of the dynamic continuum of interdependence....

MELISSA: Okay ... well, that's not all entirely clear to me, but we can get into it later....

So what about the chaotic mind?

DR. Z: Well, "chaos" is simply a recognition that even systems obeying simple underlying mechanisms can still be difficult or even impossible to predict. Even if we know how the weather is determined by the motion of molecules, we still can't tell how hot it will be next June. And even if we know how thoughts are produced by the firings of nerve cells in the brain, we still can't tell what someone's going to think ten minutes down the road.

NAT: A chaotic system is one that is known to be determined by rules, but yet appears random?

DR. Z: Well, but not totally "random," in the sense of having no regularities. A chaotic system has to be impossible to predict in detail, but it can still be predictable on a certain coarse scale.

MELISSA: Like, you mean we can predict that it will tend to be hot in the summer and cold in the winter, even though we can't predict the exact temperature for even tomorrow or next week?

DR. Z [laughing]: Right. And we can do a lot better than that. Human behavior is chaotic, but we can make all sorts of refined predictions about what other people will do. The catch is that we can't predict exactly, and we can't predict reliably.

NAT: And if you could find a way to predict people exactly or reliably, the world would beat a path to your door!

DR. Z: Yes, but the whole point is that it's impossible. Because the system is chaotic. It's structured chaos, so one can recognize patterns, but only on a coarse scale.

The mind is structured chaos, and the environment is structured chaos too. Mind works by recognizing these coarse-scale patterns in other minds and in the environment. The patterns a mind recognizes come inside it and become dynamic, alive; they make up its own self-organizing mental process system. This system of mental processes recognizes patterns in itself which join in the dynamic network.

And the key thing is, certain structures emerge from this dynamic interaction network -- structures like the fractal self. These emergent structures are an example of what we mathematicians call "attractors," or "strange attractors." They are mind attractors. The fractal self is a stable configuration of patterns emergent from the chaotic mind -- it is a mind attractor among many others.

MELISSA: Mind attractors, huh? The miraculous mind attractor! The language is unfamiliar, but it all makes sense.... Too much sense, almost. Aren't these just everyday facts of human experience cast in a new terminology?

NAT: But wait, you can't fault him on that point. If a picture of the mind didn't mesh with everyday human experience, it wouldn't be tenable. The goal is to find something that meshes with subjective experience, but goes beyond it, giving new levels of detail and showing new connections. Right?

DR. Z: Right. Couldn't have said it better myself.

NAT: Hold on though. Here's what worries me. What does this new way of looking at the mind you're talking about have to do with all the work everyone else is doing? There have got to be thousands of researchers in the world trying to understand the mind. I mean, look at the stuff Janine does, as a neurosurgeon. She's constantly reading papers, more and more papers trying to keep up. It seems like there are breakthroughs every other minute. Isn't it too early to try to make a grand synthesis?

DR. Z: Well, that would probably be the perspective of the vast majority of neuroscientists. But to me, that's like saying you can't understand chemistry until physics is finished, or you can't understand biology until chemistry is finished. The details of the brain don't make mind. Mind emerges out of brain.

Think about it -- modern science approaches mind from several different directions. Neuroscience illuminates the biomechanisms of thought and feeling. Psychology charts the intricacies of experience and behavior. Artificial intelligence provides algorithmic models for what we think and do. None of these specialties, however, gets at the whole, at the essence of mind. And that's what I'm trying to do.

As you point out, neuroscience has proceeded by leaps and bounds, but even so our understanding of the brain is very incomplete. We understand a good bit about the global "macroscopic" organization of the brain, and also about the microscopic behavior of small groups of brain cells. What's lacking is a reasonable account of middle-level brain structure -- what I call "mesoscopic" structure. Despite all the research in cognitive neuroscience, the problem of how cells combine to produce thoughts and feelings remains totally unresolved, from a biology point of view.

Then you have the discipline of cognitive science, which is what I'm wrapped up in more than anything else -- it seeks to fill in this middle level using computer programs and computational models. So far, though, this research programme has led to a number of very interesting failures, and no clear successes...

NAT: True enough, I guess, but this fact can be partially ascribed to the tremendous memory capacity and processing power of the brain as compared to contemporary computing machines. Our best supercomputers can't compare to the brain of an ant!

DR. Z: Well, that's only a partial excuse, though. Even if we can't simulate a mind on current computers, we could still give a detailed design for a mind, to be implemented when better computers come about. That's something I've worked on a lot....

Anyway, even though cognitive science has failed so far, it has taught us some very interesting lessons. First of all, as I said in my talk last week, there's a long history of trying to use programs based on logical rules to simulate the mind. Although this hasn't worked, what it has done is to show us just how reliant human logic is on the rest of the complex web of human thought. We can come up with systems of simple logical rules to do things -- play chess, diagnose diseases, or whatever. But we totally can't come up with systems of logical rules that are able to generalize from one context to another. Whatever the rule system learns about chess is totally inapplicable to, say, checkers, or Go. There's no transfer of knowledge at all -- but transfer of knowledge is totally essential to human learning, creativity, intelligence....

NAT: But what about neural networks? Programs that simulate the brain? A lot of people in the computer world are very excited about them....

DR. Z: Well, as you pointed out, we can't simulate the brain on current computers; they lack the power to simulate even the brain of a chicken! But of course, there's the whole industry of biologically inspired "neural-network" programs.... One of the main points I wanted to make in my talk last week was that these neural networks have the same basic problem as logic-based artificial intelligence programs, in the end. Neural networks can learn and adapt within a specific context -- they can learn to do things that simple logical rule systems can't do, like recognize faces or predict trends in time series. But they still can't generalize and adapt to totally different contexts, like people can. You have to give the neural network its information in a specially prepared form, and then it can learn inductively. But it doesn't know how to go into a new situation and prepare the raw data for processing. It's not truly adaptable or truly creative.

I mean, I don't want to put anyone down. The recent advances in cognitive science and brain science are real and astounding. But they leave the core untouched. Think about it! To give a very partial list, they barely touch

the layered multiplicity of the self,

the structured unpredictability of mental dynamics,

the creative self-organization and self-construction of memory,

the intricate hierarchical pattern formation of thought, perception and control systems,

the intricate interplay of mind systems involved in acts of creativity (writing a symphony, listening to a rock tune, falling in love....)

The fact is, scientifically speaking, the human experience remains largely uncomprehended. In order to cope with the full subtlety and complexity of the mind, new ideas are needed.

MELISSA: So that's where this thing called chaos and complexity comes in, right? I read Gleick's book Chaos, but I still don't feel I really understand what it's all about. I'm even starting to see stuff on chaos in literature now. Finnegans Wake as chaotic.... Goddess knows it is, but what it has to do with mathematics I can't understand....

DR. Z: Gleick's book is a nice bit of history of science, but I think it sort of ignores the broader context. It doesn't mention me, after all!

Seriously, from reading books like that you can get the impression that chaos and complexity are these radical new ideas introduced by a few mavericks at the edge of science. While this isn't totally false, it's just part of the story. Really chaos and complexity science are an inevitable consequence of computer technology. Computers allow us to simulate complex systems with intricate interdependences between their parts, systems we couldn't study using mathematics or plain old intuition.

The key concepts of complexity science, as it exists today -- things like chaotic dynamics, fractal geometry, self-organization, abstract linguistic structure, evolutionary adaptation -- these are all just examples of the use of computers to understand how systems work.

The beauty of the computer is that we can proceed in a whole new way: without figuring out the details of how a system works, we can still simulate it and simulate other systems that are somehow "like" it. It's this new methodology that is the really crucial thing about chaos and complexity. It's a new way of doing science. And it's a new way of thinking about science too: we're led to think about computational models, to think about simpler computational structures that capture the "essence" of complex systems.

NAT: The way you're talking, all this is just an offshoot of high technology. Technology drives science. I like it -- I'm a technology man -- but isn't it also the other way around?

MELISSA: Yes.... I think you're leaving something out here, Dr. Z. What I hear you saying is that complexity science captures interdependence and patterns of change .... the same qualities focused on by Oriental philosophy. And before Oriental philosophy, in the world-views of so-called primitive cultures. You want to talk about the mind as a system of processes mutually producing each other, giving rise to these emergent ... attractors, which make up minds and selves.

I think the technology is just bringing us back, back to a more natural, pre-technological way of thinking about the mind. We naturally know that the mind is complex, interconnected, formed from emergent structures. The pygmies in Africa know this as well as we do. The only thing that denies this is reductionist science, the standard kind of science you're taught in high school. What this is really all about is maybe recognizing that mechanistic science was a kind of philosophical error.... Productive though it was in certain areas....

DR. Z.: I'll have to think about that more. That's an intriguing way to look at it.... I'm afraid I don't know much about Oriental philosophy or the African pygmies....

NAT: Well, we could zip over to Africa and drum some up. That might liven things up a bit.

MELISSA: I don't imagine they would care for suburban New Jersey....

Hey, here's something I've been wondering about. You know those cool fractal pictures you always see. What do they have to do with all this stuff? Or are they just there to look at?

NAT: I can answer that one. You have attractors, right?

The same types of complex characteristic behaviors, the same "chaotic attractors," pop up over and over again in different systems, different situations. And these recurrent behaviors are often best described using fractal geometry rather than conventional Euclidean geometry. Fractals are attractor geometry.

MELISSA: Is that how all the pretty pictures are obtained?

DR. Z: Well, sort of. The pretty pictures are attractors of simple dynamics, not necessarily of dynamics that are of interest for studying real systems. The pretty pictures are relatives of the fractal attractors that are really of interest. The problem is that the ones of scientific interest are pictures in many dimensions, not just two or three dimensions.

For instance, the Julia sets you always see [shows them a picture in a book] are gotten from a quadratic equation in two dimensions. That same quadratic equation in higher dimensions can model the behavior of complex self-organizing systems. But you can't look at a picture in seven hundred dimensions -- unless you have seven hundred dimensional eyes!

MELISSA: Seven hundred dimensional eyes ... yeah. I remember that from an acid trip back in college!

DR. Z: No, seriously. So the pretty pictures aren't just frills, they're symbols. Symbols for the higher-dimensional pictures that govern system behavior, that we can't visualize. Julia sets are regions of system viability: they tell you which internal states of a system allow the system to remain in a reasonable condition, without any of its internal parameters increasing to an unreasonable degree. The ones we look at are the regions of viability of relatively uninteresting mathematical systems -- but they have a lot in common with the higher-dimensional regions of viability of systems like you and me, and the clouds over our heads....
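
You can compute one of these pictures yourself in a few lines of Python, by the way. Here is a sketch of the "viability" test for the quadratic map z goes to z-squared plus c; the constant c, the escape bound and the grid are arbitrary choices that just happen to give a nice picture.

    import numpy as np

    c = complex(-0.8, 0.156)                 # one commonly plotted choice of constant

    def viable(z, steps=80, bound=2.0):
        # Iterate the quadratic map; a starting point is "viable" if it never blows up.
        for _ in range(steps):
            z = z * z + c
            if abs(z) > bound:
                return False
        return True

    # A crude character-graphics picture of the viable region (the filled Julia set).
    for y in np.linspace(1.1, -1.1, 22):
        print("".join("#" if viable(complex(x, y)) else " "
                      for x in np.linspace(-1.6, 1.6, 66)))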

NAT: But the fractal nature of these pictures means that they're infinitely ramified, infinitely detailed: they keep having more and more variety no matter how finely you look at them. So what you're saying is that attractors of systems, and portraits of system viability ... that things like this have a fractal structure.

DR. Z: Right. And thus it's no coincidence that things like the mind and the self have a fractal structure too....

But we have to remember that the geometry is just one way of looking at attractor structure. Attractors are the key thing; they are just the characteristic complex behaviors of systems. Attractors can be understood as geometry, but they can also be understood as languages.

NAT: That's a new one to me. You mean languages like English or French? ... or more like programming languages?

DR. Z: More like programming languages. Formal languages, they're called: just sets of symbols with grammatical rules.

NAT: Hmmm....

MELISSA: It strikes me that you might phrase all this in a rather different way. You're talking about complexity, self-organization and interdependence in the mind. Couldn't you summarize this all by talking about mind as an ecology.... Ecologies have complexity, self-organization and interdependence. And they're also wonderfully creative. They guide and focus the evolution process; look at all the wonderful forms they come up with! Even forms like us, who sit around chattering all night....

DR. Z: You're right, that may be a good way to put it. Ecology is something everyone can relate to. Concern for the environment and all that.

The key point, I think, is that the process of natural selection is not restricted to the evolution of species. It plays a role in a huge variety of natural systems, from biospheres to immune systems to brains and computer learning algorithms. The essence of evolution, it seems, is neither competition nor cooperation, but rather the emergence of structure out of a fluctuating web of interacting elements. According to this new understanding, every human being is an ecosystem in him or herself, containing a variety of coevolving mental processes. We're all evolving ecologies!

NAT: Wow ... I feel so large! So green! So fertile!

MELISSA: Keep your pants on, Nat....

DR. Z: Seriously, guys, the evolutionary perspective is wonderfully revealing regarding memory. For, after all, what is it we are doing when we remember something? Do we pull that thing out of some "slot" in the mind? Or do we create it anew? Memory can be wonderfully accurate; but it can also be wonderfully creative, as false memory syndrome suggests. On careful consideration, memory emerges as a complex evolving system whose purpose is creation-to-order rather than storage and retrieval. The chaotic dynamics of memory systems add novelty and innovation to our world.

MELISSA: You're talking about those people who "remember" being molested as a child ... but half the time the memories are actually fake. Constructed during the process of being counselled by some therapist.

NAT: Or those people who "remember" being abducted by UFO's.... I watched a whole movie on that once; it's very convincing. These people seem so totally sincere -- and some of them very intelligent, well-educated people. Their minds have just built these memories to order....

DR. Z: Right. So the mind is an evolving ecology. Memories evolve to match the external world and to match each other. They are constantly revised and re-created by the evolutionary process. And thought processes, as a whole, are then re-visioned as strategies for survival in a complex, self-organizing, evolving mental process network....

Really, what we call the "thought process" is actually several processes, interdefined and interlinked. The philosopher Charles Peirce distinguished three kinds of thought processes. There's induction, the recognition and forecast of pattern ... there's abduction, the guessing of theories to match information; and there's deduction, the working-out of consequences from given axioms.... These processes depend on each other and on memory in a complex way. They are optimized for survival in a world characterized by structured chaos. Each type of thought is contained within the others in a fractal hierarchy of reason....

NAT: I think we're getting off track here.

DR. Z: Okay, you're right. So let me sum things up. Finally, what we arrive at is a view of the mind as a self-producing system, a network of interproducing mental processes. A network of processes which has evolved itself into a chaotic "attractor state" characterized by certain large-scale memory and thought structures. This mind attractor has a fractal structure which is essential to the coordination of memory with perception and action. The structure of the mind attractor is intricately connected with the structures of the attractors in other minds, and the world.

NAT: So the mind is this attractor. But what is the mind made of, then? The mind isn't the brain, but it's not something external to the brain either. It's just this pattern of dynamic behavior in the brain?

DR. Z: Pattern of dynamic behavior -- good. The mind is a pattern in the brain, that's the most crucial thing. Or, you might say, a pattern in the body: the brain's the most crucial part but not the only part....

NAT: But what do you mean by a pattern, really?

DR. Z: A pattern is just a representation, a representation that simplifies. It's a symbolic representation, essentially. It lives in the realm of mathematics, of language, of pure relationship. The mind is not an object but a kind of relation, a web of relations.... Really, when you get down to it, we only know of things when they enter in the mind, so everything in our world is just a relation, a pattern. We live in the world of patterns.

MELISSA: Whoa, Dr. Z. This is getting way too deep for me.... We'll have to go over all this again, much slower.... But one thing is bothering me. You haven't talked at all about consciousness....

DR. Z: Well, you can go a long way without talking about consciousness! The vast majority of mental processes are unconscious: consciousness is not even the tip of the iceberg of mind, it's more like the tip of the tip....

I think it was Nietzsche who said consciousness is like the military commander who takes responsibility for the actions of his troops.... "I" did it, he says, when really it was just a spontaneous self-organization, over which he had no particular control. Tolstoy thought the best generals were the ones who recognized their inability to control their armies, and let spontaneous self-organization take its course ... he beats this point into the ground all through War and Peace.... Similarly, you might say that the best people are the ones who recognize the impotence of their consciousness -- who understand that the best consciousness can do is to manage the self-organizing spontaneity of the unconscious....

MELISSA: I get the point, yeah. That's all very well.... We tend to ascribe our decisions to consciousness when in fact they're usually determined by unconscious processes, processes we don't even know about.

But still, this consciousness, this indefinable core of mind, this spark at the center you could call it ... this is the thing that we most fundamentally identify with. You can't get around it. Is it just a quirk of brain function, or what? Or is it something else totally different, something forever outside rational understanding? This is an important question, not something you can just sweep under the rug.... Even if its importance is sometimes overestimated.

NAT: Well, if you talk to Janine or the other brain surgeons, consciousness becomes something very concrete. Lesions in particular areas of the brain cause impaired ability to be conscious of one's actions....

MELISSA: Okay. But how do you know that's affecting the essence of conscious experience, and not just the nature of conscious experience....

NAT: I don't know that. I didn't say anything about that.

MELISSA: I tend to feel that everything is conscious, right down to rats and bugs and trees and rocks and elementary particles....

NAT: Well, Christ....

MELISSA: What?

NAT: I don't know any nicer way to put it; that just sounds like nonsense to me.

I guess the problem is you have to define your terms.... What I think of as consciousness a rock doesn't have. It must not be the same as what you're calling consciousness.

DR. Z: I think you have to distinguish between types of consciousness. There's raw consciousness, primal experience of being. Then there's consciousness of objects, what the psychologists call "attention." Then there's introspective consciousness, looking inward at yourself. And there's consciousness without an object, as in certain types of meditation.

MELISSA: So you'd say raw consciousness is in everything. But the other kinds aren't. They're just special ways that raw consciousness manifests itself.

DR. Z: They're the effect of raw consciousness on certain complexly structured systems. Right.

NAT: I don't know if I'm satisfied with that. So I'm supposed to believe everything is conscious, it just doesn't use its consciousness in a sophisticated way?

DR. Z: I didn't say you were supposed to believe anything. The key point is that, as a mind theorist, I'm not going to try to explain raw consciousness. I'm going to try to explain how raw consciousness manifests itself in the complex, self-organizing mind.

MELISSA: Okay, we'll get back to consciousness. Now what about self? That's the other thing that worries me. Consciousness is the feeling of being here, of just plain being. Self is the feeling of being me. It's the other half of essential experience. Are you going to say that self is just a part of the mind attractor? But how does consciousness work into it? Somehow consciousness works to enliven self, doesn't it?

DR. Z: I have a feeling you're right, but this is something I haven't quite come to terms with.

The thing is that consciousness has got to be necessary for the evolution of the self in early childhood.... Self arises through sustained awareness of other humans, and by reification of the images of oneself one perceives in other humans.... And this reification is accomplished by consciousness. More than anything else, consciousness serves to make things real. And self is what makes us real.

Self is a lie, of course; the little word "I" is an absolute falsehood. It's just a linguistic simplification.... But it's a white lie, a useful lie, anyway. It's easier to put a label on your behavior, to say "I" did it, than to constantly think about the vast self-organizing array of processes that is really involved in every thought, feeling, action....

The evolution of the fractal structure called the self is the main task of the first few years of childhood....

NAT: It just occurs to me, what does this say about AI? What you're suggesting is that artificial intelligence won't be possible until we figure out the proper conditions for the growth of the self.

MELISSA: Well, that's almost obvious.

DR. Z: Absolutely correct.

MELISSA: Okay, but self-growth doesn't end in childhood. You work toward personal growth throughout your life....

NAT: Well, some people do.

MELISSA: That's right....

Hey, that reminds me of the joke. How many psychologists does it take to change a light bulb?

NAT: How many?

DR. Z: Only one, but it has to want to change.... Ha ha ha.

I know a better one than that. How many psychologists does it take to wallpaper a room?

NAT: I know that one. It depends how thin you slice 'em!

MELISSA: That's sick! Come on, Nat....

Seriously, Dr. Z....

DR. Z: Yes, the self can keep changing in adulthood -- by revisiting the mental processes of childhood, according to which patterns of self- and reality-perception are fluid and open to creative evolution.... The disadvantage of the changing self is that you lose a certain expertise that comes from having been doing the same thing for a long time. But in the modern cultural and technological climate, there may be no choice but to have this kind of protean self, I suppose. To remain forever a child....

One thing to remember, though -- we've been speaking of the self as something contained in the individual; but in the end self is social: each self is made of others. It's an error to think of a mind as an isolated, individual phenomenon.... Minds are inextricably situated in communities, which in turn exist in minds. The process of reality-building is a collective mental act, which follows the same dynamics as more individualistic types of thought. Reality is a kind of society-wide intimacy and romance, one might say. This kind of interdependence is crucial to human personality, human intelligence, and social interaction.

NAT: But...

DR. Z: Wait.... So mind is social. Mind comes along with collective world. But the next part is, it is by our language, more than anything else, that we structure our collective world. Different cultures, different minds and different parts of the mind have different languages and hence different subjective realities. Conventional theories of syntax and semantics fall short of capturing the fundamental nature of language, but new ideas show more promise! Each chaotic system, as it turns out, has its own language, and in this way contains its own view of reality. The fractal, chaotic structure of the mind is reflected in the depth and variety of its languages, and in the fractal, chaotic structure of reality itself.

Okay, now what were you going to say, Nat?

NAT: I forget. I think we've had enough for right now. Let's adjourn and eat some of the wonderful-looking carrot cake Melissa brought. We can start again in half an hour.... I think I get the general background for what you're thinking about the mind. I'm ready to get into the specifics.


2

HOW TO BUILD A BRAIN

Dr. Z's living room again.

MELISSA: Okay. Let's get started again now. Where did we leave off?

I guess we'd sort of finished the basic summary. We can delve into some of the specifics now. This is really fascinating; I'm so glad I came along. I miss this kind of stimulation since I graduated college.... I was especially interested in your last point, about how language is used to construct the world....

NAT: Well, language is interesting, but I'd rather get into it later....

I'm especially interested in the artificial intelligence connection, Dr. Z. I can see that there might be problems getting mind attractors to come out of the small-scale computer systems we have today. But still, I'm thinking that it might not be impossible.... I mean, if you lump together all the computers in the world, it seems quite possible that you might be able to get an amount of processing power approximating the human brain. This is really nagging at me....

This is really why you invited us here, isn't it? You're hoping I might be able to program something....

MELISSA: You're talking about some kind of Internet brain, Nat? That's intriguing....

NAT: Right. What I want to know is, exactly what would I have to do to program a system that would have a mind attractor emerge out of it?

DR. Z: If you want a definite and rigorous answer, I can't give one. We just don't know enough yet.

But if you want an educated speculation.... I'd say, take a bunch of flexible processes, capable of recognizing patterns in each other. Say, repeated patterns, and simple logical combinations of repeated patterns. Set it up so the processes need to manipulate their environment in some way in order to survive. So, a subnetwork of the process network can survive only if it, collectively, is able to manipulate the environment successfully....

NAT: Each process survives if it recognizes patterns in the environment, or in other processes?

DR. Z: Right. So some processes will survive just by circular causation: group A recognizes patterns in groups B and C, group B recognizes patterns in groups A and C, group C recognizes patterns in groups A and B.

It's the specific parameters of the system that determine how well a circular thought system like this can survive. A very similar situation occurs in the immune system. Antibody classes survive by matching foreign antigens, i.e., they need to recognize the pattern of the surface of the antigen. It is the matching with antigens that stimulates them to reproduce. But antibody classes can also survive by matching other antibody classes: this can also stimulate them to reproduce. This gives rise to the notion of the "immune network." The immune network can keep on interacting with itself, at a low level of activity, even when there's no external antigen in the system -- just antibodies matching antibodies, matching antibodies,.... But how much of this non-antigen-related activity goes on is an unanswered question. It depends on all the specific chemical properties of the antibody cells....

Now in the mind, you have a similar situation -- the external world plays the role of the antigen, the thing which the system needs to learn to recognize patterns in. The network of thoughts plays the role of the immune network. But in the mind, the parameters are set so that there's lots and lots of network activity....

The analogy actually carries pretty far. Which is not surprising, since the brain and the immune system are the two most complex systems in your body. You know, your immune system weighs a whole pound! A third as much as the brain.... The immune system is an evolutionary system: it works by natural selection. Just mutation, no crossover though. Antibodies mutate, and they survive if they match antigens or other antibodies, otherwise they don't reproduce and die out. It's very Darwinian, but also ecological, because the fitness of an organism is not determined entirely by the antigen, it's also determined by the network, by the other antibodies. Your immune system is an intelligent, evolving ecology!
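
(A minimal sketch, in Python, of the evolving ecology Dr. Z is describing: a population of "antibody" bit-strings that survive by matching an "antigen" string -- the environment -- or by matching each other, with mutation only and no crossover. The bit-strings, the matching score, and all the numbers are invented for illustration, not taken from any actual design.)

    import random

    random.seed(0)
    L = 12                                   # length of each bit-string
    ANTIGEN = [random.randint(0, 1) for _ in range(L)]

    def match(a, b):
        # antibody-style binding: fraction of positions where a complements b
        return sum(x != y for x, y in zip(a, b)) / len(a)

    def fitness(ab, population):
        best_peer = max(match(ab, other) for other in population if other is not ab)
        # survive either by matching the antigen or by matching other antibodies
        return max(match(ab, ANTIGEN), best_peer)

    def mutate(ab, rate=0.05):
        return [1 - bit if random.random() < rate else bit for bit in ab]

    population = [[random.randint(0, 1) for _ in range(L)] for _ in range(30)]
    for generation in range(50):
        ranked = sorted(population, key=lambda ab: fitness(ab, population), reverse=True)
        survivors = ranked[:15]              # the weaker half dies out
        population = survivors + [mutate(random.choice(survivors)) for _ in range(15)]

    print("best antigen match:", max(match(ab, ANTIGEN) for ab in population))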

MELISSA: Wow! So it's no wonder things like stress and positive thinking can affect your immunity....

DR. Z: Right. The brain and the immune system are two intelligent systems occupying the same body, constantly communicating with each other. But the relationship is bound not to be a simple one. We really don't know much about the language by which they communicate....

NAT: But let's get back to the point here. So you set up this immune network-like collection of pattern-recognition processes. Then what? Just stir briskly and watch what happens? There's got to be more to it than that.

DR. Z: Does there? That's an interesting question.

Your metaphor of "stirring" is a good one. The immune system is, to a fair degree of approximation, a "well-stirred" system. Each antibody type has a good chance of bumping into each other antibody type, as the antibody cells swarm around the body in the bloodstream. The brain, on the other hand, is not well-stirred at all: all the cells are fixed in specific locations.

MELISSA: There's still a kind of natural selection going on in the brain, though, isn't there?

DR. Z: Right, but it has to do with the electrical connection pathways between brain cells, between neurons. Connections which are not useful tend to assume a low conductance, so not much electricity passes along them. Connections which are useful tend to get a high conductance. There is survival of the fittest electrical connections; an evolving ecology of links.

On the other hand, in some cases the brain has a pretty good ability to move things around. As a person's competency in a second language increases, the brain regions responsible for that language can shift considerably, even from one hemisphere to another. So things aren't as rigid as they might seem.
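
(In the same spirit, a rough sketch of the "evolving ecology of links" Dr. Z just described: conductances creep up on connections that carry useful signals, decay on connections that sit idle, and the weakest link gets pruned now and then. The update rule and every number here are illustrative assumptions only.)

    import random

    random.seed(1)
    n = 8
    conductance = {(i, j): random.random() for i in range(n) for j in range(n) if i != j}

    def carried_useful_signal(link):
        # stand-in for "this connection just helped the network do something" -- here random
        return random.random() < 0.3

    for step in range(200):
        for link in list(conductance):
            if carried_useful_signal(link):
                conductance[link] = min(1.0, conductance[link] + 0.05)   # useful links strengthen
            else:
                conductance[link] *= 0.99                                # idle links fade
        if step % 50 == 49 and conductance:
            weakest = min(conductance, key=conductance.get)
            del conductance[weakest]           # survival of the fittest connections

    print(len(conductance), "links survive; strongest conductance:",
          round(max(conductance.values()), 2))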

NAT: The goal is to get the right kind of attractor to emerge out of the system of pattern-processes, right? What kind of attractor do we want, exactly; that's the question.

DR. Z: We need an associative memory structure. Some kind of structure in which processes which are related to each other can be used to access each other. This should emerge pretty much spontaneously; it's there in the immune system as well as the brain.

There's a certain size threshold here though. If you assume the relations between one process and another are roughly random, you can make certain calculations. You find that the probability of one process recognizing a pattern in another has to pass a certain critical threshold. Once this probability passes the threshold, you're going to have a useful associative memory. When it's below the threshold, you're not.

NAT: At the threshold value, everything sort of falls into place?

MELISSA: I see what he means. Think about trying to follow a chain of associations. If everything you think of reminds you of at least two other things, then you're going to constantly be led on a wide variety of trains of thought; you can always go out in all sorts of different directions. But if, say, half the things you think of only remind you of one other thing, and the other half don't remind you of anything at all, then you're not going to be able to follow very many interesting trains of thought....

The threshold would be when every thought reminded you of one other thing. As long as everything reminds you of more than one other thing -- on a kind of average, I mean -- you'll have a useful memory. A memory that can be used for analogy, for following from one idea to another to another. I never thought about it this way before, but it sort of makes sense. You have to see enough connections before you can really link your knowledge base together into an organic whole.

But the thing is, the more you can link your knowledge base as an organic whole, the better you'll be able to see connections. So the whole thing links back on itself....
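
(Melissa's threshold intuition is easy to check numerically. In the little Python experiment below, each of a thousand "thoughts" is given, on average, k random associations, and we measure how much of the memory a train of thought starting from one idea can reach. Below an average of one association per thought the chains fizzle out; above it they open up. This is just an illustrative random-graph model, not a calculation from anyone's actual theory.)

    import random

    random.seed(2)
    N = 1000   # number of thoughts in the memory

    def reachable_fraction(k):
        # each thought reminds you of about k other thoughts, chosen at random
        def out_degree():
            return int(k) + (1 if random.random() < k - int(k) else 0)
        assoc = [random.sample(range(N), out_degree()) for _ in range(N)]
        start = random.randrange(N)
        seen, frontier = {start}, [start]
        while frontier:                        # follow every chain of associations
            nxt = []
            for thought in frontier:
                for other in assoc[thought]:
                    if other not in seen:
                        seen.add(other)
                        nxt.append(other)
            frontier = nxt
        return len(seen) / N

    def average_reach(k, trials=20):
        return sum(reachable_fraction(k) for _ in range(trials)) / trials

    for k in (0.5, 0.9, 1.1, 1.5, 2.0):
        print(f"about {k} associations per thought: reach {average_reach(k):.0%} of memory")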

NAT: Wow -- Melissa, you blow me away! You should've been a mathematician!

MELISSA: No way. I can't deal with abstraction. I can only think about things that come out of my own experience....

It's just, the idea of trains of thought comes pretty naturally to me. It's like stream of consciousness writing. You just take one thought, and see where it leads you, then take another thought, and see where that leads you, and so on....

DR. Z: Yes, I see what you mean. It's what we'd call a "random walk" -- a random walk on an associative memory network. If you had true stream of consciousness writing you could use it to reconstruct the memory network of the author at the time he was writing....

Right, so one thing you need is this associative memory network. And another key thing you need is an hierarchical network for perception and control.

NAT: You mean, you need processes building up to more general processes, which build up to more general processes, and so forth....

DR. Z: Right. But the hierarchy of processes has to display dynamics in both directions: up and down. Perception is the propagation of information up an hierarchy. Control is the propagation of information down. The two processes are interlinked.

NAT: So we need to somehow get these two things to emerge? The associative memory and the hierarchical network.

DR. Z: But the catch is, the two things aren't two things at all; they're really one thing! See, the hierarchical network has got to be the same network as the associative memory network! The memory is dynamic and active; it's a memory composed of procedures, of processes that actually do things -- not dead memory like papers in a file cabinet!

The hierarchical network is the associative memory network, and the most natural way to achieve this is for the memory network to be fractally structured. So that the memory consists of clusters within clusters within clusters.... Larger and larger clusters match up with higher and higher levels of the hierarchical network....

And consciousness -- Melissa -- introspective consciousness would then emerge as a certain process involving loops between the bottom of the hierarchy and the middle-to-top levels of the hierarchy. Between perception and thought....
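
(A bare-bones picture of the "clusters within clusters" idea: if memory items sit in a nested cluster structure, the level of the smallest cluster containing two items can stand in for the level of the hierarchy that has to mediate between them. The branching factor and depth below are arbitrary illustrative choices, not part of any worked-out model.)

    BRANCH, DEPTH = 3, 4          # 3 sub-clusters per cluster, 4 levels deep

    def address(item):
        # the path of cluster choices from the top of the hierarchy down to the item
        path = []
        for _ in range(DEPTH):
            path.append(item % BRANCH)
            item //= BRANCH
        return list(reversed(path))

    def mediating_level(a, b):
        # how high up the hierarchy you must go to find a cluster containing both items
        pa, pb = address(a), address(b)
        for level in range(DEPTH):
            if pa[level] != pb[level]:
                return DEPTH - level   # diverge near the top => a large, high-level cluster
        return 0                       # same leaf cluster: handled locally

    print(mediating_level(3, 4))    # neighbors in the same small cluster: a low level suffices
    print(mediating_level(0, 80))   # items far apart: only a top-level cluster holds both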

NAT: So what we want is to get this combined associative memory hierarchical-perception-control network to emerge spontaneously, as an attractor in a mixed up system of processes. Why don't we just impose the structure in the first place?

MELISSA: I see the answer to that. Because then it would be brittle. Just like with AI programs.

The reason the brain and mind are so robust is because they evolved. They emerged spontaneously by natural selection and self-organization. If they were so brittle that slight changes in internal or external conditions would destroy them, they would never have evolved in the uncertain environment of Earth....

DR. Z: Yes -- they evolved, and they also manifest the process of evolution. As the memory network adjusts itself to reflect new patterns recognized, the hierarchical network is adjusting itself to form new processes, new "programs" for acting and perceiving. The constant fluctuation of process interaction is a kind of random variation which provides the raw material for natural selection....

NAT: Okay. I'm starting to get a clear picture here. The question is, how much of this structure, this "two network" structure, associative and hierarchical...

DR. Z: I call it the "dual network" structure

NAT: How much of this dual network structure do we have to build in? Can we leave it totally mixed like the immune system or do we have to impose some kind of geometry on the network, as is done in the brain?

DR. Z: What you're asking is, what is the shape of the basin of attraction of the mind attractor? And no one knows the answer to that question.

MELISSA: The shape of the basin? What?

DR. Z: An attractor is a characteristic behavior, a behavior that other behaviors will tend to lead to. The basin of an attractor is the set of all behaviors that will eventually lead to that particular attracting behavior.

You can see this pretty easily in toy neural network models. Suppose you set up a network so that each state of the network represents a certain visual pattern, a pattern of pixels or black-and-white squares. The attractors of the network then correspond to letters that the network's supposed to recognize. Then the basin of attraction for the letter "A" is the set of all input states that will lead the network to converge to the letter "A".... What you want out of the network, in this case, is that the basin of attraction should consist of everything that an average human would classify as an "A." Then you could feed the network any digitized letter as an initial state, and it would converge to the attractor representing the "ideal form" of that letter.
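
(A toy version of the attractor-basin picture, in Python: a Hopfield-style network stores two crude 5x5 pixel "letters" as attractors, and a noisy input that falls in the A's basin relaxes back toward the ideal A. The patterns and parameters are made up for illustration.)

    import random

    random.seed(3)
    A = [int(c) for c in "0010001010111111000110001"]   # crude 5x5 "A"
    T = [int(c) for c in "1111100100001000010000100"]   # crude 5x5 "T"
    patterns = [[1 if p else -1 for p in pat] for pat in (A, T)]
    n = 25

    # Hebbian storage: each stored pattern digs its own basin of attraction
    W = [[sum(p[i] * p[j] for p in patterns) if i != j else 0 for j in range(n)]
         for i in range(n)]

    def recall(state, steps=300):
        state = state[:]
        for _ in range(steps):                 # asynchronous updates, one pixel at a time
            i = random.randrange(n)
            field = sum(W[i][j] * state[j] for j in range(n))
            state[i] = 1 if field >= 0 else -1
        return state

    noisy = patterns[0][:]
    for i in random.sample(range(n), 5):       # flip five pixels of the "A"
        noisy[i] *= -1
    settled = recall(noisy)
    print("pixels still differing from the ideal A:",
          sum(a != b for a, b in zip(settled, patterns[0])))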

NAT: That's the Hopfield network, right? Invented back in the 80's, you said in your talk. But I thought they never got it to work all that well....

DR. Z: Well, it's not clear what that means, "to work." You can make a neural network with whatever attractors and attractor basins you want, so you can make it work. I saw a recent paper by a guy named Michael Zak, who does this in a very neat way.

But if you have to program in all the attractor basins, then there's no real point in using a neural network instead of just encoding a series of rules for recognizing letters in an ordinary programming language.... What you want is a network that will make its own attractors and basins, based on experience.... And that's what no one has ever gotten to work quite well enough to be of practical use for automatic character recognition.

NAT: So, for the mind attractor, the question is really how big is the basin. How much of the dual network structure do we have to build in, in order to get it to emerge?

MELISSA: It could be that we inherit some of this structure, just from birth, by heredity. It could be already there in our brain structure.

DR. Z: Could be. I really don't know....

What I would do would be to put the processes in my system in a world with an approximately fractal structure. So as to sort of encourage it to develop the right attractors -- without coercing or forcing it.

NAT: But that's perfect! The Internet itself is made up of clusters within clusters within clusters.... We can have larger and larger regions of the world correspond to more and more general clusters, higher levels of the hierarchy. The structure of the Internet can give you the structure of mind.... Little bits of the mind stored on everybody's computer.... The only problem is getting access to everyone's computer to put the information there....

I know a lady named Stephanie Forrest who is trying to build computer security systems modeled on the immune system. This is sort of just the opposite, it strikes me. We're talking about evading computer security to install a worldwide brain modeled partly on the immune system....

MELISSA: It's a beautiful vision.

NAT [laughing]: A beautiful mirage, more like! I'll think about it some more.... You've given me a lot to think about tonight, Dr. Z. But we still have some time left. Melissa, you wanted to talk about languages?

MELISSA: Yes, I did. Although I must say I'm glad you changed the subject when you did -- this idea of building an Internet brain is really quite fascinating. I think I could make a brilliant novel out of it!

Let's take a break, though....


3

LANGUAGE, MIND, CHAOS

Dr. Z's living room again.

MELISSA: Okay, language.... This business of -- what did you call them? -- false languages ... no, formal languages ... really fascinates me. As a writer, I'm always concerned with making up my own language, my own way to express myself.... I wonder if these formal languages could be used for artistic expression....

DR. Z: I imagine they could, but they'd require a lot of work on the part of the reader. I mean, the reader would have to learn the new language in order to read the book....

A formal language can be something very, very simple, something much too simple to have any expressive value. On the other hand, it can also be very very complex, much more complex than English or Chinese or any human language.

NAT: I heard somewhere that Hungarian is the most difficult language in the world. Is that true?

MELISSA: Nem.

NAT: Nem? What does that mean?

MELISSA: It means "no" and "sex" in Hungarian....

I don't know whether Hungarian is really more difficult than any other language, but I sort of like the idea of a language where no and sex have the same word....

NAT: Yeah, I see what you mean! Imagine the date rape cases.... You know how they say "No means no" -- well, if no means sex then things could really get confusing!

MELISSA: One peculiar thing about Hungarian is its isolation. It's distantly related to Finnish and Estonian, but basically it's totally unintelligible to non-Hungarians. The Hungarians are such an incredibly creative people, they've written so many novels, short stories, poems and so forth, but only a very small percentage have been translated into other languages....

NAT: Yes, when you think about it, translation is a weird thing. There would really be no way to translate "No means no" into Hungarian, would there?

MELISSA: I have a collection of about forty different translations of the Lao Tzu. There are plenty of commonalities, I guess, but the differences are really incredibly striking. The Chinese is so simple, but to render it in English you have to add all this grammatical complexity....

What I wanted to ask you, Dr. Z, is whether you believe in the Sapir-Whorf hypothesis. Do you....

NAT: The wha?

MELISSA: Sapir and Whorf were ethnolinguists. Well, actually Whorf was a petroleum engineer or something, but in his spare time he studied Amerindian languages. Their idea was that language structures thought and reality. That we think and perceive the way we do because our language makes us....

There's some evidence for it. For instance, a sociologist named Bloom did a study in Hong Kong, asking Chinese speakers various questions about alternate futures, like "If the government were to outlaw pens and pencils, how would you react?" The result was that Chinese speakers with little exposure to Western languages were unable to deal with his questions at all. They would just give answers like, "Well, the government wouldn't do that; that's ridiculous." Their language doesn't contain any abstract construction for counterfactuals, for saying "If X,..." when X may in fact be false. It's a conformist language, in other words: it doesn't contain any mechanisms for contradicting the established order of truth.

Whorf's big example was the Hopi model of space and time. Apparently the Hopi don't have any words for past, present and future, nor any grammatical tense system for distinguishing them. They lump the future together with the imaginary, and the past with the present. A fairly reasonable system, if you think about it. Whorf claimed that the way their language treats time is the same as the way they think about time, psychologically....

I guess it all makes sense, on a crude level; the real question is the extent of it. Do linguistic structures have only a slight effect on thought and perception, or do they have a major effect?

DR. Z: That's a very good question; I'm afraid I can't answer it....

One problem is that we don't really know how to measure the commonality between two different natural languages. Look at someone like Chomsky, who believes that all human languages are basically the same -- that they're gotten from some underlying universal language by a process of twiddling some kind of linguistic "parameters." You just take basic human language, answer a list of questions -- do modifiers go before nouns, or after them?, do we have gender markers for verbs, or not?, etc. -- and presto! you have a human language. If Chomsky's right and all our languages are basically the same, then the differences Whorf is pointing out would be pretty minor in the grand scheme of things, and you wouldn't expect them to lead to major differences in thought.... On the other hand, maybe human languages are really fundamentally different from each other, and the differences Whorf is pointing out are major.... I just don't know....

The basic point that language differences lead to thought and perception differences -- that's obvious. Our thought is guided by associative memory, and a lot of our associations are supplied by language. Language, by giving us associations, structures our mind. But how many of these linguistically-inspired associations really differ from one culture to another, that I don't know.

See, if you took a nonhuman species, say a dolphin or something, then you wouldn't expect the same linguistic framework to be there, and you'd definitely expect their different language to be tied up with their different thought patterns. There'd be no question about it, really.

What has to be totally rejected is the idea that language is separate from thought: that thought just proceeds on its own and then feeds its output to language. This is entirely ridiculous. It's obvious that, as you say, Melissa, language affects thought intensely....

MELISSA: Okay, I guess I see your point. There are two different questions: one is whether language affects thought and perception, the other is how fundamentally different the various human languages really are. But I don't see why the second question is so difficult to answer....

DR. Z: Maybe it isn't. I've never tried to answer it, and I've never read anyone address the issue in a way that really made sense to me.... Linguistics is a very confused science!

MELISSA: But I mean, as a writer, it's very clear that different idioms for expressing things lead to totally different perceptions on the part of the reader. The same actual events told in different styles, different tones, can give the reader a totally different picture in their head -- even if they'll never know exactly why. So here we have language constructing reality. This is just an imagined, fictional reality, admittedly.... But you've said that we construct our memory by a creative process -- and one can infer from this that, to some extent, we construct our subjective reality by a creative process. The person reading a novel isn't so different from the person listening to others speak and constructing their view of reality based on what they say. And that's what we do all the time, isn't it? We construct our collective, cultural reality together....

DR. Z: Yes, that's true. You have a decent analogy there.

I forget who it was that said, the goal of a novelist is to create a warped, twisted view of reality and render it in such a way that the reader will accept it as real. I think that was Vargas Llosa, the Peruvian novelist, who actually ran for President of Peru a few years ago.... He lost to a Japanese guy named Fujimori -- it was a very interesting story. There was a cholera epidemic in Peru, and no one was buying fish, and it was hurting the economy, so President Fujimori went on TV and said the cholera situation wasn't that serious. He urged everyone to eat lots of fish, and to make his point more convincing, he even ate a live fish on TV, in good Japanese style.... His broadcast reassured people somewhat -- and death by cholera increased dramatically! Talk about creating a warped, twisted reality and making people accept it as real.... It's awfully ironic!

NAT [laughing intensely]: Is that a true story??

DR. Z: It doesn't matter if it's true.... Yes, of course it is, I read it in the newspaper a few years back.... Anyway, believe it or not, there was a point in there somewhere.... You're awfully tolerant, Melissa! The point is that what a novelist does is not so different from what we all do, every day. We all rationalize like hell, constantly. We have a view of reality -- of the reality we'd like to live in -- and we talk ourselves into accepting that this reality is the real one. And we talk to others, portraying reality in this light. We convince ourselves that this false view of reality is real. But it's not clear to me how important the talking really is in this process, as opposed to just thinking and acting....

I guess the key thing is, you have to take a broader view of language than just talking. Language is really just systematic, rule-based ordering. Language is whenever you have a bunch of things that can be arranged in various patterns, and there are certain definite regularities to the patterns, which can be captured by a bunch of abstract rules that you can call a "grammar"....

NAT: But wait -- if you want to define the term that broadly, then everything is language. Is that really meaningful?

MELISSA: Well, a whole host of novelists and poets have written that everything is language. It was a meaningful feeling for them, at any rate.... These writers got so deep into the structure of language that they were able to see this structure everywhere, not just in written texts and speech but in physical objects, feelings, rivers, anything....

NAT: Okay, fine, it's a valid feeling, but what does that mean to anyone else? Suppose I have a feeling that the whole universe is a giant watermelon, and we're just the seeds? Who gives a flying potato?

MELISSA: Watermelons, potatoes, you're making me hungry.... Maybe we should break for a snack.

DR. Z: No, wait a moment. I think we're onto something here....

The interesting thing is, this intuition that everything is language is borne out by recent developments in chaos theory. I have a computer algorithm that operates on exactly this principle, that everything is language. Given any system that extends in space or changes over time, my program represents the abstract structure of this system as a language.... So these poets were right! Everything really is language....

The watermelon example wasn't a good one, Nat. A better example is -- you're a programmer -- what if you have a feeling that everything in the universe is just a computer? This is a valid feeling and, to a certain extent, it's a valid statement as well. It's called the Church-Turing Hypothesis....

NAT: I guess you're right, in a sense. According to automata theory, computers and languages are interchangeable. Every type of computer corresponds to a certain type of language -- the most expressive type of language it's capable of recognizing, of understanding. The most powerful computers correspond to phrase-structure languages like we use. Weaker types of computers correspond to languages without phrase structure, or without context-sensitive rules, or something....

So I can see some kind of connection there, I guess. But it still feels kind of fuzzy to me....
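
(One small concrete case of the correspondence Nat is gesturing at: a two-state machine can check a pattern like ababab..., but the language of n a's followed by exactly n b's is beyond any finite-state machine and needs unbounded memory -- here a single counter standing in for a pushdown stack. Both little recognizers are illustrative toys.)

    def finite_state_ab_star(s):
        state = 0                    # two states are all this machine ever has
        for ch in s:
            if state == 0 and ch == "a":
                state = 1
            elif state == 1 and ch == "b":
                state = 0
            else:
                return False
        return state == 0

    def counter_anbn(s):
        count, seen_b = 0, False     # the unbounded counter is the extra power
        for ch in s:
            if ch == "a":
                if seen_b:
                    return False
                count += 1
            elif ch == "b":
                seen_b = True
                count -= 1
                if count < 0:
                    return False
            else:
                return False
        return count == 0

    print(finite_state_ab_star("ababab"))   # True: within a finite machine's reach
    print(counter_anbn("aaabbb"))           # True: needs the counter to keep the tally
    print(counter_anbn("aaabb"))            # False: the counts don't balance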

MELISSA: You sort of lost me there with your computer talk, Nat. Sorry.

NAT: Don't worry about it; I'm sort of lost myself, that's the whole point....

MELISSA: Tell us, Dr. Z, about this program of yours that recognizes languages in everything -- this is fascinating. I'm not even hungry anymore.... Well, maybe a little bit.

NAT: I'll tiptoe out to the kitchen and grab us some watermelon.... Go on, Dr. Z, I'm still listening....

DR. Z: Well, it's really quite simple. A language is just a way of combining things in time. Suppose you have a system that changes over time -- a system of any kind! Its condition, its state at any given time may be very complicated, may be characterized by a huge number of different things, different quantities. The full range of possible conditions that the system might be in is called the "state space" of the system.... So, suppose you decide to categorize the state space, to divide it into a finite number of categories. Then the system's changing states over time can be represented as a series of categories. First the system was in Category 1, then it was in Category 2, then it was in Category 1 again, then Category 4, and so on....
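
(A minimal sketch of the procedure Dr. Z just described, using the logistic map as the "system of any kind": the numerical state gets coarse-grained into four named categories, and the trajectory becomes a string of category symbols. The map, the categories and the cut points are all illustrative choices, not anything from his program.)

    def logistic(x, r=3.9):           # a simple system changing chaotically over time
        return r * x * (1 - x)

    def categorize(x):                # carve the state space into four named regions
        if x < 0.25: return "A"
        if x < 0.50: return "B"
        if x < 0.75: return "C"
        return "D"

    x, symbols = 0.123, []
    for _ in range(60):
        symbols.append(categorize(x))
        x = logistic(x)

    print("".join(symbols))           # the system's history, written in its own alphabet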

MELISSA: Okay, let me see if I've got this so far. Suppose the system you're talking about is me. Now, I can be in all sorts of different conditions, right? My condition is characterized by a huge number of different properties.... On one level, you have every single particle in my body, with its own position and movement.... But even on a more physiological level, you have heart rate, various EEG waves, breathing, what I'm saying, how I'm moving, and so forth....

But I can put the various possible conditions of Melissa in a few classes -- if I want to. I can say Melissa is "happy" or "sad" or "ecstatic" or "wondrous" and so forth....

There are loads of different ways to categorize my conditions, though. I could also categorize Melissa as, say, "waking" versus "sleeping" -- or, "ordinary waking state" versus "sleeping state" versus "just having woken-up state," versus "stoned state" versus "tripping on acid state," and so forth....

Or I could do like in the hospital, "critical" versus "serious" versus "moderate" versus "healthy"....

The thing is, which categorization you choose is very important, right? It affects the way you think about yourself -- about the system, I mean. If I'm writing something and I decide to categorize a character as "ecstatic" as opposed to "healthy" or "waking," this evokes a totally different image in the reader's mind.... The category system creates the character. So if we carry forward the metaphor we were using before -- that we're all the authors of our realities -- the conclusion would be that our choice of category system creates our world....

NAT: We're getting back into that Whorf stuff again. I want to hear about his algorithm.

MELISSA: Oops -- my apologies. Sorry for leading off the track; I'm just trying to understand things in my own way.

NAT: And it's remarkable what a good job you're doing. I think you understand this stuff better than I do, at any rate....

MELISSA: I wish I did [giggles]. Now I'm distracting us. Go on, please, Dr. Z.

DR. Z: Right, okay.... So we have a system, and we've divided its state space into various categories. Now, as the system changes over time, we can monitor what category the system is in at each time. The system's pattern of change over time is represented as a certain series of category names, which is just like a series of letters or words in a language....

MELISSA: Okay, let's get back to my example of me. Suppose we divided my state space -- my conditions of being -- into four categories, based on the ideas of happy versus sad and excited versus relaxed.... So we have four categories

A -- happy and excited

B -- happy and relaxed

C -- sad and excited

D -- sad and relaxed

Right? Four categories. Now suppose I wrote down every half hour how I was feeling: which one, A, B, C or D. Then what I would get would be a long list of letters, like

AABCDBCADCCBBCDDACCCDCCDCAAABBC...

or something. Just a long list of letters. Not broken up into words or anything. What you're saying is that this list of letters is going to have some grammatical rules?

NAT: I can see that. Yeah. Suppose that, say, you rarely went from A to D or from D to A. Or suppose you rarely went from B to C or from C to B. That would be pretty reasonable. Who goes from happy and excited all the way to sad and relaxed? To sad and excited, maybe; or sad and relaxed....

Then these rules are like grammatical rules, of a sort, I guess. You'd have a rule that says, A has got to be followed by B or by C. Or at least, A will nearly always be followed by B or by C. Something like AD just wouldn't be a rule in your language!
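
(A throwaway sketch of how Nat's "rules" could actually be read off a mood diary: count which categories follow which, and flag the pairs that never occur as the forbidden combinations of the diary's grammar. The diary string is Melissa's made-up one from above, repeated a few times to stand in for months of entries.)

    from collections import Counter

    diary = "AABCDBCADCCBBCDDACCCDCCDCAAABBC" * 3
    pairs = Counter(diary[i:i + 2] for i in range(len(diary) - 1))

    for x in "ABCD":
        for y in "ABCD":
            if pairs[x + y] == 0:
                print(f"{x}{y} never occurs -- '{x} is never followed by {y}' looks like a rule")

    print("most common transitions:", pairs.most_common(3))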

MELISSA: These symbols of yours, these A, B, C and D, these are peculiar sorts of things. They're not quite like letters but they're not quite like words either.

DR. Z: Well, they don't have to be exactly like either letters or words. But they could be. It could be the case that certain short series of symbols tend to act as units -- I mean, as words. In the particular example you've cooked up, it doesn't work this way, that's all.... It could happen that the A, B, C and D only occurred in a few combinations, like say ABACAB, DABACA, BABA.... Then you could look for patterns in the occurrence of these combinations, these "words"....

NAT: Okay. But just saying A isn't followed by D doesn't give you much of a language.

MELISSA: But wait, Nat, you'll have more complicated rules, too. You certainly have rules like, if A is followed by C, then D is very likely to occur. In other words, ACD is probable, while ACA is unlikely. We've already ruled out ACB.... Having been happy and excited, then sad and excited, you're not that likely to become happy and excited again; much more probable that you'll get sad and relaxed.

DR. Z: Why is that? I don't quite follow you.

MELISSA: Well, if you're going from A to C, you're moving in a direction of constant excitation and decreasing happiness, right? Your excitement level is just cruising along, your happiness level is dipping down. This has brought you to C, to sad excitement. But if you're going to get out of C, you have to break one of the two patterns that brought you there: either break the habit of constant excitement, or break the pattern of decreasing happiness. Breaking the pattern of constant excitement will bring you to sad relaxation, or D. Breaking the pattern of decreasing happiness will bring you back to happy excitement. Breaking both patterns at once, which is pretty unlikely, would bring you straight to B....

It's just.... Well, it's much more likely you're going to break the pattern of constant excitement than to break the pattern of decreasing happiness. For one of them you just have to nudge something that's unchanging; for the other you actually have to reverse a definite trend....

NAT: Yes, Melissa, I see what you're thinking -- but that's just for you! You're not manic-depressive. A manic-depressive person would tend to have excitement stay high while happiness and sadness oscillate back and forth. A kind of periodic attractor, I guess you'd say. So for a manic-depressive person it would be ACA that occurred a lot. You'd have a lot of cycles ACACACA....

Well, I guess that's not quite clear either: you could have some manic-depressives who went something like ACDACDA.... Or maybe ACDCACDCA.... Getting really depressed, in a relaxed and subdued way, rather than hyped-up depressed. The manic-depressive girl I knew was definitely on the ACACACA... plan, though. With occasional dips into the D range, occasional occurrences of CDA or CDC.... I don't think she knew what B was, actually! Happy and relaxed -- it never occurred to her....

I don't know. The rules for what sequences are allowed are probably different for each type of person.... Or for each individual person, if you look at it finely enough....

MELISSA: Yeah, you're right of course. I guess if you charted these things you could tell what was going to happen with you -- when you were heading toward a bad situation of one kind or another. What the preconditions were. I mean, say a person was suicidal sometimes when they got sad and excited -- state C. Then if AC was a word in their grammar they would know to watch out when they were in A. And if ACDC were common while, say, ACAC wasn't, then they'd know to watch themselves especially carefully when they'd just come out of a sad-excited episode into a sad-relaxed episode ... as opposed to when they'd come out of sad-excited into happy-excited....

DR. Z: Yes, but you'd really need to chart yourself carefully to make that kind of prediction. You'd have to do like you said, write down your mood every half hour or something.

MELISSA: Oh, I don't know. Every day might do the trick. I've half a mind to try it.

NAT: Which half though -- that's the crucial point!...

What I'm thinking about is that these repeated patterns aren't the only kind of linguistic pattern.... They're just the start. You have to remember context-sensitive patterns. Like, A follows B, but only in contexts where A occurs between C and D. And...

MELISSA: OK, but the pattern you just said can be summarized as CABD being a rule while other things like CABB, AABD and so forth aren't. By talking about contexts, you're just giving a different way of talking about the same pattern....

NAT: Sure, OK. A rule with context is like a pattern among rules. A rule for making rules. You can say

CABB

AABD

CABB

BACC

...

and so on, a whole long list, or you can say something simple like: XABY is only a rule when X is C and Y is D.

DR. Z: Right. You can recognize rules, and then you can recognize rules in the list of rules, and so on....

There are all sorts of interesting things going on besides just repetitions.... In natural language you have something called "zeroing," where a word which is very very probable can be left out. Like in "The house I made," which should really be "The house that I made," only the word "that" is zeroed out because its probability in that context is so high that everyone knows it's supposed to go there. I always wonder if this kind of thing goes on in other complex systems, too.... It would seem to be a natural for any system concerned with having its states interpreted by other systems of reasonable intelligence....

Or, then you have transformation rules. The most obvious are movement rules like, whenever you have ABCDB, you can turn that into BCDAB. Meaning that any sequence involving ABCDB can just as well be a sequence involving BCDAB. The weird thing you run up against in linguistics is, you have situations like

ADCBD

ACBCC

ADBDD

ABDCB

being rules -- so you'd think the rule would be A X _ _ X ...

NAT: You mean it seems like the general rule is, A, followed by some thing X, followed by two other things, followed by X again, should always be allowed, should always happen.

DR. Z: Right, but for some strange reason the rule

ABCDB

that fits the pattern isn't a rule, it's not even allowed, instead you have to move the A over, you have some cockamamie thing like

BCDAB

as a rule instead.

NAT: You're talking about situations like the rules for why, where and how.... We say "I know where the person was" instead of "I know the person was where" -- in spite of the pattern set up by sentences like

I know the person was here

I know the person was there

I know the person was in Topeka

I know the person was higher

...

MELISSA: Okay, this is getting way too complicated for me. I get the point -- there are all kinds of complex linguistic patterns that you can get from looking at sequences of symbols.... These patterns tell you how systems change, they let you make predictions....

DR. Z: And even when a system is chaotic, you can still make predictions about it....

NAT: I don't quite get that. Chaos is supposed to be unpredictability, right?

MELISSA: No, I get it. Chaos is just unpredictability in detail. But you can still make coarse-level predictions.

Think about my moods: anyone will tell you they're unpredictable. You can't tell what exact mood I'm going to be in three days from now, not with any degree of accuracy at all. But if you divide things into categories like I was doing, then you can make some rough predictions about what's going to happen. And guessing the language rules from the category sequence lets you make better and better predictions....
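
(A quick numerical illustration of Melissa's point: the detailed trajectory of a chaotic map is effectively unpredictable, but a table of category-to-category transition counts, learned from the first half of its symbol string, predicts the next category noticeably better than chance. The toy logistic-map setup is the same illustrative choice as before.)

    from collections import Counter, defaultdict

    def logistic(x, r=3.9):
        return r * x * (1 - x)

    def categorize(x):
        return "ABCD"[min(3, int(x * 4))]

    x, symbols = 0.123, []
    for _ in range(4000):
        symbols.append(categorize(x))
        x = logistic(x)

    half = len(symbols) // 2
    table = defaultdict(Counter)
    for a, b in zip(symbols[:half], symbols[1:half]):
        table[a][b] += 1                       # learn what usually follows what

    def predict(a):
        return table[a].most_common(1)[0][0] if table[a] else "?"

    tests = list(zip(symbols[half:-1], symbols[half + 1:]))
    correct = sum(predict(a) == b for a, b in tests)
    print(f"next-category accuracy: {correct / len(tests):.2f}  (pure chance would be 0.25)")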

NAT: Geez, Melissa, I think you're in the wrong profession. You'd make a wicked mathematical psychologist....

MELISSA: Wicked mathematical psychologist, eh? Well, my last boyfriend told me I was wicked. One out of three's not bad, huh?

DR. Z: Okay, so that's how you predict a system over time using language. Every system that changes over time is hiding a number of languages in it. To get them out you just have to categorize. And as you point out, every person, every culture categorizes differently....

Systems that extend over space have languages too, actually: it doesn't have to be one-dimensional....

MELISSA: Whorf did some work with Mayan ceremonial pictures; he claimed they were a two-dimensional language.... That the meaning of each picture was modified by the other pictures that occurred above and below it.

DR. Z: I don't know that work, but it sounds very plausible.... Instead of patterns like ABC, you have patterns like

A

BDC

B

or something.... Repeated images. Sure, I can see it. The whole human visual system relies on recognizing patterns like this. It's a tricky business though; it requires way more computation than doing what my program does, just dealing with one-dimensional patterns of change over time....

MELISSA: I imagine you could even use grammatical rules to generate new kinds of art.

NAT: Hmmm.... From the language of chaos to the language of art, eh? Sounds cool, yeah. You could probably make freaky T-shirts that way....

DR. Z: Okay, so we can recognize these languages in systems. But the next step is to see that systems can recognize languages in each other. That's what's going on inside the brain! Each system of neurons is following a complex chaotic attractor, and the attractor has a certain geometric structure that comes out as a language. Then other systems of neurons recognize this language and base their actions on the predictions and discoveries it brings....

NAT: That's an interesting idea, Dr. Z. But what does it have to do with the design for an artificial brain that you were telling me about before? You didn't have language in there anywhere.

DR. Z: Well, we had these processes that were recognizing patterns in each other. Maybe some of them are recognizing patterns in each other's dynamical behavior, using this kind of language-recognition method we've been talking about.... I mean, the processes can't see inside each other, in general, right? What they can see is what the other processes are doing. So recognizing patterns in each other means recognizing patterns in each others' behavior. It is the same thing, I've just put it in a different language.

NAT: I guess you're right, yeah.... You're a sly devil, Dr. Z.

DR. Z: I like to think so....

NAT: Okay, right. And we can take this one step further. Suppose you want to make a machine that will act like a person -- say, like Melissa. A mechanical Melissa. All we have to do is to recognize the patterns in Melissa, the languages implicit in Melissa, and then build a system manifesting these same languages.

MELISSA: Yes, but you have to be careful. You don't want it to just spout back patterns I've expressed in the past. I'm creative! I keep on doing new things. You want to make a creative system that has in its repertoire the things I've done before. That system wouldn't really be me though, it would be its own thing, just sort of like me....
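
(A playful sketch of Nat's "mechanical Melissa," with Melissa's caveat built in: instead of replaying her old mood diary, the program learns the transition tendencies implicit in it -- one of the languages implicit in Melissa -- and then improvises new sequences in that same language. The diary string is her made-up one from earlier; the whole thing is a first-order Markov toy, far short of a genuinely creative system.)

    import random
    from collections import Counter, defaultdict

    random.seed(4)
    diary = "AABCDBCADCCBBCDDACCCDCCDCAAABBC"

    table = defaultdict(Counter)
    for a, b in zip(diary, diary[1:]):
        table[a][b] += 1                       # the habits hidden in the diary

    def mechanical_melissa(length=30):
        state = random.choice(diary)
        out = [state]
        for _ in range(length - 1):
            followers = table[state]
            choices, weights = zip(*followers.items())
            state = random.choices(choices, weights=weights)[0]   # improvise, weighted by habit
            out.append(state)
        return "".join(out)

    print(mechanical_melissa())   # new behavior in the same grammar -- not a replay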

NAT: Right. This is getting too weird. I think we should call it quits for the night. This has been really good, though, Dr. Z. I've got some fairly concrete ideas out of this. I might even try to code up some of your ideas, just you wait....

DR. Z: Well, that would be great....

NAT: Oh wait, Dr. Z! I nearly forgot -- there's something we wanted to show you. Melissa....

MELISSA [taking the bug out of her purse]: Oh yes! That bug! Look at this! It's a metal bug we found on the pavement on the way over here.

DR. Z [taking it out of her hand]: That's not a bug, that's a nanospy. Look at that! It's been recording our conversation. Someone sent that thing to follow you; they knew you were coming to my house.... They're trying to get something they think I have.... But they're wrong: all I have are a bunch of theories, I haven't developed anything practical. The CIA is really desperate.

MELISSA: You really think it's the CIA?

DR. Z: I don't know. But I've had things like this happen before. That's why I've become so secretive. I can't talk about it, people would just think I was crazy -- but now you see....