Structure of Intelligence -- Copyright Springer-Verlag © 1993 Back to Structure of Intelligence Contents
Chapter 1:

# Mind and Computation

## 1.0 Rules

What does it mean to tell someone exactly what to do?

Sixty years ago no one could give this query a plausible response. Now, however, we have a generally accepted definition: a set of instructions is exact if some computer can follow them. We have a word, algorithm, which is intended to refer to a completely exact set of instructions. This is impressively elegant. But there's a catch -- this approach is meaningful only in the context of a theory explaining exactly what a computer is. And it turns out that this problem is not so straightforward as it might seem.

Note that one cannot say "a set of instructions is exact if every computer can follow them." Obviously, computers come in different sizes and forms. Some are very small, with little memory or processing power. Some, like the computer chips installed in certain televisions and cars, are dedicated to one or two specific purposes. If there were little or nothing in common between the various types of computers, computer science would not deserve the label "science." But it seems that many computers are so powerful that they can simulate any other computer. This is what makes theoretical computer science possible. Computers of this sort are called "universal computers," and were first discussed by Alan Turing.

What is now called the Turing machine is the simple device consisting of:

1) a processing unit which computes according to some formula of Boolean algebra

2) a very long tape divided into squares, each square of which is marked either zero or one

3) a tape head which can move, read from and write to the tape

For instance, the processing unit might contain instructions like:

If the tape reads D and -A+(B-C)(D+E)=(R-J), then move tape to the left, call what is read C, move the tape two to the right, and write (D-B)C on the tape.

The Boolean formula in the processing unit is the "program" of the Turing machine: it tells it what to do. Different programs lead to different behaviors.

Assuming that the tape head cannot move arbitrarily fast, it is clear that any specific program, running for a finite time, can only deal with a finite section of the two tapes. But theoretically, the tapes must be allowed to be as long as any program will require. Thus one often refers to an "infinitely long" tape, even though no particular program will ever require an infinitely long tape in any particular situation.

At first, Turing's colleagues were highly skeptical of his contention that this simple machine was capable of executing any exact sequence of instructions. But they were soon convinced that the behavior of any conceivable computer could be simulated by some Turing machine, and furthermore that any precise mathematical procedure could be carried out by some Turing machine. To remove all doubt, Turing proved that a certain type of Turing machine, now called a "universal Turing machine", was capable of simulating any other Turing machine. One merely had to feed the universal Turing machine a number encoding the properties of Turing machine X, and then it would act indistinguishably from Turing machine X.

PUT THE CUP ON THE TABLE

Most people who have studied the literature would concur: no one has been able to come up with a set of instructions which is obviously precise and yet cannot be programmed on a Turing machine. However, agreement is not quite universal. For instance, the philosopher Hubert Dreyfus (1978) has written extensively about the inability of existing computers to see, move around, or make practical decisions in the real world. From his point of view, it is revealing to observe that, say, no Turing machine can follow the instruction: put the cup on the table.

The problem is not, of course, that a Turing machine doesn't have any way to pick up a cup. One could easily connect a robot arm to a computer in such a way that the output of the computer determined the motions of the robot. This is the state of the art in Japanese factory design. And even if current technology were not up to the task, the fact that it could be done would be enough to vindicate Turing's claim.

But could it, actually, be done? What is really involved here? When I tell someone to "put the cup on the table," I am really telling them "figure out what I am talking about when I say 'the cup' and 'the table' and 'on', and then put the cup on the table." Even if we give a computer a robot eye, it is not easy to tell it how to locate a cup lying in the middle of a messy floor. And it is evenharder to tell a computer how to distinguish a cup from a bowl. In fact, it is hard to tell a person how to distinguish a cup from a bowl. This is a matter of culture and language. We simply learn it from experience.

One might take all this as proof that "put the cup on the table" is not actually a precise instruction. Or, on the other hand, one might maintain that a Turing machine, provided with the proper program, could indeed follow the instruction.     But there is an element of circular reasoning in the first alternative. "Put the cup on the table" is very precise to many people in many situations. To say that it is not precise because a Turing machine cannot understand it is to define precision in terms of the Turing machine, in contradiction to common sense.     And the second alternative presupposes a great deal of faith in the future of artificial intelligence. The hypothesis that the Turing machine can simulate any computer and execute any set of precise mathematical instructions is very well established. But the hypothesis that the Turing machine can execute any set of precise instructions is a little shakier, since it is not quite clear what "precision" is supposed to mean.

In sum: there is still plenty of room for philosophical debate about the meaning of the Turing machine. In the Introduction I mentioned Deutsch's result that according to quantum theory any finite physical system can be simulated by a quantum computer. Coupled with the fact that a quantum computer cannot compute any functions besides those which a Turing machine can compute, this would seem to provide a fairly strong argument in favor of Turing's hypothesis. But, of course, physics can never truly settle a philosophical question.

BRAIN AS TURING MACHINE

In a paper of legendary difficulty, McCulloch and Pitts (1943) attempted to demonstrate that the human brain is a universal Turing machine. Toward this end, they adopted a greatly oversimplified model of the brain, ignoring the intricacies of neurochemistry, perception, localization, and the like. The McCulloch-Pitts brain is a network of dots and lines, each dot standing for a neuron and each line standing for a connection between neurons. It changes in discrete jumps: time 0, then time 1, then time 2, and so on. Each neuron operates according to "threshold logic": when the amount of charge contained in it exceeds a certain threshold T, it sends all its charge out to the neurons it is connected to. What McCulloch and Pitts proved is that a universal Turing machine can be constructed using a neural network of this sort instead of a program.

Some neuroscientists have protested that this sort of "neural network" has nothing to do with the brain. However, this is simply not the case. It is clear that the network captures one of the most prominent structures of the brain. Precisely what role this structure plays in the brain's activity remains to be seen. But it is interesting to see how tremendously powerful this one structure is, all by itself.

As mentioned above, there have been numerous efforts to form biologically realistic neural network models. One approach which has been taken is to introduce random errors into various types of simulated neural networks. This idea has led to a valuable optimization technique called "simulated annealing" (Aarts et al 1987), to be considered below.

## 1.1 Stochastic and Quantum Computation

When noise is added to the McCullough-Pitts network, it is no longer a Turing machine. It is a stochastic computer -- a computer which involves chance as well as the precise following of instructions. The error-ridden neural network is merely one type of stochastic computer. Every real computer is a stochastic computer, in the sense that it is subject to random errors. In some situations, randomness is a nuisance; one hopes it will not interfere too much with computation. But in other situations, chance may be an essential part of computation. Many Turing machine algorithms, such as Monte Carlo methods in numerical analysis, use various mathematical ruses to simulate stochasticity.

As I will argue later, one may view randomness in the neural network as a blessing in disguise. After all, one might well wonder: if the brain is a computer, then where do new ideas come from? A deterministic function only rearranges its input. Is it not possible that innovation involves an element of chance?

One may define a stochastic Turing machine as a computer identical to a Turing machine except that its program may contain references to chance. Forinstance, its processing unit might contain commands like:

If the tape reads D and -A+(B-C)(D+E)=(R-J), then move tape to the         left with probability 50% and move it to the right with probability 50%,         call what is read C, move the tape two to the right, write (D-B)Con the         tape with probability 25% and write C on the tape with probability 75%.

One may construct a theory of stochastic Turing machines parallel to the ordinary theory of computation. We have seen that a universal Turing machine can follow any precise set of instructions, at least in the sense that it can simulate any other computer. Similarly, it can be shown that there is a universal stochastic Turing machine which can simulate any precise set of instructions involving chance operations.

QUANTUM COMPUTATION

If the universe were fundamentally deterministic, the theory of stochastic computation would be superfluous, because there could never really be a stochastic computer, and any apparent randomness we perceived would be a consequence of deterministic dynamics. But it seems that the universe is not in fact deterministic. Quantum physics tells us that chance plays a major role in the evolution of the physical world. This leads us to the question: what kind of computer can simulate any physical system? What kind of computer can follow any precise set of physical instructions?

It turns out that neither a Turing machine nor a stochastic Turing machine has this property. This puts the theory of computation in a very uncomfortable situation. After all, the human brain is a physical system, and if computers cannot simulate any physical system, there is no reason to simply assume that they can simulate the human brain. Perhaps they can, but there is no reason to believe it. Clearly it would be desirable to design a computer which could simulate an arbitrary physical system. Then we would have a much better claim to be talking about computation in general.

As mentioned above, D. Deutsch (1985) has taken a large step toward providing such a computer. He has described the quantum Turing machine, which according to the laws of quantum physics can simulate the behavior of any finite physical system within an arbitrarily small degree of error. It can simulate any Turing machine, and any stochastic Turing machine, with perfect accuracy. Of course, the rules of quantum physics may be revised any day now; there are a number of pressing problems. But Deutsch's idea is a major advance.

There is much more to be said on the topic of quantum computation. But for now, let us merely observe that the question "what is a computer?" is hardly resolved. It may never be. Various abstract models may shed light on differentissues, but they are never final answers. In the last analysis, "precise instructions" is just as elusive a concept as "intelligence" or "mind."

## 1.2 Computational Complexity

Computational complexity theory, also called algorithmic complexity theory, seeks to answer two different kinds of questions: "How hard is this problem?", and "How effective is this algorithm at solving this problem?". A number of difficult issues are involved here, and it is not possible to delve into them deeply without sophisticated mathematics. Here we shall only scratch the surface.

Questions of computational complexity are only meaningful in the context of a general theory of computation. Otherwise one can only ask "How hard is this problem for this computer?", or "How hard is this problem for this particular person?". What lets us ask "How hard is this problem?", without any reference to who is actually solving the problem, is a theory which tells us that problems are basically just as hard for one computer as for another. Here as in so many other cases, it is theory which tells us what questions to ask.

According to the theory of Turing machines, any sufficiently powerful computer can simulate any other computer. And this is not merely a theoretical illusion. In practice, computers such as PCs, mainframes and supercomputers are highly flexible. An IBM PC could be programmed to act just like a MacIntosh; in fact, there are software packages which do something very close to this. Similarly, a MacIntosh could be programmed to act just like an IBM. Turing proved that there is a program which tells a computer, given appropriate information, how to simulate any other computer. Therefore, any computer which is powerful enough to run this program can act as a universal Turing machine. If it is equipped with enough memory capacity -- e.g. enough disk drives -- it can impersonate any computer whatsoever.

True, this universal simulation program is very complex. But if a problem is sufficiently difficult enough, this doesn't matter. Consider the problem of sorting a list of numbers into increasing order. Suppose computer A is capable of solving this problem very fast. Then computer B, if it is sufficiently powerful, can solve the problem by simulating computer A. If the problem is sorting the list {2,1,3}, then this would be a tremendous effort, because simulating A is vastly more difficult than sorting the list {2,1,3}. But if the list in question is a billion numbers long, then it's a different story. The point is that lists of numbers can get as long as you like, but the complexity of simulating another computer remains the same.

Let us make this example more precise. Assume that both A and B have an unlimited supply of disk drives -- an infinite memory tape -- at their disposal. Suppose that the program for simulating computer A is so slow that it takes computer B 10 time steps to simulate one of computer A's time steps. Supposealso that computer A is capable of sorting a list of n numbers in n2 time steps. That is, it can sort 10 numbers in 100 time steps, 100 numbers in 10000 time steps, and so on. Assume that computer B is not quite so bright, and it has a sorting program built into its hardware which takes n3 time steps to sort a list of n numbers.

Then, if B were given a list of 3 numbers, its hardware could sort it in 33=27 time steps. If it tried to sort it by simulating A, it would take 10(32)=90 time steps. Clearly, it should rely on its built-in hardware. But if B were given a list of 10 numbers, it would take 103=1000 steps to sort it. If it tried to sort the list by simulating A, it would take 10(102) time steps -- exactly the same amount of time. And if B were given a list of 1000 numbers, it would take 10003=1,000,000,000 steps to sort it using its hardware, and only 10(10002) =10,000,000 steps to sort it by simulating A. The longer the list is, the more useful is the capacity for simulation, and the less useful is the built-in hardware.

The point is that as the size of the problem, n, gets bigger and bigger, the differences between computers become irrelevant. It is worth being a little more rigorous about this. Take any type of problem, and assign to each instance of it a "size" n. For example, if the problem is sorting lists of numbers, then each instance is a list of numbers, and its size is its length. Let A(n) denote the longest amount of time which computer A requires to solve any problem instance of size n. Let B(n) denote the longest amount of time which computer B requires to solve any problem instance of size n. Assume that the time required to solve an instance of the problem increases as n increases (just as the time required to sort a list of n numbers increases as n increases). Then it follows that the bigger n gets, the less significant is the difference between A(n) and B(n). Mathematically, we say that as n goes to infinity, the ratio A(n)/B(n) goes to 1.

All this follows from the assumption that any sufficiently powerful computer can simulate any other one, by running a certain "universal Turing machine" program of large but fixed size.

AVERAGE-CASE ANALYSIS

Note that the quantity A(n) is defined in terms of "worst-case" computation. It is the longest that computer A takes to solve any problem instance of size n. Any computer worth its salt can sort the list {1,2,3,4,5,6,7,8,9,10} faster than the list {5,7,6,4,10,3,8,9,2,1}. But A(n) ignores the easy cases. Out of all the possible instances, it only asks: how hard is the hardest?

For some applications, this is a useful way to look at computation. But not always. To see why, consider the following well-known problem. A salesman, driving a jeep, must visit a number of cities in the desert. There are no mountains, rivers or other obstructions in the region. He wants to know what is the shortest route that goes through all the different cities. This is known as the Traveling Salesman Problem. Each specific instance of the problem isparticular collection of cities or, mathematically speaking, a set of points in the plane. The size of an instance of the problem, n, is simply the number of cities involved.

How hard is this problem? When the data is presented pictorially, human beings can solve it pretty well. However, we must remember that even if Maria is exceptionally good at solving the problem, what Maria(n) measures is the longest it takes Maria to arrive at the correct solution for any collection of n cities. No human being does well according to this strict criterion. We do not always see the absolute shortest path between the n cities; we often identify a route which is close to correct, but not quite there. And we sometimes miss the mark entirely. So we are not very good at solving the Traveling Salesman Problem, in the sense that there are instances of the problem for which we get the answer wrong or take a long time to get to the answer. But we are good at it in the sense that most of the time we get reasonably close to the right answer, pretty fast. There are two different notions of proficiency involved here.

The simplest way to solve the Traveling Salesman problem is to list all the possible paths between the cities, then compare all the lengths to see which one is the shortest. The problem is that there are just too many paths. For instance, if there are 5 cities, then there are [4x3x2]/2 = 12 paths. If there are 10 cities, then there are [9x8x7x6x5x4x3x2]/2 = 181440 paths. If there are, say, 80 cities, then there are more paths than there are electrons in the universe. Using this method, the number of steps required to solve the Traveling Salesman problem increases very fast as the size of the problem increases. So, given a large Traveling Salesman problem, it might be better to apply erratic human intuition than to use a computer to investigate every possible path.

Let's consider a simple analogy. Suppose you run a bank, and you have three loan officers working for you. Officer A is very methodic and meticulous. He investigates every case with the precision of a master detective, and he never makes a mistake. He never loans anyone more than they can afford. Everyone he approves pays back their loans, and everyone he turns down for a loan would not have paid it back anyway. The only problem is that he often takes a long time to determine his answer. Officer B, on the other hand, works entirely by intuition. He simply looks a person over, talks to them about golf or music or the weather, and then makes his decision on the spot. He rejects a some people who deserve loans, and he gives some people more or less money than they can afford to pay back. He gives loans to a few questionable characters who have neither the ability nor the inclination to pay the bank back.

Suppose that, although you really need both, you have been ordered to cut back expenses by firing one of your loan officers. Which one should go? At first you might think "Officer B, of course." But what if you have a lot ofmoney to lend, and a great many people demanding loans? Then A might be a poor choice -- after all, B will serve a lot more customers each month. Even though there are some cases where A is much better than B, and there are many cases where A is a little better than B, the time factor may tip the balance in B's favor.

You may be thinking "Well, a real bank executive would find someone who's both fast and accurate." In the case of the Traveling Salesman problem, however, no one has yet found an algorithm which finds the exact shortest path every time much faster than the simple method given above. And it seems likely that no such algorithm will ever be discovered. The Traveling Salesman problem and hundreds of other important problems have been shown to be "NP-complete", which means essentially that if there is a reasonably fast algorithm for solving any one of them, then there is a reasonably fast algorithm for solving all of them. Many mathematicians believe that the question of whether such algorithms exist is undecidable in the sense of Godel's Incompleteness Theorem: that there's no way to prove that they do, and there's no way to prove that they don't.

Now, we have discovered algorithms which solve the Traveling Salesman problem faster than people, and on the average come up with better answers (Peters, 1985). But there are still some collections of cities for which they give the wrong answer, or take a ridiculously long time to solve. In the case of the Traveling Salesman problem, it seems that there is no point in looking for algorithms which solve the problem exactly, every time. All the algorithms which do that are just too slow. Rather, it seems to be more intelligent to look for algorithms that solve the problem pretty well a lot of the time.

It turns out that most of the mathematical problems involved in thought and perception are a lot like the Traveling Salesman problem. They are "NP-complete". So when, in later chapters, we discuss the algorithms of thought, we shall virtually never be discussing algorithms that solve problems perfectly. The relevant concept is rather the PAC algorithm -- the algorithm which is Probably Approximately Correct.

PARALLELISM

One interesting aspect of the McCullough-Pitts neural network is the way it does many things at once. At every time step, all the neurons act. The original formulation of the Turing machine was not like that; it only did one thing at a time. It moved the tapes, then looked in its memory to see what to do next. Of course, the McCullough-Pitts network and the original Turing machine are fundamentally equivalent; anything one can do, so can the other. But the McCullough-Pitts network will, in most cases, get things done faster.

The computers in popular use today are like the original Turing machine: they only do one thing at a time. This is true of everything from PCs to huge mainframe computers -- Cybers, VAXs and so forth. They are serialcomputers. Some supercomputers and special-purpose research computers, however, can work in parallel: they can do up to hundreds of thousands of things at once. The advantage of parallelism is obvious: speed. By using a parallel computer, one trades off space for time.

There are many different kinds of parallel computers. Some are so-called single-instruction machines. They can do many things at once, as long as these things are all the same. For instance, a typical single-instruction machine could multiply fifty numbers by four all at the same time. But it might not be able to multiply one number by four at the same time as it added six to another number.      Multiple-instruction machines are more interesting, but also more difficult to build and to program. A multiple-instruction parallel computer is like a bunch of serial computers connected to each other. Each one can execute a different program, and communicate the results of its computation to certain others. In a way, it is like a society of serial computers. Thinking Machines Corporation, in Cambridge, Massachusetts, has manufactured a number of powerful multiple-instruction parallel computers called Connection Machines. They are now being used in science and industry -- for, among other things, modeling the behavior of fluids, analyzing visual data, and generating computer graphics.

Why is all this relevant? Some may dispute the neurophysiological relevance of the McCullough-Pitts model and its contemporary descendants. But everyone agrees that, if the brain is a computer, it must be a parallel computer. The brain contains about 100 billion neurons, all operating at once, and besides that it is continually swirling with chemical activity. The diversity of its activity leaves little doubt that, if it is indeed a computer, it is a multiple-instruction parallel computer. This is the intuition behind the recent spurt of research in parallel distributed processing.

In Chapter 11 I will take this one step further and argue that the brain should be modeled as a multiple-instruction parallel quantum computer. By then, it will be clear just how different such a computer is from today's serial computers. We are talking about a computer which does billions of different things at once and incorporates a huge amount of chance into its operations. As we shall see later, it is a computer whose state is not completely measurable by any sequence of physical observations. It is a computer which, in a physically precise sense, plays a significant role in the continual creation of the universe. It could be argued that a computer with all these properties should not be called a "computer". But, mathematical theories aside, the intuitive concept of computation has always been somewhat fuzzy. As warned in the Introduction, the limitations of present-day computers should not be taken as fundamental restrictions on the nature of computation.

## 1.3 Network, Program or Network of Programs?

Throughout history, philosophers, scientists and inventors have arguedprofusely both for and against the possibility of thinking machines. Many have also made suggestions as to what sort of general strategy one might use to actually build such a machine. Only during the last half-century, however, has it become technically possible to seriously attempt the construction of thinking machines. During this period, there have emerged two sharply divergent approaches to the problem of artificial intelligence, which may be roughly described as the "neural network approach" and the "programming approach". Cognitive science has played an important role in the development of the latter, for obvious reasons: cognitive science analyzes mental processes in terms of simple procedures, and simple procedures are easily programmable.

What I roughly label the "neural network approach" involves, more precisely, the conception, construction and study of electric circuits imitating certain aspects of the electrical structure of the brain, and the attempt to teach these circuits to display behavior similar to that of real brains. In the late 1940s and the 1950s, no other approach to AI was so actively pursued. Throughout the 1960s, it became increasingly apparent that the practical success of the neural network approach was by no means imminent -- fairly large neural networks were constructed, and though the results were sometimes interesting, nothing even vaguely resembling a mind evolved. The rapid advent of the general-purpose digital computer, among other factors, led researchers in other directions. Over the past decade, however, there has been a tremendous resurgence of interest in neural networks.

The fundamental tenet of the neural network approach is that certain large, densely interconnected networks of extremely simple but highly nonlinear elements can be trained to demonstrate many or all of the various activities commonly referred to as intelligence. The inspiration for this philosophy was a trend in neuroscience toward the modelling of the brain as a network of neurons. The dynamics of the individual neuron was understood by Hodgkin and Huxley in 1955, although recent investigations have led to certain modifications of their analysis. Unable to mimic the incredible complexity of chemical interaction which underlies and subtly alters the operation of a biological network of neurons, and possessing few ideas as to what restrictions might be placed on the elements or structure of a network in order to encourage it to evolve intelligence, early researchers simply constructed model networks of simulated neurons and tried to teach them.

Each of the neurons of such a network is connected to a small set of other neurons in such a way that it can input charge to them. The charge which it sends to them at a given time is a function of the amount of charge which it contains as well as, possibly, other factors. Usually the function involved is a threshold function or a continuous approximation thereof. Some researchers actually built networks of simulated neurons; others merely simulated entire networks on general-purpose computers, sometimes including nontrivial physical aspects of the neural network (such as imperfect conductance of connections, and noise).

The first problem faced by neural network researchers was the fact that asimple network of neurons contains no obvious learning device. Some thought that the ability to learn would spontaneously evolve; most, however, implemented within their networks some rule for adapting the connections between neurons. The classical example is the Hebb rule (Hebb, 1949): when a connection is used, its resistance is decreased (i.e. more of the charge which is issued into it actually comes out the other end; less is lost in transit). This may be interpreted in many different ways, but it is clearly intended to serve as a primitive form of analogy; it says "this connection has been used before, so let us make it easier to use it again." Whether the brain works this way we are not yet certain. Various modifications to the Hebb rule have been proposed, mostly by researchers thinking of practical algorithmic development rather than biology (Rumelhart and McClelland, 1986)

Neither the failures nor the successes of this approach have been decisive. Various networks have been successfully trained to recognize simple patterns in character sequences or in visual data, to approximate the solutions of certain mathematical problems, and to execute a number of important practical engineering tasks. On the theoretical side, Stephen Grossberg (1987) and others have proven general theorems about the behavior of neural networks operating under a wide class of dynamics. And in various particular cases (Hopfield, 1985), it has been proved in what sense certain neural networks will converge to approximate solutions to certain problems. But it must nonetheless be said that there exists no empirical or theoretical reason to believe that neural networks similar to those hitherto designed or studied could ever be trained to possess intelligence. There is no doubt that researchers into the neural network approach have demonstrated that disordered circuits can be trained to demonstrate various types of adaptive behavior. However, it is a long way from adaptation to true intelligence.

It is clear that the "neural networks" hitherto produced involve such drastic oversimplifications of brain structure that they must be considered parallel processors of a fundamentally different nature. In fact, most contemporary practitioners of the neural network approach are quite aware of this and continue their labors regardless. Such research is important both practically and theoretically. But it is connected only indirectly with the study of the brain or the design of thinking machines. For this reason many neural network researchers prefer the term "parallel distributed processing" to "neural networks."

By the 1970s, the neural network approach had been almost entirely supplanted by what I shall call the programming approach: the conception, study and implementation on general-purpose computers of various "artificial intelligence" algorithms. Most such algorithms consist of clever tricks for approximating the solutions of certain mathematical problems (usually optimization problems) thought to reflect important aspects of human mental process. A few approach closer to the real world by applying similar tricks to the execution of simple tasks in computer-simulated or carefully controlled environments called "microworlds". For example, a famous program treats theproblem of piling polyhedral blocks on a flat floor.

In the early days of the programming approach, AI programmers were routinely predicting that a truly intelligent computer program would be available in ten years (Dreyfus, 1978). Their optimism is quite understandable: after all, it took computers only a couple of decades to progress from arithmetic to expert chess, competent vision processing, and rudimentary theorem proving. By the late 1980s, the programming approach had succeeded in creating algorithms for the practical solution of many difficult and/or important problems -- for instance, medical diagnosis and chess. However, no one had yet written an AI program applicable to two widely divergent situations, let alone to the entire range of situations to which human intelligence is applicable. Enthusiasm for AI programming declined.

Nearly all contemporary researchers have accepted this and are aware that there is no reason to believe true intelligence will ever be programmed by methods remotely resembling those currently popular. The modern practice of "artificial intelligence", has little to do with the design or construction of truly intelligent artifices -- the increasingly popular term "expert systems" is far more descriptive, since the programs being created are never good at more than one thing. Feeling that the programming approach is reaching an ill-defined dead-end, many researchers have begun to look for something new. Some have seized on parallel processing as a promising possibility; partly as a result of this, the neural network approach has been rediscovered and explored far more thoroughly than it was in the early days. Some of those who found "neural networks" absurd are now entranced with "parallel distributed processing", which is essentially the same thing.

The programming approach is vulnerable to a critique which runs parallel to the standard critique of the neural network approach, on the level of mind instead of brain. The neural network approach grew out of a model of the brain as a chaotically connected network of neurons; the programming approach, on the other hand, grew out of a model of the mind as an ingenious algorithm. One oversimplifies the brain by portraying it as unrealistically unstructured, as implausibly dependent on self-organization and complexity, with little or no intrinsic order. The other oversimplifies the mind by portraying it as unrealistically orderly, as implausibly dependent upon logical reasoning, with little or no chaotic, deeply trial-and-error-based self-organization.

As you have probably guessed, I suspect that the brain is more than a randomly connected network of neurons, and that the mind is more than an assemblage of clever algorithm. I suggest that both the brain and the mind are networks of programs. Networks of automata.

This attitude is not exactly a negation of the neural network or programming approaches to AI. Certainly the primary aspect of structure of the brain is the neural network; and certainly the mind is proceeding according to some set of rules, some algorithm. But these assertions are insufficiently precise; they also describe many other structures besides minds and the organs which give rise to them. To deal with either the brain or the mind, additional hypotheses arerequired. And I suspect that neither the neural network nor the programming approach is up to the task of formulating the appropriate hypotheses.