
CHAPTER 4

NEURAL NETWORKS

We have discussed evolution in ecosystems, from a self-organizational rather than a strict Darwinist point of view. It is time to move on to our main topic of interest: the structure of the brain and the mind. In this chapter we shall begin our trek in this direction, with a crude yet tantalizing model of the brain: the formal neural network.

Contemporary brain science is extremely fragmented. Before one can even attempt to understand it, one must accept that every complex system is structured on many different levels, and that, to a certain extent, each level may be studied independently. Through neuroscience we have achieved a solid understanding of the lowest level of brain structure: individual nerve cells and the chemical reactions mediating their activity. And we have attained a fairly good understanding of the highest level of brain structure: global organization of the brain. For instance, we know that the neocortex is the seat of abstract thought, that the cerebellum directs motor control, etc. There are still a number of important mysteries here (left/right polarity, for instance), but these mysteries are being studied in a systematic and effective way.

On the other hand, neuroscientists have made far less progress with the middle levels of structure. Their greatest successes in this regard have involved building up from the bottom level; in this way they have arrived at some useful models of those parts of the brain which deal with low-level perception and motor control (Churchland, 1989; Edelman, 1987). However, the really crucial question still remains basically unaddressed: how do the microscopic interactions of chemicals and neurons combine to form abstract thoughts?

At present, therefore, if one wishes to construct a general intuitive picture of brain function, one's only recourse is to somehow combine the information supplied by neuroscience with indirect, non-neuroscientific data. My goal in this chapter, and the two which follow it, is exactly this. On the one side, I take Edelman's theory of Neural Darwinism, which states that perception and motor control work by natural selection on the level of neuronal groups. And on the other side, I take the theory of computational psychology outlined in The Structure of Intelligence (Goertzel, 1992). By combining these two theories with the theory of evolution given in Chapter 3 and the theory of genetic algorithms (Holland, 1975), we shall obtain a coherent and moderately detailed model of the overall structure of the brain. The brain appears as a network of neural networks, which self-organizes according to the logic of evolution by natural selection, and creates new ideas by a multilevel process that incorporates a form of abstract "sexual reproduction."

The present chapter introduces formal neural networks, sketches the theory of Neural Darwinism, and proposes two hypotheses regarding brain function: first, that the brain is a multilevel network of neural networks; second, that this network of networks evolves by natural selection. These themes will be picked up in Chapter 5, which deals with genetic optimization. Chapter 5 begins with the optimization of simple mathematical functions, and concludes with a general discussion of the evolution of neural networks by sexual reproduction.

Finally, Chapter 6 outlines the theory of computational psychology given in The Structure of Intelligence (hereafter SI), and shows how this theory of the mind imposes certain restrictions on the nature of the brain. These restrictions will be shown to support the ideas of the present chapter: that the brain has a multilevel structure which evolves by natural selection and creates by an abstract form of sexual reproduction.

4.1 FORMAL NEURAL NETWORKS

Neurons are not the only brain cells; in fact, they are greatly outnumbered by glia. However, many neuroscientists (Edelman, 1987; Rose and Dobson, 1985) believe that the key to mental process lies in the large-scale behavior of networks of neurons. So let us begin our journey into the brain by reviewing the basic characteristics of the neuron.

A neuron consists of a cell body with a long, narrow axon emerging from one end, and a large number of branches called dendrites snaking out in all directions. The dendrites are inputs: they receive electrical signals from other neurons. The cell body periodically generates ("fires") a new electrical impulse based on these input signals. After it fires, it needs to "recover" for a while before it can fire again; and during this period of recovery it basically ignores its input.

The axon carries the electrical impulse from the cell body to the dendrites and cell bodies of other neurons. The points at which signals pass from one neuron to another are called synapses, and they come in two different forms: excitatory and inhibitory. When an impulse arrives through an excitatory synapse, it encourages the receiving neuron to fire. When an impulse arrives through an inhibitory synapse, it discourages the receiving neuron from firing.

Each synapse has a certain conductance or "weight" which affects the intensity of the signals passing through it. For example, suppose excitatory synapse A has a larger "weight" than excitatory synapse B, and the same signal passes through both synapses. The signal will be more intense at the end of A than at the end of B.

Roughly speaking, a recovered neuron fires if, within the recent past, it has received enough excitatory input and not too much inhibitory input. The amount of the past which is relevant to the decision whether or not to fire is called the period of latent addition. How much excitation is "enough," and how much inhibition is "too much," depends upon the threshold of the neuron. If the threshold is minimal, the neuron will always fire when its recovery period is over. If the threshold is very high, the neuron will only fire when nearly all of its excitatory synapses and virtually none of its inhibitory synapses are active.

Mathematically, the rule that tells a neuron when to fire can be modeled roughly as follows. Let wi be the weight of the i-th synapse which inputs to the neuron, where positive weights denote excitatory connections and negative weights denote inhibitory connections. Let xi(t) denote the signal coming into the neuron through the i-th synapse, at time t, where the time variable is assumed to be discrete. Let P be the period of latent addition, and let R be the recovery period. Then the total relevant input to the neuron at time t is

$$\sum_{i} \; \sum_{\max\{t-P,\; r+R\} \,\le\, s \,<\, t} w_i \, x_i(s),$$

where r is the time at which the neuron last fired. (The window is taken to include its earliest step, so that P=1 and R=0 leave the single step s=t-1.) Where T is the threshold of the neuron, a neuron fires at time t if its total relevant input exceeds T. When a neuron fires, its output is 1; when it does not fire, its output is 0.
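The firing rule can be made concrete with a short sketch in Python. The function names, the layout of the `history` table and the numerical values are illustrative assumptions of mine, not part of the model; the window is assumed to include its earliest step, s = max{t-P, r+R}, so that P=1 and R=0 reduce to a single step of memory.

```python
def total_relevant_input(weights, history, t, P, R, last_fired):
    """Sum w_i * x_i(s) over all synapses i and all times s with
    max(t - P, last_fired + R) <= s < t.

    history[s][i] holds x_i(s), the signal on synapse i at time s.
    """
    start = max(t - P, last_fired + R, 0)  # clamp to the first recorded step
    return sum(
        w * history[s][i]
        for s in range(start, t)
        for i, w in enumerate(weights)
    )


def fires(weights, history, t, P, R, last_fired, threshold):
    """Return 1 if the total relevant input exceeds the threshold, else 0."""
    total = total_relevant_input(weights, history, t, P, R, last_fired)
    return 1 if total > threshold else 0


# Two excitatory synapses and one inhibitory synapse (illustrative values).
weights = [1.0, 1.0, -1.0]
history = [[0, 0, 0], [1, 0, 0], [1, 1, 1]]  # x_i(s) for s = 0, 1, 2
```

With t=3, P=2, R=1 and a last firing at r=0, the window covers s=1 and s=2. Had the neuron last fired at r=2, the window would be empty: the neuron is still recovering and ignores its input.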

A "neural network," then, is a network of interconnected neurons firing according to this rule. This is a greatly oversimplified model of the brain: it gives no role to other brain cells such as glia, and it completely ignores all the chemistry that mediates actual neural interaction. In the brain the passage of an electrical signal from one neuron to another is not exactly analogous to the passage of electricity across a wire. This is because most neurons that are "connected" do not actually touch. What usually happens when a signal passes from neuron A to neuron B is that the axon of neuron A builds up a charge which causes certain chemicals called neurotransmitters to carry that charge to the dendrites of neuron B. The neural network model ignores all the subtleties of this process.

So, to consider the brain as a neural network is an oversimplification. The "neural networks" which are now so popular in electrical engineering (Garrido, 1990; Kawato et al., 1987; Goldberg et al., 1988; Hopfield and Tank, 1985) are usually simplified even further. It is generally assumed that the period of latent addition is 1 time step, and the recovery period is 0 time steps. This yields a (relatively) pleasant-looking system of equations for a network of n neurons:

$$x_i(t) = f_i\big(w_{i1}x_1(t-1) + \cdots + w_{in}x_n(t-1)\big),$$

for i=1,...,n, where

$$f_i(x) = \begin{cases} 1, & x > T_i \\ 0, & \text{otherwise} \end{cases}$$

and wij denotes the conductance of the synapse through which the impulse from neuron j passes to neuron i.

If neuron j inputs to neuron i through more than one synapse, then wij may simply be defined as the average of the conductances of these synapses. Note that the sum of inputs to the ith neuron ranges over all the neurons from 1 to n, even though not all neurons need input to the ith neuron. If neuron j does not input to neuron i, then we may set wij=0. For future reference, let us denote by Ii the set of all j for which neuron j inputs to neuron i. I will call the neurons of this simplified network one-step neurons.
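As a minimal sketch, the one-step dynamics can be written in a few lines of Python. The names `one_step_update` and `input_sets` and the two-neuron example are my own illustrations; in particular, reading Ii off the nonzero entries of the weight matrix is an assumption, since a connection whose averaged conductance happened to be zero would be missed.

```python
def one_step_update(W, T, x):
    """Compute x(t) from x(t-1): neuron i fires iff sum_j W[i][j]*x[j] > T[i]."""
    n = len(x)
    return [
        1 if sum(W[i][j] * x[j] for j in range(n)) > T[i] else 0
        for i in range(n)
    ]


def input_sets(W):
    """Approximate I_i as the set of j with a nonzero weight w_ij."""
    return [{j for j, w in enumerate(row) if w != 0} for row in W]


# Illustrative network: neuron 1 inhibits neuron 0, neuron 0 excites neuron 1.
W = [[0, -1],
     [1, 0]]
T = [-0.5, 0.5]  # neuron 0 fires unless it is inhibited

x = [0, 0]
trajectory = [x]
for _ in range(4):
    x = one_step_update(W, T, x)
    trajectory.append(x)
```

In this example the state cycles through four configurations and returns to its starting point, so even two one-step neurons can produce a sustained oscillation.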

Simplified "neural networks" composed of one-step neurons have proven themselves effective at a number of difficult practical problems: combinatorial optimization, associative memory (Hopfield and Tank, 1980), pattern recognition (Grossberg, 1987) and robotic control (Goldberg et al., 1988), to name a few. Mathematically, they are similar to the physicists' spin glasses (Garrido, 1990). Everyone realizes that these networks are mediocre brain models, but the connection with neuroscience is tantalizing nonetheless.

For example, the well-known Hopfield network (Hopfield, 1980) uses one-step neurons to minimize functions. To explain the idea behind this network, let us define a state of a network of n neurons as a binary sequence a1a2...an. A state is periodic with period p if, whenever xi(t)=ai for i=1,...,n, we also have xi(t+p)=ai for i=1,...,n. If a state is periodic with period 1, then it is an equilibrium.
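The definitions of periodic state and equilibrium translate directly into a brute-force check, sketched here in Python. The helper names and the three-neuron ring used as an example are assumptions made for illustration, and the exhaustive search is only feasible for very small n.

```python
from itertools import product


def step(W, T, x):
    """One synchronous update of a one-step network."""
    n = len(x)
    return tuple(
        1 if sum(W[i][j] * x[j] for j in range(n)) > T[i] else 0
        for i in range(n)
    )


def is_periodic(W, T, a, p):
    """True if a network started in state a is back in state a after p updates."""
    x = tuple(a)
    for _ in range(p):
        x = step(W, T, x)
    return x == tuple(a)


def equilibria(W, T):
    """All states of period 1, found by enumerating the 2**n binary states."""
    n = len(T)
    return [a for a in product((0, 1), repeat=n) if is_periodic(W, T, a, 1)]


# Illustrative ring: each neuron simply copies the firing of its neighbor.
W = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
T = [0.5, 0.5, 0.5]
```

In this ring a single active neuron circulates forever, so the state 100 is periodic with period 3 but is not an equilibrium; only the uniform states are equilibria.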

Given a certain function f from binary sequences to real numbers, Hopfield's approach was to define a network whose equilibrium states are local minima of f, and which has no periodic points besides its equilibria. Then one may set the state of the network at time zero equal to any random binary sequence, and eventually the values xi(t) will settle into one of the equilibria. The problem with this approach is that if a function has many local minima, the corresponding network will have many equilibria. There are various ways of getting around this difficulty, but of course none of them are generally applicable. A little later we will consider a completely different approach to solving optimization problems with neural networks.
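To see how such a network settles, here is a sketch in Python of the standard textbook construction, under assumptions that the discussion above does not spell out: the weights are symmetric with zero diagonal, the function being minimized is the quadratic "energy" E(x) = -(1/2) sum_ij wij xi xj + sum_i Ti xi, and the neurons are updated one at a time rather than all at once. Under those assumptions no single update can increase E. All names and values are illustrative.

```python
def energy(W, T, x):
    """Quadratic energy whose local minima are the network's equilibria."""
    n = len(x)
    pair = sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return -0.5 * pair + sum(T[i] * x[i] for i in range(n))


def settle(W, T, x, max_sweeps=100):
    """Update neurons one at a time until a full sweep changes nothing."""
    x = list(x)
    n = len(x)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            new = 1 if sum(W[i][j] * x[j] for j in range(n)) > T[i] else 0
            if new != x[i]:
                x[i] = new
                changed = True
        if not changed:
            break
    return x


# Illustrative symmetric weights with zero diagonal.
W = [[0, 1, -2],
     [1, 0, 1],
     [-2, 1, 0]]
T = [0, 0, 0]
```

Which equilibrium is reached depends on the starting state, which is exactly the difficulty noted above: the network finds a local minimum, not necessarily the lowest one.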

Input/Output

Before we can proceed any further, we must dispense with a few technicalities. So far, we have considered neurons which connect only to other neurons. However, biological neural networks obviously connect to extraneural inputs and outputs as well. We will require formal networks that mimic real neural networks in this respect.

Outputs are no trouble; they do not directly affect the behavior of the network. They affect it only indirectly, via feedback. We may define a neuron to be an output neuron if it is connected to external output. Suppose that the output neurons of a given neural network are indexed i(1),i(2),...,i(K). Then the "output" of a neural network at time t refers to the binary sequence x1x2...xK obtained by setting xk=1 at time t if and only if neuron i(k) is firing at time t.

To introduce external inputs into a one-step neural network, however, we must replace the dynamic equation given above with the following:

$$x_i(t) = f_i\big(w_{i1}x_1(t-1) + \cdots + w_{in}x_n(t-1) + v_{i1}E_1(t-1) + \cdots + v_{im}E_m(t-1)\big),$$

where fi and the wij are as above, Ej is the jth external input, and vij is the conductance of the connection between Ej and neuron i. Obviously, a similar modification will work for multi-step networks. The input of a neural network at time t may be defined as the binary sequence E1E2...Em. If x is the input of the network N at time t, and y is the output of the network at time t+s, we will write y=N(x;s).
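A sketch of the input/output convention in Python follows. The function names are hypothetical, and two things the notation y=N(x;s) leaves implicit are made explicit here as assumptions: the starting state of the network, and the external inputs presented during the intervening steps.

```python
def step_with_input(W, V, T, x, E):
    """One update combining recurrent input W x(t-1) and external input V E(t-1)."""
    new_x = []
    for i in range(len(x)):
        internal = sum(w * xj for w, xj in zip(W[i], x))
        external = sum(v * e for v, e in zip(V[i], E))
        new_x.append(1 if internal + external > T[i] else 0)
    return new_x


def network_output(W, V, T, output_neurons, x0, inputs):
    """Feed inputs[0], inputs[1], ... one per step, then read the output neurons.

    With inputs[0] = x and s = len(inputs), the result plays the role of N(x; s).
    """
    x = list(x0)
    for E in inputs:
        x = step_with_input(W, V, T, x, E)
    return [x[i] for i in output_neurons]


# Illustrative chain: external input -> neuron 0 -> neuron 1 (the output neuron).
W = [[0, 0],
     [1, 0]]
V = [[1],
     [0]]
T = [0.5, 0.5]
```

In this chain a pulse on the external input needs two steps to reach the output neuron, so the output after one step is still silent.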

4.2 NEURAL GROUP SELECTION

In recent years, several inventive biologists have sought to bridge the large gap between formal neural networks and actual brains. In my opinion, the most impressive of these efforts is Edelman's (1987) theory of neuronal group selection, or "Neural Darwinism."

The starting point of Neural Darwinism is the observation that neuronal dynamics may be analyzed in terms of the behavior of neuronal groups. The strongest evidence in favor of this conjecture is physiological: many of the neurons of the neocortex are organized in clusters, each containing, say, 10,000 to 50,000 neurons.

Once one has committed oneself to looking at groups, the next step is to ask how these groups are organized. A map, in Edelman's terminology, is a connected set of groups with the property that when one of the inter-group connections in the map is active, others will often tend to be active as well. Maps are not fixed over the life of an organism. They may be formed and destroyed in a very simple way: the connection between two neuronal groups may be "strengthened" by increasing the weights of the neurons connecting the one group with the other, and "weakened" by decreasing the weights of the neurons connecting the two groups.

Formally, we may consider the set of neural groups as the vertices of a graph, and draw an edge between two vertices whenever a significant proportion of the neurons of the two corresponding groups directly interact. Then a map is a connected subgraph of this graph, and the maps A and B are connected if there is an edge between some element of A and some element of B. (If for "map" one reads "program," and for "neural group" one reads "subroutine," then we have a process dependency graph as drawn in theoretical computer science.)
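The graph-theoretic formulation can be sketched in Python as follows. The group labels, the edge list and the function names are hypothetical, and I assume that a map's connectivity is judged using only the edges among its own groups.

```python
from collections import deque


def is_map(edges, groups):
    """True if the given neuronal groups form a connected subgraph."""
    groups = set(groups)
    if not groups:
        return False
    adjacent = {g: set() for g in groups}
    for a, b in edges:
        if a in groups and b in groups:
            adjacent[a].add(b)
            adjacent[b].add(a)
    start = next(iter(groups))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search restricted to the candidate map
        g = queue.popleft()
        for h in adjacent[g] - seen:
            seen.add(h)
            queue.append(h)
    return seen == groups


def maps_connected(edges, map_a, map_b):
    """True if some edge joins an element of map_a to an element of map_b."""
    a, b = set(map_a), set(map_b)
    return any((u in a and v in b) or (u in b and v in a) for u, v in edges)


# Illustrative graph of five neuronal groups.
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]
```

Here the groups g1, g2, g3 form a map, whereas g1 and g3 alone do not, since they interact only through g2.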

This is the set-up, the context in which Edelman's theory works. The meat of the theory is the following hypothesis: the large-scale dynamics of the brain is dominated by the natural selection of maps. Those maps which are active when good results are obtained are strengthened; those maps which are active when bad results are obtained are weakened. And maps are continually mutated by the natural chaos of neural dynamics, thus providing new fodder for the selection process. By use of computer simulations, Edelman and his colleague Reeke have shown that formal neural networks obeying this rule can carry out fairly complicated acts of perception.
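The selection rule can be caricatured in a few lines of Python. This is a toy of my own devising and not a reconstruction of the Edelman and Reeke simulations: the multiplicative update, the Gaussian perturbation standing in for "the natural chaos of neural dynamics," and every name and parameter below are assumptions made purely for illustration.

```python
import random


def reinforce(weights, maps, active, good_result, rate=0.1):
    """Strengthen (or weaken) every inter-group connection of the active maps.

    weights: dict mapping a connection (group_a, group_b) to its strength
    maps:    dict mapping a map name to the list of connections it contains
    """
    factor = 1.0 + rate if good_result else 1.0 - rate
    updated = dict(weights)
    # Collect connections first so one shared by two active maps is scaled once.
    touched = {c for name in active for c in maps[name]}
    for c in touched:
        updated[c] = weights[c] * factor
    return updated


def mutate(weights, rng, noise=0.01):
    """Randomly perturb every connection, keeping strengths non-negative."""
    return {c: max(0.0, w + rng.gauss(0.0, noise)) for c, w in weights.items()}


# Two overlapping maps over five hypothetical neuronal groups.
weights = {("g1", "g2"): 1.0, ("g2", "g3"): 1.0, ("g4", "g5"): 1.0}
maps = {
    "A": [("g1", "g2"), ("g2", "g3")],
    "B": [("g2", "g3"), ("g4", "g5")],
}
rng = random.Random(0)
after_success = mutate(reinforce(weights, maps, {"A"}, True), rng)
```

Repeated over many trials, connections belonging to maps that tend to be active during good results grow, while the perturbation keeps supplying slightly altered maps for selection to act on.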

This thumbnail sketch, it must be emphasized, does not do justice to Edelman's ideas. In Neural Darwinism Edelman presents neuronal group selection as a collection of precise biological hypotheses, and presents evidence in favor of a number of these hypotheses. However, I consider that the basic concept of neuronal group selection is largely independent of the biological particularities in terms of which Edelman has phrased it. As will be emphasized below, I suspect that the mutation and selection of "transformations" or "maps" is a necessary component of the dynamics of any intelligent system.

4.3 MULTILEVEL NEURAL NETWORKS

What about the large-scale structure of the network of neural networks that is the brain? Edelman says next to nothing about this; but it is certainly far from unimportant. In this section, as a first step toward understanding this structure, we shall consider a "toy brain" machine composed of a network of disjoint neural networks, each one inputting to and outputting from certain other networks. We shall assume that these networks are connected more intimately than just via input/output: that is, that the internal weights of one network can be affected by the state of another network.

I know of no evidence supporting the hypothesis that networks in the brain (e.g. maps in the brain) can control each other in the sense of directly altering the conductances of one another's neurons. I have also been unable to locate any evidence explicitly contradicting this hypothesis. In any case it is clear that one network can alter the conductances of the neurons of another network indirectly, by passing charge through these neurons in an appropriate way (for instance, this is obviously the case if one assumes the "Hebb rule" that the conductance of a synapse is increased by use). Therefore, in principle, the type of control posited in the previous paragraph is biologically plausible, and indeed inevitable. But the extent of coordination with which this type of control occurs is almost entirely unknown.

We will say that neural network N reads neural network M if some of the inputs, outputs and/or internal weights of M are connected to some of the inputs of N. We will say that neural network N controls neural network M if N has the power to change, to some degree, some of the weights of the neurons in M. And finally, we will define the output of a neural network in our toy brain at a given time to consist of: 1) the states of its output neurons, if it has any, and 2) the changes which it makes in the weights of other neural networks.

Let us suppose that the networks in our toy brain are stratified into levels. Level 1 networks connect to some inputs, have some output neurons, and control nothing. Level 2 networks connect to some inputs, and read some level 1 networks. In general, a level k network is defined as a network which: 1) reads only networks of level k-1 or lower, and 2) reads some network of level k-1, and controls some network of level k-1. Note that a network may read from networks which it does not control, and control networks from which it does not read. (In SI, two networks of this general form are considered: the "hierarchical analogy network" and the "nAMP.")
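The stratification can be computed mechanically from the "reads" and "controls" relations, as in the following Python sketch. The dictionaries and function names are hypothetical. The level of a network is taken to be one more than the highest level it reads, which is what conditions 1) and 2) amount to when reading is free of cycles.

```python
def levels(reads):
    """Assign level 1 to networks that read nothing, else 1 + max level read."""
    memo = {}
    visiting = set()

    def level(n):
        if n in memo:
            return memo[n]
        if n in visiting:
            raise ValueError("circular reading: no stratification exists")
        visiting.add(n)
        below = [level(m) for m in reads.get(n, ())]
        visiting.discard(n)
        memo[n] = 1 + max(below, default=0)
        return memo[n]

    for n in reads:
        level(n)
    return memo


def is_stratified(reads, controls):
    """Check that level 1 controls nothing and level k controls some level k-1."""
    lv = levels(reads)
    for n, k in lv.items():
        controlled = controls.get(n, set())
        if k == 1 and controlled:
            return False
        if k > 1 and not any(lv.get(m) == k - 1 for m in controlled):
            return False
    return True


# Illustrative toy brain with three levels.
reads = {"P1": set(), "P2": set(), "Q": {"P1", "P2"}, "R": {"Q", "P1"}}
controls = {"Q": {"P1"}, "R": {"Q"}}
```

Note that R reads P1 without controlling it, which the definition explicitly permits.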

Each level 1 network acts by stimulus and response: it receives certain inputs and yields certain outputs. If it is a nonhierarchical or multistep network (as is the case in real brains), then its output at a given time reflects its input over a range of past times.

Each level 2 network, on the other hand, reads and controls level 1 networks. Let us assume that level 1 networks continually receive feedback as to their "success," and assume that the goal of the entire machine is to maximize the success of its level 1 networks. Then we may define the success of a level k network, k>1, in terms of the success of the networks which it controls. Then, the goal of a level 2 network is to perfect the level 1 networks which it controls. It does this by studying the success levels of the level 1 networks from which it reads, and determining correlations between the properties of the networks and their relative successes. Through this study, it determines how to set the weights of the network which it controls. (Much more about this sort of "study" is said in SI.)
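The inductive definition of success can be sketched as a simple recursion in Python. Taking the success of a higher-level network to be the average success of the networks it controls is only one possible choice, assumed here for concreteness; the names and numbers are likewise illustrative.

```python
def success(network, controls, base_success, _path=()):
    """Success of a network: given directly at level 1, otherwise the mean
    success of the networks it controls (an assumed aggregation rule)."""
    if network in base_success:
        return base_success[network]
    if network in _path:
        raise ValueError("circular control: success cannot be defined inductively")
    controlled = controls.get(network, [])
    if not controlled:
        raise ValueError(f"no success defined for {network!r}")
    scores = [
        success(m, controls, base_success, _path + (network,))
        for m in controlled
    ]
    return sum(scores) / len(scores)


# Feedback received by the level 1 networks (illustrative values).
base_success = {"P1": 0.75, "P2": 0.25}
controls = {"Q": ["P1", "P2"], "R": ["Q"]}
```

The recursion terminates only because control here is strictly hierarchical; if two networks controlled one another, the function would raise an error instead of returning a value.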

Now, there are two obvious differences between our toy brain and the real brain. First of all, insofar as the brain has various component networks that serve as "functional units," these component networks are not distinct: they interpenetrate each other considerably. For example, according to Edelman's theory, the most significant component networks of the human brain are maps, and maps can overlap.

The second difference is that, in real brains, although there are relatively autonomous "component networks," these are not neatly arranged in an hierarchical structure. One can surely find situations of circular control: N controls M while M controls N. The hierarchy is there, but it is not the only structure: there are other, non-hierarchical, interactions among the networks. This impacts on the definition of "success." In an hierarchical structure, one can define success inductively; but in a nonhierarchical structure this is not possible.

However, despite these problems, I suggest that the basic structure of the brain is largely similar to the structure of this toy brain: the brain is a multilevel control structure. This point will be returned to in a psychological context, in Chapter 6.

4.4 THE BRAIN AS AN ECOSYSTEM OF NEURAL NETWORKS

Temporarily ignoring the hierarchy postulated in the previous section, let us now assume simply that the brain is composed of a large number of relatively autonomous, interconnected neural networks. In the theory of neuronal group selection, these autonomous networks are called maps.

Let us further assume that some of these autonomous networks will survive, and others will die, in the sense of ceasing to be autonomous networks. Then, we have the problem of predicting whether or not a given network is going to survive. If the brain is analogous to an ecosystem a la Chapter 3, then the survival of a given network should be predictable based on how well it "fits" with the networks with which it directly interacts. That is, where network N interacts with networks N1,...,Nk, the survival of N should be significantly correlated with Em(N,N1,...,Nk).

I suggest that the brain is analogous to an ecosystem in this sense. I have no biological argument to back up this contention. However, in Chapter 6 I will propose a psychological argument which supports it very strongly.

Note how different in focus the theory of the preceding paragraphs is from Edelman's theory of neuronal group selection. The two hypotheses are related, of course. But Edelman's theory is based on the analogy with strict Darwinism, where the big question is who is competing with whom, and what traits are being optimized. The present theory, on the other hand, is founded on a system-theoretic, self-organizational conception of evolution by natural selection.