
Continuous Learning in Sparsely Connected Hopfield Nets

Ben Goertzel and Julian Troianov
IntelliGenesis Corp.


Abstract

Experiments are reported in which sparsely connected Hopfield nets with capped weights perform continuous learning: they are fed an ongoing stream of training patterns, and "remember" the most recent few as attractors. Given a fixed number of neurons, the size of the sliding memory window increases as connectivity increases, so long as the weight cap is chosen appropriately.
Key Words: Hopfield net, associative memory, Webmind

1. Introduction

The Hopfield net is a simple, idealized model of attractor neural net dynamics. It is easily amenable to mathematical analysis, but is not in itself suitable for either detailed neural modeling or practical computational intelligence. At IntelliGenesis Corp., in building the Webmind AI system (Goertzel, 1996; http://intelligenesis.net), we have found it useful to implement a variant of the standard Hopfield net that is sparsely connected and is required to learn from a continual stream of inputs that rapidly become obsolete. This article reports some experiments on Hopfield net dynamics that were done with a Webmind implementation in mind.

The results given here demonstrate quite clearly that, by introducing a cap on the absolute value of link weights, even a very sparsely connected Hopfield net can be made to function effectively as a continuous learning system. The catch, however, is that for low connectivity, the window of learning is very small, so that a large number of neurons are required to maintain a small, continuously shifting window of memories.

2. Hopfield Nets

This section gives a brief description of the Hopfield net, a simple attractor neural net model which has been described in detail in many places (originally Hopfield, 1985), and has been analyzed by many authors in great mathematical detail.

Suppose we have a network N of nodes (formal neurons) representing data elements, denoted n1, n2, ..., nk. Each node contains weighted links to some other nodes, the weight from ni to nj being called wi,j. Assume each node has an activation which may be positive or negative: +1 means active, -1 means inactive.

Next, suppose we have a collection C of "training patterns", defined as subsets of the set N of data elements, denoted c1, c2, ..., cm. The Hopfield net for C over N is a collection of weights wi,j which causes the elements of C to be attractors of the neural net N, under standard activation-spreading dynamics (with a threshold at each node ni).

The training is done as follows. All weights start at zero. The training patterns cj are cycled through one after the other. Each training pattern, in its turn, causes those nodes that it contains to be activated. The weights of the links wi,j are adjusted by the standard Hopfield net version of Hebbian learning:

wi,j' = wi,j + r ni nj

where r is the learning rate. When this has been done for all the training patterns cj, the network has been trained.
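
To make the training rule concrete, here is a minimal sketch in Java, assuming each training pattern is represented as a vector of +1/-1 node states (+1 for nodes contained in the pattern, -1 otherwise); the class and method names are our own illustrative choices, not those of the Webmind code:

// Illustrative sketch of Hopfield training by the Hebbian rule above.
public class HopfieldTraining {

    /**
     * Builds a weight matrix from a set of +1/-1 patterns using
     * wi,j' = wi,j + r * ni * nj, skipping self-connections.
     */
    public static double[][] train(int[][] patterns, int numNodes, double r) {
        double[][] w = new double[numNodes][numNodes];   // all weights start at zero
        for (int[] p : patterns) {                       // cycle through the training patterns
            for (int i = 0; i < numNodes; i++) {
                for (int j = 0; j < numNodes; j++) {
                    if (i != j) {
                        w[i][j] += r * p[i] * p[j];      // Hebbian update
                    }
                }
            }
        }
        return w;
    }
}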

Once this has been done, the training patterns can be retrieved from the network as follows. Stimulate some nodes, and let activation spread. The network should settle into an attractor which corresponds to one of the training patterns that was used to train the network -- whichever one is "closer" to the initial state of the network, where the metric implicit in this closeness judgement may be derived from the equations for the network.
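
Retrieval can be sketched in the same style, assuming a threshold of zero at each node (again, the names are illustrative):

// Illustrative sketch of retrieval by repeated thresholded updates.
public class HopfieldRecall {

    /** Starting from an initial +1/-1 state, update every node against its
     *  weighted input until a full sweep produces no change. */
    public static int[] recall(double[][] w, int[] initialState, int maxSweeps) {
        int n = initialState.length;
        int[] s = initialState.clone();
        for (int sweep = 0; sweep < maxSweeps; sweep++) {
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                double field = 0.0;
                for (int j = 0; j < n; j++) field += w[i][j] * s[j];
                int updated = (field >= 0) ? 1 : -1;     // threshold at zero
                if (updated != s[i]) { s[i] = updated; changed = true; }
            }
            if (!changed) break;                         // settled into an attractor
        }
        return s;
    }
}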

Note that steady convergence to an attractor is only guaranteed if the weights wi,j are symmetric. If the weights are asymmetric, then periodic or chaotic behavior is quite possible -- which is not necessarily a problem, but must be taken into account when dealing with practical applications.

The standard Hopfield network is completely connected: every node ni is potentially connected to every other node nj. Empirically, however, it is found that deleting up to 80-85% of the connections often does not significantly impair the performance of the network. Thus, it is quite possible to use these same methods on a network where each node ni contains links to only some of the other nodes nj, not all. This fact is quite important when one considers practical applications of Hopfield net ideas, to the brain or to large-scale computational intelligence systems. The brain contains over 100 billion neurons, each of which connects to roughly 1000 other neurons; even if one follows the neuronal group model (Edelman, 1987), which states that neurons are grouped into clusters of size 10,000-100,000 with intra-cluster links much outnumbering inter-cluster links, one still arrives at an intra-cluster connectivity ratio of .1 to .01.
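
One simple way to realize such sparse connectivity is to choose the links independently at random, as in the following sketch (the connectivity ratio c, the names, and the random construction are illustrative assumptions, not a description of the brain's wiring):

import java.util.Random;

// Illustrative sketch of a random symmetric connectivity mask.
public class SparseConnectivity {

    /** Returns a symmetric mask in which each pair (i, j) is linked with probability c. */
    public static boolean[][] randomMask(int numNodes, double c, long seed) {
        Random rng = new Random(seed);
        boolean[][] connected = new boolean[numNodes][numNodes];
        for (int i = 0; i < numNodes; i++) {
            for (int j = i + 1; j < numNodes; j++) {
                boolean link = rng.nextDouble() < c;  // keep roughly a fraction c of all pairs
                connected[i][j] = link;
                connected[j][i] = link;               // symmetric, so symmetric weights remain possible
            }
        }
        return connected;
    }
}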

The main limitation of Hopfield nets as associative memories is that, even in the fully connected case, the number of training patterns can be at most about 14% of the number of nodes in the network. If the network is overloaded -- trained with more than this maximum acceptable number of attractors -- then it will not converge to clearly defined attractors, but will invent its own attractors that are combinations of the given patterns, and will then sink into a regime of homogeneously stimulating a large number of different nodes for each input ("parasitic attractors").

3. Strategies for Continuous Learning in Hopfield Nets

But how do Hopfield nets deal with a changing environment -- in which old training patterns become irrelevant, while new training patterns become important? Several approaches are popular:

  1. simply flush the links periodically (set the weights to zero), and retrain them. This is not a viable brain model, and it is extremely inefficient as a strategy for machine learning.

  2. unlearning: allow the network to periodically "unlearn". Unlearning works as follows: the network is stimulated randomly until it settles into an attractor. Then, the network is "anti-trained" on this attractor, via the rule

    wi,j' = wi,j - u ni nj

    where u is the unlearning rate. The effects of this on the capacity of the network are unclear; there are conflicting reports in the literature. According to Christos (1994), this reduces memory capacity to .01N from .14N and works erratically. This approach is of philosophical interest, in that anti-learning is thought by some to be connected with REM sleep in humans.

  3. weight-capping: Simply bounding the link weights above and below makes the network give precedence to recent memories and forget old ones, but also reduces memory capacity significantly. Theoretically, the weight cap should be

    -A <= wi,j <= A,  with A on the order of 1/sqrt(N).

    In the literature, this approach is called a "palimpsest scheme" (Parisi, 1986; Toulouse et al, 1986; Dehaene, 1986). The idea is that the weights are changed according to the usual learning rule, so long as the bound is not exceeded. If a weight hits a bound, it stays there until a pattern of the opposite sign comes along and moves it away from the bound; a sketch of this update in code is given just below the list. These models hold about 0.05N memories, compared to the 0.14N held by the unconstrained model, but overloading is avoided and memory is given a natural retrograde structure.
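
The palimpsest update itself is simple to state in code. The following sketch (names and parameters are our own illustrative choices) applies the usual Hebbian step for one pattern and then clips each weight back into [-A, A]:

// Illustrative sketch of the weight-capped ("palimpsest") Hebbian update.
public class CappedHebb {

    /** Applies one Hebbian update for pattern p, clipping each weight to [-A, A]
     *  so that older memories are gradually overwritten by newer ones. */
    public static void trainCapped(double[][] w, int[] p, double r, double A) {
        int n = p.length;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    double updated = w[i][j] + r * p[i] * p[j];
                    // A weight sitting at a bound stays there until an update of the
                    // opposite sign pulls it back inside the interval.
                    w[i][j] = Math.max(-A, Math.min(A, updated));
                }
            }
        }
    }
}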

Based on the research literature, weight-capping seems to be the most promising approach to continuous learning using ANNs. However, the literature has focused on completely connected networks; with this in mind, we have carried out experiments involving continuous learning in weight-capped Hopfield nets with sparse connectivity. That is, we generate Hopfield nets containing only certain connections, and then, in the learning process, update the weights of only those connections that are present -- these being the only relationships knowable by this particular net.
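
To make the experimental setup concrete, here is a minimal self-contained sketch of one such trial. The parameter values, the use of a stability test to decide whether a pattern is still "remembered", and all names are our own illustrative choices, not a description of the actual Webmind-based experiments:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch: a sparse, weight-capped net is fed a stream of random
// patterns, and after each one we count how many of the most recent patterns
// remain stable attractors (the sliding window W).
public class ContinuousLearningExperiment {

    public static void main(String[] args) {
        int numNodes = 1000;
        double connectivity = 0.1;   // fraction of node pairs that are linked (c)
        double cap = 5.0;            // weight cap (C)
        double rate = 1.0;           // learning rate (r)
        int streamLength = 50;       // length of the training stream
        Random rng = new Random(42);

        // Random symmetric connectivity mask.
        boolean[][] connected = new boolean[numNodes][numNodes];
        for (int i = 0; i < numNodes; i++)
            for (int j = i + 1; j < numNodes; j++)
                connected[i][j] = connected[j][i] = rng.nextDouble() < connectivity;

        double[][] w = new double[numNodes][numNodes];
        List<int[]> seen = new ArrayList<>();

        for (int t = 0; t < streamLength; t++) {
            // Draw the next random +1/-1 pattern from the ongoing stream.
            int[] p = new int[numNodes];
            for (int i = 0; i < numNodes; i++) p[i] = rng.nextBoolean() ? 1 : -1;
            seen.add(p);

            // Capped Hebbian update, applied only to the links that exist.
            for (int i = 0; i < numNodes; i++)
                for (int j = 0; j < numNodes; j++)
                    if (connected[i][j])
                        w[i][j] = Math.max(-cap, Math.min(cap, w[i][j] + rate * p[i] * p[j]));

            // Sliding window W: count how many of the most recent patterns
            // are still fixed points of the thresholded dynamics.
            int window = 0;
            for (int k = seen.size() - 1; k >= 0 && isStable(w, seen.get(k)); k--) window++;
            System.out.println("after pattern " + (t + 1) + ": window W = " + window);
        }
    }

    /** True if the pattern is unchanged by one full update sweep. */
    static boolean isStable(double[][] w, int[] s) {
        int n = s.length;
        for (int i = 0; i < n; i++) {
            double field = 0.0;
            for (int j = 0; j < n; j++) field += w[i][j] * s[j];
            if (((field >= 0) ? 1 : -1) != s[i]) return false;
        }
        return true;
    }
}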

Investigation of reverse-learning in sparsely connected nets may be carried out later, but is considered a low priority since the performance of reverse learning even in fully connected nets is not particularly impressive.

4. Experimental Results

The following table of results indicates the number of memories continuously recalled (the sliding memory window size W) as a function of network size (number of nodes N), connectivity c, and weight cap C. This represents a small summary of results from a large body of empirical trials; more detailed and extensive data are available from the authors. The networks used were random networks, produced using the random number generator provided by Java's Math.random() method, according to the given connectivity ratios.

 W    N      c      C
----------------------
 2   1000   0.02    3.3
 2   1100   0.04    0.9
 3   1000   0.04    2.7
 3   1100   0.06    1.5
 4   1000   0.06    6.0
 4   1100   0.10    1.8
 5   1000   0.06   11.1
 5   1100   0.14    2.1
 6   1000   0.14    8.7
 6   1100   0.16    3.0
 7   1000   0.16    8.4
 7   1100   0.18    3.0
 8   1000   0.16    9.0
 8   1100   0.20    2.7
 9   1000   0.16    9.0
 9   1100   (no data)
----------------------

This data shows that even at low connectivity the phenomenon of continuous learning in a weight-capped Hopfield net holds true. Furthermore, the window size W is an increasing function of connectivity, for fixed network size.

Beyond this, one interesting question is: What is an exact formula for the optimal weight cap for a given connectivity and network size? Our experiments have been far too few to provide an answer to this question; the literature gives only order-of-magnitude estimates, which are of no use in practice. This question is left for the future.

A different but related question is: What happens when the weight cap increases? Does overloading occur at some point as the weight cap increases, with all else held constant? To determine this, we have to examine the dynamics of the sliding window size W. If connectivity and weight cap are held constant, one may study W as a function of the number of iterations. The window size goes up and down as the learning goes on. In general, it seems, the smoothness of this function decreases as the weight cap increases, and an extremely bumpy function leads to overloading.

There is a fascinating dynamics to the number of recalled memories of a sparse Hopfield net with weight caps, at which the current brief article can only hint. Sophisticated mathematical analysis, or much more extensive numerical investigation, will be necessary to fully unveil the laws underlying these phenomena. This will be left for a later paper.

From a practical perspective, however, the main question has been answered by the present investigations: We have shown that Hopfield nets with sparse connectivity can display continuous learning, and we have obtained a good sense of the size of the sliding memory window as a function of sparsity, for an appropriately chosen weight cap.

References

  1. Christos, G. (1994). Reverse Learning

  2. Edelman, G. (1987). Neural Darwinism. New York: Basic Books

  3. Goertzel, B. (1996). "Subself dynamics in human and machine intelligence," CCC-AI

  4. Hopfield, J. (1985). Collective Computation with Networks of 'Neurons'.

  5. Nadal, J.P., G. Toulouse, J.P. Changeux and S. Dehaene (1986). "Networks of formal neurons and memory palimpsests," Europhysics Letters 1, 535-542

  6. Parisi, G. (1986). "A memory that forgets," J. Phys. A 19, L617-L620