
Continuous Learning in Sparsely Connected Hopfield Nets

Ben Goertzel and Julian Troianov
IntelliGenesis Corp.


Abstract

Experiments are reported in which sparsely connected Hopfield nets with capped weights perform continuous learning: they are fed an ongoing stream of training patterns, and "remember" the most recent few as attractors. Given a fixed number of neurons, the size of the sliding memory window increases as connectivity increases, so long as the weight cap is chosen appropriately.
Key Words: Hopfield net, associative memory, Webmind

1. Introduction

The Hopfield net is a simple, idealized model of attractor neural net dynamics. It is easily amenable to mathematical analysis, but is not in itself suitable for either detailed neural modeling or practical computational intelligence. At IntelliGenesis Corp., in building the Webmind AI system (Goertzel, 1996; http://intelligenesis.net), we have found it useful to implement a variant of the standard Hopfield net that is sparsely connected and is required to learn from a continual stream of inputs that rapidly become obsolete. This article reports some experiments on Hopfield net dynamics that were done with a Webmind implementation in mind.

The results given here demonstrate quite clearly that, by introducing a cap on the absolute value of link weights, even a very sparsely connected Hopfield net can be made to function effectively as a continuous learning system. The catch, however, is that for low connectivity, the window of learning is very small, so that a large number of neurons are required to maintain a small, continuously shifting window of memories.

2. Hopfield Nets

This section gives a brief description of the Hopfield net, a simple attractor neural net model which has been described in detail in many places (originally Hopfield, 1985), and has been analyzed by many authors in great mathematical detail.

Suppose we have a network N of nodes (formal neurons) representing data elements, denoted n1, n2, ..., nk. Each node contains weighted links to some other nodes, the weight from ni to nj being called wi,j. Assume each node has an activation which may be positive or negative: +1 means active, -1 means inactive.

Next, suppose we have a collection C of "training patterns", defined as subsets of the set N of data elements, denoted c1, c2, ..., cm. The Hopfield net for C over N is a collection of weights wi,j which causes the elements of C to be attractors of the neural net N, under standard activation-spreading dynamics (with a threshold at each node ni).

The training is done as follows. All weights start at zero. The training patterns cj are cycled through one after the other. Each training pattern, in its turn, causes those nodes that it contains to be activated. The weights of the links wi,j are adjusted by the standard Hopfield net version of Hebbian learning:

wi,j' = wi,j + r ni nj

where r is the learning rate. When this has been done for all the training patterns cj, the network has been trained.
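
To make the training rule concrete, here is a minimal sketch in Java, assuming each training pattern is represented as a vector of +1/-1 node states (+1 for nodes contained in the pattern, -1 otherwise); the class and method names are our own illustrative choices, not those of the Webmind code:

// Illustrative sketch of Hopfield training by the Hebbian rule above.
public class HopfieldTraining {

    /**
     * Builds a weight matrix from a set of +1/-1 patterns using
     * wi,j' = wi,j + r * ni * nj, skipping self-connections.
     */
    public static double[][] train(int[][] patterns, int numNodes, double r) {
        double[][] w = new double[numNodes][numNodes];   // all weights start at zero
        for (int[] p : patterns) {                       // cycle through the training patterns
            for (int i = 0; i < numNodes; i++) {
                for (int j = 0; j < numNodes; j++) {
                    if (i != j) {
                        w[i][j] += r * p[i] * p[j];      // Hebbian update
                    }
                }
            }
        }
        return w;
    }
}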

Once this has been done, the training patterns can be retrieved from the network as follows. Stimulate some nodes, and let activation spread. The network should settle into an attractor which corresponds to one of the training patterns that was used to train the network -- whichever one is "closer" to the initial state of the network, where the metric implicit in this closeness judgement may be derived from the equations for the network.
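
Retrieval can be sketched in the same style, assuming a threshold of zero at each node (again, the names are illustrative):

// Illustrative sketch of retrieval by repeated thresholded updates.
public class HopfieldRecall {

    /** Starting from an initial +1/-1 state, update every node against its
     *  weighted input until a full sweep produces no change. */
    public static int[] recall(double[][] w, int[] initialState, int maxSweeps) {
        int n = initialState.length;
        int[] s = initialState.clone();
        for (int sweep = 0; sweep < maxSweeps; sweep++) {
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                double field = 0.0;
                for (int j = 0; j < n; j++) field += w[i][j] * s[j];
                int updated = (field >= 0) ? 1 : -1;     // threshold at zero
                if (updated != s[i]) { s[i] = updated; changed = true; }
            }
            if (!changed) break;                         // settled into an attractor
        }
        return s;
    }
}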

Note that steady convergence to an attractor is only guaranteed if the weights wi,j are symmetric. If the weights are asymmetric, then periodic or chaotic behavior is quite possible -- which is not necessarily a problem, but must be taken into account when dealing with practical applications.

The standard Hopfield network is completely connected: every node ni is potentially connected to every other node nj. Empirically, however, it is found that deleting up to 80-85% of the connections often does not significantly impair the performance of the network. Thus, it is quite possible to use these same methods on a network where each node ni contains links to only some of the other nodes nj, not all. This fact is quite important when one considers practical applications of Hopfield net ideas, to the brain or to large-scale computational intelligence systems. The brain contains over 100 billion neurons, each of which connects to roughly 1000 other neurons; even if one follows the neuronal group model (Edelman, 1987), which states that neurons are grouped into clusters of size 10,000-100,000 with intra-cluster links much outnumbering inter-cluster links, one still arrives at an intra-cluster connectivity ratio of .1 to .01.
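
One simple way to realize such sparse connectivity is to choose the links independently at random, as in the following sketch (the connectivity ratio c, the names, and the random construction are illustrative assumptions, not a description of the brain's wiring):

import java.util.Random;

// Illustrative sketch of a random symmetric connectivity mask.
public class SparseConnectivity {

    /** Returns a symmetric mask in which each pair (i, j) is linked with probability c. */
    public static boolean[][] randomMask(int numNodes, double c, long seed) {
        Random rng = new Random(seed);
        boolean[][] connected = new boolean[numNodes][numNodes];
        for (int i = 0; i < numNodes; i++) {
            for (int j = i + 1; j < numNodes; j++) {
                boolean link = rng.nextDouble() < c;  // keep roughly a fraction c of all pairs
                connected[i][j] = link;
                connected[j][i] = link;               // symmetric, so symmetric weights remain possible
            }
        }
        return connected;
    }
}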

The main limitation of Hopfield nets as associative memories is that, even in the fully connected case, the number of training patterns can be at most about 14% of the number of nodes in the network. If the network is overloaded -- trained with more than this maximum acceptable number of attractors -- then it will not converge to clearly defined attractors, but will invent its own attractors that are combinations of the given patterns, and will then sink into a regime of homogeneously stimulating a large number of different nodes for each input ("parasitic attractors").

3. Strategies for Continuous Learning in Hopfield Nets

But how do Hopfield nets deal with a changing environment -- in which old training patterns become irrelevant, while new training patterns become important? Several approaches are popular:

  1. simply flush the links periodically (set the weights to zero), and retrain them. This is not a viable brain model, and it is extremely inefficient as a strategy for machine learning.

  2. unlearning: allow the network to periodically "unlearn". Unlearning works as follows: the network is stimulated randomly until it settles into an attractor. Then, the network is "anti-trained" on this attractor, via the rule

    wi,j' = wi,j - u ni nj

    where u is the unlearning rate. The effects of this on the capacity of the network are unclear; there are conflicting reports in the literature. According to Christos (1994), this reduces memory capacity to .01N from .14N and works erratically. This approach is of philosophical interest, in that anti-learning is thought by some to be connected with REM sleep in humans.

  3. weight-capping: Simply bounding the link weights above and below makes the network give precedence to recent memories and forget old ones, but also reduces memory capacity significantly. Theoretically, the weight cap should be

    -A <= wi,j <= A,  with A on the order of 1/sqrt(N).

    In the literature, this approach is called a "palimpsest scheme" (Parisi, 1986; Toulouse et al, 1986; Dehaene, 1986). The idea is that the weights are changed according to the usual learning rule, so long as the bound is not exceeded. If a weight hits a bound, it stays there until a pattern of the opposite sign comes along and moves it away from the bound; a sketch of this update in code is given just below the list. These models hold about 0.05N memories, compared to the 0.14N held by the unconstrained model, but overloading is avoided and memory is given a natural retrograde structure.
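
The palimpsest update itself is simple to state in code. The following sketch (names and parameters are our own illustrative choices) applies the usual Hebbian step for one pattern and then clips each weight back into [-A, A]:

// Illustrative sketch of the weight-capped ("palimpsest") Hebbian update.
public class CappedHebb {

    /** Applies one Hebbian update for pattern p, clipping each weight to [-A, A]
     *  so that older memories are gradually overwritten by newer ones. */
    public static void trainCapped(double[][] w, int[] p, double r, double A) {
        int n = p.length;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    double updated = w[i][j] + r * p[i] * p[j];
                    // A weight sitting at a bound stays there until an update of the
                    // opposite sign pulls it back inside the interval.
                    w[i][j] = Math.max(-A, Math.min(A, updated));
                }
            }
        }
    }
}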

Based on the research literature, weight-capping seems to be the most promising approach to continuous learning using ANNs. However, the literature has focused on completely connected networks; with this in mind, we have carried out experiments involving continuous learning in weight-capped Hopfield nets with sparse connectivity. That is, we generate Hopfield nets containing only certain connections, and then, in the learning process, update the weights of only those connections that are present -- these being the only relationships knowable by this particular net.
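
To make the experimental setup concrete, here is a minimal self-contained sketch of one such trial. The parameter values, the use of a stability test to decide whether a pattern is still "remembered", and all names are our own illustrative choices, not a description of the actual Webmind-based experiments:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch: a sparse, weight-capped net is fed a stream of random
// patterns, and after each one we count how many of the most recent patterns
// remain stable attractors (the sliding window W).
public class ContinuousLearningExperiment {

    public static void main(String[] args) {
        int numNodes = 1000;
        double connectivity = 0.1;   // fraction of node pairs that are linked (c)
        double cap = 5.0;            // weight cap (C)
        double rate = 1.0;           // learning rate (r)
        int streamLength = 50;       // length of the training stream
        Random rng = new Random(42);

        // Random symmetric connectivity mask.
        boolean[][] connected = new boolean[numNodes][numNodes];
        for (int i = 0; i < numNodes; i++)
            for (int j = i + 1; j < numNodes; j++)
                connected[i][j] = connected[j][i] = rng.nextDouble() < connectivity;

        double[][] w = new double[numNodes][numNodes];
        List<int[]> seen = new ArrayList<>();

        for (int t = 0; t < streamLength; t++) {
            // Draw the next random +1/-1 pattern from the ongoing stream.
            int[] p = new int[numNodes];
            for (int i = 0; i < numNodes; i++) p[i] = rng.nextBoolean() ? 1 : -1;
            seen.add(p);

            // Capped Hebbian update, applied only to the links that exist.
            for (int i = 0; i < numNodes; i++)
                for (int j = 0; j < numNodes; j++)
                    if (connected[i][j])
                        w[i][j] = Math.max(-cap, Math.min(cap, w[i][j] + rate * p[i] * p[j]));

            // Sliding window W: count how many of the most recent patterns
            // are still fixed points of the thresholded dynamics.
            int window = 0;
            for (int k = seen.size() - 1; k >= 0 && isStable(w, seen.get(k)); k--) window++;
            System.out.println("after pattern " + (t + 1) + ": window W = " + window);
        }
    }

    /** True if the pattern is unchanged by one full update sweep. */
    static boolean isStable(double[][] w, int[] s) {
        int n = s.length;
        for (int i = 0; i < n; i++) {
            double field = 0.0;
            for (int j = 0; j < n; j++) field += w[i][j] * s[j];
            if (((field >= 0) ? 1 : -1) != s[i]) return false;
        }
        return true;
    }
}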

Investigation of reverse-learning in sparsely connected nets may be carried out later, but is considered a low priority since the performance of reverse learning even in fully connected nets is not particularly impressive.

4. Experimental Results

The following table of results indicates the number of memories continuously recalled (the sliding memory window size W) as a function of network size (number of nodes N), connectivity c, and weight cap C. This represents a small summary of results from a large body of empirical trials; more detailed and extensive data are available from the authors. The networks used were random networks, produced using the random number generator provided by Java's Math.random() method, according to the given connectivity ratios.

 W    N      c      C
----------------------
 2   1000   0.02    3.3
 2   1100   0.04    0.9
 3   1000   0.04    2.7
 3   1100   0.06    1.5
 4   1000   0.06    6.0
 4   1100   0.10    1.8
 5   1000   0.06   11.1
 5   1100   0.14    2.1
 6   1000   0.14    8.7
 6   1100   0.16    3.0
 7   1000   0.16    8.4
 7   1100   0.18    3.0
 8   1000   0.16    9.0
 8   1100   0.20    2.7
 9   1000   0.16    9.0
 9   1100   (no data)
----------------------

This data shows that even at low connectivity the phenomenon of continuous learning in a weight-capped Hopfield net holds true. Furthermore, the window size W is an increasing function of connectivity, for fixed network size.

Beyond this, one interesting question is: What is an exact formula for the optimal weight cap for a given connectivity and network size? Our experiments have been far too few to provide an answer to this question; the literature gives only order-of-magnitude estimates, which are of no use in practice. This question is left for the future.

A different but related question is: What happens when the weight cap increases? Does overloading occur at some point as the weight cap increases, with all else held constant? To determine this, we have to examine the dynamics of the sliding window size W. If connectivity and weight cap are held constant, one may study W as a function of the number of iterations. The window size goes up and down as the learning goes on. In general, it seems, the smoothness of this function decreases as the weight cap increases, and an extremely bumpy function leads to overloading.

There is a fascinating dynamics to the number of recalled memories of a sparse Hopfield net with weight caps, at which the current brief article can only hint. Sophisticated mathematical analysis, or much more extensive numerical investigation, will be necessary to fully unveil the laws underlying these phenomena. This will be left for a later paper.

From a practical perspective, however, the main question has been answered by the present investigations: We have shown that Hopfield nets with sparse connectivity can display continuous learning, and we have obtained a good sense of the size of the sliding memory window as a function of sparsity, for an appropriately chosen weight cap.

References

  1. Christos, G. (1994). Reverse Learning

  2. Edelman, G. (1987). Neural Darwinism. New York: Basic Books

  3. Goertzel, B. (1996). "Subself dynamics in human and machine intelligence," CCC-AI

  4. Hopfield, J. (1985). Collective Computation with Networks of 'Neurons'.

  5. Nadal, J.P., G. Toulouse, J.P. Changeux and S. Dehaene (1986). "Networks of formal neurons and memory palimpsests," Europhysics Letters 1, 535-542

  6. Parisi, G. (1986). "A memory that forgets," J. Phys. A 19, L617-L620