16.2 So Connection Specifications

The basic connection type used in all the algorithms, SoCon, has a delta-weight variable dwt and a previous-delta-weight variable pdw. dwt is incremented by the current weight change computations, and then cleared when the weights are updated. pdw should be used for viewing, since dwt is often zero. While it has not been implemented in the standard distribution, pdw could be used for momentum-based updating (see section 14.1.2 Bp Connection Specifications).
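
For example, a momentum-based update along the lines of the Bp version could use pdw roughly as follows (a sketch only; neither the momentum parameter nor this update step is part of the standard So distribution):
  // hypothetical momentum update at weight-update time (not implemented)
  cn->pdw = cn->dwt + momentum * cn->pdw; // blend in previous weight change
  cn->wt += lrate * cn->pdw;              // apply the smoothed change
  cn->dwt = 0.0f;                         // clear accumulator for next trial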

The basic SoConSpec has a learning rate parameter lrate, and a range to keep the weight values in: wt_range. Unlike error-driven learning, many self-organizing learning algorithms require the weights to be forcibly bounded, since the positive-feedback loop of associative learning can otherwise lead to unbounded weight growth. Finally, the avg_act_source variable determines how to compute the average and summed activation of the input layer(s), which is needed for some of the learning rules. If the network is fully connected, one can set avg_act_source to LAYER_AVG_ACT, which uses the input layer's own average activation and requires no further computation. However, if the units receive connections from only a sub-sample of the input layer, the layer average might not correspond to what is actually seen by individual units, so you might want to use COMPUTE_AVG_ACT, even though it is more computationally expensive.
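
As a sketch of what COMPUTE_AVG_ACT amounts to, the per-unit statistics can be computed over just the connections a unit actually receives (an illustrative loop; the member names here are assumptions, not the actual sources):
  // average and sum of only those inputs this unit receives
  float sum = 0.0f;
  for(int i = 0; i < cg->size; i++)
    sum += cg->Un(i)->act;            // activation of the i-th sending unit
  cg->sum_in_act = sum;
  cg->avg_in_act = sum / (float)cg->size;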

The different varieties of SoConSpec are as follows:

HebbConSpec
This computes the most basic Hebbian learning rule, which is just the product of the sending and receiving unit activations:
  cn->dwt += ru->act * su->act;
Though it is perhaps the simplest and clearest associative learning rule, its limitations are many, including the fact that the weights will typically grow without bound. Also, for any weight decrease to take place, it is essential that activations be able to take on negative values. Keep this in mind when using this form of learning. One application of this con spec is for simple pattern association, where both the input and output patterns are determined by the environment, and learning occurs between these patterns.
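In practice, the wt_range bounds of the SoConSpec can be used to keep this growth in check at weight-update time; a minimal sketch of such a bounded update (not the literal distribution code):
  cn->pdw = cn->dwt;                // save for viewing; dwt is cleared below
  cn->wt += lrate * cn->dwt;        // apply the learning rate
  if(cn->wt > wt_range.max) cn->wt = wt_range.max;      // enforce wt_range
  else if(cn->wt < wt_range.min) cn->wt = wt_range.min;
  cn->dwt = 0.0f;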
ClConSpec
This implements the standard competitive learning algorithm as described in Rumelhart & Zipser, 1985. This rule can be seen as attempting to align the weight vector of a given unit with the center of the cluster of input activation vectors that the unit responds to. Thus, each learning trial simply moves the weights some amount towards the input activations. In standard competitive learning, the vector of input activations is normalized by dividing by the sum of the input activations for the input layer, sum_in_act (see avg_act_source above for details on how this is computed).
  cn->dwt += ru->act * ((su->act / cg->sum_in_act) - cn->wt);
The amount of learning is "gated" by the receiving unit's activation, which is determined by the competitive learning function. In the winner-take-all "hard" competition used in standard competitive learning, this means that only the winning unit gets to learn. Note that if you multiply through in the above equation, it is equivalent to a Hebbian-like term minus something that looks like weight decay:
  cn->dwt += (ru->act * (su->act / cg->sum_in_act)) - (ru->act * cn->wt);
This solves both the weight bounding and the weight decrease problems with pure Hebbian learning as implemented in the HebbConSpec described above.
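To make the gating concrete: in the hard winner-take-all case, ru->act is effectively 1 for the single winning unit and 0 for all others, so only the winner's weights move (a sketch with an illustrative winner test; the actual gating simply comes through ru->act):
  // hard competition: only the winning unit learns
  float gate = (ru == winner) ? 1.0f : 0.0f;   // 'winner' is illustrative
  cn->dwt += gate * ((su->act / cg->sum_in_act) - cn->wt);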
SoftClConSpec
This implements the "soft" version of the competitive learning rule (Nowlan, 1990). It is essentially the same as the "hard" version, except that it does not normalize the input activations. Thus, the weights move towards the center of the actual activation vector. This can be thought of in terms of maximizing the value of a multi-dimensional Gaussian function of the distance between the weight vector and the activation vector, which is the form of the learning rule used in soft competitive learning: the smaller the distance between the weight and activation vectors, the greater the activation value.
  cn->dwt += ru->act * (su->act - cn->wt);
This is also the form of learning used in the self-organizing map algorithm, which also seeks to minimize the distance between the weight and activation vectors. The receiving activation value again gates the weight change. In soft competitive learning, this activation is determined by a soft competition among the units. In the SOM, the activation is a function of the activation kernel centered around the unit with the smallest distance between the weight and activation vectors.
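For the SOM case, the gating activation might be produced by a Gaussian kernel over grid distance from the winning unit (a sketch; grid_dist_sq and sigma are illustrative, and kernel details vary across implementations):
  // neighborhood kernel: units near the winner learn the most
  float d2 = grid_dist_sq(ru, winner);           // squared grid distance
  ru->act = expf(-d2 / (2.0f * sigma * sigma));  // Gaussian falloff
  cn->dwt += ru->act * (su->act - cn->wt);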
ZshConSpec
This implements the "zero-sum" Hebbian learning algorithm (ZSH; O'Reilly & McClelland, 1992), which implements a form of subtractive weight constraints, as opposed to the multiplicative constraints used in competitive learning. Multiplicative constraints keep the weight vector from growing without bound by normalizing its length to that of the activation vector. This normalization preserves the ratios of the relative correlations of the input units with the cluster represented by a given unit. In contrast, the subtractive weight constraints in ZSH exaggerate the weights of those inputs which are greater than the average input activation level, and diminish those of inputs which are below average:
  cn->dwt += ru->act * (su->act - cg->avg_in_act);
where avg_in_act is the average input activation level. Thus, inputs which are above average have their weights increased, and those which are below average have their weights decreased. This causes the weights to go into a corner of the hypercube of weight values (i.e., weights tend to be either 0 or 1). Because the weights are driven towards these extremes in ZSH, it is useful to introduce a "soft" weight bounding which causes the weights to approach the bounds set by wt_range in an exponential-approach fashion. If the weight change is greater than zero, it is multiplied by wt_range.max - cn->wt, and if it is less than zero, it is multiplied by cn->wt - wt_range.min. This behavior is selected by the soft_wt_bound option.
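A sketch of the soft bounding just described (illustrative; the actual code may factor this differently):
  float dwt = ru->act * (su->act - cg->avg_in_act);
  if(soft_wt_bound) {
    if(dwt > 0.0f) dwt *= wt_range.max - cn->wt;  // approach the upper bound
    else           dwt *= cn->wt - wt_range.min;  // approach the lower bound
  }
  cn->dwt += dwt;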
MaxInConSpec
This learning rule is basically just the combination of SoftCl and Zsh. It turns out that both of these rules can be derived from an objective function which seeks to maximize the input information a unit receives, defined as the signal-to-noise ratio of the unit's response to a given input signal (O'Reilly, 1994). The formal derivation is based on a different kind of activation function than those implemented here, and it has a special term which weights the Zsh-like term according to how well the signal is already being separated from the noise. Thus, this implementation is simpler: it just combines Zsh and SoftCl in an additive way:
  cn->dwt += ru->act * (su->act - cg->avg_in_act) +
             k_scl * ru->act * (su->act - cn->wt);
Note that the parameter k_scl can be adjusted to control the influence of the SoftCl term. Also, the soft_wt_bound option applies here as well.
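Putting the pieces together, the full MaxIn weight change with soft bounding might look like the following (a sketch combining the snippets above; not the literal distribution code):
  float dwt = ru->act * (su->act - cg->avg_in_act) +  // Zsh term
              k_scl * ru->act * (su->act - cn->wt);   // SoftCl term
  if(soft_wt_bound) {
    if(dwt > 0.0f) dwt *= wt_range.max - cn->wt;
    else           dwt *= cn->wt - wt_range.min;
  }
  cn->dwt += dwt;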