The basic connection type used in all the algorithms, SoCon, has a delta-weight variable dwt and a previous-delta-weight variable pdw. dwt is incremented by the current weight change computations, and then cleared when the weights are updated. pdw should be used for viewing, since dwt is often zero. While it has not been implemented in the standard distribution, pdw could be used for momentum-based updating (see section 14.1.2 Bp Connection Specifications).
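As a rough sketch of how such momentum-based updating might look (this is not part of the standard distribution; the momentum parameter and the simplified struct are assumptions for illustration only):

    /* Sketch only: a momentum-style weight update using pdw.  The struct
       and the momentum parameter are illustrative, not the actual classes. */
    struct ConSketch { float wt, dwt, pdw; };

    void UpdateWeightsWithMomentum(ConSketch* cn, float lrate, float momentum) {
      float dw = lrate * cn->dwt + momentum * cn->pdw; /* blend current and previous change */
      cn->wt  += dw;    /* apply the weight change */
      cn->pdw  = dw;    /* save for viewing and for the next momentum step */
      cn->dwt  = 0.0f;  /* dwt is cleared once the weights are updated */
    }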
The basic SoConSpec has a learning rate parameter lrate, and a range to keep the weight values in: wt_range. Unlike error-driven learning, many self-organizing learning algorithms require the weights to be forcibly bounded, since the positive-feedback loop phenomenon of associative learning can otherwise lead to infinite weight growth. Finally, there is a variable, avg_act_source, which determines how to compute the average and summed activation of the input layer(s), which is needed for some of the learning rules. If the network is fully connected, then one can set avg_act_source to LAYER_AVG_ACT, which uses the layer's average activation and does not require any further computation. However, if the units receive connections from only a sub-sample of the input layer, then the layer average might not correspond to that which is actually seen by individual units, so you might want to use COMPUTE_AVG_ACT, even though it is more computationally expensive.
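To make the difference concrete, a sketch of what COMPUTE_AVG_ACT has to do for each connection group is shown below (illustrative code, not the actual library routines); LAYER_AVG_ACT instead just reads the layer's precomputed average:

    /* Sketch only: averaging the sending activations actually connected
       to a unit, as COMPUTE_AVG_ACT would require.  Names are illustrative. */
    float ComputeAvgInAct(const float* su_act, int n_cons) {
      float sum = 0.0f;
      for (int i = 0; i < n_cons; i++)
        sum += su_act[i];                    /* only the connected senders */
      return (n_cons > 0) ? sum / n_cons : 0.0f;
    }
    /* With full connectivity this equals the layer-wide average, so the
       cheaper LAYER_AVG_ACT setting gives the same result. */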
The different varieties of SoConSpec are as follows:
The simplest variety, HebbConSpec, computes the basic Hebbian learning rule as the product of the sending and receiving unit activations:
    cn->dwt += ru->act * su->act;
Though it is perhaps the simplest and clearest associative learning rule, its limitations are many, including the fact that the weights will typically grow without bound. Also, for any weight decrease to take place, it is essential that activations be able to take on negative values. Keep this in mind when using this form of learning. One application of this con spec is for simple pattern association, where both the input and output patterns are determined by the environment, and learning occurs between these patterns.
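A toy illustration of the unbounded-growth problem (not library code): with strictly positive activations, every update is positive, so the weight can only increase:

    /* Toy illustration: pure Hebbian updates with positive activations. */
    float wt = 0.1f, ru_act = 0.6f, su_act = 0.9f, lrate = 0.1f;
    for (int i = 0; i < 1000; i++)
      wt += lrate * ru_act * su_act;  /* change is positive every step, so wt grows without bound */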
The standard competitive learning rule normalizes the sending activation by the sum of the input activations, sum_in_act (see avg_act_source above for details on how this is computed):
    cn->dwt += ru->act * ((su->act / cg->sum_in_act) - cn->wt);
The amount of learning is "gated" by the receiving unit's activation, which is determined by the competitive learning function. In the winner-take-all "hard" competition used in standard competitive learning, this means that only the winning unit gets to learn. Note that if you multiply through in the above equation, it is equivalent to a Hebbian-like term minus something that looks like weight decay:
    cn->dwt += (ru->act * (su->act / cg->sum_in_act)) - (ru->act * cn->wt);
This solves both the weight bounding and the weight decrease problems with pure Hebbian learning as implemented in the HebbConSpec described above.
The "soft" version of competitive learning does not normalize the input activation:
    cn->dwt += ru->act * (su->act - cn->wt);
This is also the form of learning used in the self-organizing map (SOM) algorithm, which likewise seeks to minimize the distance between the weight and activation vectors. The receiving activation value again gates the weight change. In soft competitive learning, this activation is determined by a soft competition among the units. In the SOM, the activation is a function of the activation kernel centered around the unit with the smallest distance between the weight and activation vectors.
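To see that this rule does move the weight toward the sending activation, consider a toy illustration (not library code):

    /* Toy illustration: repeated application of ru_act * (su_act - wt)
       drives the weight toward the sending activation. */
    float wt = 0.0f, su_act = 0.8f, ru_act = 0.5f, lrate = 0.1f;
    for (int i = 0; i < 100; i++)
      wt += lrate * ru_act * (su_act - wt);  /* wt converges toward su_act = 0.8 */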
The ZSH rule subtracts the average input activation level from the sending activation:
    cn->dwt += ru->act * (su->act - cg->avg_in_act);
where avg_in_act is the average input activation level. Thus, those inputs which are above average have their weights increased, and those which are below average have them decreased. This causes the weights to go into a corner of the hypercube of weight values (i.e., weights tend to be either 0 or 1). Because weights are going towards the extremes in ZSH, it is useful to introduce a "soft" weight bounding which causes the weights to approach the bounds set by wt_range in an exponential-approach fashion. If the weight change is greater than zero, then it is multiplied by `wt_range.max - cn->wt', and if it is less than zero, it is multiplied by `cn->wt - wt_range.min'. This is selected by using the soft_wt_bound option.
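Written out as code, the soft bounding amounts to the following (a sketch; the function name and its exact placement in the update are illustrative):

    /* Sketch of soft weight bounding applied to a computed weight change. */
    float SoftBound(float dwt, float wt, float wt_min, float wt_max) {
      if (dwt > 0.0f)
        return dwt * (wt_max - wt);   /* increases shrink as wt approaches wt_range.max */
      else
        return dwt * (wt - wt_min);   /* decreases shrink as wt approaches wt_range.min */
    }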
The final variety simply adds the SoftCl term, scaled by k_scl, to the ZSH rule:
    cn->dwt += ru->act * (su->act - cg->avg_in_act) + k_scl * ru->act * (su->act - cn->wt);
Note that the parameter k_scl can be adjusted to control the influence of the SoftCl term. Also, the soft_wt_bound option applies here as well.