14.1.6 Simple Recurrent Networks in Bp

Simple recurrent networks (SRNs) (Elman, 1988) involve the use of a special set of context units, which copy their values from the hidden units and from which the hidden units receive inputs. This arrangement provides a simple form of recurrence that can be used to train networks to perform sequential tasks over time.

The implementation of SRNs in PDP++ uses a special version of the BpUnitSpec called the BpContextSpec. This spec overloads the activation function to simply copy from a corresponding hidden unit. The correspondence between hidden and context units is established by creating a single one-to-one projection from the hidden units into the context units. Each context unit then copies its activation from the sending unit on the other side of the first connection in its first connection group. This kind of connection should be created with a OneToOnePrjnSpec (see section 10.3.2 The Projection Specification).

Important: The context units should be in a layer that follows the hidden units they copy from. Because layers are updated in order, this ensures that the hidden units update first, using the context values from the previous time step, before the context units copy the new hidden activations.

The context units do not have to simply copy the activations directly from the hidden units. Instead, they can perform a time-averaging of information through the use of an updating equation as described below. The parameters of the context spec are as follows:

float hysteresis
Controls the rate at which information is accumulated by the context units. A larger hysteresis value makes the context units more sluggish and resistant to change; a smaller value makes them incorporate information more quickly, but hold onto it for a shorter period of time:
  u->act = (1.0 - hysteresis) * hu->act + hysteresis * u->act;
Random initial_act
These parameters specify the random distribution used to set the initial activation of the context units. Unlike other units in a standard Bp network, the initial state of the context units is actually important, since it provides the initial context input to the hidden units.

Note that the SRN typically requires a sequence model of the environment, which means using the sequence processes (see section 12.6.1 Processes for Sequences of Events). Typically, the activations (including those of the context units) are initialized at the start of a sequence, and then a sequence of related events is presented to the network, which can build up a context representation over time since the activations are not initialized between each event trial.

The defaults file `bp_seq.def' contains a set of defaults for Bp that will create sequence processes by default (see section 14.1.7 Bp Defaults).

The demo project `demo/bp_srn/srn_fsa.proj.gz' is an example of an SRN that uses the sequence processes. It also illustrates the use of a ScriptEnv, where a CSS script is used to dynamically create new events generated at random from a finite state automaton.