Note that the weight change operation in Cs is viewed as the process of
collecting statistics about the coproducts of activations across the
weights. Thus, there is a new function at the level of the CsConSpec
called Aggregate_dWt, which increments the dwt_agg value of the
connection by the phase (+1.0 for the plus phase, -1.0 for the minus
phase) times the coproduct of the activations. If learning is taking
place, this function is called repeatedly once start_stats cycles have
elapsed. The standard Compute_dWt function is then called at the end of
the sample process; it divides the aggregated weight change value by a
count of the number of times it was aggregated, so that the result is
an expected-value measure.
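The aggregation and averaging scheme just described can be sketched as follows. This is a simplification under assumed names: the CsCon struct and free-function form are illustrative stand-ins, not the actual CsConSpec code, which operates on real connection objects.

```cpp
// Hypothetical simplified connection; the real dwt_agg lives on CsCon objects.
struct CsCon {
  float dwt_agg = 0.0f;   // aggregated weight change
  int   n_dwt_aggs = 0;   // number of times aggregated
};

// Aggregate_dWt (sketch): add the phase-signed coproduct of the sending
// and receiving activations. phase is +1.0 in the plus phase, -1.0 in
// the minus phase.
void Aggregate_dWt(CsCon& cn, float su_act, float ru_act, float phase) {
  cn.dwt_agg += phase * (su_act * ru_act);
  cn.n_dwt_aggs++;
}

// Compute_dWt (sketch): divide the aggregate by the aggregation count,
// yielding an expected-value measure, and reset for the next sample.
float Compute_dWt(CsCon& cn) {
  float dwt = (cn.n_dwt_aggs > 0) ? cn.dwt_agg / cn.n_dwt_aggs : 0.0f;
  cn.dwt_agg = 0.0f;
  cn.n_dwt_aggs = 0;
  return dwt;
}
```

Because the plus-phase coproducts enter with a positive sign and the minus-phase ones with a negative sign, the averaged result estimates the difference of the two expected coproducts.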
Also, note that the type of momentum used in Cs corresponds to the
BEFORE_LRATE option of backpropagation (see section 14.1.2 Bp
Connection Specifications).
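A minimal sketch of what BEFORE_LRATE momentum amounts to, assuming the momentum term is folded into the weight change before the learning rate is applied (the function and variable names here are illustrative, not the actual BpConSpec code):

```cpp
// BEFORE_LRATE momentum (sketch): the momentum-carrying dwt_prev is
// updated from the raw weight change first, and the learning rate
// scales the result afterward.
void UpdateWeight(float& wt, float& dwt_prev, float dwt,
                  float lrate, float momentum) {
  dwt_prev = dwt + momentum * dwt_prev; // momentum applied before lrate
  wt += lrate * dwt_prev;               // learning rate applied after
}
```

Under this scheme, changing the learning rate rescales the entire momentum-smoothed update rather than only the current step's contribution.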
The following is a chart describing the flow of processing in the Cs algorithm, starting with the epoch process, since higher levels do not interact with the details of particular algorithms:
EpochProcess: {
  Init: {                                // at start of epoch
    environment->InitEvents();           // init events (if dynamic)
    event_list.Add() 0 to environment->EventCount(); // get list of events
    if(order == PERMUTE) event_list.Permute();       // permute if necessary
    GetCurEvent();                       // get pointer to current event
  }
  Loop (sample): {                       // loop over samples
    CsSample: {                          // sample process (one event)
      Init: {                            // at start of sample
        cur_event = epoch_proc->cur_event; // get cur event from epoch
      }
      Loop (sample): {                   // loop over samples of event
        CsTrial: {                       // phase process (one phase)
          Init: {                        // at start of phase
            phase = MINUS_PHASE;         // start in minus phase
            if(phase_init == INIT_STATE) network->InitState();
            cur_event = epoch_proc->cur_event; // get cur event from epoch
            cur_event->ApplyPatterns(network); // apply patterns to network
          }
          Loop (phase_no [0 to 1]): {    // loop over phases
            CsSettle: {                  // settle process (one settle)
              Init: {                    // at start of settle
                if(CsPhase::phase == PLUS_PHASE) {
                  network->InitState();  // init state (including ext input)
                  cur_event->ApplyPatterns(network); // re-apply patterns
                  Targ_To_Ext();         // turn targets into external inputs
                }
              }
              Loop (cycle): {            // loop over act update cycles
                CsCycle: {               // cycle process (one cycle)
                  Loop (once): {         // only process this once per cycle
                    Compute_SyncAct() or Compute_AsyncAct(); // see below
                    if(!deterministic and cycle > start_stats)
                      Aggregate_dWt();   // aggregate wt changes
                  }
                }
              }
              Final: {                   // at end of phase
                if(deterministic) Aggregate_dWt(); // do at end
                PostSettle();            // copy act to act_p or act_m
              }
            }
            phase = PLUS_PHASE;          // change to plus phase after minus
          }
        }
      }
      Final: {                           // at end of sample (done with event)
        network->Compute_dWt();          // compute wt changes based on aggs
      }
    }
    if(wt_update == ON_LINE or wt_update == SMALL_BATCH and trial.val % batch_n)
      network->UpdateWeights();          // update weights after sample if necc
    GetCurEvent();                       // get next event to present
  }
  Final:                                 // at end of epoch
    if(wt_update == BATCH) network->UpdateWeights();
}
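The weight-update scheduling at the end of the chart can be sketched as a single predicate. This is one reading of the condition in the chart, under the assumption that trial.val is a 0-based trial counter within the epoch and that SMALL_BATCH updates fire every batch_n trials; the helper name is hypothetical:

```cpp
enum WtUpdate { ON_LINE, SMALL_BATCH, BATCH };

// Returns true if weights should be updated after the given trial.
// trial is the 0-based trial counter within the epoch; batch_n is the
// small-batch size.
bool UpdateAfterTrial(WtUpdate wt_update, int trial, int batch_n) {
  if (wt_update == ON_LINE) return true;                       // every trial
  if (wt_update == SMALL_BATCH) return ((trial + 1) % batch_n) == 0;
  return false;  // BATCH: weights updated only at the end of the epoch
}
```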
The activation computing functions are broken down as follows:
Compute_SyncAct(): {                     // compute synchronous activations
  network->InitDelta();                  // initialize net-input terms
  network->Compute_Net();                // aggregate net inputs
  network->layers: {                     // loop over layers
    units->Compute_Act(CsSettle::cycle); // compute act from net
  }                                      // (cycle used for annealing, sharpening)
}
Compute_AsyncAct(): {                      // compute asynchronous activations
  for(i=0...n_updates) {                   // do this n_updates times per cycle
    rnd_num = Random between 0 and CsSettle::n_units; // pick a random unit
    network->layers: {                     // loop over layers
      if(layer contains rnd_num unit) {    // find layer containing unit
        unit = layer->unit[rnd_num];       // get unit from layer
        unit->InitDelta();                 // initialize net input terms
        unit->Compute_Net();               // aggregate net inputs
        unit->Compute_Act(CsSettle::cycle);// compute act from net
      }                                    // (cycle used for annealing, sharpening)
    }
  }
}
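The asynchronous scheme above can be sketched concretely as follows. This is a simplification under stated assumptions: units live in one flat vector rather than in layers, weights are a dense matrix, and a bare threshold stands in for the real annealed/sharpened activation function:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical minimal unit: a net input and an activation.
struct Unit { float net = 0.0f; float act = 0.0f; };

// One cycle of asynchronous updating: n_updates randomly chosen units
// are updated one at a time, each seeing the latest activations of the
// others (unlike the synchronous case, where all units see the same
// previous state).
void AsyncCycle(std::vector<Unit>& units,
                const std::vector<std::vector<float>>& wts,
                int n_updates, std::mt19937& rng) {
  std::uniform_int_distribution<std::size_t> pick(0, units.size() - 1);
  for (int i = 0; i < n_updates; i++) {
    std::size_t u = pick(rng);                    // pick a random unit
    units[u].net = 0.0f;                          // InitDelta
    for (std::size_t j = 0; j < units.size(); j++)
      units[u].net += wts[u][j] * units[j].act;   // Compute_Net
    units[u].act = (units[u].net > 0.0f) ? 1.0f : 0.0f; // Compute_Act
  }
}
```

Because only the randomly selected unit is recomputed on each update, units chosen later in the cycle respond to activations already changed earlier in the same cycle.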