The Almeida-Pineda backprop (APBp) algorithm is a lot like the recurrent backpropagation algorithm just described, except that instead of recording the activation trajectory over time and then backpropagating through it, this algorithm performs activation propagation until the change in activation goes below some threshold, and then performs backpropagation repeatedly until the change in error derivatives also goes below threshold.
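The two settling phases described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the simulator's code: it assumes a fully connected net of sigmoid units updated synchronously, a squared-error loss on the units marked as outputs, and hypothetical function names (`settle_activations`, `settle_errors`, `apbp_gradient`) that do not come from the software.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def settle_activations(W, ext, threshold=1e-6, max_cycles=1000):
    """Iterate a <- sigmoid(W a + ext) until the largest change is below threshold."""
    n = len(W)
    a = [0.0] * n
    for _ in range(max_cycles):
        new_a = [sigmoid(sum(W[i][j] * a[j] for j in range(n)) + ext[i])
                 for i in range(n)]
        max_da = max(abs(p - q) for p, q in zip(new_a, a))
        a = new_a
        if max_da < threshold:
            break
    return a

def settle_errors(W, a, direct_err, threshold=1e-6, max_cycles=1000):
    """Iterate y <- direct_err + W^T (f'(net) * y) until y stops changing."""
    n = len(W)
    fprime = [x * (1.0 - x) for x in a]  # sigmoid derivative at the fixed point
    y = [0.0] * n
    for _ in range(max_cycles):
        new_y = [direct_err[i] + sum(W[k][i] * fprime[k] * y[k] for k in range(n))
                 for i in range(n)]
        max_de = max(abs(p - q) for p, q in zip(new_y, y))
        y = new_y
        if max_de < threshold:
            break
    return y

def apbp_gradient(W, ext, target, is_output, threshold=1e-6):
    """Return (dE/dW, settled activations) for E = 0.5 * sum over outputs of (a - t)^2."""
    n = len(W)
    a = settle_activations(W, ext, threshold)
    # Error is injected directly only at output units (an assumption of this sketch).
    direct_err = [(a[i] - target[i]) if is_output[i] else 0.0 for i in range(n)]
    y = settle_errors(W, a, direct_err, threshold)
    grad = [[y[i] * a[i] * (1.0 - a[i]) * a[j] for j in range(n)] for i in range(n)]
    return grad, a
```

Note that no activation history is stored: the error phase only needs the settled activations. Both loops converge only when the weights are small enough for the updates to be contractive, which is why each loop also carries a `max_cycles` cap.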
This algorithm is implemented by using the standard RBp unit and connection types, even though APBp doesn't require the activation trace that is kept by these units. Indeed, you should set the store_states flag on the RBpUnitSpec to false when using APBp.
All that is needed is a set of processes to implement the settling process over cycles of activation and error propagation. Thus, three new processes were implemented: a cycle process (see section 12.5.3 Performing one Update: CycleProcess) to perform one cycle of activation or error propagation, a settle process (see section 12.5.2 Iterating over Cycles: SettleProcess) to iterate over cycles, and a train process (see section 12.4.2 Iterating over Epochs: TrainProcess) to iterate over two phases of settling (activation and backpropagation).
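The nesting of these processes can be pictured with a short Python sketch. It is a schematic under assumptions, not the simulator's implementation: `run_settle`, `run_trial`, and `HalvingCycle` are hypothetical names, and each cycle function is assumed to return the size of the change it produced.

```python
from enum import Enum

class Phase(Enum):
    ACT_PHASE = 0  # settle activations
    BP_PHASE = 1   # settle error derivatives

def run_settle(cycle_fn, threshold, max_cycles=100):
    """Settle level: run single cycles until the reported change drops below threshold."""
    for n in range(1, max_cycles + 1):
        if cycle_fn() < threshold:
            break
    return n

def run_trial(act_cycle, bp_cycle, threshold):
    """Top level: one settling pass per phase, activation first, then backpropagation."""
    cycles_used = {}
    for phase in Phase:
        cycle_fn = act_cycle if phase is Phase.ACT_PHASE else bp_cycle
        cycles_used[phase] = run_settle(cycle_fn, threshold)
    return cycles_used

class HalvingCycle:
    """Toy stand-in for a cycle: the change it reports halves on every call."""
    def __init__(self, start):
        self.change = start
    def __call__(self):
        self.change /= 2.0
        return self.change

# Each phase stops on its own once its change is below the shared threshold.
used = run_trial(HalvingCycle(1.0), HalvingCycle(4.0), threshold=0.1)
```

With these toy cycles the activation phase stops after fewer cycles than the backpropagation phase, since it starts from a smaller change.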
The APBpCycle and APBpSettle processes don't have any user-settable parameters. The APBpTrial adds a couple of options to control settling:
Counter phase_no
    The counter for the phase of settling.
Phase phase
    The phase, which is either ACT_PHASE or BP_PHASE. It is essentially just a more readable version of the phase_no counter.
StateInit trial_init
    What to do at the start of the trial: either DO_NOTHING or INIT_STATE, which initializes the unit activation state variables, and is the default thing to do.
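bool no_bp_stats
    If set, statistics are not computed during the backpropagation phase.
bool no_bp_test
    If set, the backpropagation phase is not run when the epoch process is in TEST wt_update mode.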
The point at which settling is cut off is determined by an APBpMaxDa_De statistic object, which measures the maximum change in activation or the maximum change in error derivative. The stopping criterion (see section 12.7.2 Using Statistics to Control Process Execution) of this stat sets the cutoff threshold. The same threshold is used for activation as for error, which seems to be reasonable in practice.
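As a rough illustration of such a statistic, the following Python sketch tracks the largest absolute change between two successive state vectors and compares it against a single threshold. The class `MaxChangeStat` and its methods are hypothetical and only mimic the idea; they are not the APBpMaxDa_De interface.

```python
class MaxChangeStat:
    """Tracks the largest absolute change between two successive state vectors."""

    def __init__(self, threshold):
        self.threshold = threshold   # one cutoff shared by both phases
        self.value = float("inf")    # nothing measured yet, so never stop early

    def update(self, old, new):
        """Record the maximum elementwise change from old to new."""
        self.value = max(abs(n - o) for o, n in zip(old, new))
        return self.value

    def should_stop(self):
        """Stopping criterion: settling ends once the change is below threshold."""
        return self.value < self.threshold

stat = MaxChangeStat(threshold=0.01)
stat.update([0.50, 0.20], [0.52, 0.20])      # activations still moving
still_settling = not stat.should_stop()
stat.update([0.52, 0.20], [0.521, 0.2005])   # change is now small
settled = stat.should_stop()
```

The same object can be fed activations during the activation phase and error derivatives during the backpropagation phase, which is what sharing one threshold amounts to.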