12.7 The Statistic Process

The Statistic Process is an extension of the basic Process object that computes values, which are then made available for recording and display in logs. The basic Stat object defines an interface for computing and reporting data. This interface is used by the schedule processes, which supervise the running of stats and the reporting of their data to the logs.

Each statistic object can operate in one of two capacities. The first is as the original computer (or collector) of some kind of data. For example, a squared-error statistic (SE_Stat) knows how to go through a network and compute the squared difference between target values and actual activations. Typically, this would be performed after every event is presented to the network, since that is when the relevant information is available in the state variables of the network.

The second capacity of a statistic is as an aggregator of data computed by another statistic. This is needed in order to be able to compute the sum of the squared-errors over all of the trials in an epoch, for example. When operating in aggregation mode, statistics work from data in the statistic they are aggregating from, instead of going out and collecting data from the network itself.
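The two capacities can be illustrated with a minimal sketch. It is not the actual SE_Stat implementation: SketchSEStat, its members, and RunEpochSketch are hypothetical names invented for this example, and the network state is reduced to plain vectors of target values and activations. The sketch assumes a trial-level stat that computes the squared error after each event and an epoch-level stat that sums those values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a statistic; NOT the real SE_Stat class.
struct SketchSEStat {
  double value = 0.0;                    // computed or aggregated value (same member in both capacities)
  const SketchSEStat* source = nullptr;  // non-null: this stat aggregates from `source`

  // Reset before a new pass of the loop this stat belongs to.
  void Init() { value = 0.0; }

  // Capacity 1: compute the squared error directly from network state.
  void Compute(const std::vector<double>& targets,
               const std::vector<double>& acts) {
    assert(source == nullptr);             // original computers have no source
    assert(targets.size() == acts.size());
    value = 0.0;
    for (std::size_t i = 0; i < targets.size(); ++i) {
      const double diff = targets[i] - acts[i];
      value += diff * diff;
    }
  }

  // Capacity 2: fold in the current value of the lower-level stat (sum operator).
  void Aggregate() {
    assert(source != nullptr);             // aggregators read another stat, not the network
    value += source->value;
  }
};

// Runs one sketch "epoch": the trial stat computes after every event,
// and the epoch stat aggregates after every pass of the trial loop.
inline double RunEpochSketch(const std::vector<std::vector<double>>& targets,
                             const std::vector<std::vector<double>>& acts) {
  assert(targets.size() == acts.size());
  SketchSEStat trial_stat;
  SketchSEStat epoch_stat;
  epoch_stat.source = &trial_stat;
  epoch_stat.Init();
  for (std::size_t t = 0; t < targets.size(); ++t) {
    trial_stat.Compute(targets[t], acts[t]);
    epoch_stat.Aggregate();
  }
  return epoch_stat.value;
}
```

With targets {1, 0} and {0, 1} and activations {0.5, 0.5} and {0, 0.5}, the per-trial values are 0.5 and 0.25, so RunEpochSketch returns 0.75.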

Typically, the statistic and its aggregators are all of the same type (e.g., they are all SE_Stats), and the aggregated values appear in the same member variable in which the originally computed value appears. Thus, to set a stopping criterion on an aggregated stat value, for example, look in that same member variable of the aggregator.

Each statistic knows how to create a series of aggregators all the way up the processing hierarchy. This is done with the CreateAggregates function on the stat, which is available as an option when a statistic is created. Thus, one always creates a statistic at the processing level where it will do the original computation. If aggregates of this value are needed at higher levels, make sure the CreateAggregates field is checked when the stat is created, or call CreateAggregates yourself later (e.g., from the Actions menu of a stat edit dialog). You can also call UpdateAllAggregators to make sure the aggregators' names reflect any changes (i.e., in layer or network aggregation operator), and FindAggregator to find the immediate aggregator of the current stat.

It is recommended that you use the NewStat menu from the .processes menu of the project to create a new statistic, or use the Project Viewer (see section 9.2 The Project Viewer). This will bring up a dialog with default options for where to create the stat (i.e., at what processing level), as suggested by the stat itself (each stat knows where it should do its original computation).

There are several different kinds of aggregation operators that can be used to aggregate information over processing levels, including summing, averaging, etc. The operator is selected as part of the time_agg member of the statistic. See below for descriptions of the different operators.

Note that all aggregation statistics reside in the loop_stats group of the schedule processes, since they need to be run after every loop of the lower level statistic to collect its values and aggregate them over time.

In addition to aggregating information over levels of processing, statistics often aggregate information over objects in the network. For example, the SE_Stat typically computes the sum of all the squared error terms over the output units in the network. The particular form of aggregation that a stat performs over network objects is controlled by the net_agg member. Thus, it is possible to have the SE_Stat compute the average error over output units instead of the sum by changing this variable.
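The effect of switching the network aggregation from a sum to an average can be sketched as follows. This is an illustration only: NetAggSketch and SquaredErrorOverUnits are hypothetical names, not the actual net_agg type or any function of the software, and the output units are represented as plain vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical operator choice standing in for the net_agg setting.
enum class NetAggSketch { Sum, Avg };

// Aggregates the squared error over a set of output units,
// either as a sum or as an average, depending on `op`.
inline double SquaredErrorOverUnits(const std::vector<double>& targets,
                                    const std::vector<double>& acts,
                                    NetAggSketch op) {
  assert(targets.size() == acts.size());
  assert(!targets.empty());  // an average over zero units is undefined
  double sum = 0.0;
  for (std::size_t i = 0; i < targets.size(); ++i) {
    const double diff = targets[i] - acts[i];
    sum += diff * diff;
  }
  if (op == NetAggSketch::Avg) {
    return sum / static_cast<double>(targets.size());
  }
  return sum;
}
```

For two output units with targets {1, 0} and activations {0.5, 0.5}, the sum is 0.5 and the average is 0.25. Only the operator changes; the per-unit computation is identical.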

Finally, the name of a statistic, as recorded in the log and as it appears in the name field, is automatically set to reflect the kinds of aggregation being performed. The first three-letter prefix (if there are two) reflects the time_agg operator. The second three-letter prefix (or the only one) reflects the net_agg operator. In addition, if the layer pointer is non-NULL, the layer name is included in the name. The stat name field is not automatically set if it does not contain the type name of the stat, so if you want to give a stat a custom name, do not include the type name in it.
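The naming rule can be sketched roughly as below. Several details are assumptions made for illustration and are not stated in this section: the underscore separator, the position of the layer name between the prefixes and the type name, and the example prefixes "sum" and "avg". AutoNameSketch is a hypothetical helper, not a function of the software.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the automatic naming rule.
// Assumed (not from the manual): "_" separators and layer-name placement.
inline std::string AutoNameSketch(const std::string& current_name,
                                  const std::string& type_name,
                                  const std::string& time_prefix,  // empty if no time aggregation
                                  const std::string& net_prefix,
                                  const std::string& layer_name) { // empty if layer pointer is NULL
  assert(!type_name.empty());
  // A name that does not contain the type name is treated as custom and kept.
  if (current_name.find(type_name) == std::string::npos) {
    return current_name;
  }
  std::string name;
  if (!time_prefix.empty()) name += time_prefix + "_";  // first prefix: time_agg operator
  if (!net_prefix.empty()) name += net_prefix + "_";    // second (or only) prefix: net_agg operator
  if (!layer_name.empty()) name += layer_name + "_";    // layer name, when a layer is set
  name += type_name;
  return name;
}
```

Under these assumptions, a stat currently named "SE_Stat" with prefixes "sum" and "avg" would be renamed "sum_avg_SE_Stat", while a stat named "my_error" would keep its custom name because it does not contain the type name.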