15.5.2 Statistics for Measuring Probability Distributions

The CsDistStat measures the percentage of time (i.e., the probability) that the units in the network which have target patterns in the environment spend in each of the possible target patterns. It is used when multiple possible target states are defined for a given event (see section 15.7 The Probability Environment and Cs). In that case a simple squared-error comparison against any one of the targets would be relatively meaningless; what one wants to know is how much time is spent in each of the possible states. The dist stat generates one column of data for each possible target pattern, and each column gives the probability (proportion of time) that the network's output units were within some distance of that target pattern. The tolerance of the distance measure is set with the tolerance parameter, which is the largest absolute difference between target and actual activation that still results in a zero distance measure. A network is considered to be "in" a particular target state whenever its total distance measure is zero, so this tolerance should be relatively generous (e.g., .5, so that each unit only has to be on the correct side of .5).
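
The computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the simulator's own code: the function names, the use of a squared difference for units outside the tolerance, and the treatment of each recorded activation vector as one equal-length slice of time are all hypothetical choices made for the example.

```python
def unit_distance(target, actual, tolerance):
    """Distance for one unit: zero if within tolerance of the target.

    Assumption: outside the tolerance the squared difference is used.
    """
    diff = abs(target - actual)
    return 0.0 if diff <= tolerance else diff * diff


def pattern_distance(target_pattern, activations, tolerance):
    """Total distance between one target pattern and one activation vector."""
    return sum(unit_distance(t, a, tolerance)
               for t, a in zip(target_pattern, activations))


def dist_stat(samples, target_patterns, tolerance=0.5):
    """Return one value per target pattern: the proportion of samples in
    which the network was "in" that target state (total distance zero)."""
    counts = [0] * len(target_patterns)
    for activations in samples:
        for i, target in enumerate(target_patterns):
            if pattern_distance(target, activations, tolerance) == 0.0:
                counts[i] += 1
    n = len(samples)
    return [c / n for c in counts] if n else [0.0] * len(target_patterns)


if __name__ == "__main__":
    targets = [[1.0, 0.0], [0.0, 1.0]]
    # Hypothetical output activations recorded at four points in time.
    samples = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.6, 0.7]]
    print(dist_stat(samples, targets, tolerance=0.5))
```

In this made-up example the last sample matches neither target, because one unit in each comparison is on the wrong side of .5, so the two columns do not sum to 1.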

The CsTIGstat is essentially a way of aggregating the columns of data produced by the CsDistStat. It is automatically created by the dist stat's CreateAggregates function (see section 12.7 The Statistic Process) at the level of the CsSample process. Note that, unlike other aggregators, it is in the final_stats group of the sample process, and it feeds off of the aggregator of the dist stat in the loop_stats of the same process. The TIG stat measures the total information gain (also referred to as cross-entropy) between two distributions: the probability distribution of target states observed in the network (as collected by the dist stat pointed to by the dist_stat member), and the target probability distribution as specified in the probability patterns themselves (see section 15.7 The Probability Environment and Cs). This measure is zero if the two distributions are identical, and it goes up as the match gets worse. It essentially provides a distance metric over probability distributions.
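
As a rough illustration of a measure with these properties, the sketch below computes the sum over target states of p * log(p / q), where p is the target probability and q is the observed probability. That this is the exact formula used by the TIG stat is an assumption, as are the function name and the small floor `eps` used to avoid dividing by zero when a target state was never observed.

```python
import math


def total_information_gain(target_probs, observed_probs, eps=1e-10):
    """Sum of p * log(p / q) over target states.

    Assumptions: target states with p == 0 contribute nothing, and
    observed probabilities are floored at eps so the log stays finite.
    """
    tig = 0.0
    for p, q in zip(target_probs, observed_probs):
        if p <= 0.0:
            continue
        tig += p * math.log(p / max(q, eps))
    return tig


if __name__ == "__main__":
    target = [0.5, 0.5]
    print(total_information_gain(target, [0.5, 0.5]))  # identical distributions
    print(total_information_gain(target, [0.6, 0.4]))  # slight mismatch
    print(total_information_gain(target, [0.9, 0.1]))  # larger mismatch
```

The value is zero for identical distributions and grows as the observed distribution moves away from the target one, which matches the behavior described above.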

The CsTargStat, like the TIG stat, provides a way of aggregating the distribution information obtained by the dist stat. It should be created in the sample final_stats group (just like the TIG stat), with its dist_stat pointer set to the aggregator of the dist stat in the sample process loop_stats. This stat simply records the sum of the columns of probability data, which provides a measure of how often the network is settling into one of the target states, as opposed to just flailing about in other random states.
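
The sum described here is simple enough to show directly. The sketch below is a hypothetical illustration, with a made-up function name and made-up input values, of summing the per-target probabilities produced by a dist stat.

```python
def targ_stat(dist_probs):
    """Sum the per-target probabilities: the proportion of time the
    network spent in any of the target states."""
    return sum(dist_probs)


if __name__ == "__main__":
    # Hypothetical dist stat columns for two target patterns.
    print(targ_stat([0.5, 0.25]))
```

With these example values the result is 0.75. If the target states are mutually exclusive, the remaining 0.25 would be time spent in states matching none of the targets.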