11.7 Importing Environments from Text Files

To aid in converting environments from the old PDP software (`.pat' files) to the format used in PDP++, and more generally in importing training and testing data stored in plain text files, the Environment provides two functions that read and write text files: ReadText and WriteText.

The format that these functions read and write is very simple: a sequence of numbers, with an (optional) event name at the beginning or end of the line. Note that you must use the fmt parameter to specify whether a name is associated with the events. Important: the name must be a contiguous string without any whitespace, although it can be a number or contain any other ASCII characters. When reading a file, ReadText simply reads numbers sequentially for each pattern in each event, so the layout of the numbers is not critical. If the optional name is to be used, it must appear at the beginning of the line that starts a new event.

For example, in the old PDP software, the "xor.pat" file for the XOR example looks like this:

p00 0 0 0
p01 0 1 1
p10 1 0 1
p11 1 1 0

It is critical that the EventSpec and its constituent PatternSpecs (see section 11.2 Events, Patterns and their Specs) are configured in advance for the correct number of values in the pattern file. The event spec for the above example would contain two PatternSpecs. The PatternSpecs would look like:

PatternSpec[0] {
   type = INPUT;
   to_layer = FIRST;
   n_vals = 2;
};

PatternSpec[1] {
   type = TARGET;
   to_layer = LAST;
   n_vals = 1;
};

With these settings, the first two values (n_vals = 2) are read into the first (input) pattern, and the third value (n_vals = 1) is read into the last (output) pattern.
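The way the n_vals settings divide up one event's numbers can be sketched in a few lines of Python. This is only an illustration of the idea, not PDP++ code: the helper name split_into_patterns is hypothetical, and the list [2, 1] stands in for the n_vals of the two PatternSpecs above.

```python
# Hypothetical helper (not part of PDP++): divides one event's values
# among its patterns according to each PatternSpec's n_vals.
def split_into_patterns(values, n_vals_per_pattern):
    expected = sum(n_vals_per_pattern)
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    patterns = []
    start = 0
    for n_vals in n_vals_per_pattern:
        # Each pattern takes the next n_vals numbers in order.
        patterns.append(values[start:start + n_vals])
        start += n_vals
    return patterns


# The "p01 0 1 1" line from xor.pat, with n_vals = 2 (input) and n_vals = 1 (target).
input_pattern, target_pattern = split_into_patterns([0.0, 1.0, 1.0], [2, 1])
```

If the specs do not add up to the number of values supplied for an event, the sketch raises an error; this is one way to see why the specs must be configured before reading.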

The ReadText function also allows comments in the .pat files: it skips over lines beginning with # or //. Further, ReadText allows the input for an event to be split across several lines, because it keeps reading numbers until it has the right count for each pattern.
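The reading behavior described above (comment lines skipped, numbers consumed sequentially regardless of line layout) can be approximated by the following Python sketch. It is not the ReadText implementation. The function name read_events and its has_name argument are assumptions made for illustration, and the sketch handles only the case where the name comes first.

```python
# Hypothetical sketch of a ReadText-like reader (not the PDP++ implementation).
def read_events(text, n_vals_per_pattern, has_name=True):
    """Return a list of (name, patterns) tuples parsed from text."""
    total = sum(n_vals_per_pattern)

    # Collect whitespace-separated tokens, skipping blank and comment lines.
    tokens = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or stripped.startswith("//"):
            continue
        tokens.extend(stripped.split())

    events = []
    pos = 0
    while pos < len(tokens):
        name = None
        if has_name:
            # The name is taken by position, so it may itself look like a number.
            name = tokens[pos]
            pos += 1
        chunk = tokens[pos:pos + total]
        if len(chunk) < total:
            raise ValueError("file ended in the middle of an event")
        values = [float(tok) for tok in chunk]
        pos += total

        # Hand out the values to the patterns in order, n_vals at a time.
        patterns = []
        start = 0
        for n_vals in n_vals_per_pattern:
            patterns.append(values[start:start + n_vals])
            start += n_vals
        events.append((name, patterns))
    return events


sample = """\
# XOR patterns; the second event is split across two lines
p00 0 0 0
p01 0 1
1
// a comment line in the other accepted style
p10 1 0 1
p11 1 1 0
"""
events = read_events(sample, [2, 1])
```

Because the tokens are consumed in order, the event p01 is read correctly even though its target value sits on a line of its own.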

There is a special comment you can use to control the creation and organization of subgroups of events. To start a new subgroup, put the comment # startgroup before the pattern lines for the events in your subgroup. (The # endgroup comments from earlier versions are no longer necessary, because they are redundant with the startgroup comments; they will be ignored.) For example, if you wanted 2 groups of 3 events you might have a file that looked like this:

# startgroup
p01 0 0 0
p02 0 1 1
p03 0 1 0
# startgroup
p11 1 0 1
p12 1 1 0
p13 1 1 1
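As a rough illustration of how the startgroup comments partition a file, the Python sketch below collects events into a list of groups. It is not PDP++ code. The name read_event_groups is hypothetical, each event is assumed to sit on one line with its name first, and placing events that precede any startgroup into an implicit first group is an assumption of this sketch rather than documented behavior.

```python
# Hypothetical sketch (not PDP++ code): group events by "# startgroup" comments.
def read_event_groups(text):
    groups = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            words = stripped[1:].split()
            if words and words[0] == "startgroup":
                groups.append([])  # open a new subgroup
            # "# endgroup" and ordinary comments are simply skipped
            continue
        if stripped.startswith("//"):
            continue
        name, *numbers = stripped.split()
        if not groups:
            # Assumption: events before any startgroup go in an implicit group.
            groups.append([])
        groups[-1].append((name, [float(x) for x in numbers]))
    return groups


grouped_file = """\
# startgroup
p01 0 0 0
p02 0 1 1
p03 0 1 0
# startgroup
p11 1 0 1
p12 1 1 0
p13 1 1 1
"""
groups = read_event_groups(grouped_file)
```

For the example file this yields two groups of three events each.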

WriteText simply produces a file in the above format for all of the events in the environment on which it is called. This can be useful for exporting to other programs, or for converting patterns into a different type of environment, one which cannot be used with the CopyTo or CopyFrom commands. For example, if events were created originally in a TimeEnv environment, but you now want to use them in a FreqEnv frequency environment, you can use WriteText to save the events to a file and then use ReadText to read them into a FreqEnv, which will enable a frequency to be attached to them.
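The export direction can be sketched in the same spirit. The Python function below, write_events, is a hypothetical stand-in for WriteText: it emits one line per event with the name first, and its number formatting is an assumption that may differ from what PDP++ actually writes.

```python
# Hypothetical sketch of a WriteText-like exporter (not the PDP++ implementation).
def write_events(events):
    """Format (name, patterns) tuples as one text line per event."""
    lines = []
    for name, patterns in events:
        fields = [name]
        for pattern in patterns:
            # "g" formatting prints 1.0 as "1" and 0.5 as "0.5".
            fields.extend(f"{value:g}" for value in pattern)
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


xor_events = [
    ("p00", [[0.0, 0.0], [0.0]]),
    ("p01", [[0.0, 1.0], [1.0]]),
    ("p10", [[1.0, 0.0], [1.0]]),
    ("p11", [[1.0, 1.0], [0.0]]),
]
text = write_events(xor_events)
```

The resulting text has the same layout as the xor.pat example, so it could be read back into an environment whose specs call for two input values and one target value.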

For Environments that are more complicated than a simple list of events, it is possible to use CSS to import text files of these events. Example code for reading events structured into subgroups is included in the distribution as `css/include/read_event_gps.css', and can be used as a starting point for reading a variety of formats. The key function that makes writing this kind of code in CSS easy is ReadLine, which reads one line of data from a file and puts it into an array of strings, which can then be manipulated, converted into numbers, etc. This is much like the `awk' utility.
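The line-at-a-time style described above can be illustrated in Python. Note that this is not CSS and does not show the actual ReadLine signature: the function read_line_fields is a hypothetical analogue that returns the whitespace-separated fields of the next line, which the caller then converts as needed.

```python
import io


# Hypothetical analogue of a ReadLine-style helper (Python, not CSS).
def read_line_fields(stream):
    """Read one line and return its fields as strings, or None at end of file."""
    line = stream.readline()
    if line == "":
        return None  # end of file
    return line.split()


data = io.StringIO("p01 0 1 1\n\np10 1 0 1\n")
rows = []
while True:
    fields = read_line_fields(data)
    if fields is None:
        break
    if not fields:
        continue  # blank line
    # First field is the event name; the rest are converted to numbers.
    rows.append((fields[0], [float(f) for f in fields[1:]]))
```

As with awk, each line arrives as a list of strings, and it is up to the script to decide which fields are names, which are numbers, and how they map onto events.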

The read_event_gps.css example assumes that it will be read into a Script object in a project, with three s_args values that control the parameters of the expected format. Note that these parameters could instead be placed at the top of the data file and read in from there at the start.