Class: SuffixArray
In: ext/sarray/suffix_array.c
Parent: Object
Given a string (or anything string-like), this generates a suffix array for it so that you can work with it. The source cannot be an empty string, since building a suffix array for one would be pointless.
Returns an array containing all the indexes into the source that start with the given character. This is fast because the SuffixArray already records internally where each character's block of suffixes begins and ends in the suffix array; the call only copies that range of the suffix array.
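The "copy one range" idea can be sketched in plain Ruby. This is only an illustration, not the extension's code: `naive_suffix_array` and `starts_for_char` are hypothetical helpers, the suffix array is built by simple sorting, and returning an empty array for a character that never occurs is an assumption.

```ruby
# Naive stand-in for the suffix array (not the C extension's algorithm).
def naive_suffix_array(source)
  (0...source.length).sort_by { |i| source[i..-1] }
end

# Hypothetical helper: suffixes that start with the same character sit next
# to each other in the suffix array, so the answer is one contiguous slice.
def starts_for_char(source, suffix_array, char)
  first = suffix_array.index { |i| source[i] == char }
  return [] if first.nil?                      # assumption: absent char gives []
  last = suffix_array.rindex { |i| source[i] == char }
  suffix_array[first..last]
end

source = "abracadabra"
sa = naive_suffix_array(source)
a_starts = starts_for_char(source, sa, "a")  # => [10, 7, 0, 3, 5]
b_starts = starts_for_char(source, sa, "b")  # => [8, 1]
z_starts = starts_for_char(source, sa, "z")  # => []
```

The sketch searches for the range boundaries on every call; the text above suggests the real object already knows them, which is why it only needs the final slice.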
Returns a copy of the internal suffix array as an Array of Fixnum objects. This array is a copy so you’re free to mangle it however you wish.
A suffix array is the sequence of indices into the source giving the starting position of each suffix, listed in the order the suffixes would have if they were sorted.
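To make the definition concrete, here is a minimal pure-Ruby sketch that builds a suffix array by sorting every suffix. It is far slower than a real implementation and is not how the C extension works; `naive_suffix_array` is a hypothetical name used only for illustration.

```ruby
# Naive construction: sort the start positions by the suffix that begins there.
def naive_suffix_array(source)
  raise ArgumentError, "source must not be empty" if source.empty?
  (0...source.length).sort_by { |i| source[i..-1] }
end

source = "abracadabra"
sa = naive_suffix_array(source)
# => [10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]

# Reading the suffixes back in suffix-array order gives them sorted.
sorted_suffixes = sa.map { |i| source[i..-1] }
# => ["a", "abra", "abracadabra", "acadabra", "adabra", "bra",
#     "bracadabra", "cadabra", "dabra", "ra", "racadabra"]
```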
Takes a target string and an index into that string, then looks for the longest run of the target, beginning at that index, that also occurs in the source string of this SuffixArray object.
It returns an array of [start, length]: start is where the match begins in the source, and length is how many characters of the target, counted from the given index, match there.
Refer to the unit test for examples of usage.
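As a rough picture of what the [start, length] result means, the following brute-force sketch tries every position in the source. It does not use a suffix array at all, so it only illustrates the result, not the method's speed. `naive_longest_match` is a hypothetical name, and both the `[nil, 0]` result for "no match" and the choice of the earliest source position on ties are assumptions.

```ruby
# Brute-force illustration: compare the target (from from_index onward)
# against every starting position in the source and keep the longest run.
def naive_longest_match(source, target, from_index)
  best_start = nil   # assumption: nil start when nothing matches
  best_length = 0
  (0...source.length).each do |start|
    length = 0
    while start + length < source.length &&
          from_index + length < target.length &&
          source[start + length] == target[from_index + length]
      length += 1
    end
    best_start, best_length = start, length if length > best_length
  end
  [best_start, best_length]
end

source = "abracadabra"
match = naive_longest_match(source, "xxcadabxx", 2)  # => [4, 5]
matched_text = source[match[0], match[1]]            # => "cadab"
no_match = naive_longest_match(source, "zzz", 0)     # => [nil, 0]
```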
Mostly the inverse of longest_match, except that it first tries to find a non-matching region, then a matching region. The target and from_index are the same as in longest_match. The min_match argument is the smallest matching region that you’ll accept as significant enough to end the non-matching search. Giving min_match=0 will stop at the first matching region.
It works by first searching the suffix array for a non-matching region. When it hits a character that is in the source (according to the suffix array) it tries to find a matching region. If it can find a matching region that is longer than min_match then it stops and returns, otherwise it adds this match to the length of the non-matching region and continues.
The return value is an Array of [non_match_length, match_start, match_length].
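The scan described above can be sketched in plain Ruby as follows. This is an illustration under assumptions, not the extension's implementation: both helper names are hypothetical, matching is done by brute force rather than through the suffix array, and returning `[non_match_length, nil, 0]` when the target runs out before a long enough match appears is a guess.

```ruby
# Brute-force longest match, used here in place of a suffix array lookup.
def naive_longest_match(source, target, from_index)
  best_start = nil
  best_length = 0
  (0...source.length).each do |start|
    length = 0
    while start + length < source.length &&
          from_index + length < target.length &&
          source[start + length] == target[from_index + length]
      length += 1
    end
    best_start, best_length = start, length if length > best_length
  end
  [best_start, best_length]
end

def naive_longest_nonmatch(source, target, from_index, min_match)
  non_match_length = 0
  pos = from_index
  while pos < target.length
    start, length = naive_longest_match(source, target, pos)
    # A match longer than min_match ends the non-matching region.
    return [non_match_length, start, length] if length > min_match
    # Otherwise absorb the short match (or the single unmatched character).
    skipped = [length, 1].max
    non_match_length += skipped
    pos += skipped
  end
  [non_match_length, nil, 0]  # assumption: target exhausted without a match
end

source = "abracadabra"
target = "xxabzzcadab"
strict = naive_longest_nonmatch(source, target, 0, 2)  # => [6, 4, 5]
eager  = naive_longest_nonmatch(source, target, 0, 0)  # => [2, 0, 2]
```

With min_match set to 2, the two-character match "ab" is too short and gets folded into the non-matching region, so the scan continues until it reaches "cadab". With min_match set to 0, the same "ab" ends the scan immediately.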
Tells you which index in the suffix array is the longest suffix (also known as the start of the source string). If you want to get the beginning of the source string in a roundabout way, you would do this:
    source = "abracadabra"
    sa = SuffixArray.new source
    # sa.start stands for the method described here (name assumed from the text)
    first = source[sa.array[sa.start]]
Remember that the start is the index into the suffix array where the source starts, not an index into the source string (that would just be 0).
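The difference between the two kinds of index is easy to see with a naively built suffix array. The sketch below does not call the extension; it rebuilds the array by sorting and then locates the entry whose value is 0, which is the slot holding the whole source string. The variable names are illustrative only.

```ruby
source = "abracadabra"

# Naive suffix array (sorting every suffix), standing in for the real one.
sa = (0...source.length).sort_by { |i| source[i..-1] }
# => [10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]

# The longest suffix is the whole string, which begins at source index 0.
# Its position inside the suffix array is what this section calls the start.
start_slot = sa.index(0)          # => 2 (an index into sa, not into source)
source_index = sa[start_slot]     # => 0 (an index into source)
first_char = source[source_index] # => "a"
```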