java.lang.Object
  |
  +--grace.util.Tokenizer
Performs data scanning, gathering, and conversion functions on text. This class provides functions to read various built-in object types such as Strings, Dates, and integers, as well as user-defined object types. Most of the functionality uses regular expressions to locate and delimit the text.
Objects of this class are stateful in that they maintain a current position. As data is parsed, this current position is moved forwards or backwards. Operations that get data from the source typically set the current position in the source at the end of the data returned.
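The stateful behavior described above can be pictured with a small stand-in class. This is only a sketch built on java.util.regex rather than gnu.regexp; the class name CursorSketch and its methods are hypothetical and are not part of grace.util.Tokenizer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in that mimics a scanner with a current position.
// It is not the grace.util.Tokenizer implementation.
class CursorSketch {
    private final String source;
    private int position = 0;

    CursorSketch(String source) {
        this.source = source;
    }

    int getPosition() {
        return position;
    }

    // Finds the first match at or after the current position, returns it,
    // and leaves the current position just past the end of the match.
    String getMatch(String regex) {
        Matcher m = Pattern.compile(regex).matcher(source);
        if (!m.find(position)) {
            return null; // no match: the position is left unchanged
        }
        position = m.end();
        return m.group();
    }

    public static void main(String[] args) {
        CursorSketch s = new CursorSketch("some text\nmore text\nnumCards=52");
        System.out.println(s.getMatch("text")); // the "text" on the first line
        System.out.println(s.getPosition());
        System.out.println(s.getMatch("text")); // the "text" on the second line
        System.out.println(s.getPosition());
    }
}
```

Because each call starts searching from where the previous one stopped, the same expression returns successive occurrences rather than the first one repeatedly.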
Synopsis:
    Tokenizer tokenizer = new Tokenizer("some text\nmore text\nnumCards=52");
    int numCards = tokenizer.getInt("numCards=(\\d+)", "$1");
Notes:
If this class is used for scraping screens, the screen text should contain any newlines that are meaningful to the format. In other words, newlines should not be stripped out. This makes the screen-scraping code more maintainable because it keeps the parsing code independent of the length of the lines in the text. If the text contains newlines and the line length changes someday, the parsing code should not need to change.
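When the source arrives without the newlines discussed above (for example, a screen buffer delivered as one long string), injectNewlines(lineLength) is described as injecting them at a periodic position. A minimal sketch of that idea follows. The helper name and the choice to insert a newline after every lineLength characters are assumptions, and the actual method may behave differently at the boundaries.

```java
// Hypothetical helper illustrating periodic newline injection.
// It is not the grace.util.Tokenizer implementation.
class InjectNewlinesSketch {
    // Returns a copy of text with '\n' inserted after every lineLength characters.
    // Assumption: no newline is appended after the final (possibly short) row.
    static String injectNewlines(String text, int lineLength) {
        if (lineLength <= 0) {
            throw new IllegalArgumentException("lineLength must be positive");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i += lineLength) {
            int end = Math.min(i + lineLength, text.length());
            out.append(text, i, end);
            if (end < text.length()) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A 2-row "screen" of 10 columns delivered as one 20-character string.
        String screen = "NAME: ANN " + "CARDS: 52 ";
        System.out.println(injectNewlines(screen, 10));
    }
}
```

With the row boundaries restored, expressions can be written against lines rather than against absolute character offsets.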
Constructor Summary | |
Tokenizer(java.lang.String source)
|
Method Summary | |
void |
advance(int numCharacters)
Move the current position forward the given number of characters. |
void |
advance(gnu.regexp.RE expression)
Advances the current position in the source to the start of the match of the given expression. |
void |
advance(gnu.regexp.RE expression,
int positionAtSubExpressionNumber)
Advances the current position in the source to the start of the match of the given numbered subexpression. |
void |
advance(java.lang.String regularExpression)
Advances the current position in the source to the start of the match of the given expression. |
void |
advance(java.lang.String expression,
int positionAtSubExpressionNumber)
Advances the current position in the source to the start of the match of the given numbered subexpression. |
java.lang.Object |
clone()
Returns a copy of this Tokenizer. The copy lets the caller capture the current state so that this Tokenizer can continue parsing without affecting the copy. |
java.lang.String |
find(gnu.regexp.RE regularExpression)
Returns the first string in the input that matches the given regular expression, or null if there is none. |
java.lang.String |
find(java.lang.String regularExpression)
Returns the first string in the input that matches the given regular expression, or null if there is none. |
java.lang.String |
findAndSubstitute(gnu.regexp.RE regularExpression,
java.lang.String substituteString)
Substitutes the first match of the given regularExpression into the given substituteString and returns the result. |
java.lang.String |
findAndSubstitute(java.lang.String regularExpression,
java.lang.String substituteString)
Substitutes the first match of the given regularExpression into the given substituteString and returns the result. |
java.lang.String |
get(gnu.regexp.RE expression)
Returns the text matching the given regular expression, which is assumed to start at the current position in the source, or null if at the end of the source. |
java.lang.String |
get(gnu.regexp.RE expression,
int maxOffset)
Returns the next match not more than maxOffset characters from the current position in the input stream, or null if there are no more tokens. |
java.lang.String |
get(java.lang.String regularExpression)
Returns the text matching the given regular expression, which is assumed to start at the current position in the source, or null if at the end of the source. |
java.lang.String |
get(java.lang.String regularExpression,
int maxOffset)
Returns the next match not more than maxOffset characters from the current position in the input stream, or null if there are no more tokens. |
java.util.Date |
getDate(java.text.DateFormat format)
Parses and returns the next token as a Date parsed by the given format. |
java.util.Date |
getDate(java.lang.String simpleDateFormat)
Parses and returns the next token as a Date parsed by a SimpleDateFormat object created with the given simpleDateFormat string. |
int |
getInt()
Parses and returns the next token (skipping white space) as an integer. |
int |
getInt(int maxNumDigits)
Parses and returns the next token (skipping white space) as an integer of the given maximum number of digits. |
int |
getInt(gnu.regexp.RE expression,
java.lang.String substitution)
Matches the given regular expression, substitutes the match into the given substitution string, parses the result as an integer, and returns it. |
int |
getInt(java.lang.String expression,
java.lang.String substitution)
Matches the given regular expression, substitutes the match into the given substitution string, parses the result as an integer, and returns it. |
java.lang.String |
getLine()
Returns a line of text (delimited by newline) from input without the newline character in the result. |
protected java.lang.String |
getMatch(gnu.regexp.RE expression)
Utility function that returns the first match in the source of the given regular expression and sets the current position to the end of the first match. |
int |
getNextInt()
Parses and returns the next integer in the input (skipping white space and any other non-digit characters). |
int |
getPosition()
Returns the index of the current position in the source. |
java.util.Date |
getPrefixedDate(gnu.regexp.RE regularExpression,
java.text.DateFormat format)
Parses and returns the token that follows the match of the given regular expression as a Date, using the given format object. |
java.util.Date |
getPrefixedDate(gnu.regexp.RE regularExpression,
java.lang.String simpleDateFormat)
Parses and returns the token that follows the match of the given regular expression as a Date, using a SimpleDateFormat object created with the given simpleDateFormat string. |
java.util.Date |
getPrefixedDate(java.lang.String regularExpression,
java.text.DateFormat format)
Parses and returns the token that follows the match of the given regular expression as a Date, using the given format object. |
java.util.Date |
getPrefixedDate(java.lang.String regularExpression,
java.lang.String simpleDateFormat)
Parses and returns the token that follows the match of the given regular expression as a Date, using a SimpleDateFormat object created with the given simpleDateFormat string. |
int |
getPrefixedInt(java.lang.String tagRegularExpression)
Parses and returns the integer token after the matching regular expression. |
java.lang.String |
getSource()
Returns the entire source. |
protected java.lang.String |
getSubstitutedMatch(gnu.regexp.RE expression,
java.lang.String substitutionString)
Utility function that finds the first match of the given expression in the source, substitutes the found match into the given substitution string, and returns the result. |
java.util.Date |
getTime(java.text.DateFormat timeFormat,
java.util.Date date)
Parses and returns the next token as a Date parsed by the given time format but using the year, month, and date of the given date. |
java.util.Date |
getTime(java.lang.String timeFormat,
java.util.Date date)
Parses and returns the next token as a Date parsed by the given time format but using the year, month, and date of the given date. |
java.lang.String |
getToken()
Returns the next white space delimited token in the input stream or null if there are no more tokens. |
java.lang.String |
getToken(int maxNumWhiteSpaceChars)
Returns the next white space delimited token not more than maxNumWhiteSpaceChars characters from the current position in the input stream, or null if there are no more tokens. |
void |
injectNewlines(int lineLength)
Useful if the source text should contain newlines but doesn't. |
boolean |
isAt(gnu.regexp.RE expression)
Indicates whether the given regular expression matches at the current position. |
boolean |
isAt(java.lang.String regularExpression)
Indicates whether the given regular expression matches at the current position. |
static void |
main(java.lang.String[] args)
Test program not quite completed. |
void |
printTo(PrintWriter writer)
Used by grace.io.PrintWriter to nicely print this. |
void |
retreat(int numCharacters)
Move the current position backward the given number of characters. |
void |
retreat(gnu.regexp.RE expression)
Retreats the current position in the source to the start of the match of the given expression. |
void |
retreat(java.lang.String regularExpression)
Retreats the current position in the source to the start of the match of the given expression. |
void |
setPosition(int absolute)
Sets the index of the current position in the source. |
void |
skipWhiteSpace()
Moves the current position to the next character in the source that is not white space, as determined by java.lang.Character.isWhitespace(). |
protected java.lang.String |
toPrintable(java.lang.String notPrintable)
Utility function that converts a string with embedded non-printable characters, such as newlines and tabs, into a string that may be cleanly printed. |
java.lang.String |
toString()
|
void |
undoLast()
Moves the current position back to where it was before the previous function was called. |
Methods inherited from class java.lang.Object |
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
public Tokenizer(java.lang.String source)
Method Detail |
public void injectNewlines(int lineLength)
lineLength - periodic position in source at which newlines should be injected
public java.lang.Object clone()
public void skipWhiteSpace()
protected java.lang.String getMatch(gnu.regexp.RE expression)
expression - to match, return, and position after
protected java.lang.String getSubstitutedMatch(gnu.regexp.RE expression, java.lang.String substitutionString)
The current position is set after the last matched character, not at the end of the matched sub expression in the substitution, if any exists.
expression - to match
substitutionString - into which the matched expression in source is substituted
public void advance(int numCharacters)
numCharacters - to advance in the source
public void advance(java.lang.String regularExpression) throws gnu.regexp.REException
regularExpression - to position at start of match
public void advance(gnu.regexp.RE expression)
expression - to position at start of match
public void advance(java.lang.String expression, int positionAtSubExpressionNumber) throws gnu.regexp.REException
expression - to match and position at sub expression
positionAtSubExpressionNumber - number of the sub expression in expression at which to place the current position
public void advance(gnu.regexp.RE expression, int positionAtSubExpressionNumber)
expression - to match and position at sub expression
positionAtSubExpressionNumber - number of the sub expression in expression at which to place the current position
public java.lang.String find(java.lang.String regularExpression) throws gnu.regexp.REException
regularExpression - to match in the input
public java.lang.String find(gnu.regexp.RE regularExpression)
regularExpression - to match in the input
public java.lang.String findAndSubstitute(java.lang.String regularExpression, java.lang.String substituteString) throws gnu.regexp.REException
This function uses the gnu.regexp.REMatch.substituteInto() function. Therefore, the substitute string can contain plain text or the special symbols $0-$9. $0 represents the entire matched string, and $1-$9 represent the first through the ninth matched sub expressions respectively.
regularExpression - to match one time
substituteString - into which match is substituted
See also: gnu.regexp.RE.getMatch(Object), gnu.regexp.REMatch.substituteInto(String)
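The $0-$9 substitution described for findAndSubstitute can be illustrated without gnu.regexp. The following is a minimal sketch using java.util.regex; the class name SubstituteSketch, the static helper, and the decision to return null when nothing matches are assumptions made for illustration, not the library's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "find the first match, then substitute it into a template".
class SubstituteSketch {
    // Replaces $0-$9 in template with the corresponding group of the first match.
    // Assumption: null is returned when the expression does not match.
    static String findAndSubstitute(String source, String regex, String template) {
        Matcher m = Pattern.compile(regex).matcher(source);
        if (!m.find()) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            char next = i + 1 < template.length() ? template.charAt(i + 1) : '\0';
            if (c == '$' && next >= '0' && next <= '9') {
                int group = next - '0';
                // Groups that do not exist or did not participate contribute nothing.
                if (group <= m.groupCount() && m.group(group) != null) {
                    out.append(m.group(group));
                }
                i++; // skip the digit that followed '$'
            } else {
                out.append(c); // plain text is copied through unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "some text\nmore text\nnumCards=52";
        System.out.println(findAndSubstitute(source, "numCards=(\\d+)", "$1"));
        System.out.println(findAndSubstitute(source, "(\\w+)=(\\d+)", "$2 $1 ($0)"));
    }
}
```

The template lets the caller discard the surrounding tag text and keep only the captured value, which is the pattern the Synopsis relies on.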
public java.lang.String findAndSubstitute(gnu.regexp.RE regularExpression, java.lang.String substituteString)
This function uses the gnu.regexp.REMatch.substituteInto() function. Therefore, the substitute string can contain plain text or the special symbols $0-$9. $0 represents the entire matched string, and $1-$9 represent the first through the ninth matched sub expressions respectively.
regularExpression - to match one time
substituteString - into which match is substituted
See also: gnu.regexp.RE.getMatch(Object), gnu.regexp.REMatch.substituteInto(String)
public void retreat(int numCharacters)
numCharacters - to retreat in the source
public void retreat(java.lang.String regularExpression) throws gnu.regexp.REException
regularExpression - to position at start of match
public void retreat(gnu.regexp.RE expression)
expression - to position at start of match
public java.lang.String get(java.lang.String regularExpression) throws gnu.regexp.REException
regularExpression - to match starting at current position
public java.lang.String get(gnu.regexp.RE expression)
expression - to match starting at current position
public java.lang.String get(java.lang.String regularExpression, int maxOffset) throws gnu.regexp.REException
Returns the next match not more than maxOffset characters from the current position in the input stream, or null if there are no more tokens. This function is good for finding optional tokens that appear in the source at or near a fixed position. The current position is set to the next character after the end of the found string in the source.
regularExpression - to match starting at current position
maxOffset - maximum number of characters from the current position at which the match will succeed
public java.lang.String get(gnu.regexp.RE expression, int maxOffset)
Returns the next match not more than maxOffset characters from the current position in the input stream, or null if there are no more tokens. This function is good for finding optional tokens that appear in the source at or near a fixed position. The current position is set to the next character after the end of the found string in the source.
expression - to match starting at current position
maxOffset - maximum number of characters from the current position at which the match will succeed
public java.lang.String getToken()
public java.lang.String getToken(int maxNumWhiteSpaceChars)
Returns the next white space delimited token not more than maxNumWhiteSpaceChars characters from the current position in the input stream, or null if there are no more tokens. This function is good for finding tokens that optionally appear in the source at or near a fixed position. The current position is set to the next character after the end of the token in the source.
maxNumWhiteSpaceChars - maximum number of white space characters allowed between the current position and the start of the token
public java.lang.String getLine()
public int getInt() throws java.lang.NumberFormatException
public int getInt(int maxNumDigits) throws java.lang.NumberFormatException
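A digit limit is mainly useful for fixed-width fields that run together without separators. The sketch below shows one way getInt(int maxNumDigits) could behave according to its summary; the class FixedWidthIntSketch is hypothetical, and the handling of signs and of missing digits is an assumption.

```java
// Hypothetical sketch of reading a width-limited integer at the current position.
// It is not the grace.util.Tokenizer implementation.
class FixedWidthIntSketch {
    private final String source;
    private int position = 0;

    FixedWidthIntSketch(String source) {
        this.source = source;
    }

    int getPosition() {
        return position;
    }

    // Skips white space, then parses at most maxNumDigits decimal digits.
    // Assumption: signs are not handled, and finding no digit at all
    // surfaces as the NumberFormatException thrown by Integer.parseInt("").
    int getInt(int maxNumDigits) {
        while (position < source.length()
                && Character.isWhitespace(source.charAt(position))) {
            position++;
        }
        int end = position;
        while (end < source.length()
                && end - position < maxNumDigits
                && source.charAt(end) >= '0'
                && source.charAt(end) <= '9') {
            end++;
        }
        int value = Integer.parseInt(source.substring(position, end));
        position = end; // leave the position just past the digits consumed
        return value;
    }

    public static void main(String[] args) {
        // "20240131" holds a year, a month, and a day with no separators.
        FixedWidthIntSketch s = new FixedWidthIntSketch("  20240131");
        System.out.println(s.getInt(4));
        System.out.println(s.getInt(2));
        System.out.println(s.getInt(2));
    }
}
```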
public int getInt(java.lang.String expression, java.lang.String substitution) throws java.lang.NumberFormatException, gnu.regexp.REException
This function uses the gnu.regexp.REMatch.substituteInto() function. Therefore, the substitute string can contain plain text or the special symbols $0-$9. $0 represents the entire matched string, and $1-$9 represent the first through the ninth matched sub expressions respectively.
expression - to match in source
substitution - into which matched expression is substituted
public int getInt(gnu.regexp.RE expression, java.lang.String substitution) throws java.lang.NumberFormatException
This function uses the gnu.regexp.REMatch.substituteInto() function. Therefore, the substitute string can contain plain text or the special symbols $0-$9. $0 represents the entire matched string, and $1-$9 represent the first through the ninth matched sub expressions respectively.
expression - to match in source
substitution - into which matched expression is substituted
public int getPrefixedInt(java.lang.String tagRegularExpression) throws java.lang.NumberFormatException, gnu.regexp.REException
public int getNextInt() throws java.text.ParseException
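The summary for getNextInt says it skips non-digit characters and white space before parsing. A minimal sketch of that behavior with java.util.regex follows; the class NextIntSketch is hypothetical, and using ParseException to signal that no integer remains is an assumption based only on the declared throws clause.

```java
import java.text.ParseException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "skip to the next run of digits and parse it".
class NextIntSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    private final String source;
    private int position = 0;

    NextIntSketch(String source) {
        this.source = source;
    }

    // Searches forward from the current position for a run of decimal digits,
    // parses it, and leaves the position just past that run.
    // Assumption: a ParseException reports that no integer remains.
    int getNextInt() throws ParseException {
        Matcher m = DIGITS.matcher(source);
        if (!m.find(position)) {
            throw new ParseException("no integer found", position);
        }
        position = m.end();
        return Integer.parseInt(m.group());
    }

    public static void main(String[] args) throws ParseException {
        NextIntSketch s = new NextIntSketch("Dealt 5 of 52 cards");
        System.out.println(s.getNextInt());
        System.out.println(s.getNextInt());
    }
}
```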
public java.util.Date getDate(java.lang.String simpleDateFormat) throws java.text.ParseException
simpleDateFormat - to create a SimpleDateFormat to parse the date
public java.util.Date getDate(java.text.DateFormat format) throws java.text.ParseException
format - to parse the date
public java.util.Date getTime(java.lang.String timeFormat, java.util.Date date) throws java.text.ParseException
timeFormat - to parse the time
date - supplies the year, month, and day of the result
public java.util.Date getTime(java.text.DateFormat timeFormat, java.util.Date date) throws java.text.ParseException
timeFormat - to parse the time
date - supplies the year, month, and day of the result
public java.util.Date getPrefixedDate(java.lang.String regularExpression, java.lang.String simpleDateFormat) throws java.text.ParseException, gnu.regexp.REException
regularExpression - to match before parsing the date
simpleDateFormat - to use to parse the date
public java.util.Date getPrefixedDate(gnu.regexp.RE regularExpression, java.lang.String simpleDateFormat) throws java.text.ParseException
regularExpression - to match before parsing the date
simpleDateFormat - to use to parse the date
public java.util.Date getPrefixedDate(java.lang.String regularExpression, java.text.DateFormat format) throws java.text.ParseException, gnu.regexp.REException
regularExpression - to match before parsing the date
format - to use to parse the date
public java.util.Date getPrefixedDate(gnu.regexp.RE regularExpression, java.text.DateFormat format) throws java.text.ParseException
regularExpression - to match before parsing the date
format - to use to parse the date
public int getPosition()
public void setPosition(int absolute)
absolute - index to set
public java.lang.String getSource()
public void undoLast()
This only works once. In other words, currently only one undo operation is kept.
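The one-deep undo described for undoLast can be sketched as a single saved position. The class UndoSketch below is hypothetical and tracks only a position; it assumes that every position-moving operation records the prior position before it runs, which the original does not spell out.

```java
// Hypothetical sketch of a one-deep undo for a position-based scanner.
// It is not the grace.util.Tokenizer implementation.
class UndoSketch {
    private int position = 0;
    private int previousPosition = 0;

    int getPosition() {
        return position;
    }

    // Every operation that moves the position first records where it was.
    void advance(int numCharacters) {
        previousPosition = position;
        position += numCharacters;
    }

    // Restores the recorded position. Only one position is kept, so a second
    // call in a row has nothing older to restore and changes nothing.
    void undoLast() {
        position = previousPosition;
    }

    public static void main(String[] args) {
        UndoSketch s = new UndoSketch();
        s.advance(4);
        s.advance(6);
        System.out.println(s.getPosition()); // after both moves
        s.undoLast();
        System.out.println(s.getPosition()); // second move undone
        s.undoLast();
        System.out.println(s.getPosition()); // first move cannot be undone
    }
}
```

Callers that need to back out of several operations would have to save positions themselves, for example with getPosition() and setPosition(int), or by keeping a clone().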
public boolean isAt(java.lang.String regularExpression) throws gnu.regexp.REException
regularExpression - to match at current position
public boolean isAt(gnu.regexp.RE expression)
expression - to match at current position
protected java.lang.String toPrintable(java.lang.String notPrintable)
notPrintable - string presumably containing tabs and newlines
public void printTo(PrintWriter writer)
public java.lang.String toString()
public static void main(java.lang.String[] args)