com.ibm.icu.text

Class RuleBasedBreakIterator_Old.Builder

Enclosing Class:
RuleBasedBreakIterator_Old
Known Direct Subclasses:
DictionaryBasedBreakIterator.Builder

protected class RuleBasedBreakIterator_Old.Builder
extends Object

The Builder class has the job of constructing a RuleBasedBreakIterator_Old from a textual description. A Builder is constructed by RuleBasedBreakIterator_Old's constructor, which uses it to construct the iterator itself and then throws it away.

The construction logic is separated out into its own class for two primary reasons:

It'd be really nice if this could be an independent class rather than an inner class, because that would shorten the source file considerably, but making Builder an inner class of RuleBasedBreakIterator_Old allows it direct access to RuleBasedBreakIterator_Old's private members, which saves us from having to provide some kind of "back door" to the Builder class that could then also be used by other classes.

Field Summary

protected static int
ALL_FLAGS
A bit mask representing the union of the other flag mask values (END_STATE_FLAG, DONT_LOOP_FLAG, and LOOKAHEAD_STATE_FLAG).
protected static int
DONT_LOOP_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as one from which the builder shouldn't loop to any looping states.
protected static int
END_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as an accepting state.
protected static int
LOOKAHEAD_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as a lookahead state.
protected Vector
categories
A temporary holding place used for calculating the character categories.
protected boolean
clearLoopingStates
A flag that is used to indicate when the list of looping states can be reset.
protected Vector
decisionPointList
A list of all the states that have to be filled in with transitions to the next state that is created.
protected Stack
decisionPointStack
A stack for holding decision point lists.
protected Hashtable
expressions
A table used to map parts of regexp text to lists of character categories, rather than having to figure them out from scratch each time.
protected UnicodeSet
ignoreChars
A temporary holding place for the list of ignore characters.
protected Vector
loopingStates
A list of states that loop back on themselves.
protected Vector
mergeList
A list mapping pairs of state numbers for states that are to be combined to the state number of the state representing their combination.
protected Vector
statesToBackfill
Looping states actually have to be backfilled later in the process than everything else.
protected Vector
tempStateTable
A temporary holding place where the forward state table is built.

Constructor Summary

Builder()
No special construction is required for the Builder.

Method Summary

void
buildBreakIterator()
This is the main function for setting up the BreakIterator's tables.
protected void
buildCharCategories(Vector tempRuleList)
This function builds the character category table.
protected void
debugPrintTempStateTable()
protected void
debugPrintVector(String label, Vector v)
protected void
debugPrintVectorOfVectors(String label1, String label2, Vector v)
protected void
error(String message, int position, String context)
Throws an IllegalArgumentException representing a syntax error in the rule description.
protected void
handleSpecialSubstitution(String replace, String replaceWith, int startPos, String description)
This function defines a protocol for handling substitution names that are "special," i.e., that have some property beyond just being substitutions.
protected void
mungeExpressionList(Hashtable expressions)
protected String
processSubstitution(String substitutionRule, String description, int startPos)
This function performs variable-name substitutions.

Field Details

ALL_FLAGS

protected static final int ALL_FLAGS
A bit mask representing the union of the other flag mask values (END_STATE_FLAG, DONT_LOOP_FLAG, and LOOKAHEAD_STATE_FLAG). Used for clearing or masking off the flag bits.
Field Value:
57344

DONT_LOOP_FLAG

protected static final int DONT_LOOP_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as one from which the builder shouldn't loop to any looping states.
Field Value:
16384

END_STATE_FLAG

protected static final int END_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as an accepting state.
Field Value:
32768

LOOKAHEAD_STATE_FLAG

protected static final int LOOKAHEAD_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as a lookahead state.
Field Value:
8192
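
The relationship between the four flag constants can be sketched as follows. This is an illustration only: the numeric values are copied from the Field Value entries in this section, but the class name, the helper methods, and the idea that a flags cell also carries low-order non-flag bits are assumptions made for the example, not part of the documented API.

```java
// Hypothetical stand-in for the flag constants; not the ICU class itself.
final class StateFlagsSketch {
    static final int END_STATE_FLAG = 0x8000;       // 32768
    static final int DONT_LOOP_FLAG = 0x4000;       // 16384
    static final int LOOKAHEAD_STATE_FLAG = 0x2000; // 8192

    // Union of the three masks: 32768 + 16384 + 8192 = 57344 (0xE000).
    static final int ALL_FLAGS =
            END_STATE_FLAG | DONT_LOOP_FLAG | LOOKAHEAD_STATE_FLAG;

    // True if the accepting-state bit is set in the given cell.
    static boolean isEndState(int cell) {
        return (cell & END_STATE_FLAG) != 0;
    }

    // Clears every flag bit, leaving only the non-flag bits.
    static int clearFlags(int cell) {
        return cell & ~ALL_FLAGS;
    }

    // Masks off everything except the flag bits.
    static int flagsOnly(int cell) {
        return cell & ALL_FLAGS;
    }

    public static void main(String[] args) {
        // Assumed layout: low bits hold some other value (here 5).
        int cell = 5 | END_STATE_FLAG | LOOKAHEAD_STATE_FLAG;
        System.out.println(ALL_FLAGS);        // 57344
        System.out.println(isEndState(cell)); // true
        System.out.println(clearFlags(cell)); // 5
    }
}
```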

categories

protected Vector categories
A temporary holding place used for calculating the character categories. This object contains UnicodeSet objects.

clearLoopingStates

protected boolean clearLoopingStates
A flag that is used to indicate when the list of looping states can be reset.

decisionPointList

protected Vector decisionPointList
A list of all the states that have to be filled in with transitions to the next state that is created. Used when building the state table from the regular expressions.

decisionPointStack

protected Stack decisionPointStack
A stack for holding decision point lists. This is used to handle nested parentheses and braces in regexps.

expressions

protected Hashtable expressions
A table used to map parts of regexp text to lists of character categories, rather than having to figure them out from scratch each time.

ignoreChars

protected UnicodeSet ignoreChars
A temporary holding place for the list of ignore characters.

loopingStates

protected Vector loopingStates
A list of states that loop back on themselves. Used to handle .*?

mergeList

protected Vector mergeList
A list mapping pairs of state numbers for states that are to be combined to the state number of the state representing their combination. Used in the process of making the state table deterministic to prevent infinite recursion.

statesToBackfill

protected Vector statesToBackfill
Looping states actually have to be backfilled later in the process than everything else. This is where the list of states to backfill is accumulated. This is also used to handle .*?

tempStateTable

protected Vector tempStateTable
A temporary holding place where the forward state table is built.

Constructor Details

Builder

public Builder()
No special construction is required for the Builder.

Method Details

buildBreakIterator

public void buildBreakIterator()
This is the main function for setting up the BreakIterator's tables. It simply hands the different parts of the job off to other functions.

buildCharCategories

protected void buildCharCategories(Vector tempRuleList)
This function builds the character category table. On entry, tempRuleList is a vector of break rules that has had variable names substituted. On exit, the charCategoryTable data member has been initialized to hold the character category table, and tempRuleList's rules have been munged to contain character category numbers everywhere a literal character or a [] expression originally occurred.

debugPrintTempStateTable

protected void debugPrintTempStateTable()

debugPrintVector

protected void debugPrintVector(String label,
                                Vector v)

debugPrintVectorOfVectors

protected void debugPrintVectorOfVectors(String label1,
                                         String label2,
                                         Vector v)

error

protected void error(String message,
                     int position,
                     String context)
Throws an IllegalArgumentException representing a syntax error in the rule description. The exception's message contains some debugging information.
Parameters:
message - A message describing the problem
position - The position in the description where the problem was discovered
context - The string containing the error
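
A minimal sketch of an error helper with this shape is shown below. Only the parameter list and the exception type come from the description above; the class name and the exact wording of the exception message are assumptions, since the source says only that the message "contains some debugging information".

```java
// Hypothetical stand-in; the message format is an assumption.
final class RuleErrorSketch {
    // Always throws; never returns normally.
    static void error(String message, int position, String context) {
        throw new IllegalArgumentException(
                "Parse error: " + message
                + " at position " + position
                + " in: " + context);
    }

    public static void main(String[] args) {
        try {
            error("Unbalanced parentheses", 3, "(ab");
        } catch (IllegalArgumentException e) {
            // The caller sees the problem, where it was found, and the rule text.
            System.out.println(e.getMessage());
        }
    }
}
```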

handleSpecialSubstitution

protected void handleSpecialSubstitution(String replace,
                                         String replaceWith,
                                         int startPos,
                                         String description)
This function defines a protocol for handling substitution names that are "special," i.e., that have some property beyond just being substitutions. At the RuleBasedBreakIterator_Old level, we have one special substitution name, IGNORE_VAR. Subclasses can override this function to add more. Any special processing that has to go on beyond that which is done by the normal substitution-processing code is done here.
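
The override protocol described above might look roughly like the following. Everything here is a simplified stand-in: the two classes are not the ICU classes, the method takes fewer parameters than the real one, the string values of the special names are placeholders, and the second special name is invented purely to show how a subclass could add one.

```java
// Hypothetical base builder that recognizes one special name.
class BaseBuilderSketch {
    static final String IGNORE_VAR = "$ignore"; // placeholder value
    String ignoreChars = null;

    // Called for each substitution; records the text if the name is special.
    void handleSpecialSubstitution(String replace, String replaceWith) {
        if (replace.equals(IGNORE_VAR)) {
            ignoreChars = replaceWith;
        }
    }
}

// Hypothetical subclass that adds a second special name.
class ExtendedBuilderSketch extends BaseBuilderSketch {
    static final String EXTRA_VAR = "$extra"; // invented for illustration
    String extraChars = null;

    @Override
    void handleSpecialSubstitution(String replace, String replaceWith) {
        // Keep the base behavior, then handle the additional name.
        super.handleSpecialSubstitution(replace, replaceWith);
        if (replace.equals(EXTRA_VAR)) {
            extraChars = replaceWith;
        }
    }

    public static void main(String[] args) {
        ExtendedBuilderSketch b = new ExtendedBuilderSketch();
        b.handleSpecialSubstitution("$ignore", "[abc]");
        b.handleSpecialSubstitution("$extra", "[xyz]");
        System.out.println(b.ignoreChars + " " + b.extraChars);
    }
}
```

Calling the superclass implementation first keeps the base-level handling of IGNORE_VAR intact while the subclass layers its own names on top.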

mungeExpressionList

protected void mungeExpressionList(Hashtable expressions)

processSubstitution

protected String processSubstitution(String substitutionRule,
                                     String description,
                                     int startPos)
This function performs variable-name substitutions. First it does syntax checking on the variable-name definition. If it's syntactically valid, it then goes through the remainder of the description and does a simple find-and-replace of the variable name with its text. (The variable text must be enclosed in either [] or () for this to work.)
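
The check-then-replace idea can be sketched as below. This is not the real method: the signature is different (the name and text are passed separately instead of being parsed from a rule), the validation is reduced to the bracket requirement mentioned above, and the variable name and rule text in the example are made up.

```java
// Hypothetical helper illustrating find-and-replace substitution.
final class SubstitutionSketch {
    // Replaces every occurrence of name with text in the part of
    // description that starts at startPos; earlier text is left untouched.
    static String substitute(String name, String text,
                             String description, int startPos) {
        if (name.isEmpty()) {
            throw new IllegalArgumentException("Empty variable name");
        }
        boolean bracketed = text.length() >= 2
                && ((text.startsWith("[") && text.endsWith("]"))
                    || (text.startsWith("(") && text.endsWith(")")));
        if (!bracketed) {
            throw new IllegalArgumentException(
                    "Variable text must be enclosed in [] or (): " + text);
        }
        String head = description.substring(0, startPos);
        String tail = description.substring(startPos);
        // String.replace performs a literal (non-regex) replacement.
        return head + tail.replace(name, text);
    }

    public static void main(String[] args) {
        // Prints [0-9]+;[0-9]*
        System.out.println(substitute("$digit", "[0-9]", "$digit+;$digit*", 0));
    }
}
```

Requiring the enclosing [] or () means the substituted text behaves as a single unit wherever it is pasted in, which is what makes a plain textual replacement safe.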

Copyright (c) 2006 IBM Corporation and others.