Package com.sun.speech.freetts.en
Class TokenizerImpl
java.lang.Object
com.sun.speech.freetts.en.TokenizerImpl
All Implemented Interfaces:
Tokenizer
Implements the tokenizer interface. Breaks an input sequence of
characters into a set of tokens.
-
Field Summary
Fields
Modifier and Type, Field: Description
static final String DEFAULT_POSTPUNCTUATION_SYMBOLS: A string containing the default post-punctuation characters.
static final String DEFAULT_PREPUNCTUATION_SYMBOLS: A string containing the default pre-punctuation characters.
static final String DEFAULT_SINGLE_CHAR_SYMBOLS: A string containing the default single characters.
static final String DEFAULT_WHITESPACE_SYMBOLS: A string containing the default whitespace characters.
static final int EOF: A constant indicating that the end of the stream has been read.
-
Constructor Summary
Constructors
Constructor: Description
TokenizerImpl(): Constructs a Tokenizer.
TokenizerImpl(Reader file): Creates a tokenizer that will return tokens from the given file.
TokenizerImpl(String string): Creates a tokenizer that will return tokens from the given string.
-
Method Summary
Modifier and Type, Method: Description
String getErrorDescription(): If hasErrors returns true, returns a description of the error encountered; otherwise returns null.
Token getNextToken(): Returns the next token.
boolean hasErrors(): Returns true if there were errors while reading tokens.
boolean hasMoreTokens(): Returns true if there are more tokens, false otherwise.
boolean isBreak(): Determines if the current token should start a new sentence.
void setInputReader(Reader reader): Sets the input reader.
void setInputText(String inputString): Sets the text to tokenize.
void setPostpunctuationSymbols(String symbols): Sets the postpunctuation symbols of this Tokenizer to the given symbols.
void setPrepunctuationSymbols(String symbols): Sets the prepunctuation symbols of this Tokenizer to the given symbols.
void setSingleCharSymbols(String symbols): Sets the single character symbols of this Tokenizer to the given symbols.
void setWhitespaceSymbols(String symbols): Sets the whitespace symbols of this Tokenizer to the given symbols.
-
Field Details
-
EOF
public static final int EOF
A constant indicating that the end of the stream has been read.
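As a rough illustration of what an end-of-stream constant is for: `java.io.Reader.read()` returns -1 once the stream is exhausted. The sketch below assumes, without this page confirming it, that EOF plays the same role. The class `EofSketch` and its local `EOF` constant are hypothetical stand-ins, not part of FreeTTS.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in, not part of FreeTTS.
class EofSketch {
    // Assumption: mirrors the -1 that Reader.read() returns at end of stream.
    static final int EOF = -1;

    // Counts the characters available from the reader before end of stream.
    static int countChars(Reader reader) {
        int count = 0;
        try {
            while (reader.read() != EOF) {
                count++;
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers need no throws clause.
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countChars(new StringReader("Hello, world.")));
    }
}
```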
-
DEFAULT_WHITESPACE_SYMBOLS
A string containing the default whitespace characters.
-
DEFAULT_SINGLE_CHAR_SYMBOLS
A string containing the default single characters.
-
DEFAULT_PREPUNCTUATION_SYMBOLS
A string containing the default pre-punctuation characters.
-
DEFAULT_POSTPUNCTUATION_SYMBOLS
A string containing the default post-punctuation characters.
-
-
Constructor Details
-
TokenizerImpl
public TokenizerImpl()
Constructs a Tokenizer.
-
TokenizerImpl
public TokenizerImpl(String string)
Creates a tokenizer that will return tokens from the given string.
Parameters:
string - the string to tokenize
-
TokenizerImpl
public TokenizerImpl(Reader file)
Creates a tokenizer that will return tokens from the given file.
Parameters:
file - where to read the input from
-
-
Method Details
-
setWhitespaceSymbols
public void setWhitespaceSymbols(String symbols)
Sets the whitespace symbols of this Tokenizer to the given symbols.
Specified by:
setWhitespaceSymbols in interface Tokenizer
Parameters:
symbols - the whitespace symbols
-
setSingleCharSymbols
public void setSingleCharSymbols(String symbols)
Sets the single character symbols of this Tokenizer to the given symbols.
Specified by:
setSingleCharSymbols in interface Tokenizer
Parameters:
symbols - the single character symbols
-
setPrepunctuationSymbols
public void setPrepunctuationSymbols(String symbols)
Sets the prepunctuation symbols of this Tokenizer to the given symbols.
Specified by:
setPrepunctuationSymbols in interface Tokenizer
Parameters:
symbols - the prepunctuation symbols
-
setPostpunctuationSymbols
public void setPostpunctuationSymbols(String symbols)
Sets the postpunctuation symbols of this Tokenizer to the given symbols.
Specified by:
setPostpunctuationSymbols in interface Tokenizer
Parameters:
symbols - the postpunctuation symbols
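To make the role of the prepunctuation and postpunctuation symbol sets more concrete, here is a minimal sketch of how such sets could be used to peel punctuation off a whitespace-free chunk of text. `SymbolSplitSketch` is a hypothetical stand-in, not the FreeTTS implementation, and its initial symbol strings are assumed example values rather than the library defaults.

```java
// Hypothetical stand-in, not the FreeTTS implementation.
class SymbolSplitSketch {
    // Assumed example symbol sets; the real defaults are not listed on this page.
    private String prepunctuation = "\"'(";
    private String postpunctuation = "\"'.,!?)";

    void setPrepunctuationSymbols(String symbols) { prepunctuation = symbols; }
    void setPostpunctuationSymbols(String symbols) { postpunctuation = symbols; }

    // Splits one whitespace-free chunk into {prepunctuation, word, postpunctuation}.
    String[] split(String chunk) {
        // Consume leading characters that belong to the prepunctuation set.
        int start = 0;
        while (start < chunk.length() && prepunctuation.indexOf(chunk.charAt(start)) >= 0) {
            start++;
        }
        // Consume trailing characters that belong to the postpunctuation set.
        int end = chunk.length();
        while (end > start && postpunctuation.indexOf(chunk.charAt(end - 1)) >= 0) {
            end--;
        }
        return new String[] {
            chunk.substring(0, start), chunk.substring(start, end), chunk.substring(end)
        };
    }

    public static void main(String[] args) {
        SymbolSplitSketch sketch = new SymbolSplitSketch();
        System.out.println(String.join("|", sketch.split("(Hello!)")));
        // Narrowing the postpunctuation set leaves "!" attached to the word.
        sketch.setPostpunctuationSymbols(")");
        System.out.println(String.join("|", sketch.split("(Hello!)")));
    }
}
```

In this sketch, changing a symbol set only affects chunks split afterwards, which is why the setters are called before the second `split`.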
-
setInputText
public void setInputText(String inputString)
Sets the text to tokenize.
Specified by:
setInputText in interface Tokenizer
Parameters:
inputString - the string to tokenize
-
setInputReader
public void setInputReader(Reader reader)
Sets the input reader.
Specified by:
setInputReader in interface Tokenizer
Parameters:
reader - the input source
-
getNextToken
public Token getNextToken()
Returns the next token.
Specified by:
getNextToken in interface Tokenizer
Returns:
the next token if it exists, null if no more tokens
-
hasMoreTokens
public boolean hasMoreTokens()
Returns true if there are more tokens, false otherwise.
Specified by:
hasMoreTokens in interface Tokenizer
Returns:
true if there are more tokens, false otherwise
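The hasMoreTokens and getNextToken methods suggest the usual iterator-style loop. The sketch below shows that loop against a hypothetical stand-in, `TokenLoopSketch`, which mirrors only those two method names. It splits on whitespace and returns plain strings, which is an assumption made to keep the example self-contained; the real class applies its configured symbol sets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors only the two method names above.
class TokenLoopSketch {
    private final String[] words;
    private int next = 0;

    TokenLoopSketch(String text) {
        String trimmed = text.trim();
        // Split on runs of whitespace; an empty input yields no tokens.
        words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    boolean hasMoreTokens() {
        return next < words.length;
    }

    // Returns the next token, or null if no more tokens remain.
    String getNextToken() {
        return hasMoreTokens() ? words[next++] : null;
    }

    // Drains the tokenizer with the hasMoreTokens/getNextToken loop.
    static List<String> collect(String text) {
        TokenLoopSketch tokenizer = new TokenLoopSketch(text);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.getNextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(collect("Hello world, again."));
    }
}
```

Checking hasMoreTokens before each call avoids having to handle the null that getNextToken returns once the input is exhausted.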
-
hasErrors
public boolean hasErrors()
Returns true if there were errors while reading tokens.
-
getErrorDescription
public String getErrorDescription()
If hasErrors returns true, this returns a description of the error encountered; otherwise it returns null.
Specified by:
getErrorDescription in interface Tokenizer
Returns:
a description of the last error that occurred
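The hasErrors and getErrorDescription pair describes a record-then-query style of error handling: a read failure is stored instead of thrown, and the caller checks for it afterwards. A minimal sketch of that pattern follows. `ErrorReportSketch` and its `readAll` method are hypothetical stand-ins, and whether the real class records I/O failures in exactly this way is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in, not the FreeTTS implementation.
class ErrorReportSketch {
    private String errorDescription = null;

    // Reads everything from the reader, recording (not throwing) any I/O failure.
    String readAll(Reader reader) {
        StringBuilder text = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            errorDescription = e.toString();
        }
        return text.toString();
    }

    boolean hasErrors() {
        return errorDescription != null;
    }

    // Returns null unless hasErrors() is true.
    String getErrorDescription() {
        return errorDescription;
    }

    public static void main(String[] args) {
        StringReader closed = new StringReader("text");
        closed.close(); // reading from a closed StringReader throws IOException
        ErrorReportSketch sketch = new ErrorReportSketch();
        sketch.readAll(closed);
        if (sketch.hasErrors()) {
            System.err.println("Tokenizer error: " + sketch.getErrorDescription());
        }
    }
}
```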
-
isBreak
public boolean isBreak()
Determines if the current token should start a new sentence.
-